In class, before we discussed branch prediction, we discussed several alternatives for handling the problem of a conditional branch in a pipelined machine. One of those alternatives is the delayed branch, which enjoyed some undeserved (in my opinion) popularity in the early 1980s. It was summarily abandoned in the definition of new ISAs in the 1990s.
We discussed the delayed branch in class in the context of a very shallow pipeline (2 stages; first stage: fetch, second stage: everything else). And I pointed out that on a taken branch, sometimes we win, sometimes we break even, but we never lose. Some of you pushed back, saying, "What about the case of a not-taken branch?"
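To make the taken-branch claim concrete, here is a minimal sketch of the cycle counting in Python. It is my illustration, not part of the assignment; the function names, the instruction count, and the convention of counting only cycles spent in the second stage are assumptions made purely for illustration.

# Cycle counting for a TAKEN branch in the 2-stage pipeline
# (stage 1: fetch, stage 2: everything else).  We count only cycles
# occupied in stage 2; every instruction, NOP, or squashed fetch (bubble)
# costs one such cycle.  USEFUL_WORK is an arbitrary illustrative count.

USEFUL_WORK = 6  # useful (non-branch) instructions surrounding the branch

def baseline_cycles_taken():
    # Baseline: fetch assumes not-taken, so the fall-through instruction
    # fetched behind a taken branch is squashed -> one bubble.
    return USEFUL_WORK + 1 + 1        # useful work + branch + bubble

def delayed_cycles_taken(compiler_filled_slot):
    # Delayed branch: the instruction in the delay slot always executes.
    if compiler_filled_slot:
        # The compiler moved one of the useful instructions into the slot,
        # so the cycle behind the branch does real work: we win.
        return USEFUL_WORK + 1        # useful work (slot included) + branch
    # The compiler could only put a NOP in the slot, which costs the same
    # as the baseline's bubble: we break even, but we never lose.
    return USEFUL_WORK + 1 + 1

print(baseline_cycles_taken())        # 8
print(delayed_cycles_taken(True))     # 7  (win)
print(delayed_cycles_taken(False))    # 8  (break even)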
On the one hand, the issue is moot since no one is interested in pipelines of depth two. On the other hand, it is a good intellectual exercise, and from simple examples, insights follow. Ergo, this assignment.
Assume we define a machine with "no branch prediction" as one that has the structures required to implement a very simple compile-time always-not-taken branch predictor. Discuss how implementing "delayed branches" in the compiler for such a machine performs, compared to a baseline machine. The baseline machine is the same machine with branches taking effect immediately (i.e., the instruction that executes after the branch instruction depends on the direction of the branch).
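For concreteness, here is a minimal sketch of the difference in semantics between the two kinds of branches, again in Python and again only my illustration: the function names are invented, and instructions are assumed to occupy one address unit each.

def next_pcs_baseline(pc, taken, target):
    # Non-delayed (baseline) branch: control transfers immediately, so the
    # next instruction to execute is chosen directly by the branch direction.
    return [target if taken else pc + 1]

def next_pcs_delayed(pc, taken, target):
    # Delayed branch: the delay-slot instruction at pc + 1 executes no matter
    # what; only after it does control reach the target (taken) or the
    # instruction after the slot (not taken).
    return [pc + 1, target if taken else pc + 2]

# A branch at address 100 with target 200:
print(next_pcs_baseline(100, True, 200))    # [200]
print(next_pcs_delayed(100, True, 200))     # [101, 200]
print(next_pcs_baseline(100, False, 200))   # [101]
print(next_pcs_delayed(100, False, 200))    # [101, 102]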
In what scenarios (if any) do the two machines perform similarly?
In what scenarios (if any) do "delayed branches" improve performance?
In what scenarios (if any) does the baseline machine perform better?
What if I augment the ISA so that it has both delayed and non-delayed branches? Does that affect anything? Explain.