Thu, 13 Apr 2017, 01:31


A student writes, and I must confess I got carried away with a long, detailed
answer.  I hope it is helpful.

> Dear Dr. Patt,
> 
> I was going over my notes after class, and I feel like I didn't fully
> understand the difference between the instruction buffer and loop buffer.
> Are they the same mechanism but used for different purposes?
 
Answer to question 1:
====================
The Instruction Buffer of a pipelined machine holds the instruction just
fetched for one clock cycle.  The next clock cycle, the next instruction
will be in the instruction buffer.  The loop buffer was a structure in
the Cray I that could hold all the instructions of the body of a for loop
or while loop simultaneously.  You can bring in the entire loop at once, or
you can bring in the loop body one instruction at a time.  If the latter, you
keep each instruction in the loop buffer (rather than overwrite it with
the next instruction, as in an instruction buffer) until eventually you have
the entire loop.  From then on, instructions can be fetched from the loop
buffer much faster than from a cache and much, much faster than from memory.
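
To make the contrast concrete, here is a minimal sketch in Python.  It is
not a model of the actual Cray hardware: the class name, the dictionary used
as "memory", and the absence of any capacity limit are all assumptions made
for illustration.  It only counts how many slow fetches are needed when
instructions are kept rather than overwritten.

```python
class LoopBuffer:
    """Hypothetical loop buffer: keeps every instruction it has fetched."""

    def __init__(self, memory):
        self.memory = memory        # address -> instruction (slow storage)
        self.buffer = {}            # instructions retained so far
        self.memory_fetches = 0     # number of slow fetches performed

    def fetch(self, pc):
        # First time through the loop: go to memory and KEEP the instruction.
        if pc not in self.buffer:
            self.memory_fetches += 1
            self.buffer[pc] = self.memory[pc]
        # Every later iteration is served from the buffer.
        return self.buffer[pc]


def run_loop(loop_body, iterations):
    """Fetch the loop body `iterations` times; return slow fetches needed."""
    lb = LoopBuffer(dict(enumerate(loop_body)))
    for _ in range(iterations):
        for pc in range(len(loop_body)):
            lb.fetch(pc)
    return lb.memory_fetches
```

With a four-instruction loop body run ten times, the sketch goes to memory
four times.  A one-entry instruction buffer, which overwrites its contents
every cycle, would need all forty fetches to come from the cache or memory.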



> I also had an unrelated question: How is the reciprocal approximate
> calculated by the floating point unit?  I didn't understand why it was
> faster than the divide instruction.

Non-answer to question 2:
========================
The truth is, I really do not know.  I do know that the reciprocal is not
computed exactly, and in fact the circuit that generates it takes far fewer
clock cycles than the divide operation.  Therefore A/B takes a lot more time
to execute than does A*(1/B), where 1/B is the approximate reciprocal.  More
detail can probably be obtained from Professor Swartzlander, who teaches a
full course in Computer Arithmetic.
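
For what it is worth, one textbook technique (an assumption here, and not
necessarily what any particular floating point unit does) is to start from a
crude seed and refine it with Newton-Raphson steps, each of which needs only
multiplies and a subtract.  The sketch below assumes B has already been
scaled into the range [0.5, 1); the function name and the linear seed are
illustrative choices, not taken from any real machine.

```python
def reciprocal_approx(b, steps=2):
    """Approximate 1/b for b in [0.5, 1) using Newton-Raphson refinement."""
    # Crude linear seed (an assumed, commonly quoted choice for this range).
    x = 48.0 / 17.0 - (32.0 / 17.0) * b
    for _ in range(steps):
        # Each step roughly squares the relative error and uses no divide.
        x = x * (2.0 - b * x)
    return x
```

The point of the sketch is only that a few multiply steps get close to 1/B
quickly, so A*(1/B) can finish well before a full-precision A/B would.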



> Finally, I had a question about Decoupled Access/Execute.  For this, I feel
> that I didn't really understand the underlying mechanism.  Suppose we have
> an instruction such as "ADD R1, R2, R3".  In this case, does the compiler
> pop off R2 and R3 off of the stack (replacing R2 and R3 with R7) and place
> R1 onto the stack (and replace it with R7 - but this time with the store
> queue)?

Very long answer to question 3:
==============================
No.  DAE uses registers in the same way you are used to.  The difference is
the realization that some source operands must come from memory and some
results of operations must be stored to memory.  In class today, I gave you
an example.  I will embellish a little the example I gave in class.

Suppose I wanted to execute the following code:

LD R1, A
LD R2, B
LD R3, C
ADD R4,R1,R2 ; the sources R1,R2 just came from memory
MUL R5,R4,R4 ; the sources and destination are all registers
ADD R6,R5,R3 ; one source from reg R5, the other from mem. result to mem.
ST R6,D

The MUL works in the same way as you have always known.  In fact, all operate
instructions work as you understand them EXCEPT that DAE recognizes among its
operate instructions (in this case, ADD and MUL) the first use of each loaded
value and the last result before it is to be stored.  Instead of the load
putting the value in a register and the ADD or MUL reading that register, the
load puts the value in the load queue and the ADD or MUL gets it from the
queue.  Similarly, the ST gets its result from the store queue, so the
producer of that result must put it in the store queue.  The execute unit
uses R7 to designate sources obtained from the load queue and results
destined to go to memory.

Putting this together, the code above then becomes:

LD A
LD B
LD C
ST D

for the access unit, and 

ADD R4,R7,R7
MUL R5,R4,R4
ADD R7,R5,R7

for the execute unit.

Note the first ADD gets its sources from the load queue, and puts its result
into R4 since it does not want to store R4 to memory.

Note the MUL gets its sources from registers, and puts its result in a register
because it does not want to store that result to memory.

Note that the final ADD gets one source from registers, and one source from
the load queue, but wants to store its result to memory.  Therefore it puts 
the result not in R6, but instead at the back of the store queue.
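
If it helps, the two instruction streams above can be traced with a small
Python sketch.  It is a simplification under stated assumptions: the two
units run one after the other rather than concurrently, memory is a
dictionary, and the names run_dae, load_q, and store_q are made up for
illustration.

```python
from collections import deque


def run_dae(memory):
    """Trace the access and execute streams for D = (A + B) * (A + B) + C."""
    load_q = deque()    # values loaded from memory, in program order
    store_q = deque()   # results waiting to be stored to memory
    regs = {}           # ordinary registers of the execute unit

    # Access unit: LD A, LD B, LD C each push a value onto the load queue.
    for addr in ("A", "B", "C"):
        load_q.append(memory[addr])

    def read(src):
        # R7 as a source means "take the next value from the load queue".
        return load_q.popleft() if src == "R7" else regs[src]

    def write(dst, value):
        # R7 as a destination means "put the result on the store queue".
        if dst == "R7":
            store_q.append(value)
        else:
            regs[dst] = value

    # Execute unit: the three operate instructions from the text.
    execute_stream = [
        ("ADD", "R4", "R7", "R7"),
        ("MUL", "R5", "R4", "R4"),
        ("ADD", "R7", "R5", "R7"),
    ]
    for op, dst, src1, src2 in execute_stream:
        a = read(src1)
        b = read(src2)
        write(dst, a + b if op == "ADD" else a * b)

    # Access unit: ST D takes its data from the front of the store queue.
    memory["D"] = store_q.popleft()
    return memory, regs
```

For example, starting with A=2, B=3, C=4, the trace leaves 29 in D, with
R4=5 and R5=25 in the register file and nothing ever written to R6.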

> Thank you,
> <>

Sorry for the long-winded response.  Hopefully the above makes sense.  If not,
ask one of the TAs, or come see me.

Good luck on the exam next week.

Yale Patt