You are encouraged to work on the problem set in groups and turn in one problem set for the entire group. Remember to put all your names on the solution sheet. Also include the name of the TA in whose discussion section you would like the problem set returned.
for(i = 0; i < 100; i++) A[i] = ((B[i] * C[i]) + D[i]) / 2;
LEA Ri, X (1 cycle)              ; Ri <- address of X
LD Ri, Rj, Rk (11 cycles)        ; Ri <- MEM[Rj + Rk]
ST Ri, Rj, Rk (11 cycles)        ; MEM[Rj + Rk] <- Ri
MOVI Ri, Imm (1 cycle)           ; Ri <- Imm
MUL Ri, Rj, Rk (6 cycles)        ; Ri <- Rj * Rk
ADD Ri, Rj, Rk (4 cycles)        ; Ri <- Rj + Rk
ADD Ri, Rj, Imm (4 cycles)       ; Ri <- Rj + Imm
RSHFA Ri, Rj, amount (1 cycle)   ; Ri <- RSHFA (Rj, amount)
BRcc X (1 cycle)                 ; Branch to X based on condition codes

Assume it takes one memory location to store each element of the array. Also assume that there are 8 registers (R0-R7).
LD Vst, #n (1 cycle)                  ; Vst <- n
LD Vln, #n (1 cycle)                  ; Vln <- n
VLD Vi, X (11 cycles, pipelined)
VST Vi, X (11 cycles, pipelined)
Vmul Vi, Vj, Vk (6 cycles, pipelined)
Vadd Vi, Vj, Vk (4 cycles, pipelined)
Vrshfa Vi, Vj, amount (1 cycle)
Vbrcc X (1 cycle)

How many cycles does it take to execute the program on the following processors? Assume that memory is 16-way interleaved.
Note: VDR = Vector Destination Register, VSR = Vector Source Register
If IR[11:9] = 000, MOVI moves the unsigned quantity amount6 to Vector Stride Register (Vstride).
If IR[11:9] = 001, MOVI moves the unsigned quantity amount6 to Vector Length Register (Vlength).
This instruction has already been implemented for you.
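The MOVI behavior described above can be sketched in C as a small behavioral model. This is only an illustration, not the LmmVC-3 microcode: the `VecCtl` struct and the `movi` function are hypothetical names, the selector is passed in as the already-extracted IR[11:9] field, and returning -1 for other selector values is an assumption, since the text defines only 000 and 001.

```c
#include <stdint.h>

/* Hypothetical container for the two vector control registers. */
typedef struct {
    uint16_t vstride; /* Vector Stride Register (Vstride) */
    uint16_t vlength; /* Vector Length Register (Vlength) */
} VecCtl;

/* Behavioral sketch of MOVI.
 * sel     : the IR[11:9] field, already extracted by the caller
 * amount6 : the 6-bit immediate, treated as unsigned (zero-extended)
 * Returns 0 on success, -1 for selector values the text does not define
 * (that error convention is an assumption of this sketch). */
int movi(VecCtl *ctl, unsigned sel, unsigned amount6) {
    uint16_t value = (uint16_t)(amount6 & 0x3Fu); /* keep only 6 bits */
    switch (sel & 0x7u) {
    case 0: /* IR[11:9] = 000 */
        ctl->vstride = value;
        return 0;
    case 1: /* IR[11:9] = 001 */
        ctl->vlength = value;
        return 0;
    default:
        return -1;
    }
}
```

For instance, calling `movi(&ctl, 0, 1)` followed by `movi(&ctl, 1, 63)` would set a stride of 1 and a length of 63 in this model.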
VLD loads a vector of length Vlength from memory into VDR. VLD uses the opcode previously used by LDB. The starting address of the vector is computed by adding LSHF1(SEXT(offset6)) to BaseR. Each subsequent address is obtained by adding LSHF1(ZEXT(Vstride)) to the address of the preceding vector element.
VST writes the contents of VSR into memory. VST uses the opcode previously used by STB. Address calculation is done in the same way as for VLD.
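The address calculation shared by VLD and VST can be written out as a short C sketch. The function names `sext6` and `vec_elem_addr` are hypothetical, elements are indexed from 0, and the arithmetic is assumed to wrap at 16 bits as LC-3b addresses do.

```c
#include <stdint.h>

/* SEXT: sign-extend a 6-bit field to 16 bits. */
uint16_t sext6(unsigned offset6) {
    unsigned v = offset6 & 0x3Fu;
    return (uint16_t)((v & 0x20u) ? (v | 0xFFC0u) : v);
}

/* Address of vector element k (0-based) for VLD/VST:
 *   start = BaseR + LSHF1(SEXT(offset6))
 *   addr  = start + k * LSHF1(ZEXT(Vstride))
 * Results wrap modulo 2^16 (assumed, matching a 16-bit address space). */
uint16_t vec_elem_addr(uint16_t base_r, unsigned offset6,
                       uint16_t vstride, unsigned k) {
    uint16_t start = (uint16_t)(base_r + (uint16_t)(sext6(offset6) << 1));
    uint16_t step = (uint16_t)(vstride << 1); /* stride counted in words */
    return (uint16_t)(start + k * step);
}
```

Under these assumptions, a Vstride of 1 touches consecutive 16-bit words, while a larger Vstride skips words between elements.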
If IR is a 1, VADD adds two vector registers (VSR1 and VSR2) and stores the result in VDR. If IR is a 0, VADD adds a scalar register (SR2) to every element of VSR and stores the result in VDR.
VLD, VST, and VADD do not modify the content of Vstride and Vlength registers.
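The two forms of VADD can be illustrated with a minimal C sketch. It assumes that VADD operates on the first Vlength elements of each register (the text states this explicitly only for VLD), and it takes Vlength by value so the register itself is never modified. The function names, the `VREG_ELEMS` constant, and the clamping of the length are hypothetical choices for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define VREG_ELEMS 63 /* assumed capacity of one vector register */

/* Vector-vector form: VDR[i] = VSR1[i] + VSR2[i]. */
void vadd_vv(uint16_t *vdr, const uint16_t *vsr1,
             const uint16_t *vsr2, size_t vlength) {
    if (vlength > VREG_ELEMS) vlength = VREG_ELEMS; /* sketch-only guard */
    for (size_t i = 0; i < vlength; i++) {
        vdr[i] = (uint16_t)(vsr1[i] + vsr2[i]); /* wraps at 16 bits */
    }
}

/* Vector-scalar form: VDR[i] = VSR[i] + SR2 for every element. */
void vadd_vs(uint16_t *vdr, const uint16_t *vsr,
             uint16_t sr2, size_t vlength) {
    if (vlength > VREG_ELEMS) vlength = VREG_ELEMS; /* sketch-only guard */
    for (size_t i = 0; i < vlength; i++) {
        vdr[i] = (uint16_t)(vsr[i] + sr2);
    }
}
```

Elements at index Vlength and beyond are left untouched in this model, which is one possible reading of the specification rather than a stated rule.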
The following five hardware structures have been added to LC-3b in order to implement LmmVC-3.
Vector Register File with eight 63-element Vector registers
Vector Length Register
Vector Stride Register
A third input to DRMUX containing IR[8:6]
Grey box A
Box labeled X
These structures are shown in the LmmVC-3 datapath diagram.