### Department of Electrical and Computer Engineering The University of Texas at Austin

EE 360N, Fall 2005 Yale Patt, Instructor Aater Suleman, Linda Bigelow, Jose Joao, Veynu Narasiman, TAs Exam 2, November 16, 2005

Name:

| Problem 1 (20 points): |  |
|------------------------|--|
| Problem 2 (20 points): |  |
| Problem 3 (20 points): |  |
| Problem 4 (20 points): |  |
| Problem 5 (20 points): |  |
| Total (100 points):    |  |

Note: Please be sure that your answers to all questions (and all supporting work that is required) are contained in the space provided.

Note: Please be sure your name is recorded on each sheet of the exam.

# **GOOD LUCK!**

#### Problem 1 (20 points)

**Part a (6 points):** A 12 x 18 array of bytes is stored in row major order, starting at location 10,000 (decimal). Memory is byte-addressable. I wish to execute a vector load of the 15th column.

 What value must first be put in the Vstride register?

 What value must first be put in the Vlength register?

 What is the starting address of the vector load?

Part b (5 points): A snapshot of the taken/not-taken branch behavior of a processor is

# ... T T T T T T T N N T T N N T N N T

If the branch predictor used is the 2-bit saturating counter, how many of the last ten branches are predicted correctly?

|                                                                                  | ANSWER:               |     |
|----------------------------------------------------------------------------------|-----------------------|-----|
| <b>Part c (6 points):</b> The DMA mechanism allows information to travel between |                       | and |
| , thereby freeing up                                                             | for other activities. |     |
| Part d (3 points): Most interrupts and very few exceptions are                   | , which means t       | hey |

may not be serviced as expeditiously as is usually the case, or they may not be serviced at all.

#### Problem 2 (20 points)

Initially, the register file of the Tomasulo-like machine shown below contains the following data in the registers and the reservation stations. Note that one instruction has been fetched, decoded, and issued, but has not been executed yet.



Four more instructions are then fetched, decoded and issued in program order, none are executed. At that point the register file and reservation stations are as shown below:

|    | V | TAG | VALUE | <br>V | TAG | VALUE        | V | TAG | VALUE |   | V | TAG | VALUE        | V | TAG | VALUE |   |
|----|---|-----|-------|-------|-----|--------------|---|-----|-------|---|---|-----|--------------|---|-----|-------|---|
| R0 | 1 | π   | 17    | 1     | —   | 5            | 1 | _   | 7     | α | 0 | β   | -            | 0 | χ   | _     | π |
| R1 | 1 | ρ   | 13    | 1     | —   | 14           | 0 | α   | -     | β | 0 | χ   | —            | 0 | π   | -     | ρ |
| R2 | 1 | σ   | 14    | 1     | —   | 5            | 1 | -   | 16    | χ |   |     |              |   |     |       | σ |
| R3 | 1 | β   | 16    |       |     |              |   |     |       | δ |   |     |              |   |     |       | τ |
| R4 | 1 | β   | 5     |       |     |              |   |     |       | , |   |     |              |   |     |       | , |
| R5 | 0 | ρ   | 3     |       |     | $\backslash$ | - | Ľ   |       |   |   |     | $\backslash$ |   | ~   |       |   |
| R6 | 1 | α   | 7     |       |     |              | - | Т   |       |   |   |     |              |   | *   |       |   |
| R7 | 0 | π   | 8     |       |     |              |   |     |       |   |   |     |              |   |     |       |   |

**Part a** (14 points): Using only instructions of the form ADD Ri, Rj, Rk and MUL Ri, Rj, Rk, where Ri is the destination register and Rj and Rk are the source registers, show all five instructions in program order. (Hint: You might consider doing part b first and using that result to help you solve part a.)

NOTE: The instructions in this problem are unique, but two of them can be swapped. Both orderings will receive full credit.



Name:\_\_\_\_\_

# **Problem 2 continued**

Part b (6 points): Show the data flow graph of the code in part a.

#### Problem 3 (20 points)

A vector processor with 11 cycle memory latency and 16-way interleaved memory, 8 vector registers of length 64, supporting vector chaining, is used to execute the vector code resulting from compiling the following high-level code:

```
for (i=0; i<10; i++) {
    D[i] = B[i] * C[i];
    E[i] = A[i] + D[i];
}</pre>
```

**Part a (5 points):** Write the vector code to accomplish this. Both results D[i] and E[i] have to be stored to memory. You have available the following vector instructions:

```
MOVI VLEN, #n ; (1 cycle)
MOVI VSTR, #n ; (1 cycle)
VADD Vi,Vj,Vk ; Vj, Vk are sources, Vi is dest (4-stage pipelined adder)
VMUL Vi,Vj,Vk ; (6-stage pipelined multiplier)
VLD Vi,A ; Vi gets loaded with contents of memory, starting at A
VST Vi,A ; Contents of Vi gets stored in memory, starting at A
```

#### Problem 3 continued

Part b (6 points): If the memory has 2 load ports and 1 store port, how many cycles does the program take to finish?

| ANSWER: |  | cycles |
|---------|--|--------|
|---------|--|--------|

Part c (9 points): If this program has to finish in fewer than 60 cycles,

| the <b>minimum</b> number of load ports required is:  | ,                 |
|-------------------------------------------------------|-------------------|
| the <b>minimum</b> number of store ports required is: |                   |
| With this configuration, the program takes            | cycles to finish. |

### Problem 4 (20 points)

You are asked to implement the state machine for an asynchronous controller to handle both **arbitration and transaction** functions. The controller manages read and write traffic for the device attached to it.





In this problem, we have a **multiplexed** address/data bus. Furthermore, **transfers can be one bus-width or two**.

The bus master knows which of the four transactions is required and asserts a 2-bit signal TYPE in conjunction with MSYN at the start of the bus cycle to notify the slave which of the four transactions (write of width one, read of width one, write of width two, and read of width two) is about to occur.

The master and slave will each need one extra control signal to perform these transactions. Call them Extra-M (EM) and Extra-S (ES).

**Part a (12 points):** Complete the equivalent timing diagrams for the four cases by specifying the relevant control signals on every line where there is a "?". (Note: one "?" may imply more than one missing signal).



### Problem 4 continued

**Part b (8 points):** Complete the state machine for the asynchronous controller to handle **both arbitration and transaction** functions. Show all relevant input signals that cause transitions and output signals according to the Moore machine convention. You **only** need to specify the control signals BUT you do need to specify ALL relevant control signals.



#### Problem 5 (20 points)

Little Computer Inc. is extending the LmmVC-3 (Little "mickey mouse" Vector Computer 3) that you implemented in Problem Set 5. For your convenience, the modified data path that you constructed in Problem Set 5 is shown on page 12. The description on the remainder of this page repeats EXACTLY the specification given in Problem Set 5.

Little Computer Inc. is now planning to build a new computer that is more suited for scientific applications. LC-3b can be modified for such applications by replacing the data type Byte with Vector. The new computer will be called LmmVC-3 (Little "mickey mouse" Vector Computer 3). LmmVC-3 ISA will support all the scalar operations that LC-3b currently supports except the LDB and STB will be replaced with VLD and VST respectively. Our data path will need to support the following new instructions:

|      |                     | 15 14 13 12 | 11 10 9 | 8 7 6 | 5 | 4 | 3 | 2 1    | 0   |
|------|---------------------|-------------|---------|-------|---|---|---|--------|-----|
| MOVI | Vstride, amount6    | 1011        | 000     | 000   |   |   | a | mour   | ıt6 |
| ΜΟνι | Vlength, amount6    | 1011        | 001     | 000   |   |   | a | mour   | ıt6 |
| VLD  | VDR, BaseR, offset6 | 0010        | VDR     | BaseR |   |   | 0 | ffset6 |     |
| VADD | VDR, VSR1, VSR2     | 1010        | VDR     | VSR1  | 0 | 1 | 0 | VSR    | 2   |
| VADD | VDR, VSR1, SR2      | 1010        | VDR     | VSR1  | 0 | 0 | 0 | SR2    |     |
| VST  | VSR, BaseR, offset6 | 0011        | VSR     | BaseR |   |   | 0 | ffset6 |     |

Note: VDR = Vector Destination Register, VSR = Vector Source Register

## MOVI

If IR[11:9] = 000, MOVI moves the unsigned quantity amount6 to Vector Stride Register (Vstride). If IR[11:9] = 001, MOVI moves the unsigned quantity amount6 to Vector Length Register (Vlength). This instruction has already been implemented for you.

## VLD

VLD loads a vector of length Vlength from memory into VDR. VLD uses the opcode previously used by LDB. The starting address of the vector is computed by adding the LSHF1(SEXT(offset6)) to BaseR. Subsequent addresses are obtained by adding LSHF1(ZEXT(Vstride)) to the address of the preceding vector element.

#### VST

VST writes the contents of VSR into memory. VST uses the opcode previously used by STB. Address calculation is done in the same way as for VLD.

#### VADD

If IR[4] is a 1, VADD adds two vector registers (VSR1 and VSR2) and stores the result in VDR. If IR[4] is a 0, VADD adds a scalar register (SR2) to every element of VSR and stores the result in VDR.

VLD, VST, and VADD do not modify the content of Vstride and Vlength registers.

### **Problem 5 continued**

Since vector instructions take many cycles to complete, we want to be able to service interrupts in the middle of a vector instruction's execution. After the interrupt is serviced, the vector instruction should resume execution, but should not recompute any values that were computed prior to the interrupt.

**Part a (7 points):** In order to support this new feature, the interrupt initiation sequence needs to save three additional registers on the supervisor stack in addition to the PSR and PC. What are they?





Part b (3 points): Describe any changes you will need to make to the RTI instruction (in fewer than 20 words).

**Part c (3 points):** Assuming that the data path already includes all the support required to handle interrupts, describe the changes that you have to make to the data path shown on page 12 to implement interruptable vector instructions.

#### **Problem 5 continued**

**Part d (7 points):** We show the beginning of the state diagram necessary to implement VADD. Using the notation of the LC-3b State Diagram, add the states you need to implement VADD. Inside each state describe what happens in that state. You can assume that you are allowed to make any changes to the data path and microsequencer that you find necessary. You do not have to make/show these changes. You will need at least four states to implement VADD.

#### NOTES:

- 1. Your implementation must support servicing interrupts in the middle of VADD.
- 2. Clearly indicate in which state you check for interrupts.
- 3. Assume that the value of the Vindex register is 0 before the first vector instruction is ever executed.
- 4. As in the problem set, you are allowed to clobber the condition codes.
- 5. Make sure that your implementation works for V = 0.





Figure 1: Modified data path to implement vector instructions

| Signal Name                                                                          | Signal Values                                                                                                                |                                                                                                    |
|--------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
| LD.MAR/1:<br>LD.MDR/1:<br>LD.IR/1:<br>LD.BEN/1:<br>LD.REG/1:<br>LD.CC/1:<br>LD.PC/1: | NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD( | 1)<br>1)<br>1)<br>1)<br>1)<br>1)                                                                   |
| GatePC/1:<br>GateMDR/1:<br>GateALU/1:<br>GateMARMUX/1:<br>GateSHF/1:                 | NO(0), YES(1)<br>NO(0), YES(1)<br>NO(0), YES(1)<br>NO(0), YES(1)<br>NO(0), YES(1)                                            |                                                                                                    |
| PCMUX/2:                                                                             | PC+2(0)<br>BUS(1)<br>ADDER(2)                                                                                                | ;select pc+2<br>;select value from bus<br>;select output of address adder                          |
| DRMUX/1:                                                                             | 11.9(0)<br>R7(1)                                                                                                             | ;destination IR[11:9]<br>;destination R7                                                           |
| SR1MUX/1:                                                                            | 11.9(0)<br>8.6(1)                                                                                                            | ;source IR[11:9]<br>;source IR[8:6]                                                                |
| ADDR1MUX/1:                                                                          | PC(0), BaseR(1                                                                                                               | )                                                                                                  |
| ADDR2MUX/2:                                                                          | ZERO(0)<br>offset6(1)<br>PCoffset9(2)<br>PCoffset11(3)                                                                       | ;select the value zero<br>;select SEXT[IR[5:0]]<br>;select SEXT[IR[8:0]]<br>;select SEXT[IR[10:0]] |
| MARMUX/1:                                                                            | 7.0(0)<br>ADDER(1)                                                                                                           | ;select LSHF(ZEXT[IR[7:0]],1)<br>;select output of address adder                                   |
| ALUK/2:                                                                              | ADD(0), AND                                                                                                                  | (1), XOR(2), PASSA(3)                                                                              |
| MIO.EN/1:<br>R.W/1:<br>DATA.SIZE/1:<br>LSHF1/1:                                      | NO(0), YES(1)<br>RD(0), WR(1)<br>BYTE(0), WOI<br>NO(0), YES(1)                                                               | RD(1)                                                                                              |

Table 1: Data path control signals

| Table 2: Microsequencer control signals |                                                                                    |                                                                |  |  |  |  |  |
|-----------------------------------------|------------------------------------------------------------------------------------|----------------------------------------------------------------|--|--|--|--|--|
| Signal Name                             | Signal Values                                                                      |                                                                |  |  |  |  |  |
| J/6:<br>COND/2:                         | $\begin{array}{c} \operatorname{COND}_1^\circ\\ \operatorname{COND}_2 \end{array}$ | ;Unconditional<br>;Memory Ready<br>;Branch<br>;Addressing Mode |  |  |  |  |  |
| IRD/1:                                  | NO, YES                                                                            |                                                                |  |  |  |  |  |
|                                         |                                                                                    |                                                                |  |  |  |  |  |



Figure 2: A state machine for the LC-3b



Figure 3: The LC-3b data path



Figure 4: The microsequencer of the LC-3b base machine