# Department of Electrical and Computer Engineering The University of Texas at Austin

EE 360N, Spring 2009 Yale Patt, Instructor Ramapriyan Chakravarthy, Khubaib, Vivekanand Venugopal, TAs Exam 1, March 11, 2009

Name:\_\_\_\_\_

 Problem 1 (20 points):

 Problem 2 (10 points):

 Problem 3 (10 points):

 Problem 4 (25 points):

 Problem 5 (15 points):

 Problem 6 (20 points):

 Total (100 points):

Note: Please be sure that your answers to all questions (and all supporting work that is required) are contained in the space provided.

Note: Please be sure your name is recorded on each sheet of the exam.

# **GOOD LUCK!**

# Problem 1 (20 points)

**Part a (4 points):** Consider memory that is protected by an even parity bit for every 8 bits of data. If we wish to transmit 10001001, we would instead transmit 100010011 where the last bit is the parity bit. Suppose the first and second bits were transmitted incorrectly, i.e., 010010011. Would the error be detected? Yes/No (circle one). If yes, explain how. If no, is this in general of serious concern? Use no more than 20 words.



**Part b (4 points):** In a 32 location content addressable memory, how many address bits are used to identify the location that one wishes to access? Explain in no more than 20 words.



**Part c (4 points):** "Critical Path design focuses on reducing the longest logic path among the various parts of the processor. Generally, the access time to on-chip memory (i.e., cache), the ALU path, and the microsequencer are the obvious culprits that one focuses on. In designing a version of the LC-3b with a single cycle cache, I first computed the delay for each of these three paths by measuring (a) the time it takes to load MDR, once the MAR is loaded, (b) the time it takes the ALU to read SR1 and SR2, perform an XOR, and load DR with the result, and (c) the time it takes to compute the address of the Control Store ROM, read its contents and latch the result into the microinstruction register. Since (a) was the longest, I focused on shortening that path."

What is wrong with the approach described above? Explain in no more than 20 words.

Name:\_\_\_\_\_

# Problem 1 continued

**Part d (4 points):** Is the size of the TLB part of the ISA or part of the microarchitecture? Explain in no more than 20 words.

**Part e (4 points):** The MIPS ISA originally allowed three-cycle MULs to be followed by single cycle ADDs (as back to backs) in a pipelined implementation without a hardware interlock. Explain how they did it.

Problem 2 (10 points)

| Part a (2 points): The LC-3b is litt | le endian. That means | STW R0,A   | means R0[15:8] gets stored into location |
|--------------------------------------|-----------------------|------------|------------------------------------------|
| and R0[7:0] gets stored              | in location           | . Assume A | is an even address.                      |

**Part b (8 points):** An enterprising engineer suggests that it would be worthwhile to produce a big endian LC-3b that could be used with data sets consisting of big endian values. Assume a compiler is available to produce code in big endian form.

What very small changes to the data path and/or microsequencer and/or state machine are necessary to change LC-3b to a big endian LC-3b? Show by means of diagrams all changes that are necessary:



# Problem 3 (10 points)

Suppose our ISA had two different add instructions, the classical load-store ADD Ri, Rj, Rk that is present in the LC-3b, and an ADDM Ri, Rj, Rk instruction where all operands are in memory. That is, the semantics of ADDM is:

 $M[Ri] \gets M[Rj] + M[Rk].$ 

**Part a (5 points):** A programmer wrote the high level language statement A[i] = B[i] + C[i] that is part of the loop body of a for loop. Should the compiler generate the ADD instruction or the ADDM instruction? Explain why.



**Part b (5 points):** A programmer wrote the high level language statement SUM = SUM + A[i] that is part of the loop body of a for loop. Should the compiler generate the ADD instruction or the ADDM instruction? Explain why.

#### Problem 4 (25 points)

Consider a byte-addressable memory system with two levels of virtual to physical translation (like the VAX).

Virtual Address Space: 256B, partitioned into equal regions of process space and system space. Bit[7] of the VA identifies the region, bit[7]=0 identifies process space. Page size: 4 bytes.

The system **does not** include a Translation Lookaside Buffer. The Page Table Entries are one byte each and the format is as follows:



For the Process P1:

PBR (process base register) is x90. PLR (process length register) is 9.

SBR (system base register) is x18 The SLR (system length register) is 8.

Shown below are the contents of some physical memory locations:

| x0 | x94 | x8 | x9B | x10 | x82 | x18 | x8A |
|----|-----|----|-----|-----|-----|-----|-----|
| x1 | x0C | x9 | xB8 | x11 | x62 | x19 | x85 |
| x2 | x01 | xA | x04 | x12 | xFF | x1A | x8C |
| x3 | x02 | xB | x91 | x13 | xF4 | x1B | x99 |
| x4 | x34 | xC | x1F | x14 | xF3 | x1C | x82 |
| x5 | x86 | xD | x82 | x15 | x94 | x1D | x05 |
| x6 | xAF | xE | x9A | x16 | x00 | x1E | x80 |
| x7 | x80 | xF | x12 | x17 | x10 | x1F | x09 |

Part a (8 points): Given Virtual Address 00001101 (x0D), what is the corresponding physical address?

Answer:

#### **Problem 4 continued**

Part b (3 points): How many pages of process page table and system page table are resident in physical memory?

| process page table: system page table: |  |
|----------------------------------------|--|
|----------------------------------------|--|

Part c (3 points): Which frame(s) contain the process page table and which frame(s) contain the system page table?

| process page table: |  | system page table: |  |
|---------------------|--|--------------------|--|
|---------------------|--|--------------------|--|

Part d (5 points): Which address ranges in process space will generate a page fault when accessed?



### Part e (6 points):

If page X of Process P1 and page Y of Process P2 are mapped to the same frame, page X of P1 is said to be shared with page Y of P2.

Is the virtual address 00000101(x05) of Process P2 in a page of P2 which is shared with a page of Process P1? If yes, what is the corresponding virtual address in Process P1? The PBR of Process P2 is x84 and the PLR of Process P2 is 4.

Answer:

# Problem 5 (15 points)

We wish to use memory chips having 30 nsec access time with a 100 MHz processor. Our objective is to design a 32KB byte-addressable physical memory. The system design restricts us to using exactly eight memory chips. The processor/memory data bus is 16 bits.

Part a (1 point): How many processor cycles does a single memory access take?



**Part b (2 points):** Can interleaving improve the performance of this memory for loads from sequential memory locations? If yes, what is the minimum number of banks required for optimal performance? If no, why not?

Part c (6 points): Three types of memory chips are available for our design:

A. 4K by 4 bits

B. 8K by 4 bits

C. 4K by 8 bits

Will 8 memory chips of type A satisfy the above performance and capacity design constraints? Explain.

### Problem 5 continued

Will 8 memory chips of type **B** satisfy the above performance and capacity design constraints? Explain.

Will 8 memory chips of type C satisfy the above performance and capacity design constraints? Explain.

#### Part d (3 points):

Next step is for you to design a full 32KB, optimal performance memory system to go with our 100 MHz processor, using whichever of the three types of memory chips you wish. Which type of memory chip have you selected?

Identify on the detailed address register shown below what each of the 15 address bits are used for (rank, chip address, interleave, byte on bus):



#### Part e (3 points):

The next step is to wire it up. Unfortunately, an Aggie was hired to do the job. After he finished, we noticed that instead of memory locations 2,3 being on bank 1, they were on bank 0, along with locations 4,5,6, and 7. Memory location 8 was on bank 1, memory location 16 on bank 2, etc.

On the address register shown below, identify what each of the 15 address bits were actually used for after the Aggie wired it up.



# Problem 6 (20 points)

We wish to use the unused opcode 1010 to implement a new instruction PRED\_OP, which conditionally executes the operation OP, depending on the state of the condition codes the same way that BR conditionally branches. OP can be ADD, AND, or XOR.

Semantically, the instruction is processed as:

```
IF([N AND n] OR [Z AND z] OR [P AND p])
DR/SR = DR/SR OP SR2
setcc();
```

PRED\_OP has the following format:

|         | 15 |   |   | 12 | 11 | 10 | 9 | 8   | 6   | 5 | 4 | 3 | 2 | 0  |
|---------|----|---|---|----|----|----|---|-----|-----|---|---|---|---|----|
| PRED_OP | 1  | 0 | 1 | 0  | n  | z  | р | DR, | /SR | 0 | O | 2 | S | R2 |

Note that bits[8:6] specifies a register that is both source and destination, like the two-address machine x86.

Bits[4:3] are encoded as follows:

ADD/00 AND/01 XOR/10 11 will never be used

# **Problem 6 continued**

## Changes to the state machine (10 points):

To implement PRED\_OP, minor changes to the microsequencer and data path of the LC-3b are required. The result is the updated state machine, as shown below.

Your job: Make the changes to the state machine, the microsequencer and the data path required to add PREP\_OP to the instruction set.

First describe what goes on in states A and B, and identify the binary representations for A,B,x,y,a,b,c,d,U,V,W in the state machine.

Note that the microinstruction at state 40 is the same as the microinstruction at states 18 and 19.



# **Problem 6 continued**

# Changes to the Microsequencer (6 points):

To have the state machine behave as described, you must do a 4-way branch from state B. One control signal plus a very simple logic block is all that is needed to make that happen. Add the logic block and control signal in the box provided to make that happen.



Name:\_\_\_\_\_

# **Problem 6 continued**

# Changes to the Data Path (4 points):

Only one small change is needed to the data path. Add the logic required to make that happen in the box provided.



|                  | 15 | 14      | 13 | 12   | 11 | 10  | 9    | 8 | 7    | 6   | 5     | 4    | 3    | 2          | 1         | 0 |
|------------------|----|---------|----|------|----|-----|------|---|------|-----|-------|------|------|------------|-----------|---|
| ADD⁺             |    | 00      | 01 | I    |    | DR  |      |   | SR 1 |     | 0     | -    | 0    |            | SR2       |   |
| ADD⁺             |    | 00      | 01 | 1    |    | DR  | 1    |   | SR 1 |     | 1     |      |      | 'nm        |           |   |
| AND⁺             |    | 01      | 01 | 1    |    | DR  |      |   | SR1  |     | 0     | -    | 0    |            | SR2       |   |
| AND⁺             |    | 01      | 01 |      |    | DR  |      |   | SR 1 |     | 1     |      | ir   | nm         |           |   |
| BR               |    | 00      | 00 | I    | n  | z   | р    |   |      |     |       | offs |      |            |           |   |
| JMP              |    | 11      | 00 | I    |    | 000 |      | B | ase  |     |       |      | 000  | 000        | <br> <br> |   |
| JSR              |    | 1       | 00 | I    | 1  |     |      |   |      | PCo | offse | et11 |      | 1          |           |   |
| JSRR             |    | 01      | 00 |      | 0  | 0   | 0    | B | ase  | R   |       | (    | nnn  | 000        |           |   |
| LDB⁺             |    | 1       | 10 | I    |    | DR  |      | B | ase  |     |       |      |      | set        |           |   |
| LDW <sup>+</sup> |    | 1       | 10 | 1    |    | DR  | 1    | В | ase  |     |       |      | 1    | et6        |           |   |
| $LEA^{+}$        |    |         | 10 | 1    |    | DR  |      |   |      |     | PC    | offs | et9  |            |           |   |
| NOT⁺             |    |         | 01 | 1    |    | DR  | I    |   | SR   |     | 1     |      | 1    | <b>111</b> | 1         |   |
| RET              |    | 1       | 00 | 1    |    | 000 | 1    |   | 111  | 1   |       |      |      | 000        |           |   |
| RTI              |    | 10      | 00 | 1    |    | 1   |      |   | 000  |     | 000   |      |      | 1          |           |   |
| $LSHF^{+}$       |    | 1       |    | 1    |    | DR  |      |   | SR   |     | 0     | 0    |      | imo        | unt       | 4 |
| $RSHFL^{+}$      |    | 11      | 01 | 1    |    | DR  |      |   | SR   |     | 0     | 1    | C    |            | unt       | 4 |
| $RSHFA^{+}$      |    | ่<br>11 |    | 1    |    | DR  |      |   | SR   | 1   | 1     | 1    |      |            | unt       | 4 |
| STB              |    | 00      | 11 | 1    |    | SR  |      |   | ase  | R   |       | k    | off  | set        | 5<br>1    |   |
| STW              |    | 01      | 11 | 1    |    | SR  |      | B | ase  | R   |       |      | offs | et6        |           |   |
| TRAP             |    | 11      | 11 | 1    |    | 00  | 00   |   |      |     | tre   | apv  | ec   | t8         |           |   |
| XOR⁺             |    | 10      | 01 | <br> |    | DR  |      |   | SR1  |     | 0     | 0    | 0    |            | SR2       |   |
| XOR⁺             |    | 10      | 01 | <br> |    | DR  |      |   | SR   |     | 1     |      | i    | mm         | 5         |   |
| not used         |    | 10      | 10 | I    |    |     |      |   |      |     |       |      |      |            |           |   |
| not used         |    | 10      | 11 | 1    |    |     | <br> |   |      |     | <br>  |      |      | <br>       |           |   |

Figure 1: LC-3b Instruction Encodings

| Signal Name                                                                          | Signal Values                                                                                                                |                                                                                                    |
|--------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------|
| LD.MAR/1:<br>LD.MDR/1:<br>LD.IR/1:<br>LD.BEN/1:<br>LD.REG/1:<br>LD.CC/1:<br>LD.PC/1: | NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD(<br>NO(0), LOAD( | 1)<br>1)<br>1)<br>1)<br>1)<br>1)                                                                   |
| GatePC/1:<br>GateMDR/1:<br>GateALU/1:<br>GateMARMUX/1:<br>GateSHF/1:                 | NO(0), YES(1)<br>NO(0), YES(1)<br>NO(0), YES(1)<br>NO(0), YES(1)<br>NO(0), YES(1)                                            |                                                                                                    |
| PCMUX/2:                                                                             | PC+2(0)<br>BUS(1)<br>ADDER(2)                                                                                                | ;select pc+2<br>;select value from bus<br>;select output of address adder                          |
| DRMUX/1:                                                                             | 11.9(0)<br>R7(1)                                                                                                             | ;destination IR[11:9]<br>;destination R7                                                           |
| SR1MUX/1:                                                                            | 11.9(0)<br>8.6(1)                                                                                                            | ;source IR[11:9]<br>;source IR[8:6]                                                                |
| ADDR1MUX/1:                                                                          | PC(0), BaseR(1                                                                                                               | )                                                                                                  |
| ADDR2MUX/2:                                                                          | ZERO(0)<br>offset6(1)<br>PCoffset9(2)<br>PCoffset11(3)                                                                       | ;select the value zero<br>;select SEXT[IR[5:0]]<br>;select SEXT[IR[8:0]]<br>;select SEXT[IR[10:0]] |
| MARMUX/1:                                                                            | 7.0(0)<br>ADDER(1)                                                                                                           | ;select LSHF(ZEXT[IR[7:0]],1)<br>;select output of address adder                                   |
| ALUK/2:                                                                              | ADD(0), AND(                                                                                                                 | (1), XOR(2), PASSA(3)                                                                              |
| MIO.EN/1:<br>R.W/1:<br>DATA.SIZE/1:<br>LSHF1/1:                                      | NO(0), YES(1)<br>RD(0), WR(1)<br>BYTE(0), WOF<br>NO(0), YES(1)                                                               |                                                                                                    |

Table 1: Data path control signals

| Signal Name         Signal Values           J/6:         ;Unconditional           COND/2:         COND <sub>0</sub> ;Unconditional           COND <sub>1</sub> ;Memory Ready           COND <sub>2</sub> ;Branch           COND <sub>3</sub> ;Addressing Mod | _ | Table 2: Microsequencer control signals |                                                                               |               |  |  |  |  |  |  |  |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---|-----------------------------------------|-------------------------------------------------------------------------------|---------------|--|--|--|--|--|--|--|
| COND/2: COND <sub>0</sub> ;Unconditional<br>COND <sub>1</sub> ;Memory Ready<br>COND <sub>2</sub> ;Branch                                                                                                                                                     |   | Signal Name                             | Signal Va                                                                     | lues          |  |  |  |  |  |  |  |
|                                                                                                                                                                                                                                                              |   | 0,01                                    | $\begin{array}{c} \operatorname{COND}_1 \\ \operatorname{COND}_2 \end{array}$ | ;Memory Ready |  |  |  |  |  |  |  |
| IRD/1: NO, YES                                                                                                                                                                                                                                               | _ | IRD/1:                                  | NO, YES                                                                       |               |  |  |  |  |  |  |  |



Figure 2: A state machine for the LC-3b



Figure 3: The LC-3b data path



Figure 4: The microsequencer of the LC-3b base machine