Department of Electrical and Computer Engineering
The University of Texas at Austin

EE 360N, Spring, 2001
Yale Patt, Instructor
Onur Mutlu, Kameswar Subramaniam, TAs
Exam 2, April 18, 2001

Name: ______________________________

Problem 1 (10 points); _______
Problem 2 (10 points); _______
Problem 3 (15 points); _______
Problem 4 (10 points); _______
Problem 5 (20 points); _______
Problem 6 (20 points); _______
Problem 7 (15 points); _______
Total (100 points); ________________

Note: Please be sure that your answers to all questions (and all supporting work that is required) are contained in the space provided.

Note: Please be sure your name is recorded on each sheet of the exam.

GOOD LUCK!
Name: __________________________________________

Problem 1 (10 points):

**Part a** (5 points):

To multiply 011101 by 011101 using Booth’s algorithm, how many steps are required? 

Show the contents of the temporary result register after each step:

- After step 1: __________
- After step 2: __________
- After step 3: __________
- After step 4: __________
- After step 5: __________

**Part b** (5 points): What benefit does predicated execution provide? How does it provide this benefit?
Problem 2 (10 points):

In the following code sequence, the destination register for each instruction is indicated first, followed by the two sources. One instruction can be fetched each cycle, decoded and renamed in the next cycle, and issued to the reservation station in the third cycle. Note that there is a single common reservation station for all functional units. Instructions sit in the reservation station until their source operands are valid, at which point they can be scheduled for execution. Assume that the reservation station contains 20 entries, numbered 0 to 19, and that all instructions are issued from the decode and aliasing stage to the next available reservation station entry. Note that this will require adding an operation field to each entry.

```
DIV R3, R1, R2
ADD R5, R4, R3
ADD R6, R4, R1
MUL R6, R8, R6
ADD R4, R3, R7
MUL R9, R5, R6
```

Assume an initially empty reservation station, and the DIV instruction is issued in the first cycle to reservation station 0. Assume the DIV takes 16 cycles to execute, ADD takes 4, and MUL takes 6.

Show the contents of the register file after MUL R9, R5, R6 has been issued to the reservation station.

<table>
<thead>
<tr>
<th>Initial Contents</th>
<th>Contents after MUL R9, R5, R6 issued</th>
</tr>
</thead>
<tbody>
<tr>
<td>V</td>
<td>tag</td>
</tr>
<tr>
<td>R0</td>
<td>1</td>
</tr>
<tr>
<td>R1</td>
<td>1</td>
</tr>
<tr>
<td>R2</td>
<td>1</td>
</tr>
<tr>
<td>R3</td>
<td>1</td>
</tr>
<tr>
<td>R4</td>
<td>1</td>
</tr>
<tr>
<td>R5</td>
<td>1</td>
</tr>
<tr>
<td>R6</td>
<td>1</td>
</tr>
<tr>
<td>R7</td>
<td>1</td>
</tr>
<tr>
<td>R8</td>
<td>1</td>
</tr>
<tr>
<td>R9</td>
<td>1</td>
</tr>
</tbody>
</table>
Problem 3 (15 points):

A 256 KB write-through cache has been designed to work with a computer that has a 32 MB physical address space. The cache is virtually indexed and physically tagged. The ISA specifies a 32-bit virtual address space and a page size of 4 KB. The cache is 2-way set associative, has a block size of 8 bytes, and uses a perfect LRU replacement policy. The cache is not sectored. How many bits of storage are required to implement the tag store?
Name: ____________________________

Problem 4 (10 points):

In class, we discussed synchronous distributed arbitration, and used as an example, a generic controller handed out in class. It is reproduced below.

We wish to change the protocol such that every bus transaction takes exactly one cycle. Augment the above figure to (a) incorporate this change, and (b) tailor this figure to handle the device of 3rd highest priority. That is, it requests the bus with the signal BR3. Show all the logic needed to implement the block labeled "Comb Logic" on your figure.
Name: ________________________________

Problem 5 (20 points):

**Part I.** If the vector stride register (Vst) has the value S, and the vector length register (Vln) has the value L, what does the following Cray-like assembly language instruction do?

VLD V0, A

**Part II.** Consider a two-dimensional square array with 16 elements (4 rows and 4 columns). The array is stored in row-major order in memory, in 16 consecutive locations starting from address A. Each element occupies one memory location.

We would like to find the sum of the elements in each row in the array and store the result in 4 consecutive locations starting from address B.

Write Craylike assembly code to perform this operation. Hint: You will need to load Vst and Vln with appropriate values. (Feel free to use “A + constant” as an addressing mode. For example, Vld V5, A+7)

**Part III.** How many cycles will this program take to execute on a vector processor with chaining which has 2 read ports and 1 write port to memory? (Assume a single memory access takes 11 cycles, a single multiply takes 6 cycles, and a single add takes 4 cycles. All the execution units are pipelined. Memory is 4-way interleaved and vector registers have 64 elements in each)
Problem 6 (20 points):

**Part I.** Consider the 32-bit IEEE floating point format with 1 sign bit, 8 bits of excess-127 exponent, and 23 bits of fraction. What are the smallest normal and sub-normal positive numbers that can be represented using this format?

**Part II.** We would like to convert this number into a 40-bit extended floating point format with 1 sign bit, 11 bits of excess-1023 exponent, and 28 bits of fraction. What are the smallest normal and sub-normal positive numbers that can be represented using this extended format?

**Part III.** How would you convert a 32-bit format normalized number to the 40-bit format?

**Part IV.** Describe the steps required to convert a subnormal 32-bit number to the 40-bit format. Will these numbers be represented as subnormal or normal numbers in the 40-bit format?

**Part V.** Express the smallest subnormal 32-bit number in the 40-bit format.
Problem 7 (15 points):

In Lab Assignment 3, we were interested in augmenting the microsequencer to allow microbranches to the states that initiate interrupt handling and page 0 access control violations.

In this problem, we are interested in augmenting the microsequencer to take an exception if an illegal trapvector is specified.

We have decided to restrict trapvectors to the range x20 to xFF. That is, the TRAP instruction with trapvectors from x00 to x1F would not be allowed. Your task is to augment the original LC-2 microsequencer to take an exception if an illegal trapvector is present.

Define ITV (for illegal trapvector) as follows:

ITV = 1; illegal trapvector
ITV = 0; not an illegal trapvector

**Part a.** In what state should ITV be tested? The result of the test is a branch to the state that initiates illegal trapvector processing, or to continue normally.

In what state should ITV be computed?

Specify the logic function that computes ITV.

**Part b.** Augment the original microsequencer to take an exception in the presence of an illegal trapvector. Draw the augmented microsequencer below. Show clearly what additional control signals are needed. Identify (i.e., specify the bit encoding of) the state that will be used to initiate exception handling for ITV.