# Department of Electrical and Computer Engineering The University of Texas at Austin 

EE 460N Fall 2016
Y. N. Patt, Instructor

Siavash Zangeneh, Ali Fakhrzadehgan, Steven Flolid, Matthew Normyle TAs
Exam 2
November 21, 2016

Problem 1 (25 points): $\qquad$

Problem 2 (15 points): $\qquad$

Problem 3 (15 points): $\qquad$
Problem 4 (20 points): $\qquad$

Problem 5 (25 points): $\qquad$

Total (100 points): $\qquad$

Note: Please be sure that your answers to all questions (and all supporting work that is required) are contained in the space provided.

Note: Please be sure your name is recorded on each sheet of the exam.

Please sign the following. I have not given nor received any unauthorized help on this exam.

Signature: $\qquad$

Name: $\qquad$

## Problem 1 (25 points)

Part a ( 5 points): A 2-way set-associative physically indexed, physically tagged write through cache implements LRU replacement. Memory space is $2^{28}$, memory is byte-addressable, and the physical address bits are assigned as follows:

| 15 | 7 |  |
| :---: | :---: | :---: |
| Tag bits | Index bits | Byte on block bits |

What are the total number of bits of storage required to implement a perfect LRU replacement policy for this cache.


Part b (5 points): If a cache is virtual, we avoid confusion as to what each address in the tag store means by one of two ways. Name one of them.

Part c (5 points): Interrupt priority is determined by the urgency of handling the event. Is any event more urgent than "loss of power." Explain.

$$
\begin{aligned}
& \text { Machine check: a processor that does not works correctly } \\
& \text { is morse than a processor that does not work. }
\end{aligned}
$$

Part d ( 5 points): IEEE Floating Point uses radix 2 for dealing with exponents. The old IBM 360, currently called IBM Z-series, uses radix 16. Two advantages of radix 2 over radix 16 are:


> The most significant bit does not have to be stored

Part e (5 points): Seymour Cray, in many ways the "father" of the vector computer, did not like the notion of cache memory, so the Cray I did not have caches. What is it about cache memory that he did not like?
non - determinism

Name:

Problem 2 ( 15 points)
Consider a floating point data type with all the IEEE characteristics except it contains 9 bits, rather than 32 or 64 bits. Your job: How many bits of exponent, how many bits of fraction? The bias is 4 .

We can represent $2 \frac{1}{4}$ and $\frac{1}{16}$ exactly.
If the rounding mode is round-toward-zero, computing $\left(\left(2 \frac{1}{4}+\frac{1}{16}\right)+\frac{1}{16}\right)$ produces the result $2 \frac{1}{4}$, and computing $\left(\left(\frac{1}{16}+\frac{1}{16}\right)+2 \frac{1}{4}\right)$ produces the result $2 \frac{3}{8}$.

Part a (3 points): Given that computing $\left(\left(2 \frac{1}{4}+\frac{1}{16}\right)+\frac{1}{16}\right)$ should produce the result $2 \frac{3}{8}$, why was the result computed as $2 \frac{1}{4}$ ?

rounding to $2 \frac{1}{4}$. Or, there are not enough fraction bits for intermediate results of $A D D_{s}$, which results in rounding errors.

Part b (7 points):

$$
\begin{aligned}
2 \frac{1}{4} & =19.01 \\
& =1.001 \times 2^{1}
\end{aligned}
$$

Number of bits of fraction


$$
\begin{aligned}
\frac{1}{16} & =1.000 \times 2^{-4} \\
& =0.00001 \times 2^{1}
\end{aligned}
$$

Part c (5 points):


Name: $\qquad$

## Problem 3 ( 15 points)

A processor/PAU and three device controllers are connected to an asynchronous bus as shown. Bus requests are of the form BR1 and BR2. BR2 is of higher priority.


In class, we showed timing diagrams for transactions on the bus. In this problem, we will examine the timing diagram for arbitration, as shown on the next page.
$\qquad$


At to, a bus master asserts BBSYo, allowing arbitration for the next bus cycle to begin. If you examine the entire timing diagram, can you tell which controller acquires the bus at 0 ? CNT2

The PAU asserts BG1 as shown on the diagram. The next four signals are unlabeled. Your job: provide the labels.
At some point, the current bus master completes the bus cycle and releases the bus. This causes controller 1 to provide signals A or controller 3 to provide signals B.

Does controller 1 want the bus for the next bus cycle? Yes No (Circle one). Explain.
2 points No, since it daisy chains BG1 to CNT 3

Does controller 3 want the bus for the next bus cycle? Yes No (Circle one). Explain.


Name: $\qquad$

## Problem 4. (20 points)

Suppose we have a byte-addressable machine with a virtually indexed, physically tagged cache. The cache is two way set associative and is initially empty. The cache uses perfect LRU replacement.

The machine has a 16 bit virtual address space, with a 128 byte page size. The physical address space is 11 bits. Assume the virtual address has the form:

| Leftover bits | Y Index bits | Z Byte on line bits |
| :--- | :--- | :--- |

Part a: How many cache lines are there in the 16 bit virtual address space? Solve in terms of Z .

## 3 points $2^{\text {Parts: How many cate lines are }}$

3 points Part b: Suppose you know one set of the cache is full, and nothing about the other sets. How many cache lines in
if Cadre line size following are consecutive memory accesses.



Part c: What range of virtual addresses can A be? Write your answer in binary.

6 points Part d: Which bits of the virtual address are used for indexing? VA $[8: 3]$
2 points Part e: Is B a hit or a miss? miss

Name: $\qquad$

## Problem 5 ( 25 points)

Consider an Out-of-Order exeuction/In-Order retirement processor. The ISA specifies 8 registers, R0 to R7. The microarchitecture contains one non-pipelined adder and one non-pipelined multiplier. Fetch and Decode take one cycle each. ADD execution takes 2 cycles, MUL execution takes 4 cycles, and in both cases one cycle is needed to write the result to a destination register. There is no data bypassing. The adder and multiplier each has a 3-entry reservation station. They are initially empty, and are filled from top to bottom. Each instruction remains in the reservation station until it retires.

A program consisting of five instructions takes 17 clock cycles to execute on this processor. The state of the register file is shown before execution starts (i.e., before cycle 1), at the end of cycle 7, and at the end of cycle 17.


The state of the reservation stations at the end of cycle 7 are shown below.


Reservation Stations at the end of cycle 7

Name:

The following chart shows the cycles when the adder and multiplier are in use.

|  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Multiplier |  |  | E | E | E | E | E | E | E | E |  |  |  |  |  |  |  |
| Adder |  |  |  | E | E |  |  |  |  |  |  | E | E |  | E | E |  |

Part a ( $\mathbf{2 0}$ points): Specify the five-instruction program.
instruction

| 1 | $M U L$ | $R 2, R O, R 1$ |
| :--- | :--- | :--- |
| 2 | $A D D$ | $R 1, R S, R 6$ |
| 3 | $M U L$ | $R 3, R S, R 6$ |
| 4 | $A D D$ | $R 3, R 2, R 3$ |
| 5 | $A D D$ | $R 2, R 1, R 3$ |

$R 2 \leftarrow 20$
$R 1 \leftarrow 5$
$R 3 \leftarrow 6$
$R 3 \leftarrow 26$

Part b (5 points): : How big is the Reorder Buffer? Explain. Hint: Instructions are allocated to the reorder at the same time they are placed in a reservation station.

4 : Since the 5 th instruction does not exist in the ADD reservation station in the end of cycle 7, that means 5 th instruction could not have been decoded due to Reorder Buffer being full
timeline
$1234567391011121314151617 \longrightarrow$
(1) FDEEEER2)
(2) FDEEV R1
(3)

$$
\begin{array}{ll}
\text { Fo } & E \in E:[\mathbb{R 3}) \\
\text { Fo } & :(R 3)
\end{array}
$$

