Department of Electrical and Computer Engineering
The University of Texas at Austin

EE 360N
Spring 2001
Y. N. Patt, Instructor
Onur Mutlu, Kameswar Subramaniam, TAs
Problem Set 3
Due: April 2, 2001

1. In class, we discussed the finite state machine for the device controller of an input-output device within the context of a priority- arbitration system. Draw the state diagram for this device controller (as drawn in lecture), identify the input and output signals, and briefly explain the function of each input/output signal. As mentioned in class, the finite state machine presented has at least two race conditions. Identify two conditions and show what simple modifications can be made to eliminate them. One is a simple modification to the State Machine. The other is a simple modification to the logic in the PAU.

Hint: Is there anything in the design in class that could cause more than one controller to assert SACK during the same bus cycle?
Note: The logic equations for each of the BG signals should be:

        BG_i = NOT(SACK)AND(NOT(BR_i+1 OR BR_i+2 OR ... BR_i+n))AND(BR_i)
Hint: When will a BG signal be assserted? Is the propagation delay through the edge-triggered D flipflops in the PAU = 0?

2. The Address Control logic (in the LC-2 datapath) allows the state machine to control memory operations (LD/ST) and I/O operations (in/out). We use LD to do input, and ST to do output. On input, if the address is a device register, the contents of that device register is loaded into MDR using the same state that loads memory into MDR. Similarly, on output, if the address is a device register, the contents of MDR are written to the device register using the same state that stores MDR to memory.

The address control logic combined with the three control signals (MAR, M_EN, and R/W) produce the nine signals that actually control the Memory/I/O subsystem. {Note: M_EN would be better identified as MIO_EN.}

For this exercise, we want you to generate the five input, nine output truth table to control the memory and I/O subsystem. Five inputs, because we can decrease the 16-bit MAR to three bits: one bit to distinguish the address as a memory address or a device register address, and two bits to distinguish the four device registers shown in data path. Note for example that if MIO_EN is 0, no Memory/I.O activity is wanted this cycle. What do we do with the nine output signals for such a case? How many of the 32 input combinations are now done? Finish the truth table.
 

3. An ISA supports 8-bit, byte-addressable virtual address space. The corresponding physical memory has only 128 bytes. Each page contains 16 bytes. A simple, one-level translation scheme is used and the page table resides in physical memory. The initial contents of the frames of physical memory are shown below.
Frame 0 empty
Frame 1 Page 13
Frame 2 Page 5
Frame 3 Page 10
Frame 4 empty
Frame 5 Page 1
Frame 6 empty
Frame 7 Page Table

A three-entry Translation Lookaside Buffer that uses LRU replacement is added to this system. Initially, this TLB contains the entries for pages 0, 2, and 13. For the following sequence of references, put a circle around those that generate a TLB hit and put a rectangle around those that generate a page fault. What is the hit rate of the TLB for this sequence of references? (Note: LRU policy is used to select pages for replacement in physical memory.)

References (to pages):
14
14
13
14
6
6
13
14
15
14
15
13
10
3

At the end of this sequence, what three entries are contained in the TLB?

4. Please also see the "Clarifications" for this problem. Let's say we added a virtual memory system to the LC-2. Which instructions can possibly generate a page fault? What is the maximum number of page faults an instruction can possibly generate while it's being processed? Which instructions can possibly generate that maximum number of page faults?

5. Please also see the "Clarifications" for this problem. (Hamacher et al., p. 255, qn. 5.18) Assume that a computer has a primary and secondary cache. The cache block consists of 8 words. Assume that the hit rate is the same for both the caches and that it is equal to 0.95 for instructions and 0.90 for data. Assume also that the times needed to access an 8-word block in these caches are C1 = 1 cycle and C2 = 10 cycles.

a. What is the average access time experienced by the CPU if the main memory uses interleaving? Assume that the memory is built with DRAM chips that allow the first word to be accesses in 8 cycles, but subsequent words of the block are in 4 clock cycles per word.Also, one clock cycle is required to send one word to the cache.
b. What is the average access time if the main memory is not interleaved?
c. What is the improvement obtained with interleaving?

6. Calculate the number of cycles it takes to execute the given code on the following models:

a) straightforward execution (like the LC-2)
b) a pipeline with scoreboarding with one multiplier and one adder (pipeline model as discussed in class)
(c) Tomasulo's algorithm with one multiplier and one adder.
(d) Show the state of the reservation stations (node tables) and the register file (with the valid bits and the tags) in part (c) after the execution of each instruction. Assume that the adder and the multiplier both have 4 entries in their associated reservation stations. Use symbolic tags for renaming as Professor Patt did in class. The structure of the reservation stations and register file (register alias table) should be the same as explained in lecture.

MUL R3,  R1, R2
ADD R5,  R4, R3
ADD R6,  R4, R1
MUL R7,  R8, R9
ADD R4,  R3, R7
MUL R10, R5, R6

Each instruction is specified with destination register first. Multiplies take 4 cycles and additions take 2 cycles.

Note: Fetch, decode, and store result stages each take 1 cycle. Do not forget to list any assumptions you make about the pipeline structure (e.g. whether it uses data forwarding).

7. Prof. Patt's graduate students have decided to build a computer system using the LC-2. The system will have one LC-2 processor connected to physical memory and several disk units via a shared bus. The disk units have the ability to transfer data directly to/from the memory via the Direct Memory Access controller.

Every time a disk unit finishes a transfer, the LC-2 is interrupted, and the disk unit is given another transfer operation. The unit of transfer between the disk and the memory is a 2^12 byte page and the disk units are capable of maintaining a transfer rate of 2^18 bytes/sec. The bus itself is the latest technology and has infinite bandwidth.

After a few experiments, Prof. Patt's students found that the average disk transfer consisted of 2 pages of data. The disk interrupt handler on the LC-2 was known to take 5 msec of processing time per interrupt. The goal of their experimentation was to figure out how many disk units could be connected to the system and fully utilized. Help them out.