EE 360N
Spring 2001
Y. N. Patt, Instructor
Onur Mutlu, Kameswar Subramaniam, TAs
Problem Set 3
Due: April 2, 2001
1. In class, we discussed the finite state machine for
the device controller of an input-output device within the context of a priority-
arbitration system. Draw the state diagram for this device controller (as
drawn in lecture), identify the input and output signals, and briefly
explain the function of each input/output signal. As mentioned in class,
the finite state machine presented has at least two race conditions.
Identify two conditions and show what simple modifications can be made to
eliminate them. One is a simple modification to the State Machine. The
other is a simple modification to the logic in the PAU.
Hint: Is there anything in the design in class that could cause more than
one controller to assert SACK during the same bus cycle?
Note: The logic equations for each of the BG signals should be:
BG_i = NOT(SACK)AND(NOT(BR_i+1 OR BR_i+2 OR ... BR_i+n))AND(BR_i)
Hint: When will a BG signal be assserted? Is the propagation delay through
the edge-triggered D flipflops in the PAU = 0?
2. The Address Control logic (in the LC-2 datapath) allows the state machine to control memory operations (LD/ST) and I/O operations (in/out). We use LD to do input, and ST to do output. On input, if the address is a device register, the contents of that device register is loaded into MDR using the same state that loads memory into MDR. Similarly, on output, if the address is a device register, the contents of MDR are written to the device register using the same state that stores MDR to memory.
The address control logic combined with the three control signals (MAR, M_EN, and R/W) produce the nine signals that actually control the Memory/I/O subsystem. {Note: M_EN would be better identified as MIO_EN.}
For this exercise, we want you to generate the five input, nine output
truth table to control the memory and I/O subsystem. Five inputs, because
we can decrease the 16-bit MAR to three bits: one bit to distinguish the
address as a memory address or a device register address, and two bits
to distinguish the four device registers shown in data path. Note for example
that if MIO_EN is 0, no Memory/I.O activity is wanted this cycle. What
do we do with the nine output signals for such a case? How many of the
32 input combinations are now done? Finish the truth table.
3. An ISA supports 8-bit, byte-addressable virtual address space. The
corresponding physical memory has only 128 bytes. Each page contains 16
bytes. A simple, one-level translation scheme is used and the page table
resides in physical memory. The initial contents of the frames of physical
memory are shown below.
Frame 0 | empty |
Frame 1 | Page 13 |
Frame 2 | Page 5 |
Frame 3 | Page 10 |
Frame 4 | empty |
Frame 5 | Page 1 |
Frame 6 | empty |
Frame 7 | Page Table |
References (to pages):
14
14
13
14
6
6
13
14
15
14
15
13
10
3
At the end of this sequence, what three entries are contained in the TLB?
4. Please also see the "Clarifications" for this problem. Let's say we added a virtual memory system to the LC-2. Which instructions can possibly generate a page fault? What is the maximum number of page faults an instruction can possibly generate while it's being processed? Which instructions can possibly generate that maximum number of page faults?
5. Please also see the "Clarifications" for this problem. (Hamacher et al., p. 255, qn. 5.18) Assume that a computer has a primary and secondary cache. The cache block consists of 8 words. Assume that the hit rate is the same for both the caches and that it is equal to 0.95 for instructions and 0.90 for data. Assume also that the times needed to access an 8-word block in these caches are C1 = 1 cycle and C2 = 10 cycles.
a. What is the average access time experienced by the CPU if the main memory uses interleaving? Assume that the memory is built with DRAM chips that allow the first word to be accesses in 8 cycles, but subsequent words of the block are in 4 clock cycles per word.Also, one clock cycle is required to send one word to the cache.
b. What is the average access time if the main memory is not interleaved?
c. What is the improvement obtained with interleaving?
6. Calculate the number of cycles it takes to execute the given code on the following models:
a) straightforward execution (like the LC-2)Each instruction is specified with destination register first. Multiplies take 4 cycles and additions take 2 cycles.
b) a pipeline with scoreboarding with one multiplier and one adder (pipeline model as discussed in class)
(c) Tomasulo's algorithm with one multiplier and one adder.
(d) Show the state of the reservation stations (node tables) and the register file (with the valid bits and the tags) in part (c) after the execution of each instruction. Assume that the adder and the multiplier both have 4 entries in their associated reservation stations. Use symbolic tags for renaming as Professor Patt did in class. The structure of the reservation stations and register file (register alias table) should be the same as explained in lecture.MUL R3, R1, R2
ADD R5, R4, R3
ADD R6, R4, R1
MUL R7, R8, R9
ADD R4, R3, R7
MUL R10, R5, R6
Note: Fetch, decode, and store result stages each take 1 cycle. Do not forget to list any assumptions you make about the pipeline structure (e.g. whether it uses data forwarding).
7. Prof. Patt's graduate students have decided to build a computer system using the LC-2. The system will have one LC-2 processor connected to physical memory and several disk units via a shared bus. The disk units have the ability to transfer data directly to/from the memory via the Direct Memory Access controller.
Every time a disk unit finishes a transfer, the LC-2 is interrupted, and the disk unit is given another transfer operation. The unit of transfer between the disk and the memory is a 2^12 byte page and the disk units are capable of maintaining a transfer rate of 2^18 bytes/sec. The bus itself is the latest technology and has infinite bandwidth.
After a few experiments, Prof. Patt's students found that the average disk transfer consisted of 2 pages of data. The disk interrupt handler on the LC-2 was known to take 5 msec of processing time per interrupt. The goal of their experimentation was to figure out how many disk units could be connected to the system and fully utilized. Help them out.