Department of Electrical and Computer Engineering
The University of Texas at Austin
EE 360N, Fall 2004
Problem Set 5
Due: 8 November 2004, before class
Yale N. Patt, Instructor
Aater Suleman, Huzefa Sanjeliwala, Dam Sunwoo, TAs
You are encouraged to work on the problem set in groups and turn in
one problem set for the entire group. Remember to put all your names on
the solution sheet. Also remember to put the name of the TA in whose discussion
section you would like the problem set returned to you.
Explain the differences between exceptions and interrupts. Be concise
in your explanations.
Explain the similarities between exceptions and interrupts. Clearly
describe the steps required to handle an exception or an interrupt.
In class, we discussed two types of buses: the "pending bus" and the
"split-transaction bus". What is the advantage of a split-transaction bus
over a pending bus?
In class, we discussed the asynchronous finite state machine for the
device controller of an input-output device within the context of a
priority arbitration system. Draw the state diagram for this device
controller (as drawn in lecture), identify the input and output
signals, and briefly explain the function of each input and output signal.
As mentioned in class, the finite state machine has some race
conditions. Identify the race conditions and show what simple
modifications can be made to eliminate them.
A group of students have decided to build a computer system using the
LC-3b. The system will have one LC-3b processor connected to physical
memory and several disk units via a shared bus. The disk units have
the ability to transfer data directly to and from memory via the
Direct Memory Access controller.
Every time a disk unit finishes a transfer, the LC-3b is interrupted,
and the disk unit is given another transfer operation. The unit of
transfer between the disk and the memory is a $2^{12}$ B page,
and the disk units are capable of maintaining a transfer rate of
$2^{18}$ B/s. The bus itself is the fastest technology and is
able to keep up with the transfer rate of the disk units (i.e., the
bus does not slow down the transfer between disk and memory).
After a few experiments, the students found that the average disk
transfer consisted of 2 pages of data. The disk interrupt handler on
the LC-3b was known to take 5 ms of processing time per interrupt. The
goal of their experiment was to figure out how many disk units could
be connected to the system and fully utilized. Help them out.
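A quick check of the arithmetic, as a minimal C sketch under assumptions the problem does not state: the LC-3b does nothing but run the disk interrupt handler, handler runs never overlap, every transfer is the average 2 pages, and DMA traffic costs the processor no time. The function names are hypothetical; integer microseconds are used to avoid floating-point rounding.

```c
/* Assumed model: one always-busy disk raises one interrupt per transfer,
 * and each interrupt costs the CPU handler_us of exclusive processing. */

/* Microseconds one disk needs to move one average transfer. */
long long transfer_period_us(long long page_bytes, long long pages_per_transfer,
                             long long bytes_per_second)
{
    return page_bytes * pages_per_transfer * 1000000LL / bytes_per_second;
}

/* Largest number of disks whose combined handler time fits in one
 * transfer period; integer division drops a disk that fits only partially. */
int max_fully_utilized_disks(long long page_bytes, long long pages_per_transfer,
                             long long bytes_per_second, long long handler_us)
{
    long long period_us =
        transfer_period_us(page_bytes, pages_per_transfer, bytes_per_second);
    return (int)(period_us / handler_us);
}
```

With $2^{12}$ = 4096-byte pages and $2^{18}$ = 262144 B/s, one disk interrupts every 31.25 ms, so `max_fully_utilized_disks(4096, 2, 262144, 5000)` evaluates to 6: six disks need 30 ms of handler time per 31.25 ms, while a seventh would raise that to 35 ms.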
- In class we discussed asynchronous buses with central arbitration. Our job in this problem is to design the state machine for a synchronous bus using distributed arbitration. Recall that with distributed arbitration, each device receives the Bus Request signals from all other devices, and determines whether or not it is the next Bus Master. Assume all bus transactions take exactly one cycle, and that no device may be the Bus Master for two consecutive cycles.
Assume four devices, having priorities 1, 2, 3, and 4 respectively. Their respective controllers request the bus by asserting BR1, BR2, BR3, and BR4 respectively. Priority 4 is the highest priority.
- Show the interconnections required for distributed arbitration for the four devices and their controllers connected to the bus. Be sure to label each signal line and designate by arrows whether the signals are input or output with respect to the device.
- Is it possible for starvation to occur in this configuration? Describe the situation where this can occur.
- Assume each I/O Controller is implemented using a clocked finite state machine. Draw a Moore model state machine for the controller operating at priority level 2. Label each state clearly. Label all necessary inputs and outputs. You do not need to show the clock signal on the state machine diagram. State transitions are synchronized to the clock.
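One way to express the decision each controller makes every cycle, as a C sketch. It assumes a hypothetical encoding in which bit p-1 of `br_mask` holds BRp, and that every device knows which device is master in the current cycle (plausible, since all devices see the same requests and therefore agree on each winner). Only the arbitration decision is shown, not the Moore machine the problem asks for.

```c
#include <stdbool.h>

/* Hypothetical encoding: bit (p - 1) of br_mask is BRp, so BR1 is bit 0
 * and BR4 is bit 3; priority 4 is highest. current_master is the priority
 * of the device that owns the bus this cycle, or 0 if the bus is idle. */
bool wins_next_cycle(int my_prio, unsigned br_mask, int current_master)
{
    if (my_prio == current_master)
        return false;                    /* no device is master two cycles in a row */
    if (!(br_mask & (1u << (my_prio - 1))))
        return false;                    /* this device is not requesting */
    for (int p = my_prio + 1; p <= 4; p++) {
        if (p == current_master)
            continue;                    /* the current master is barred, ignore its request */
        if (br_mask & (1u << (p - 1)))
            return false;                /* a higher-priority requester wins instead */
    }
    return true;
}
```

If BR4 and BR3 stay asserted, the winners alternate 4, 3, 4, 3, ... and `wins_next_cycle` never returns true for devices 1 and 2, which is one form of the starvation the second part asks about.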
Given the following code segment:
for (i = 0; i < 100; i++)
    A[i] = (B[i] * C[i] + D[i]) / 2;
Write Cray-like assembly code to perform the calculation. Then compute the number of cycles required for the code segment to execute on the following machines:
Assume each machine has vector registers of length 64. The Multiply, Add, and Load units are pipelined and take 6, 4, and 11 cycles, respectively, to complete one operation. Memory is 16-way interleaved. For this problem assume stores take 11 cycles, shifts take 1 cycle and loading the vector length and vector stride registers each take 1 cycle.
- Scalar processor
- Vector processor without chaining, 1 port to memory (1 load or store / cycle)
- Vector processor with chaining, 1 port to memory
- Vector processor with chaining, 2 read ports and 1 write port to memory
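Because the loop runs 100 iterations and each vector register holds 64 elements, the vector code has to be strip-mined into passes of at most 64 elements. Below is a scalar C model of that structure; the function names are hypothetical, and the instruction sequence in the comments is an assumed mapping, not actual Cray syntax.

```c
#define MVL 64  /* vector register length given in the problem */

/* Record the vector length used for each strip; vl_out needs room for
 * ceil(n / MVL) entries. Returns the number of strips. */
int strip_lengths(int n, int vl_out[])
{
    int strips = 0;
    for (int lo = 0; lo < n; lo += MVL)
        vl_out[strips++] = (n - lo < MVL) ? n - lo : MVL;
    return strips;
}

/* Scalar model of the strip-mined vector loop. In the vector version each
 * strip would load VL = vl (with unit stride) and then issue roughly:
 * load B, load C, multiply, load D, add, shift right by 1, store A.
 * The shift is an assumption; it matches / 2 only for non-negative values. */
void vector_loop_model(int *A, const int *B, const int *C, const int *D, int n)
{
    for (int lo = 0; lo < n; lo += MVL) {
        int vl = (n - lo < MVL) ? n - lo : MVL;
        for (int i = lo; i < lo + vl; i++)
            A[i] = (B[i] * C[i] + D[i]) / 2;
    }
}
```

For n = 100 this gives two strips, of 64 and 36 elements, so every vector instruction in the loop body is issued twice.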
Given the following code:
MUL R3, R1, R2
ADD R5, R4, R3
ADD R6, R4, R1
MUL R7, R8, R9
ADD R4, R3, R7
MUL R10, R5, R6
Note: Each instruction is specified with the destination register first.
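Before timing the pipelined models, it can help to list the data dependences in the code. This is a minimal C sketch with hypothetical names (`Insn`, `prog`, `find_hazards`); it checks every instruction pair, and it reports a RAW hazard only when no instruction in between rewrites the register.

```c
#include <stdio.h>

/* One instruction, destination register first as in the problem set. */
typedef struct { const char *op; int dst, src1, src2; } Insn;

static const Insn prog[] = {
    {"MUL", 3, 1, 2},  {"ADD", 5, 4, 3}, {"ADD", 6, 4, 1},
    {"MUL", 7, 8, 9},  {"ADD", 4, 3, 7}, {"MUL", 10, 5, 6},
};
#define N_INSN ((int)(sizeof prog / sizeof prog[0]))

static int reads(const Insn *in, int r) { return in->src1 == r || in->src2 == r; }

/* Print and count RAW, WAR, and WAW hazards between instructions i < j. */
void find_hazards(int *raw, int *war, int *waw)
{
    *raw = *war = *waw = 0;
    for (int i = 0; i < N_INSN; i++) {
        for (int j = i + 1; j < N_INSN; j++) {
            int r = prog[i].dst;
            int rewritten = 0;               /* does some k between i and j write r? */
            for (int k = i + 1; k < j; k++)
                if (prog[k].dst == r)
                    rewritten = 1;
            if (reads(&prog[j], r) && !rewritten) {
                printf("RAW: I%d -> I%d on R%d\n", i + 1, j + 1, r);
                (*raw)++;
            }
            if (reads(&prog[i], prog[j].dst)) {
                printf("WAR: I%d -> I%d on R%d\n", i + 1, j + 1, prog[j].dst);
                (*war)++;
            }
            if (prog[j].dst == r) {
                printf("WAW: I%d -> I%d on R%d\n", i + 1, j + 1, r);
                (*waw)++;
            }
        }
    }
}
```

For this code there are five RAW hazards (R3 twice, R5, R6, R7), two WAR hazards on R4 (I2 and I3 read R4 before I5 writes it), and no WAW hazards. Whether the WAR hazards cost cycles depends on the scoreboard design assumed; with Tomasulo's algorithm, renaming through reservation-station tags removes them.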
Calculate the number of cycles it takes to execute the given code on the following models:
- A non-pipelined machine.
- A pipelined machine with scoreboarding with one multiplier and one adder.
- A pipelined machine using Tomasulo's algorithm with one multiplier and one adder.
Note: For all machine models, use the basic instruction cycle as follows:
Fetch (one clock cycle)
Decode (one clock cycle)
Execute (MUL takes 6, ADD takes 4 clock cycles)
Write-back (one clock cycle)
Do not forget to list any assumptions you make about the pipeline
structure (e.g., data forwarding between pipeline stages).
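For the non-pipelined model, the count follows directly from the basic instruction cycle in the note, assuming each instruction completes all four phases before the next one is fetched. A minimal C sketch with hypothetical names:

```c
/* Phase lengths in clock cycles, taken from the basic instruction cycle. */
enum { FETCH = 1, DECODE = 1, WRITEBACK = 1, MUL_EXEC = 6, ADD_EXEC = 4 };

/* Non-pipelined: no overlap between instructions, so cycles simply add up. */
int nonpipelined_cycles(int n_mul, int n_add)
{
    int overhead = FETCH + DECODE + WRITEBACK;   /* 3 cycles per instruction */
    return n_mul * (overhead + MUL_EXEC) + n_add * (overhead + ADD_EXEC);
}
```

The given code has three MULs (9 cycles each) and three ADDs (7 cycles each), so `nonpipelined_cycles(3, 3)` evaluates to 48.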