Department of Electrical and Computer Engineering

The University of Texas at Austin

EE 360N, Spring 2007
Problem Set 4
Due: 2 April 2007, before class
Yale N. Patt, Instructor
Chang Joo Lee, Rustam Miftakhutdinov, Poorna Samanta, TAs

Instructions

You are encouraged to work on the problem set in groups and turn in one problem set for the entire group. Remember to put all your names on the solution sheet. Also write the name of the TA whose discussion section you would like the problem set returned in.

Questions

  1. Explain the differences between exceptions and interrupts. Be concise in your explanations.

    Explain the similarities of exceptions and interrupts. Clearly describe the steps required to handle an exception or an interrupt.

  2. A computer has an 8KB write-through cache. Each cache block is 64 bits, the cache is 4-way set associative and uses a victim/next-victim pair of bits in each block for its replacement policy. Assume a 24-bit address space and byte-addressable memory. How big (in bits) is the tag store?

  3. An LC-3b system ships with a two-way set associative, write back cache with perfect LRU replacement. The tag store requires a total of 4352 bits of storage. What is the block size of the cache? Please show all your work.

    Hint: 4352 = 2^12 + 2^8.

  4. Based on Hamacher et al., p. 255, question 5.18. You are working with a computer that has a first level cache that we call L1 and a second level cache that we call L2. Use the following information to answer the questions.

    1. What is the average access time per instruction?
    2. What is the average access time per instruction if the main memory is 4-way interleaved?
    3. What is the improvement obtained with interleaving?
  5. Hamacher et al., p. 255, question 5.13. A byte-addressable computer has a small data cache capable of holding eight 32-bit words. Each cache block consists of one 32-bit word. When a given program is executed, the processor reads data from the following sequence of hex addresses:

    200, 204, 208, 20C, 2F4, 2F0, 200, 204, 218, 21C, 24C, 2F4

    This pattern is repeated four times.

    1. Show the contents of the cache at the end of each pass through this loop if a direct-mapped cache is used. Compute the hit rate for this example. Assume that the cache is initially empty.

    2. Repeat part 1 for a fully-associative cache that uses the LRU replacement algorithm.

    3. Repeat part 1 for a four-way set-associative cache that uses the LRU replacement algorithm.

  6. In class, we discussed two types of buses: “pending bus” and “split-transaction bus”. What is the advantage of a split-transaction bus over a pending bus?

  7. In class, we discussed the asynchronous finite state machine for the device controller of an input-output device within the context of a priority arbitration system. Draw the state diagram for this device controller (as drawn in lecture), identify the input and output signals, and briefly explain the function of each input and output signal.

    As mentioned in class, the finite state machine has some race conditions. Identify the race conditions and show what simple modifications can be made to eliminate them.

  8. In class we discussed asynchronous buses with central arbitration. Our job in this problem is to design the state machine for a synchronous bus using distributed arbitration. Recall that with distributed arbitration, each device receives the Bus Request signals from all other devices, and determines whether or not it is the next Bus Master. Assume all bus transactions take exactly one cycle, and that no device may be the Bus Master for two consecutive cycles.

    Assume four devices, having priorities 1, 2, 3, and 4 respectively. Their respective controllers request the bus via asserting BR1, BR2, BR3, and BR4 respectively. Priority 4 is the highest priority.

    1. Show the interconnections required for distributed arbitration for the four devices and their controllers connected to the bus. Be sure to label each signal line and designate by arrows whether the signals are input or output with respect to the device.

    2. Is it possible for starvation to occur in this configuration? Describe the situation where this can occur.

    3. Assume each I/O Controller is implemented using a clocked finite state machine. Draw a Moore model state machine for the controller operating at priority level 2. Label each state clearly. Label all necessary inputs and outputs. You do not need to show the clock signal on the state machine diagram. State transitions are synchronized to the clock.

  9. Below, we have given you four different sequences of addresses generated by a program running on a processor with a data cache. The cache hit ratio for each sequence is also shown. Assuming that the cache is initially empty at the beginning of each sequence, find out the following parameters of the processor's data cache:

    Assumptions: all memory accesses are one byte accesses. All addresses are byte addresses.

    Address traces
    Number  Address Sequence                                Hit Ratio
    1       0, 2, 4, 8, 16, 32                              0.33
    2       0, 512, 1024, 1536, 2048, 1536, 1024, 512, 0    0.33
    3       0, 64, 128, 256, 512, 256, 128, 64, 0           0.33
    4       0, 512, 1024, 0, 1536, 0, 2048, 512             0.25
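
For checking hand-computed hit ratios in the cache problems above (for example, problem 5), a minimal simulator sketch may help. It assumes a set-associative cache with true LRU replacement and one-byte accesses. The function name hit_ratio and its parameters are illustrative; they are not part of the course materials or of the provided cachesim executable.

```python
from collections import OrderedDict


def hit_ratio(addresses, cache_bytes, block_bytes, ways):
    """Simulate a set-associative LRU cache and return hits / accesses.

    Assumption: the cache starts empty and every access touches one byte.
    ways=1 gives a direct-mapped cache; ways equal to the number of blocks
    gives a fully associative cache.
    """
    num_sets = cache_bytes // (block_bytes * ways)
    # One OrderedDict per set; the last key is the most recently used tag.
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for addr in addresses:
        block = addr // block_bytes   # drop the byte-in-block offset
        index = block % num_sets      # which set the block maps to
        tag = block // num_sets       # remaining high-order bits
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)    # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True
    return hits / len(addresses)


# The address sequence from problem 5, repeated four times.
trace = [0x200, 0x204, 0x208, 0x20C, 0x2F4, 0x2F0,
         0x200, 0x204, 0x218, 0x21C, 0x24C, 0x2F4] * 4

# Eight 32-bit words with one word per block: 32 bytes total, 4-byte blocks.
for name, ways in [("direct-mapped", 1), ("four-way", 4), ("fully associative", 8)]:
    print(name, hit_ratio(trace, cache_bytes=32, block_bytes=4, ways=ways))
```

The same function can be used to test a guessed configuration against the traces in problem 9 by comparing its result with the listed hit ratio.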

Additional Question

The following problem is meant to help you understand the cache structure. This problem does not need to be turned in.

  1. You will be given a cache simulator (just the executable) with a hard-coded configuration. Your job is to determine the configuration of the cache. The simulator takes a trace of memory addresses as input and provides a hit ratio as output. Find the following:

    Show the traces you used to determine each parameter of the cache. Assumptions: all memory accesses are one byte accesses. All addresses are byte addresses.

    Simulator

    The syntax for running the program is:

    ./cachesim   <trace.txt>

    The traces are just text files with one integer memory address per line. For example, the following trace would cause conflict misses in a direct-mapped, 256B cache:

    0
    256
    512
    0
    256

    Links to the simulator:

    After downloading the file, make it executable with chmod 700 cachesim.linux or chmod 700 cachesim.solaris.
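
To see why the example trace above conflicts in a direct-mapped 256B cache, the sketch below computes the line index for each address and writes the trace in the one-address-per-line format. The 16-byte block size is an assumption made only for illustration (the text does not specify one), and the helper names set_index and write_trace are hypothetical.

```python
def set_index(addr, cache_bytes=256, block_bytes=16):
    """Line index a byte address maps to in a direct-mapped cache.

    block_bytes=16 is an assumed value; the example does not state one.
    """
    num_lines = cache_bytes // block_bytes
    return (addr // block_bytes) % num_lines


def write_trace(path, addresses):
    """Write one integer memory address per line."""
    with open(path, "w") as f:
        for addr in addresses:
            f.write(f"{addr}\n")


addresses = [0, 256, 512, 0, 256]

# Every address is a multiple of the cache size, so all map to the same line.
print([set_index(a) for a in addresses])  # [0, 0, 0, 0, 0]

write_trace("trace.txt", addresses)
```

Because each address evicts the block brought in by the previous one, none of the five accesses hits, even though only three distinct blocks are touched.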