## Department of Electrical and Computer Engineering

### The University of Texas at Austin

EE 360N, Fall 2004
Problem Set 4
Due: 18 October 2004, before class
Yale N. Patt, Instructor
Aater Suleman, Huzefa Sanjeliwala, Dam Sunwoo, TAs

Instructions:
You are encouraged to work on the problem set in groups and turn in one problem set for the entire group. Remember to put all your names on the solution sheet. Also remember to put the name of the TA in whose discussion section you would like the problem set returned to you.

1. The virtual address of variable x is x3456789A. Using the VAX's virtual memory architecture, find the physical address of x.
```Remember that in VAX each Vitual Adress consists of

2 bits to specify the Address Space
21 bits to specify Virtual Page Number
9 bits to spcify the byte on the page
```

You will need to know the contents of P0BR: x8AC40000 and SBR: x000C8000.

You will also need to know the contents of the following physical memory locations:

x1EBA6EF0:    x80000A72
x0022D958:    x800F5D37
• What virtual page of P0 Space is x on?
• What is VA of the PTE of the page containing x?
• What virtual page of System Space is this PTE on?
• What is the PA of the PTE of this page of System Space?
• What is the PA of the PTE of the page containing x?

2.
3. (Hamacher, pg.255, question 5.13)  A byte-addressable computer has a small data cache capable of holding eight 32-bit words. Each cache block consists of one 32-bit word. When a given program is executed, the processor reads data from the following sequence of hex addresses:
`     200, 204, 208, 20C, 2F4, 2F0, 200, 204, 218, 21C, 24C, 2F4`
This pattern is repeated four times.
a. Show the contents of the cache at the end of each pass throughout this loop if a direct-mapped cache is used. Compute the hit rate for this example. Assume that the cache is initially empty.

b. Repeat part (a) for a fully-associative cache that uses the LRU-replacement algorithm.

c. Repeat part (a) for a four-way set-associative cache that uses the LRU replacement algorithm.

4.

5. A computer has an 8KB write-through cache. Each cache block is 64 bits, the cache is 4-way set associative and uses a victim/next-victim pair of bits in each block for its replacement policy. Assume a 24-bit address space and byte-addresable memory. How big (in bits) is the tag store?
6. An LC-3b system ships with a two-way set associative, write back cache with perfect LRU replacement. The tag store requires a total of 4352 bits of storage. What is the block size of the cache? Please show all your work.

Hint: 4352 = 2^12 + 2^8

7. (Hamacher et al., p. 255, question 5.18) You are working with a computer that has a primary and secondary cache. The cache block consists of 8 words. Make the following assumptions:
• The hit rate is the same for both the caches: 0.95 for instructions and 0.90 for data.
• The times needed to access an 8-word block in these caches are C1 = 1 cycle and C2 = 10 cycles.
• It takes one clock cycle to send an address to main memory.
• It takes one clock cycle to send one word from the memory to the cache.
• 30 percent of all instructions are loads and stores.
• It takes the time needed to access the cache to know if an access was a hit or miss. Also, start the memory access once you know that you miss in the two caches.
• The accesses to the caches and memory are done serially. So if you miss in the L1, and hit in the L2, it takes 11 cycles.
• When you receive data from memory, you can access the data as the writes to the caches are done in parallel.

1. What is the average access time experienced by the CPU if the main memory uses 4 way interleaving? Assume that the memory is built with DRAM chips that allow the first word to be accessed in 8 cycles, but subsequent words of the block are in 4 clock cycles per word. Also, one clock cycle is required to send one word to the cache.

2. What is the average access time if the main memory is not interleaved?

3. What is the improvement obtained with interleaving?

8. Below, we have given you four different sequences of addresses generated by a program running on a processor with a data cache. Cache hit ratio for each sequence is also shown below. Assuming that the cache is initially empty at the beginning of each sequence, find out the following parameters of the processor's data cache:
* Associativity (1, 2, or 4 ways)
* Block size (1, 2, 4, 8, 16, or 32 bytes)
* Total cache size (256B, or 512B)
* Replacement policy (LRU or FIFO)
Assumptions:All memory accesses are one byte accessses. All addresses are byte addresses.
sequence 1:  0, 2, 4, 8, 16, 32                                                    hit ratio: 0.33
sequence 2:  0, 512, 1024, 1536, 2048, 1536, 1024, 512, 0      hit ratio: 0.33
sequence 3:  0, 64, 128, 256, 512, 256, 128, 64, 0                    hit ratio: 0.33
sequence 4:  0, 512, 1024, 0, 1536, 0, 2048, 512                      hit ratio: 0.25

9. AND, for those who cannot get enough of this stuff, the following problem:
(Note: You can get full credit without doing this problem)

You will be given a cache simulator (just the executable) with a hard-coded configuration. Your job is to determine the configuration of the cache. The simulator takes a trace of memory addresses as input and provides a hit ratio as output. Find the following:

* Associativity (1, 2, 4, or 8 ways)
* Block size (1, 2, 4, 8, 16, or 32 bytes)
* Total cache size (256B, 512B, or 1024B)
* Replacement policy (LRU or Pseudo-LRU)
Show the traces you used to determine each parameter of the cache. Assumptions:All memory accesses are one byte accessses. All addresses are byte addresses.
Simulator
The syntax for running the program is:
%./cachesim   <trace.txt>
The traces are just text files with one integer memory address per line. For example, the following trace would cause conflict misses in a direct-mapped, 256B cache:
0
256
512
0
256
Simulator for Linux
Simulator for Solaris