EE 460N - Study Questions

Department of Electrical and Computer Engineering

The University of Texas at Austin

EE 460N, Fall 2014
Study Questions
Due date: Nov. 19
Yale N. Patt, Instructor
Stephen Pruett, Emily Bragg, Siavash Zangeneh, TAs

A computer has an 8KB write-through cache. Each cache block is 64 bits, the cache is 4-way set associative and uses a victim/next-victim pair of bits for each block for its replacement policy. Assume a 24-bit address space and byte-addressable memory. How big (in bits) is the tag store?
An LC-3b system ships with a two-way set associative, write back cache with perfect LRU replacement. The tag store requires a total of 4352 bits of storage. What is the block size of the cache? Please show all your work.

Hint: 4352 = 2¹² + 2⁸.
Based on Hamacher et al., p. 255, question 5.18. You are working with a computer that has a first level cache that we call L1 and a second level cache that we call L2. Use the following information to answer the questions.
- The L1 hit rate is 0.95 for instruction references and 0.90 for data references.
- The L2 hit rate is 0.85 for instruction references and 0.75 for data references.
- 30% of all instructions are loads and stores.
- The size of each cache block is 8 words.
- The time needed to access a cache block in L1 is 1 cycle and the time needed to access a cache block in L2 is 6 cycles.
- The accesses to the caches and memory are done sequentially. If there is a miss in the L1 and a hit in the L2 then the total latency is 7 cycles.
- Memory is accessed only if there is a miss in both caches.
- The width of the memory bus is one word.
- It takes one clock cycle to send an address to main memory.
- It takes 20 cycles to access the main memory.
- It takes one cycle to send one word from the memory to the processor. Thus the total latency to get a word from memory to the processor is 22 cycles.
- The bus allows sending a new address to memory in the same cycle that data is sent from memory to the processor.
- Assume the data is accessible to the processor only AFTER the whole cache block has been brought in from the memory, and buffered on the processor chip. The processor can then access the data independent of and during the cache fill.
1. What is the average access time per instruction (assume no interleaving)?
2. What is the average access time per instruction if the main memory is 4-way interleaved?
3. What is the average access time per instruction if the main memory is 8-way interleaved?
4. What is the improvement obtained with interleaving?
Hamacher, pg.255, question 5.13. A byte-addressable computer has a small data cache capable of holding eight 32-bit words. Each cache block consists of one 32-bit word. When a given program is executed, the processor reads data from the following sequence of hex addresses:
```
200, 204, 208, 20C, 2F4, 2F0, 200, 204, 218, 21C, 24C, 2F4
```
This pattern is repeated four times.
1. Show the contents of the cache at the end of each pass throughout this loop if a direct-mapped cache is used. Compute the hit rate for this example. Assume that the cache is initially empty.
2. Repeat part (a) for a fully-associative cache that uses the LRU-replacement algorithm.
3. Repeat part (a) for a four-way set-associative cache that uses the LRU replacement algorithm.

Below, we have given you four different sequences of addresses generated by a program running on a processor with a data cache. Cache hit ratio for each sequence is also shown below. Assuming that the cache is initially empty at the beginning of each sequence, find out the following parameters of the processor's data cache:

Associativity (1, 2, or 4 ways)
Block size (1, 2, 4, 8, 16, or 32 bytes)
Total cache size (256B, or 512B)
Replacement policy (LRU or FIFO)

Assumptions: all memory accesses are one byte accesses. All addresses are byte addresses.

Address traces
Number	Address Sequence	Hit Ratio
1	0, 2, 4, 8, 16, 32	0.33
2	0, 512, 1024, 1536, 2048, 1536, 1024, 512, 0	0.33
3	0, 64, 128, 256, 512, 256, 128, 64, 0	0.33
4	0, 512, 1024, 0, 1536, 0, 2048, 512	0.25

Postponed to problem set 6

Determine the decimal value of the following IEEE floating point numbers.
1. 1 10000000 10100000000000000000000
2. 0 00000000 01010000000000000000000
3. 1 11111111 00000000000000000000000
Using a residue number system with two moduli, represent all of the decimal values between 0 and 11 inclusive when the moduli are
1. 4 and 3
2. 6 and 2
Consider the following format for transmitting a byte of data using the ECC mechanism described in class (Hamming Code).
```
D7 D6 D5 D4 P8 D3 D2 D1 P4 D0 P2 P1
```
For the following bit pattern, indicate whether there is no error or a single error. If there is a single error, list the corrected bit pattern.
- 110010010111
In general, what happens when there are two bit errors with this mechanism?
Using the Booth Multiplication Algorithm, multiply the two unsigned 10-bit numbers 0011011110 and 0001110010. Show the intermediate results after each step.
Postponed to problem set 6

From Tanenbaum, 4th edition, Appendix B, 4.

The following binary floating-point number consists of a sign bit, an excess 63, radix 2 exponent, and a 16-bit fraction. Express the value of this number as a decimal number.

0 0111111 0000001111111111
Postponed to problem set 6

From Tanenbaum, 4th edition, Appendix B, 5.

To add two floating point numbers, you must adjust the exponents (by shifting the fraction) to make them the same. Then you can add the fractions and normalize the result, if need be. Add the single precision IEEE floating-point numbers 3EE00000H and 3D800000H and express the normalized result in hexadecimal. ['H' is a notation indicating these numbers are in hexadecimal]
Postponed to problem set 6

From Tanenbaum, 4th edition, Appendix B, 6.

The Tightwad Computer Company has decided to come out with a machine having 16-bit floating-point numbers. The model 0.001 has a floating-point format with a sign bit, 7-bit, excess 63 exponent and 8-bit fraction. Model 0.002 has a sign bit, 5-bit, excess 15 exponent and a 10-bit fraction. Both use radix 2 exponentiation. What are the smallest and largest positive normalized numbers on both models? About how many decimal digits of precision does each have? Would you buy either one?

Postponed to problem set 6

The following numbers are represented exactly with a 9-bit floating point representation, in the format of the IEEE Floating Point standard:

-infinity, -1, 0, 5/16, 19.5, 48.

How many bits are needed for the fraction?
What is the bias?

Write each number in in the 9-bit floating point representation, below:

Value	Representation
48
19.5
5/16
0
-1
-infinity