Department of Electrical and Computer Engineering

The University of Texas at Austin

EE 460N, Fall 2014
Problem Set 5 Solution
Yale N. Patt, Instructor
Stephen Pruett, Emily Bragg, Siavash Zaganeh, TAs
  1. An 8KB cache with an 8B line size, organized as a 4-way set associative cache, has 8KB ÷ (4 × 8B) = 256 sets.

    Since there are 256 or 2^8 sets, 8 bits are required to index into the correct set. Since there are 8B or 2^3 bytes in a cache line, 3 bits are required to find the correct byte within a block. Given a 24-bit address space, this leaves 24 - 8 - 3 = 13 bits for the tag. Additionally, the tag store must hold 2 bits for the V/NV replacement policy and 1 valid bit. This means each cache line must have a 16-bit tag store entry associated with it. 2B of tag store times (256 × 4) cache lines in the cache means that the tag store, in total, takes up 2048 bytes, which is 16384 bits.
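    As a quick check, the arithmetic above can be scripted (a sketch; the variable names are ours):

    ```python
    cache_size = 8 * 1024   # 8KB cache
    block_size = 8          # 8B lines
    ways = 4                # 4-way set associative
    addr_bits = 24          # 24-bit address space

    sets = cache_size // (ways * block_size)          # 256
    index_bits = sets.bit_length() - 1                # 8
    offset_bits = block_size.bit_length() - 1         # 3
    tag_bits = addr_bits - index_bits - offset_bits   # 13
    entry_bits = tag_bits + 2 + 1                     # + 2 V/NV bits + 1 valid bit = 16
    total_bits = entry_bits * sets * ways             # 16384 bits = 2048 bytes
    print(sets, tag_bits, entry_bits, total_bits)
    ```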

  2. The size of the tag store is 2^12 + 2^8 = 4352 bits. We know that the size of the tag store can be given as the product of the number of sets and the number of bits per set.

    We also know that the address space is 16 bits. Hence, tag + index + bib = 16, where bib is the number of bits identifying the byte within a block.

    In order to find bib, we need to find index and tag. Each set of the tag store (for this 2-way cache) needs 5 bits of state, e.g., a valid bit and a dirty bit per way plus one LRU bit per set, along with the two tags.

    Total number of bits per set is therefore 5 + 2 × tag. An important conclusion that can be drawn is that the number of bits per set will always be an odd number.

    We will also use the fact that the number of sets is always a power of 2 (since it is indexed using an integer number of bits). Given the size of the tag store, index can be at most 8 (4352 is not divisible by 2^9 or greater).

    The only value of index that satisfies both criteria (an integer, odd number of bits per set) is 8. Therefore:

    Number of sets = 2^8 = 256
    Bits per set = 4352 ÷ 256 = 17
    17 = 5 + 2 × tag ⇒ tag = 6
    6 + 8 + bib = 16 ⇒ bib = 2

    Hence, the cache block size is 4 bytes
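    The search implied by this argument can be written out directly (a sketch; it scans every possible index width for the one that leaves an odd number of bits per set):

    ```python
    tag_store_bits = 2**12 + 2**8   # 4352 bits
    addr_bits = 16

    solutions = []
    for index in range(addr_bits + 1):
        sets = 2**index
        # The number of sets must divide the tag store evenly,
        # and bits per set (5 + 2 * tag) must be odd.
        if tag_store_bits % sets == 0 and (tag_store_bits // sets) % 2 == 1:
            bits_per_set = tag_store_bits // sets
            tag = (bits_per_set - 5) // 2
            bib = addr_bits - tag - index
            solutions.append((index, tag, bib))

    print(solutions)   # the only solution: index 8, tag 6, bib 2 -> 4-byte blocks
    ```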

  3. average_access_time_per_instruction = 1 × instruction_access_time + 0.3 × data_access_time

    We can use the following equations in both part a and part b to compute the instruction and data access time.

    instruction_access_time = 1 + 0.05 × (6 + 0.15 × mem_latency)
    data_access_time = 1 + 0.10 × (6 + 0.25 × mem_latency)
    1. We need to calculate the memory latency. Since each cache block is 8 words, it will take 8 accesses to get a cache block from memory.

          
      (-)(...)(-)
              (-)(...)(-)
                      (-)(...)(-)
                              (-)(...)(-)
                                      (-)(...)(-)
                                              (-)(...)(-)
                                                      (-)(...)(-)
                                                              (-)(...)(-)
      
      |<---------------------- 169 cycles ----------------------------->|
      

      Using the equations, we get 2.5675 cycles for instruction_access_time and 5.825 for data_access_time.

      The average latency is 2.5675 + 0.3 × 5.825 = 4.315 cycles

    2. We again need to compute the memory latency:

      (-)(.............)(-)
         (-)(.............)(-)
            (-)(.............)(-)
               (-)(.............)(-)
                        (-)(...........)(-)
                           (-)(...........)(-)
                              (-)(...........)(-)
                                 (-)(...........)(-)
      
      |<--------21-----><1><-----20----><---- 4 --->|
      

      So the mem_latency is 46 cycles. Again using the equations, the instruction_access_time is 1.645 and data_access_time is 2.75.

      The average latency is 1.645 + 2.75 × 0.3 = 2.47 cycles

    3. We need to compute memory latency for 8-way interleaving

      (-)(.............)(-)
         (-)(.............)(-)
            (-)(.............)(-)
               (-)(.............)(-)
                  (-)(...........)(-)
                     (-)(...........)(-)
                        (-)(...........)(-)
                           (-)(...........)(-)
      |--------21-------|--------8-----------|
      

      So memory latency is 29 cycles. Using the equations, we get the instruction_access_time as 1.5175 and data_access_time as 2.325.

      The average latency is 1.5175 + 0.3 × 2.325 = 2.215 cycles

    4. The improvement is (4.315 - 2.47) ÷ 4.315 = 0.4276. Therefore, the average latency improves by 42.76%.
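    All of the numbers above can be double-checked by plugging each memory latency into the access-time equations:

    ```python
    def avg_latency(mem_latency):
        # Access-time equations from the problem statement.
        instr = 1 + 0.05 * (6 + 0.15 * mem_latency)   # instruction access time
        data = 1 + 0.10 * (6 + 0.25 * mem_latency)    # data access time
        return instr + 0.3 * data                     # 0.3 data accesses per instruction

    print(avg_latency(169))   # ~4.315  (no interleaving)
    print(avg_latency(46))    # ~2.47   (4-way interleaving)
    print(avg_latency(29))    # ~2.215  (8-way interleaving)
    print((avg_latency(169) - avg_latency(46)) / avg_latency(169))  # ~0.4276
    ```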

  4. We assume that the address is 12 bits (we could also have assumed a 10-bit address)
    1. The address is divided into 3 portions:
      • address[1:0] (2 bits) for the byte in the block
      • address[4:2] (3 bits) for the cache index
      • address[11:5] (7 bits) for the tag

      The contents of the cache at the end of each pass are the same. They are shown below:

      Set  Valid  Tag       Data (addresses are written inside each byte)
      0    1      0010 000  203 202 201 200
      1    1      0010 000  207 206 205 204
      2    1      0010 000  20B 20A 209 208
      3    1      0010 010  24F 24E 24D 24C
      4    1      0010 111  2F3 2F2 2F1 2F0
      5    1      0010 111  2F7 2F6 2F5 2F4
      6    1      0010 000  21B 21A 219 218
      7    1      0010 000  21F 21E 21D 21C

      The hit/miss information for each pass is shown below:

      Reference:  200 204 208 20C 2F4 2F0 200 204 218 21C 24C 2F4
      Pass 1:      M   M   M   M   M   M   H   H   M   M   M   H
      Pass 2:      H   H   H   M   H   H   H   H   H   H   M   H
      Pass 3:      H   H   H   M   H   H   H   H   H   H   M   H
      Pass 4:      H   H   H   M   H   H   H   H   H   H   M   H

      Hit rate is 33/48

    2. The address is divided into two portions: 2 bits for identifying the byte in the block and 10 bits for the tag. No bits are needed for the cache index. The following table shows the contents of the cache at the end of each pass (valid bits and tags are omitted since they should be obvious; starting addresses of the blocks in the cache are provided):

      Pass  Way 0  Way 1  Way 2  Way 3  Way 4  Way 5  Way 6  Way 7
      1     200    204    24C    20C    2F4    2F0    218    21C
      2     200    204    21C    24C    2F4    20C    2F0    218
      3     200    204    218    21C    2F4    24C    20C    2F0
      4     200    204    2F0    218    2F4    21C    24C    20C

      The hit/miss information for each pass is shown below:

      Reference:  200 204 208 20C 2F4 2F0 200 204 218 21C 24C 2F4
      Pass 1:      M   M   M   M   M   M   H   H   M   M   M   H
      Pass 2:      H   H   M   M   H   M   H   H   M   M   M   H
      Pass 3:      H   H   M   M   H   M   H   H   M   M   M   H
      Pass 4:      H   H   M   M   H   M   H   H   M   M   M   H

      Hit rate is 21/48

    3. The address is divided into 3 portions: 2 bits for identifying the byte in the block, 1 bit for the cache index, 9 bits for the tag. The contents of the cache at the end of Pass 1:

      Set 0:
      Way  Valid  Tag          Data
      0    1      0010 0000 0  203-200
      1    1      0010 0000 1  20B-208
      2    1      0010 1111 0  2F3-2F0
      3    1      0010 0001 1  21B-218

      Set 1:
      Way  Valid  Tag          Data
      0    1      0010 0000 0  207-204
      1    1      0010 0100 1  24F-24C
      2    1      0010 1111 0  2F7-2F4
      3    1      0010 0001 1  21F-21C

      The hit/miss information for each pass is shown below:

      Reference:  200 204 208 20C 2F4 2F0 200 204 218 21C 24C 2F4
      Pass 1:      M   M   M   M   M   M   H   H   M   M   M   H
      Pass 2:      H   H   H   M   H   H   H   H   H   M   M   H
      Pass 3:      H   H   H   M   H   H   H   H   H   M   M   H
      Pass 4:      H   H   H   M   H   H   H   H   H   M   M   H

      Contents of the second set of the cache (index equals 1) after pass 2, 3, and 4 (the first set remains the same):

      Pass  Way 0 (V, Tag, Data)     Way 1                    Way 2                    Way 3
      2     1  0010 0000 0  207-204  1  0010 0001 1  21F-21C  1  0010 1111 0  2F7-2F4  1  0010 0100 1  24F-24C
      3     1  0010 0000 0  207-204  1  0010 0100 1  24F-24C  1  0010 1111 0  2F7-2F4  1  0010 0001 1  21F-21C
      4     1  0010 0000 0  207-204  1  0010 0001 1  21F-21C  1  0010 1111 0  2F7-2F4  1  0010 0100 1  24F-24C

      Hit Rate is 30/48
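    All three hit rates can be confirmed with a small cache simulator (a sketch; we assume LRU replacement, which is consistent with the pass-by-pass contents shown above, and a 4-byte block, i.e., 2 offset bits):

    ```python
    from collections import OrderedDict

    def count_hits(refs, num_sets, ways, offset_bits, passes=4):
        """LRU set-associative cache simulator; returns total hits over all passes."""
        sets = [OrderedDict() for _ in range(num_sets)]
        hits = 0
        for _ in range(passes):
            for addr in refs:
                block = addr >> offset_bits
                idx = block % num_sets
                tag = block // num_sets
                s = sets[idx]
                if tag in s:
                    hits += 1
                    s.move_to_end(tag)          # mark as most recently used
                else:
                    if len(s) == ways:
                        s.popitem(last=False)   # evict least recently used
                    s[tag] = True
        return hits

    refs = [0x200, 0x204, 0x208, 0x20C, 0x2F4, 0x2F0,
            0x200, 0x204, 0x218, 0x21C, 0x24C, 0x2F4]

    print(count_hits(refs, 8, 1, 2))   # 33  (part 1: direct-mapped, 8 sets)
    print(count_hits(refs, 1, 8, 2))   # 21  (part 2: fully associative, 8 ways)
    print(count_hits(refs, 2, 4, 2))   # 30  (part 3: 4-way, 2 sets)
    ```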

  5. Block Size

    The following table lists hit ratios for different cache block sizes given the access sequence 1 (0, 2, 4, 8, 16, 32):

    Block Size  Hit Ratio
    1B          0/6
    2B          0/6
    4B          1/6
    8B          2/6
    16B         3/6
    32B         4/6

    Since the hit ratio is reported as 0.33 for this sequence, the block size must be 8 bytes. Therefore, the accesses look like this:

    Address:   0  2  4  8  16  32
    Hit/Miss:  M  H  H  M  M   M

    Associativity

    Notice that all of the addresses of sequence 2 (0, 512, 1024, 1536, 2048, 1536, 1024, 512, 0) are multiples of 512. This means that they all map to the same set, since the size of the cache can only be 256 or 512 bytes. Therefore, the hit ratio would be 0/9 in case of a direct-mapped cache. In case of a 2-way cache, the accesses would behave as follows:

    Address:   0  512  1024  1536  2048  1536  1024  512  0
    Hit/Miss:  M   M    M     M     M     H     M     M   M

    The resulting hit rate would be 1/9. Similarly, for 4-way cache:

    Address:   0  512  1024  1536  2048  1536  1024  512  0
    Hit/Miss:  M   M    M     M     M     H     H     H   M

    The resulting hit rate would be 3/9, and thus the cache is 4-way set associative.

    Cache Size

    If the cache size were 256B, there would be 3 index bits and all of the addresses in sequence 3 (0, 64, 128, 256, 512, 256, 128, 64, 0) would map to the same set:

    Address:   0  64  128  256  512  256  128  64  0
    Hit/Miss:  M   M   M    M    M    H    H   H  M

    The resulting hit ratio would be 3/9, and thus the cache size is 256B. For completeness, here is how the accesses would look in the case of a 512B cache (4 index bits):

    Address:   0  64  128  256  512  256  128  64  0
    Hit/Miss:  M   M   M    M    M    H    H   H  H

    Replacement Policy

    In case of the FIFO replacement policy, the accesses of sequence 4 (0, 512, 1024, 0, 1536, 0, 2048, 512) would look as follows:

    Address:   0  512  1024  0  1536  0  2048  512
    Hit/Miss:  M   M    M    H   M    H   M     H

    The resulting hit ratio would be 3/8. However, in case of the LRU replacement policy the hit rate would be 0.25 (2/8):

    Address:   0  512  1024  0  1536  0  2048  512
    Hit/Miss:  M   M    M    H   M    H   M     M

    Thus the replacement policy is LRU.
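    The FIFO-vs-LRU distinction can be checked by simulating the single set that all of sequence 4's addresses map to (a sketch, assuming the 4-way, 256B, 8B-block cache deduced above):

    ```python
    def hits_single_set(refs, ways, policy):
        """Simulate one set under 'LRU' or 'FIFO' replacement; returns hit count."""
        contents = []   # index 0 = next victim
        hits = 0
        for addr in refs:
            if addr in contents:
                hits += 1
                if policy == 'LRU':
                    contents.remove(addr)
                    contents.append(addr)   # refresh recency; FIFO order is unchanged
            else:
                if len(contents) == ways:
                    contents.pop(0)         # evict oldest (FIFO) / least recent (LRU)
                contents.append(addr)
        return hits

    seq4 = [0, 512, 1024, 0, 1536, 0, 2048, 512]
    print(hits_single_set(seq4, 4, 'FIFO'))  # 3 hits -> 3/8
    print(hits_single_set(seq4, 4, 'LRU'))   # 2 hits -> 2/8 = 0.25
    ```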

  6. Question Postponed.
  7. 
       | 4  3 | 6  2
    -----------------
     0 | 0  0 | 0  0
     1 | 1  1 | 1  1
     2 | 2  2 | 2  0
     3 | 3  0 | 3  1
     4 | 0  1 | 4  0
     5 | 1  2 | 5  1
     6 | 2  0 | 0  0
     7 | 3  1 | 1  1
     8 | 0  2 | 2  0
     9 | 1  0 | 3  1
    10 | 2  1 | 4  0
    11 | 3  2 | 5  1
    
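    Every row of the table above is consistent with the left pair of columns being i mod 4 and i mod 3, and the right pair being i mod 6 and i mod 2 (this column reading is our assumption; the original question statement is not reproduced here). Under that reading, the table can be regenerated with:

    ```python
    # Assumed column meanings: i mod 4, i mod 3 | i mod 6, i mod 2
    for i in range(12):
        print(f"{i:2d} | {i % 4}  {i % 3} | {i % 6}  {i % 2}")
    ```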

  8. Single Error. The corrected bit pattern:

    111010010111

    When there are two errors, there will definitely be a parity error. However, consider the case where P0 and P1 are the two errors. If we examine the parity errors with the intention of attempting to correct a single error as in the previous example, then we would erroneously think that the 3rd bit (D0) was in error. Clearly, this is not the case. It is also possible that after computing the parity errors, we determine, for example, that bit 15 is in error (all four parity functions evaluated incorrectly). However, the transmitted data only has 12 bits. Thus, if we see that an error points to bits 13 through 15, we know for sure that two errors occurred. However, as described earlier, it is definitely possible that two errors manifest themselves as a single error in one of the 12 bits transmitted.

    What happens if there are more than two errors?

    More than two errors may manifest themselves as a single error, as two errors, or possibly as no error at all. This scheme has no way to distinguish more than two errors from two errors or fewer.
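    The two-error aliasing described above can be demonstrated with a standard Hamming layout (a sketch; we assume even parity with parity bits at positions 1, 2, 4, and 8 of a 12-bit codeword, and the data bits chosen below are arbitrary):

    ```python
    def parity_checks(code):
        """code is 1-indexed (code[0] unused). Returns the syndrome:
        the OR of the positions of the failing parity checks."""
        syndrome = 0
        for par in (1, 2, 4, 8):
            x = 0
            for pos in range(1, 13):
                if pos & par:
                    x ^= code[pos]
            if x:                  # this parity check fails
                syndrome |= par
        return syndrome

    def encode(data8):
        """Place 8 data bits in the non-power-of-two positions, then set parity."""
        code = [0] * 13
        data_pos = [p for p in range(1, 13) if p not in (1, 2, 4, 8)]
        for p, bit in zip(data_pos, data8):
            code[p] = bit
        for par in (1, 2, 4, 8):
            x = 0
            for pos in range(1, 13):
                if pos != par and pos & par:
                    x ^= code[pos]
            code[par] = x          # even parity over the covered positions
        return code

    code = encode([1, 0, 1, 1, 0, 0, 1, 0])   # arbitrary data bits
    code[1] ^= 1                               # flip P0
    code[2] ^= 1                               # flip P1
    print(parity_checks(code))                 # 3 -> falsely points at position 3 (D0)

    code = encode([1, 0, 1, 1, 0, 0, 1, 0])
    code[4] ^= 1                               # two errors whose positions XOR to 15
    code[11] ^= 1
    print(parity_checks(code))                 # 15 -> beyond bit 12, double error detected
    ```

    The syndrome is the XOR of the flipped bits' positions, which is why two errors can alias to a legal single-error position.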

  9. Multiplicand is 0011011110.
    Cycle  Multiplier  Temp register
    0      0001110010  0000000000----------
    1      --00011100  000110111100--------
    2      ----000111  00000110111100------
    3      ------0010  1111001111011100----
    4      --------00  000110001011011100--
    5      ----------  00000110001011011100
    The result is 00000110001011011100.
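    A quick sanity check of the final product (the Booth recoding itself is shown in the table above):

    ```python
    multiplicand = int("0011011110", 2)         # 222
    multiplier = int("0001110010", 2)           # 114
    result = int("00000110001011011100", 2)     # 25308
    assert multiplicand * multiplier == result  # 222 * 114 = 25308
    print(result)
    ```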
  10. Question Postponed.
  11. Question Postponed.
  12. Question Postponed.
  13. Question Postponed.