Department of Electrical and Computer Engineering

The University of Texas at Austin

EE 360N, Spring 2009
Problem Set 3 Solutions
Yale N. Patt, Instructor
Ramapriyan Chakravarthy, Khubaib, Vivekanand Venugopal,, TAs

Problem Set 3 Solutions

    1. The probability of a single bit flipping is pf = 10-7. Therefore, the probability of a bit remaining correct is pc = (1 - 10-7). The probability that a transmitted nine bit message will have zero flipped bits is pc9 = (1 - 10-7)9. Thus, the probability of at least one of the bits being flipped is 1 - pc9 = 1 - (1 - 10-7)9 ≈ 9 × 10-7.
    2. Parity check logic can detect an odd number of errors: 1, 3, 5, 7, and 9.
    3. Parity check logic cannot detect an even number of errors: 2, 4, 6, 8.
      • There are choose(9, 1) = 9 possible combinations that result in a single flipped bit. Thus, the probability of one bit being flipped is p1 = 9 × pf × pc8 ≈ 9 × 10-7.
      • There are choose(9, 2) = 36 possible combinations that result in two flipped bits. Thus, the probability of two bits being flipped is p2 = 36 × pf2 × pc7 ≈ 3.6 × 10-13.
      • There are choose(9, 3) = 84 possible combinations that result in three flipped bits. Thus, the probability of three bit being flipped is p3 = 84 × pf3 × pc6 ≈ 8.4 × 10-20.
    4. Ignoring the probability of three or more bit errors, the probability of a detected error is just the probability of a single bit error (calculated above). Thus, the rate of detected errors is p1 × 109 ÷ 9 ≈ 100 errors per second.
    5. Similarly, the probability of an undetected error is approximately the probability of a double error. Thus, the rate of undetected errors is p2 × 109 ÷ 9 ≈ 4 × 10-5 corrupt messages per second (or twice as many undetected bit errors, since we assume each undetected corrupt message contains two flipped bits), which is equivalent to one undetected corrupt message about every 7 hours.
      • Byte on bus Addr[1:0]
      • Interleave bits Addr[4:2]
      • Chip address Addr[7:5]
      • Row decode Addr[11:8]
    1. 577 Cycles. The first 8 memory accesses, A[0][0] to A[0][7], must occur sequentially with no overlap since they are all acesses to the same bank. Thus, it would take 80 cycles for the 1st 8 memory accesses, with the 8th access starting in cycle 70. Since the 8th and 9th memory accesses, A[0][7] and A[1][0], respectively, are to different banks, the accesses can overlap, and the 9th access can start in cycle 71 (70 cycles for the 1st 7 accesses plus 1 additional cycle of the 8th access). Continuing with this logic, the access to A[2][0] could start in cycle 142 (71x2). Finally, the access to A[7][0] could start in cycle 497 (71x7). Now all that remains are 8 more memory accesses, all to the same bank (A[7][0] to A[7][7]). This takes another 80 cycles, bringing the total to 577 cycles (497 + 80).

      If the memory were not interleaved, all 64 memory accesses must happen sequentially with no overlap, so it would take a total of 640 cycles (64*10). Therefore, we do gain some benefit from this interleaving scheme, but not that much.

    2. Yes, a change can be made. The new bits are:

      • Byte on bus Addr[1:0]
      • Interleave bits Addr[7:5]
      • Chip address Addr[4:2]
      • Row decode Addr[11:8]

      87 Cycles. With the new interleaving scheme, consecutive memory accesses are to different banks, so the accesses can overlap. The 1st access, A[0][0], would begin at cycle 0, the 2nd, A[0][1], at cycle 1, and so on. The 8th access, A[0][7], would start at cycle 7. However, the 9th access, A[1][0], cannot start at cycle 8. It would have to wait 2 more cycles for the 1st access to finish since it is on the same bank as the 1st access; therefore, it would start at cycle 10. Continuing this logic, the access to A[2][0] would start at cycle 20, and finally, the access to A[7][0] would start at cycle 70. Now, all that is left are 8 accesses, but they are all to different banks so they can start 1 cycle after each other. The access to A[7][1] would begin at cycle 71, A[7][2] at 72, and finally A[7][7], the last memory access, would begin at cycle 77 and, therefore, end at cycle 87.

    3. Only one line of code needs to be changed:

      sum = sum + A[i][j];
      to
      sum = sum + A[j][i];

      Alternatively, you could keep that line the same, but swap the variable (i/j) of the inner and outer loops as shown below.

      Original code:

           for(i = 0; i < 8; ++i){
             for(j = 0; j < 8; ++j){
               sum = sum + A[i][j];
             }
           }

      New code:

           for(j = 0; j < 8; ++j){
             for(i = 0; i < 8; ++i){
               sum = sum + A[i][j];
             }
           }

      87 Cycles, for similar reasons to the explanation provided in part (c).

  1. 3)

    In this problem, we assume that both of the rotators are right rotators. If the rotator for read is a right rotator and the rotator for write is a left rotator, PA[1:0] can be used as the control for both rotators.

    PA[1:0]SIZERD/WR1st/2nd LD.MDR[3:0]ROT[1:0]WE[3:0]
    00BRDX XXX1000000
    00BWRX XXX0000001
    00HRDX XX11000000
    00HWRX XX00000011
    00WRDX 1111000000
    00WWRX 0000001111
    01BRDX XXX1010000
    01BWRX XXX0110010
    01HRDX XX11010000
    01HWRX XX00110110
    01WRD1st X111010000
    01WRD2nd 1000010000
    01WWR1st 0000111110
    01WWR2nd 0000110001
    10BRDX XXX1100000
    10BWRX XXX0100100
    10HRDX XX11100000
    10HWRX XX00101100
    10WRD1st XX11100000
    10WRD2nd 1100100000
    10WWR1st 0000101100
    10WWR2nd 0000100011
    11BRDX XXX1110000
    11BWRX XXX0011000
    11HRD1st XXX1110000
    11HRD2nd XX10110000
    11HWR1st XX00011000
    11HWR2nd XX00010001
    11WRD1st XXX1110000
    11WRD2nd 1110110000
    11WWR1st 0000011000
    11WWR2nd 0000010111
    Legend
    B(yte)00
    H(alf word)01
    W(ord)10
    RD(read)0
    WR(write)1
    1st0
    2nd1
  • 4)

    Interleave into 64 banks in order to hide the latency in sequential accesses (note, minimum needed is 37 banks but one would really prefer to use a power of 2, therefore 64).

  • 5)  
    ReferenceTLB hitPage Fault
    0X
    13X
    5
    2
    14X
    14X
    13
    6X
    6X
    13X
    15X
    14
    15X
    13X
    4X
    3X

    TLB hit rate = 7/16.

    TLB contains entries for pages 3, 4, and 13.

    Solutions for the final contents of the frames of physical memory may differ slightly depending on what order the initially empty frames were allocated; however, no page should appear in more than one frame. Possible answers are shown below.

    Frame 0Page 14 (or 6 or 15)
    Frame 1Page 13
    Frame 2Page 3
    Frame 3Page 2
    Frame 4Page 6 (or 14 or 15)
    Frame 5Page 4
    Frame 6Page 15 (or 6 or 14)
    Frame 7Page Table
  • 6)

    Size of a page is 512 bytes.

    Number of bits of address required to calculate the offset within a page is 9.

    Number of frames in physical memory is (8K bytes) ÷ (512 bytes) = 213 ÷ 29 = 24.

    Size of PFN is 4 bits.

    Size of PTE equals 1 (Valid) + 1 (Modified) + 2 (access control) + 4 (PFN) = 8 bits = 1 byte.

    Number of virtual pages in System Space is (3 × 212) ÷ (29) = 24 pages.

    Size of System Page Table is 24 × 1 byte = 24 bytes = 24 × 8 bits = 192 bits.

  • 7)

    We can determine from the given information that:

    The breakdown for a Virtual Address (VA) must be:

    The breakdown for a Physical Address (PA) must be:

    1. Answer: x825C

      Given the VAX 2-level translation scheme, we know that this VA must be in system space. Therefore, the top 2 bits of the VA (VA[15:14]) must be 10 (in binary). We also know that the offset of the VA is the same as the offset of the PA, so the bottom 8 bits of the VA (VA[7:0]) must be x5C (01011100 in binary). Now, all we have to figure out is the 6 bit Virtual Page Number (VPN). It was given that the 1st access to physical memory (let's call this PA1) was at location x108. Once again, given the VAX 2-level translation scheme, we know that PA1 = SBR + (size of PTE in bytes) × VPN. Solving this equation for VPN we get VPN = (PA1 - SBR) ÷ (size of PTE in bytes).

      It was given that PA1 is x108, SBR is x100, and the size of a PTE is 4 bytes. Therefore, the VPN is x2 (000010 in binary). The complete VA is therefore 10 000010 01011100 (in binary) which is x825C.

    2. Answer: x80000009

      The contents of physical address x45C (the 2nd access to physical memory) is a PTE. We know that a PTE is a 32 bit value that consists of 1 valid bit (PTE[31]), and a 4-bit PFN (PTE[3:0]). All other bits of the PTE (PTE[30:4]) are 0. The contents of this PTE are used to form the address of the 3rd access to physical memory (x942). Therefore, the PFN bits of the PTE must be x9, and the valid bit must be 1. This implies that the PTE is x80000009.

    3. Answer x0342

      First, we must determine if this VA is in P0 space or P1 space. To determine this, we have to figure out if P0BR or P1BR was used to compute the virtual address x825C (the answer to part a). It was given that P0BR is x8250, and that P1BR is x8350. Since x8350 is greater than x825C, we know that we could not have used P1BR to compute x825C, and therefore we must have used P0BR which means the VA is in P0 space. Therefore, the top 2 bits of the virtual address (VA[15:14]) must be 00 (in binary). We also know that the offset of the VA is the same as the offset of the PA, so the bottom 8 bits of the VA (VA[7:0]) must be x42 (01000010 in binary). Now, all we have to figure out is the 6 bit Virtual Page Number (VPN). Once again, given the VAX 2-level translation scheme, we know that x825C = P0BR + ((size of PTE in bytes) × VPN ).

      Solving this equation for VPN we get VPN = (x825C - P0BR) ÷ (size of PTE in bytes).

      It was given that P0BR is x8250, and the size of a PTE is 4 bytes. Therefore, the VPN is x3 (000011 in binary). The complete VA is therefore 00 000011 01000010 (in binary) which is x0342.

  • 8)
    1. # virtual pages = virtual address space ÷ size of page = 212 Bytes ÷ 25 Bytes/page = 27 pages
    2. # physical frames = physical address space ÷ size of frame = 29Bytes ÷ 25 Bytes/frame = 24 frames. Therefore, 4 bits are needed to specify the PFN.

      Size of PTE = Valid bit + Modified bit + access control bits + PFN bits = 1 + 1 + 2 + 4 = 8 bits (1 Byte)

    3. User space = (3/4) × Virtual address space = (3/4) × 27 pages = 3 × 25 pages. Each page of user space will have a PTE in the user space page table.

      Size of user page table is # of entries × size of PTE = (3 × 25 entries) × 1 Byte/entry = 96 Bytes.

      # of pages = 96 Bytes ÷ 25 Bytes/page = 3 pages.

    4. We'll use the prefix “b” to indicate a binary number in this solution. Also, for clarity, we will call the virtual address x7AC the virtual address of X (VA_X).

      Virtual address VA_X = x7AC
      The three parts of this virtual address are:
      VA_X[11:10]: b01 (indicates that this is an address in user space)
      VA_X[11:5] (7 bits): Virtual Page Number = b0111101
      VA_X[4:0] Offset within page: b01100
      • X is on page x03D of user space
      • VA of the PTE of the page containing x is VA_PTE_X = PTBR + (x03D × 1) = x380 + x03D = x3BD.
      • Virtual page of System Space of PTE is VA_PTE_X[11:5] = x01D.
      • PA of the PTE of this page of System Space is PA_PTE_PTE = SBR + VA_PTE_X[11:5] × 1 = x1E0 + x01D = x1FD.
      • The PTE of this page of System Space is:
        PTE_PTE_X = Memory[x1FD] = xB8
        PFN_PTE_X = PTE_PTE_X[3:0] = x8
        PA of the PTE of the page containing X:
        PA_PTE_X = PFN_PTE_X concatenated with VA_PTE_X[4:0] = x11D
        PTE_X = Memory[x11D] = x83
        PFN_X = PTE_X[3:0] = x3
        PA_X = PFN_X concantenated with VA_X[4:0] = x06C
  • 9)

    We'll use the prefix “b” to indicate a binary number in this solution.

    Virtual address VA_X = x3456789A
    The three parts of this virtual address are:
    VA_X[31:30]: b00 (indicates that this is an address in P0 space)
    VA_X[29:9] (21 bits): Virtual Page Number = b 11 0100 0101 0110 0111 100 (x1A2B3C)
    VA_X[8:0] Offset within page: b010011010
  • 10)

    Including instruction fetch, every instruction can generate a page fault. Ignoring instruction fetch, LDB, LDW, STB, STW, TRAP, RTI can generate a page fault (If the trap vector table or system stack is always in physical memory, then the TRAP or RTI won't generate a page fault).

    Including the instruction fetch, RTI can generate the maximum number of page faults (3) and LDB, LDW, STB, STW, TRAP can generate the maximum number of page faults (2). (Points will not be detected for RTI)