Department of Electrical and Computer Engineering

The University of Texas at Austin

EE 460N, Spring 2015
Problem Set 2
Due: February 23, before class
Yale N. Patt, Instructor
Ben Lin, Kishore Punniyamurthy, Will Hoenig, TAs

Instructions: You are encouraged to work on the problem set in groups and turn in one problem set for the entire group. Remember to put all your names on the solution sheet. Also remember to put the name of the TA in whose discussion section you would like the problem set returned to you.

  1. The following program computes the square (k*k) of a positive integer k, which is stored in location 0x4000, and stores the result in location 0x4002. The result is to be treated as a 16-bit unsigned number.


            .ORIG X3000
            AND R0, R0, #0
            LEA R3, NUM
            LDW R3, R3, #0
            LDW R1, R3, #0        
            ADD R2, R1, #0
    LOOP    ADD R0, R0, R1
            ADD R2, R2, #-1
            BRP LOOP          
            STW R0, R3, #1
    NUM     .FILL x4000
    1. How many cycles does each instruction take to execute on the LC-3b microarchitecture described in Appendix C?
    2. How many cycles does the entire program take to execute? (answer in terms of k)
    3. What is the maximum value of k for which this program still works correctly? Note: Treat the input and output values as 16-bit unsigned values for this part. We will extend the problem to 2's complement values in part 4.
    4. How will you modify this program to support negative values of k? Explain in less than 30 words.
    5. What is the new range of k?
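
    As a rough aid for reasoning about the range questions in problem 1, the loop can be modeled in Python. This is only a sketch of the arithmetic, not of the microarchitecture. It assumes that every ADD wraps around at 16 bits and that BRP tests the sign of the last value written to a register. The names `is_positive` and `square_by_repeated_addition` are made up for illustration.

    ```python
    MASK = 0xFFFF  # LC-3b registers are 16 bits wide


    def is_positive(value):
        """True when a 16-bit value is nonzero and its sign bit is clear (P condition code)."""
        return value != 0 and (value & 0x8000) == 0


    def square_by_repeated_addition(k):
        """Mirror the loop: add k to R0, decrement the counter R2, repeat while R2 is positive."""
        r0 = 0             # AND R0, R0, #0
        r1 = k & MASK      # LDW R1, R3, #0
        r2 = r1            # ADD R2, R1, #0
        while True:
            r0 = (r0 + r1) & MASK    # LOOP ADD R0, R0, R1
            r2 = (r2 - 1) & MASK     # ADD R2, R2, #-1
            if not is_positive(r2):  # BRP LOOP falls through when R2 is zero or negative
                break
        return r0          # the value that STW writes to memory
    ```

    Calling the function with a few small and a few large values of k and comparing the result against k*k is one way to see where the 16-bit result stops being correct.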
    1. In which state(s) in the LC-3b state diagram should the LD.BEN signal be asserted? Is there a way for the LC-3b to work correctly without the LD.BEN signal? Explain.
    2. Suppose we want to get rid of the BEN register altogether. Can this be done? If so, explain how. If not, why not? Is it a good idea? Explain.
    3. Suppose we took this further and wanted to get rid of state 0. We can do this by modifying the microsequencer, as shown in the figure below. What is the 4-bit signal denoted as A in the figure? What is the 1-bit signal denoted as B?
    The modified microsequencer logic diagram
  2. We wish to use the unused opcode “1010” to implement a new instruction ADDM, which (similar to an IA-32 instruction) adds the contents of a memory location to either the contents of a register or an immediate value and stores the result into a register. The specification of this instruction is as follows:

    Assembler Formats

    ADDM DR, SR1, SR2
    ADDM DR, SR1, imm5


    ADDM instruction encoding


    if (bit[5] == 0)
        DR = Memory[SR1] + SR2;
    else
        DR = Memory[SR1] + SEXT(imm5);
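
    The operation above can be sketched in Python as a behavioral model. The field positions (DR in bits [11:9], SR1 in bits [8:6], SR2 in bits [2:0], imm5 in bits [4:0]) are an assumption borrowed from the LC-3b ADD format, and the register file and memory are modeled as a plain list and dict; `sext5` and `addm` are hypothetical helper names. Condition codes are not modeled.

    ```python
    def sext5(value):
        """Sign-extend a 5-bit field to a Python int."""
        value &= 0x1F
        return value - 0x20 if value & 0x10 else value


    def addm(instr, regs, mem):
        """Apply ADDM to regs (list of 8 ints) using mem (dict: address -> 16-bit word)."""
        dr = (instr >> 9) & 0x7     # assumed field position, as in ADD
        sr1 = (instr >> 6) & 0x7    # assumed field position, as in ADD
        operand = mem[regs[sr1]]    # Memory[SR1], read before DR is written
        if ((instr >> 5) & 0x1) == 0:
            other = regs[instr & 0x7]     # register form: SR2
        else:
            other = sext5(instr & 0x1F)   # immediate form: SEXT(imm5)
        regs[dr] = (operand + other) & 0xFFFF
        return regs[dr]
    ```

    Because the model reads both source values before writing DR, a case such as ADDM R1, R1, R1 behaves as specified. The hardware has to preserve the same ordering across several states.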
    1. We show below an addition to the state diagram necessary to implement ADDM. Using the notation of the LC-3b State Diagram, describe inside each “bubble” what happens in each state, and assign each state an appropriate state number (state A has been done for you). Also, what is the one-bit signal denoted as X in the figure? Note: Be sure your solution works when the same register is used for both sources and the destination (e.g., ADDM R1, R1, R1).

      • Hint: states 24 26, 34, and 36-63 in the control store are available
      • Hint: to make ADDM work when the same register is used for both sources and destination, you will need to change the datapath. Part 2 asks you to show the necessary changes to the datapath.

      Additional (blank) state sequence for the ADDM instruction
    2. Add to the Data Path any additional structures and any additional control signals needed to implement ADDM. Label the additional control signals ECS 1 (for “extra control signal 1”), ECS 2, etc.

    3. The processing in each state A,B,C,D is controlled by asserting or negating each control signal. Enter a 1 or a 0 as appropriate for the microinstructions corresponding to states A,B,C,D.

      • Clarification: for ease of grading, only fill in the control values that are non-zero; entries you leave blank will be assumed to be 0 when we grade
      • Clarification: for the encoding of the control signals, see table C.1 of Appendix C. For each control signal, assume that the 1st signal value in the list is encoded as 0, the 2nd value is encoded as 1, etc.

    Four empty LC-3b microinstructions
  3. The Address Control Logic in the LC-3b datapath of Figure C.3 in Appendix C allows the LC-3b to support memory-mapped I/O. There are three inputs to this logic:

    The logic has five outputs:

    Your task is to draw the truth table for this Address Control Logic. Mark don't care values with “X” in your truth table. Use the conventions described above to denote the values of inputs and outputs. Please read Section C.6 in Appendix C on memory-mapped I/O before answering this question. Also, refer to table A.3 of Appendix A to find out the addresses of device registers.

  4. The LC-3b state diagram handed out in class contained errors in states 4, 20, and 21. We have posted both versions of the handout: wrong and corrected. Briefly explain the problem we have corrected.

  5. Answer the following short questions:

    1. A memory's addressability is 64 bits. What does that tell you about the sizes of the MAR and the MDR?

    2. We want to increase the number of registers that we can specify in the LC-3b ADD instruction to 32. Do you see any problem with that? Explain.

  6. Consider the following piece of code:

         for(i = 0; i < 8; ++i){
           for(j = 0; j < 8; ++j){
             sum = sum + A[i][j];
           }
         }

    The figure below shows an 8-way interleaved, byte-addressable memory. The total size of the memory is 4KB. The elements of the 2-dimensional array, A, are 4 bytes in length and are stored in the memory in column-major order (i.e., columns of A are stored in consecutive memory locations) as shown. The width of the bus is 32 bits, and each memory access takes 10 cycles.

    A more detailed picture of the memory chips in Rank 0 of Bank 0 is shown below.
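
    To see which byte addresses the loop touches, the column-major layout described above can be sketched in Python. The base address of A is assumed to be 0 here, since the figure is not reproduced; `element_address` and `access_order` are illustrative names only.

    ```python
    ROWS = COLS = 8
    ELEM_BYTES = 4  # each element of A is 4 bytes


    def element_address(i, j, base=0):
        """Byte address of A[i][j] when columns are stored contiguously (column-major)."""
        return base + ELEM_BYTES * (j * ROWS + i)


    def access_order(base=0):
        """Addresses in the order the nested loop reads them (i outer, j inner)."""
        return [element_address(i, j, base) for i in range(ROWS) for j in range(COLS)]
    ```

    Printing the first few entries of `access_order()` in binary makes it easier to see which address bits change from one access to the next, which is what decides whether consecutive accesses fall in different banks.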

    1. Since the address space of the memory is 4KB, 12 bits are needed to uniquely identify each memory location, i.e., Addr[11:0]. Specify which bits of the address will be used for:

      • Byte on bus
      • Interleave bits
      • Chip address
      • Rank bits
    2. How many cycles are spent accessing memory during the execution of the above code? Compare this with the number of memory access cycles it would take if the memory were not interleaved (i.e., a single 4-byte wide array).

    3. Can any change be made to the current interleaving scheme to optimize the number of cycles spent accessing memory? If yes, which bits of the address will be used to specify the byte on bus, interleaving, etc. (use the same format as in part 1)? With the new interleaving scheme, how many cycles are spent accessing memory? Remember that the elements of A will still be stored in column-major order.

    4. Using the original interleaving scheme, what small changes can be made to the piece of code to optimize the number of cycles spent accessing memory? How many cycles are spent accessing memory using the modified code?

  7. The figure below illustrates the logic and memory to support 512 MB (byte addressable) of physical memory, supporting unaligned accesses. The ISA contains LDByte, LDHalfWord, LDWord, STByte, STHalfWord and STWord instructions, where a word is 32 bits. Bit 28 serves as a chip enable (active high). If this bit is high, the data of the memory is loaded on the bus; otherwise, the output of the memory chip floats (tri-stated).

    Note: the byte rotators in the figure are right rotators.

    Construct the truth table to implement the LOGIC block, having inputs SIZE, R/W, 1st or 2nd access, PHYS_ADDR[1:0] and the outputs shown in the above figure. Assume that the value of SIZE can be Byte (00), HalfWord (01), and Word (10). Clearly explain what function each output serves.

    For stores, you can assume that the data to be stored is already loaded into MDR prior to the 1st access.
  8. If the latency of a DRAM memory bank is 37 cycles, into how many banks would you interleave this memory in order to fully hide this latency when making sequential memory accesses?

  9. A bit-serial transmitter-receiver system operates at 1 GHz. It appends an even parity bit after every eight bits processed, as described in class. Therefore, for each byte of data, it transmits a nine bit message. Since the system is bit-serial, one message bit is transmitted each cycle. Assume that the probability of a bit being flipped while it is being transmitted is 10^-7. In your solution, treat bit flips as statistically independent events.

    1. What is the probability that a transmitted nine bit message will have one or more flipped bits? Hint: what is the probability that a transmitted nine bit message will have zero flipped bits?

    2. If the parity check logic in the receiver detects an error in a message, how many bits may have been flipped in that message? (1, 2, 3, 4, 5, 6, 7, 8, 9 – circle all that apply).

    3. If the parity check logic in the receiver does not detect an error in a message, how many bits may have been flipped in that message? (1, 2, 3, 4, 5, 6, 7, 8, 9 – circle all that apply).

    4. What is the probability that a transmitted nine bit message will have exactly:

      • 1 bit flipped
      • 2 bits flipped
      • 3 bits flipped

      Notice that the probability of exactly three bits being flipped is negligible compared to the probabilities of one or two bits being flipped. Thus, for the rest of this problem, you may neglect the probabilities of three or more bits being flipped in one message.

    5. On average, how many detected bit errors per second will occur in the system?
    6. On average, how many undetected bit errors per second will occur in the system?

    Note: this course is not about probability theory, and the undergraduate probability course (EE351K) is not a prerequisite. Thus, if you have difficulty solving this problem, please see one of the TAs.
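
    For readers who want to experiment, the even-parity scheme and the independent-flip model used in this problem can be sketched in Python. Placing the parity bit in the least significant position of a 9-bit integer is an arbitrary modeling choice, and all function names are hypothetical.

    ```python
    from math import comb


    def even_parity_bit(byte):
        """Parity bit that makes the total number of 1s in the 9-bit message even."""
        return bin(byte & 0xFF).count("1") % 2


    def make_message(byte):
        """Append the parity bit after the eight data bits (9-bit message as an int)."""
        return ((byte & 0xFF) << 1) | even_parity_bit(byte)


    def parity_error_detected(message):
        """The receiver flags an error when the 9-bit message holds an odd number of 1s."""
        return bin(message & 0x1FF).count("1") % 2 == 1


    def prob_exactly(flips, p, nbits=9):
        """Binomial probability of exactly `flips` independent bit flips among `nbits` bits."""
        return comb(nbits, flips) * p**flips * (1 - p) ** (nbits - flips)
    ```

    XOR-ing a message with a mask that has a chosen number of 1 bits simulates that many flips, so `parity_error_detected` can be used to check which flip counts the receiver notices.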

  10. Problem 11 has been postponed to problem set 3.

    An ISA supports an 8-bit, byte-addressable virtual address space. The corresponding physical memory has only 128 bytes. Each page contains 16 bytes. A simple, one-level translation scheme is used and the page table resides in physical memory. The initial contents of the frames of physical memory are shown below.

    Frame Number    Frame Contents
    0               empty
    1               Page 13
    2               Page 5
    3               Page 2
    4               empty
    5               Page 0
    6               empty
    7               Page Table

    A three-entry Translation Lookaside Buffer that uses LRU replacement is added to this system. Initially, this TLB contains the entries for pages 0, 2, and 13. For the following sequence of references, put a circle around those that generate a TLB hit and put a rectangle around those that generate a page fault. What is the hit rate of the TLB for this sequence of references? (Note: LRU policy is used to select pages for replacement in physical memory.)

    References (to pages): 0, 13, 5, 2, 14, 14, 13, 6, 6, 13, 15, 14, 15, 13, 4, 3.

    1. At the end of this sequence, what three entries are contained in the TLB?
    2. What are the contents of the 8 physical frames?
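
    The TLB behavior described in this problem can be explored with a small Python sketch of an LRU-managed table. It models only TLB hits and misses, not page faults or frame replacement, and it assumes that the order of the initial entries gives their recency (first listed is least recently used), which the problem statement does not specify. `LruTlb` is an illustrative name.

    ```python
    from collections import OrderedDict


    class LruTlb:
        """Fixed-capacity TLB that evicts the least recently used page on a miss."""

        def __init__(self, capacity, initial_pages=()):
            self.capacity = capacity
            # Keys are page numbers, ordered from least to most recently used.
            self.entries = OrderedDict((page, True) for page in initial_pages)

        def access(self, page):
            """Return True on a TLB hit, False on a miss (the page is then installed)."""
            if page in self.entries:
                self.entries.move_to_end(page)     # mark as most recently used
                return True
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)   # evict the least recently used entry
            self.entries[page] = True
            return False

        def pages(self):
            """Pages currently in the TLB, least recently used first."""
            return list(self.entries)
    ```

    For example, a two-entry TLB created with `LruTlb(2, [1, 2])` and accessed with pages 1, 3, 2 hits once and ends up holding pages 3 and 2.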
  11. We have been referring to the LC-3b memory as 2^16 bytes of memory, byte-addressable. This is the memory that the user sees, and may bear no relationship to the actual physical memory. Suppose that the actual physical address space is 8K bytes, and our page size is 512 bytes. What is the size of the PFN? Suppose we have a virtual memory system in which virtual memory is divided into User Space and System Space, and System Page Table remains resident in physical memory. System space includes trap vector table, interrupt vector table, operating system and supervisor stack as shown in Figure A.1 in Appendix A. The rest of the address space in Figure A.1 is user space. If each PTE contained, in addition to the PFN, a Valid bit, a modified bit, and two bits of access control, how many bits of physical memory would be required to store the System Page Table?

  12. A little-endian machine with 64KB, byte addressable virtual memory and 4KB physical memory has two-level virtual address translation similar to the VAX. The page size of this machine is 256 bytes. Virtual address space is partitioned into the P0 space, P1 space, system space and reserved space. The space a virtual address belongs to is specified by the most significant two bits of the virtual address, with 00 indicating P0 space, 01 indicating P1 space, and 10 indicating system space. Assume that the PTE is 32 bits and contains only the Valid bit and the PFN in the format V0000000..000PFN.

    For a single load instruction the physical memory was accessed three times, excluding instruction fetch. The first access was at location x108 and the value read from that location (x10B,x10A,x109,x108) was x80000004. Hint: What does this value mean?

    The second access was at location x45C and the third access was at location x942.

    If SBR = x100, P0BR = x8250 and P1BR = x8350,

    1. What is the virtual address corresponding to physical address x45C?
    2. What is the 32-bit value read from location x45C?
    3. What is the virtual address corresponding to physical address x942?
  13. Note: In this problem, the user and system virtual address spaces are not sized equally (the system virtual address space is 1/4 of the total virtual address space, and the user virtual address space makes up the other 3/4). Thus you need to include the address region bits in your calculation of the user space virtual page number. To make it easier for the machine to index into the user space page table, PTBR points to 0x380, which is at an offset of -0x20 from the actual first entry in the user space page table at 0x3A0. To index into the user space page table, add (user space virtual page number * PTE size) to the PTBR. (Why does this work?)

    Consider a processor that supports a 9-bit physical address space with byte addressable memory. We would like the processor to support a virtual memory system. The features of the virtual memory system are:

        Virtual Memory Size : 4 Kbytes (12 bit address-space)
        Page Size           : 32 bytes
        PTBR                : 0x380
        SBR                 : 0x1E0

    The virtual memory is divided into two spaces: system space and user space. System space is the first kilobyte of the virtual address space (i.e., most significant two bits of the virtual address are 00). The rest of the virtual memory is user space. The system page table remains resident in physical memory. Each PTE contains, in addition to the PFN, a Valid bit, a modified bit and 2 bits for access control. The format of the PTE is

    Valid | Modified | Access Control | PFN

    (Valid bit is the most significant bit of the PTE and the PFN is stored in the least significant bits.)

    1. How many virtual pages does the system accommodate?

    2. What is the size of the PFN? How big is the PTE?

    3. How many bytes are required for storing the entire user space pagetable? How many pages does this correspond to?

    4. Since the user space page table can occupy a significant portion of the physical memory, this system uses a two-level address translation scheme, storing the user space page table in virtual memory (similar to VAX).

      Given the virtual address 0x7AC, what is the physical address?

      The following table shows the contents of the physical memory that you may need to do the translation:

      Address    Data
      x1F8       xBA
      x1F9       xBB
      x1FA       xBC
      x1FB       xBD
      x1FC       xBE
      x1FD       xB8
      x1FE       xB7
      x1FF       xB6

      Address    Data
      x118       x81
      x119       x72
      x11A       x65
      x11B       x34
      x11C       x97
      x11D       x83
      x11E       xC6
      x11F       xB2
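
    The indexing rule from the note at the start of this problem (add user space virtual page number * PTE size to the PTBR) can be sketched in Python. The page size and PTBR come from the problem statement; the PTE size is left as a parameter, and the helper names are hypothetical.

    ```python
    PAGE_OFFSET_BITS = 5   # 32-byte pages
    PTBR = 0x380           # biased user page table base, from the problem statement


    def split_virtual_address(va):
        """Return (virtual page number, byte offset) for a 12-bit virtual address."""
        return va >> PAGE_OFFSET_BITS, va & ((1 << PAGE_OFFSET_BITS) - 1)


    def user_pte_address(va, pte_bytes):
        """Address of the PTE for a user space address, indexing from the biased PTBR.

        The virtual page number keeps the address region bits, as the note requires.
        """
        vpn, _ = split_virtual_address(va)
        return PTBR + vpn * pte_bytes
    ```

    As a sanity check, the first user space address is 0x400, whose virtual page number is 0x20. With a PTE size of one byte (an assumption consistent with the -0x20 offset stated in the note), `user_pte_address(0x400, 1)` gives 0x3A0, the first entry of the user space page table.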
  14. The virtual address of variable X is x3456789A. Find the physical address of X. Assume a Virtual Memory model similar to VAX.

    Remember that in VAX each Virtual Address consists of:

    You will need to know the contents of P0BR: x8AC40000 and SBR: x000C8000.

    You will also need to know the contents of the following physical memory locations:

    x1EBA6EF0:    x80000A72
    x0022D958:    x800F5D37
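
    The two-level walk for a P0-space address can be outlined in Python. This sketch assumes the standard VAX layout (a 2-bit region select in bits [31:30], a 21-bit virtual page number in bits [29:9], a 9-bit byte offset, and 4-byte PTEs with the PFN in the low 21 bits). It ignores the Valid bit and length checks. The function names and the `read_phys` callback are hypothetical.

    ```python
    OFFSET_BITS = 9            # 512-byte pages
    VPN_MASK = (1 << 21) - 1   # 21-bit virtual page number
    PFN_MASK = (1 << 21) - 1   # PFN assumed to sit in the low 21 bits of a PTE
    PTE_BYTES = 4


    def fields(va):
        """Return (region, vpn, offset) for a 32-bit virtual address."""
        return va >> 30, (va >> OFFSET_BITS) & VPN_MASK, va & ((1 << OFFSET_BITS) - 1)


    def frame_address(pte, offset):
        """Combine the PFN held in a PTE with a byte offset."""
        return ((pte & PFN_MASK) << OFFSET_BITS) | offset


    def translate_p0(va, p0br, sbr, read_phys):
        """Two-level walk; read_phys(addr) returns the 32-bit value at a physical address."""
        _, vpn, offset = fields(va)
        pte_va = p0br + PTE_BYTES * vpn                   # process PTE, in system virtual space
        _, sys_vpn, pte_offset = fields(pte_va)
        sys_pte = read_phys(sbr + PTE_BYTES * sys_vpn)    # system page table is physical
        process_pte = read_phys(frame_address(sys_pte, pte_offset))
        return frame_address(process_pte, offset)
    ```

    Passing a dict lookup built from the two memory locations listed above as `read_phys` is enough to drive the walk, and printing each intermediate address shows which given location is read at each step.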

    Some intermediate questions to help you:

  15. An instruction is said to generate a page fault if a page fault occurs at any time during the processing of that instruction.

    Let's say we added a virtual memory system to the LC-3b. Which instructions can possibly generate a page fault? What is the maximum number of page faults an instruction can possibly generate while it is being processed? Which instructions can possibly generate that maximum number of page faults? Assume that the virtual memory system added uses a one-level translation scheme and the page table is always resident in physical memory.