## Department of Electrical and Computer Engineering The University of Texas at Austin EE 460N Spring 2017 Y. N. Patt, Instructor Chirag Sakhuja, Sarbartha Banerjee, Jonathan Dahm, Arjun Teh, TAs Final Exam May 12, 2017 | Name: Solution | |-----------------------------------------------------------------------------------------------------------------------------------------------| | | | Problem 1 (20 points): | | Problem 2 (10 points): | | Problem 3 (10 points): | | Problem 4 (20 points): | | Problem 5 (20 points): | | Problem 6 (25 points): | | Problem 7 (25 points): | | Total (130 points): | | | | | | Note: Please be sure that your answers to all questions (and all supporting work that is required) are contained in the space provided. | | Note: Please be sure your name is recorded on each sheet of the exam. | | | | Please read the following sentence, and if you agree, sign where requested: I have not given nor received any unauthorized help on this exam. | | | | Signature: | | | GOOD LUCK! | | : Answer the following questions | | lue stored in P1 divide it by 4 o | |-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------| | | Use as many instructions as you no | | lue stored in R1, divide it by 4, a | | | RSHFA RZ, | P1,#2 | | | | | | | | | | | | | Part b (5 points): A | physical cache read access require | s a TLB access, a Tag S | tore access, and a Data store acces | | In general we decrease | latency substantially by doing th | e Tag Store access at the | e same time as we do the Data Sto<br>associative cache. What is the maj | | | | | | | After the d | ata store access, H | ne 4 -way cad | he needs a mux | | After the d<br>to select the | who store access, He direct | ne 4 way cad | he needs a mux<br>e duesnt. | | After the d<br>to select the | euto store access, the direct | ne 4 way cad<br>mapped cach | he needs a mux<br>e duesnt. | | After the d<br>to select the | lata store access, the direct | ne 4 way cad<br>mapped cadh | he needs amux<br>eduesat. | | Part c (5 points): Se | veral device controllers are conne | cted to the asynchronou | us bus discussed in class. What tw | | Part c (5 points): Se things must be true fo | veral device controllers are conne<br>r a device controller to assert SA | cted to the asynchronou | he needs a mux eduesht. us bus discussed in class. What to rue for a device controller to subs | | Part c (5 points): Se things must be true fo quently negate SACK | veral device controllers are conne<br>r a device controller to assert SA( | cted to the asynchronou | ns bus discussed in class. What to rue for a device controller to subs | | Part c (5 points): Se | veral device controllers are conne<br>r a device controller to assert SA( | cted to the asynchronou<br>CK? What must all be t | is bus discussed in class. What to | | Part c (5 points): Se things must be true fo quently negate SACK | veral device controllers are conne<br>r a device controller to assert SA( | cted to the asynchronou<br>CK? What must all be t | is bus discussed in class. What to | | Part c (5 points): Se things must be true fo quently negate SACK. To assert SACK Part d (5 points): As | veral device controllers are conner a device controller to assert SAG | cted to the asynchronou<br>CK? What must all be to<br>To negate SACK | is bus discussed in class. What to | | Part c (5 points): Se things must be true fo quently negate SACK? To assert SACK Part d (5 points): As arithmetic? Why or when the same same same same same same same sam | veral device controllers are conner a device controller to assert SAG | cted to the asynchronou<br>CK? What must all be to<br>To negate SACK<br>in floating point arithmer. | BY35Y metic. Is it a problem in fixed po | 2 Name:\_\_\_\_\_ Problem 2 (10 points): Shown below are the addresses and contents of five memory locations. | Addr | Contents | |------|----------| | x00 | 00000000 | | x01 | 00000001 | | x02 | 00000010 | | x04 | 00000011 | | x1E | 00000100 | Memory address bits are broken down as follows: Part a (5 points): Your first job, identify the five locations specified above by putting their contents in the proper locations of the memory structure: Part b (5 points): How many clock cycles would it take to read the contents of the five memory locations specified above in sequential order, if we are restricted to access the memory one byte at a time? Assume it takes 3 cycles to open a row, 1 cycle to access an open row, and 2 cycles to close a row. $$(3+1)+(1)+(3+1)+(1)+(2+3+1)$$ 16 cycles **Problem 3 (10 points):** Three processors P1, P2, and P3; each has its own cache C1, C2, and C3. The caches are connected to the memory via a bus. Cache coherency is maintained by a Goodman Snoopy Cache protocol. Initially, cache lines A, B, and C are not contained in any of the three caches. P1, P2, and P3 access memory data from lines A, B, and C in the following order: | P1 | read | Α | |----|-------|---| | P2 | write | В | | P1 | write | Α | | P3 | read | В | | P2 | write | В | | P1 | write | С | | P3 | read | С | | P1 | write | Α | What is the state (Invalid, Valid, etc.) of each cache line in each cache after the 8 accesses have completed? | | CI | |---|---------| | A | Pirty | | В | Invalld | | С | Valid | | | C2 | |---|----------| | A | Invalid | | В | Reserved | | C | Invalid | | | C3 | |---|---------| | A | Invalid | | В | Invalid | | С | Valid | **Problem 4 (20 points):** Consider the LC-3b, augmented with a multiply instruction. In this problem, Both ADD and MUL are restricted to the single format OPCODE Ra,Rb,Rc. i.e., no immediates. ADD instructions take 1 cycle of execution, MUL instructions take 8 cycles of execution. The data path contains exactly one pipelined multiplier. The dataflow graph of a seven-instruction fragment of a program that is in the process of execution is shown below: Prior to fetching the first instruction, the Register file is as shown: | R0 | 7 | |----|----| | R1 | 2 | | R2 | 6 | | R3 | 6 | | R4 | -3 | | R5 | -4 | | R6 | -5 | | R7 | 1 | At the time that the seventh of the seven-instruction fragment is stored in a reservation station, the Reorder Buffer is as follows: | V | R | Ret | DR | Value | | |---|---|-----|----|-------|---| | 1 | 1 | 1 | R2 | 4 | | | 1 | 1 | 0 | R2 | 6 | _ | | 1 | 0 | 0 | R2 | _ | | | 1 | 0 | 0 | R2 | _ | | | 1 | 1 | 0 | R0 | 4 | | | 1 | 1 | 0 | R1 | -2 | | | 1 | 0 | 0 | R0 | - | | | 1 | 1 | 0 | R1 | -1 | | | 1 | 0 | 0 | R6 | _ | - | if the entry is in use i.e. V=0 in Each entry contains 3 bits to indicate its state. The Valid (V) bit indicates if the entry is in use, i.e. V=0 indicates an empty slot in the reorder buffer. The Ready (R) bit indicates that the entry corresponds to an instruction that has successfully completed execution. The Retirement bit (Ret) indicates that the executed instruction has retired. ## PROBLEM CONTINUES ON NEXT PAGE | Name: | | | |--------|--|--| | rvame. | | | Part a (5 points): What is the value of R2 when the Reorder Buffer is as shown on the previous page? 4 Part b (15 points): Completely specify below the seven instructions in the seven-instruction fragment. | MUL | R2 . | RI, RZ | |-----|------|--------| | | | R2, P6 | | ADD | po | Ro, R4 | | APO | pI, | RI, RE | | MUL | 126 | RZ, RO | | ADD | PI, | RL, RT | | MUL | R6, | RO, RI | Since the first instruction in the ROB that's in the fragment is not done, but future instructions are, the first instruction must be a multiply. The instructions that are done can be confirmed with their values. Name: Problem 5 (20 points): We wish to add a new instruction to the LC-3b, called ARRAYCMP, which compares two equal size, word arrays. If they are identical, the Z bit will be set to 1. If not, the Z bit will be set to 0. We will use unused opcode 1010. The instruction format for ARRAYCMP is: | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | |----|----|----|----|----|----|---|---|----|---|---|---|----|---|----|---| | | 10 | 10 | | | DR | | | SR | i | 0 | ( | 00 | | SR | 2 | SR1 and SR2 contain the starting addresses of the two arrays. DR contains the size of the arrays in words. Note the side effects: SR1, SR2, and DR are clobbered by ARRAYCMP. We implement ARRAYCMP by comparing the two arrays one word at a time. After each compare, we decrement DR. If the two arrays are identical, DR will equal 0, and the Z bit will be set. If a mismatch occurs, ARRAYCMP will stop execution with Z=0. A AND assure THES: 63 and TRE2:01 added to DRMUX The datapath is modified to implement ARRAYCMP, shown below in bold. We have also added a third input to SRIMUX to select IR[2:0]. Your iob: Complete the state machine and microscopic SR1MUX to select IR[2:0]. Your job: Complete the state machine and microsequencer and fill in the control store signals on the following two pages. | Name: | | | | |-------|--|--|--| | | | | | Part a (10 points): Complete the state machine to implement ARRAYCMP given the datapath modifications on the previous page. PROBLEM CONTINUES ON NEXT PAGE | Name: | | | | |-------|--|--|--| | | | | | | | | | | Part b (5 points): Add logic to the microsequencer to support the state machine from Part a. Put all your modifications in the bold box. You may use two additional control store signals called US1 and US2. You may also add additional inputs to the bold box. Part c (5 points): We have added several control signals for controlling the data path, as shown below: | Signal name | Signal values | | |-------------|---------------|-----------------------------| | LD.A/1: | NO, LOAD | | | LD.B/1: | NO, LOAD | | | AMUX/2: | SR2MUX | ;select output of SR2MUX | | | 2 | ;select the value 2 | | | -1 | ;select the value -1 | | | A | ;select the value in A | | BMUX/1: | SR1OUT | ;select the value in SR1OUT | | | В | ;select the value in B | Fill in the control signals needed to implement states A,B,C shown in bold in the state machine. | State | ALUK | LD.CC | LD.A | LD.B | AMUX | BMUX | GateALU | |-------|------|-------|------|------|------|----------|---------| | A | 10 | l | 0 | D | (1 | | | | В | 66 | 0 | 0 | J | 01 | 0 | 1 | | C | m 0 | | 0 | 0 | 10 | <b>O</b> | 1 | | Name: | | | | |-------|--|--|--| | rame. | | | | **Problem 6 (25 points):** Consider the LC-3b augmented with a direct-mapped, write-back cache that contains instructions and data. Line size is 8 bytes. The cache is initially empty. In the execution of a program, a number of accesses between memory and the cache occurred. The table below shows the first several cache line transfers between the cache and memory during execution of the program. Hint: Recall, in a write back cache, transfers go from memory to cache and from cache to memory. Hint: Recall, the LC-3b is little-endian. | Address | Cache Line | |---------|--------------------| | x3000 | xE00030006000E001 | | x3008 | xF0250BFDD0016200 | | x6010 | x00000000000000000 | | xC020 | x00000000000000000 | | x8040 | x00000000000000000 | | x0080 | x00000000000000000 | | x3000 | xE00030006000E000 | | x0100 | x00000000000000000 | | | | Part a (5 points): Shown below is part of the program that was executed. Your first job: Complete the table with the remaining instructions of the program. ...in assembly language, of course! | Label | Assembly | |-------|----------------| | | .ORIG x3000 | | | LEA R0, A | | | LDW MO, RD, HT | | A | STB RT RD #0 | | | LEA DØ, B | | 8 | LDW 41, 20, 40 | | | LSHE PP. PP. H | | | BRnp B | | | HALT | Part b (5 points): Why is there an access to x3000 after the access to x0080? Please answer in 20 words or fewer. When accessing line x100, it must first evict dirty line x3000 due to set conflict PROBLEM CONTINUES ON NEXT PAGE A Also valid to say eviction is due to | Name: | | | | |-------|--|--|--| | | | | | Part c (15 points): Finally, complete the specification of the cache, i.e., how many bits of index, how many bits of tag, how big is the data store? Please show your work. Size of data store: 256 Number of tag bits: Number of index bits: must be here because x 2000 did not cause a conflict, but ro 100 did # of sets hid (25)(8) = 28 = 256B inesize | Name: | | | |-------|--|--| | | | | **Problem 7 (25 points):** Consider a byte-addressable memory system with two levels of virtual to physical translation (like the VAX). Virtual Address Space:64KBPhysical Memory Size:4KBUser Space:0x0000-0x7FFFPage Size:128BSystem Space:0x8000-0xFFFFPTE Size:2 bytes A PTE is shown below: A PTE includes a Valid bit, a Modify bit, and a 4 bit PROT field. The low bits (the exact number is for you to determine) are used for the PFN. In user mode, the processor can read/write to all user space pages and does not have any access to system pages. The system pages can only be read/written in supervisor mode. To implement this protection, the encoding of the PROT bits is 1111 for user page PTEs and 1100 for system page PTEs. Each process, as you know, has its own user space page table. Process page tables are stored in contiguous system virtual memory as follows: Process1 user space page table is stored immediately after Process0 user space page table, and Process2 user space page table is stored immediately after Process1 user space page table. All user space page tables start at the beginning of a page. The system space page table starts at the beginning of a frame. Process0 user space Length Register (LR): 128 Process1 user space LR: 64 Process 2 user space LR: 64 The processor executes the following code: Note: context switches are required if the processor moves from executing code from one process to executing code from another process. ## PROBLEM CONTINUES ON NEXT PAGE The microarchitecture includes an 8-entry TLB. Its state, before the user code starts execution, is shown below. Note that the Process ID is included so the TLB is not flushed on a context switch. The TLB only contains PTEs for user space. | Process | Page Number | Frame Number | |---------|-------------|--------------| | - | | | | 1 | x000 | x04 | | - | | | | 2 | x003 | x05 | | 1 | x013 | x1B | | 2 | x042 | x08 | | - | | | | - | | | Part a (12 points): Fill in the following values: Part b (13 points): The following table shows successive entries for successive memory accesses due to execution of page tables the user code shown above. Memory operations due to the OS during context switches are not included. Are so contiguous Your job: Complete the table. Some entries may remain blank. | Process ID | VA | PA | Data | TLB Hit? | |------------|-----------|-------|--------|----------| | 0 | _ | xF80 | xB01B | | | 0 | xA004 | x084 | xBC02 | × | | 0 | x0104 | x lo4 | ×1042 | × | | 0 | x A 0 0 6 | x lob | xD651 | <b>✓</b> | | l | X D612 | x212 | ×1283 | / | | 2 | x01FE | ×2FE | x12A0 | <b>✓</b> | | 2 | | xF86 | x BOID | _ | | 2 | 881AX | xE88 | ×BW1 | × | | 2 | X 0 2.00 | x080 | x5483 | × | Figure 1: LC-3b Instruction Encodings Table 1: Data path control signals | Signal Name | Signal Values | | |------------------------|-------------------|-------------------------------------| | LD.MAR/1: | NO(0), LOAD( | 1) | | LD.MAR/1:<br>LD.MDR/1: | NO(0), LOAD( | | | LD.IR/1: | NO(0), LOAD( | | | LD.BEN/1: | NO(0), LOAD( | | | LD.REG/1: | NO(0), LOAD( | | | LD.CC/1: | NO(0), LOAD( | | | LD.PC/1: | NO(0), LOAD( | 1) | | GatePC/1: | NO(0), YES(1) | | | GateMDR/1: | NO(0), YES(1) | | | GateALU/1: | NO(0), YES(1) | | | GateMARMUX/1: | NO(0), YES(1) | | | GateSHF/1: | NO(0), YES(1) | | | PCMUX/2: | PC + 2(0) | ;select pc+2 | | | BUS(1) | ;select value from bus | | | ADDER(2) | select output of address adder | | DRMUX/1: | 11.9(0) | :destination IR[11:9] | | DIG.TOID !! | R7(1) | destination R7 | | SR1MUX/1: | 11.0(0) | | | SKIMUA/I: | 11.9(0)<br>8.6(1) | ;source IR[11:9]<br>;source IR[8:6] | | | 0.0(1) | ,source IK[8.0] | | ADDR1MUX/1: | PC(0), BaseR(1 | ) | | ADDR2MUX/2: | ZERO(0) | ;select the value zero | | | offset6(1) | ;select SEXT[IR[5:0]] | | | PCoffset9(2) | ;select SEXT[IR[8:0]] | | | PCoffset11(3) | ;select SEXT[IR[10:0]] | | MARMUX/1: | 7.0(0) | :select LSHF(ZEXT[IR[7:0]],1) | | | ADDER(1) | ;select output of address adder | | ALUK/2: | ADD(0), AND( | (1), XOR(2), PASSA(3) | | MIO.EN/1: | NO(0), YES(1) | | | R.W/1: | RD(0), WR(1) | | | DATA.SIZE/1: | BYTE(0), WOF | RD(1) | | LSHF1/1: | NO(0), YES(1) | | Table 2: Microsequencer control signals | Signal Name | Signal Values | | |-----------------|----------------------------------------------------------------------------------------------------------------------------------------|--| | J/6:<br>COND/2: | COND <sub>0</sub> ;Unconditional<br>COND <sub>1</sub> ;Memory Ready<br>COND <sub>2</sub> ;Branch<br>COND <sub>3</sub> ;Addressing Mode | | | IRD/1: | NO, YES | | Figure 2: A state machine for the LC-3b Figure 3: The LC-3b data path 17 Figure 4: The microsequencer of the LC-3b base machine