## Computer Architecture: Fundamentals, Tradeoffs, Challenges

## Chapter 3: Microarchitecture Basics

# Yale Patt The University of Texas at Austin

Austin, Texas Spring, 2023

#### **Outline**

- The job of the Microarchitecture
  - Changes the state of the machine, based on the instruction
- Wilkes' original diode matrix
- An example: The LC-3b
  - State machine, Data Path, Microsequencer, Control Signals
- Tradeoffs
- Pipelining

### Job of the Microarchitecture

The machine is in a state

Instruction

۱ \/

The machine is in a new state

## Wilkes original diode matrices















| Signal Name   | Signal Values                              |                                                                                                    |
|---------------|--------------------------------------------|----------------------------------------------------------------------------------------------------|
| LD.MAR/1:     | NO, LOAD                                   |                                                                                                    |
| LD.MDR/1:     | NO, LOAD                                   |                                                                                                    |
| LD.IR/1:      | NO, LOAD                                   |                                                                                                    |
| LD.BEN/1:     | NO, LOAD                                   |                                                                                                    |
| LD.REG/1:     | NO, LOAD                                   |                                                                                                    |
| LD.CC/1:      | NO, LOAD                                   |                                                                                                    |
| LD.PC/1:      | NO, LOAD                                   |                                                                                                    |
| GatePC/1:     | NO, YES                                    |                                                                                                    |
| GateMDR/1:    | NO, YES                                    |                                                                                                    |
| GateALU/1:    | NO, YES                                    |                                                                                                    |
| GateMARMUX/1: | NO, YES                                    |                                                                                                    |
| GateSHF/1:    | NO, YES                                    |                                                                                                    |
| PCMUX/2:      | PC+2<br>BUS<br>ADDER                       | ;select pc+2<br>;select value from bus<br>;select output of address adder                          |
| DRMUX/1:      | 11.9<br>R7                                 | ;destination IR[11:9]<br>;destination R7                                                           |
| SR1MUX/1:     | 11.9<br>8.6                                | ;source IR[11:9]<br>;source IR[8:6]                                                                |
| ADDR1MUX/1:   | PC, BaseR                                  |                                                                                                    |
| ADDR2MUX/2:   | ZERO<br>offset6<br>PCoffset9<br>PCoffset11 | ;select the value zero<br>;select SEXT[IR[5:0]]<br>;select SEXT[IR[8:0]]<br>;select SEXT[IR[10:0]] |
| MARMUX/1:     | 7.0<br>ADDER                               | ;select LSHF(ZEXT[IR[7:0]],1)<br>;select output of address adder                                   |
| ALUK/2:       | ADD, AND, XOR, PASSA                       |                                                                                                    |
| MIO.EN/1:     | NO, YES                                    |                                                                                                    |
| R.W/1:        | RD, WR                                     |                                                                                                    |
| DATA.SIZE/1:  | BYTE, WORD                                 |                                                                                                    |
| LSHF1/1:      | NO, YES                                    | -                                                                                                  |
|               | -,                                         |                                                                                                    |

Table C.1: Data path control signals

| Signal Name     | Signal Values                                                                    |                                                                |
|-----------------|----------------------------------------------------------------------------------|----------------------------------------------------------------|
| J/6:<br>COND/2: | COND <sub>0</sub><br>COND <sub>1</sub><br>COND <sub>2</sub><br>COND <sub>3</sub> | ;Unconditional<br>;Memory Ready<br>;Branch<br>;Addressing Mode |
| IRD/1:          | NO, YES                                                                          |                                                                |

Table C.2: Microsequencer control signals

#### **Tradeoffs**

- Vertical vs Horizontal Microcode
  - Selective (generally one micro-op per micro-op)
  - Constructive (many fields, each specified
- One level vs Two level Microcode
  - Example: the Nanodata QM-1 (360 bit nanoinstructions)
- CPI vs cycle time (or, IPC vs frequency)
- In-order vs out-of-order execution
- Issue width
- Use of chip real estate
  - Better branch predictor, accelerators, microcode
- Unified vs separate instruction and data caches
- On-chip interconnect (cost, contention, latency)
- Prefetching

## A Simple Pipeline

- Four stage pipeline (each stage takes one cycle)
  - Fetch (instruction fetched from on chip storage)
  - Decode (instruction decoded)
  - Address Generation & Execute (result latched)
  - Memory access (loads and stores to on chip storage)

```
Cycle1 Cycle2 Cycle3 Cycle4 Cycle5 Cycle6 Cycle7

Fetch1 Decode AG/EX Mem

Fetch2 Decode AG/EX Mem

Fetch3 Decode AG/EX Mem

Fetch4 Decode AG/EX Mem
```

## Pipeline Problems

#### Flow dependency

- Instruction 1: ADD R1,R2,R3 (R1 loaded at end of cycle 3
- Instruction 2: ADD R4,R1,R5 (R4 loaded at end of cycle 4)
- Instruction 3: LD R1,A (R1 loaded at end of cycle 6)
- Instruction 4: ADD R1,R1,R4 (R1 read at start of cycle 6)
  Incorrect data read from R1 !!!

#### Conditional branches

- Instruction 1: ADD R1,R2,R3 (cc loaded at end of cycle 3)
- Instruction 2: BRz A
- Instruction 3: Fetch address needed at start of cycle 3)
   Microarchitecturre does not know the fetch address !!!

