- VLIW vs ...
  - VLIW: compiler does it
  - Superscalar: part of the microarchitecture
- 0,1,2,3 address machine (how many EXPLICIT)
  - LC-3b: 3 address
  - x86: 2 address
  - VAX: both 2 and 3
  - Old days: one address (registers were expensive)
  - Stack machine: 0 address
- Precise exceptions vs ...
  - Precise exceptions: today, everyone
  - IBM 360/91: NO (by special permission)
- Privilege modes
  - Most ISAs have two: supervisor and user
  - VAX had four
- Almost everyone

- Help for the programmer vs help for the uarchitect
  - Who gets the cushy job?
  - Unaligned accesses
  - Data types
  - Addressing modes
- Unaligned access
  - LC-3b does not allow unaligned access
  - DEC: PDP-11 (no), VAX (yes), Alpha (no), all from the same company
- Data types (rich or lean)
  - Integers, floats of various sizes
  - Doubly-linked list, character string
- Addressing modes (rich or lean)
  - Indirect addressing
  - Autoincrement, postdecrement
  - SIB byte in x86

- Compile time vs run time
  - MIPS initially had NO hardware interlocks
- Most have fixed length, uniform decode
- x86 has variable length, with prefixes
- i432 had different bit size opcode
- Word length (what does word length specify?)
  - VAX: 32 bits
  - x86: initially 16 bits, then 32 bits, today 64 bits
  - CRAY 1: 64 bits
  - DEC System 20: 36 bits (LISP car, cdr for AI processing)

- Memory address space (keeps growing!)
- Memory addressability
  - Most memories: byte addressable (data processing)
  - Scientific machines: 64 bits (size of normal fl.pt. operands)
  - Burroughs 1700: one bit (virtual machines)
- Page size (4KB vs more than one)
  - Wasted space
  - Longer access time
  - Too many PTEs
- I/O architecture
  - Most today use memory mapped I/O
  - Old days, special I/O instructions
  - x86 still has both

- Many machines have 32 32-bit registers
- x86 now has 512-bit registers
- Itanium has one-bit predicate registers
- Condition codes vs using a general purpose register
  - MIPS, CDC6600
- Rich instruction set vs lean instruction set
  - Hewlett Packard's RISC: HPPA has 140 instructions
  - Orthogonal to RISC vs CISC
- Register windows vs set of GPRs
- Load/Store vs Operate in the same instruction
  - LC-3b, MIPS are load/store
  - x86 is not load/store
  - Load/store had a window of value. Out of order changed that.

## Tradeoffs (with examples)

- Dynamic static interface (the semantic gap)
  - EDITPC, INDEX, AOBLEQ, LDCTX, CALL, FF,
  - INSQUE/REMQUE, Triads, CHMD, PROBE
  - i432, IBM System 38, Data General Fountainhead
- X86: CMOV
- ARM: inst[31:28]
- THUMB: IT block
- ARM: T bit in the status register
- VAX: Compatibility mode bit in the PSL

## SIMD: Vector Processors, Array Processors

Important to note that SIMD can be either Vector Processors or Array Processors.

## Vector processing example (continued)

#### Vector Processor Timing

- Vector code (no vector chaining): 285 clock cycles
- Vector code (with chaining): 182 clock cycles
- Vector code (with 2 load, 1 store port to memory):

## Vector processing example

The scalar code:

for $i = 1, 50$

$A(i) = (B(i)+C(i))/2$ ; Vectorizable!
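The cycle counts quoted for this example can be reproduced with a small back-of-the-envelope model. The sketch below is a reconstruction under stated assumptions, not the slide's own listing. It assumes the vector code is a seven-instruction sequence (set length register, set stride register, two vector loads, add, shift, store). It assumes setting the length or stride register costs 1 cycle, that a vector instruction over n elements takes latency + (n - 1) cycles, and that memory has a single port. The latencies are the baseline ones from this example (load/store 11, add 4, shift 1, iteration control 2). All function and constant names are hypothetical.

```python
# Back-of-the-envelope timing model for A(i) = (B(i) + C(i)) / 2, i = 1..50.
# Assumptions (not taken from the slides): the 7-instruction vector sequence,
# the 1-cycle cost of setting the length/stride registers, and one memory port.

N = 50                       # vector length (loop trip count)
LOAD_STORE = 11              # cycles, from the scalar baseline
ADD = 4
SHIFT = 1                    # divide by 2 is done as a right shift
ITER_CTL = 2
SET_REG = 1                  # assumed cost of loading VLEN or VSTRIDE


def scalar_cycles(n=N):
    """n iterations of (load, load, add, shift, store, iteration control)."""
    per_iteration = 3 * LOAD_STORE + ADD + SHIFT + ITER_CTL
    return n * per_iteration


def vector_op(latency, n=N):
    """First element completes after `latency`, then one element per cycle."""
    return latency + (n - 1)


def vector_no_chaining(n=N):
    """Each vector instruction waits for the previous one to finish."""
    return (2 * SET_REG
            + 2 * vector_op(LOAD_STORE, n)   # load B, load C
            + vector_op(ADD, n)
            + vector_op(SHIFT, n)
            + vector_op(LOAD_STORE, n))      # store A


def vector_with_chaining(n=N):
    """Add and shift consume elements as they arrive; memory ops serialize."""
    t = 2 * SET_REG
    t += vector_op(LOAD_STORE, n)            # load B holds the single port
    second_load_start = t
    t += vector_op(LOAD_STORE, n)            # load C; port is free at time t
    first_result_ready = second_load_start + LOAD_STORE + ADD + SHIFT
    store_start = max(t, first_result_ready) # store waits for port and data
    return store_start + vector_op(LOAD_STORE, n)


print(scalar_cycles())         # 2000
print(vector_no_chaining())    # 285
print(vector_with_chaining())  # 182
```

Under these assumptions the model matches the 2000, 285, and 182 cycle figures. With chaining, the add and shift are hidden behind the second load, so the single memory port is what limits the total.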
The vector code: [listing not recoverable from the slide]

## Vector processing example (continued)

#### Baseline: with a Scalar Processor

- Loads/Stores take 11 cycles
- Add takes 4 cycles
- Shift takes 1 cycle
- Iteration Control takes 2 cycles

#### 50 iterations of (LD, LD, Add, Shift, Store, Iteration Ctl)

- 50 x (Load, Load, Add, Shift, Store, Iteration\_Ctl)
- $50 \times (11 + 11 + 4 + 1 + 11 + 2) = 50 \times 40 = 2000$ clock cycles

## Vector Architecture

- Vector Registers
  - Each register has multiple components
- Vector Instructions
  - Loads/Stores
    - Loads/Stores access multiple memory locations in one instruction
    - Length register defines the number of components
    - Stride register defines distance between successive memory locations
  - Operates
    - Operates operate component by component
    - For example, C = A+B means Ci = Ai + Bi for all i
    - Instruction is ADD V3, V1, V2

Fig. 5. Block diagram of registers (Russell, Comm. ACM)

## RISCV Characteristics (continued)

#### RV32I

- 47 distinct opcodes
  - loads, stores, shifts, arith, logic, compare, branch, jump, synch, count
- 32 GPRs (x0 to x31, x0=0, x1 used for call return linkage)
  - Also contains a PC
- 32 bit instructions
  - Can be extended by a multiple of n bits
  - Mixture allows for unaligned access
  - Also allows for 16 bit instructions, but then restricted to 8 registers
- 4 basic instruction formats
- Little endian
- Load/Store
- No predication
- Conditional branches use GPRs, not condition codes

## RISCV (characteristics)

#### The subsets

- Integer: 32-bit (RV32I), 64-bit (RV64I), 128-bit (RV128I)
- Float extension: 32-bit (RV32F), 64-bit (RV64D), 128-bit (RV128Q)
- M extension: Integer MUL/DIV
- A extension: Atomic instructions
- L extension: Decimal float
- C extension: Compressed
- B extension: Bit manipulation
- J extension: Dynamically translated
- T extension: Transactional memory
- P extension: Packed SIMD
- V extension: Vector operations
- E extension: Embedded Controller (RV32E)
- G extension: A system, really (IMAFD)

## RISCV

- The 5<sup>th</sup> chip from Professor David Patterson's group
  - UC Berkeley
  - Nothing (really) in common with their other four RISC chips
  - Mostly handled by Professor Krste Asanović
- Major selling point: Open Source
- Overall structure
  - Multiple subset ISAs (Integer, Float, MUL/DIV, Atomic, etc.)
  - Designers build their own system, picking and choosing
  - MUST contain one of the Integer subsets (32-bit or 64-bit)
  - The rest (extensions) are up to the designer

## The x86

Variable length instruction (one byte to 15 bytes)

Instruction layout: Prefix 1 to Prefix 4 (up to 4 bytes), Opcode, Opcode 2, ModR/M, SIB, Address, Immed

#### Characteristics

- Rich set of addressing modes
- Two-address machine
- SSE extension -> Now AVX
- Not load/store
- Three page sizes (4KB, 2MB, 1GB) => Three TLBs
- Register sizes: 8b, 16b, 32b, 64b, 128b, ...
  - Example: AH, AX, EAX, ...
- Memory: Byte addressable, 64 bit address space

## Characteristics (the LC-3b), continued

- Vector architecture (instructions, operands): no
- Virtual memory specification: not yet!
  - Address space
  - Translation mechanism
  - Protection
  - Page size
- System architecture
  - State to deal with: trap vector table, interrupt vector table
  - Interrupt, exception handling
  - Instructions for the O/S to use (RTI, CHMD)
- NOT the instruction cycle (that is part of the uarch)

### The LC-3b Instruction Set

Fields are listed from bit 11 down to bit 0. Three-bit register fields sit in [11:9], [8:6], and [2:0].

| Instruction | Opcode [15:12] | Fields [11:0], high to low |
|---|---|---|
| ADD+ | 0001 | DR, SR1, 0, 00, SR2 |
| ADD+ | 0001 | DR, SR1, 1, imm5 |
| AND+ | 0101 | DR, SR1, 0, 00, SR2 |
| AND+ | 0101 | DR, SR1, 1, imm5 |
| BR | 0000 | n, z, p, PCoffset9 |
| JMP | 1100 | 000, BaseR, 000000 |
| JSR | 0100 | 1, PCoffset11 |
| JSRR | 0100 | 0, 00, BaseR, 000000 |
| LDB+ | 0010 | DR, BaseR, boffset6 |
| LDW+ | 0110 | DR, BaseR, offset6 |
| LEA+ | 1110 | DR, PCoffset9 |
| NOT+ | 1001 | DR, SR, 1, 11111 |
| RET | 1100 | 000, 111, 000000 |
| RTI | 1000 | 000000000000 |
| LSHF+ | 1101 | DR, SR, 0, 0, amount4 |
| RSHFL+ | 1101 | DR, SR, 0, 1, amount4 |
| RSHFA+ | 1101 | DR, SR, 1, 1, amount4 |
| STB | 0011 | SR, BaseR, boffset6 |
| STW | 0111 | SR, BaseR, offset6 |
| TRAP | 1111 | 0000, trapvect8 |
| XOR+ | 1001 | DR, SR1, 0, 00, SR2 |
| XOR+ | 1001 | DR, SR, 1, imm5 |
| not used | 1010 | |
| not used | 1011 | |

## Characteristics (The LC-3b)

- Processor State (memory, registers)
  - Memory addressability: byte
  - Memory address space: 2^16
  - Registers: 8 GPR, Condition Codes N, Z, P
- Word length: 16 bits
- Privilege: 2 levels, supervisor, user
- Priority: 8 levels
- Instruction format: fixed length, 16 bits
- Endian-ness: little endian
- Instructions (opcode, addressing mode, data type)
  - Opcode (14 opcodes, including XOR, SHF, LDB)
  - Addressing modes (PC-relative, Register + offset)
  - Data types (2's complement 16 bit integers, bit vector)
- Three-address machine

## Another DIGRESSION (nugget)

- The pure distinction between ISA vs uarchitecture
  - ISA is visible to the software
  - Microarchitecture is "underneath the hood"
- If you let the compiler know how the ISA is implemented,
  - i.e., if you break the walls between the transformation levels,
  - You can produce better code for that implementation
  - ...at the expense of compatibility (PORTABILITY)
- Today, with the impending demise of Moore's Law,
  - Computer Architecture is looking for ways to still be relevant
  - I have been preaching: Break the layers!
  - MIT recent white paper: "There is plenty of room at the top!"

## NOT Microarchitecture

#### Architecture

- Software Visible
- Address Space, Addressability
- Opcodes, Data Types, Addressing Modes
- Privilege, Priority
- Support for Multiprocessors (e.g., TSET)
- Support for Multiprogramming (e.g., LDCTX)

#### Microarchitecture

- Not Software Visible
- Caches (although this has changed, ...sort of)
- Branch Prediction
- The instruction cycle
- Pipelining

DIGRESSION (nugget): You have a brilliant idea, and it requires a change to the ISA or to the uarchitecture.

## What is the ISA?

- A specification
  - The interface between hardware and software
- A contract
  - What the software demands
  - What the hardware agrees to deliver

## Outline

- What is it?
  - The interface between hardware and software
  - A specification (a contract)
  - NOT microarchitecture
  - NOT just the instruction set
- The Instruction
  - The atomic unit of processing
  - Changes the state of the machine
- Characteristics
  - First, a simple example: The LC-3b
  - The x86, RISCV
  - Vector Architecture
- Tradeoffs (with examples)

## Computer Architecture: Fundamentals, Tradeoffs, Challenges

Chapter 2: The ISA

Yale Patt

The University of Texas at Austin

Austin, Texas

Spring, 2023
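As a closing illustration of the "fixed length, 16 bits" instruction format listed among the LC-3b characteristics, decoding reduces to shifts and masks. The sketch below handles only the two ADD encodings from the LC-3b instruction set table (opcode 0001, with bit 5 selecting SR2 or a sign-extended imm5). The function name `decode_add` and the dictionary keys are illustrative assumptions, not part of any LC-3b tool.

```python
# Minimal decoder for the two LC-3b ADD encodings (a sketch, not a full decoder).
# Field positions follow the instruction set table:
#   [15:12] opcode, [11:9] DR, [8:6] SR1, [5] steering bit,
#   then either 00 + SR2 in [2:0] or imm5 in [4:0].

ADD_OPCODE = 0b0001


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a 2's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_add(word):
    """Decode a 16-bit LC-3b ADD instruction into a dict of named fields."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("LC-3b instructions are 16 bits wide")
    if (word >> 12) & 0xF != ADD_OPCODE:
        raise ValueError("not an ADD instruction")

    fields = {
        "dr": (word >> 9) & 0x7,
        "sr1": (word >> 6) & 0x7,
    }
    if (word >> 5) & 0x1:
        # Immediate form: second operand is a sign-extended 5-bit constant.
        fields["imm5"] = sign_extend(word & 0x1F, 5)
    else:
        # Register form: bits [4:3] must be 00, second operand is SR2.
        if (word >> 3) & 0x3:
            raise ValueError("bits [4:3] must be 00 in the register form")
        fields["sr2"] = word & 0x7
    return fields


print(decode_add(0x1042))  # ADD R0, R1, R2
print(decode_add(0x16FF))  # ADD R3, R3, #-1
```

Because every instruction has the same length and the register fields always sit in the same bit positions, each field comes out with one shift and one mask. That is the uniform decode property contrasted with x86 in the tradeoffs list.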