Spring 2007

Note: buzzwords marked with an asterisk (*) are annotated in the Fall 2005 Exam 1 Buzzwords.

My tongue in cheek phrase to emphasize the importance of tradeoffs to the discipline of computer architecture. Clearly, computer architecture is more art than science. Science, we like to think, involves a coherent body of knowledge, even though we have yet to figure out all the connections. Art, on the other hand, is the result of individual expressions of the various artists. Since each computer architecture is the result of the individual(s) who specified it, there is no such completely coherent structure. So, I opined if computer architecture is a science at all, it is a science of tradeoffs. In class, we keep coming up with design choices that involve tradeoffs. In my view, "tradeoffs" is at the heart of computer architecture.
levels of transformation *
latency
how long it takes something to happen. memory latency -- the time it takes from when the address is provided until the information stored at that address is available. We will talk about interrupt latency after the spring break -- the time it takes from the time some external device requests service until the time that device starts getting serviced.
PLA
Programming Logic Array. A combinational structure consisting of an array of AND gates all producing outputs which are fed to an array of OR gates. The general structure of a PLA sources all external inputs to each AND gate, making each AND gate able to identify one input combination (row of a truth table). Also, in the general structure, all AND gate outputs are available as input to every OR gate. Those actually connected as inputs to an OR gate are those that produce a 1 in the output column of the truth table of the relevant logic function. Thus, in principal, any arbitrary logic function of the external inputs can be implemented as the output of one of the OR gates. In practical structures, the fan-in (maximum number of inputs) to an OR gate is fixed, thereby limiting the ability of the PLA to implement all combinational logic functions.
Recall your experience with the multiplication of decimal numbers. We systematically multiply the multiplicand by each digit of the multiplier and write each corresponding temporary result right-aligned with the corresponding multiplier digit. Then we add up these temporary results, producing the product. In multiplying binary numbers, the only digits are 1 and 0 so the multiplication step consists of simply copying the multiplicand or writing all zeroes as each temporary result. If we are multiplying n bit numbers, we get n temporary results. To add these temporary results properly, we need to take into account their relative alignment. This can be done by n-1 steps where each step includes the sum of all temporary steps before it. That is, after step 1, we have correctly added the temporary results for bits 1 and 2. After step i, we have correctly added the temporary results for bits 1, 2, ... i+1. The (i+1)th step consists of SHIFTING right the (i+2)th temporary result one bit so as to align it with the sum of the first i steps and then ADDING the two quantities. Thus, the name shift and add to describe binary multiplication.
endianness *
algorithm *
ISA *
microarchitecture *
static instruction stream
The program as it sits in memory, or rather out on the disk. Each instruction generated by the compiler exists once in the static instruction stream.
dynamic instruction stream
The *sequence* of instructions as they are executed. A single instruction (located in a single address of memory) counts once in the static instruction stream. Its count vis-a-vis the dynamic instruction stream is the number of times it is executed. The dynamic instruction stream (often called a *trace*) gives a record of the order that the intructions were carried out.
state of machine
Computer processing is all about systematically changing the state of the machine in response to each instruction in the dynamic execution of the program. That is a program executes by the machine being in an initial state, and one by one each instruction execution changes the state in accordance with the semantics (specification) of the instruction. The machine state consists of the PC, the PSR, the general purpose registers, the condition codes (if the ISA specifies condition codes), the contents of each memory location, and the mapping registers that enable the memory management process. More sophisticated ISAs provide additional state.
design point
a specification of the criteria to be optimized in the design process. If we are interested in performance, or reliability, or availability, or low cost, we colloquially say our design point is performance, or reliability, or ...etc.
assembly language
a language very close in semantics to the ISA of the machine, with the advantage that bit patterns are replaced by mnemonic devices, and some additional useful features are often present. Nonetheless, the defining characteristic of an assembly language is that the level of control over changing the machine state is very much like what one would have if one programmed in 0s and 1s. Also, sometimes called a "low level" language to distinguish it from "high level" languages that are more tuned to the expressiveness of humans. High level languages (more than 1000 out there): C#, C++, Java, C, all the way back to antiquity, when Fortran and Cobol showed up. Assembly language programs are translated by an assembler which does little more than replace all the mnemonic devices with 0s and 1s. It would be a tough job to try to assemble a program written in the assembly language for ISA A into the 0s and 1s of ISA B. On the other hand, high level language programs are generally translated by a compiler. Since the instructions in the high level language are quite a bit removed from the intricasies of any ISA, it is not difficult to change the ISA that a compiler translates the high level language program into.
label *
opcode *
pseudo opcode *
symbol table *
two pass assembler *
instruction level simulator
A simulator that changes its state at the granularity of the instruction. That is, given the machine state and an instruction, the instruction level simulator produces the state required by the instruction. No within-instruction states are simulated.
cycle level simulator
A simulator that changes its state at the granularity of the clock cycle. That is, given the machine state and the control signals asserted during a clock cycle, the cycle level simulator produces the state required by those control signals.
steering bit
A bit in an instruction that specifies the interpretation of other bits in the instruction, depending on the value of the steering bit. For example, bit[5] in the LC-3b is a steering bit, since it determines the usage of bits[4:0] of the instruction.
vector operation
an operation that is performed on a set (or sets) of values (referred to as vectors). For example, a vector A consists of n values A1, A2, A3, ..An. And a vector B consists of n values B1, B2, ...Bn. A vector add of vectors A and B would produce the vector consisting of A1+B1, A2+B2, ... An+Bn. We often refer to this mode as data-parallel, since there are no dependences between the component adds, and they can all be executed in parallel.
prefetching
Fetching instructions or data early so that they are available quickly when they are needed. For example, most microprocessors have units of storage called L1 caches that can be accessed in one or two cycles. It is useful during instruction fetch or data fetch to be able to access this element in one or two cycles. This can be accomplished by obtaining the element from a much slower unit of storage before it is really needed. We call that access in advance of when needed *prefetching.*
natural word length
the generic size of elements processed in the machine. Generally, the size of the registers. Generally, the size of values processed by the ALU in a single cycle.
multi-dimensional array
a classical mathematical structure, having multiple dimensions.
dynamic/static interface
the interface between when the compilation process ends, and the run time process begins. The compiler translates the source program producing 0s and 1s that specify the work to be done. The hardware carries out the work of those 0s and 1s at run time. Although it is now changing, the compiler never used to have run-time behavior available to it to make its decisions. It did not know which cache accesses missed, which branches were actually taken, how long certain operate instructions took. It is true that compilers have for a long time, as part of the compilation process, observed what happened when the program ran on some sample data. This is called *profiling.* But that data is not the *real* data, and in fact can lead to decisions that could be counterproductive, depending on how close to the sample data the real data is. We refer to the DSI as the interface between when all activity is done without real knowledge of real run-time data, and when all activity has that knowledge.
fixed/variable length instruction *
uniform decode *
data type *
An addressing mode, where the address (A) of the source data is contained in a memory location (B), the address of B is computed by the addressing mode. We say that B is a pointer variable because its contents is not data but rather the address A of the data.
unaligned access *
arithmetic/logic shift *
condition codes *
virtual machine (emulation)

An ISA is implemented in hardware. Each instruction in that ISA is fetched, decoded, etc. and at each step of the instruction cycle, control signals are generated to carry out the phases of that instruction. What if you wanted a program compiled into the ISA of machine A to be executed on a machine that has a different ISA, say ISA B. Clearly, the 0s and 1s of the ISA of A have no meaning with respect to the machine of ISA B. We call the ISA of machine A a virtual machine because it does not correspond to real hardware in our situation. What is needed is a program (usually called an emulator) that translates each instruction of ISA A into a sequence of instructions from the ISA B that can then be carried out on the machine of ISA B and produce the same result as it would have if the ISA A instruction had been carried out on a machine of ISA A. We often refer to ISA B as the host machine since that is where the results are actually being produced. And ISA A the target machine since that is the code produced by the compiler from the original program. The concept has been around for a very long time, although it is lately gaining a lot of popularity both in research universities and product development centers. An early example was the Burroughs 1700 of the 1970s. Programs were compiled to three different virtual machines, depending on whether the nature of the code was scientific, business, or system level in nature. The resulting three different ISAs were then emulated on the host B1700 machine. More recently, there have been plenty of Java programs that have been compiled into an ISA referred to as the Java Virtual Machine. The result is that other companies have produced software emulator to translate each JVM instruction into the ISA of its own host machine. So, for example, a Pentium M can run JVM instructions if it wishes by emulating each JVM instruction into a sequence of x86 instructions that does that job.

An analogy to virtual memory is in order. Like virtual memory, a virtual machine exists in the eye of the beholder. Recall a virtual address has to be translated to a physical address before it can be accessed. A virtual machine instruction has to be translated to a sequence of instructions in the host ISA before it can be carried out.

VLIW *
flow dependency
the dependency between a result produced by one instruction that is needed by another instruction. For example the expression A(B+C). We can not do the multiply until we do the add. We say there is a flow dependency between the add and the multiply. A popular well-meaning textbook refers to this as a RAW hazard, meaning that we can not read the source of the multiply until after we write the result of the add. I personally do not like this terminology because it totally distorts the meaning of the word hazard. We will deal with this after Spring break, so you can ignore this for now.
CISC/RISC
In 1980, the term RISC was coined to characterize simple ISAs. The term was an acronym for Reduced Instruction Set Computer, but was ill-advised on a lot of levels, which we may go into late in the semester if there is time. Suffice it to say right now, that the intent was to have each instruction do a single operation in a very streamlined way. SPARC, MIPS, HPPA, Alpha, Motorola 88000 Power-PC are all examples of ISAs that attempted more or less to follow that mantra. x86, Motorola 68000, VAX IBM Zseries are all examples of ISAs that did not. The acronym czars referred to the "other" ISAs as Complicated Instruction Set Computers. For example, the x86 which can load from memory, add to a register and store back to memory in a single x86 instruction -- three distinct operations. ...which led their advocates to shoot back: CISC stood for Comprehensive Instruction Set Computer.
The term refers to ISAs that require separate load and store instructions to move data to and from memory. That is a load/store ISA does not allow operate instructions to operate on data loaded from memory in the same instruction. Nor, does it allow the result of an operate instruction to be written to memory in the same instruction. So-called RISC ISAs were in general Load/Store. So-called CISC ISAs were in general not.
semantic gap *
control signals
signals that control the processing within the data path, the memory, and the microsequencer. The set of control signals that are effective during a single clock cycle are often characterized as a microinstruction.
datapath *
critical path (speed path) design *
microsequencer *
control store *
microinstruction (microcode) *
pipelining (intra-instruction parallelism) *
interrupt *
exception *
cache memory *
atomic unit
a unit that can not be further partitioned without losing its intrinsic identity. It takes its meaning from chemistry where an atom consists of protons, neutrons, and electrons, but once partitioned into these particles, ceases to have the identifying characteristics of the atom. Oxygen is no longer oxygen when the 6 electrons and 6 protons (if I remember my high school chemistry correctly) are lopped off. We say an instruction is the atomic unit of processing since we execute code at that lowest granularity. That is, we never execute a piece of an instruction. Either the whole instruction or none of it. That is, we do not change the machine state corresponding to a piece of an instruction and leave the machine state corresponding to another piece of the same instruction unchanged.
SRAM *
DRAM *
page mode *
RAS *
CAS *
byte write
Although memory is usually byte addressable, the memory bus is usually multiple bytes wide. A byte write is a memory access which stores one byte of information into a designated byte-addressable location while ignoring the rest of the bytes on the memory bus. Implementation requires write enable signals of byte granularity.
soft error (Updated: 03/29/2007)

A soft error is an error that is not hard. Duh.

So, what does that mean? We distinguish errors as transients errors and permanent errors. A transient error is one that occurs due to some event that for purposes of the computer is random. Someone started up a washing machine in the same room, and there is not enough noise margin. Or, some alpha particle got in the way, which gets more and more common as the clock gets more and more into the GHz range. The idea is that if you replay the program, the error will not show up the second time.

The other kind of error is the permanent error. Here the device is broken and always give you the bad result.

Permanent errors are called hard errors. Transient errors are called soft errors.

parity *
ECC *
Hamming Code *
checksum
the bits used to detect whether a burst error has occurred within a large transfer of information, where the errors are not statistically independent. Suppose I wish to transfer N bits of information from X to Y. A common implemenation of checksum error detection is to input the N bits sequentially into an n-bit linear feedback shift register at X concurrently with transferring them to Y. After the transfer is complete, the n bits remaining in the shift register are then sequentially transferred to Y as well. At the destination Y, the process is repeated and the contents of the n-bit shift register after N bits have been input is compared with the last n bits received at Y. If they match, the transfer is assumed to have gone error free. Typical values for N is in the thousands of bits, typical values for n is 16 or 32.
barrel shifter/rotator (shift matrix rotator)
a large combinational network that allows an n-bit value to be rotated (or simply shifted) an arbitrary number of bit positions. Example, if n is 32, the 32-bit value can be shifted 0, 1, 2, ... 31 bits. The combinational structure could be implemented with 32 32-to-1 muxes, but is more efficiently implemented in multiple layers as a shift tree, as described in class. Inputs to a barrel shifter is the n-bit source and a log n bit encoding of the shift amount. Output is the rotated result.
interleaving *
memory bank *
privilege *
protection *
user space *
system space *
CAM *
virtual memory *
physical memory *
chip enable *
write enable
A signal that must be asserted if a storage structure is to be written. In the technology we use in 360N, if the signal is asserted throughout the cycle, the storage device is written at the end of the cycle.
page *
frame *
mapping virtual to physical addresses *
page fault *
working set *
balance set *
page table *
PSR *
PTE *
PFN *
thrashing *
length register
A register that is part of the virtual memory management structure, identifying how many pages are described by the corresponding page table. This is needed to protect against false translations wherein the page number computed from the virtual address may correspond to a page that is not part of the virtual address space that is mapped. For example, if an executable image consists of 6 pages, the page table will contain 6 PTEs, and the length register will contain the value 5 (pages 0, 1, ...5). Any virtual address whose bits that identify a page greater than 5 must therefore be a bad address. This is easy to detect by simply comparing the page- number bits to the contents of the length register.
page table base register *
valid bit *
modified bit *
reference bit *
resident *
access control violation * translation not valid *
context switch *
TLB *