Exam 1 Annotated Buzzwords

Spring 2009

Note: buzzwords marked with an asterisk (*) are annotated in the Spring 2007 Exam 1 Buzzwords and Fall 2005 Exam 1 Buzzwords.

Embarrassingly Parallel
A term given to applications with such regular structure that it is easy to keep as many processors as the system has busy most of the time working on the application. The most common sets of problems that have this property are scientific problems and graphics problems. For example, a program that contains the statement: for i=1 to infinity, ... will take the branch backward a humongous number of times, and branch prediction becomes trivial. If the loop bodies of the iterations are independent of each other, then we can do as many loop bodies concurrently as we have processors available to do so. Inverting a large matrix is an example of this. The more processors, the less time to do the entire job.
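The loop-body independence described above can be sketched in Python (a toy sketch; the function names are illustrative, and only the standard concurrent.futures module is assumed):

```python
from concurrent.futures import ThreadPoolExecutor

def body(i):
    # Each iteration depends only on its own index i, never on another
    # iteration's result -- the hallmark of an embarrassingly parallel loop.
    return i * i

def run_sequential(n):
    return [body(i) for i in range(n)]

def run_parallel(n, workers=4):
    # Because iterations are independent, they can be handed to as many
    # workers as are available, in any order, and reassembled afterward.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(body, range(n)))
```

Adding workers shrinks the wall-clock time without changing the answer, which is exactly what "embarrassingly parallel" promises.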
Instruction Cycle
The time it takes to process one instruction, from fetching the instruction to storing the result.
Fall-through Path
For conditional branches, the path that execution follows when the branch is not taken; that is, the next sequential instructions.
Dual Fetch
An instruction fetch mechanism that can fetch two instructions each clock cycle. Generally combined with dual decode, dual rename and lots of functional units, so the dual fetch does not create a bottleneck waiting for functional units (adders, multipliers, and the like).
Dynamic Recompilation
Compilation that happens after run time has commenced, which enables the compiler to revisit its decisions based on the behavior of the code running on the actual (not profiled) data.
Microcode vs Micro-op
Microcode is a generic term to describe all control signals at the microarchitecture level. A micro-op is usually a single control signal. Sometimes called micro-command. A microinstruction is the set of micro-ops that are active during a clock cycle.
Flow Dependency
Two instructions (usually operates) located sequentially in the I stream, wherein the result of the first is a source of the second.
Propagation Delays
The propagation delay of a structure is the time it takes the output of that structure to reflect a change in values at its inputs. The term can be applied to an individual gate. The term can also be applied to the collection of gates that operate during a clock cycle. In fact, logic design involves keeping track of the propagation delay of each gate in order to ensure that the propagation delay of the entire circuit is less than the clock cycle time.
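The bookkeeping described above can be sketched in Python (a toy model, assuming each path through the circuit is represented simply as a list of per-gate delays):

```python
def circuit_delay(paths):
    # The propagation delay of the whole circuit is set by its slowest
    # (critical) path: sum the gate delays along each path, take the max.
    return max(sum(path) for path in paths)

def meets_timing(paths, clock_cycle_time):
    # The design is valid only if the critical path fits in one cycle.
    return circuit_delay(paths) < clock_cycle_time
```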
Row Buffer
We access a DRAM with the address of the datum we want. We generally do this in two steps, first latching the high order bits of the desired address (accompanied by the RAS signal), then latching the low order bits of the desired address (accompanied by the CAS signal). As a result of latching the high order address bits, data at all addresses having these high order address bits are latched into a structure. That structure is called a row buffer.
Row Buffer Hit
An access that hits in the row buffer. That is, an access to the same row as the previous access.
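The row buffer behavior of the two entries above can be sketched in Python (a toy single-bank model; the class and field names are illustrative):

```python
class DRAMBank:
    """Toy DRAM bank: latching a row address (the RAS step) brings the
    entire row into the row buffer; a later access whose high-order bits
    select the same row is a row buffer hit."""

    def __init__(self, col_bits):
        self.col_bits = col_bits   # low-order (CAS) address bits
        self.open_row = None       # row currently held in the row buffer
        self.hits = 0
        self.misses = 0

    def access(self, address):
        row = address >> self.col_bits   # high-order (RAS) address bits
        if row == self.open_row:
            self.hits += 1               # row buffer hit: same row as before
        else:
            self.misses += 1             # must latch a new row first
            self.open_row = row
```

With 8 column bits, the accesses 0x100, 0x104, 0x1FF all fall in row 1, so the second and third are hits; 0x200 opens a new row.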
Bit Line
In memory, a line that ties together the same bit of all words. For example, a bit line ties together A[28], B[28], C[28] of memory words A,B,C.
Word Line
In memory, a word line ties together all the bits that make up the contents of a word of memory.
Destructive Reading
Reading that destroys the value read, and therefore needs to be restored. For example, in a DRAM cell, "reading" discharges a capacitor. Therefore, it must be combined with subsequent logic that recharges the capacitor to the point it was at before the read was initiated.
Memory Rank
A set of memory chips whose addresses all agree in the high bit positions. For example, consider one GB of physical memory, made up of 2MB chips. Assume it is 16-way interleaved, and accessible via a 64-bit bus. The 30 address bits address the memory chips as follows: Bits[2:0] don't address the memory chips. Bits[6:3] determine the bank of memory; they can enable the tri-state drivers. Bits[27:7] identify the address within each individual chip. That leaves bits[29:28] to Chip Enable the appropriate "row" of memory chips. In this case there are 4 such rows. We call the "row" in this context the "rank" of the memory. Again, all locations in the same rank have the same value for address bits [29:28].
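The address-bit breakdown in the example above can be written out in Python (field boundaries follow the 1 GB, 16-way-interleaved example exactly):

```python
def decode(address):
    # Split a 30-bit physical address per the worked example above.
    byte = address & 0x7              # bits [2:0]: byte within the 64-bit bus
    bank = (address >> 3) & 0xF       # bits [6:3]: one of 16 banks
    chip = (address >> 7) & 0x1FFFFF  # bits [27:7]: address within each chip
    rank = (address >> 28) & 0x3      # bits [29:28]: one of 4 ranks
    return rank, chip, bank, byte
```

Two addresses belong to the same rank exactly when their bits [29:28] agree.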
Streaming Workload
Streaming generally refers to large units of data that are loaded sequentially from memory, then operated on, after which they are typically sequentially stored back to memory.
Overlays
With n bits, one can uniquely identify 2^n items. Suppose one wants to uniquely identify more than 2^n items. One could resort to overlays, as follows: One could identify 2^n items whose overlay value is 0, an additional 2^n items whose overlay value is 1, a third 2^n items whose overlay value is 2, etc. That is, if the overlay register can hold k distinct values, then with overlays, one can identify k * 2^n items with n bits.
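The k * 2^n counting argument above can be sketched in Python (the function name is illustrative):

```python
def full_id(overlay, n_bit_id, n):
    # An n-bit identifier alone names at most 2**n items; pairing it with
    # an overlay register value extends the namespace to k * 2**n items.
    assert 0 <= n_bit_id < 2 ** n
    return overlay * 2 ** n + n_bit_id
```

With n = 4 and an overlay register holding 3 distinct values, the scheme names 3 * 16 = 48 items even though only 4 bits of identifier are carried.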
SIB Byte
A byte in the x86 ISA that extends the ISA's addressing mode capability. You know that you can compute an effective address of an operand by adding an offset to a register. With an SIB byte, one can augment that address computation by adding the contents of a Base register and/or the contents of an Index register (after multiplying the contents of the index register by a scale factor of 1, 2, 4, or 8). The modR/M byte specifies whether an SIB (Scale-Index-Base) byte is included in the x86 instruction. If it is, the effective address computation includes the augmentation described above.
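The scale-index-base computation can be sketched in Python (a model of the arithmetic only, not of the actual byte encoding):

```python
def effective_address(displacement, base=0, index=0, scale=1):
    # In the real x86 SIB byte the 2-bit scale field encodes 1, 2, 4, or 8.
    assert scale in (1, 2, 4, 8)
    return displacement + base + index * scale
```

For example, indexing the 3rd 4-byte element of an array at 0x1000 with a 0x10 displacement gives 0x1000 + 0x10 + 3*4 = 0x101C.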
floating point coprocessor
A large piece of circuitry that is used to perform floating point operations. In the old days, the number of transistors on a chip was sufficiently small that to perform floating point arithmetic in hardware required a separate chip. Separate from the processor chip --> co-processor chip. By the time the 486 was introduced (late 1980s), there were enough transistors on the chip to have the floating point operations performed on the same chip and not require a separate chip. From that point on, people have more and more forsaken the name floating point coprocessor in favor of "floating point unit."
condition code 'duals'
Two condition code tests that disagree in each position. For example, p=n=1 and z=0 on the one hand, and p=n=0 and z=1. The first yields TRUE if the two source operands are unequal; the second yields TRUE if the two source operands are equal.
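The dual pair described above can be sketched in Python (LC-3-style n/z/p codes; the function names are illustrative):

```python
def nzp(value):
    # Condition codes set by an operation producing `value`:
    # exactly one of n, z, p is 1.
    return (value < 0, value == 0, value > 0)   # (n, z, p)

def branch_if_unequal(a, b):
    # Test mask p=n=1, z=0: taken when the compare result is nonzero.
    n, z, p = nzp(a - b)
    return n or p

def branch_if_equal(a, b):
    # The dual mask p=n=0, z=1: taken when the compare result is zero.
    n, z, p = nzp(a - b)
    return z
```

Because the two masks disagree in every position, exactly one of the two branches is taken for any pair of operands.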
Scalar
An operand that has a single value. For example, the integer 5. Scalar operations are performed by scalar functional units on scalar values.
Vector
An operand that has more than one component value. For example, the sequence 5,3,7,9,7. If this were the value of a vector V, then V1=5, V2=3, V3=7, etc. Vector operations are performed by vector functional units on vector values. For example, the vector add of 5,3,7,9,7 and 4,-3,2,-7,1 in a vector adder is 9,0,9,2,8.
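The vector add in the example above can be written directly in Python:

```python
def vector_add(v, w):
    # Element-wise addition, as a vector functional unit would perform it,
    # one component per lane.
    assert len(v) == len(w)
    return [a + b for a, b in zip(v, w)]
```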
ROM
Read Only Memory. Memory that cannot be written into; it can only be read. Each bit has been hard-wired to the value 0 or 1.
SED
Single error detect. Applies to a code that can only detect single bit errors. For example, parity protection is SED.
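Parity as an SED code can be sketched in Python (even parity; a toy model over lists of bits):

```python
def parity_bit(bits):
    # Even parity: choose the check bit so the total number of 1s is even.
    return sum(bits) % 2

def error_detected(word_with_parity):
    # An odd number of 1s means some odd number of bits flipped.
    # A single-bit error is always caught; a double-bit error restores
    # even parity and slips through -- hence SED, not SECDED.
    return sum(word_with_parity) % 2 != 0
```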
SECDED
A code (often referred to as a Hamming code or ECC code) that will identify single bit errors, thereby allowing them to be corrected, and will also allow two bit errors to be detected, although not corrected. The acronym stands for single error correct, double error detect.
Process Control Block
A data structure that identifies all the state information about a single process. This data structure is stored by the operating system; it is loaded when a process gets control of the processor and saved when a process loses control of the processor.
Monotonically Non-decreasing Level of Access
"Monotonically" means always in the same direction. "Non-decreasing" means never getting smaller. Ergo, monotonically non-decreasing means "increasing or staying the same." In the context of the usage in class, as privilege goes from the lowest privilege level to the highest privilege level, the allowed accesses never get lessened.
Horizontal Microcode
Microcode in which the fields in a microinstruction are mostly independent. Microinstructions are generally made up of many more bits than is the case for vertical microinstructions. The independence of the fields generally allows for a richer set of control options for an individual microinstruction, plus the ability to specify more than one concurrent operation in a single microinstruction. This generally results in far fewer microinstructions to implement a specific task than would be required for vertical microcode.
Vertical Microcode
Microcode in which the fields in a microinstruction are mostly interdependent and combine to perform a single micro-op. This generally results in many more microinstructions to implement a specific task. Vertical microinstructions resemble machine instructions in the sense that each microinstruction generally performs a single micro-command.
Edge-triggered flipflop, Master/Slave flipflop, Transparent Latch
Flip-flops have the property that they can be read and written in the same cycle. During the cycle, the value stored in the flipflop is what is read, while at the end of the cycle the value to be written is actually written. Transparent latches, on the other hand, do not generally allow values to be read and written during the same cycle, since values are written as soon as they get to the latch (i.e., they do not wait until the end of the cycle), which means that subsequent reads of the latch will read the value just written.

Two common types of flipflops are the edge-triggered flipflop and the master/slave flipflop. The edge-triggered flipflop behaves as described above: throughout the clock cycle, the current value is read, and nothing is written. At the end of the clock cycle (on the clock edge), the value to be written during that cycle is actually written, making it available to be read in the next clock cycle.

The master/slave flipflop consists of two transparent latches. Call them A and B. During the clock cycle, B is read and A is written. That is, the combinational logic that is processing the contents of the master/slave flipflop sources the output of B, and the result of the combinational logic is written to A. To make this work, we gate A with CLK and we gate B with NOT-CLK. Thus, during the first half of the clock cycle, the combinational logic sources B (which cannot change since NOT-CLK=0) and produces a result which is written into A (since CLK=1). In the second half of the clock cycle, the value of A produced in the first half of the clock cycle is written into B (since NOT-CLK=1). This could change the output of the combinational logic, but since CLK=0 during the second half of the clock cycle, that result cannot be written into A. Thus, CLK and NOT-CLK isolate the reads and writes so that we never have the case that a value written can then be read in the same clock cycle.
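The master/slave behavior described above can be simulated in Python (a toy two-latch model; class names are illustrative):

```python
class TransparentLatch:
    """While enable is 1 the latch is transparent: the output follows the
    input immediately. While enable is 0 it holds its stored value."""

    def __init__(self):
        self.q = 0

    def tick(self, d, enable):
        if enable:
            self.q = d
        return self.q


class MasterSlaveFF:
    """Two transparent latches gated on opposite clock phases (A on CLK,
    B on NOT-CLK), so a value written during a cycle cannot be read back
    in that same cycle."""

    def __init__(self):
        self.a = TransparentLatch()   # master: written while CLK = 1
        self.b = TransparentLatch()   # slave: written while NOT-CLK = 1

    def cycle(self, d):
        # What the combinational logic sees throughout this cycle.
        value_read = self.b.q
        # First half of the cycle (CLK = 1): write the master latch.
        self.a.tick(d, enable=1)
        # Second half (CLK = 0, so NOT-CLK = 1): copy master into slave.
        self.b.tick(self.a.q, enable=1)
        return value_read
```

Note that the value written in one call to cycle() only becomes readable in the next call, matching the isolation argument above.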
Supervisor Stack, Kernel Stack, Interrupt Stack
Many ISAs provide a system stack for use by the operating system, and per-process stacks for each privilege level that a process can run in. The system stack is sometimes referred to as the Interrupt Stack, since its most important function is to push/pop state information during the handling of interrupts. If the ISA specifies two privilege levels (like Unix), implementations often provide a Supervisor stack and a User stack. If the ISA specifies four levels of privilege (like the VAX), then the microarchitecture often implements four separate stacks. One reason for the multiple stacks is to prevent unauthorized access to data remaining on a higher-privilege stack after that data has been popped but not erased.
science of tradeoff *
levels of transformation *
latency *
shift and add multiplication algorithm *
addressibility *
endianness *
algorithm *
microarchitecture *
static instruction stream *
dynamic instruction stream *
state of machine *
design point *
assembly language *
label *
opcode *
pseudo opcode *
symbol table *
two pass assembler *
instruction level simulator *
cycle level simulator *
address space *
addressing mode *
steering bit *
vector operation *
prefetching *
natural word length *
multi-dimensional array *
dynamic/static interface *
fixed/variable length instruction *
uniform decode *
load/store ISA *
0,1,2,3 address machine *
data type *
indirect addressing *
unaligned access *
arithmetic/logic shift *
condition codes *
virtual machine (emulation) *
flow dependency *
Load/Store architecture *
semantic gap *
control signals *
datapath *
critical path (speed path) design *
microsequencer *
control store *
microinstruction (microcode) *
pipelining (intra-instruction parallelism) *
interrupt *
exception *
cache memory *
atomic unit
A unit that cannot be further partitioned without losing its intrinsic identity. It takes its meaning from chemistry, where an atom consists of protons, neutrons, and electrons, but once partitioned into these particles, ceases to have the identifying characteristics of the atom. Oxygen is no longer oxygen once its 8 electrons and 8 protons are lopped off. We say an instruction is the atomic unit of processing since we execute code at that lowest granularity. That is, we never execute a piece of an instruction: either the whole instruction or none of it. That is, we do not change the machine state corresponding to a piece of an instruction and leave the machine state corresponding to another piece of the same instruction unchanged.
page mode *
byte write *
soft error *
parity *
Hamming Code *
checksum *
barrel shifter/rotator (shift matrix rotator) *
interleaving *
memory bank *
privilege *
protection *
user space *
system space *
virtual memory *
physical memory *
chip enable *
write enable *
page *
frame *
address translation *
mapping virtual to physical addresses *
page fault *
working set *
balance set *
page table *
thrashing *
length register *
page table base register *
valid bit *
modified bit *
reference bit *
resident *
access control violation * translation not valid *
context switch *