Exam 1 Annotated Buzzwords
Note: buzzwords marked with an asterisk (*) are annotated in the
Spring 2007 Exam 1 Buzzwords
and Fall 2005 Exam 1 Buzzwords.
- Embarrassingly Parallel
A term given to applications with such regular structure that it is easy to
keep as many processors as one has in the system busy most of the time
working on an application. The most common set of problems that have this
property are scientific problems and graphics problems. For example, a
program that contains the statement: for i=1 to infinity, ... will take
the branch backward a humongous number of times, making branch prediction
trivial. If the loop bodies of the iterations are independent of each other,
then we can do as many loop bodies concurrently as we have processors
to do so. Inverting a large matrix is an example of this. The more processors,
the less time to do the entire job.
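A sketch of the idea in Python (the loop body here is hypothetical): when the
iterations are independent, they can be farmed out to as many workers as are
available, and the answer matches the serial loop:

```python
from concurrent.futures import ThreadPoolExecutor

def body(i):
    # Hypothetical loop body: each iteration depends only on its own
    # index i, so the iterations can run in any order, concurrently.
    return i * i

# Serial execution of the loop ...
serial = [body(i) for i in range(100)]

# ... and the same loop with its independent bodies farmed out to a pool.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(body, range(100)))

assert parallel == serial
```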
- Instruction Cycle
The time it takes to process one instruction, from fetching the instruction
to storing the result.
- Fall-through Path
For conditional branches, the path that follows a branch not taken.
- Dual Fetch
An instruction fetch mechanism that can fetch two instructions each clock
cycle. Generally combined with dual decode, dual rename and lots of
functional units, so the dual fetch does not create a bottleneck waiting
for functional units (adders, multipliers, and the like).
- Dynamic Recompilation
Compiling that happens after run time has commenced, which enables the
compiler to revisit decisions based on the code running on the actual (not
profiled) data.
- Microcode vs Micro-op
Microcode is a generic term to describe all control signals at the
microarchitecture level. A micro-op is usually a single control signal.
Sometimes called micro-command. A microinstruction is the set of micro-ops
that are active during a clock cycle.
- Flow Dependency
Two instructions (usually operates) located sequentially in the I stream,
wherein the result of the first is a source of the second.
- Propagation Delays
A propagation delay of a structure is the time it takes the output of a
structure to reflect a change in values at the inputs of that structure.
The term can be applied to an individual gate. The term can also be applied
to the collection of gates that operate during a clock cycle. In fact, logic
design involves keeping track of the propagation delay of each gate in order
to ensure that the propagation delay of the entire circuit is less than the
clock cycle time.
- Row Buffer
We access a DRAM with the address of the datum we want. We generally do
this in two steps, first latching the high order bits of the desired address
(accompanied by the RAS signal), then latching the low order bits of the
desired address (accompanied by the CAS signal). As a result of latching the
high order address bits, data at all addresses having these high order
bits are latched into a structure. That structure is called a row buffer.
- Row Buffer Hit
An access that hits in the row buffer. That is, an access to the same row
as the previous access.
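A minimal sketch of this in Python, assuming a simple split of the address
into row bits (latched with RAS) and column bits (latched with CAS); the bit
widths and function names are made up for illustration:

```python
def split_address(addr, column_bits=10):
    """Split a DRAM address into (row, column): the high-order bits
    (latched with RAS) select the row, the low-order bits (latched
    with CAS) select the column within the row buffer."""
    return addr >> column_bits, addr & ((1 << column_bits) - 1)

def count_row_buffer_hits(addresses, column_bits=10):
    """An access is a row buffer hit if it falls in the same row as
    the previous access, i.e. the row already latched in the buffer."""
    open_row, hits = None, 0
    for addr in addresses:
        row, _col = split_address(addr, column_bits)
        if row == open_row:
            hits += 1
        open_row = row
    return hits

# Three accesses to row 0, then one to row 1: two hits total.
assert count_row_buffer_hits([0, 4, 8, 1024]) == 2
```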
- Bit Line
In memory, a line that ties together the same bit of all words. For example,
a bit line ties together bit i of memory words A, B, and C.
- Word Line
In memory, a word line consists of all bits that make up the contents of a
word of memory.
- Destructive Reading
Reading that destroys the value read, and therefore needs to be restored. For
example, in a DRAM cell, "reading" discharges a capacitor. Therefore, it must
be combined with subsequent logic that recharges the capacitor to the point
it was at before the read was initiated.
- Memory Rank
A set of memory chips whose addresses all agree in the high bit positions.
For example, consider one GB of physical memory, made up of 2MB chips.
Suppose it is 16-way interleaved, and accessible via a 64-bit bus. The 30
address bits address the memory chips as follows: Bits[2:0] don't address the
memory chips at all, since the bus carries 8 bytes at a time. Bits[6:3]
determine the bank of memory; they can enable the tri-state drivers. Bits[27:7]
identify the address within each individual chip. That leaves bits[29:28] to
Chip Enable the appropriate "row" of memory chips. In this example, there are
4 such rows. We call the "row" in this context the "rank" of the memory.
Again, all locations in the same rank have the same value for address
bits [29:28].
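The bit-field decomposition in the example above can be sketched as follows
(field positions are taken from the example; the function name is made up):

```python
def decode(addr):
    """Decompose a 30-bit physical address per the example above:
    bits[2:0] select the byte on the 64-bit bus, bits[6:3] the bank,
    bits[27:7] the address within each chip, bits[29:28] the rank."""
    return {
        "byte":      addr         & 0x7,       # bits [2:0]
        "bank":     (addr >> 3)   & 0xF,       # bits [6:3]
        "chip_addr": (addr >> 7)  & 0x1FFFFF,  # bits [27:7]
        "rank":     (addr >> 28)  & 0x3,       # bits [29:28]
    }

# Two addresses agreeing in bits [29:28] lie in the same rank.
a = decode(0x30000040)
b = decode(0x3FFFFFFF)
assert a["rank"] == b["rank"] == 3
```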
- Streaming Workload
Streaming generally refers to large units of data that are loaded
from memory, then operated on, after which they are typically sequentially
stored back to memory.
- Overlays
With n bits, one can uniquely identify 2^n items. Suppose one wants to
uniquely identify more than 2^n things. One could resort to overlays, as
follows: One could identify 2^n items whose overlay value is 0,
an additional 2^n items whose overlay value is 1, a third 2^n items whose
overlay value is 2, etc. That is, if the overlay register can have k
values, then with overlays, one can identify k * 2^n items with n bits.
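A sketch of the counting argument (names made up): pairing the overlay value
with the n-bit identifier yields k * 2^n distinct names:

```python
def overlay_id(overlay, n_bit_id, n):
    """With n-bit identifiers and a k-valued overlay register, the pair
    (overlay value, n-bit id) names k * 2**n distinct items."""
    return overlay * (1 << n) + n_bit_id

n, k = 4, 3
ids = {overlay_id(o, i, n) for o in range(k) for i in range(1 << n)}
assert len(ids) == k * 2 ** n   # 3 * 16 = 48 unique identifiers
```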
- SIB Byte
A byte in the x86 ISA that extends the ISA's addressing mode capability.
You know that you can compute an effective address of an operand by adding
an offset to a register. With an SIB byte, one can augment that address
computation by adding the contents of a Base register and/or the contents
of an Index register (after multiplying the contents of the index register
by a scaled amount). The modR/M byte specifies whether an SIB
(Scale-Index-Base) byte is included in the x86 instruction. If yes, the
effective address computation includes the augmentation described above.
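A sketch of the resulting effective address computation (the function name is
made up; in x86 the scale factor is 1, 2, 4, or 8):

```python
def effective_address(base, index, scale, displacement):
    """Effective address with an SIB byte: base register contents, plus
    index register contents multiplied by the scale factor, plus the
    displacement (offset) encoded in the instruction."""
    assert scale in (1, 2, 4, 8)
    return base + index * scale + displacement

# e.g. element 5 of an array of 4-byte words starting at 0x1000:
assert effective_address(base=0x1000, index=5, scale=4, displacement=0) == 0x1014
```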
- floating point coprocessor
A large piece of circuitry that is used to perform floating point operations.
In the old days, the number of transistors on a chip was sufficiently small
that to perform floating point arithmetic in hardware required a separate
chip. Separate from the processor chip --> co-processor chip. By the time the
486 was introduced (late 1980s), there were enough transistors on the chip to
have the floating point operations performed on the same chip and not require
a separate chip. From that point on, people have more and more forsaken the
name floating point coprocessor in favor of "floating point unit."
- condition code 'duals'
Two condition code tests that disagree in each position. For example, p=n=1,
z=0 on the one hand, and p=n=0 and z=1 on the other. The first yields TRUE if
the two source operands are unequal; the second yields TRUE if the two source
operands are equal.
- Scalar
An operand that has a single value. For example, the integer 5.
Scalar operations are performed by scalar functional units on scalar values.
- Vector
An operand that has more than one component value. For example the sequence
5,3,7,9,7. If this was the value of a vector V, then V1=5, V2=3, V3=7, etc.
Vector operations are performed by vector functional units on vector values.
For example, the vector add of 5,3,7,9,7 and 4,-3,2,-7,1 in a vector adder
produces 9,0,9,2,8.
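The elementwise behavior of a vector add can be sketched as:

```python
def vector_add(v1, v2):
    """Elementwise add, as a vector adder would do it: component i of
    the result is v1[i] + v2[i]."""
    assert len(v1) == len(v2)
    return [a + b for a, b in zip(v1, v2)]

assert vector_add([5, 3, 7, 9, 7], [4, -3, 2, -7, 1]) == [9, 0, 9, 2, 8]
```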
- ROM
Read Only Memory. Memory that can not be written into. It can only be read.
Each bit has been hard-wired to the value 0 or 1.
- SED
Single error detect. Applies to a code that can only detect single bit errors.
For example, parity protection is SED.
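A sketch of even parity as an SED code (helper names are made up): a single
flipped bit is detected, but nothing identifies which bit flipped, so it
cannot be corrected:

```python
def with_parity(bits):
    """Append an even-parity bit, making the total number of 1s even."""
    return bits + [sum(bits) % 2]

def check(word):
    """Even parity holds iff the word contains an even number of 1s."""
    return sum(word) % 2 == 0

word = with_parity([1, 0, 1, 1])
assert check(word)
word[2] ^= 1            # flip one bit: the error is detected ...
assert not check(word)  # ... but nothing says WHICH bit flipped
```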
- SECDED
A code (often referred to as Hamming Codes or ECC codes) that will identify
single bit errors, thereby allowing them to be corrected, and will also allow
two bit errors to be detected, although not corrected. The acronym stands for
single error correct, double error detect.
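A sketch of the single-error-correct part, using the classic Hamming(7,4)
layout with parity bits at positions 1, 2, and 4 (full SECDED adds one
overall parity bit for double-error detection; the names are made up):

```python
def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7,
    parity bits at positions 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4   # covers positions 2,3,6,7
    p4 = d2 ^ d3 ^ d4   # covers positions 4,5,6,7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_correct(c):
    """Recompute the parity checks; the syndrome is the 1-indexed
    position of a single flipped bit, or 0 if no bit flipped."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1      # correct the single-bit error
    return [c[2], c[4], c[5], c[6]]   # recovered data bits

data = [1, 0, 1, 1]
code = hamming74_encode(data)
code[5] ^= 1                      # inject a single-bit error
assert hamming74_correct(code) == data
```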
- Process Control Block
A data structure that identifies all the state information about a single
process. This data structure is stored by the operating system, loaded
when a process gets control of the processor, and saved when a process
loses control of the processor.
- "monotonically non-decreasing level of access"
"monotonically" means always in the same directcion. "non-decreasing" means
never getting smaller. Ergo, monitonically nondecreasing mean "increasing or
staying the same." In the context of the usage in class, as privilege goes
from lowest privilege level to highest privilege level, the allowed accesses
never get lessened.
- Horizontal Microcode
Microcode in which the fields in a microinstruction are mostly independent.
Microinstructions are generally made up of many more bits than is the case
for vertical microinstructions. The independence of the fields generally
allows for a richer set of control options for an individual microinstruction,
plus the ability to specify more than one concurrent operation in a single
microinstruction. This generally results in far fewer microinstructions to
implement a specific task than would be required for vertical microcode.
- Vertical Microcode
Microcode in which the fields in a microinstruction are mostly interdependent
and combine to perform a single micro-op. This generally results in many
microinstructions to implement a specific task. Vertical microinstructions
resemble machine instructions in the sense that each microinstruction
performs a single micro-command.
- Edge-triggered flipflop, Master/Slave flipflop, Transparent Latch
Flip-flops have the property that they can be read and written in the same
cycle. During the cycle, the value stored in the flipflop is what is read,
while at the end of the cycle the value to be written is actually written.
Transparent latches, on the other hand, do not generally allow values to be
read and written during the same cycle, since values are written as soon as
they get to the latch (i.e., they do not wait until the end of the cycle)
which means that subsequent reads of the latch will read the value just
written.
Two common types of flipflops are the edge-triggered flipflop and the
master/slave flipflop. The edge-triggered flipflop behaves as described
above: Throughout the clock cycle, the current value is read, and nothing
is written. At the end of the clock cycle (on the clock edge), the value
to be written during that cycle is actually written, making it available to
be read in the next clock cycle. The master/slave flipflop consists of two
transparent latches. Call them A and B. During the clock cycle, B is read
and A is written. That is, the combinational logic that is processing the
contents of the master/slave flipflop sources the output of B, and the result
of the combinational logic is written to A. To make this work, we gate A
with CLK and we gate B with NOT-CLK. Thus, during the first half of the
clock cycle, the combinational logic sources B (which cannot change since
NOT-CLK=0) and produces a result which is written into A (since CLK=1).
In the second half of the clock cycle, the value of A produced in the first
half of the clock cycle is written into B (since NOT-CLK=1). This could
change the output of the combinational logic, but since CLK=0 during the
second half of the clock cycle, that result can not be written into A. Thus,
CLK and NOT-CLK isolates the reads and writes so that we never have the case
that a value written can then be read in the same clock cycle.
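The two-latch behavior described above can be simulated; this is a sketch of
the timing, not a gate-accurate model:

```python
class Latch:
    """Transparent latch: passes D to Q while its enable is 1,
    holds its value while the enable is 0."""
    def __init__(self):
        self.q = 0
    def tick(self, d, enable):
        if enable:
            self.q = d
        return self.q

class MasterSlave:
    """Master/slave flipflop from two latches: A (master) is gated by
    CLK, B (slave) by NOT-CLK, so a value written while CLK=1 cannot
    be read at the output until the second half of the cycle."""
    def __init__(self):
        self.a, self.b = Latch(), Latch()
    def tick(self, d, clk):
        self.a.tick(d, clk)                    # A is written while CLK=1
        return self.b.tick(self.a.q, 1 - clk)  # B is written while CLK=0

ff = MasterSlave()
assert ff.tick(1, clk=1) == 0  # first half: 1 enters A, output is still old B
assert ff.tick(1, clk=0) == 1  # second half: A's value moves into B
```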
- Supervisor Stack, Kernel Stack, Interrupt Stack
Many ISAs provide a system stack for use of the operating system, and
per process stacks for each privilege level that a process can run in.
The system stack is sometimes referred to as the Interrupt Stack since
its most important function is to push/pop state information on the system
stack during the process of handling interrupts. If the ISA specifies two
privilege levels (like Unix), there is often implemented a Supervisor stack and a User
stack. If the ISA specifies four levels of privilege (like VAX), then the
microarchitecture often implements four separate stacks. One reason for
the multiple stacks is to prevent unauthorized access to data remaining on
a higher level stack after that data has been popped, but not removed.
- science of tradeoff *
- levels of transformation *
- latency *
- PLA *
- shift and add multiplication algorithm *
- addressability *
- endianness *
- algorithm *
- ISA *
- microarchitecture *
- static instruction stream *
- dynamic instruction stream *
- state of machine *
- design point *
- assembly language *
- label *
- opcode *
- pseudo opcode *
- symbol table *
- two pass assembler *
- instruction level simulator *
- cycle level simulator *
- address space *
- addressing mode *
- steering bit *
- vector operation *
- prefetching *
- natural word length *
- multi-dimensional array *
- dynamic/static interface *
- fixed/variable length instruction *
- uniform decode *
- load/store ISA *
- 0,1,2,3 address machine *
- data type *
- indirect addressing *
- unaligned access *
- arithmetic/logic shift *
- condition codes *
- virtual machine (emulation) *
- VLIW *
- flow dependency *
- CISC/RISC *
- Load/Store architecture *
- semantic gap *
- control signals *
- datapath *
- critical path (speed path) design *
- microsequencer *
- control store *
- microinstruction (microcode) *
- pipelining (intra-instruction parallelism) *
- interrupt *
- exception *
- cache memory *
- atomic unit
A unit that can not be further partitioned without losing its
intrinsic identity. It takes its meaning from chemistry where an atom consists
of protons, neutrons, and electrons, but once partitioned into these particles,
ceases to have the identifying characteristics of the atom. Oxygen is no
longer oxygen when its 8 electrons and 8 protons
are lopped off. We say an instruction is the atomic unit
of processing since we execute code at that lowest granularity. That is, we
never execute a piece of an instruction. Either the whole instruction or none
of it. That is, we do not change the machine state corresponding to a piece
of an instruction and leave the machine state corresponding to another piece
of the same instruction unchanged.
- SRAM *
- DRAM *
- page mode *
- RAS *
- CAS *
- byte write *
- soft error *
- parity *
- ECC *
- Hamming Code *
- checksum *
- barrel shifter/rotator (shift matrix rotator) *
- interleaving *
- memory bank *
- privilege *
- protection *
- user space *
- system space *
- CAM *
- virtual memory *
- physical memory *
- chip enable *
- write enable *
- page *
- frame *
- address translation *
- mapping virtual to physical addresses *
- page fault *
- working set *
- balance set *
- page table *
- PSR *
- PTE *
- PFN *
- thrashing *
- length register *
- page table base register *
- valid bit *
- modified bit *
- reference bit *
- resident *
- access control violation *
- translation not valid *
- context switch *
- TLB *