Final Exam Annotated Buzzwords

Spring 2009

Note: buzzwords marked with an asterisk (*) are annotated in the Spring 2007 and Fall 2005 Buzzwords .

Superlinear speedup
On a plot of speedup vs number of processors, the 45 degree line reflects all points such that with k processors, we get speedup = k. Superlinear speedup refers to points above the 45 degree line -- that is points such that with k processors we get speedup greater than k. ...which begs the question: what are you forgetting or what are you hiding.
Omega network
an interconnection network that is a "half-way house" in some sense between a bus which has maximum contention and minimum cost and a full crossbar which has minimum contention and maximum cost. The omega network consists of an array of k by k crossbar switches. cost is proportional to the number of switch connections. Ergo, a k by k crossbar has a cost = k2. To implement an n by n omega network with k by k crossbars, the array has n/k rows of log-base-k(n) columns. Each column decodes log-base-2(k) bits of the address, and forwards the access request to the next column of k by k crossbars.
Buzbee's observation
Bill Buzbee observed that the time required to execute a task with p processors was not simply aT1/p + (1-a)T1, as specified by Amdahl's Law. He introduced an additional term sigma(p), such that Tp = aT1/p + (1-a)T1 + sigma(p). The extra term reflected the extra overhead required for coordinating the activities of the p processors.
Efficiency of a multiprocessor
A multiprocessor keeps p processors on call for Tp units of time. It may not use all p every cycle, but it ties them up nonetheless. A uniprocessor ties up one processor for T1 units of time. Thus, Efficiency is the degree to which the multiprocessor EFFECTIVELY utilizes its processors. That is, T1/(pTp).
Utilization of a multiprocessor
A multiprocessor ties up p processors for Tp units of time. Utilization is a measure of the fraction of time that the processors are actually doing useful work. That is, Op/pTp.
Standard Performance Equation
Performance = 1/T = 1/(l x cpi x cycle_time).
performance metric
The basic equation for performance is:
1 ÷ (N × CPI × t)
N is the number of instructions in the program being measured. CPI is the average number of cycles it takes to complete (retire) each instruction. t is the cycle time of the clock. Note that if make the clock smaller, I probably increase CPI. CPI vs. t is a fundamental tradeoff. Note also that t is 1/freq, and CPI is 1/IPC. Most machines quote their IPC values (bigger is better), although the microarchitect is more interested in CPI, since the microarchitect is interested in eliminating some of the cycles that contribute the CPI of an instruction. Recall, when we talked about virtual memory, I mentioned that the VAX 11-780 had a CPI of 10.5, that each VAX instruction on average made one memory reference, and that the page table translation mechanism took 22 cycles. The reason the CPI is only 10.5 is because there is a TLB on that machine which takes no extra time when it hits. Since it hits 95% of the time, only 5% of the time does the 22 cycle translation take place. 5% of 22 cycles is approximately one cycle. That is, approximately one cycle of the 10.5 CPI is due to virtual to physical memory translation.
SMP (Symmetric multiprocessor)
A multiprocessor, wherein any of the processors is capable of taking control of the kernel of the operating system. Co-processor: also called an attached processor. What we might call today an accelerator. A processor that can not do substantial work on its own. No PC, generally can not even do full decode. Usually just does the operation required by the instruction after being supplied with the source operands.
Functional languages
Non-procedural languages. Chacterized NOT by programs that execute step by step sequences. Characterized by single assignment. Variables can be assigned a value once and keeps that value for the duration. Thus, x=x+1 has no place in a single-assignment language, for example.
Fixed point arithmetic *
2's complement *
1's complement *
signed-magnitude *
word length *
short integer *
long integer *
Binary Coded Decimal (BCD) *
Shift and Add multiplier *
Booth's algorithm *
residue *
modulus *
pairwise relatively prime *
signed magnitude *
SIMD (data parallel) *
SPMD (single procedure multiple data set) *
array processor *
granularity of concurrency *
multiprocessor *
tightly coupled MP
see multiprocessor.
shared global memory
see multiprocessor.
loosely coupled MP
see multiprocessor.
message passing
see multiprocessor.
superscalar *
IPC (instructions per cycle) *
CPI (clock cycles per instruction) *
Flynn's bottleneck *
superpipelined *
scalability *
speedup *
baseline *
Amdahl's Law *
sequential (serial) bottleneck
see Amdahl's Law.
see Amdahl's Law. That part that can be executed in parallel.
NUMA (non-uniform memory access) *
VLIW (very large instruction word) *
lock step *
decoupled access/execute (DAE) *
dataflow node *
fine-grain dataflow *
coarse-grain dataflow *
copy node *
tokens *
firing rule *
conditional nodes *
relational nodes *
interconnection network *
hypercube (cosmic cube) *
bus *
full cross-bar switch *
tree *
mesh *
contention *
cost *
latency *