Final Exam Annotated Buzzwords
Spring 2009
Note: buzzwords marked with an asterisk (*) are annotated in the
Spring 2007 and Fall 2005 Buzzwords.
- Superlinear speedup
- On a plot of speedup vs. number of processors, the
45 degree line reflects all points such that with k processors, we get
speedup = k. Superlinear speedup refers to points above the 45 degree
line -- that is, points such that with k processors we get speedup
greater than k. ...which raises the question: what are you forgetting,
or what are you hiding?
- Omega network
- an interconnection network that is a "half-way house"
in some sense between a bus, which has maximum contention and minimum
cost, and a full crossbar, which has minimum contention and maximum cost.
The omega network consists of an array of k by k crossbar switches.
Cost is proportional to the number of switch connections. Ergo, a
k by k crossbar has cost = k^2. To implement an n by n omega network
with k by k crossbars, the array has n/k rows of log-base-k(n) columns.
Each column decodes log-base-2(k) bits of the address, and
forwards the access request to the next column of k by k crossbars.
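The cost comparison above can be sketched numerically. A minimal sketch, using the cost model given above (k^2 crosspoints per k by k crossbar); the function name and the example sizes are mine:

```python
import math

def omega_network_cost(n, k):
    """Switch cost of an n-by-n omega network built from k-by-k crossbars.

    The array has n/k rows and log-base-k(n) columns of crossbars,
    and each crossbar costs k^2 crosspoints.
    """
    columns = round(math.log(n, k))   # log_k(n) stages
    rows = n // k                     # crossbars per stage
    return rows * columns * k * k

# Compare against the full-crossbar extreme for n = 64:
n = 64
full_crossbar_cost = n * n                    # 4096 crosspoints
omega_cost = omega_network_cost(n, k=2)       # 32 rows x 6 columns x 4 = 768
print(full_crossbar_cost, omega_cost)
```

The omega network buys its lower cost with contention: two requests can collide inside a shared crossbar stage even when they target different memory modules.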
- Buzbee's observation
- Bill Buzbee observed that the time required to
execute a task with p processors was not simply aT1/p + (1-a)T1, as
specified by Amdahl's Law. He introduced an additional term sigma(p),
such that Tp = aT1/p + (1-a)T1 + sigma(p). The extra term reflected the
extra overhead required for coordinating the activities of the p
processors.
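The two time models above can be compared directly. A hedged sketch: the function names, the linear form of sigma(p), and the sample numbers are all mine, chosen only to show how the overhead term erodes Amdahl's predicted gain:

```python
def amdahl_time(T1, a, p):
    """Amdahl's Law: fraction a runs p-way parallel, the rest runs serially."""
    return a * T1 / p + (1 - a) * T1

def buzbee_time(T1, a, p, sigma):
    """Buzbee's observation: add coordination overhead sigma(p)."""
    return amdahl_time(T1, a, p) + sigma(p)

# Hypothetical overhead growing linearly with processor count:
sigma = lambda p: 0.5 * p

T1, a = 100.0, 0.9
for p in (1, 4, 16):
    print(p, amdahl_time(T1, a, p), buzbee_time(T1, a, p, sigma))
```

With these numbers, going from 4 to 16 processors still helps, but the gap between the two models widens as p grows; with a steeper sigma(p), adding processors can eventually slow the job down.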
- Efficiency of a multiprocessor
- A multiprocessor keeps p processors on call
for Tp units of time. It may not use all p every cycle, but it ties them
up nonetheless. A uniprocessor ties up one processor for T1 units of time.
Thus, efficiency is the degree to which the multiprocessor EFFECTIVELY
utilizes its processors. That is, T1/(pTp).
- Utilization of a multiprocessor
- A multiprocessor ties up p processors for
Tp units of time. Utilization is a measure of the fraction of time that
the processors are actually doing useful work. That is, Op/(pTp), where
Op is the total number of operations (units of useful work) performed.
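The two ratios above differ only in the numerator. A small sketch with made-up numbers (the function names and the sample run are mine):

```python
def efficiency(T1, Tp, p):
    """Speedup per processor: T1 / (p * Tp)."""
    return T1 / (p * Tp)

def utilization(Op, Tp, p):
    """Fraction of the p*Tp tied-up processor-time doing useful work."""
    return Op / (p * Tp)

# Hypothetical run: a job that takes 1000 cycles on one processor
# finishes in 150 cycles on 8 processors, during which the processors
# perform 1100 operations in total (the extra 100 are coordination work).
print(efficiency(1000, 150, 8))    # ~0.833
print(utilization(1100, 150, 8))   # ~0.917
```

Note that utilization can exceed efficiency: processors may be busy (utilized) doing redundant or coordination work that contributes nothing to finishing the job faster.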
- Standard Performance Equation
- Performance = 1/T = 1/(N x CPI x
cycle_time).
- performance metric
- The basic equation for performance is:
1 ÷ (N × CPI × t)
N is the number of instructions in the program being measured. CPI is
the average number of cycles it takes to complete (retire) each instruction.
t is the cycle time of the clock. Note that if I make the clock cycle
smaller, I probably increase CPI. CPI vs. t is a fundamental tradeoff. Note
also that t is 1/freq, and CPI is 1/IPC. Most machines quote their IPC values
(bigger is better), although the microarchitect is more interested in CPI,
since the microarchitect is interested in eliminating some of the cycles
that contribute to the CPI of an instruction. Recall, when we talked about
virtual memory, I mentioned that the VAX 11-780 had a CPI of 10.5, that each
VAX instruction on average made one memory reference, and that the page table
translation mechanism took 22 cycles. The reason the CPI is only 10.5 is
that there is a TLB on that machine which takes no extra time when it hits.
Since it hits 95% of the time, only 5% of the time does the 22 cycle
translation take place. 5% of 22 cycles is approximately one cycle. That is,
approximately one cycle of the 10.5 CPI is due to virtual to physical
address translation.
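The VAX arithmetic above is easy to check directly. A back-of-envelope sketch using only the numbers quoted in the text (the variable names are mine):

```python
# VAX-11/780 figures quoted above.
cpi = 10.5
refs_per_inst = 1.0       # about one memory reference per instruction
tlb_hit_rate = 0.95       # TLB hit costs no extra time
miss_penalty = 22         # cycles for the page-table translation on a miss

# Average translation cycles added per instruction:
translation_cpi = refs_per_inst * (1 - tlb_hit_rate) * miss_penalty
print(translation_cpi)            # ~1.1 of the 10.5 CPI
print(translation_cpi / cpi)      # ~10% of execution time
```

Without the TLB, every reference would pay the 22-cycle walk and the CPI would roughly triple, which is why the "takes no extra time when it hits" structure matters so much.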
- SMP (Symmetric multiprocessor)
- A multiprocessor, wherein any of the
processors is capable of taking control of the kernel of the operating
system.
- Co-processor
- also called an attached processor; what we might call today
an accelerator. A processor that cannot do substantial work on its own.
No PC; generally cannot even do full decode. Usually just does the
operation required by the instruction after being supplied with the
source operands.
- Functional languages
- Non-procedural languages. Characterized NOT by
programs that execute step-by-step sequences, but by single assignment:
a variable can be assigned a value once and keeps that value for the
duration. Thus, x=x+1 has no place in a single-assignment language,
for example.
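The contrast above can be illustrated even in Python, which does not enforce single assignment; this is only a stylistic sketch, and the function names are mine:

```python
# Imperative style: the accumulator is rebound on every iteration,
# exactly the "x = x + 1" pattern that single assignment forbids.
def sum_imperative(values):
    total = 0
    for v in values:
        total = total + v    # rebinding total
    return total

# Single-assignment style: no name is ever rebound. "State" flows
# through function arguments instead; each call gets fresh bindings.
def sum_functional(values, acc=0):
    if not values:
        return acc
    return sum_functional(values[1:], acc + values[0])

print(sum_imperative([1, 2, 3, 4]))   # 10
print(sum_functional([1, 2, 3, 4]))   # 10
```

Because no binding ever changes, a dataflow or functional implementation is free to evaluate independent expressions in any order, or in parallel, which is the connection to the dataflow material elsewhere in these notes.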
- Fixed point arithmetic *
- 2's complement *
- 1's complement *
- signed-magnitude *
- word length *
- short integer *
- long integer *
- Binary Coded Decimal (BCD) *
- Shift and Add multiplier *
- Booth's algorithm *
- residue *
- modulus *
- pairwise relatively prime *
- signed magnitude *
- SIMD (data parallel) *
- SPMD (single procedure multiple data set) *
- SISD *
- MIMD *
- array processor *
- granularity of concurrency *
- multiprocessor *
- tightly coupled MP
- see multiprocessor.
- shared global memory
- see multiprocessor.
- loosely coupled MP
- see multiprocessor.
- message passing
- see multiprocessor.
- superscalar *
- IPC (instructions per cycle) *
- CPI (clock cycles per instruction) *
- Flynn's bottleneck *
- superpipelined *
- scalability *
- speedup *
- baseline *
- Amdahl's Law *
- sequential (serial) bottleneck
- see Amdahl's Law.
- parallelizable
- see Amdahl's Law. The part of the program that can be executed in parallel.
- NUMA (non-uniform memory access) *
- VLIW (very large instruction word) *
- lock step *
- decoupled access/execute (DAE) *
- dataflow node *
- fine-grain dataflow *
- coarse-grain dataflow *
- copy node *
- tokens *
- firing rule *
- conditional nodes *
- relational nodes *
- interconnection network *
- hypercube (cosmic cube) *
- bus *
- full cross-bar switch *
- tree *
- mesh *
- contention *
- cost *
- latency *