Exam 1 Annotated Buzzwords
Note: buzzwords marked with an asterisk (*) are annotated in the
Fall 2005 Exam 1 Buzzwords.
- science of tradeoffs
- My tongue in cheek phrase to emphasize the importance
of tradeoffs to the discipline of computer architecture. Clearly, computer
architecture is more art than science. Science, we like to think, involves
a coherent body of knowledge, even though we have yet to figure out all the
connections. Art, on the other hand, is the result of individual expressions
of the various artists. Since each computer architecture is the result of the
individual(s) who specified it, there is no such completely coherent structure.
So, I opined if computer architecture is a science at all, it is a science of
tradeoffs. In class, we keep coming up with design choices that involve
tradeoffs. In my view, "tradeoffs" is at the heart of computer architecture.
- levels of transformation *
- latency
- how long it takes something to happen. Memory latency -- the time it
takes from when the address is provided until the information stored at that
address is available. We will talk about interrupt latency after the spring
break -- the time it takes from the time some external device requests
service until the time that device starts getting serviced.
- PLA
- Programmable Logic Array. A combinational structure consisting of an
array of AND gates all producing outputs which are fed to an array of OR gates.
The general structure of a PLA sources all external inputs to each AND gate,
making each AND gate able to identify one input combination (row of a truth
table). Also, in the general structure, all AND gate outputs are available
as input to every OR gate. Those actually connected as inputs to an OR gate
are those that produce a 1 in the output column of the truth table of the
relevant logic function. Thus, in principle, any arbitrary logic function
of the external inputs can be implemented as the output of one of the OR gates.
In practical structures, the fan-in (maximum number of inputs) to an OR gate
is fixed, thereby limiting the ability of the PLA to implement all
combinational logic functions.
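The AND-plane/OR-plane structure can be sketched in C. Each AND row recognizes one row of the truth table (a minterm), and an output ORs together the rows whose output column is 1. The majority-of-3 function and all the names below are our own illustration, not any real part.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_INPUTS 3

/* AND plane: row "fires" when the inputs match the row's pattern,
   i.e., each AND gate identifies one row of the truth table. */
static bool and_plane_row(uint8_t inputs, uint8_t pattern) {
    return (inputs & ((1u << NUM_INPUTS) - 1)) == pattern;
}

/* OR plane: OR together the rows connected to this output.
   Here we hard-wire the majority function of 3 inputs, whose truth
   table has 1s in rows 3 (011), 5 (101), 6 (110), and 7 (111). */
static bool pla_majority(uint8_t inputs) {
    static const uint8_t minterms[] = { 3, 5, 6, 7 };
    for (unsigned i = 0; i < sizeof minterms / sizeof minterms[0]; i++)
        if (and_plane_row(inputs, minterms[i]))
            return true;
    return false;
}
```

A real PLA's OR-plane fan-in limit would cap how many minterms one output can collect; the loop above has no such limit.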
- shift and add multiplication algorithm
- Recall your experience with the
multiplication of decimal numbers. We systematically multiply the multiplicand
by each digit of the multiplier and write each corresponding temporary result
right-aligned with the corresponding multiplier digit. Then we add up these
temporary results, producing the product. In multiplying binary numbers, the
only digits are 1 and 0 so the multiplication step consists of simply copying
the multiplicand or writing all zeroes as each temporary result. If we are
multiplying n bit numbers, we get n temporary results. To add these temporary
results properly, we need to take into account their relative alignment.
This can be done in n-1 steps, where each step adds one more temporary
result into a running sum. That is, after step 1, we have correctly added the
temporary results for bits 1 and 2. After step i, we have correctly added
the temporary results for bits 1, 2, ... i+1. The (i+1)th step consists of
SHIFTING the accumulated sum right one bit so as to align it with the
(i+2)th temporary result and then ADDING the two quantities. Thus, the name
shift and add to describe binary multiplication.
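The process above can be sketched in C. This sketch keeps a running sum and shifts the multiplicand left once per multiplier bit, which produces the same relative alignment as shifting the sum right; the function name is ours.

```c
#include <stdint.h>

/* Shift-and-add multiplication of two 16-bit unsigned values.
   Each 1 bit of the multiplier contributes a copy of the multiplicand
   aligned to that bit position; each 0 bit contributes all zeroes. */
static uint32_t shift_and_add_multiply(uint16_t multiplicand,
                                       uint16_t multiplier) {
    uint32_t product = 0;
    uint32_t shifted = multiplicand;   /* multiplicand aligned to bit i */
    for (int i = 0; i < 16; i++) {
        if (multiplier & (1u << i))
            product += shifted;        /* ADD the temporary result */
        shifted <<= 1;                 /* SHIFT for the next bit   */
    }
    return product;
}
```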
- addressability *
- endianness *
- algorithm *
- ISA *
- microarchitecture *
- static instruction stream
- The program as it sits in memory, or rather out
on the disk. Each instruction generated by the compiler exists once in the
static instruction stream.
- dynamic instruction stream
- The *sequence* of instructions as they are
executed. A single instruction (located in a single address of memory)
counts once in the static instruction stream. Its count vis-a-vis the
dynamic instruction stream is the number of times it is executed. The
dynamic instruction stream (often called a *trace*) gives a record of the
order that the instructions were carried out.
- state of machine
- Computer processing is all about systematically changing
the state of the machine in response to each instruction in the dynamic
execution of the program. That is, a program executes by the machine being
in an initial state, and one by one each instruction execution changes the
state in accordance with the semantics (specification) of the instruction.
The machine state consists of the PC, the PSR, the general purpose registers,
the condition codes (if the ISA specifies condition codes), the contents of
each memory location, and the mapping registers that enable the memory
management process. More sophisticated ISAs provide additional state.
- design point
- a specification of the criteria to be optimized in the
design process. If we are interested in performance, or reliability, or
availability, or low cost, we colloquially say our design point is
performance, or reliability, or ...etc.
- assembly language
- a language very close in semantics to the ISA of the
machine, with the advantage that bit patterns are replaced by mnemonic
devices, and some additional useful features are often present. Nonetheless,
the defining characteristic of an assembly language is that the level of
control over changing the machine state is very much like what one would
have if one programmed in 0s and 1s. Also, sometimes called a "low level"
language to distinguish it from "high level" languages that are more tuned
to the expressiveness of humans. High level languages (more than 1000 out
there): C#, C++, Java, C, all the way back to antiquity, when Fortran and
Cobol showed up. Assembly language programs are translated by an assembler
which does little more than replace all the mnemonic devices with 0s and 1s.
It would be a tough job to try to assemble a program written in the assembly
language for ISA A into the 0s and 1s of ISA B. On the other hand, high level
language programs are generally translated by a compiler. Since the
instructions in the high level language are quite a bit removed from the
intricacies of any ISA, it is not difficult to change the ISA that a compiler
translates the high level language program into.
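The "little more than replace mnemonic devices with 0s and 1s" job can be made concrete. The sketch below packs the fields of a register-mode LC-3b-style ADD (opcode 0001, DR, SR1, 000, SR2); the helper name is ours, not any real assembler's.

```c
#include <stdint.h>

/* A sliver of what an assembler does: turn a mnemonic and register
   names into the exact bit pattern the ISA defines.
   Register-mode ADD layout: opcode[15:12]=0001, DR[11:9],
   SR1[8:6], bits[5:3]=000, SR2[2:0]. */
static uint16_t assemble_add(int dr, int sr1, int sr2) {
    return (uint16_t)((0x1 << 12) | (dr << 9) | (sr1 << 6) | sr2);
}
```

So `ADD R1, R2, R3` would assemble to 0x1283 under this layout, and those 16 bits are meaningless to a machine implementing any other ISA.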
- label *
- opcode *
- pseudo opcode *
- symbol table *
- two pass assembler *
- instruction level simulator
- A simulator that changes its state at the
granularity of the instruction. That is, given the machine state and an
instruction, the instruction level simulator produces the state required by
the instruction. No within-instruction states are simulated.
- cycle level simulator
- A simulator that changes its state at the granularity
of the clock cycle. That is, given the machine state and the control signals
asserted during a clock cycle, the cycle level simulator produces the state
required by those control signals.
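A toy instruction-level simulator, to make the granularity point concrete: the state advances one whole instruction at a time, with no within-instruction (clock-cycle) states. The two-field instruction format here is invented for illustration.

```c
#include <stdint.h>

enum { OP_ADDI, OP_HALT };

typedef struct {
    uint16_t pc;        /* program counter          */
    int32_t  reg[4];    /* general purpose registers */
    int      halted;
} MachineState;

typedef struct {
    int op;
    int dr;      /* destination register */
    int imm;     /* immediate operand    */
} Instruction;

/* Given the machine state and an instruction, produce the next
   architectural state required by that instruction -- nothing finer. */
static void step(MachineState *m, const Instruction *prog) {
    Instruction ins = prog[m->pc];
    m->pc++;                              /* fetch: advance PC */
    switch (ins.op) {                     /* decode + execute  */
    case OP_ADDI: m->reg[ins.dr] += ins.imm; break;
    case OP_HALT: m->halted = 1; break;
    }
}

/* Run a 3-instruction demo program to completion. */
static int32_t run_demo(void) {
    const Instruction prog[] = {
        { OP_ADDI, 0, 5 }, { OP_ADDI, 0, 3 }, { OP_HALT, 0, 0 }
    };
    MachineState m = { 0 };
    while (!m.halted)
        step(&m, prog);
    return m.reg[0];
}
```

A cycle-level simulator would instead model each clock cycle's control signals inside `step`, at the cost of speed.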
- address space *
- addressing mode *
- steering bit
- A bit in an instruction that specifies the interpretation of
other bits in the instruction, depending on the value of the steering bit.
For example, bit [5] of the LC-3b ADD instruction is a steering bit, since it
determines the usage of bits[4:0] of the instruction.
- vector operation
- an operation that is performed on a set (or sets) of values
(referred to as vectors). For example, a vector A consists of n values A1,
A2, A3, ..An. And a vector B consists of n values B1, B2, ...Bn. A vector
add of vectors A and B would produce the vector consisting of A1+B1, A2+B2,
... An+Bn. We often refer to this mode as data-parallel, since there are
no dependences between the component adds, and they can all be executed in
parallel.
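The vector add above, written as a scalar loop in C (indexing from 0 rather than 1); the point is that no iteration depends on another, so hardware is free to do them all at once.

```c
/* Data-parallel vector add: each component sum is independent,
   so a vector machine (or SIMD unit) may compute them all at once. */
static void vector_add(const int *a, const int *b, int *c, int n) {
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];   /* no dependence between iterations */
}

/* Small demo returning one element of the result. */
static int vector_add_demo(int i) {
    int a[3] = {1, 2, 3}, b[3] = {10, 20, 30}, c[3];
    vector_add(a, b, c, 3);
    return c[i];
}
```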
- prefetching
- Fetching instructions or data early so that they are available
quickly when they are needed. For example, most microprocessors have units
of storage called L1 caches that can be accessed in one or two cycles. It is
useful during instruction fetch or data fetch to be able to access this element
in one or two cycles. This can be accomplished by obtaining the element from
a much slower unit of storage before it is really needed. We call that access
in advance of when needed *prefetching.*
- natural word length
- the generic size of elements processed in the machine.
Generally, the size of the registers. Generally, the size of values processed
by the ALU in a single cycle.
- multi-dimensional array
- a classical mathematical structure, having multiple
dimensions, in which each element is identified by one index per dimension.
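For a concrete case, a two-dimensional R x C array stored in row-major order puts element [i][j] at linear offset i*C + j; the helper below (our own, for illustration) just makes the compiler's address arithmetic visible.

```c
/* Row-major layout of a 2-D array: element [i][j] of an array with
   ncols columns lives at linear offset i*ncols + j from the base.
   This is the address arithmetic a compiler emits for A[i][j]. */
static int offset_2d(int i, int j, int ncols) {
    return i * ncols + j;
}
```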
- dynamic/static interface
- the interface between when the compilation process
ends, and the run time process begins. The compiler translates the source
program producing 0s and 1s that specify the work to be done. The hardware
carries out the work of those 0s and 1s at run time. Although it is now
changing, the compiler never used to have run-time behavior available to it
to make its decisions. It did not know which cache accesses missed, which
branches were actually taken, how long certain operate instructions took.
It is true that compilers have for a long time, as part of the compilation
process, observed what happened when the program ran on some sample data.
This is called *profiling.* But that data is not the *real* data, and in
fact can lead to decisions that could be counterproductive, depending on how
close to the sample data the real data is. We refer to the DSI as the
interface between when all activity is done without real knowledge of real
run-time data, and when all activity has that knowledge.
- fixed/variable length instruction *
- uniform decode *
- load/store ISA *
- 0,1,2,3 address machine *
- data type *
- indirect addressing
- An addressing mode, where the address (A) of the source
data is contained in a memory location (B), the address of B is computed by
the addressing mode. We say that B is a pointer variable because its contents
is not data but rather the address A of the data.
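In C terms, with a flat array standing in for memory (our invention), an indirect load takes two accesses: one to fetch the address A stored at B, and one to fetch the data at A.

```c
#include <stdint.h>

/* Indirect load: location B holds not data but the address A of the
   data (B is a pointer variable). Two memory accesses are needed. */
static int32_t load_indirect(const int32_t *mem, int b) {
    int32_t a = mem[b];     /* first access: read the pointer at B */
    return mem[a];          /* second access: read the data at A   */
}
```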
- unaligned access *
- arithmetic/logic shift *
- condition codes *
- virtual machine (emulation)
- An ISA is implemented in hardware. Each
instruction in that ISA is fetched, decoded, etc. and at each step of the
instruction cycle, control signals are generated to carry out the phases of
that instruction. What if you wanted a program compiled into the ISA of
machine A to be executed on a machine that has a different ISA, say ISA B.
Clearly, the 0s and 1s of the ISA of A have no meaning with respect to the
machine of ISA B. We call the ISA of machine A a virtual machine because
it does not correspond to real hardware in our situation. What is needed
is a program (usually called an emulator) that translates each instruction
of ISA A into a sequence of instructions from the ISA B that can then be
carried out on the machine of ISA B and produce the same result as it would
have if the ISA A instruction had been carried out on a machine of ISA A.
We often refer to ISA B as the host machine since that is where the results
are actually being produced. And ISA A the target machine since that is the
code produced by the compiler from the original program. The concept has been
around for a very long time, although it is lately gaining a lot of popularity
both in research universities and product development centers. An early
example was the Burroughs 1700 of the 1970s. Programs were compiled to
three different virtual machines, depending on whether the nature of the code
was scientific, business, or system level in nature. The resulting three
different ISAs were then emulated on the host B1700 machine. More recently,
there have been plenty of Java programs that have been compiled into an ISA
referred to as the Java Virtual Machine. The result is that other companies
have produced software emulators to translate each JVM instruction into the ISA
of its own host machine. So, for example, a Pentium M can run JVM instructions
if it wishes by translating each JVM instruction into a sequence of x86
instructions that does that job.
An analogy to virtual memory is in order. Like virtual memory, a virtual
machine exists in the eye of the beholder. Recall a virtual address has to
be translated to a physical address before it can be accessed. A virtual
machine instruction has to be translated to a sequence of instructions
in the host ISA before it can be carried out.
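A minimal emulator loop, to show the shape of the translation: each target (virtual-machine) instruction is fetched, decoded, and carried out by host code. The two-instruction stack ISA below is entirely made up for illustration; real emulators (e.g. for the JVM) follow the same fetch-decode-dispatch pattern.

```c
enum { VM_PUSH, VM_ADD, VM_END };

/* Emulate a tiny stack-machine "target ISA" on the host: each target
   instruction is decoded and carried out by a sequence of host
   operations, producing the same result the target machine would. */
static int emulate(const int *code) {
    int stack[16], sp = 0;
    for (int pc = 0; ; ) {
        switch (code[pc++]) {               /* fetch + decode        */
        case VM_PUSH: stack[sp++] = code[pc++]; break;
        case VM_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case VM_END:  return stack[sp - 1]; /* result on stack top   */
        }
    }
}
```

For example, the target program PUSH 2, PUSH 3, ADD computes 5 entirely via host instructions.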
- VLIW *
- flow dependency
- the dependency between a result produced by one instruction
that is needed by another instruction. For example, the expression A(B+C).
We can not do the multiply until we do the add. We say there is a flow
dependency between the add and the multiply. A popular well-meaning textbook
refers to this as a RAW hazard, meaning that we can not read the source of the
multiply until after we write the result of the add. I personally do not like
this terminology because it totally distorts the meaning of the word hazard.
We will deal with this after Spring break, so you can ignore this for now.
- RISC
- In 1980, the term RISC was coined to characterize simple ISAs.
The term was an acronym for Reduced Instruction Set Computer, but was
ill-advised on a lot of levels, which we may go into late in the semester
if there is time. Suffice it to say right now, that the intent was to have
each instruction do a single operation in a very streamlined way. SPARC,
MIPS, HPPA, Alpha, Motorola 88000, and PowerPC are all examples of ISAs that
attempted more or less to follow that mantra. x86, Motorola 68000, VAX, and
IBM zSeries are all examples of ISAs that did not. The acronym czars referred
to the "other" ISAs as Complex Instruction Set Computers. For example,
the x86 which can load from memory, add to a register and store back to memory
in a single x86 instruction -- three distinct operations. ...which led their
advocates to shoot back: CISC stood for Comprehensive Instruction Set Computer.
- Load/Store architecture
- The term refers to ISAs that require separate load
and store instructions to move data to and from memory. That is a load/store
ISA does not allow operate instructions to operate on data loaded from memory
in the same instruction. Nor, does it allow the result of an operate
instruction to be written to memory in the same instruction. So-called RISC
ISAs were in general Load/Store. So-called CISC ISAs were in general not.
- semantic gap *
- control signals
- signals that control the processing within the data path, the
memory, and the microsequencer. The set of control signals that are effective
during a single clock cycle are often characterized as a microinstruction.
- datapath *
- critical path (speed path) design *
- microsequencer *
- control store *
- microinstruction (microcode) *
- pipelining (intra-instruction parallelism) *
- interrupt *
- exception *
- cache memory *
- atomic unit
- a unit that can not be further partitioned without losing its
intrinsic identity. It takes its meaning from chemistry where an atom consists
of protons, neutrons, and electrons, but once partitioned into these particles,
ceases to have the identifying characteristics of the atom. Oxygen is no
longer oxygen when its 8 electrons and 8 protons are lopped off. We say an
instruction is the atomic unit
of processing since we execute code at that lowest granularity. That is, we
never execute a piece of an instruction. Either the whole instruction or none
of it. That is, we do not change the machine state corresponding to a piece
of an instruction and leave the machine state corresponding to another piece
of the same instruction unchanged.
- SRAM *
- DRAM *
- page mode *
- RAS *
- CAS *
- byte write
- Although memory is usually byte addressable, the memory bus is
usually multiple bytes wide. A byte write is a memory access which stores
one byte of information into a designated byte-addressable location while
ignoring the rest of the bytes on the memory bus. Implementation requires
write enable signals of byte granularity.
- soft error (Updated: 03/29/2007)
A soft error is an error that is not hard. Duh.
So, what does that mean? We distinguish errors as transient errors
and permanent errors. A transient error is one that occurs due to some
event that for purposes of the computer is random. Someone started up
a washing machine in the same room, and there is not enough noise margin.
Or, some alpha particle got in the way, which gets more and more common
as the clock gets more and more into the GHz range. The idea is that if
you replay the program, the error will not show up the second time.
The other kind of error is the permanent error. Here the device is broken
and always gives you the bad result.
Permanent errors are called hard errors.
Transient errors are called soft errors.
- parity *
- ECC *
- Hamming Code *
- checksum
- the bits used to detect whether a burst error has occurred within
a large transfer of information, where the errors are not statistically
independent. Suppose I wish to transfer N bits of information from X to Y.
A common implementation of checksum error detection is to input the N bits
sequentially into an n-bit linear feedback shift register at X concurrently
with transferring them to Y. After the transfer is complete, the n bits
remaining in the shift register are then sequentially transferred to Y as
well. At the destination Y, the process is repeated and the contents of
the n-bit shift register after N bits have been input is compared with the
last n bits received at Y. If they match, the transfer is assumed to have
gone error free. Typical values for N are in the thousands of bits; typical
values for n are 16 or 32.
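A sketch of the LFSR scheme described above, with a 16-bit register and CRC-16-CCITT-style feedback taps (the tap choice is our illustration, not mandated by the entry): both ends clock the N message bits through the same function and compare the results.

```c
#include <stdint.h>
#include <stddef.h>

/* Clock the message bits through a 16-bit linear feedback shift
   register, one bit per step; the register contents at the end are
   the checksum. Sender and receiver run this identically. */
static uint16_t lfsr_checksum(const uint8_t *data, size_t len) {
    uint16_t reg = 0;
    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {
            int in = (data[i] >> b) & 1;            /* next message bit */
            int feedback = ((reg >> 15) & 1) ^ in;  /* feedback term    */
            reg = (uint16_t)(reg << 1);
            if (feedback)
                reg ^= 0x1021;    /* tap polynomial (CCITT-style)      */
        }
    }
    return reg;
}
```

If the receiver's register matches the n checksum bits it received, the transfer is assumed error free.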
- barrel shifter/rotator (shift matrix rotator)
- a large combinational network
that allows an n-bit value to be rotated (or simply shifted) an arbitrary
number of bit positions. Example, if n is 32, the 32-bit value can be shifted
0, 1, 2, ... 31 bits. The combinational structure could be implemented with
32 32-to-1 muxes, but is more efficiently implemented in multiple layers as
a shift tree, as described in class. Inputs to a barrel shifter are the n-bit
source and a log n bit encoding of the shift amount. Output is the rotated
(or shifted) n-bit result.
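The shift-tree structure can be written directly in C for n = 32: five layers, each either rotating by 2^k or passing its input through, steered by one bit of the 5-bit shift amount.

```c
#include <stdint.h>

/* Barrel rotator as a shift tree: log2(32) = 5 mux layers. Layer k
   rotates by 2^k positions when bit k of the amount is set, else
   passes its input through unchanged. */
static uint32_t rotate_left(uint32_t x, unsigned amount) {
    for (unsigned k = 0; k < 5; k++) {
        unsigned s = 1u << k;                    /* 1, 2, 4, 8, 16     */
        if (amount & s)                          /* layer's steering bit */
            x = (x << s) | (x >> (32 - s));      /* rotate by 2^k      */
    }
    return x;
}
```

Five layers of 2-to-1 muxes replace the 32 32-to-1 muxes of the naive structure, which is the efficiency win described above.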
- interleaving *
- memory bank *
- privilege *
- protection *
- user space *
- system space *
- CAM *
- virtual memory *
- physical memory *
- chip enable *
- write enable
- A signal that must be asserted if a storage structure
is to be written. In the technology we use in 360N, if the signal is
asserted throughout the cycle, the storage device is written at the
end of the cycle.
- page *
- frame *
- address translation *
- mapping virtual to physical addresses *
- page fault *
- working set *
- balance set *
- page table *
- PSR *
- PTE *
- PFN *
- thrashing *
- length register
- A register that is part of the virtual memory management
structure, identifying how many pages are described by the corresponding
page table. This is needed to protect against false translations wherein
the page number computed from the virtual address may correspond to a page
that is not part of the virtual address space that is mapped. For example,
if an executable image consists of 6 pages, the page table will contain 6
PTEs, and the length register will contain the value 5 (pages 0, 1, ...5).
Any virtual address whose bits that identify a page greater than 5 must
therefore be a bad address. This is easy to detect by simply comparing the
page-number bits to the contents of the length register.
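The check itself is one compare, sketched here (field widths are illustrative; 9 offset bits, for instance, would mean 512-byte pages):

```c
#include <stdint.h>
#include <stdbool.h>

/* Length-register bounds check: a virtual page number beyond the
   last page described by the page table is a bad address. With a
   6-page image the length register holds 5, so pages 0..5 pass and
   anything larger faults. */
static bool page_number_ok(uint32_t vaddr, uint32_t length_reg,
                           unsigned page_bits) {
    uint32_t page_number = vaddr >> page_bits;   /* strip the offset */
    return page_number <= length_reg;
}
```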
- page table base register *
- valid bit *
- modified bit *
- reference bit *
- resident *
- access control violation *
- translation not valid *
- context switch *
- TLB *