Department of Electrical and Computer Engineering
The University of Texas at Austin

EE 360N
Spring 2001
Y. N. Patt, Instructor
Kameswar Subramaniam, Onur Mutlu, TAs
Problem Set 4
Due: April 16, 2001
 

1. Given the following code segment:

for (i=1; i<=100; i++)
A[i] = (B[i] * C[i] + D[i])/2;
Write Cray-like assembly code to perform the calculation. Then compute the number of cycles required for the code segment to execute on the following machines:
a) Scalar processor
b) Vector processor without chaining, 1 port to memory (1 load or store / cycle)
c) Vector processor with chaining, 1 port to memory
d) Vector processor with chaining, 2 read ports and 1 write port to memory
Assume each machine has vector registers of length 64. The multiply, add, and load units are pipelined and take 6, 4, and 11 cycles, respectively, to complete one operation. Memory is 16-way interleaved.
 
 

2. (Tanenbaum, 4th edition: Appendix A, 10,11,12) Signed decimal numbers consisting of n digits can be represented in n+1 digits without a sign.Positive numbers have 0 as the leftmost digit. Negative numbers are formed by subtracting each each digit from 9. Such numbers are called nine's complement numbers and are analogous to one's complement binary numbers.Express the following as three-digit nine's complement numbers: 6, -2, 100, -14, -1,0.

Determine the rule for addition of nine's complement numbers and then perform the following additions.

 0001  0001  9997  9241
+9999 +9998 +9996 +0802
----- ----- ----- -----


Ten's complement is analogous to two's complement.A ten's complement negative number is formed by adding 1 to the corresponding nine's complement number, ignoring the carry. What is the rule for ten's complement addition?
 

3. (Tanenbaum, 4th edition: Appendix B, 4,5,6) The following binary floating-point numbers consist of a sign bit, an excess 63, radix 2 exponent, an a 16-bit fraction. Normalize them.

a. 0 1000000 0001010100000001
b. 0 0111111 0000001111111111
c. 0 1000011 1000000000000000
To add two floating point numbers, you must adjust the exponents(by shifting the fraction) to make them the same. Then you can add the fractions and normalize the result if need be.Add the single precision IEEE numbers 3EE00000H and 3D800000H and express the normalized result in hexadecimal.

The Tightwad Computer Company has decided to come out with a machine having 16-bit floating point numbers. The model 0.001 has a floating-point format with a sign bit, 7 bit, excess 63 exponent and 8-bit fraction.Model 0.002 has a sign bit, 5-bit excess 15 exponent and a 10-bit fraction.Both use radix 2 exponentiation. Assume that the all 0s and all 1s exponents are reserved similar to the IEEE standard. What are the smallest and largest positive normalized numbers on both models? About how many decimal digits of precision does each have? Would you buy either one?
 

4. Booth's algorithm for multiplication, speeds up the shift/add process by consuming 2 bits during each shift/add step. But why stop at 2 bits? Your task: design the data path and control (like the 2bit Booth's algorithm) for consuming up to 4 bits at each step. Each step will be one or two shifts combined with either an add or a subtract.

Control inputs to the data path are: number of bits to be shifted before the add/subtract, whether to add, subtract, or add 0, and the number of bits to be shifted after the add/subtract/zero operation. Also whether to set or clear the *carry-control* bit. Complete the truth table below. Each input combination corresponds to the low-order four bits of the multiplier at the time of the iteration step.

 Carry-ctl mpy bits | carry-ctl pre-sh +/-/0 post-sh
--------------------|--------------------------------------
             0 0000 | Clear        0     0      4
             0 0001 | Clear        0     +      4
             0 0010 | Clear        1     +      3
             0 0011 | Set          0     -      2
             0 0100 | Clear        2     +      2
             0 0101 | Clear        0     +      2
             0 0110 | Set          1     -      2
             0 0111 |
             0 1000 |
             0 1001 |
             0 1010 |
             0 1011 |
             0 1100 |
             0 1101 |
             0 1110 | Set          1     -      3
             0 1111 | Set          0     -      4
             1 0000 | Clear        0     +      4
             1 0001 |
             1 0010 |
             1 0011 |
             1 0100 |
             1 0101 |
             1 0110 |
             1 0111 |
             1 1000 |
             1 1001 |
             1 1010 |
             1 1011 |
             1 1100 |
             1 1101 |
             1 1110 |
             1 1111 | Set          0     0      4
Your job:
--------
Step 1: Design the data path for this functional unit.
Step 2: Complete the output columns of the truth table above, which provides the control signals.
Step 3: Compute the average number of bits consumed on each iteration of an integer multiply instruction. (Hint: more than 3)

5. Write an assembly program to add two 32-bit integers. Assume that the contents of memory are as follows:
x4000: Low 16 bits of first integer
x4001: High 16 bits of first integer
x4002: Low 16 bits of second integer
x4003: High 16 bits of second integer
x4004: Low 16 bits of the result
x4005: High 16 bits of the result
Assume the LC-2 architecture has the following features:

a. A carry bit in addition to the condition codes.
b. A new instruction in the instruction set namely,
ADDC R1, R2, R3
which adds the contents of R1 and R2 together with the carry of the previous operation to generate the sum in R1 along with the carry resulting from the operation.

6.Suppose we have the following loop executing on a pipelined LC-2 machine.

DOIT     STR   R1, R6, #0
         ADD   R6, R6, #1
         AND   R3, R1, R2
         BRz   EVEN
         ADD   R1, R1, #3
         ADD   R5, R5, #-1
         BRp   DOIT
EVEN     ADD   R1, R1, #1
         ADD   R7, R7, #-1 
         BRp   DOIT

Assume that before the loop starts, the registers are initialized to the following integer values:
R1 <- 0
R2 <- 1
R5 <- 49
R6 <- 4000
R7 <- 49

"Fetch" takes 1 cycle, "Decode" takes 1 cycle, "Execute" stage takes variable number of cycles depending on the type of instruction (see below), and "Store Result" stage takes 1 cycle.

All execution units (including the load/store unit) are fully pipelined and the following instructions that use these units take the indicated number of cycles:

STR: 3 cycles
ADD: 3 cycles
AND: 2 cycles
BR : 1 cycle

For example, the execution of an ADD instruction followed by a BR would look like:

ADD       F | D | E1 | E2 | E3 | ST
BR            F | D  | -  | -  | E1  | ST
TARGET                                 F  | D

This figure shows several things about the structure of the pipeline:

Also, you are given the following information about the pipeline and the ISA:

Answer the following questions: