Department of Electrical and Computer Engineering
University of Texas

EE 360N
Fall, 1999
Y. N. Patt, Instructor
Francis Tseng, Seema Prasad, TAs
Problem Set 4
Due: November 12, 1999

Problem 1. Given the following code segment:

for (i=1; i<=100; i++)
A[i] = (B[i] * C[i] + D[i])/2;

Write Cray-like assembly code to perform the calculation. Then compute the number of cycles required for the code segment to execute on the following machines:

a) Scalar processor
b) Vector processor without chaining, 1 port to memory (1 load or store / cycle)
c) Vector processor with chaining, 1 port to memory
d) Vector processor with chaining, 2 loads and 1 store per cycle

Assume each machine has vector registers of length 64. The multiply, add, and load units are pipelined and take 6, 4, and 11 cycles, respectively, to complete one operation.

Problem 2. Calculate the execution time for the given code on the following models:

a) straightforward execution (like the LC-1)
b) pipeline with scoreboard with one multiplier and one adder
c) Tomasulo's algorithm with sufficient functional units
MUL R3, R1, R2
ADD R5, R4, R3
ADD R6, R4, R1
MUL R7, R8, R9
ADD R4, R3, R7
MUL R10, R5, R6

Each instruction is specified with destination register first. Multiplies take 4 cycles and additions take 2 cycles.

Note: fetch and decode together take one cycle extra.

Problem 3. (Tanenbaum, 4th edition: Appendix A, 10,11,12) Signed decimal numbers consisting of n digits can be represented in n+1 digits without a sign.Positive numbers have 0 as the leftmost digit. Negative numbers are formed by subtracting each each digit from 9. Such numbers are called nine's complement numbers and are analogous to one's complement binary numbers.Express the following as three-digit nine's complement numbers: 6, -2, 100, -14, -1,0.

Determine the rule for addition of nine's complement numbers and then perform the following additions.

0001 0001 9997 9241
+9999 +9998 +9996 +0802
----- ----- ----- -----

Ten's complement is analogous to two's complement.A ten's complement negative number is formed by adding 1 to the corresponding nine's complement number, ignoring the carry. What is the rule for ten's complement addition?

Problem 4: (Tanenbaum, 4th edition: Appendix B, 4,5,6) The following binary floating-point numbers consist of a sign bit, an excess 64, radix 2 exponent, an a 16-bit fraction. Normalize them.

a. 0 1000000 0001010100000001
b. 0 0111111 0000001111111111
c. 0 1000011 1000000000000000

To add two floating point numbers, you must adjust the exponents(by shifting the fraction) to make them the same. Then you can add the fractions and normalize the result if need be.Add the single precision IEEE numbers 3EE00000H and 3D800000H and express the normalized result in hexadecimal.

The Tightwad Computer Company has decided to come out with a machine having 16-bit floating point numbers. The model 0.001 has a floating-point format with a sign bit, 7 bit, excess 64 exponent and 8-bit fraction.Model 0.002 has a sign bit, 5-bit excess 16 exponent and a 10-bit fraction.Both use radix 2 exponentiation.What are the smallest and largest positive normalized numbers on both models? About how many decimal digits of precision does each have? Would you buy either one?

Problem 5:

Booth's algorithm for multiplication, speeds up the shift/add process by consuming 2 bits during each shift/add step. But why stop at 2 bits? Your task: design the data path and control (like the 2bit Booth's algorithm) for consuming up to 4 bits at each step. Each step will be one or two shifts combined with either an add or a subtract.

Control inputs to the data path are: number of bits to be shifted before the add/subtract, whether to add, subtract, or add 0, and the number of bits to be shifted after the add/subtract/zero operation. Also whether to set or clear the *carry-control* bit. Complete the truth table below. Each input combination corresponds to the low-order four bits of the multiplier at the time of the iteration step.

Carry-ctl mpy bits | carry-ctl pre-sh +/-/0 post-sh
--------------------|--------------------------------------
             0 0000 | Clear        0     0      4
             0 0001 | Clear        0     +      4
             0 0010 | Clear        1     +      3
             0 0011 | Set          0     -      2
             0 0100 | Clear        2     +      2
             0 0101 | Clear        0     +      2
             0 0110 | Set          1     -      2
             0 0111 |
             0 1000 |
             0 1001 |
             0 1010 |
             0 1011 |
             0 1100 |
             0 1101 |
             0 1110 | Set          1     -      3
             0 1111 | Set          0     -      4
             1 0000 | Clear        0     +      4
             1 0001 |
             1 0010 |
             1 0011 |
             1 0100 |
             1 0101 |
             1 0110 |
             1 0111 |
             1 1000 |
             1 1001 |
             1 1010 |
             1 1011 |
             1 1100 |
             1 1101 |
             1 1110 |
             1 1111 | Set          0     0      4

Your job:
--------

Step 1: Design the data path for this functional unit.
Step 2: Complete the output columns of the truth table above, which provides the control signals.
Step 3: Compute the average number of bits consumed on each iteration of an integer multiply instruction. (Hint: more than 3)

Problem 6:

Write an assembly program to add two 32-bit integers. Assume the LC-2 architecture has the following features:

a. A carry bit in addition to the condition codes.
b. A new instruction in the instruction set namely,
ADDC R1, R2, R3

which adds the contents of R1 and R2 together with the carry of the previous operation to generate the sum in R1 along with the carry resulting from the operation.