EE 460N - Problem Study Questions Solutions

Department of Electrical and Computer Engineering

The University of Texas at Austin

EE 460N, Fall 2014
Study Questions Solutions
Yale N. Patt, Instructor
Stephen Pruett, Emily Bragg, Siavash Zangeneh, TAs

The truth table to implement the logic is as follows

4LSB of Multiplier	1st stage LSHF	operation performed	2nd stage LSHF	Borrow-Out	Total Bits processed
0000	0	N/A	4	0	4
0001	0	ADD	4	0	4
0010	1	ADD	3	0	4
0011	0	SUB	2	1	2
0100	2	ADD	2	0	4
0101	0	ADD	2	0	2
0110	1	SUB	2	1	3
0111	0	SUB	3	1	3
1000	3	ADD	1	0	4
1001	0	ADD	3	0	3
1010	1	ADD	2	0	3
1011	0	SUB	2	1	2
1100	2	SUB	2	1	4
1101	0	ADD	2	0	2
1110	1	SUB	3	1	4
1111	0	SUB	4	1	4

Average number of bits processed in a cycle = (4+4+4+2+4+2+3+3+4+3+3+2+4+2+4+4)/16 = 3.25

1. -3.25
2. 1.25 x 2^(-128)
3. Negative Infinity

The exponenet field has a value 63. Therefore the exponent is 63-63=0. The value is 1.0000001111111111 * 2^0. This value is 1.0156.

The bias for the 1-8-23 IEEE format is 127.

  3EE00000h = 0 01111101 11000000000000000000000 = 1.11 * 2^-2
  3D800000h = 0 01111011 00000000000000000000000 = 1.0 * 2^-4

Adjusting exponents, we get

        1.11 * 2^(-2)
        0.01 * 2^(-2)


Since the exponents are now the same, we can add the two
fractions. The result has the same exponent as the operands. The
result is: 10.00 * 2^(-2). When normalized to IEEE format, this
becomes 1.00 * 2^(-1),

IEEE format: 0 01111110 00000000000000000000000 which in hex is
3F000000h.

Model 0.001: 1-7-8
Smallest positive normalized number is 0 0000001 00000000 = 1.0 * 2^(1-63) = 2^-62
Largest postive normalized number is 0 1111110 11111111 = 1.11111111 * 2^(126-63) = 1.11111111 * 2^63

This model has (8+1) binary bits of precision. Since 1024 (2^10) is
approximately 1000 (10^3); 10 binary bits of precision correspond to
about 3 decimal digits of precision. Hence, this model has 9 * (3/10)
= 2.7 decimal digits of precision

Model 0.002: 1-5-10
Smallest positive normalized number is 0 00001 0000000000 = 1.0 * 2^(1-15) = 2^-14
Largest postive normalized number is 0 11110 1111111111 = 1.1111111111 * 2^(30-15) = 1.1111111111 * 2^15

This model has (10+1) binary bits of precision. Hence, this model has
11 * (3/10) = 3.3 decimal digits of precision

The model chosen would depend on the application - if a higher range
is desired, then model 0.001 is better; if precision is more
important, then model 0.002 will be preferred.

5 bits
1

 
      48       0 110 10000
      19.5     0 101 00111
      5/16     0 000 01010
      0        0 000 00000
     -1        1 001 00000
   -infinity   1 111 00000

a) Single processor system - using Horner's rule

(((((ax + b)x + c)x + d)x + e)x + f)x + g

Number of operations = 12 (6 multiplies & 6 adds)

Number of time-steps = 12

b) Using 4 processors: time steps = 5, operations = 15.

One way of doing this is:

Step 1	x * x	b * x	d * x	f * x
Step 2	x² * x²	a * x²	dx + e	bx + c
Step 3	ax² * x⁴	(bx +c) * x⁴	(dx + e) * x²	fx + g
Step 4	ax⁶ + bx⁵+ cx⁴	dx³ + ex²+ fx + g
Step 5	ax⁶ + bx⁵+ cx⁴ + dx³ + ex²+ fx + g

An intuitive proof for why it can't be done in less than 5 time steps:

Lets assume it takes x time steps. In the xth time step, only one operation can be performed (either an add or a multiply) to get the final result. In the (x-1)th time step, at the most two operations can be performed, one each to get each of the operands for the operation in step x. In the (x-2)nd time step & other below, possibly 4 units of work can be found. If x = 4, then theres no way to compute all the operands such that only 3 operations are left to be performed after 2 time steps.

c) Speedup = T₁/ T_p = 12/5 = 2.4

The table is shown below:

		k = 2	k = 4	k = 8	k = 64
Number of switches in each level	n/k	n/2	n/4	n/8	n/64
Number of levels (= latency)	log_kn	log₂n	log₄n	log₈n	log₆₄n
Cost	k²* n/k * log_kn = nklog_kn	2nlog₂n	4nlog₄n	8nlog₈n	64nlog₆₄n

Reasonable choice depends on whether lower cost, latency, or contention is desired.

For n=64, Using k=64 yields the lowest latency, and k=2 or k=4 yields the lowest cost. If we would like low latency we would choose k to be 64. If we would like lowest cost, we would choose k to be 2 or 4. If we would like to minimize contention, we would choose k to be 64.

Another approach could be to minimize the cost-latency product, in which case we would choose k=8 for n=64.