4LSB of Multiplier | 1st stage LSHF | operation performed | 2nd stage LSHF | Borrow-Out | Total Bits processed |
0000 | 0 | N/A | 4 | 0 | 4 |
0001 | 0 | ADD | 4 | 0 | 4 |
0010 | 1 | ADD | 3 | 0 | 4 |
0011 | 0 | SUB | 2 | 1 | 2 |
0100 | 2 | ADD | 2 | 0 | 4 |
0101 | 0 | ADD | 2 | 0 | 2 |
0110 | 1 | SUB | 2 | 1 | 3 |
0111 | 0 | SUB | 3 | 1 | 3 |
1000 | 3 | ADD | 1 | 0 | 4 |
1001 | 0 | ADD | 3 | 0 | 3 |
1010 | 1 | ADD | 2 | 0 | 3 |
1011 | 0 | SUB | 2 | 1 | 2 |
1100 | 2 | SUB | 2 | 1 | 4 |
1101 | 0 | ADD | 2 | 0 | 2 |
1110 | 1 | SUB | 3 | 1 | 4 |
1111 | 0 | SUB | 4 | 1 | 4 |
Average number of bits processed in a cycle = (4+4+4+2+4+2+3+3+4+3+3+2+4+2+4+4)/16 = 3.25
The bias for the 1-8-23 IEEE format is 127. 3EE00000h = 0 01111101 11000000000000000000000 = 1.11 * 2^-2 3D800000h = 0 01111011 00000000000000000000000 = 1.0 * 2^-4 Adjusting exponents, we get 1.11 * 2^(-2) 0.01 * 2^(-2) Since the exponents are now the same, we can add the two fractions. The result has the same exponent as the operands. The result is: 10.00 * 2^(-2). When normalized to IEEE format, this becomes 1.00 * 2^(-1), IEEE format: 0 01111110 00000000000000000000000 which in hex is 3F000000h.
Model 0.001: 1-7-8 Smallest positive normalized number is 0 0000001 00000000 = 1.0 * 2^(1-63) = 2^-62 Largest postive normalized number is 0 1111110 11111111 = 1.11111111 * 2^(126-63) = 1.11111111 * 2^63 This model has (8+1) binary bits of precision. Since 1024 (2^10) is approximately 1000 (10^3); 10 binary bits of precision correspond to about 3 decimal digits of precision. Hence, this model has 9 * (3/10) = 2.7 decimal digits of precision Model 0.002: 1-5-10 Smallest positive normalized number is 0 00001 0000000000 = 1.0 * 2^(1-15) = 2^-14 Largest postive normalized number is 0 11110 1111111111 = 1.1111111111 * 2^(30-15) = 1.1111111111 * 2^15 This model has (10+1) binary bits of precision. Hence, this model has 11 * (3/10) = 3.3 decimal digits of precision The model chosen would depend on the application - if a higher range is desired, then model 0.001 is better; if precision is more important, then model 0.002 will be preferred.
48 0 110 10000 19.5 0 101 00111 5/16 0 000 01010 0 0 000 00000 -1 1 001 00000 -infinity 1 111 00000
a) Single processor system - using Horner's rule
(((((ax + b)x + c)x + d)x + e)x + f)x + g
Number of operations = 12 (6 multiplies & 6 adds)
Number of time-steps = 12
b) Using 4 processors: time steps = 5, operations = 15.
One way of doing this is:
Step 1 | x * x | b * x | d * x | f * x |
Step 2 | x2 * x2 | a * x2 | dx + e | bx + c |
Step 3 | ax2 * x4 | (bx +c) * x4 | (dx + e) * x2 | fx + g |
Step 4 | ax6 + bx5 + cx4 | dx3 + ex2 + fx + g | ||
Step 5 | ax6 + bx5 + cx4 + dx3 + ex2 + fx + g |
An intuitive proof for why it can't be done in less than 5 time steps:
Lets assume it takes x time steps. In the xth time step, only one operation can be performed (either an add or a multiply) to get the final result. In the (x-1)th time step, at the most two operations can be performed, one each to get each of the operands for the operation in step x. In the (x-2)nd time step & other below, possibly 4 units of work can be found. If x = 4, then theres no way to compute all the operands such that only 3 operations are left to be performed after 2 time steps.
c) Speedup = T1/ Tp = 12/5 = 2.4
|
|
|
|
||
Number of switches in each level |
|
|
|
|
|
Number of levels (= latency) |
|
|
|
|
|
Cost | k2 * n/k * logkn
|
|
|
|
|
Reasonable choice depends on whether lower cost, latency, or contention is desired.
For n=64, Using k=64 yields the lowest latency, and k=2 or k=4 yields the lowest cost. If we would like low latency we would choose k to be 64. If we would like lowest cost, we would choose k to be 2 or 4. If we would like to minimize contention, we would choose k to be 64.
Another approach could be to minimize the cost-latency product, in which case we would choose k=8 for n=64.