Department of Electrical and Computer Engineering

The University of Texas at Austin

EE 360N, Spring 2009
Study Questions Solutions
Yale N. Patt, Instructor
Ramapriyan Chakravarthy, Khubaib, Vivekanand Venugopal, TAs

  1. Multiplicand is 0011011110.
    Cycle Multiplier Temp register
    0
    0001110010
     0000000000----------
    1
    --00011100
     000110111100--------
    2
    ----000111
     00000110111100------
    3
    ------0010
     1111001111011100----
    4
    --------00
     000110001011011100--
    5
    ----------
     00000110001011011100
    The result is 00000110001011011100.

  2. The exponenet field has a value 63. Therefore the exponent is 63-63=0. The value is 1.0000001111111111 * 2^0. This value is 1.0156.

  3. The bias for the 1-8-23 IEEE format is 127.
    
      3EE00000h = 0 01111101 11000000000000000000000 = 1.11 * 2^-2
      3D800000h = 0 01111011 00000000000000000000000 = 1.0 * 2^-4
    
    Adjusting exponents, we get
    
            1.11 * 2^(-2)
            0.01 * 2^(-2)
    
    
    Since the exponents are now the same, we can add the two
    fractions. The result has the same exponent as the operands. The
    result is: 10.00 * 2^(-2). When normalized to IEEE format, this
    becomes 1.00 * 2^(-1),
    
    IEEE format: 0 01111110 00000000000000000000000 which in hex is
    3F000000h.
    
    
  4. Model 0.001: 1-7-8
    Smallest positive normalized number is 0 0000001 00000000 = 1.0 * 2^(1-63) = 2^-62
    Largest postive normalized number is 0 1111110 11111111 = 1.11111111 * 2^(126-63) = 1.11111111 * 2^63
    
    This model has (8+1) binary bits of precision. Since 1024 (2^10) is
    approximately 1000 (10^3); 10 binary bits of precision correspond to
    about 3 decimal digits of precision. Hence, this model has 9 * (3/10)
    = 2.7 decimal digits of precision
    
    Model 0.002: 1-5-10
    Smallest positive normalized number is 0 00001 0000000000 = 1.0 * 2^(1-15) = 2^-14
    Largest postive normalized number is 0 11110 1111111111 = 1.1111111111 * 2^(30-15) = 1.1111111111 * 2^15
    
    This model has (10+1) binary bits of precision. Hence, this model has
    11 * (3/10) = 3.3 decimal digits of precision
    
    The model chosen would depend on the application - if a higher range
    is desired, then model 0.001 is better; if precision is more
    important, then model 0.002 will be preferred.
    
    1. 5 bits
    2. 1
    3.  
            48       0 110 10000
            19.5     0 101 00111
            5/16     0 000 01010
            0        0 000 00000
           -1        1 001 00000
         -infinity   1 111 00000
            
  5. a) Single processor system - using Horner's rule

    (((((ax + b)x + c)x + d)x + e)x + f)x + g

    Number of operations = 12 (6 multiplies & 6 adds)

    Number of time-steps = 12

    b) Using 4 processors: time steps = 5, operations = 15.

    One way of doing this is:

    Step 1 x * x b * x d * x f * x
    Step 2 x2 * x2 a * x2 dx + e bx + c
    Step 3 ax2 * x4 (bx +c) * x4 (dx + e) * x2 fx + g
    Step 4 ax6 + bx5 + cx4 dx3 + ex2 + fx + g
    Step 5 ax6 + bx5 + cx4 + dx3 + ex2 + fx + g

    An intuitive proof for why it can't be done in less than 5 time steps:

    Lets assume it takes x time steps. In the xth time step, only one operation can be performed (either an add or a multiply) to get the final result. In the (x-1)th time step, at the most two operations can be performed, one each to get each of the operands for the operation in step x. In the (x-2)nd time step & other below, possibly 4 units of work can be found. If x = 4, then theres no way to compute all the operands such that only 3 operations are left to be performed after 2 time steps.

    c)  Speedup = T1/ Tp = 12/5 = 2.4

  6. The table is shown below:
     
    k = 2
    k = 4
    k = 8
    k = 64
    Number of switches in each level
    n/k
    n/2
    n/4
    n/8
    n/64
    Number of levels (= latency)
    logkn
    log2n
    log4n
    log8n
    log64n
    Cost k2 * n/k * logkn
    = nklogkn
    2nlog2n
    4nlog4n
    8nlog8n
    64nlog64n

    Reasonable choice depends on whether lower cost, latency, or contention is desired.

    For n=64, Using k=64 yields the lowest latency, and k=2 or k=4 yields the lowest cost. If we would like low latency we would choose k to be 64. If we would like lowest cost, we would choose k to be 2 or 4. If we would like to minimize contention, we would choose k to be 64.

    Another approach could be to minimize the cost-latency product, in which case we would choose k=8 for n=64.