Tue, 10 Sep 2013, 00:48



To my students in 306:

Based on the number of students who stayed after class asking questions
today, it is clear we were too rushed at the end.  Ergo, this handout 
dealing with floating point numbers.

Real machines (ISAs) generally have two sizes of floating point data types:
32 bits and 64 bits.  The 32 bit variant we talked about in class today:

1 sign bit, 8 exponent bits, 23 fraction bits.

The 64 bit data type in the IEEE Floating Point Standard has:

1 sign bit, 11 exponent bits, 52 fraction bits.

In class I used a 6 bit floating point data type to explain some of these
concepts: 1 sign bit, 3 exponent bits, and 2 fraction bits.

Tonight, I feel generous, and will use a 10 bit floating point data type: 
1 sign bit, 4 exponent bits and 5 fraction bits.  

Suppose we want to represent the number -99.

First the sign bit:
==================

The sign bit is easy (0 means +, 1 means -).  So the sign bit is 1.

Second, the fraction bits:
=========================

The fraction bits are also easy.  They just represent the bits of the value.
For example, 1100011 represents the magnitude of the number 99.  

That is, 1 x 2^6 + 1 x 2^5 + 0 x 2^4 + 0 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0.

I first need to put it in normalized form.  The answer:

1.100011 x 2^6.  Do you see why?  If not, you need to talk to your TA now!

We said our floating point numbers will be 10 bits and only five of the bits 
are for the fraction.  But here we have 7 bits.  The 1 to the left of the 
binary point is not a problem.  Normalized form means it must be a 1 so we 
do not waste any bits in its representation.  That leaves six bits .100011, 
but our 10 bit representation only allows five, so we have to discard the 
lowest bit since we can not squeeze 6 bits into a 5 bit container.  The 
fraction bits are therefore 10001.
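
If it helps to see that step as code, here is a minimal Python sketch.  The
function name fraction_bits is made up for this illustration, and it assumes
a positive integer magnitude and that bits which do not fit are simply
discarded, as described above (no rounding).

```python
def fraction_bits(magnitude, width=5):
    """Return (exponent, fraction field) for a positive integer magnitude.

    Assumption: low-order bits that do not fit are discarded (truncated),
    not rounded.
    """
    bits = bin(magnitude)[2:]      # 99 -> '1100011'
    exponent = len(bits) - 1       # places the binary point moves left
    after_point = bits[1:]         # drop the leading 1; it is implied
    # Pad with 0s on the right if too short, cut on the right if too long.
    field = after_point.ljust(width, "0")[:width]
    return exponent, field


print(fraction_bits(99))   # (6, '10001')
```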

At this point, my representation is: 1 xxxx 10001.

I put in xxxx where the exponent code is going to go.

Third, and finally, the exponent bits:
=====================================

The four bit exponent code ranges from 0000 to 1111: 16 different codes, as
you know.  All computer designs that I know of, following in the footsteps of
IEEE Arithmetic, use 1111 and 0000 for special purposes.

We use 1111 to represent infinity, a perfectly useful value.

0 1111 00000 is the representation for + infinity.
1 1111 00000 is the representation for - infinity.

If the exponent bits are 1111 but the fraction bits are not 00000, then 
we are not representing infinity.  What we are representing is beyond what
we need to know in 306.  Ask me when you are taking 460N in a couple of years.

We use 0000 to represent numbers that are too small to normalize.  We will
get to that in a moment.  But first, we want to know how to use 0001 to 1110 
for the exponent when the number CAN be expressed in normalized form.  Our 
example, -99 is a number that can be expressed in normalized form.  
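
The cases so far can be sorted out by looking at the exponent code first.
The sketch below is only an illustration: the name classify and the label
strings are made up, and it assumes the ten bits are written with spaces
between the three fields, the way they are written in this handout.

```python
def classify(bits):
    """Say which case a pattern such as '0 1111 00000' falls into."""
    sign, code, fraction = bits.split()
    if code == "1111":
        if fraction == "00000":
            return "-infinity" if sign == "1" else "+infinity"
        return "special (not covered in this handout)"
    if code == "0000":
        # Too small to normalize; this case also covers zero.
        return "too small to normalize"
    return "normalized"              # codes 0001 through 1110


print(classify("0 1111 00000"))   # +infinity
print(classify("1 1101 10001"))   # normalized
```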

With four bits of exponent, and removing 0000 and 1111, we have 14 codes left
over.  We use those 14 codes to represent 14 consecutive exponents.  The
convention is to use an *excess* code.  That is, the exponent being represented
is the value of the code MINUS the excess.  That is, if we subtract the excess
from the value of the code, we get the exponent.  We usually refer to the excess
as the BIAS, and in IEEE Arithmetic, it is the value 0111 (that is, 7) for 
our four bit exponents.  It would be 01111111 (that is, 127) for our 8-bit
exponent code.  Or, 01111111111 (that is, 1023) for our 11-bit exponent.

Back to our 4-bit exponent, we have code values of 0001 (1) to 1110 (14).  The
BIAS is 0111 (7).  Thus the exponent for the code 0001 is 1-7, which is -6.
The exponent for the code 1110 is 14-7, which is 7.  AGAIN, we get the exponent
by subtracting the BIAS from the code.  We get the code by adding the BIAS to
the exponent.
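
The two directions can be sketched in a few lines of Python.  The function
names are made up for this illustration.  The formula 2^(k-1) - 1 for a k-bit
exponent field is the IEEE convention, and it gives 7 for our four bits.

```python
def bias(exponent_bits):
    # IEEE convention: bias = 2^(k-1) - 1 for a k-bit exponent field.
    return 2 ** (exponent_bits - 1) - 1


def code_to_exponent(code, exponent_bits=4):
    # Exponent = code MINUS the bias.
    return code - bias(exponent_bits)


def exponent_to_code(exponent, exponent_bits=4):
    # Code = exponent PLUS the bias.
    return exponent + bias(exponent_bits)


for k in (4, 8, 11):
    print(k, bias(k), format(bias(k), "b").zfill(k))
# 4 7 0111
# 8 127 01111111
# 11 1023 01111111111

print(code_to_exponent(1), code_to_exponent(14))   # -6 7
```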

The complete set of codes and their corresponding exponents is as follows:

Code Exponent
==== ========

1111  special case
1110  +7
1101  +6
1100  +5
1011  +4
1010  +3
1001  +2
1000  +1
0111   0
0110  -1
0101  -2
0100  -3
0011  -4
0010  -5
0001  -6
0000  special case


Recall (way up above) the magnitude 99 in normalized form was 1.100011 x 2^6.

The exponent is +6.  If I add the BIAS (which we agreed was 7), we will get
the code.  Thus the code is 13, or in binary: 1101.

And we are done!  -99 is represented as 1 1101 10001.
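
Putting the three steps together, here is a minimal Python sketch of the
whole procedure.  The name encode_normalized is made up.  It assumes the
value is not zero, that its exponent fits one of the 14 normalized codes,
and that extra fraction bits are discarded rather than rounded.  Fraction
from the standard library keeps the arithmetic exact.

```python
from fractions import Fraction


def encode_normalized(value, exp_bits=4, frac_bits=5):
    """Encode a nonzero value that fits the normalized range.

    Assumptions: extra fraction bits are truncated (not rounded), and
    the exponent must land on a code other than all 0s or all 1s.
    """
    value = Fraction(value)
    sign = "1" if value < 0 else "0"
    magnitude = abs(value)
    bias = 2 ** (exp_bits - 1) - 1

    # Normalize: scale until 1 <= magnitude < 2, counting the exponent.
    exponent = 0
    while magnitude >= 2:
        magnitude /= 2
        exponent += 1
    while magnitude < 1:
        magnitude *= 2
        exponent -= 1

    code = exponent + bias
    if not 1 <= code <= 2 ** exp_bits - 2:
        raise ValueError("exponent does not fit a normalized code")

    # Drop the implied leading 1, then keep frac_bits bits (truncating).
    fraction = int((magnitude - 1) * 2 ** frac_bits)
    return f"{sign} {code:0{exp_bits}b} {fraction:0{frac_bits}b}"


print(encode_normalized(-99))   # 1 1101 10001
```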

Now, going back the other way!
=============================

Suppose we want to construct the floating point number from the 10-bit 
representation, how do we do it?  Let's look at the ten bit representation:

1 1101 10001.

The left most bit gives us the sign: negative.
The low 5 bits give us the fraction, which follows the implied leading 1: 1.10001
The remaining 4 bits give us the exponent.  We take the code (13) and subtract
the BIAS (7) and get the exponent 6.

The answer: -1.10001 x 2^6.  If we work it out, we have -98.  NOT -99.  How come?
Remember we could only pack 5 bits of fraction into our 5 bit field, so we lost
the value represented by that last bit, which in this case was 1.
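
Going this direction can also be sketched in Python.  The name
decode_normalized is made up, and the sketch assumes the exponent code is
one of 0001 through 1110 (no special cases).

```python
from fractions import Fraction


def decode_normalized(bits, exp_bits=4, frac_bits=5):
    """Decode a pattern such as '1 1101 10001' (normalized codes only)."""
    bits = bits.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    code = int(bits[1:1 + exp_bits], 2)
    fraction = int(bits[1 + exp_bits:], 2)
    bias = 2 ** (exp_bits - 1) - 1
    # Put the implied leading 1 back in front of the fraction.
    significand = 1 + Fraction(fraction, 2 ** frac_bits)
    return sign * significand * Fraction(2) ** (code - bias)


print(decode_normalized("1 1101 10001"))   # -98
```

The result is -98 and not -99, because the discarded sixth fraction bit is
not in the representation to be recovered.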

Another example: +34.  The magnitude is 100010.  Normalized: 1.00010 x 2^5.

The representation: 0 1100 00010

Another example: +13.5.  The magnitude is 1101.1.  Normalized: 1.1011 x 2^3.

The representation: 0 1010 10110

Note we only had four fraction bits in our number, but the fraction field had
room for five bits so we filled it up with an extra 0.


Finally, we are ready for the exponent code 0000.  
================================================

(You may want to read the ps at the bottom of this email right now!)

This is for numbers that are too small to represent in normalized form.
So, we represent them, not as 

1.fraction x 2^exponent, but rather as 

0.fraction x 2^(-6), our smallest allowable exponent.

An example, the value +3/256.  This number in binary is 0.00000011.  If we
normalize it, we get 

1.1 x 2^(-7).

BUT, if we look at our exponent table above, we see we can not represent
the exponent -7 in our 4-bit code.  Therefore our number is too small to
normalize.  The BEST I can do is represent it with an exponent of -6, that is:

0.11 x 2^(-6).

Note that this number is no longer normalized.  It can not be!  It is too small!
In fact, we have a name for these numbers.  They are called SUBNORMAL numbers.
We represent this number: 0 0000 11000

Note we fill in the remaining three bits of our 5 bit fraction with 0.

Now, we wish to reconstruct the value from the representation.  The exponent
0000 tells us that we have

0.fraction x 2^(-6)

The sign bit 0 and fraction 11000 gives us: + 0.11000 x 2^(-6), which is +3/256.
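
For those who want to check the subnormal case by machine, here is a minimal
Python sketch for our 10-bit format.  The names encode_subnormal and
decode_subnormal are made up, and extra fraction bits are assumed to be
discarded rather than rounded.

```python
from fractions import Fraction

EXP_BITS, FRAC_BITS = 4, 5
BIAS = 2 ** (EXP_BITS - 1) - 1     # 7
MIN_EXPONENT = 1 - BIAS            # -6, the smallest allowable exponent


def encode_subnormal(value):
    """Encode a value that is too small to normalize."""
    value = Fraction(value)
    sign = "1" if value < 0 else "0"
    # value = 0.fraction x 2^(-6), so 0.fraction = value x 2^6.
    scaled = abs(value) * Fraction(2) ** (-MIN_EXPONENT)
    if scaled >= 1:
        raise ValueError("value is large enough to normalize")
    fraction = int(scaled * 2 ** FRAC_BITS)    # truncate extra bits
    return f"{sign} {'0' * EXP_BITS} {fraction:0{FRAC_BITS}b}"


def decode_subnormal(bits):
    """Decode a pattern whose exponent code is 0000."""
    bits = bits.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    fraction = int(bits[1 + EXP_BITS:], 2)
    # No implied leading 1 here: the value is 0.fraction x 2^(-6).
    return sign * Fraction(fraction, 2 ** FRAC_BITS) * Fraction(2) ** MIN_EXPONENT


print(encode_subnormal(Fraction(3, 256)))   # 0 0000 11000
print(decode_subnormal("0 0000 11000"))     # 3/256
```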
  
And, now you have the whole story!

Good luck with the first problem set.

Yale Patt


ps. I have a little surprise for you.  Understanding the subnormal numbers is
not something I will expect of you in EE 306.  So, you can completely ignore
this part of my email.  I include it because there are 400 of you and some of
the students in EE 306 just can't wait for EE 460N to understand SUBNORMAL 
numbers.  See you on Wednesday!