Tue, 10 Sep 2013, 00:48
To my students in 306:

Based on the number of students who stayed after class asking questions today, it is clear we were too rushed at the end. Ergo, this handout dealing with floating point numbers.

Real machines (ISAs) generally have two sizes of floating point data types: 32 bits and 64 bits. The 32 bit variant we talked about in class today: 1 sign bit, 8 exponent bits, 23 fraction bits. The 64 bit data type in the IEEE Floating Point Standard has: 1 sign bit, 11 exponent bits, 52 fraction bits.

In class I used a 6 bit floating point data type to explain some of these concepts: 1 sign bit, 3 exponent bits, and 2 fraction bits. Tonight, I feel generous, and will use a 10 bit floating point data type: 1 sign bit, 4 exponent bits and 5 fraction bits.

Suppose we want to represent the number -99.

First the sign bit:
===================

The sign bit is easy (0 means +, 1 means -). So the sign bit is 1.

Second, the fraction bits:
==========================

The fraction bits are also easy. They just represent the bits of the value. For example, 1100011 represents the magnitude of the number 99. That is,

    1 x 2^6 + 1 x 2^5 + 0 x 2^4 + 0 x 2^3 + 0 x 2^2 + 1 x 2^1 + 1 x 2^0.

I first need to put it in normalized form. The answer: 1.100011 x 2^6. Do you see why? If not, you need to talk to your TA now!

We said our floating point numbers will be 10 bits and only five of the bits are for the fraction. But here we have 7 bits. The 1 to the left of the binary point is not a problem. Normalized form means it must be a 1, so we do not waste a bit storing it. That leaves six bits, .100011, but our 10 bit representation only allows five, so we have to discard the lowest bit since we can not squeeze 6 bits into a 5 bit container. The fraction bits are therefore 10001.

At this point, my representation is: 1 xxxx 10001. I put in xxxx where the exponent code is going to go.
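The sign and fraction steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names FRACTION_BITS, sign_bit and normalize are made up for this sketch, it handles only positive integer magnitudes, and it truncates (rather than rounds) the extra fraction bits, as the -99 example does.

```python
FRACTION_BITS = 5  # the 10-bit format of this handout: 1 sign, 4 exponent, 5 fraction


def sign_bit(value):
    """Return 1 for a negative value, 0 otherwise (hypothetical helper)."""
    return 1 if value < 0 else 0


def normalize(magnitude):
    """Write a positive integer as 1.fraction x 2^exponent.

    Returns (exponent, fraction), where fraction is a string of exactly
    FRACTION_BITS bits. Extra low-order bits are discarded (truncated),
    and short fractions are padded on the right with 0s.
    """
    if magnitude <= 0:
        raise ValueError("this sketch only handles positive integers")
    bits = bin(magnitude)[2:]      # 99 -> '1100011'
    exponent = len(bits) - 1       # the binary point moves left past all but the first bit
    fraction = bits[1:]            # the leading 1 is implied, so it is not stored
    fraction = fraction[:FRACTION_BITS].ljust(FRACTION_BITS, "0")
    return exponent, fraction


exponent, fraction = normalize(99)
print(sign_bit(-99), "xxxx", fraction, "with exponent", exponent)
```

For -99 this gives sign bit 1, fraction 10001, and exponent 6; the exponent still has to be turned into a 4-bit code.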

Third, and finally, the exponent bits:
======================================

The four bit exponent code ranges from 0000 to 1111: 16 different codes, as you know. All computer designs that I know of, following in the footsteps of IEEE Arithmetic, use 1111 and 0000 for special purposes.

We use 1111 to represent infinity, a perfectly useful value.

    0 1111 00000 is the representation for + infinity.
    1 1111 00000 is the representation for - infinity.

If the exponent bits are 1111 but the fraction bits are not 00000, then we are not representing infinity. What we are representing is beyond what we need to know in 306. Ask me when you are taking 460N in a couple of years.

We use 0000 to represent numbers that are too small to normalize. We will get to that in a moment.

But first, we want to know how to use 0001 to 1110 for the exponent when the number CAN be expressed in normalized form. Our example, -99, is a number that can be expressed in normalized form.

With four bits of exponent, and removing 0000 and 1111, we have 14 codes left over. We use those 14 codes to represent 14 consecutive exponents. The convention is to use an *excess* code. That is, the exponent being represented is the value of the code MINUS the excess. We usually refer to the excess as the BIAS, and in IEEE Arithmetic, it is the value 0111 (that is, 7) for our four bit exponents. It would be 01111111 (that is, 127) for an 8-bit exponent code, or 01111111111 (that is, 1023) for an 11-bit exponent code.

Back to our 4-bit exponent: we have code values of 0001 (1) to 1110 (14). The BIAS is 0111 (7). Thus the exponent for the code 0001 is 1-7, which is -6. The exponent for the code 1110 is 14-7, which is 7.

AGAIN, we get the exponent by subtracting the BIAS from the code. We get the code by adding the BIAS to the exponent.
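
The two directions of the excess code can be sketched as follows. This is an illustration, not part of any standard library: the function names bias, exponent_to_code and code_to_exponent are assumptions of this sketch. It relies on the pattern seen in the bias values (a 0 followed by all 1s), which for a k-bit exponent field is 2^(k-1) - 1.

```python
def bias(exponent_bits):
    """BIAS for a k-bit exponent field: the pattern 0 followed by k-1 ones."""
    return 2 ** (exponent_bits - 1) - 1


def exponent_to_code(exponent, exponent_bits=4):
    """Add the BIAS to an exponent and return the code as a bit string."""
    code = exponent + bias(exponent_bits)
    all_ones = 2 ** exponent_bits - 1
    # Codes 00...0 and 11...1 are reserved for the special cases.
    if not 1 <= code <= all_ones - 1:
        raise ValueError("exponent cannot be represented in normalized form")
    return format(code, "0{}b".format(exponent_bits))


def code_to_exponent(code, exponent_bits=4):
    """Subtract the BIAS from a code (given as a bit string) to get the exponent."""
    value = int(code, 2)
    all_ones = 2 ** exponent_bits - 1
    if value == 0 or value == all_ones:
        raise ValueError("special-case code, not a normalized exponent")
    return value - bias(exponent_bits)


print(bias(4), bias(8), bias(11))
print(exponent_to_code(6), code_to_exponent("0001"), code_to_exponent("1110"))
```

With the default 4-bit field, exponent +6 maps to the code 1101, and the codes 0001 and 1110 map back to the exponents -6 and +7.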

The complete set of codes and their corresponding exponents is as follows:

    Code    Exponent
    ====    ========
    1111    special case
    1110    +7
    1101    +6
    1100    +5
    1011    +4
    1010    +3
    1001    +2
    1000    +1
    0111     0
    0110    -1
    0101    -2
    0100    -3
    0011    -4
    0010    -5
    0001    -6
    0000    special case

Recall (way up above) the magnitude 99 in normalized form was 1.100011 x 2^6. The exponent is +6. If I add the BIAS (which we agreed was 7), we will get the code. Thus the code is 13, or in binary: 1101.

And we are done! -99 is represented as 1 1101 10001.

Now, going back the other way!
==============================

Suppose we want to reconstruct the floating point number from the 10-bit representation. How do we do it?

Let's look at the ten bit representation: 1 1101 10001.

The leftmost bit gives us the sign: negative.
The low 5 bits give us the fraction that follows the "1.": 1.10001
The remaining 4 bits give us the exponent. We take the code (13), subtract the BIAS (7), and get the exponent 6.

The answer: -1.10001 x 2^6. If we work it out, we have -98. NOT -99. How come? Remember we could only pack 5 bits of fraction into our 5 bit field, so we lost the value represented by that last bit, which in this case was 1.

Another example: +34.
The magnitude is 100010.
Normalized: 1.00010 x 2^5.
The representation: 0 1100 00010

Another example: +13.5.
The magnitude is 1101.1
Normalized: 1.1011 x 2^3.
The representation: 0 1010 10110
Note we only had four fraction bits in our number, but the fraction field has room for five bits, so we filled it up with an extra 0.

Finally, we are ready for the exponent code 0000.
=================================================

(You may want to read the ps at the bottom of this email right now!)

This is for numbers that are too small to represent in normalized form. So, we represent them, not as 1.fraction x 2^exponent, but rather as 0.fraction x 2^(-6), where -6 is our smallest allowable exponent.

An example, the value +3/256. This number in binary is 0.00000011. If we normalize it, we get 1.1 x 2^(-7).
BUT, if we look at our exponent table above, we see we can not represent the exponent -7 in our 4-bit code. Therefore our number is too small to normalize. The BEST I can do is represent it with an exponent of -6, that is: 0.11 x 2^(-6). Note that this number is no longer normalized. It can not be! It is too small! In fact, we have a name for these numbers. They are called SUBNORMAL numbers.

We represent this number: 0 0000 11000
Note we fill in the remaining three bits of our 5 bit fraction with 0.

Now, we wish to reconstruct the value from the representation.

The exponent code 0000 tells us that we have 0.fraction x 2^(-6).
The sign bit 0 and fraction 11000 give us: + 0.11000 x 2^(-6), which is +3/256.

And, now you have the whole story!

Good luck with the first problem set.

Yale Patt

ps. I have a little surprise for you. Understanding the subnormal numbers is not something I will expect of you in EE 306. So, you can completely ignore this part of my email. I include it because there are 400 of you and some of the students in EE 306 just can't wait for EE 460N to understand SUBNORMAL numbers.

See you on Wednesday!
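
For anyone who wants to check the examples in this handout by machine, here is a minimal sketch of the "going back the other way" steps for the 10-bit format, including the 0000 (subnormal) and 1111 (infinity) codes. The function name decode and the constants are assumptions of this sketch, not anything defined in class. It uses Python's Fraction type so that values like 3/256 come out exactly.

```python
from fractions import Fraction

EXP_BITS = 4    # exponent field width in this handout's 10-bit format
FRAC_BITS = 5   # fraction field width
BIAS = 7        # 0111


def decode(bits):
    """Decode a string such as '1 1101 10001' (spaces optional).

    Returns an exact Fraction for finite values, or a float infinity
    for the code 1111 with fraction 00000.
    """
    s = bits.replace(" ", "")
    if len(s) != 1 + EXP_BITS + FRAC_BITS or set(s) - {"0", "1"}:
        raise ValueError("expected exactly 10 bits")
    sign = -1 if s[0] == "1" else 1
    code = int(s[1:1 + EXP_BITS], 2)
    fraction = Fraction(int(s[1 + EXP_BITS:], 2), 2 ** FRAC_BITS)

    if code == 2 ** EXP_BITS - 1:          # exponent code 1111
        if fraction == 0:
            return sign * float("inf")
        raise ValueError("1111 with a nonzero fraction is not covered here")
    if code == 0:                          # subnormal: 0.fraction x 2^(-6)
        return sign * fraction * Fraction(2) ** (1 - BIAS)
    # normalized: 1.fraction x 2^(code - BIAS)
    return sign * (1 + fraction) * Fraction(2) ** (code - BIAS)


for pattern in ("1 1101 10001", "0 1100 00010", "0 1010 10110", "0 0000 11000"):
    print(pattern, "->", decode(pattern))
```

The four patterns decode to -98, 34, 27/2 (that is, 13.5) and 3/256, matching the worked examples.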