CSE 283 Introduction to Object Oriented Design

Barbara Nostrand, Ph.D.

Fixed Point and Floating Point Numbers

Computers typically represent numbers in one of three ways. One way is to represent a number as a string of decimal digits. This mechanism is not terribly fast, but it allows for arbitrary ("infinite") precision arithmetic. The IBM 1620 represented all numbers in this way, and many dialects of LISP likewise support arbitrary-precision ("bignum") arithmetic. However, the other two approaches, fixed point and floating point representation, tend to be more efficient than "scratchpad" decimal arithmetic in both space and time.
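As a minimal sketch of this "scratchpad" style, assuming non-negative integers stored as Python strings of decimal digits, the hypothetical function add_decimal_strings below adds two numbers one digit at a time from the right, carrying as we would on paper. Because the digit strings can grow as long as memory allows, no fixed word size limits the precision.

```python
def add_decimal_strings(a: str, b: str) -> str:
    """Add two non-negative integers written as strings of decimal digits."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    # Work from the rightmost (least significant) digit, as on a scratchpad.
    while i >= 0 or j >= 0 or carry:
        digit_a = int(a[i]) if i >= 0 else 0
        digit_b = int(b[j]) if j >= 0 else 0
        total = digit_a + digit_b + carry
        result.append(str(total % 10))  # the digit we write down
        carry = total // 10             # the digit we carry to the next column
        i -= 1
        j -= 1
    # Digits were produced right to left, so reverse them.
    return "".join(reversed(result)) or "0"
```

For example, add_decimal_strings("99999999999999999999", "1") returns "100000000000000000000", a value too large to fit in a 64-bit unsigned integer.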

Fixed Point 2's Complement Numbers

Two's complement numbers are fixed-width bit strings in which the most significant bit (MSB) represents the sign of the number and the least significant bit (LSB) indicates whether the number is even or odd. For an 8-bit 2's complement number, the bit positions carry the following weights:

Bit position:  7 (MSB)  6   5   4   3   2   1   0 (LSB)
Weight:        -128     64  32  16  8   4   2   1
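The weighting can be checked with a short Python sketch. The helper twos_complement_value is hypothetical and assumes the bit string is written MSB first; it adds up the weights of the 1 bits, treating the sign bit's weight as -2^(n-1) (that is, -128 in the 8-bit case).

```python
def twos_complement_value(bits: str) -> int:
    """Interpret a string of '0'/'1' characters (MSB first) as n-bit 2's complement."""
    n = len(bits)
    value = 0
    for position, bit in enumerate(bits):
        weight = 2 ** (n - 1 - position)
        if position == 0:
            weight = -weight  # the sign bit carries a negative weight
        if bit == "1":
            value += weight
    return value
```

With this helper, "00100011" evaluates to 35, "10000000" to -128, and "11111111" to -1.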


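Positive numbers are simply the sum of the weights of the bits that are 1 in the binary representation of the number. Thus, 00100011₂ = 32 + 2 + 1 = 35₁₀. To convert a decimal number to binary, repeatedly divide it by 2, passing each quotient on to the next division and saving each remainder as one bit. The first remainder is the LSB, and the last remainder is the most significant 1 bit. Thus, we can convert the decimal number 137 to binary as follows:

137 ÷ 2 = 68  remainder 1   (bit 0, LSB)
 68 ÷ 2 = 34  remainder 0   (bit 1)
 34 ÷ 2 = 17  remainder 0   (bit 2)
 17 ÷ 2 =  8  remainder 1   (bit 3)
  8 ÷ 2 =  4  remainder 0   (bit 4)
  4 ÷ 2 =  2  remainder 0   (bit 5)
  2 ÷ 2 =  1  remainder 0   (bit 6)
  1 ÷ 2 =  0  remainder 1   (bit 7)

Reading the remainders from last to first and adding a leading 0 sign bit, we have converted the decimal number 137₁₀ to the binary number 010001001₂. Because 137 is larger than 127, the largest positive 8-bit 2's complement value, it needs 9 bits.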
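The repeated-division procedure can be sketched in Python as follows. The helper name to_binary and its width parameter are illustrative assumptions; the range check enforces that a positive number must leave room for a leading 0 sign bit.

```python
def to_binary(n: int, width: int) -> str:
    """Convert a non-negative integer to a width-bit 2's complement string
    by repeated division by 2."""
    if n < 0:
        raise ValueError("expected a non-negative integer")
    # A positive value needs a leading 0 sign bit, so it must be below 2**(width-1).
    if n >= 2 ** (width - 1):
        raise ValueError(f"{n} does not fit in {width}-bit 2's complement")
    remainders = []
    quotient = n
    while quotient > 0:
        quotient, remainder = divmod(quotient, 2)  # pass the quotient on
        remainders.append(str(remainder))          # each remainder is one bit, LSB first
    bits = "".join(reversed(remainders)) or "0"    # the last remainder is the top bit
    return bits.zfill(width)                       # pad on the left with 0s
```

Here to_binary(137, 9) returns "010001001", while to_binary(137, 8) raises ValueError because 137 does not fit in 8-bit 2's complement.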

We haven't yet seen how to encode negative numbers. To encode the negative number -137₁₀, we begin by converting the positive number 137₁₀ to binary. Next, we invert all the bits, changing 1's into 0's and 0's into 1's. Finally, we add 1 to the result. Note that you convert a negative 2's complement number into a positive one in exactly the same way that you convert a positive number into a negative one: in BOTH cases, invert all the bits and add 1.

Although 2's complement has only one representation for zero, an n-bit word has 2^n bit patterns, which is an even number, so once zero is accounted for an odd number of patterns remain for the nonzero values. This means that there is one more negative value available to us than there are positive values; for 8 bits the range is -128 to +127. The bit pattern for this extra value is 1 in the SIGN BIT followed by ALL ZEROS. Here is an example of converting the negative number -97 to 16-bit 2's complement binary:

Convert 97 to binary    0000000001100001
Invert all the bits     1111111110011110
Add 1                   1111111110011111
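The invert-and-add-1 rule is short enough to sketch directly. The helper negate is hypothetical; it operates on bit strings and discards any carry out of the MSB, as fixed-width hardware does. Applying it twice returns the original pattern, and applying it to the extra pattern (1 followed by all zeros) returns that same pattern, since its positive counterpart cannot be represented.

```python
def negate(bits: str) -> str:
    """Negate a 2's complement bit string: invert every bit, then add 1."""
    width = len(bits)
    inverted = "".join("1" if b == "0" else "0" for b in bits)  # 1's -> 0's, 0's -> 1's
    # Add 1, discarding any carry out of the MSB as fixed-width hardware would.
    result = (int(inverted, 2) + 1) % (2 ** width)
    return format(result, f"0{width}b")
```

For the example above, negate("0000000001100001") returns "1111111110011111", and negate("1000000000000000") returns "1000000000000000".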

Floating Point Numbers

Floating point numbers are based on Scientific Notation. At one time, some computers represented these numbers using base 10. Today, nearly all computers use the IEEE 754 Floating Point Standard or a close variant of it. This standard encodes both the mantissa and the exponent of a number in Scientific Notation as base-2 numbers.

$$\text{value} = (-1)^{S} \times 2^{E-127} \times (1.F)_2$$

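(1.F) represents prefixing the unsigned F (mantissa) field with a 1. This is done because, once a number is written in binary Scientific Notation, any binary number other than zero (for which there is a special code) begins with 1, so that leading 1 need not be stored. E represents an unsigned binary exponent field stored with a bias of 127 (hence the E-127 in the formula), and S represents the sign bit. For 32-bit floating point numbers, with bits numbered from 0 at the left, we have:

  Bit:    0    1 ... 8       9 ... 31
  Field:  S    E (8 bits)    F (23 bits)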
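To check the field layout against real hardware, the sketch below uses Python's standard struct module to obtain the 32-bit encoding of a value and split it into S, E, and F, then rebuilds the value with the formula above. The helper names float_fields and value_from_fields are hypothetical, and the rebuild handles normalized numbers only; it ignores the special codes for zero, denormals, infinities, and NaN.

```python
import struct

def float_fields(x: float) -> tuple:
    """Return the (S, E, F) fields of x's 32-bit IEEE 754 encoding."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes as an unsigned int.
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31               # S: leftmost bit
    exponent = (word >> 23) & 0xFF  # E: next 8 bits
    fraction = word & 0x7FFFFF      # F: last 23 bits
    return sign, exponent, fraction

def value_from_fields(sign: int, exponent: int, fraction: int) -> float:
    """Evaluate (-1)^S * 2^(E-127) * (1.F) for a normalized number."""
    significand = 1 + fraction / 2 ** 23  # prefix the implied leading 1 to F
    return (-1) ** sign * 2.0 ** (exponent - 127) * significand
```

For example, float_fields(1.0) returns (0, 127, 0), and value_from_fields(*float_fields(-6.5)) returns -6.5.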
We will look at how to convert numbers to floating point by working through an example. Suppose we want to convert the number 0.3₁₀ to 32-bit floating point. We use the simple identity 0.3 = (0.3 × 10) / 10 = 3 / 10, but we perform the two steps in different ways. First, we turn the number into an integer by shifting its decimal digits one place to the left (multiplying by 10) and convert that integer to binary. Second, we undo the shift by dividing the resulting binary pattern by ten, written in binary as 1010₂. Here is how we convert 0.3 to floating point:

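Step                                      Decimal              Binary
Shift for precision, convert to binary    0.3 × 10^+1 = 3.0    000000000000000000000011
Divide by ten (1010₂)                     3.0 × 10^-1 = 0.3    100110011001100110011001
Truncate the leading 1 bit                                     00110011001100110011001

The quotient is 0.0100110011..._2 = 1.00110011..._2 × 2^-2, so the division row shows its first 24 significant bits, and the exponent field is E = -2 + 127 = 125.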
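The shift-and-divide steps can be sketched in Python with exact integer long division in base 2. The helper significant_bits is hypothetical; it assumes a proper fraction (0 < numerator < denominator) and also reports the power of two of the leading 1, from which the biased exponent follows. As a cross-check, it compares the truncated encoding with what the standard struct module produces. Note that IEEE 754's default rounding mode is round to nearest, ties to even, so hardware rounds the 24th significant bit up and stores 0x3E99999A rather than the truncated 0x3E999999; the truncated value follows the discussion here.

```python
import struct

def significant_bits(numerator: int, denominator: int, count: int) -> tuple:
    """Binary long division of numerator/denominator, assuming 0 < numerator < denominator.

    Returns the first `count` bits starting at the leading 1, plus the power of
    two that the leading 1 represents.
    """
    bits = []
    leading_power = None
    power = 0
    remainder = numerator
    while len(bits) < count:
        remainder *= 2  # bring down the next binary place
        power -= 1      # that place is worth 2**power
        bit, remainder = divmod(remainder, denominator)
        if bit == 1 and leading_power is None:
            leading_power = power  # first 1 bit found: the normalization point
        if leading_power is not None:
            bits.append(str(bit))
    return "".join(bits), leading_power

# 0.3 = 3 / 10: shift left one decimal digit to get 3, then divide by ten (1010 in binary).
bits, leading_power = significant_bits(3, 10, 24)
fraction_field = bits[1:]             # drop the implied leading 1: the 23-bit F field
exponent_field = leading_power + 127  # -2 + 127 = 125
sign = 0
truncated_word = (sign << 31) | (exponent_field << 23) | int(fraction_field, 2)
rounded_word = struct.unpack(">I", struct.pack(">f", 0.3))[0]

print(bits)            # 100110011001100110011001
print(fraction_field)  # 00110011001100110011001
print(hex(truncated_word), hex(rounded_word))  # 0x3e999999 0x3e99999a
```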

Observe that since the result is actually an infinite repeating bit pattern, truncating it on the right to fit into a computer word yields a binary value that is less than the actual desired value. Herein lies the reason that unsigned mantissas (with a separate sign bit) are used instead of 2's complement signed mantissas. With an unsigned mantissa, the floating point representation of a positive number is identical to that of the negative number of the same magnitude except for the sign bit, so truncation moves both values toward zero by the same amount. Consequently, we can add 0.3 to -0.3 and get 0.0 as our result. If we had used 2's complement mantissas instead, truncating the respective bit patterns would have shifted both numbers in the negative direction, and the result would have been non-zero!
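The effect of the two truncation schemes can be demonstrated with exact rational arithmetic. This is a simplified sketch under a stated assumption: it models the mantissa as a plain fixed-point fraction with 23 bits (the hypothetical constant FRACTION_BITS) rather than a full floating point format. Truncating a sign-magnitude value rounds toward zero, while dropping low bits from a 2's complement pattern rounds toward minus infinity.

```python
import math
from fractions import Fraction

FRACTION_BITS = 23  # assumption: a 23-bit fixed-point fraction, mirroring the F field

def truncate_sign_magnitude(x: Fraction) -> int:
    """Keep FRACTION_BITS bits of |x| and reattach the sign: rounds toward zero."""
    return math.trunc(x * 2 ** FRACTION_BITS)

def truncate_twos_complement(x: Fraction) -> int:
    """Dropping low bits of a 2's complement pattern rounds toward minus infinity."""
    return math.floor(x * 2 ** FRACTION_BITS)

x = Fraction(3, 10)
sm_sum = truncate_sign_magnitude(x) + truncate_sign_magnitude(-x)
tc_sum = truncate_twos_complement(x) + truncate_twos_complement(-x)
print(sm_sum, tc_sum)  # 0 -1  (sums in units of 2**-23)
```

The sign-magnitude sum is exactly zero, while the 2's complement sum is off by one unit in the last place (2^-23).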

Last modified: 2007 OCT 15