IEEE floating-point standard

The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) is the most widely used standard for floating-point computation, and is followed by many CPU and FPU implementations. The standard defines formats for representing floating-point numbers (including ±zero and denormals) and special values (infinities and NaNs), together with a set of floating-point operations that operate on these values. It also specifies four rounding modes and five exceptions, including when the exceptions occur and what happens when they do.
IEEE 754 specifies four formats for representing floating-point values: single precision (32-bit), double precision (64-bit), single-extended precision (≥ 43 bits, not commonly used) and double-extended precision (≥ 79 bits, usually implemented with 80 bits). Only the 32-bit format is required by the standard; the others are optional. Many languages specify that IEEE formats and arithmetic be implemented, although sometimes it is optional. For example, the C programming language, which predated IEEE 754, now allows but does not require IEEE arithmetic (the C float type is typically used for IEEE single precision and double for IEEE double precision).
The full title of the standard is IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985), and it is also known as IEC 60559:1989, Binary floating-point arithmetic for microprocessor systems (originally the reference number was IEC 559:1989).[1] (http://www.opengroup.org/onlinepubs/009695399/frontmatter/refdocs.html)
Anatomy of a floating-point number
Following is a description of the standard's format for floating-point numbers.
Bit conventions used in this article
Bits within a word of width W are indexed by integers in the range 0 to W−1 inclusive. The bit with index 0 is drawn on the right. When a word, or a field within it, is interpreted as a binary number, the lowest-indexed bit is usually also the least significant.
Single-precision 32 bit
A single-precision binary floating-point number is stored in a 32-bit word:
  1    8               23             width in bits
 +-+--------+-----------------------+
 |S|  Exp   |       Fraction        |
 +-+--------+-----------------------+
  31 30   23 22                    0  bit index (0 on right)
      bias +127
Where S is the sign bit and Exp is the Exponent field.
The exponent is biased in the engineering sense of the word – the value stored is offset (by 127 in this case) from the actual value. Biasing is done because exponents have to be signed values in order to represent both tiny and huge values, but two's complement, the usual representation for signed values, would make comparison harder. To solve this, the exponent is biased before being stored, by adjusting its value to put it within an unsigned range suitable for comparison. So, for a single-precision number, an exponent in the range −126 to +127 is biased by adding 127 to get a value in the range 1 to 254 (0 and 255 have special meanings described below). When interpreting the floating-point number, the bias is subtracted to retrieve the actual exponent.
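To make the bias concrete, here is a minimal C sketch (the variable names are ours, not part of the standard) that extracts the raw fields of a float and recovers the actual exponent by subtracting 127. It assumes, as on virtually all current machines, that float is the IEEE 754 single-precision format:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = 6.5f;                    /* 1.101 in binary, times 2^2 */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);    /* reinterpret the stored 32 bits */

    uint32_t sign     = bits >> 31;           /* bit 31 */
    uint32_t exp      = (bits >> 23) & 0xFF;  /* bits 30..23, biased */
    uint32_t fraction = bits & 0x7FFFFF;      /* bits 22..0 */

    /* For a normalised number, the actual exponent is Exp - 127. */
    printf("sign=%u  biased exp=%u  actual exp=%d  fraction=0x%06X\n",
           (unsigned)sign, (unsigned)exp, (int)exp - 127, (unsigned)fraction);
    return 0;
}

For 6.5 (binary 110.1 = 1.101 · 2^{2}) this prints a biased exponent of 129, i.e. an actual exponent of 2.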
The set of possible data values can be divided into the following classes:
 zeroes
 normalised numbers
 denormalised numbers
 infinities
 NaN (Not a Number)
(NaNs are used to represent undefined or invalid results, such as the square root of a negative number.)
The classes are primarily distinguished by the value of the Exp field, modified by the fraction; a code sketch of this classification appears after the notes below. Consider the Exp and Fraction fields as unsigned binary integers (Exp will be in the range 0–255):
Class                   Exp       Fraction
Zeroes                  0         0
Denormalised numbers    0         non-zero
Normalised numbers      1–254     any
Infinities              255       0
NaN (Not a Number)      255       non-zero
For normalised numbers, the most common, Exp is the biased exponent and Fraction is the fractional part of the significand. The number has value v:
v = s × 2^{e} × m
Where
s = +1 (positive numbers) when S is 0
s = −1 (negative numbers) when S is 1
e = Exp − 127 (in other words the exponent is stored with 127 added to it, also called "biased with 127")
m = 1.Fraction in binary (that is, the significand is the binary number 1 followed by the radix point followed by the binary bits of Fraction). Therefore, 1 ≤ m < 2.
Note:
 Denormalised numbers are the same except that e = −126 and m is 0.Fraction
 −126 is the smallest exponent for a normalised number
 There are two Zeroes, +0 (S is 0) and −0 (S is 1)
 There are two Infinities, +∞ (S is 0) and −∞ (S is 1)
 NaNs may have a sign and a significand, but these have no meaning other than for diagnostics; the first bit of the significand is often used to distinguish signaling NaNs from quiet NaNs
 NaNs and Infinities have all 1s in the Exp field.
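Putting the table and the notes above together gives the following minimal C sketch of the classification; the enum and function names are our own invention:

#include <stdint.h>

enum fp_class { kZero, kDenormal, kNormal, kInfinity, kNaN };

/* Classify a single-precision bit pattern per the table above. */
enum fp_class classify(uint32_t bits) {
    uint32_t exp      = (bits >> 23) & 0xFF;   /* biased exponent, 0-255 */
    uint32_t fraction = bits & 0x7FFFFF;       /* 23-bit fraction */

    if (exp == 0)
        return fraction == 0 ? kZero : kDenormal;
    if (exp == 255)
        return fraction == 0 ? kInfinity : kNaN;
    return kNormal;
}

The sign bit is deliberately ignored: it distinguishes +0 from −0 and +∞ from −∞ but does not change the class.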
An example
Let us encode the decimal number −118.625 using the IEEE 754 system.
We need to get the sign, the exponent and the fraction.
Because it is a negative number, the sign is "1". Let's find the others.
First, we write the number (without the sign) using binary notation (see the binary numeral system article for how to do this). The result is 1110110.101.
Now, let's move the radix point left, leaving only a 1 to its left: 1110110.101 = 1.110110101 · 2^{6}
The fraction is the part to the right of the radix point, padded with zeros on the right until we have all 23 bits. That is 11011010100000000000000.
The exponent is 6, but we need to convert it to binary and bias it (so the most negative exponent is 0, and all exponents are nonnegative binary numbers). For the 32-bit IEEE 754 format, the bias is 127, and so 6 + 127 = 133. In binary, this is written as 10000101.
Putting them all together:
  1    8               23             width in bits
 +-+--------+-----------------------+
 |S|  Exp   |       Fraction        |
 |1|10000101|11011010100000000000000|
 +-+--------+-----------------------+
  31 30   23 22                    0  bit index (0 on right)
      bias +127
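On a machine whose float is IEEE 754 single precision (true of virtually all current hardware), the encoding can be checked directly; the bit pattern above is 0xC2ED4000 in hexadecimal:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    float f = -118.625f;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    /* 1 10000101 11011010100000000000000 = 0xC2ED4000 */
    printf("0x%08X\n", (unsigned)bits);
    return 0;
}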
Double-precision 64 bit
Double precision is essentially the same except that the fields are wider:
  1      11                            52
 +-+-----------+----------------------------------------------------+
 |S|    Exp    |                      Fraction                      |
 +-+-----------+----------------------------------------------------+
  63 62      52 51                                                 0
       bias +1023
NaNs and Infinities are represented with Exp being all 1s (2047).
For Normalised numbers the exponent bias is +1023 (so e is Exp − 1023). For Denormalised numbers the exponent is −1022 (the minimum exponent for a normalised number—it is not −1023 because normalised numbers have a leading 1 digit before the binary point and denormalised numbers do not). As before, both infinity and zero are signed.
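As a sketch (assuming double is the IEEE 754 double-precision format, and with variable names of our own choosing), the field extraction mirrors the single-precision case, with wider masks and a bias of 1023:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    double d = -118.625;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);

    uint64_t sign     = bits >> 63;                  /* bit 63 */
    uint64_t exp      = (bits >> 52) & 0x7FF;        /* 11-bit biased exponent */
    uint64_t fraction = bits & 0xFFFFFFFFFFFFFULL;   /* 52-bit fraction */

    printf("sign=%llu  biased exp=%llu  actual exp=%lld  fraction=0x%013llX\n",
           (unsigned long long)sign, (unsigned long long)exp,
           (long long)exp - 1023, (unsigned long long)fraction);
    return 0;
}

For −118.625 = −1.110110101 · 2^{6} this prints a biased exponent of 1029 (6 + 1023) and a fraction of 0xDA80000000000.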
Comparing floating-point numbers
Comparing floating-point numbers is usually best done using floating-point instructions. However, this representation makes comparisons of some subsets of numbers possible on a byte-by-byte basis, if they share the same byte order and the same sign, and NaNs are excluded.
For example, for two positive numbers a and b, a < b is true whenever the unsigned binary integers with the same bit patterns and same byte order as a and b are also ordered a < b. In other words, two positive floating-point numbers (known not to be NaNs) can be compared with an unsigned binary integer comparison using the same bits, provided the floating-point numbers use the same byte order (this ordering therefore cannot be used in portable code through a union in the C programming language). This is an example of lexicographic ordering.
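A C sketch of such a comparison for two positive, non-NaN floats (the function name is ours); memcpy is used rather than a union or pointer cast to keep the bit reinterpretation well defined, but the trick still assumes float and uint32_t share a byte order:

#include <stdint.h>
#include <string.h>

/* For positive, non-NaN a and b, returns the same result as a < b. */
static int float_less(float a, float b) {
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    return ua < ub;   /* unsigned integer compare of the same bits */
}

This works because the fields are laid out from most to least significant: within one sign, a larger biased exponent always means a larger magnitude, and for equal exponents the fraction field orders the values.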
Rounding floating-point numbers
The IEEE standard has four different rounding modes.
 Unbiased, which rounds to the nearest value; if the number falls midway, it is rounded to the nearest value with an even (zero) least significant bit. This mode is required to be the default (the sketch after this list exercises all four modes).
 Towards zero
 Towards positive infinity
 Towards negative infinity
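C99 exposes these four modes through <fenv.h>; a minimal sketch (compile with the math library, e.g. -lm) uses rint(), which rounds to an integer in the current rounding mode, to make the differences visible:

#include <stdio.h>
#include <fenv.h>
#include <math.h>

#pragma STDC FENV_ACCESS ON   /* tell the compiler we change the FP environment */

int main(void) {
    int modes[]         = { FE_TONEAREST, FE_TOWARDZERO, FE_UPWARD, FE_DOWNWARD };
    const char *names[] = { "to nearest (even)", "towards zero",
                            "towards +infinity", "towards -infinity" };

    for (int i = 0; i < 4; i++) {
        fesetround(modes[i]);
        printf("%-18s  rint(2.5) = %g  rint(-2.5) = %g\n",
               names[i], rint(2.5), rint(-2.5));
    }
    fesetround(FE_TONEAREST);   /* restore the default mode */
    return 0;
}

Under the default mode both 2.5 and −2.5 round to the even neighbour (2 and −2); towards +infinity gives 3 and −2, towards −infinity gives 2 and −3, and towards zero gives 2 and −2.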
References
 This article was originally based on material from the Free Online Dictionary of Computing, which is licensed under the GFDL.
Revision of the standard
Note that the IEEE 754 standard is currently (2004) under revision. See: IEEE 754r
External links
 IEEE 754 references (http://babbage.cs.qc.edu/courses/cs341/IEEE754references.html)
 Let's Get To The (Floating) Point by Chris Hecker (http://www.d6.com/users/checker/pdfs/gdmfp.pdf)
 What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg (http://docs.sun.com/source/8063568/ncg_goldberg.html) – a good introduction and explanation.