Weird Floating-Point Numbers: Infinity, Denormal, and NaN
CS 301 Lecture,
Dr. Lawlor
Normal (non-Weird) Floats
Recall that a "float", as defined by IEEE Standard
754, consists of three bitfields:
Sign | Exponent | Mantissa (or Fraction)
1 bit: 0 for positive, 1 for negative | 8 bits: 127 means 2^0, 137 means 2^10 | 23 bits: a binary fraction
The hardware usually interprets a float as having the value:
value = (-1)^sign * 2^(exponent-127) * 1.fraction
Note that the mantissa normally has an implicit leading 1
applied.
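To make the bitfields concrete, here is a minimal sketch that pulls a float apart and rebuilds its value from the formula above. The names FloatFields, split_float, and normal_value are made up for this example, and the code assumes "float" is a 32-bit IEEE 754 value (true on essentially every current machine).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper struct: the three bitfields of a 32-bit float.
struct FloatFields {
    std::uint32_t sign;      // 1 bit: 0 positive, 1 negative
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t fraction;  // 23 bits of binary fraction
};

// Split a float into its bitfields (assumes a 32-bit IEEE 754 float).
FloatFields split_float(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);  // well-defined way to read the raw bits
    return FloatFields{ bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}

// Rebuild the value of a *normal* float from its fields:
//   (-1)^sign * 2^(exponent-127) * 1.fraction
// This does not handle the weird exponent values 0 and 255.
double normal_value(const FloatFields &p) {
    double mantissa = 1.0 + p.fraction / 8388608.0;  // 8388608 = 2^23
    double magnitude = std::ldexp(mantissa, static_cast<int>(p.exponent) - 127);
    return p.sign ? -magnitude : magnitude;
}
```

For example, split_float(1024.0f) should report an exponent field of 137, matching the "137 means 2^10" entry in the table.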
Weird: Zeros and Denormals
However, if the "exponent"
field is exactly zero, the implicit leading digit is taken to be 0,
like this:
value = (-1)^sign * 2^(-126) * 0.fraction
Suppressing the leading 1 allows you to exactly represent 0:
the bit pattern for 0.0 is just exponent==0 and
fraction==00000000 (that is, everything zero). If you set the
sign bit to negative, you have "negative zero", a strange
curiosity. Positive and negative zero work the same way in
almost all arithmetic operations (dividing by them gives infinities
of opposite sign, as the division table below shows), and as far as
I know there's rarely a reason to prefer one to the other. The "=="
operator claims positive and negative zero are the same!
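Here is a small sketch of how you might tell the two zeros apart even though "==" says they match. The function names are invented for this example, and the division relies on IEEE floating-point behavior (which C++ itself does not guarantee, but virtually every machine provides).

```cpp
#include <cmath>

// True when a and b compare equal with "==" yet differ in their sign bit,
// which is exactly the +0.0 versus -0.0 situation.
bool equal_but_differently_signed(float a, float b) {
    return a == b && std::signbit(a) != std::signbit(b);
}

// Divide 1.0 by x under IEEE rules; when x is a zero, the sign of that
// zero picks the sign of the resulting infinity.
float reciprocal(float x) {
    return 1.0f / x;
}
```

Calling equal_but_differently_signed(0.0f, -0.0f) should return true, and reciprocal applied to the two zeros should give infinities of opposite sign.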
If the fraction field isn't zero, but the exponent field is, you
have a "denormalized number": these are numbers too small to
represent with a leading one. The zero exponent field is always
needed to represent zero, but denormals (also known as "subnormal"
values) also provide a little more range at the very low end: they
can store values down to around 1.4e-45 for "float" (normal floats
stop near 1.2e-38), and around 4.9e-324 for "double" (normal doubles
stop near 2.2e-308).
See below for the performance problem with denormals.
Weird: Infinity
If the exponent field is as big as it can get (for "float", 255),
this
indicates another sort of special number. If the fraction
field
is zero, the number is interpreted as positive or negative
"infinity". The hardware will generate "infinity" when
dividing
by zero, or when another operation exceeds the representable range.
float z=0.0;
float f=1.0/z;
std::cout<<f<<"\n";
return (int)f;
(Try
this in NetRun now!)
Arithmetic on infinities works just the way you'd expect: infinity
plus 1.0 gives infinity, etc. (See tables below). Positive and
negative infinities exist, and work as you'd expect. Note that
while divide-by-integer-zero causes a crash (divide by zero
error), divide-by-floating-point-zero just happily returns infinity
by default.
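Division by zero is not the only way to reach infinity. The sketch below shows the "exceeds the representable range" case and the "infinity plus anything finite" case; the function names are made up for this example, and it assumes default IEEE rounding.

```cpp
#include <cmath>
#include <limits>

// Multiply the largest finite float by 2. The exact result is out of
// range, so IEEE arithmetic (in the default rounding mode) gives +infinity.
float overflow_to_infinity() {
    float big = std::numeric_limits<float>::max();
    return big * 2.0f;
}

// Infinity absorbs finite additions: inf + x is still inf for any finite x.
bool infinity_absorbs(float x) {
    float inf = std::numeric_limits<float>::infinity();
    return std::isfinite(x) && (inf + x == inf);
}
```

Both infinity_absorbs(1.0f) and infinity_absorbs(-1.0e30f) should come back true, since no finite value can pull an infinity back into range.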
Weird: NaN
If you do an operation that doesn't make sense, like:
- 0.0/0.0 (neither zero nor infinity, because we'd want
(x/x)==1.0; but not 1.0 either, because we'd want
(2*x)/x==2.0...)
- infinity-infinity (might cancel out to anything)
- infinity*0
the machine just gives a special "error" number called a "NaN"
(Not-a-Number). The idea is if you run some complicated program
that screws up, you don't want to get a plausible but wrong answer
like "4" (like we get with integer overflow!); you want something
totally implausible like "nan" to indicate an error happened.
For
example, this program prints "nan" and returns -2147483648
(0x80000000):
float f=sqrt(-1.0);
std::cout<<f<<"\n";
return (int)f;
(Try
this in NetRun now!)
This is a "NaN", which is represented with a huge exponent and a
*nonzero* fraction field. Positive and negative nans exist,
but
like zeros both signs seem to work the same. x86 seems to
rewrite the bits
of all NaNs to a special pattern it prefers (0x7FC00000 for float,
with
exponent bits and the leading fraction bit all set to 1).
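You can check the "huge exponent, nonzero fraction" encoding yourself by looking at the raw bits. This is only a sketch: float_bits, has_nan_encoding, and make_nan are names invented here, and the code assumes a 32-bit IEEE 754 float.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Return the raw 32-bit pattern of a float (assumes a 32-bit IEEE 754 float).
std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Check the NaN encoding directly: exponent field all ones (255)
// and a nonzero fraction field. The sign bit is ignored.
bool has_nan_encoding(float f) {
    std::uint32_t bits = float_bits(f);
    std::uint32_t exponent = (bits >> 23) & 0xFFu;
    std::uint32_t fraction = bits & 0x7FFFFFu;
    return exponent == 255u && fraction != 0u;
}

// Produce a NaN from an operation that doesn't make sense:
// pass in 0.0f to compute 0.0/0.0.
float make_nan(float zero) {
    return zero / zero;
}
```

Printing float_bits(make_nan(0.0f)) in hex is a quick way to see which NaN pattern your own machine prefers; infinity has the same exponent but a zero fraction, so has_nan_encoding rejects it.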
Performance impact of special values
Machines properly handle ordinary floating-point numbers and zero in
hardware at full speed.
However, most machines around the year 2000 *didn't* handle
denormals, infinities, or
NaNs in hardware--instead when one of these special values occurs,
they
trap out to software which handles the problem and restarts the
computation. This trapping
process takes time, as shown in the following program:
(Executable
NetRun Link)
enum {n_vals=1000};
double vals[n_vals];

/* Smooth the array in place: one add and one multiply per element. */
int average_vals(void) {
  for (int i=0;i<n_vals-1;i++)
    vals[i]=0.5*(vals[i]+vals[i+1]);
  return 0;
}

/* Fill the array with each kind of value, then time average_vals.
   time_function is a NetRun helper; its result is treated here as
   seconds per call, so dividing by n_vals and scaling by 1.0e9
   gives nanoseconds per element. */
int foo(void) {
  int i;
  for (i=0;i<n_vals;i++) vals[i]=0.0;
  printf(" Zeros: %.3f ns/float\n",time_function(average_vals)/n_vals*1.0e9);
  for (i=0;i<n_vals;i++) vals[i]=1.0;
  printf(" Ones: %.3f ns/float\n",time_function(average_vals)/n_vals*1.0e9);
  for (i=0;i<n_vals;i++) vals[i]=1.0e-310; /* denormal for "double" */
  printf(" Denorm: %.3f ns/float\n",time_function(average_vals)/n_vals*1.0e9);
  float x=0.0;
  for (i=0;i<n_vals;i++) vals[i]=1.0/x; /* +infinity */
  printf(" Inf: %.3f ns/float\n",time_function(average_vals)/n_vals*1.0e9);
  for (i=0;i<n_vals;i++) vals[i]=x/x; /* NaN */
  printf(" NaN: %.3f ns/float\n",time_function(average_vals)/n_vals*1.0e9);
  return 0;
}
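The program above leans on NetRun's time_function. If you want to try it elsewhere, something like the following stand-in might do. It is only a sketch built on std::chrono: the name time_function_sketch and the repeats parameter are my own assumptions, not NetRun's actual implementation.

```cpp
#include <chrono>

// Hypothetical stand-in for NetRun's time_function: call fn() repeatedly
// and return the average wall-clock seconds per call.
double time_function_sketch(int (*fn)(void), int repeats = 1000) {
    using clock = std::chrono::steady_clock;
    volatile int sink = 0;  // keeps the calls from being optimized away
    clock::time_point start = clock::now();
    for (int r = 0; r < repeats; r++) sink = sink + fn();
    std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / repeats;
}
```

Averaging over many repeats smooths out timer granularity, which matters when a single pass over 1000 values takes only about a microsecond.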
Many old machines run *seriously* slower for the weird
numbers. Here
are the results of the above program, in nanoseconds per float
operation, on a variety of old machines:
Values   | Intel P3 | Intel P4 | Core2 | Q6600 | Sandy Bridge | Phenom II | PPC G5 | MIPS R5000 | Intel 486
Zero     | 4.0      | 1.6      | 1.6   | 1.1   | 0.6          | 1.0       | 2.3    | 131.0      | 1215.8
One      | 4.0      | 1.6      | 1.9   | 1.1   | 0.6          | 1.0       | 2.2    | 130.6      | 864.8
Denorm   | 335.1    | 295.5    | 517.9 | 130.0 | 46.3         | 109.0     | 10.1   | 24437.0    | 3879.0
Infinity | 191.9    | 706.4    | 346.9 | 1.1   | 0.6          | 1.0       | 2.1    | 153.2      | 2558.2
NaN      | 206.2    | 772.2    | 356.3 | 1.1   | 0.6          | 1.0       | 2.1    | 10924.1    | 3103.7
Generally, no machine of this era has any performance penalty for
zero, despite it being somewhat "weird".
These numbers are mostly fast again on most recent (2020 era)
machines.
My friends at Illinois and I wrote a paper on this with many
more performance details in 2005.
Bonus: Arithmetic Tables for Special Floating-Point Numbers
These tables were computed for "float", but should be identical with
any
number size on any IEEE machine (which virtually everything
is).
"big" is a large but finite number, here
1.0e30. "lil" is a denormalized number, here 1.0e-40. "inf" is
an
infinity. "nan" is a Not-A-Number. Here's the source code
to generate these tables.
These all go exactly how you'd expect--"inf" for things that are too
big (or -inf for too small), "nan" for things that don't make sense
(like 0.0/0.0, or infinity
times zero, or nan with anything else).
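If you'd like to regenerate a row of these tables yourself, a sketch along the following lines should work. It is not the original table generator: the names cell and addition_row are invented here, and it prints raw "%g" values (so 1.0e30 shows up as "1e+30" rather than "big", and some machines print the sign of a NaN).

```cpp
#include <cstdio>
#include <limits>
#include <string>

// Format one value with printf's "%g", roughly how the tables show numbers.
std::string cell(float v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%g", v);
    return buf;
}

// Build one row of the addition table: the left operand, then
// left + each column operand, with cells separated by " | ".
std::string addition_row(float left) {
    const float pos_inf = std::numeric_limits<float>::infinity();
    const float quiet_nan = std::numeric_limits<float>::quiet_NaN();
    // Column operands in the same order as the tables:
    // -nan -inf -big -1 -lil -0 +0 +lil +1 +big +inf +nan
    const float operands[] = { -quiet_nan, -pos_inf, -1.0e30f, -1.0f, -1.0e-40f, -0.0f,
                               0.0f, 1.0e-40f, 1.0f, 1.0e30f, pos_inf, quiet_nan };
    std::string row = cell(left);
    for (float right : operands) row += " | " + cell(left + right);
    return row;
}
```

Swapping the "+" inside addition_row for "-", "*", "/", "==", or "<" gives the other tables.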
Addition
+      | -nan   | -inf   | -big   | -1     | -lil   | -0     | +0     | +lil   | +1     | +big   | +inf   | +nan
-nan   | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan
-inf   | nan    | -inf   | -inf   | -inf   | -inf   | -inf   | -inf   | -inf   | -inf   | -inf   | nan    | nan
-big   | nan    | -inf   | -2e+30 | -big   | -big   | -big   | -big   | -big   | -big   | 0      | +inf   | nan
-1     | nan    | -inf   | -big   | -2     | -1     | -1     | -1     | -1     | 0      | +big   | +inf   | nan
-lil   | nan    | -inf   | -big   | -1     | -2e-40 | -lil   | -lil   | 0      | +1     | +big   | +inf   | nan
-0     | nan    | -inf   | -big   | -1     | -lil   | -0     | 0      | +lil   | +1     | +big   | +inf   | nan
+0     | nan    | -inf   | -big   | -1     | -lil   | 0      | 0      | +lil   | +1     | +big   | +inf   | nan
+lil   | nan    | -inf   | -big   | -1     | 0      | +lil   | +lil   | 2e-40  | +1     | +big   | +inf   | nan
+1     | nan    | -inf   | -big   | 0      | +1     | +1     | +1     | +1     | 2      | +big   | +inf   | nan
+big   | nan    | -inf   | 0      | +big   | +big   | +big   | +big   | +big   | +big   | 2e+30  | +inf   | nan
+inf   | nan    | nan    | +inf   | +inf   | +inf   | +inf   | +inf   | +inf   | +inf   | +inf   | +inf   | nan
+nan   | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan
Note how infinity-infinity gives nan, but infinity+infinity is
infinity.
Subtraction
-      | -nan   | -inf   | -big   | -1     | -lil   | -0     | +0     | +lil   | +1     | +big   | +inf   | +nan
-nan   | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan
-inf   | nan    | nan    | -inf   | -inf   | -inf   | -inf   | -inf   | -inf   | -inf   | -inf   | -inf   | nan
-big   | nan    | +inf   | 0      | -big   | -big   | -big   | -big   | -big   | -big   | -2e+30 | -inf   | nan
-1     | nan    | +inf   | +big   | 0      | -1     | -1     | -1     | -1     | -2     | -big   | -inf   | nan
-lil   | nan    | +inf   | +big   | +1     | 0      | -lil   | -lil   | -2e-40 | -1     | -big   | -inf   | nan
-0     | nan    | +inf   | +big   | +1     | +lil   | 0      | -0     | -lil   | -1     | -big   | -inf   | nan
+0     | nan    | +inf   | +big   | +1     | +lil   | 0      | 0      | -lil   | -1     | -big   | -inf   | nan
+lil   | nan    | +inf   | +big   | +1     | 2e-40  | +lil   | +lil   | 0      | -1     | -big   | -inf   | nan
+1     | nan    | +inf   | +big   | 2      | +1     | +1     | +1     | +1     | 0      | -big   | -inf   | nan
+big   | nan    | +inf   | 2e+30  | +big   | +big   | +big   | +big   | +big   | +big   | 0      | -inf   | nan
+inf   | nan    | +inf   | +inf   | +inf   | +inf   | +inf   | +inf   | +inf   | +inf   | +inf   | nan    | nan
+nan   | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan
Multiplication
*      | -nan   | -inf   | -big   | -1     | -lil   | -0     | +0     | +lil   | +1     | +big   | +inf   | +nan
-nan   | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan
-inf   | nan    | +inf   | +inf   | +inf   | +inf   | nan    | nan    | -inf   | -inf   | -inf   | -inf   | nan
-big   | nan    | +inf   | +inf   | +big   | 1e-10  | 0      | -0     | -1e-10 | -big   | -inf   | -inf   | nan
-1     | nan    | +inf   | +big   | +1     | +lil   | 0      | -0     | -lil   | -1     | -big   | -inf   | nan
-lil   | nan    | +inf   | 1e-10  | +lil   | 0      | 0      | -0     | -0     | -lil   | -1e-10 | -inf   | nan
-0     | nan    | nan    | 0      | 0      | 0      | 0      | -0     | -0     | -0     | -0     | nan    | nan
+0     | nan    | nan    | -0     | -0     | -0     | -0     | 0      | 0      | 0      | 0      | nan    | nan
+lil   | nan    | -inf   | -1e-10 | -lil   | -0     | -0     | 0      | 0      | +lil   | 1e-10  | +inf   | nan
+1     | nan    | -inf   | -big   | -1     | -lil   | -0     | 0      | +lil   | +1     | +big   | +inf   | nan
+big   | nan    | -inf   | -inf   | -big   | -1e-10 | -0     | 0      | 1e-10  | +big   | +inf   | +inf   | nan
+inf   | nan    | -inf   | -inf   | -inf   | -inf   | nan    | nan    | +inf   | +inf   | +inf   | +inf   | nan
+nan   | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan
Note that 0*infinity gives nan, and out-of-range multiplications
give infinities.
Division
/      | -nan   | -inf   | -big   | -1     | -lil   | -0     | +0     | +lil   | +1     | +big   | +inf   | +nan
-nan   | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan
-inf   | nan    | nan    | +inf   | +inf   | +inf   | +inf   | -inf   | -inf   | -inf   | -inf   | nan    | nan
-big   | nan    | 0      | +1     | +big   | +inf   | +inf   | -inf   | -inf   | -big   | -1     | -0     | nan
-1     | nan    | 0      | 1e-30  | +1     | +inf   | +inf   | -inf   | -inf   | -1     | -1e-30 | -0     | nan
-lil   | nan    | 0      | 0      | +lil   | +1     | +inf   | -inf   | -1     | -lil   | -0     | -0     | nan
-0     | nan    | 0      | 0      | 0      | 0      | nan    | nan    | -0     | -0     | -0     | -0     | nan
+0     | nan    | -0     | -0     | -0     | -0     | nan    | nan    | 0      | 0      | 0      | 0      | nan
+lil   | nan    | -0     | -0     | -lil   | -1     | -inf   | +inf   | +1     | +lil   | 0      | 0      | nan
+1     | nan    | -0     | -1e-30 | -1     | -inf   | -inf   | +inf   | +inf   | +1     | 1e-30  | 0      | nan
+big   | nan    | -0     | -1     | -big   | -inf   | -inf   | +inf   | +inf   | +big   | +1     | 0      | nan
+inf   | nan    | nan    | -inf   | -inf   | -inf   | -inf   | +inf   | +inf   | +inf   | +inf   | nan    | nan
+nan   | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan    | nan
Note that 0/0, and inf/inf give NaNs; while out-of-range divisions
like big/lil or 1.0/0.0 give infinities (and not errors!).
Equality
==   | -nan | -inf | -big | -1   | -lil | -0   | +0   | +lil | +1   | +big | +inf | +nan
-nan | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
-inf | 0    | +1   | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
-big | 0    | 0    | +1   | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
-1   | 0    | 0    | 0    | +1   | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
-lil | 0    | 0    | 0    | 0    | +1   | 0    | 0    | 0    | 0    | 0    | 0    | 0
-0   | 0    | 0    | 0    | 0    | 0    | +1   | +1   | 0    | 0    | 0    | 0    | 0
+0   | 0    | 0    | 0    | 0    | 0    | +1   | +1   | 0    | 0    | 0    | 0    | 0
+lil | 0    | 0    | 0    | 0    | 0    | 0    | 0    | +1   | 0    | 0    | 0    | 0
+1   | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | +1   | 0    | 0    | 0
+big | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | +1   | 0    | 0
+inf | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | +1   | 0
+nan | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
Note that positive and negative zeros are considered equal, and a
"NaN" doesn't equal anything--even
itself!
Less-Than
<    | -nan | -inf | -big | -1   | -lil | -0   | +0   | +lil | +1   | +big | +inf | +nan
-nan | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
-inf | 0    | 0    | +1   | +1   | +1   | +1   | +1   | +1   | +1   | +1   | +1   | 0
-big | 0    | 0    | 0    | +1   | +1   | +1   | +1   | +1   | +1   | +1   | +1   | 0
-1   | 0    | 0    | 0    | 0    | +1   | +1   | +1   | +1   | +1   | +1   | +1   | 0
-lil | 0    | 0    | 0    | 0    | 0    | +1   | +1   | +1   | +1   | +1   | +1   | 0
-0   | 0    | 0    | 0    | 0    | 0    | 0    | 0    | +1   | +1   | +1   | +1   | 0
+0   | 0    | 0    | 0    | 0    | 0    | 0    | 0    | +1   | +1   | +1   | +1   | 0
+lil | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | +1   | +1   | +1   | 0
+1   | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | +1   | +1   | 0
+big | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | +1   | 0
+inf | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
+nan | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 0
Note that "NaN" returns false to all comparisons--it's neither
smaller nor larger than the other numbers.