Floating-Point Numbers: Format, Roundoff
CS 301 Lecture, Dr. Lawlor
Ordinary integers can only represent integral values.
"Floating-point numbers" can
represent non-integral values. This is useful for engineering,
science, statistics, graphics, and any time you need to represent
numbers from the real world, which are rarely integral!
In binary, you can represent a non-integer like "two and three-eighths" as "10.011". That is, there's:
- a 1 in the 2's place (2 = 2^1)
- a 0 in the 1's place (1 = 2^0)
- a 0 in the (beyond the "binary point") 1/2's place (1/2 = 2^-1),
- a 1 in the 1/4's place (1/4 = 2^-2), and
- a 1 in the 1/8's place (1/8 = 2^-3)
for a total of two plus 1/4 plus 1/8, or "2+3/8". Note that this
is a natural measurement in carpenter's fractional inches, but it's a
weird unnatural thing in metric-style decimal inches. That is,
fractions that are (negative) powers of two have a nice binary
representation, but look weird in decimal (1/16 = 0.0001 in base 2 = 0.0625 in base 10). Conversely, short decimal numbers
have a nice decimal representation, but often look weird as a binary
fraction (0.2 in base 10 = 0.001100110011... in base 2).
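One quick way to see these binary fractions on a real machine (a small side sketch, using printf's standard "%a" format, which prints the exact binary fraction in hexadecimal):
double x=2+3.0/8; /* "two and three-eighths", binary 10.011 */
printf("%f = %a\n",x,x); /* prints 2.375000 = 0x1.3p+1, meaning (1+3/16)*2^1 */
printf("%f = %a\n",0.2,0.2); /* 0.2 has no finite binary form: 0x1.999999999999ap-3 */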
Normalized Numbers
In C++, "float" and "double" store numbers in an odd way--they're really storing the number
in scientific notation, like
x = + 3.785746 * 10^5
Note that:
- You only need one bit to represent the sign--plus or minus.
- The exponent's just an integer, so you can store it as an integer.
- The 3.785746 part, called the "mantissa" or just "fraction" part, can be stored as the integer 3785746 (at least
as long as you can figure out where the decimal point goes!)
Scientific notation is designed to be compatible with slide rules (here's a circular slide rule demo);
slide rules are basically a log table starting at 1. This works
because log(1) = 0, and log(a) + log(b) = log(ab). But slide
rules only give you the mantissa; you need to figure out the exponent
yourself. The "order of magnitude" guess that engineers (and I)
like so much is just a calculation using zero significant digits--no
mantissa, all exponent.
Scientific notation can represent the same number in several different
ways:
x = + 3.785746 * 10^5 = + 0.3785746 * 10^6 = + 0.03785746 * 10^7 = + 37.85746 * 10^4
It's common to "normalize" a number in scientific notation so that:
- There's exactly one digit to the left of the decimal point.
- And that digit ain't zero.
This means the 10^5 version above is the "normal" way to write this number.
In binary, a "normalized" number *always* has a 1 at the left of the
decimal point (if it ain't zero, it's gotta be one). So sometimes there's no reason to even store the 1; you just
know it's there!
(Note that there are also "denormalized" numbers, like 0.0, that don't
have a leading 1. This is how zero is represented--there's an implicit
leading 1 only if the exponent field is nonzero, and an implicit leading 0
if the exponent field is zero...)
Roundoff in Arithmetic
They're funny old things, floats. The fraction part (mantissa) only stores
so much precision; further bits are lost. For example, in reality,
1.234 * 10^4 + 7.654 * 10^0 = 1.2347654 * 10^4
But if we only keep three decimal places,
1.234 * 10^4 + 7.654 * 10^0 = 1.234 * 10^4
which is to say, adding a tiny value to a great big value might not
change the great big value at
all,
because the tiny value gets lost when rounding off to 3
places. To avoid this "roundoff error", when you're doing
arithmetic by hand, people recommend keeping lots of digits, and only
rounding once, at the end. But for a given value of "lots of
digits", did you keep enough?
For example, on a real computer adding one repeatedly will eventually stop doing anything:
float f=0.73;
while (1) {
    volatile float g=f+1;
    if (g==f) {
        printf("f+1 == f at f=%.3f, or 2^%.3f\n", f, log(f)/log(2.0));
        return 0;
    }
    else f=g;
}
(executable
NetRun link)
Recall that for integers, adding one repeatedly will *never* give you
the same value--eventually the integer will wrap around, but it won't
just stop moving like floats!
For another example, floating-point arithmetic isn't "associative"--if
you change the order
of operations, you change the result (up to roundoff):
1.2355308 * 10^4 = 1.234 * 10^4 + (7.654 * 10^0 + 7.654 * 10^0)
1.2355308 * 10^4 = (1.234 * 10^4 + 7.654 * 10^0) + 7.654 * 10^0
In other words, parentheses don't matter if you're computing the exact
result. But to three decimal places,
1.235 * 10^4 = 1.234 * 10^4 + (7.654 * 10^0 + 7.654 * 10^0)
1.234 * 10^4 = (1.234 * 10^4 + 7.654 * 10^0) + 7.654 * 10^0
In the first line, the small values get added together, and
together they're enough to move the big value. But separately,
they splat like bugs against the windshield of the big value, and don't
affect it at all!
double lil=1.0;
double big=pow(2.0,64);
printf(" big+(lil+lil) -big = %.0f\n", big+(lil+lil) -big);
printf("(big+lil)+lil -big = %.0f\n",(big+lil)+lil -big);
(executable
NetRun link)
float gnats=1.0;
volatile float windshield=1<<24;
float orig=windshield;
for (int i=0;i<1000;i++)
    windshield += gnats;
if (windshield==orig) std::cout<<"You puny bugs can't harm me!\n";
else std::cout<<"Gnats added "<<windshield-orig<<" to the windshield\n";
(executable
NetRun link)
In fact, if you've got a bunch of small values to add to a big value,
it's more roundoff-friendly to add all the small values together first,
then add them all to
the big value:
float gnats=1.0;
volatile float windshield=1<<24;
float orig=windshield;
volatile float gnatcup=0.0;
for (int i=0;i<1000;i++)
    gnatcup += gnats;
windshield+=gnatcup; /* add all gnats to the windshield at once */
if (windshield==orig) std::cout<<"You puny bugs can't harm me!\n";
else std::cout<<"Gnats added "<<windshield-orig<<" to the windshield\n";
(executable
NetRun link)
Roundoff can be very annoying, but it doesn't matter if you don't care
about exact answers, like in many simulations (where "exact" means the same
as the real world, which you'll never get anyway) or games.
One very frustrating fact is that roundoff depends on the precision you
keep in your numbers. This, in turn, depends on the size of the
numbers. For example, a "float" is just 4 bytes, but it's not
very precise. A "double" is 8 bytes, but it's more precise.
A "long double" is 12 bytes (or more!), but it's got tons of
precision. There's often a serious tradeoff between precision and
space (and time), so just using long double for everything isn't a good
idea: your program may get bigger and slower, and you still might not
have enough precision.
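As a small illustration (a sketch of my own, easy to verify): adding 10^-8 to 1.0 vanishes in a "float" but survives in a "double":
float f=1.0f;
double d=1.0;
f+=1.0e-8f; /* lost: a float keeps only about 7 decimal digits */
d+=1.0e-8; /* kept: a double keeps about 16 decimal digits */
std::cout<<(f==1.0f)<<" "<<(d==1.0)<<"\n"; /* prints "1 0" */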
Roundoff in Representation
Sadly, 0.1 decimal is an infinitely repeating pattern in binary: 0.0(0011),
with 0011 repeating. This means multiplying by some *finite*
binary approximation of 0.1 is only an approximation of really dividing
by the integer 10.0. The difference between the two grows with the size of
the input data, and depends on the precision of the numbers:
for (int i=1;i<1000000000;i*=10) {
    double mul01=i*0.1;
    double div10=i/10.0;
    double diff=mul01-div10;
    std::cout<<"i="<<i<<" diff="<<diff<<"\n";
}
(executable NetRun link)
In a perfect world, multiplying by 0.1 and dividing by 10 would give
the exact same result. But in reality, 0.1 has to be approximated
by a finite series of binary digits, while the integer 10 can be stored
exactly, so on the NetRun Pentium4 CPU, this code gives:
i=1 diff=5.54976e-18
i=10 diff=5.55112e-17
i=100 diff=5.55112e-16
i=1000 diff=5.55112e-15
i=10000 diff=5.55112e-14
i=100000 diff=5.55112e-13
i=1000000 diff=5.54934e-12
i=10000000 diff=5.5536e-11
i=100000000 diff=5.54792e-10
Program complete. Return 0 (0x0)
That is, double-precision 0.1 differs from truly dividing by 10 by only about 10^-18 for small inputs, but the error grows in proportion to the input! This can add up over time.
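A classic way this shows up (sketched here as a quick standalone test): adding the double-precision approximation of 0.1 ten times does not give exactly 1.0:
double sum=0.0;
for (int i=0;i<10;i++) sum+=0.1; /* ten copies of "almost 0.1" */
std::cout<<std::setprecision(17)<<sum<<"\n"; /* needs <iomanip>; prints 0.99999999999999989 */
std::cout<<(sum==1.0)<<"\n"; /* prints 0 */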
Roundoff Taking Over Control
One place roundoff is very annoying is in your control
structures. For example, this loop will execute *seven* times,
even though it looks like it should only execute *six* times:
for (double k=0.0;k<1.0;k+=1.0/6.0) {
    printf("k=%a (about %.15f)\n",k,k);
}
(executable NetRun link)
The trouble is of course that 1/6 can't be represented exactly in
floating-point, so if we add our approximation for 1/6 six times, we
haven't quite hit 1.0, so the loop executes one additional time.
There are several possible fixes for this:
- Don't use floating-point as your loop variable. Loop over an
integer i (without roundoff), and divide by six to get k. This is
the recommended approach if you care about the exact number of times
around the loop (see the sketch below).
- Or you could adjust the loop
termination condition so it's
"k<1.0-0.00001", where the "0.00001" provides some safety margin for
roundoff. This sort of "epsilon" value is common along
floating-point boundaries, although if it's too small you can still get
roundoff trouble, and if it's too big you've screwed up the computation.
- Or you could use a lower-precision comparison, like
"(float)k<1.0f". This also provides roundoff margin, because
the comparison is taking place at the lower "float" precision.
Any of these fixes will work, but you do have to realize this is a
potential problem, and put the precision-compensation code in there!
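For example, the first fix might be sketched like this--the same printf as above, but the trip count is now controlled by an exact integer:
for (int i=0;i<6;i++) { /* exactly six times around, guaranteed */
    double k=i/6.0; /* k still has roundoff, but it no longer controls the loop */
    printf("k=%a (about %.15f)\n",k,k);
}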
Bits in a Floating-Point Number
Floats represent continuous values. But they do it using discrete
bits.
A "float" (as defined by IEEE Standard
754) consists of three bitfields:
- Sign: 1 bit. 0 for positive, 1 for negative.
- Exponent: 8 unsigned bits. 127 means 2^0; 137 means 2^10.
- Fraction (or "Mantissa"): 23 bits--a binary fraction. Don't forget the implicit leading 1!
The sign is in the highest-order bit, the exponent in the next 8 bits,
and the fraction in the remaining bits.
The hardware interprets a float as having the value:
value = (-1)^sign * 2^(exponent-127) * 1.fraction
Note that the mantissa has an implicit leading binary 1 applied
(unless the exponent field is zero, when it's an implicit leading 0; a
"denormalized" number).
For example, the value "8" would be stored with sign bit 0, exponent
field 130 (==3+127), and mantissa 000... (without the leading 1), since:
8 = (-1)^0 * 2^(130-127) * 1.0000....
Written out bit by bit, that's 0 10000010 00000000000000000000000, or 0x41000000.
You can stare at the bits inside a float by converting it to an
integer. The quick and dirty way to do this is via a pointer
typecast, but modern compilers will sometimes over-optimize this,
especially in inlined code:
void print_bits(float f) {
    int i=*reinterpret_cast<int *>(&f); /* read the bits with a "pointer shuffle" */
    std::cout<<" float "<<std::setw(10)<<f<<" = ";
    for (int bit=31;bit>=0;bit--) {
        if (i&(1<<bit)) std::cout<<"1"; else std::cout<<"0";
        if (bit==31) std::cout<<" "; /* separate the sign bit from the exponent */
        if (bit==23) std::cout<<" (implicit 1)."; /* mark where the hidden leading 1 sits, before the 23 fraction bits */
    }
    std::cout<<std::endl;
}
int foo(void) {
    print_bits(0.0);
    print_bits(-1.0);
    print_bits(1.0);
    print_bits(2.0);
    print_bits(4.0);
    print_bits(8.0);
    print_bits(1.125);
    print_bits(1.25);
    print_bits(1.5);
    print_bits(1+1.0/10);
    return sizeof(float);
}
(Try this in NetRun now!)
The official way to dissect the parts of a float is using a "union" and a
bitfield like so:
/* IEEE floating-point number's bits: sign exponent mantissa */
struct float_bits {
    unsigned int fraction:23; /**< Value is binary 1.fraction ("mantissa") */
    unsigned int exp:8; /**< Value is 2^(exp-127) */
    unsigned int sign:1; /**< 0 for positive, 1 for negative */
};
/* A union is a struct where all the fields *overlap* each other */
union float_dissector {
    float f;
    float_bits b;
};
float_dissector s;
s.f=8.0;
std::cout<<s.f<<"= sign "<<s.b.sign<<" exp "<<s.b.exp<<" fract "<<s.b.fraction<<"\n";
return 0;
(Executable
NetRun link)
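On NetRun's little-endian x86 this prints something like "8= sign 0 exp 130 fract 0", matching the hand-worked bit pattern for 8 above. (Bitfield layout is compiler- and endianness-dependent, so don't count on this trick being portable.)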
In addition to the 32-bit "float", there are several other different sizes of floating-point types:
C Datatype    | Size                            | Approx. Precision | Approx. Range | Exponent Bits | Fraction Bits | +-1 range
float         | 4 bytes (everywhere)            | 1.0x10^-7         | 10^38         | 8             | 23            | 2^24
double        | 8 bytes (everywhere)            | 2.0x10^-15        | 10^308        | 11            | 52            | 2^53
long double   | 12-16 bytes (if it even exists) | 2.0x10^-20        | 10^4932       | 15            | 64            | 2^65
Nowadays floats have roughly the same
performance as
integers:
addition takes about a nanosecond, multiplication takes a few nanoseconds, and division takes a
dozen or more nanoseconds. That is, floats are now cheap, and you
can consider using floats for all sorts of stuff--even when you don't
care about fractions! The advantages of using floats are:
- Floats can store fractional numbers.
- Floats never overflow; they hit "infinity" as explored below.
- "double" has more bits than "int" (but less than "long").
Normal (non-Weird) Floats
To summarize, a "float" as as defined by IEEE Standard 754 consists of three bitfields:
- Sign: 1 bit. 0 for positive, 1 for negative.
- Exponent: 8 bits. 127 means 2^0; 137 means 2^10.
- Mantissa (or Fraction): 23 bits--a binary fraction.
The hardware usually interprets a float as having the value:
value = (-1)^sign * 2^(exponent-127) * 1.fraction
Note that the mantissa normally has an implicit leading 1 applied.
Weird: Zeros and Denormals
However, if the "exponent"
field is exactly zero, the implicit leading digit is taken to be 0, like this:
value = (-1)^sign * 2^(-126) * 0.fraction
Suppressing the leading 1 allows you to exactly represent 0:
the bit pattern for 0.0 is just exponent==0 and
fraction==00000000 (that is, everything zero). If you set the
sign bit to negative, you have "negative zero", a strange
curiosity. Positive and negative zero work the same way in
arithmetic operations, and as far as I know there's no reason to prefer
one to the other. The "==" operator claims positive and negative zero are the same!
If the fraction field isn't zero, but the exponent field is, you have a
"denormalized number"--these are numbers too small to represent with a
leading one. You always need denormals to represent zero, but
denormals (also known as "subnormal" values) also provide a little more
range at the very
low end--they can store values down to around 1.0e-40 for "float", and
1.0e-310
for "double".
See below for the performance problem with
denormals.
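If you want to poke at the low end yourself, the standard <limits> header reports both boundaries (a small sketch; the decimal values shown are approximate):
float norm=std::numeric_limits<float>::min(); /* smallest positive *normalized* float, about 1.2e-38 (needs <limits>) */
float tiny=std::numeric_limits<float>::denorm_min(); /* smallest positive denormal, about 1.4e-45 */
std::cout<<norm<<" "<<tiny<<"\n";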
Weird: Infinity
If the exponent field is as big as it can get (for "float", 255), this
indicates another sort of special number. If the fraction field
is zero, the number is interpreted as positive or negative
"infinity". The hardware will generate "infinity" when dividing
by zero, or when another operation exceeds the representable range.
float z=0.0;
float f=1.0/z;
std::cout<<f<<"\n";
return (int)f;
(Try this in NetRun now!)
Arithmetic on infinities works just the way you'd expect: infinity plus
1.0 gives infinity, etc. (See tables below). Positive and
negative infinities exist, and work as you'd expect. Note that
while divide-by-integer-zero causes a crash (divide by zero
error), divide-by-floating-point-zero just happily returns infinity by
default.
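You can test for these special values portably with std::isinf and std::isnan from <cmath>; a small sketch:
float z=0.0;
float f=1.0/z; /* +inf, as above */
if (std::isinf(f)) std::cout<<"f overflowed to infinity\n";
if (std::isnan(f-f)) std::cout<<"but inf minus inf is not a number\n";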
Weird: NaN
If you do an operation that doesn't make sense, like:
- 0.0/0.0 (neither zero nor infinity, because we'd want (x/x)==1.0; but not 1.0 either, because we'd want (2*x)/x==2.0...)
- infinity-infinity (might cancel out to anything)
- infinity*0
The machine just gives a special "error" number called a "NaN"
(Not-a-Number). The idea is if you run some complicated program
that screws up, you don't want to get a plausible but wrong answer like
"4" (like we get with integer overflow!); you want something totally
implausible like "nan" to indicate an error happened. For
example, this program prints "nan" and returns -2147483648 (0x80000000):
float f=sqrt(-1.0);
std::cout<<f<<"\n";
return (int)f;
(Try this in NetRun now!)
This is a "NaN", which is represented with a huge exponent and a
*nonzero* fraction field. Positive and negative nans exist, but
like zeros both signs seem to work the same. x86 seems to rewrite the bits
of all NaNs to a special pattern it prefers (0x7FC00000 for float, with
exponent bits and the leading fraction bit all set to 1).
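Since a NaN doesn't even equal itself, the "x==x" test (or better, std::isnan) is the usual way to catch one escaping from a computation; a small sketch:
float n=sqrt(-1.0); /* NaN, as above */
std::cout<<(n==n)<<"\n"; /* prints 0: a NaN is not equal even to itself */
std::cout<<std::isnan(n)<<"\n"; /* prints 1: the portable check from <cmath> */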
Bonus: Performance impact of special values
Machines properly handle ordinary floating-point numbers and zero in hardware at full speed.
However, most modern machines *don't* handle denormals, infinities, or
NaNs in hardware--instead when one of these special values occurs, they
trap out to software which handles the problem and restarts the
computation. This trapping
process takes time, as shown in the following program:
(Executable NetRun Link)
enum {n_vals=1000};
double vals[n_vals];
int average_vals(void) {
    for (int i=0;i<n_vals-1;i++)
        vals[i]=0.5*(vals[i]+vals[i+1]);
    return 0;
}
int foo(void) {
    int i;
    /* time_function is NetRun's built-in timing helper; it returns seconds per call */
    for (i=0;i<n_vals;i++) vals[i]=0.0;
    printf(" Zeros: %.3f ns/float\n",time_function(average_vals)/n_vals*1.0e9);
    for (i=0;i<n_vals;i++) vals[i]=1.0;
    printf(" Ones: %.3f ns/float\n",time_function(average_vals)/n_vals*1.0e9);
    for (i=0;i<n_vals;i++) vals[i]=1.0e-310;
    printf(" Denorm: %.3f ns/float\n",time_function(average_vals)/n_vals*1.0e9);
    float x=0.0;
    for (i=0;i<n_vals;i++) vals[i]=1.0/x;
    printf(" Inf: %.3f ns/float\n",time_function(average_vals)/n_vals*1.0e9);
    for (i=0;i<n_vals;i++) vals[i]=x/x;
    printf(" NaN: %.3f ns/float\n",time_function(average_vals)/n_vals*1.0e9);
    return 0;
}
On my P4, this gives 3ns for zeros and ordinary values, 300ns for
denormals (a 100x slowdown), and 700ns for infinities and NaNs (a 200x
slowdown)!
On my PowerPC 604e, this gives 35ns for zeros, 65ns for denormals (a 2x
slowdown), and 35ns for infinities and NaNs (no penalty).
My friends at Illinois and I wrote a paper on this with many more performance details.
Bonus: Arithmetic Tables for Special Floating-Point Numbers
These tables were computed for "float", but should be identical with any
number size on any IEEE machine (which virtually everything is).
"big" is a large but finite number, here
1.0e30. "lil" is a denormalized number, here 1.0e-40. "inf" is an
infinity. "nan" is a Not-A-Number. Here's the source code to generate these tables.
These all go exactly how you'd expect--"inf" for things that are too
big (or -inf for too small), "nan" for things that don't make sense (like 0.0/0.0, or infinity
times zero, or nan with anything else).
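The linked generator isn't reproduced here, but a minimal sketch of the idea--build the twelve special values, then print every pairwise result (here for addition)--looks something like:
float z=0.0f;
float val[12]={ -(z/z), -1.0f/z, -1.0e30f, -1.0f, -1.0e-40f, -0.0f,
                +0.0f, 1.0e-40f, 1.0f, 1.0e30f, 1.0f/z, z/z };
const char *name[12]={"-nan","-inf","-big","-1","-lil","-0","+0","+lil","+1","+big","+inf","+nan"};
for (int r=0;r<12;r++) {
    printf("%5s |",name[r]);
    for (int c=0;c<12;c++) printf(" %6g",val[r]+val[c]);
    printf("\n");
}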
Addition
    + |   -nan   -inf   -big     -1   -lil     -0     +0   +lil     +1   +big   +inf   +nan
 -nan |    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan
 -inf |    nan   -inf   -inf   -inf   -inf   -inf   -inf   -inf   -inf   -inf    nan    nan
 -big |    nan   -inf -2e+30   -big   -big   -big   -big   -big   -big      0   +inf    nan
   -1 |    nan   -inf   -big     -2     -1     -1     -1     -1      0   +big   +inf    nan
 -lil |    nan   -inf   -big     -1 -2e-40   -lil   -lil      0     +1   +big   +inf    nan
   -0 |    nan   -inf   -big     -1   -lil     -0      0   +lil     +1   +big   +inf    nan
   +0 |    nan   -inf   -big     -1   -lil      0      0   +lil     +1   +big   +inf    nan
 +lil |    nan   -inf   -big     -1      0   +lil   +lil  2e-40     +1   +big   +inf    nan
   +1 |    nan   -inf   -big      0     +1     +1     +1     +1      2   +big   +inf    nan
 +big |    nan   -inf      0   +big   +big   +big   +big   +big   +big  2e+30   +inf    nan
 +inf |    nan    nan   +inf   +inf   +inf   +inf   +inf   +inf   +inf   +inf   +inf    nan
 +nan |    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan
Note how infinity-infinity gives nan, but infinity+infinity is infinity.
Subtraction
    - |   -nan   -inf   -big     -1   -lil     -0     +0   +lil     +1   +big   +inf   +nan
 -nan |    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan
 -inf |    nan    nan   -inf   -inf   -inf   -inf   -inf   -inf   -inf   -inf   -inf    nan
 -big |    nan   +inf      0   -big   -big   -big   -big   -big   -big -2e+30   -inf    nan
   -1 |    nan   +inf   +big      0     -1     -1     -1     -1     -2   -big   -inf    nan
 -lil |    nan   +inf   +big     +1      0   -lil   -lil -2e-40     -1   -big   -inf    nan
   -0 |    nan   +inf   +big     +1   +lil      0     -0   -lil     -1   -big   -inf    nan
   +0 |    nan   +inf   +big     +1   +lil      0      0   -lil     -1   -big   -inf    nan
 +lil |    nan   +inf   +big     +1  2e-40   +lil   +lil      0     -1   -big   -inf    nan
   +1 |    nan   +inf   +big      2     +1     +1     +1     +1      0   -big   -inf    nan
 +big |    nan   +inf  2e+30   +big   +big   +big   +big   +big   +big      0   -inf    nan
 +inf |    nan   +inf   +inf   +inf   +inf   +inf   +inf   +inf   +inf   +inf    nan    nan
 +nan |    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan
Multiplication
    * |   -nan   -inf   -big     -1   -lil     -0     +0   +lil     +1   +big   +inf   +nan
 -nan |    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan
 -inf |    nan   +inf   +inf   +inf   +inf    nan    nan   -inf   -inf   -inf   -inf    nan
 -big |    nan   +inf   +inf   +big  1e-10      0     -0 -1e-10   -big   -inf   -inf    nan
   -1 |    nan   +inf   +big     +1   +lil      0     -0   -lil     -1   -big   -inf    nan
 -lil |    nan   +inf  1e-10   +lil      0      0     -0     -0   -lil -1e-10   -inf    nan
   -0 |    nan    nan      0      0      0      0     -0     -0     -0     -0    nan    nan
   +0 |    nan    nan     -0     -0     -0     -0      0      0      0      0    nan    nan
 +lil |    nan   -inf -1e-10   -lil     -0     -0      0      0   +lil  1e-10   +inf    nan
   +1 |    nan   -inf   -big     -1   -lil     -0      0   +lil     +1   +big   +inf    nan
 +big |    nan   -inf   -inf   -big -1e-10     -0      0  1e-10   +big   +inf   +inf    nan
 +inf |    nan   -inf   -inf   -inf   -inf    nan    nan   +inf   +inf   +inf   +inf    nan
 +nan |    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan
Note that 0*infinity gives nan, and out-of-range multiplications give infinities.
Division
    / |   -nan   -inf   -big     -1   -lil     -0     +0   +lil     +1   +big   +inf   +nan
 -nan |    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan
 -inf |    nan    nan   +inf   +inf   +inf   +inf   -inf   -inf   -inf   -inf    nan    nan
 -big |    nan      0     +1   +big   +inf   +inf   -inf   -inf   -big     -1     -0    nan
   -1 |    nan      0  1e-30     +1   +inf   +inf   -inf   -inf     -1 -1e-30     -0    nan
 -lil |    nan      0      0   +lil     +1   +inf   -inf     -1   -lil     -0     -0    nan
   -0 |    nan      0      0      0      0    nan    nan     -0     -0     -0     -0    nan
   +0 |    nan     -0     -0     -0     -0    nan    nan      0      0      0      0    nan
 +lil |    nan     -0     -0   -lil     -1   -inf   +inf     +1   +lil      0      0    nan
   +1 |    nan     -0 -1e-30     -1   -inf   -inf   +inf   +inf     +1  1e-30      0    nan
 +big |    nan     -0     -1   -big   -inf   -inf   +inf   +inf   +big     +1      0    nan
 +inf |    nan    nan   -inf   -inf   -inf   -inf   +inf   +inf   +inf   +inf    nan    nan
 +nan |    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan    nan
Note that 0/0 and inf/inf give NaNs, while out-of-range divisions like big/lil or 1.0/0.0 give infinities (and not errors!).
Equality
   == |   -nan   -inf   -big     -1   -lil     -0     +0   +lil     +1   +big   +inf   +nan
 -nan |      0      0      0      0      0      0      0      0      0      0      0      0
 -inf |      0     +1      0      0      0      0      0      0      0      0      0      0
 -big |      0      0     +1      0      0      0      0      0      0      0      0      0
   -1 |      0      0      0     +1      0      0      0      0      0      0      0      0
 -lil |      0      0      0      0     +1      0      0      0      0      0      0      0
   -0 |      0      0      0      0      0     +1     +1      0      0      0      0      0
   +0 |      0      0      0      0      0     +1     +1      0      0      0      0      0
 +lil |      0      0      0      0      0      0      0     +1      0      0      0      0
   +1 |      0      0      0      0      0      0      0      0     +1      0      0      0
 +big |      0      0      0      0      0      0      0      0      0     +1      0      0
 +inf |      0      0      0      0      0      0      0      0      0      0     +1      0
 +nan |      0      0      0      0      0      0      0      0      0      0      0      0
Note that positive and negative zeros are considered equal, and a "NaN" doesn't equal anything--even itself!
Less-Than
    < |   -nan   -inf   -big     -1   -lil     -0     +0   +lil     +1   +big   +inf   +nan
 -nan |      0      0      0      0      0      0      0      0      0      0      0      0
 -inf |      0      0     +1     +1     +1     +1     +1     +1     +1     +1     +1      0
 -big |      0      0      0     +1     +1     +1     +1     +1     +1     +1     +1      0
   -1 |      0      0      0      0     +1     +1     +1     +1     +1     +1     +1      0
 -lil |      0      0      0      0      0     +1     +1     +1     +1     +1     +1      0
   -0 |      0      0      0      0      0      0      0     +1     +1     +1     +1      0
   +0 |      0      0      0      0      0      0      0     +1     +1     +1     +1      0
 +lil |      0      0      0      0      0      0      0      0     +1     +1     +1      0
   +1 |      0      0      0      0      0      0      0      0      0     +1     +1      0
 +big |      0      0      0      0      0      0      0      0      0      0     +1      0
 +inf |      0      0      0      0      0      0      0      0      0      0      0      0
 +nan |      0      0      0      0      0      0      0      0      0      0      0      0
Note that "NaN" returns false to all comparisons--it's neither smaller nor larger than the other numbers.