Denormal Numbers, and Roundoff
CS 301 Lecture, Dr. Lawlor
A denormal (tiny) number doesn't have an implicit leading 1; it's
got an implicit leading 0. This means the circuitry for doing
arithmetic on denormal numbers doesn't look much like the circuitry for
doing arithmetic on ordinary normalized numbers. On many modern
machines, there *is* no fast circuitry for computing on denormal
numbers. Instead, the hardware falls back to a slow path (a microcode
assist on x86, or a trap out to software on some other architectures)
when it hits a denormal value.
This would be fine, except that the slow path is way slower than the normal hardware path: about 25x slower on the NetRun machine!
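#include <cmath>

float f=std::pow(2.0,-128); // denormal: smaller than the smallest normal float, 2^-126
int foo(void) { // NetRun calls foo repeatedly and reports the time per call
    f*=1.00001; //<- you can do almost any operation here...
    return 0;
}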
As written, this takes 328ns per execution of foo, which is crazy slow.
If you initialize f to 27.2 (or any non-denormal value), this takes about 13ns per execution, which is reasonable.
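The NetRun harness isn't shown here, so the following is a standalone sketch of the same comparison, assuming a C++11 compiler and default floating-point settings (no -ffast-math, which with GCC on x86 can switch on flush-to-zero and hide the effect). The names ns_per_multiply, g_input and g_sink are made up for this example. Each pass multiplies the same operand, so a denormal input stays denormal for the whole loop; the reported time also includes loop and volatile load/store overhead, and the slowdown ratio varies a lot from one CPU to the next.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Volatile so the compiler must really load, multiply, and store on every pass.
volatile float g_input;
volatile float g_sink;

// Hypothetical helper: average nanoseconds per loop pass (one float multiply
// plus a volatile load and store) when the operand is 'start'.
double ns_per_multiply(float start, int iterations) {
    assert(iterations > 0);
    g_input = start;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        // Same operand every pass, so every multiply sees the same input.
        g_sink = g_input * 1.00001f;
    }
    auto t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = t1 - t0;
    return elapsed.count() / iterations;
}
```

For example, compare ns_per_multiply(std::ldexp(1.0f, -128), 10000000) against ns_per_multiply(27.2f, 10000000) in the same run.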
That is, floating-point code can run dramatically slower when it
computes on denormals (on some machines, infinities have the same problem). Denormals can have
a huge impact on the performance of real code; I've written a paper on this.
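The implicit leading 0 described at the top of this section is visible directly in the bits. This sketch assumes the usual 32-bit IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits); float_bits and show_float are invented names for illustration. The smallest normal float, 2^-126, has exponent field 1 and an all-zero fraction, with its leading 1 implicit. The denormal 2^-128 has exponent field 0, so its value is 0.fraction times 2^-126, and its leading 1 has to be stored explicitly inside the fraction field.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <limits>

// Copy a float's bits into an integer (memcpy avoids undefined behavior).
std::uint32_t float_bits(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

// Print the three IEEE single-precision fields of x and how to read them.
void show_float(const char* label, float x) {
    std::uint32_t u = float_bits(x);
    unsigned sign = u >> 31;
    unsigned exponent = (u >> 23) & 0xFFu;  // biased exponent; 0 means denormal (or zero)
    unsigned fraction = u & 0x7FFFFFu;      // the 23 stored fraction bits
    int c = std::fpclassify(x);
    const char* kind = c == FP_SUBNORMAL ? "denormal: implicit leading 0"
                     : c == FP_NORMAL    ? "normal: implicit leading 1"
                                         : "zero, infinity, or NaN";
    std::printf("%-8s bits=0x%08X sign=%u exp=%3u frac=0x%06X (%s)\n",
                label, static_cast<unsigned>(u), sign, exponent, fraction, kind);
}
```

Calling show_float("2^-126", std::numeric_limits<float>::min()) reports bits 0x00800000 (exponent field 1, fraction 0), while show_float("2^-128", std::ldexp(1.0f, -128)) reports bits 0x00200000 (exponent field 0, fraction 0x200000).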
The easiest way to fix denormals is to round them off to zero. One
trick for doing this is to add a big value ("big" compared to a
denormal can be as small as 1.0e-10) and then subtract it off again: the
denormal is lost to roundoff in the addition, so the subtraction leaves exactly zero.
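A minimal sketch of the add-and-subtract trick, assuming IEEE single precision with the default round-to-nearest mode and no -ffast-math. flush_tiny is a hypothetical name, and 1.0e-10f is just the example constant from the text. Two side effects are worth knowing: the addition itself still sees the denormal input once, and tiny *normal* values are rounded off too (with this constant, anything below roughly 3.5e-18 in magnitude, half an ulp of 1.0e-10f, also becomes zero).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: round denormals (and other tiny values) to zero by
// adding and then subtracting a "big" constant. Any |x| below half an ulp of
// 'big' is lost to roundoff in the addition, so the subtraction returns 0.
// Ordinary values such as 27.2f come back unchanged; values in between may
// lose low-order bits.
// Caveat: -ffast-math may let the compiler simplify (x + big) - big to x.
float flush_tiny(float x, float big = 1.0e-10f) {
    float t = x + big;  // the tiny part of x is rounded away here
    return t - big;
}
```

For example, flush_tiny(std::ldexp(1.0f, -128)) returns 0.0f, while flush_tiny(27.2f) returns 27.2f.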
Roundoff
The decimal 0.1 is an infinitely repeating pattern in binary (0.0(0011),
with 0011 repeating forever). This means multiplying by any *finite*
binary approximation of 0.1 only approximates really dividing
by 10. The resulting error is roughly proportional to the
precision of the numbers (their machine epsilon) and to the size of the input data:
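#include <iostream>

int foo(void) { // NetRun-style entry point; call it from main() to run standalone
    for (int i=1;i<1000000000;i*=10) {
        double mul01=i*0.1;  // multiply by the (inexact) binary approximation of 0.1
        double div10=i/10.0; // divide by exactly 10 (10.0 is representable)
        double diff=mul01-div10;
        std::cout<<"i="<<i<<" diff="<<diff<<"\n";
    }
    return 0;
}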
On my P4, this gives:
i=1 diff=5.54976e-18
i=10 diff=5.55112e-17
i=100 diff=5.55112e-16
i=1000 diff=5.55112e-15
i=10000 diff=5.55112e-14
i=100000 diff=5.55112e-13
i=1000000 diff=5.54934e-12
i=10000000 diff=5.5536e-11
i=100000000 diff=5.54792e-10
Program complete. Return 0 (0x0)
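That is, the double-precision constant 0.1 is about 5.55e-18 larger than the true 1/10, a relative error of about 5.55e-17, and the absolute difference grows in proportion to i. (These nonzero diffs depend on the P4's x87 unit holding intermediate results in 80-bit registers. With strict 64-bit double arithmetic, as in SSE2 code on x86-64, each i*0.1 here rounds to exactly the same double as i/10.0, so you may see diff=0 on every line.)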
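The numbers above can be checked exactly in double precision. This is a sketch with invented helper names (binary_fraction, double_tenth_is_exact_ratio, product_error), assuming IEEE double and C++11 for std::fma. It shows the repeating 0011 pattern of 1/10 by long division, the exact value of the double nearest 0.1 (3602879701896397/2^55, slightly above the true 3602879701896396.8/2^55), and the error of i*0.1 before the product is rounded to a double. std::fma rounds only once, so that error survives even on hardware without 80-bit intermediates.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Binary expansion of num/den (assumes 0 <= num < den) to 'bits' places,
// by long division: double the remainder, emit 1 if it reaches den.
std::string binary_fraction(int num, int den, int bits) {
    std::string s = "0.";
    for (int k = 0; k < bits; ++k) {
        num *= 2;
        if (num >= den) { s += '1'; num -= den; }
        else            { s += '0'; }
    }
    return s;
}

// The double nearest 0.1 is exactly 3602879701896397 / 2^55.
bool double_tenth_is_exact_ratio() {
    return std::ldexp(3602879701896397.0, -55) == 0.1;
}

// Error of i*0.1 (using the stored 0.1) against i/10.0, for i = 10, 100, ...
// (so i/10.0 is exact). fma computes i*0.1 - i/10.0 with a single rounding.
double product_error(double i) {
    return std::fma(i, 0.1, -(i / 10.0));
}
```

binary_fraction(1, 10, 21) returns "0.000110011001100110011", and product_error(10.0) is exactly 2^-54 (about 5.55112e-17), the same figure the P4 printed for i=10. For each i = 10, 100, ..., 10^8, product_error(i) / (i / 10.0) is exactly 2^-54: the relative error stays fixed while the absolute error grows with i.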