Denormal Numbers, and Roundoff

CS 301 Lecture, Dr. Lawlor

A denormal (tiny) number doesn't have an implicit leading 1; it's got an implicit leading 0.  This means the circuitry for doing arithmetic on denormal numbers doesn't look much like the circuitry for doing arithmetic on ordinary normalized numbers.  On modern machines, there *is* no circuitry for computing on denormal numbers--instead, the hardware traps out to software when it hits a denormal value.

This would be fine, except software is way slower than hardware--about 25x slower on the NetRun machine!
#include <cmath>

float f=pow(2,-128); // denormal: smaller than 2^-126, the smallest normalized float

int foo(void) {
	f*=1.00001; //<- you can do almost any operation here...
	return 0;
}

As written, this takes 328ns per execution of foo, which is crazy slow.

If you initialize f to 27.2 (or any non-denormal value), this takes like 13ns per execution, which is reasonable.

That is, floating-point code can take absurdly longer when computing denormals (infinities have the same problem).  Denormals can have a huge impact on the performance of real code--I've written a paper on this.
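To check whether a value in your own code is actually denormal, you can look at its bits: a denormal float has an exponent field of all zeros and a nonzero mantissa. Here's a minimal sketch, assuming 32-bit IEEE floats; the helper names float_bits and is_denormal are made up for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Raw bits of a 32-bit IEEE float (hypothetical helper name).
std::uint32_t float_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits); // well-defined way to reinterpret the bits
    return bits;
}

// True if x is denormal: exponent field all zeros, mantissa nonzero.
// (Exponent zero *and* mantissa zero is just plain 0.0.)
bool is_denormal(float x) {
    std::uint32_t bits = float_bits(x);
    std::uint32_t exponent = (bits >> 23) & 0xFF;  // 8 exponent bits
    std::uint32_t mantissa = bits & 0x7FFFFF;      // 23 stored mantissa bits
    return exponent == 0 && mantissa != 0;
}
```

For example, is_denormal(std::ldexp(1.0f, -128)) is true for the 2^-128 value used above, while is_denormal(27.2f) is false. The standard library's std::fpclassify(x) == FP_SUBNORMAL should give the same answer without any bit twiddling.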

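The easiest way to fix denormals is to round them off to zero.  One trick for doing this is to just add a big value ("big" compared to a denormal can be like 1.0e-10) and then subtract it off again!  The denormal's bits fall off the low end of the sum's mantissa, so the subtraction leaves exactly zero, while ordinary-sized values come back unchanged.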
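The add-then-subtract trick can be sketched like this. The function name flush_tiny and the exact constant are assumptions for illustration, and the sketch assumes the compiler isn't allowed to reassociate floating-point math (so no -ffast-math).

```cpp
// Hypothetical helper: round denormals off to zero by adding and then
// subtracting a value that is "big" compared to any denormal.
float flush_tiny(float x) {
    const float big = 1.0e-10f;
    // volatile forces the sum to be rounded and stored as a real float,
    // so the compiler can't simplify (x + big) - big back into x.
    volatile float sum = x + big;
    // If x was too small to change the rounded sum, this is exactly 0.
    return sum - big;
}
```

For example, flush_tiny(1e-40f) returns 0, while flush_tiny(27.2f) returns 27.2f unchanged. The catch is that this also throws away anything much smaller than the last mantissa bit of big, and values near 1.0e-10 lose some low bits, so pick a constant well below the values you actually care about.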

Roundoff

0.1 decimal is an infinitely repeating pattern in binary (0.0(0011), with 0011 repeating).  This means multiplying by some *finite* bit pattern that approximates 0.1 only approximates really dividing by 10.0.  The size of the difference depends on the precision of the numbers, and grows in proportion to the size of the input data:
for (int i=1;i<1000000000;i*=10) {
	double mul01=i*0.1;       // multiply by the rounded constant 0.1
	double div10=i/10.0;      // really divide by ten
	double diff=mul01-div10;  // how far apart are they?
	std::cout<<"i="<<i<<" diff="<<diff<<"\n";
}

On my P4, this gives:
i=1  diff=5.54976e-18
i=10 diff=5.55112e-17
i=100 diff=5.55112e-16
i=1000 diff=5.55112e-15
i=10000 diff=5.55112e-14
i=100000 diff=5.55112e-13
i=1000000 diff=5.54934e-12
i=10000000 diff=5.5536e-11
i=100000000 diff=5.54792e-10
Program complete. Return 0 (0x0)
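That is, the difference grows in proportion to i: double-precision 0.1 is off from the true 1/10 by about 5.55e-18, which is a relative error of about 5.5e-17.  (Seeing these nonzero differences at all is consistent with the P4's x87 unit keeping intermediate values in extended precision; a build that rounds every intermediate to 64-bit double can print diff=0 instead.)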
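The 5.55 in that output can be checked with exact integer arithmetic, independent of how the machine rounds intermediates. This is a minimal sketch, assuming IEEE 64-bit doubles; the function names are made up for illustration. It uses the fact that the double closest to 0.1 is exactly M / 2^56 for some 53-bit integer M.

```cpp
#include <cmath>
#include <cstdint>

// The double 0.1 equals M / 2^56 exactly; scaling by a power of two
// is exact, so this recovers the integer M with no rounding.
std::int64_t tenth_mantissa() {
    return static_cast<std::int64_t>(std::ldexp(0.1, 56));
}

// If 0.1 were stored exactly, 10*M would equal 2^56.
// The leftover is the error, measured in units of 1/(10 * 2^56).
std::int64_t tenth_error_numerator() {
    return 10 * tenth_mantissa() - (std::int64_t(1) << 56);
}

// double(0.1) minus the true 1/10, as an ordinary double.
double tenth_error() {
    return tenth_error_numerator() / (10.0 * std::ldexp(1.0, 56));
}
```

Here tenth_error_numerator() comes out to 4, so the stored 0.1 is too big by 4/(10 * 2^56), or about 5.55e-18. Multiply that by i and you get the diff column above.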