Sign | Exponent | Fraction (or "Mantissa")
1 bit-- 0 for positive, 1 for negative | 8 unsigned bits-- 127 means 2^0, 137 means 2^10 | 23 bits-- a binary fraction. Don't forget the implicit leading 1!
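
To make the layout in the table concrete, here is a minimal sketch that pulls the three fields out of a float with shifts and masks. The names `float_fields` and `decode` are hypothetical helpers, not part of these notes, and the code assumes `float` is a 32-bit IEEE-754 value. It copies the bits with `memcpy` rather than a union; the field extraction is the same either way.

```cpp
#include <cstdint>
#include <cstring>

/* Hypothetical helper type: the three bit fields of a 32-bit float. */
struct float_fields {
    std::uint32_t sign;     /* 1 bit: 0 for positive, 1 for negative */
    std::uint32_t exponent; /* 8 bits, stored with a bias of 127 */
    std::uint32_t fraction; /* 23 bits; the implicit leading 1 is not stored */
};

/* Hypothetical helper: split a float into sign, exponent, and fraction. */
float_fields decode(float f) {
    static_assert(sizeof(float) == 4, "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits); /* copy out the raw bits */
    float_fields out;
    out.sign = bits >> 31;              /* top bit */
    out.exponent = (bits >> 23) & 0xFF; /* next 8 bits */
    out.fraction = bits & 0x7FFFFF;     /* low 23 bits */
    return out;
}
```

With this sketch, `decode(1.0f)` should give exponent 127 (that is, 2^0) and `decode(-1024.0f)` should give sign 1 and exponent 137 (2^10), both with a zero fraction, since only the implicit leading 1 is set.
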

union unholy_t { /* a union between a float and an integer */
public:
    float f;
    int i;
};
int foo(void) {
    unholy_t unholy;
    unholy.f=3.0; /* put in a float */
    return unholy.i; /* take out an integer */
}

For example, we can use integer bitwise operations to zero out the float's sign bit, making a quite cheap floating-point absolute value operation:

float val=-3.1415;
int foo(void) {
    unholy_t unholy;
    unholy.f=val; /* put in a negative float */
    unholy.i=unholy.i&0x7fFFffFF; /* mask off the float's sign bit */
    return unholy.f; /* now the float is positive! */
}

Back before SSE, floating point to integer conversion in C++ was really really slow. The problem is that the same x86 FPU control word bits affect rounding both for float operations like addition and for float-to-int conversion. For example, this float-to-int code takes 55ns(!) on a pre-SSE Pentium III:

float val=+3.1415;
int foo(void) {
    return (int)(val+0.0001);
}

The problem is evident in the assembly code--you've got to save the old control word out to memory, switch its rounding mode to truncation (round toward zero, which a C++ float-to-int conversion requires), load the new control word, do the integer conversion, and finally load the original control word to resume normal operation.
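
The reason the rounding mode has to change at all is that a C++ cast to int always truncates toward zero, while the floating-point unit normally rounds to nearest. The sketch below shows the difference using `std::lrint` from `<cmath>`, which converts using whatever rounding mode is currently set. The function names are hypothetical, and the code assumes the rounding mode has been left at its default (round to nearest, ties to even).

```cpp
#include <cmath>

/* Hypothetical helper: a C++ cast always rounds toward zero. */
int convert_truncate(float v) {
    return (int)v;
}

/* Hypothetical helper: std::lrint uses the current rounding mode,
   which is round-to-nearest unless the program has changed it. */
long convert_current_mode(float v) {
    return std::lrint(v);
}
```

Under those assumptions, `convert_truncate(3.7f)` gives 3 while `convert_current_mode(3.7f)` gives 4, so the two conversions really do need different rounding settings.
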

union unholy_t { /* a union between a float and an integer */
public:
    float f;
    int i;
};
float val=+3.1415;
int foo(void) {
    unholy_t unholy;
    unholy.f=val+(1<<23); /* scrape off the fraction bits with the weird constant */
    return unholy.i&0x7FffFF; /* mask off the float's sign and exponent bits */
}

This "fast float-to-integer trick" has been independently discovered by many smart people, including: