Intel's current main floating point unit is called SSE. It has its own set of registers, xmm0 through xmm15, and its own instructions, like "movss" and "addss".
Here's a typical use. For a function that returns "float", the compiler will expect the answer in xmm0:
movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)
ret ; Done with function
section .data
a: dd 1.234
You can also call functions taking floats, like "print_float", which takes one argument in xmm0. Annoyingly, functions that take floats can crash if the stack is not aligned to a multiple of 16 bytes at the call. When your function starts, the stack is off by 8 bytes (the return address that "call" pushed), so you need to allocate enough stack space to bring it back to a multiple of 16.
movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)
sub rsp,8 ; align stack for print_float
extern print_float
call print_float
add rsp,8 ; Clean up stack
ret ; Done with function
section .data
a: dd 1.234
Because all the xmm registers are trashable, not preserved, you can't count on your value still being there after *any* function call. This means you need to save the value to the stack. The instruction "movaps" saves the whole xmm0 register to 16 bytes of memory, but careful! "movaps" will crash if the address you pass isn't a multiple of 16 bytes, an "aligned" address.
movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)
sub rsp,8+16 ; align stack, and leave space for xmm0
movaps [rsp],xmm0 ; save our xmm0
extern print_float
call print_float
movaps xmm0,[rsp] ; restore our xmm0
add rsp,8+16 ; Clean up stack
ret ; Done with function
section .data
a: dd 1.234
You can also print floating point info by storing it to memory, and calling a memory-accessing function like farray_print:
movss xmm0,[a] ; load from memory
addss xmm0,xmm0 ; add to itself (double it)
movss [a],xmm0 ; store back to memory
mov rdi,a ; address of our float
mov rsi,1 ; number of floats to print
sub rsp,8 ; align stack for farray_print
extern farray_print
call farray_print
add rsp,8 ; Clean up stack
ret ; Done with function
section .data
a: dd 1.234
The full list of single-float instructions is below. There are also double precision instructions, ending in "sd", and some very interesting parallel instructions (we'll talk about these next week).
Operation | Instruction | Comments |
Arithmetic | addss | sub, mul, div all work the same way |
Compare | minss | max works the same way |
Sqrt | sqrtss | Square root (sqrt), reciprocal (rcp), and reciprocal-square-root (rsqrt) all work the same way |
Move | movss | Copy DWORD sized data to and from memory. |
Convert | cvtss2si cvttss2si | Convert to ("2", get it?) Single Integer (si, stored in register like eax). "cvtt" versions do truncation (round toward zero, like C++ default); "cvt" versions round to nearest. |
Compare to flags | ucomiss | Sets CPU flags like normal x86 "cmp" instruction, but from SSE registers. Use with "jb", "jbe", "je", "jae", or "ja" for normal comparisons (but not jl, jle, jg, or jge, because ucomiss sets the carry and zero flags the way an unsigned compare does, and always clears the sign and overflow flags). Sets "pf", the parity flag, if either input is a NaN. |
Here's an example of using the instruction cvtss2si to convert to integer:
movss xmm3,[pi]; load up constant
addss xmm3,xmm3 ; add pi to itself
cvtss2si eax,xmm3 ; round to integer
ret
section .data
pi: dd 3.14159265358979 ; constant
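To see the difference between the "cvt" and "cvtt" versions, here is a small sketch that converts the same float both ways. It assumes the CPU is in its default rounding mode (round to nearest); the label "x" and the value 2.7 are just picked for illustration.

```nasm
movss xmm3,[x] ; load 2.7
cvtss2si eax,xmm3 ; round to nearest: eax = 3
cvttss2si ecx,xmm3 ; truncate toward zero: ecx = 2
sub eax,ecx ; return the difference, 3 - 2 = 1
ret
section .data
x: dd 2.7 ; illustration value, not from the examples above
```

The truncating version matches what a C/C++ cast like (int)x does, so it's the one to reach for when you want to match compiler output.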
Here we're using ucomiss to compare two floats:
movss xmm3,[a]
ucomiss xmm3,[b]
jbe wejumped
mov eax, 1
ret
wejumped:
mov eax,3
ret
section .data
a: dd 1.23
b: dd 1.27
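One catch with the comparison above: if either input is a NaN, ucomiss sets the carry, zero, and parity flags all at once, so "jbe" gets taken even though a NaN isn't really less than or equal to anything. A sketch of one way to handle this is to check the parity flag with "jp" first. The "unordered" label, the -1 return value, and the NaN bit pattern in "b" are assumptions made for this illustration.

```nasm
movss xmm3,[a]
ucomiss xmm3,[b]
jp unordered ; parity flag set: at least one input is a NaN
jbe wejumped ; only reached for real numbers; taken when a <= b
mov eax,1 ; a > b
ret
wejumped:
mov eax,3
ret
unordered:
mov eax,-1 ; hypothetical "can't compare" result
ret
section .data
a: dd 1.23
b: dd 0x7FC00000 ; bit pattern of a quiet NaN
```

With the NaN in "b" this should return -1. Take out the "jp" line and the same inputs fall into "wejumped" instead.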
Note that above we're using data declared with "dd" (data DWORD) and instructions ending in "ss" (scalar single-precision float), which correspond to the C/C++ type "float" (4 bytes). You can get higher precision computations using the C/C++ type "double" (8 bytes), but you need to declare the data with "dq" (data QWORD), and use instructions ending in "sd" (scalar double-precision float). You can convert a register from one type to another with "cvtss2sd" or "cvtsd2ss", but you can't just mix and match sizes--the machine will happily interpret the wrong bits and spit out garbage!
C/C++ Name | Declare data | Instructions |
float (4 bytes) | dd 1.234 | addss xmm0,xmm0 |
double (8 bytes) | dq 1.234 | addsd xmm0,xmm0 |
movsd xmm0,[a] ; load double from memory
addsd xmm0,xmm0 ; add to itself (double it)
ret ; Done with function
section .data
a: dq 1.234
I personally try to use "float" whenever I can, since doubles cost twice the memory, but C/C++ tends to use double by default.
print_float and farray_print are both netrun only, but from anywhere you can call the standard C function printf to print double-precision values.
movsd xmm0,[a] ; load *double* from memory (printf expects double)
addsd xmm0,xmm0
mov rdi,formatString ; integer argument: pointer to format string
sub rsp,8 ; align stack
mov al,1 ; Count of xmm registers passed to printf (one register, xmm0)
extern printf
call printf
add rsp,8 ; Clean up stack
ret ; Done with function
section .data
a: dq 1.234
formatString: db `The float = %f\n`,0
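If your value is a single-precision float, you can't hand it straight to printf, because "%f" reads a double from xmm0. Here's a sketch of the same printf call starting from a "dd" float and widening it with cvtss2sd first. It's the same setup as the printf example, with only the load and the data declaration changed.

```nasm
movss xmm0,[a] ; load *float* from memory
cvtss2sd xmm0,xmm0 ; widen float to double, which printf's %f expects
mov rdi,formatString ; integer argument: pointer to format string
sub rsp,8 ; align stack
mov al,1 ; Count of xmm registers passed to printf (one register, xmm0)
extern printf
call printf
add rsp,8 ; Clean up stack
ret ; Done with function
section .data
a: dd 1.234 ; declared as a 4-byte float this time
formatString: db `The float = %f\n`,0
```

Skip the cvtss2sd and printf will happily read the float's bits as the low half of a double and print garbage.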
Today, SSE is the typical way to do floating point work. Some older 32-bit compilers still use the FPU (to work with very old pre-SSE hardware, like a Pentium 1), and the very latest cutting edge machines (Sandy or Ivy Bridge) can use AVX, but SSE is the mainstream version and the 64-bit standard, so it's what you should probably use for your homeworks.
CS 301 Lecture Note, 2014, Dr. Orion Lawlor, UAF Computer Science Department.