Creative Uses of the Jump Instruction
CS 301 Lecture, Dr. Lawlor
warning: comparison between signed and unsigned integer expressions
Why do you care about all those silly jump instructions we talked about last class?
Well, for one thing, it affects how you can write C++ code. For
example, this code produces a warning, then gives the wrong answer
"999":
unsigned int i=3000000000;
if (i<-3) return 999; /* what?! */
else return 0;
(Try this in NetRun now!)
The problem is, in assembly language:
- When comparing unsigned numbers, you use "ja" and "jb", which read the CF (carry flag).
- When comparing signed numbers, you use "jg" and "jl", which read SF and OF (sign and overflow flags).
When comparing unsigned i to signed -3, which jump do you use?
Well, sadly, they both give the wrong answer! This is a hardware
limitation (no mixed-sign comparisons) that ends up as a language
limitation (warning and wrong answer for mixed-sign comparisons).
In the above code, if the compiler decides to perform the signed comparison:
unsigned 3 billion < signed -3
(convert to signed)
signed -1 billion < signed -3
So this will return true: 3 billion is less than -3 if they're both converted to signed 32-bit values.
It doesn't actually help if we perform an unsigned comparison:
unsigned 3 billion < signed -3
(convert to unsigned)
unsigned 3 billion < signed 4 billion
This will also return true: 3 billion is less than -3 if they're both converted to unsigned 32-bit values.
On a 64-bit machine, you can fix this problem by typecasting both sides
to a "long" before doing the compare. This works because a signed
long has enough bits to represent +3 billion without wraparound:
unsigned int i=3000000000;
if ((long)i<(long)-3) return 999;
else return 0;
(Try this in NetRun now!)
This fixes the problem, but "unsigned long" suffers from the same exact malfunction at 15 quadrillion:
unsigned long i=15000000000000000000ul;
if ((long)i<(long)-3) return 999;
else return 0;
(Try this in NetRun now!)
Bottom line: you can't reliably compare signed and unsigned numbers. Pick one or the other, and stick with it.
Reminder: negative signed numbers are represented by setting the high
bit, which corresponds to a really huge unsigned number; similarly,
huge unsigned numbers like above correspond to negative signed
numbers.
Low-Branch Bounds Checking
You can actually exploit the signed/unsigned comparison differences to speed up this common bounds-checking code:
int i=3;
const int n=10;
int arr[n];
int foo(void) {
if (i>=0 && i<n) return arr[i];
else return 0;
}
(Try this in NetRun now!)
Note that we have to do two separate tests on "i": it could be
negative, so we check against zero; or it could be n or bigger, so we
compare against n. But if you look at the code the compiler
generates, there's only one test on "i":
mov edx, <<i>>
mov eax,0x0
cmp edx,0x9
ja bad
... load up arr[i] ...
bad: ret
Wait, only one comparison, and it's an *unsigned* compare? But "i" is signed!
This is intentional: if "i" is negative (signed), then treated as an
unsigned number, it's huge. This lets us replace a two-test
signed "negative or too big" with a single unsigned "too big" test!
Overflow Checking in Assembly
C++ just blindly ignores overflow, so it's possible for a program to
fail silently and give you the wrong answer. In assembly, you can
check the overflow flag with "jo" (jump-if-overflow), like this:
mov edi,1000000000
add edi,edi
jo uhoh ; error-check the add
mov eax,edi
ret ; ordinary return
uhoh:
mov eax,-999 ; Special value to indicate we had an overflow
ret ; error return
(Try this in NetRun now!)
This program will hence return the right answer, or admit something
failed. You do have to be careful to do a "jo uhoh" after every
possible arithmetic instruction, since the overflow flag gets re-set
after each "add".
BONUS: Tail Call
It's surprisingly easy to define a function in assembly language: just make a label, and "call" it:
mov edi,1000
call myfunction
add eax,7
ret
myfunction:
mov eax,edi ; copy our first parameter into eax (to be returned)
ret ; go back to foo
(Try this in NetRun now!)
"call" and "ret" match one another, so myfunction's "ret" goes back to
our first function. If you use "jmp" instead of "call", "ret"
will return all the way back out to main, so this code will return 1000:
mov edi,1000
jmp myfunction
add eax,7 ; <- never executed!
ret
myfunction:
mov eax,edi ; copy our first parameter into eax (to be returned)
ret ; go back to main
(Try this in NetRun now!)
In fact, the only difference between "jmp" and "call" is what happens
at the following "ret"; so if you want "ret" to return to your function
again, you can actually speed up your code by replacing "call" with
"jmp". This is a "tail call", and should become more clear once we talk about the stack.