Creative Uses of the Jump Instruction

warning: comparison between signed and unsigned integer expressions

Why do you care about all those silly jump instructions we talked about last class?

Well, for one thing, it affects how you can write C++ code. For example, this code produces a warning, then gives the wrong answer "999":

unsigned int i=3000000000;
if (i<-3) return 999; /* what?! */
else return 0;

(Try this in NetRun now!)

The problem is, in assembly language:

When comparing unsigned numbers, you use "ja" and "jb", which read the CF (carry flag).
When comparing signed numbers, you use "jg" and "jl", which read SF and OF (sign and overflow flags).

When comparing unsigned i to signed -3, which jump do you use? Well, sadly, they both give the wrong answer! This is a hardware limitation (no mixed-sign comparisons) that ends up as a language limitation (warning and wrong answer for mixed-sign comparisons).

In the above code, if the compiler decides to perform the signed comparison:
unsigned 3 billion < signed -3
(convert to signed)
signed -1 billion < signed -3
So this will return true: 3 billion is less than -3 if they're both converted to signed 32-bit values.

It doesn't actually help if we perform an unsigned comparison:
unsigned 3 billion < signed -3
(convert to unsigned)
unsigned 3 billion < signed 4 billion
This will also return true: 3 billion is less than -3 if they're both converted to unsigned 32-bit values.

On a 64-bit machine, you can fix this problem by typecasting both sides to a "long" before doing the compare. This works because a signed long has enough bits to represent +3 billion without wraparound:

unsigned int i=3000000000;
if ((long)i<(long)-3) return 999;
else return 0;

(Try this in NetRun now!)

This fixes the problem, but "unsigned long" suffers from the same exact malfunction at 15 quadrillion:

unsigned long i=15000000000000000000ul;
if ((long)i<(long)-3) return 999;
else return 0;

(Try this in NetRun now!)

Bottom line: you can't reliably compare signed and unsigned numbers. Pick one or the other, and stick with it.

Reminder: negative signed numbers are represented by setting the high bit, which corresponds to a really huge unsigned number; similarly, huge unsigned numbers like above correspond to negative signed numbers.

Low-Branch Bounds Checking

You can actually exploit the signed/unsigned comparison differences to speed up this common bounds-checking code:

int i=3;
const int n=10;
int arr[n];
int foo(void) {
	if (i>=0 && i<n) return arr[i];
	else return 0;
}

(Try this in NetRun now!)

Note that we have to do two separate tests on "i": it could be negative, so we check against zero; or it could be n or bigger, so we compare against n. But if you look at the code the compiler generates, there's only one test on "i":

	mov    edx, <<i>>
	mov    eax,0x0
	cmp    edx,0x9
	ja     bad
	... load up arr[i] ...
bad:	ret

Wait, only one comparison, and it's an *unsigned* compare? But "i" is signed!

This is intentional: if "i" is negative (signed), then treated as an unsigned number, it's huge. This lets us replace a two-test signed "negative or too big" with a single unsigned "too big" test!

Overflow Checking in Assembly

C++ just blindly ignores overflow, so it's possible for a program to fail silently and give you the wrong answer. In assembly, you can check the overflow flag with "jo" (jump-if-overflow), like this:

mov edi,1000000000

add edi,edi
jo uhoh ; error-check the add

mov eax,edi
ret ; ordinary return

uhoh:
mov eax,-999 ; Special value to indicate we had an overflow
ret ; error return

(Try this in NetRun now!)

This program will hence return the right answer, or admit something failed. You do have to be careful to do a "jo uhoh" after every possible arithmetic instruction, since the overflow flag gets re-set after each "add".

BONUS: Tail Call

It's surprisingly easy to define a function in assembly language: just make a label, and "call" it:

mov edi,1000
call myfunction
add eax,7
ret

myfunction:
mov eax,edi ; copy our first parameter into eax (to be returned)
ret ; go back to foo

(Try this in NetRun now!)

"call" and "ret" match one another, so myfunction's "ret" goes back to our first function. If you use "jmp" instead of "call", "ret" will return all the way back out to main, so this code will return 1000:

mov edi,1000
jmp myfunction
add eax,7 ; <- never executed!
ret

myfunction:
mov eax,edi ; copy our first parameter into eax (to be returned)
ret ; go back to main

(Try this in NetRun now!)

In fact, the only difference between "jmp" and "call" is what happens at the following "ret"; so if you want "ret" to return to your function again, you can actually speed up your code by replacing "call" with "jmp". This is a "tail call", and should become more clear once we talk about the stack.