Function Calls and other Flow Control in Assembly

CS 301 Lecture, Dr. Lawlor

Calling Functions from Assembly

You use the "call" instruction to call functions.  You can actually call C++'s "cout" if you're sufficiently dedicated, but the builtin NetRun functions are designed to be easier to call.  You first need to tell the assembler that "read_input" is an external function.  All you do is say "extern read_input".  Then you run that function, with "call read_input", and the CPU will execute the read_input function until it returns.  Before read_input returns, it puts the read-in value into eax, where you can grab it.

So this assembly program reads an integer and returns it:
extern read_input
call read_input
ret

Be careful, though!  The read_input function can and will use all the other scratch registers for its own purposes.  In particular, it's tricky to call read_input twice to read two numbers, since during the second call you need to stash the first number somewhere read_input won't overwrite: on the stack, or in a preserved register like rbx, which you then have to save and restore yourself.

Passing Parameters in Assembly

In 64-bit x86 mode, you pass the first few parameters in registers. 

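Annoyingly, exactly which registers you use depends on the machine:
On Linux and macOS (the System V x86-64 calling convention), the first six integer or pointer parameters go in rdi, rsi, rdx, rcx, r8, and r9, in that order.
On Windows (the Microsoft x64 calling convention), the first four go in rcx, rdx, r8, and r9.
For example, on NetRun (which runs Linux), I can call the one-parameter NetRun builtin function "print_int" with one integer like this.  Because the parameter is a 32-bit int, it goes in edi, the low 32 bits of rdi: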
mov edi,0xdeadbeef
extern print_int
call print_int
ret

(Try this in NetRun now!)
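To see how the register assignments play out with more parameters, here is a minimal C++ sketch. The function "sum7" is hypothetical (it is not a NetRun builtin), and the comments assume the Linux System V x86-64 convention, where int parameters travel in the low 32-bit halves of the parameter registers and anything past the sixth goes on the stack.

```cpp
// Hypothetical example function (not a NetRun builtin).
// Under the System V x86-64 convention (Linux), a caller passes:
//   a -> edi, b -> esi, c -> edx, d -> ecx, e -> r8d, f -> r9d,
//   g -> on the stack (only six integer parameters fit in registers).
// The int return value comes back in eax.
int sum7(int a, int b, int c, int d, int e, int f, int g) {
    return a + b + c + d + e + f + g;
}
```

Compiling a call like sum7(1,2,3,4,5,6,7) with "g++ -O1 -S -masm=intel" and reading the output is a good way to check which registers the compiler actually fills.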

Jumps

A jump instruction, like "jmp", just switches the CPU to executing a different piece of code.  It's the assembly equivalent of "goto", but unlike goto, jumps are not considered shameful in assembly.

You say where to jump to using a "jump label", which is just any name with a colon after it.  (The same syntax is used in C/C++.)
Assembly jump:
mov eax,3
jmp quiddit
mov eax,999 ; <- not executed!
quiddit:
ret

(Try this in NetRun now!)

C++ goto:
int x=3;
goto quiddit;
x=999;
quiddit:
return x;

(Try this in NetRun now!)

In both cases, we return 3, because we jump right over the 999 assignment.  Jumping is somewhat useful for skipping over bad code, but it really gets useful when you add conditional jumps...

Conditional Jumps: Branching in Assembly

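In assembly, all flow control is done with two types of instruction:
a compare, like "cmp", which compares two values and records the result in the CPU's flags; and
a conditional jump, like "je", which jumps only if the recorded result matches its condition.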
Here's how to use compare and jump-if-equal ("je"):
mov eax,3
cmp eax,3 ; how does eax compare with 3?
je lemme_outta_here ; if it's equal, then jump
mov eax,999 ; <- not executed *if* we jump over it
lemme_outta_here:
ret

(Try this in NetRun now!)

Here's compare and jump-if-less-than ("jl"):
mov eax,1
cmp eax,3 ; how does eax compare with 3?
jl lemme_outta_here ; if it's less, then jump
mov eax,999 ; <- not executed *if* we jump over it
lemme_outta_here:
ret

(Try this in NetRun now!)

The C++ equivalent to compare-and-jump-if-whatever is "if (something) goto somewhere;".
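Here are the two assembly examples above rewritten in that "if (something) goto somewhere;" style, as a C++ sketch. The "eax" parameter is hypothetical: it stands in for the value the assembly hard-codes with "mov eax,...", so both outcomes can be tried.

```cpp
// Each line mirrors one (or two) assembly instructions from the examples.

int je_example(int eax) {
    if (eax == 3) goto lemme_outta_here;   // cmp eax,3 / je lemme_outta_here
    eax = 999;                             // mov eax,999 (skipped if we jump)
lemme_outta_here:
    return eax;                            // ret
}

int jl_example(int eax) {
    if (eax < 3) goto lemme_outta_here;    // cmp eax,3 / jl lemme_outta_here
    eax = 999;                             // mov eax,999 (skipped if we jump)
lemme_outta_here:
    return eax;                            // ret
}
```

With the values from the assembly, je_example(3) returns 3 and jl_example(1) returns 1; any value that fails the test falls through to the 999 assignment.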

Also, check out the machine code generated for the conditional jump--the jump destination is encoded as the number of bytes of machine code to skip over.  For example, the "jl" above gets encoded in machine code like this:
   0:   b8 01 00 00 00    mov    eax,0x1
   5:   83 f8 03          cmp    eax,0x3
   8:   7c 05             jl     f <foo+0xf>
   a:   b8 e7 03 00 00    mov    eax,0x3e7
   f:   c3                ret
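The distance to jump, the 05 byte in the jl instruction "7c 05", is five bytes, because the code we're skipping over (the mov at address a) is five bytes long.  The distance is counted from the end of the jump instruction, so the CPU lands at a+5 = f, the ret.  Note that a jump label doesn't show up in machine code at all--it's just used by the assembler to figure out how far to jump.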
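The arithmetic behind that 05 can be checked with a small C++ sketch. It assumes the 2-byte short form shown above (an opcode byte plus a signed 8-bit offset); x86 also has longer forms with 32-bit offsets, which this hypothetical helper does not handle.

```cpp
#include <cstdint>

// Computes where a short conditional jump (like "7c 05" for jl) lands.
// The instruction is 2 bytes: an opcode and a signed 8-bit offset.
// The offset is counted from the address of the *next* instruction,
// which is jump_address + 2.
int short_jump_target(int jump_address, std::int8_t offset) {
    return jump_address + 2 + offset;
}
```

For the jl above, short_jump_target(0x8, 0x05) gives 0xf, the address of the ret.  A backward jump, such as the one at the bottom of a loop, simply stores a negative offset.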

Here's the whole conversion table of compare instructions:
English             C/C++   Assembly (signed)   Assembly (unsigned)
Less Than           <       jl                  jb
Less or Equal       <=      jle                 jbe
Equal               ==      je or jz            je or jz
Greater or Equal    >=      jge                 jae
Greater Than        >       jg                  ja
Not Equal           !=      jne or jnz          jne or jnz
The "b" in the unsigned comparison instructions stands for "below", and the "a" for "above".  Note that there are no mixed-signedness compare instructions (e.g., to compare a signed to an unsigned number); this missing instruction is why compilers warn about comparisons between signed and unsigned integers!

In C/C++, the compiler can tell whether you want a signed or unsigned comparison based on the variables' types.  There aren't any types in assembly, so it's up to you to pick the right instruction!
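To see why the choice matters, here is a C++ sketch comparing the same 32-bit pattern both ways. The function names are hypothetical. In the CPU, "cmp" sets the flags the same way in both cases; only the jump you pick afterwards (jl versus jb) decides how the bits are read.

```cpp
#include <cstdint>

// Like "cmp a,b / jl": read both values as two's-complement signed ints.
bool signed_less(std::uint32_t a, std::uint32_t b) {
    // Converting an out-of-range value to int32_t wraps around
    // (guaranteed since C++20, and what mainstream compilers do earlier).
    return static_cast<std::int32_t>(a) < static_cast<std::int32_t>(b);
}

// Like "cmp a,b / jb": read both values as unsigned ints.
bool unsigned_less(std::uint32_t a, std::uint32_t b) {
    return a < b;
}
```

Here signed_less(0xFFFFFFFF, 1) is true, because the bit pattern 0xFFFFFFFF is -1 as a signed int, while unsigned_less(0xFFFFFFFF, 1) is false, because as an unsigned number it is 4294967295.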

Converting C++ flow control structures to Assembly

You can actually write a very peculiar variant of C++, where "if" statements only contain "goto" statements.  This is perfectly legal C++:
#include <iostream>
int main() {
    int i=0;
    if (i>=10) goto byebye;
    std::cout<<"Not too big!\n";
byebye: return 0;
}
This way of writing C++ is quite similar to assembly: each "if (...) goto" line becomes a compare followed by a conditional jump, so the control flow maps almost one-to-one onto machine instructions.  More complicated C++, like the "for" construct, expands out to many lines of assembly.
int i, n=10;
for (i=0;i<n;i++) {
std::cout<<"In loop: i=="<<i<<"\n";
}
Here's an expanded version of this C++ "for" loop:
int i=0, n=10;
start: std::cout<<"In loop: i=="<<i<<"\n";
i++;
if (i<n) goto start;

You've got to convince yourself that this really is equivalent to the "for" loop here, where n is 10.  Careful--if n is a parameter, it's not equivalent in all cases!  (What if n<=0?)
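One way to convince yourself is to count how many times the loop body runs. In this C++ sketch the body is replaced by a counter, and the function names are hypothetical; the two versions agree when n is 10 but disagree when n is 0 or negative.

```cpp
// Counts how many times the body runs in the original "for" loop.
int for_loop_iterations(int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        count++;               // stands in for the loop body
    }
    return count;
}

// Counts how many times the body runs in the expanded goto version.
// The test only happens *after* the body, so the body always runs once.
int expanded_loop_iterations(int n) {
    int count = 0;
    int i = 0;
start:
    count++;                   // stands in for the loop body
    i++;
    if (i < n) goto start;
    return count;
}
```

For n=10 both return 10, but for n=0 the "for" loop runs its body zero times while the expansion runs it once.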

All C flow-control constructs can be written using just "if" and "goto", which usually map one-to-one to a compare-and-jump sequence in assembly.
Normal C                 Expanded C

if (A) {                 if (!A) goto END;
  ...                    {
}                          ...
                         }
                         END:

if (!A) {                if (A) goto END;
  ...                    {
}                          ...
                         }
                         END:

if (A&&B) {              if (!A) goto END;
  ...                    if (!B) goto END;
}                        {
                           ...
                         }
                         END:

if (A||B) {              if (A) goto STUFF;
  ...                    if (B) goto STUFF;
}                        goto END;
                         STUFF:
                         {
                           ...
                         }
                         END:

while (A) {              goto TEST;
  ...                    START:
}                        {
                           ...
                         }
                         TEST: if (A) goto START;

do {                     START:
  ...                    {
} while (A);               ...
                         }
                         if (A) goto START;

for (i=0;i<n;i++)        i=0;         /* Version A */
{                        goto TEST;
  ...                    START:
}                        {
                           ...
                         }
                         i++;
                         TEST: if (i<n) goto START;

for (i=0;i<n;i++)        i=0;          /* Version B */
{                        START: if (i>=n) goto END;
  ...                    {
}                          ...
                         }
                         i++;
                         goto START;
                         END:
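A couple of these rows can be checked directly in C++. This sketch (hypothetical function names) pairs a "while" loop with its expansion from the "while" row, and applies the "if (A||B)" row to a simple range test.

```cpp
// Sums 0+1+...+(n-1) with an ordinary while loop.
int sum_while(int n) {
    int i = 0, sum = 0;
    while (i < n) {
        sum += i;
        i++;
    }
    return sum;
}

// The same loop, expanded using the "while" row: jump to the test
// first, so the body is skipped entirely when n <= 0.
int sum_while_expanded(int n) {
    int i = 0, sum = 0;
    goto TEST;
START:
    {
        sum += i;
        i++;
    }
TEST:
    if (i < n) goto START;
    return sum;
}

// "if (x<0 || x>100) {...}" expanded using the "if (A||B)" row:
// either test jumps to STUFF; if neither does, skip to END.
bool outside_0_to_100(int x) {
    bool result = false;
    if (x < 0) goto STUFF;     // A
    if (x > 100) goto STUFF;   // B
    goto END;
STUFF:
    {
        result = true;
    }
END:
    return result;
}
```

Note that B is only tested when A is false, just like "||" in C, so the expansion keeps the short-circuit behavior.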

Note that the last two translations of the "for" concept (labelled Version A and Version B) both compute the same thing.  Which one is faster? If the loop iterates many times, I claim version (A) is faster, since there's only one (conditional) goto each time around the loop, instead of two gotos in version (B)--one conditional and one unconditional.  But version (B) is probably faster if n is often 0, because in that case it quickly jumps to END (in one conditional jump).
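That argument can be made concrete by counting the jumps each version executes. This C++ sketch (hypothetical function names, empty loop body) adds one to a counter for each conditional test, taken or not, and for each unconditional goto. It only counts instructions and says nothing about branch prediction or other real-hardware effects.

```cpp
// Version A: one unconditional jump to the test, then one test per pass.
int jumps_version_a(int n) {
    int jumps = 0;
    int i = 0;
    jumps++;                        // goto TEST (unconditional)
    goto TEST;
START:
    {
        /* ...loop body... */
    }
    i++;
TEST:
    jumps++;                        // if (i<n) goto START (conditional)
    if (i < n) goto START;
    return jumps;
}

// Version B: a test at the top and an unconditional jump at the bottom.
int jumps_version_b(int n) {
    int jumps = 0;
    int i = 0;
START:
    jumps++;                        // if (i>=n) goto END (conditional)
    if (i >= n) goto END;
    {
        /* ...loop body... */
    }
    i++;
    jumps++;                        // goto START (unconditional)
    goto START;
END:
    return jumps;
}
```

For n>=0, version A executes n+2 jumps and version B executes 2n+1.  For n=10 that is 12 versus 21, but for n=0 it is 2 versus 1, which matches the reasoning above.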