Function Calls and other Flow Control in Assembly
CS 301 Lecture, Dr. Lawlor
Calling Functions from Assembly
You use the "call" instruction to call functions. You can
actually call C++'s "cout" if you're sufficiently dedicated, but the builtin NetRun
functions are designed to be easier to call. You first
need to tell the assembler that "read_input" is an external
function. All you do is say "extern read_input". Then you
run that function, with "call read_input", and the CPU will execute the
read_input function until it returns. Before read_input returns,
it puts the read-in value into eax, where you can grab it.
So this assembly program reads an integer and returns it:
extern read_input
call read_input
ret
Be careful, though! The read_input function can and will use all
the other scratch registers for its own purposes. In particular, it's
tricky to call read_input twice to read two numbers, since you need to
stash the first number somewhere other than registers during the second
call!
Passing Parameters in Assembly
In 64-bit x86 mode, you pass the first few parameters in registers.
Annoyingly, exactly which registers you use depends on the machine:
- On x86-64 UNIX systems, including NetRun, the first six parameters go into rdi, rsi, rdx, rcx, r8, and r9.
- On Windows 64, the first four parameters go into rcx, rdx, r8, and r9.
- On 32-bit x86 systems, no parameters go in registers; they're all on "the stack" (wait a week to hear about this!)
For example, on NetRun (which runs Linux), I can call the one-parameter
NetRun builtin function "print_int" with one integer like this:
mov edi,0xdeadbeef
extern print_int
call print_int
ret
(Try this in NetRun now!)
Jumps
A jump instruction, like "jmp", just switches the CPU to executing a
different piece of code. It's the assembly equivalent of "goto",
but unlike goto, jumps are not considered shameful in assembly.
You say where to jump to using a "jump label", which is just any name
with a colon after it. (The same syntax is used in C/C++.)
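For example, if a program sets the return value to 3, jumps to a label
placed just after a line that would set it to 999, and then returns, it
returns 3, because we jump right over the 999 assignment. Jumping is
somewhat useful for skipping over bad code, but it really gets useful
when you add conditional jumps...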
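Here's a minimal C++ sketch of the jump-over idea, using "goto" in place of "jmp". The function name "skip_assignment" and the label "done" are made up for illustration; they don't come from NetRun.

```cpp
// Hypothetical example: an unconditional jump that skips one assignment.
int skip_assignment() {
    int result = 3;
    goto done;       // unconditional jump, the C++ cousin of "jmp done"
    result = 999;    // never executed: the goto skips right over this line
done:
    return result;   // returns 3
}
```

Calling skip_assignment() gives back 3, since control never reaches the 999 assignment.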
Conditional Jumps: Branching in Assembly
In assembly, all flow control is done with two types of instruction:
- A compare instruction, like "cmp", compares two values.
- A conditional jump instruction, like "je" (jump-if-equal), does a goto somewhere if the two values satisfy the right condition.
Here's how to use compare and jump-if-equal ("je"):
mov eax,3
cmp eax,3 ; how does eax compare with 3?
je lemme_outta_here ; if it's equal, then jump
mov eax,999 ; <- not executed *if* we jump over it
lemme_outta_here:
ret
(Try this in NetRun now!)
Here's compare and jump-if-less-than ("jl"):
mov eax,1
cmp eax,3 ; how does eax compare with 3?
jl lemme_outta_here ; if it's less, then jump
mov eax,999 ; <- not executed *if* we jump over it
lemme_outta_here:
ret
(Try this in NetRun now!)
The C++ equivalent to compare-and-jump-if-whatever is "if (something) goto somewhere;".
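As a rough sketch, here's the jump-if-less-than example above rewritten in that "if (something) goto somewhere;" style. The function name "compare_and_jump" and its parameter are assumptions for illustration; the parameter stands in for the value loaded into eax.

```cpp
// Hypothetical C++ mirror of: cmp eax,3 / jl lemme_outta_here / mov eax,999
int compare_and_jump(int value) {
    if (value < 3) goto lemme_outta_here;  // signed compare, then jump if less
    value = 999;                           // only runs when the jump is NOT taken
lemme_outta_here:
    return value;
}
```

Passing 1 returns 1 (the jump is taken), while passing 3 returns 999 (3 is not less than 3, so we fall through).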
Also, check out the machine code generated for the conditional
jump--the jump destination is encoded as the number of bytes of machine
code to skip over. For example, the "jl" above gets encoded in
machine code like this:
   0:   b8 01 00 00 00      mov eax,0x1
   5:   83 f8 03            cmp eax,0x3
   8:   7c 05               jl  f <foo+0xf>
   a:   b8 e7 03 00 00      mov eax,0x3e7
   f:   c3                  ret
The distance to jump, the "05" byte at the end of the "7c 05" encoding
of jl, is five bytes, because the code we're skipping over (the
"mov eax,0x3e7") is five bytes long. Note that a jump label doesn't show
up in machine code at all; it's just used by the assembler to figure out
how far to jump.
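The arithmetic the assembler does here can be sketched in a few lines of C++. This assumes the short two-byte form of the conditional jump (one opcode byte plus one signed displacement byte), as in the listing above; the function name "short_jump_target" is made up for illustration.

```cpp
#include <cstdint>

// Where does a short conditional jump land?
// The displacement is counted from the END of the jump instruction,
// that is, from the address of the instruction that follows it.
std::uint32_t short_jump_target(std::uint32_t jump_address, std::int8_t displacement) {
    const std::uint32_t jump_length = 2;  // opcode byte + displacement byte
    return jump_address + jump_length + displacement;
}
```

For the "jl" at address 0x8 with displacement 0x05, this gives 0x8 + 2 + 5 = 0xf, which is the address of the "ret". A negative displacement jumps backward, which is how loops get encoded.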
Here's the whole conversion table from comparisons to conditional jump instructions:
English             | Less Than | Less or Equal | Equal    | Greater or Equal | Greater Than | Not Equal
C/C++               | <         | <=            | ==       | >=               | >            | !=
Assembly (signed)   | jl        | jle           | je or jz | jge              | jg           | jne or jnz
Assembly (unsigned) | jb        | jbe           | je or jz | jae              | ja           | jne or jnz
The "b" in the unsigned jump instructions stands for "below", and
the "a" for "above". Note that there are no mixed-signedness
compare instructions (e.g., to compare a signed to an unsigned number);
this missing instruction is why compilers whine about "warning:
comparing signed and unsigned numbers"!
In C/C++, the compiler can tell whether you want a signed or unsigned
comparison based on the variables' types. There aren't any types
in assembly, so it's up to you to pick the right instruction!
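To see why the choice matters, here's a small C++ sketch that compares the same 32 bits both ways. The function names are made up for illustration, and the cast to a signed type assumes the usual two's complement behavior (guaranteed since C++20, and what mainstream compilers did before that).

```cpp
#include <cstdint>

// Treat the bits as signed numbers: this is the kind of comparison "jl" tests.
bool less_signed(std::uint32_t a_bits, std::uint32_t b_bits) {
    return static_cast<std::int32_t>(a_bits) < static_cast<std::int32_t>(b_bits);
}

// Treat the same bits as unsigned numbers: this is the kind of comparison "jb" tests.
bool less_unsigned(std::uint32_t a_bits, std::uint32_t b_bits) {
    return a_bits < b_bits;
}
```

With a_bits = 0xFFFFFFFF and b_bits = 1, less_signed says true (the bits mean -1, and -1 < 1), but less_unsigned says false (the bits mean 4294967295). Same bits, different answer, so picking "jl" when you meant "jb" is a real bug.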
Converting C++ flow control structures to Assembly
You can actually write a very peculiar variant of C++, where
"if" statements only contain "goto" statements. This is perfectly legal C/C++:
#include <iostream>
int main() {
  int i=0;
  if (i>=10) goto byebye;
  std::cout<<"Not too big!\n";
  byebye: return 0;
}
This way of writing C++ is quite similar to assembly. In fact, there's
nearly a one-to-one correspondence between flow-control lines written
this way and machine language instructions: each "if (...) goto" line
becomes a compare followed by a conditional jump. More complicated C++,
like the "for" construct, expands out to many lines of assembly.
int i, n=10;
for (i=0;i<n;i++) {
std::cout<<"In loop: i=="<<i<<"\n";
}
Here's an expanded version of this C++ "for" loop:
int i=0, n=10;
start: std::cout<<"In loop: i=="<<i<<"\n";
i++;
if (i<n) goto start;
You've got to convince yourself that this is really equivalent to the
"for" loop in all cases. Careful--if n is a parameter, it's not! (What
if n<=0? The "for" loop never runs its body, but the expanded version
always runs it at least once.)
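One way to convince yourself is to count how many times each version runs its body. This is a minimal sketch with made-up function names ("count_for_loop" and "count_expanded"); the counter stands in for the cout line.

```cpp
// The ordinary "for" loop: tests i<n BEFORE the first trip through the body.
int count_for_loop(int n) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        count++;  // stand-in for the loop body
    }
    return count;
}

// The expanded version from above: tests i<n only AFTER running the body.
int count_expanded(int n) {
    int count = 0;
    int i = 0;
start:
    count++;  // stand-in for the loop body
    i++;
    if (i < n) goto start;
    return count;
}
```

For n=10 both return 10. For n=0 the "for" loop returns 0 but the expanded version returns 1, so the two only match when n is at least 1.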
All C flow-control constructs can be written using just "if" and
"goto", which usually map one-to-one to a compare-and-jump sequence in assembly.
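Normal C             | Expanded C
---------------------+------------------------------
if (A) {             | if (!A) goto END;
  ...                | {
}                    |   ...
                     | }
                     | END:
---------------------+------------------------------
if (!A) {            | if (A) goto END;
  ...                | {
}                    |   ...
                     | }
                     | END:
---------------------+------------------------------
if (A&&B) {          | if (!A) goto END;
  ...                | if (!B) goto END;
}                    | {
                     |   ...
                     | }
                     | END:
---------------------+------------------------------
if (A||B) {          | if (A) goto STUFF;
  ...                | if (B) goto STUFF;
}                    | goto END;
                     | STUFF:
                     | {
                     |   ...
                     | }
                     | END:
---------------------+------------------------------
while (A) {          | goto TEST;
  ...                | START:
}                    | {
                     |   ...
                     | }
                     | TEST: if (A) goto START;
---------------------+------------------------------
do {                 | START:
  ...                | {
} while (A);         |   ...
                     | }
                     | if (A) goto START;
---------------------+------------------------------
for (i=0;i<n;i++)    | i=0; /* Version A */
{                    | goto TEST;
  ...                | START:
}                    | {
                     |   ...
                     | }
                     | i++;
                     | TEST: if (i<n) goto START;
---------------------+------------------------------
for (i=0;i<n;i++)    | i=0; /* Version B */
{                    | START: if (i>=n) goto END;
  ...                | {
}                    |   ...
                     | }
                     | i++;
                     | goto START;
                     | END: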
Note that the last two translations of the "for" concept (labelled
Version A and Version B) both compute the same thing. Which one
is faster? If the loop iterates many times, I claim version (A) is
faster, since there's only one (conditional) goto each time around the
loop, instead of two gotos in version (B): one conditional and one
unconditional. But version (B) is probably faster if n is often 0,
because in that case it quickly jumps to END (in one conditional jump).
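Here's a hedged C++ sketch that counts how many goto statements each version executes. The struct "JumpCount" and the function names are invented for illustration, and counting jumps is only a rough proxy for speed, not a timing measurement. A conditional goto is counted every time it's evaluated, whether or not it's taken.

```cpp
// Hypothetical bookkeeping: body executions and jump instructions executed.
struct JumpCount {
    int iterations;
    int jumps;
};

JumpCount version_a(int n) {
    JumpCount c = {0, 0};
    int i = 0;
    c.jumps++;              // the one-time unconditional "goto TEST"
    goto TEST;
START:
    c.iterations++;         // stand-in for the loop body
    i++;
TEST:
    c.jumps++;              // conditional jump, once per test
    if (i < n) goto START;
    return c;
}

JumpCount version_b(int n) {
    JumpCount c = {0, 0};
    int i = 0;
START:
    c.jumps++;              // conditional jump, once per test
    if (i >= n) goto END;
    c.iterations++;         // stand-in for the loop body
    i++;
    c.jumps++;              // unconditional "goto START", once per iteration
    goto START;
END:
    return c;
}
```

For n=10, both run the body 10 times, but version A executes 12 jumps and version B executes 21. For n=0, version A executes 2 jumps and version B just 1, which matches the argument above.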