Function Calls and other Flow Control in Assembly

Calling Functions from Assembly

You use the "call" instruction to call functions. You can actually call C++'s "cout" if you're sufficiently dedicated, but the builtin NetRun functions are designed to be easier to call. You first need to tell the assembler that "read_input" is an external function. All you do is say "extern read_input". Then you run that function, with "call read_input", and the CPU will execute the read_input function until it returns. Before read_input returns, it puts the read-in value into eax, where you can grab it.

So this assembly program reads an integer and returns it:

extern read_input
call read_input
ret

(executable NetRun link)

Be careful, though! The read_input function can and will use all the other scratch registers for its own purposes. In particular, it's tricky to call read_input twice to read two numbers, since you need to stash the first number somewhere that survives the second call, and a scratch register won't do!

Passing Parameters in Assembly

In 64-bit x86 mode, you pass the first few parameters in registers.

Annoyingly, exactly which registers you use depends on the operating system and CPU mode:

On x86-64 UNIX systems, including NetRun, the first six parameters go into rdi, rsi, rdx, rcx, r8, and r9.
On Windows 64, the first four parameters go into rcx, rdx, r8, and r9.
On 32-bit x86 systems, no parameters go in registers; they're all on "the stack" (wait a week to hear about this!).

For example, on NetRun (which runs Linux), I can call the one-parameter NetRun builtin function "print_int" with one integer like this:

mov edi,0xdeadbeef
extern print_int
call print_int
ret

(Try this in NetRun now!)

Jumps

A jump instruction, like "jmp", just switches the CPU to executing a different piece of code. It's the assembly equivalent of "goto", but unlike goto, jumps are not considered shameful in assembly.

You say where to jump to using a "jump label", which is just any name with a colon after it. (The same syntax is used in C/C++.)

Assembly jump:

	mov eax,3
	jmp quiddit
	mov eax,999  ; <- not executed!
quiddit:
	ret

(Try this in NetRun now!)

C++ goto:

	int x=3;
	goto quiddit;
	x=999;
quiddit:
	return x;

(Try this in NetRun now!)

In both cases, we return 3, because we jump right over the 999 assignment. Jumping is somewhat useful for skipping over bad code, but it really gets useful when you add conditional jumps...

Conditional Jumps: Branching in Assembly

In assembly, all flow control is done with two types of instruction:

A compare instruction, like "cmp", compares two values.
A conditional jump instruction, like "je" (jump-if-equal), does a goto somewhere if the two values satisfy the right condition.

Here's how to use compare and jump-if-equal ("je"):

	mov eax,3
	cmp eax,3 ; how does eax compare with 3?
	je lemme_outta_here  ; if it's equal, then jump
	mov eax,999  ; <- not executed *if* we jump over it
lemme_outta_here:
	ret

(Try this in NetRun now!)

Here's compare and jump-if-less-than ("jl"):

	mov eax,1
	cmp eax,3 ; how does eax compare with 3?
	jl lemme_outta_here  ; if it's less, then jump
	mov eax,999  ; <- not executed *if* we jump over it
lemme_outta_here:
	ret

(Try this in NetRun now!)

The C++ equivalent to compare-and-jump-if-whatever is "if (something) goto somewhere;".
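As a rough sketch of that correspondence, here is the jump-if-less-than example above rewritten in the "if (something) goto somewhere;" style. The function name foo_goto is made up for this illustration, and the parameter x stands in for eax.

```cpp
// Hypothetical C++ rendering of the "jl" example; x plays the role of eax.
int foo_goto(int x) {
	if (x < 3) goto lemme_outta_here; // cmp eax,3 followed by jl
	x = 999;                          // skipped whenever the jump is taken
lemme_outta_here:
	return x;
}
```

Calling foo_goto(1) takes the jump and returns 1; calling foo_goto(5) falls through to the assignment and returns 999.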

Also, check out the machine code generated for the conditional jump--the jump destination is encoded as the number of bytes of machine code to skip over. For example, the "jl" above gets encoded in machine code like this:

   0:	b8 01 00 00 00       	mov    eax,0x1
   5:	83 f8 03             	cmp    eax,0x3
   8:	7c 05                	jl     f <foo+0xf>
   a:	b8 e7 03 00 00       	mov    eax,0x3e7
   f:	c3                   	ret

The distance to jump, the "05" byte in "7c 05" above, is five bytes, because the code we're skipping over is five bytes long. The distance is counted from the end of the jump instruction itself. Note that a jump label doesn't show up in machine code at all; it's just used by the assembler to figure out how far to jump.
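To check the arithmetic in the listing, here is a small sketch of how a two-byte short jump's target can be computed. The function name short_jump_target is invented for this example; it assumes the short form of the jump (one opcode byte plus one signed displacement byte), with the displacement measured from the address of the next instruction.

```cpp
#include <cstdint>

// Sketch: where does a two-byte short jump land?
// 'address' is where the jump instruction itself starts;
// 'displacement' is its second byte, treated as a signed 8-bit value.
std::uint64_t short_jump_target(std::uint64_t address, std::uint8_t displacement) {
	const std::uint64_t next_instruction = address + 2; // opcode byte + displacement byte
	const std::int64_t offset = static_cast<std::int8_t>(displacement); // sign-extend
	return next_instruction + static_cast<std::uint64_t>(offset);
}
```

For the "jl" at address 0x8 with displacement 0x05, this gives 0x8 + 2 + 5 = 0xf, which matches the address of the "ret" in the listing. A displacement byte with its high bit set would jump backward instead, which is how loops get encoded.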

Here's the whole conversion table of conditional jump instructions:

English	Less Than	Less or Equal	Equal	Greater or Equal	Greater Than	Not Equal
C/C++	<	<=	==	>=	>	!=
Assembly (signed)	jl	jle	je or jz	jge	jg	jne or jnz
Assembly (unsigned)	jb	jbe	je or jz	jae	ja	jne or jnz

The "b" in the unsigned comparison instructions stands for "below", and the "a" for "above". Note that there are no mixed-signedness jump instructions (e.g., to compare a signed to an unsigned number); this missing instruction is why compilers whine about "warning: comparing signed and unsigned numbers"!

In C/C++, the compiler can tell whether you want a signed or unsigned comparison based on the variables' types. There aren't any types in assembly, so it's up to you to pick the right instruction!
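To see why the choice matters, here is a small sketch that compares the same 32-bit patterns both ways. The function names less_signed and less_unsigned are made up for this example, and the signed version assumes the usual two's complement representation.

```cpp
#include <cstdint>

// Signed view of the bits: the decision a "jl" would make after cmp.
bool less_signed(std::uint32_t a, std::uint32_t b) {
	return static_cast<std::int32_t>(a) < static_cast<std::int32_t>(b);
}

// Unsigned view of the same bits: the decision a "jb" would make after cmp.
bool less_unsigned(std::uint32_t a, std::uint32_t b) {
	return a < b;
}
```

With a=0xFFFFFFFF and b=1, less_signed says yes (the bits mean -1, which is less than 1), but less_unsigned says no (the same bits mean 4294967295). Picking "jl" when you meant "jb" gives the wrong answer for exactly these values.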

Converting C++ flow control structures to Assembly

You can actually write a very peculiar variant of C++, where "if" statements only contain "goto" statements. This is perfectly legal C/C++:

int main() {
	int i=0;
	if (i>=10) goto byebye;
	std::cout<<"Not too big!\n";
byebye:	return 0;
}

This way of writing C++ is quite similar to assembly: each line written this way corresponds to about one machine language instruction (an "if ... goto" becomes a compare followed by a conditional jump). More complicated C++, like the "for" construct, expands out to many lines of assembly.

	int i, n=10;
	for (i=0;i<n;i++) {
		std::cout<<"In loop: i=="<<i<<"\n";
	}

Here's an expanded version of this C++ "for" loop:

	int i=0, n=10;
start:	std::cout<<"In loop: i=="<<i<<"\n";
	i++;
	if (i<n) goto start;

(executable NetRun link)

You've got to convince yourself that this is really equivalent to the "for" loop in all cases. Careful: if n is a parameter, it's not! (What if n<=0? The "for" loop never runs its body, but the expanded version runs it once before testing.)
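One way to check the equivalence is to count how many times the loop body runs in each version. This is only a sketch; the function names count_for and count_expanded are made up, and the body is replaced by a counter.

```cpp
// Count loop-body executions of the original "for" loop.
int count_for(int n) {
	int count = 0;
	for (int i = 0; i < n; i++) count++;
	return count;
}

// Count loop-body executions of the expanded version, which tests at the bottom.
int count_expanded(int n) {
	int count = 0;
	int i = 0;
start:	count++;   // the body runs before any test happens
	i++;
	if (i < n) goto start;
	return count;
}
```

For n=10 both functions return 10, but for n=0 count_for returns 0 while count_expanded returns 1, so the two loops only agree when n is at least 1.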

All C flow-control constructs can be written using just "if" and "goto", which usually map one-to-one to a compare-and-jump sequence in assembly.

Normal C	Expanded C
if (A) { ... }	if (!A) goto END; { ... } END:
if (!A) { ... }	if (A) goto END; { ... } END:
if (A&&B) { ... }	if (!A) goto END; if (!B) goto END; { ... } END:
if (A||B) { ... }	if (A) goto STUFF; if (B) goto STUFF; goto END; STUFF: { ... } END:
while (A) { ... }	goto TEST; START: { ... } TEST: if (A) goto START;
do { ... } while (A)	START: { ... } if (A) goto START;
for (i=0;i<n;i++) { ... }	i=0; /* Version A */ goto TEST; START: { ... } i++; TEST: if (i<n) goto START;
for (i=0;i<n;i++) { ... }	i=0; /* Version B */ START: if (i>=n) goto END; { ... } i++; goto START; END:

Note that the last two translations of the "for" concept (labelled Version A and Version B) both compute the same thing. Which one is faster? If the loop iterates many times, I claim version (A) is faster, since there's only one (conditional) goto each time around the loop, instead of two gotos in version (B)--one conditional and one unconditional. But version (B) is probably faster if n is often 0, because in that case it quickly jumps to END (in one conditional jump).
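To put rough numbers on that claim, here is a sketch that counts how many jump instructions each version executes, counting a conditional jump whether or not it is taken. The function names are invented for this example, the loop body is left empty, and a jump count is only a crude stand-in for real timing.

```cpp
// Jump instructions executed by Version A (test at the bottom).
int jumps_version_a(int n) {
	int jumps = 0;
	int i = 0;
	jumps++; goto TEST;              // one-time unconditional jump
START:
	/* loop body would go here */
	i++;
TEST:
	jumps++; if (i < n) goto START;  // one conditional jump per test
	return jumps;
}

// Jump instructions executed by Version B (test at the top).
int jumps_version_b(int n) {
	int jumps = 0;
	int i = 0;
START:
	jumps++; if (i >= n) goto END;   // one conditional jump per test
	/* loop body would go here */
	i++;
	jumps++; goto START;             // one unconditional jump per iteration
END:
	return jumps;
}
```

With n=10, Version A executes 12 jumps and Version B executes 21. With n=0, Version A executes 2 jumps and Version B executes just 1, which is the trade-off described above.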