Calling Functions and Passing Parameters in Assembly

Calling C++ Functions from Assembly

You use the "call" instruction to call functions. You can actually call C++'s "cout" if you're sufficiently dedicated, but the builtin NetRun functions are designed to be easier to call. There are two steps in calling a function from assembly.

First, you need to tell the assembler that "read_input" is an external function. All you do is say "extern read_input". This is the assembly equivalent of a "#include" statement, although it only applies to a single function.

Second, you call that function, with the "call" instruction. When it sees "call read_input", the CPU will execute the read_input function until it returns. Before read_input returns, it will put the read-in value into eax, where you can grab it.

So this assembly program reads an integer and returns it:

extern read_input
call read_input
; ... read_input puts its value into eax, where we leave it for main ...
ret

Be careful, though! The read_input function can and will use all the other scratch registers for its own purposes. In particular, it's tricky to call read_input twice to read two numbers, since you need to stash the first number somewhere other than a scratch register during the second call! (You'll hear more than you ever want to hear about this when we talk about memory in about a month.)
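For comparison, here's roughly what the two-reads problem looks like from the C++ side. This is only a sketch: fake_read_input is a made-up stand-in for NetRun's read_input that hands back canned values, so the example runs anywhere. In C++ the compiler quietly does the stashing for you, since it has to keep "first" somewhere that survives the second call.

```cpp
// Hypothetical stand-in for NetRun's read_input: returns canned values
// instead of reading real input, so this sketch is self-contained.
static const int canned_inputs[] = {7, 35};
static int next_input = 0;

extern "C" int fake_read_input(void) {
    int value = canned_inputs[next_input % 2];
    next_input++;
    return value;
}

// Reads two numbers and adds them.  `first` has to survive the second
// call, so the compiler cannot just leave it sitting in eax.
int sum_two_inputs(void) {
    int first = fake_read_input();
    int second = fake_read_input();
    return first + second;
}
```

In assembly, that "keep first somewhere safe" step is your job, not the compiler's.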

The whole list of NetRun builtin functions is in the help. You can also call any of the C standard library functions, such as getchar.

Passing Parameters in Assembly

In 64-bit x86 code, you pass the first few parameters in registers.

Annoyingly, exactly which registers you use depends on the machine:

On x86-64 UNIX systems, including Linux and default NetRun, the first six parameters go into rdi, rsi, rdx, rcx, r8, and r9.
On Windows 64, the first four parameters go into rcx, rdx, r8, and r9.
On 32-bit x86 systems, no parameters go in registers; they're all passed on "the stack" (wait a month to hear about this!).

For example, on NetRun's 64-bit Linux, I can call the one-parameter NetRun builtin function "print_int" with one integer like this:

mov edi,0xdeadbeef
extern print_int
call print_int
ret

(Try this in NetRun now!)

Similarly, if somebody calls my function, and they pass me parameters, those parameters will be in registers rdi, rsi, and so on.
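The register lists above can be written down as a small lookup table. This is just a sketch in C++ to make the two 64-bit conventions easy to compare; the names Abi and int_param_register are invented for this example, and it only covers integer and pointer parameters (floating-point parameters travel in different registers).

```cpp
#include <cstddef>
#include <string>

// Which 64-bit calling convention to model (names are this sketch's own).
enum class Abi { SystemV64, Windows64 };

// Returns the register holding integer parameter number `index` (0-based),
// or an empty string once the parameters no longer fit in registers.
std::string int_param_register(Abi abi, std::size_t index) {
    static const char *const sysv[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static const char *const win64[] = {"rcx", "rdx", "r8", "r9"};
    if (abi == Abi::SystemV64) {
        return index < 6 ? sysv[index] : "";
    }
    return index < 4 ? win64[index] : "";
}
```

For example, int_param_register(Abi::SystemV64, 0) gives "rdi", which is why the print_int call above loads its value into edi (the low 32 bits of rdi).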

Calling Assembly Functions From C++

Here's some C++ code that calls an external function "bar". Note that this code gives a link error when you try to run it in NetRun, because "bar" is never defined. The 'extern "C"' tells C++ to just look for a C-style plain function "bar", instead of a fancy overloaded C++ function "bar(int,int,int)".

extern "C" int bar(int a,int b,int c);

int foo(void) {
	return bar(0xA0B1C2D3, 0xE0E1E2E3, 0xF0F1F2F3);
}

We can actually write this "bar" function in assembly, like this:

global bar
bar:
  mov eax,edi
  ret

(Try this in NetRun now!)

When we get called, our first parameter is sitting in register edi as usual.

The "global" keyword in assembly tells the assembler to make a symbol, in this case bar, visible from outside the file.

The "Link With:" box tells NetRun to link together two different projects, in this case one in C++ and the other in assembly.
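If you don't have an assembler handy, you can check what the foo/bar pair should do with a pure C++ stand-in for bar. This is a sketch, not the assembly itself: the C++ bar below just mimics "mov eax,edi" by returning its first parameter, and the static_casts are added here only to make the conversion from the unsigned hex constants to int explicit.

```cpp
// C++ stand-in for the assembly bar: "mov eax,edi" copies the first
// parameter into the return register; the other two are ignored.
extern "C" int bar(int a, int b, int c) {
    (void)b;
    (void)c;
    return a;
}

int foo(void) {
    // Same three constants as the original foo, with explicit casts to int.
    return bar(static_cast<int>(0xA0B1C2D3u),
               static_cast<int>(0xE0E1E2E3u),
               static_cast<int>(0xF0F1F2F3u));
}
```

Viewed as an unsigned value, foo's result is 0xA0B1C2D3, the first parameter.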

Name Mangling, plain C, and extern "C"

C++ "mangles" the linker names of its functions to include the data types of the function arguments. This is good, because it lets you overload function names; but it's bad, because plain C and assembly don't do anything special to the linker names of functions.

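In plain C or assembly, a function "foo" shows up as just plain "foo" in the linker. In C++, a function foo shows up as "foo()" or "foo(int,double,void *)"; the actual symbol is an alphanumeric encoding of that signature, such as "_Z3foov" or "_Z3fooidPv" with GCC on Linux. (Check out the disassembly to be sure how your linker names are coming out.)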
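Here's a small sketch of why the mangling is worth the trouble. The function names describe and describe_c are made up for this example. The two C++ overloads can coexist because their linker names encode different argument types, while the 'extern "C"' function gets one plain name and so can't be overloaded.

```cpp
#include <string>

// Two C++ overloads: the linker sees two different mangled names,
// one encoding (int) and the other encoding (double).
std::string describe(int) { return "describe(int)"; }
std::string describe(double) { return "describe(double)"; }

// With extern "C" the linker name is just "describe_c", so there is
// room for only one function with this name.
extern "C" const char *describe_c(int) { return "describe_c"; }
```

Calling describe(1) picks the int version and describe(1.5) picks the double version; adding a second 'extern "C"' describe_c that takes a double would be a compile error.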

So if you call C or assembly code from C++, you have to turn off C++'s name mangling by declaring the C or assembly routines 'extern "C"', like this:

extern "C" void some_assembly_routine(int param1,char *param2);

or wrapped in curly braces like this:

extern "C" {
	void one_assembly_routine(int x);
	void another_assembly_routine(char c);
}

In fact, it's common to provide a "magic" header file for C code that automatically provides 'extern "C"' prototypes for C++, but just works normally in plain C:

#ifdef __cplusplus /* only defined in C++ code */
extern "C" {
#endif
	void one_assembly_routine(int x);
	void another_assembly_routine(char c);
#ifdef __cplusplus /* only defined in C++ code */
}
#endif
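As a sketch of how such a header gets used, here are the header part and the source part squeezed into one listing. The function add_one is hypothetical, chosen only so there is a return value to check. Note that the definition doesn't need to repeat 'extern "C"': once a function has been declared with C linkage, a later definition of the same function keeps it.

```cpp
// --- what would live in the shared header (hypothetical add_one) ---
#ifdef __cplusplus /* only defined in C++ code */
extern "C" {
#endif
int add_one(int x);
#ifdef __cplusplus /* only defined in C++ code */
}
#endif

// --- what would live in a .c or .cpp source file ---
// The definition picks up C linkage from the declaration above,
// so its linker name is plain "add_one" either way.
int add_one(int x) { return x + 1; }
```

The same listing compiles as plain C too, because a C compiler never sees the 'extern "C"' lines.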

Definitely try these things out yourself:

Plain C bar routine:

#include <stdio.h> /* for printf */

int bar(int i,int j) {
	printf("bar(%d,%d)\n",i,j);
	return i;
}

C++ foo routine that calls bar:

extern "C" int bar(int i,int j);
int foo(void) 
{
	return bar(2,3);
}

Try:

Remove the 'extern "C"' from the bar prototype. Note the link error is looking for "bar(int,int)", which means it's looking for a C++ bar; but bar is C.
Make bar C++. Without extern "C", everything works, because C++ is calling C++.
Make bar C++, and add 'extern "C"' back to the bar prototype in foo. It won't link, because foo is now looking for the "C" bar.
Make bar C++, but add 'extern "C"' to bar's declaration in *both* routines. Now you're linking the C++ bar using the C name.
Make foo C, but leave 'extern "C"' in bar's implementation.
Make both routines C, and remove the extern "C"s. Everything works fine, because C is calling C.

Code written in	With name	Has linker name
C++	int bar(int a,int b)	bar(int,int), "mangled" into an alphanumeric symbol (such as _Z3barii with GCC on Linux)
C++	extern "C" int bar(int a,int b)	bar
C	int bar(int a,int b)	bar
Assembly	global bar, then the label bar:	bar
Fortran	SUBROUTINE bar()	bar_, BAR, BAR_, bar__, or some such. Disassemble to be sure!
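If you're staring at a mangled name and want the readable version back, GCC and Clang ship a demangling routine you can call yourself. The sketch below assumes one of those compilers (the cxxabi.h header is not available with Microsoft's compiler), and the wrapper name demangle is this example's own.

```cpp
#include <cstdlib>
#include <cxxabi.h>  // GCC/Clang-specific; an assumption of this sketch
#include <string>

// Turns a mangled linker name back into a readable C++ signature.
// Returns an empty string if the demangler rejects the name.
std::string demangle(const char *mangled) {
    int status = 0;
    char *readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return "";
    }
    std::string result(readable);
    std::free(readable);  // __cxa_demangle allocates the buffer with malloc
    return result;
}
```

With GCC on Linux, demangle("_Z3barii") should hand back the bar(int, int) signature from the first row of the table. The "nm" tool's -C option does the same translation on a whole object file.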

Bottom line: to call code written in anything else (C, Assembly, Fortran) from C++, or to call C++ from anything else, add extern "C" to the C++ code. A disassembler, or even just a symbol-listing tool (on UNIX, "nm"), can really help with these sorts of problems. It's very common that C++ is looking for "stuff()", while the library provides merely "stuff" (no parentheses, so plain C) or vice versa.

It gets even worse on some operating systems, where even plain C code puts underscores at the start or end of every function name, so you have to repeat those underscores in assembly. Worst of all, in Windows DLLs, you need a "__declspec" at just the right point in every exported function's declaration. The bottom line is that mixing languages is both possible and common in big projects, but it sure ain't easy!
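One common way to tame the "__declspec" problem is to hide it behind a macro, so the same source builds on Windows and everywhere else. The following is a sketch of that pattern, not a complete recipe: DEMO_API is a made-up macro name, it only covers the exporting side (code that uses the DLL typically wants "__declspec(dllimport)" instead), and stuff is the same placeholder name used above.

```cpp
// DEMO_API is a hypothetical macro name; real projects pick their own.
#if defined(_WIN32)
#define DEMO_API __declspec(dllexport) /* export from a Windows DLL */
#else
#define DEMO_API /* other platforms need nothing here */
#endif

// extern "C" keeps the linker name unmangled; DEMO_API handles the export.
extern "C" DEMO_API int stuff(int x) { return 2 * x; }
```

On Linux the macro expands to nothing, so this is just a plain 'extern "C"' function whose linker name is "stuff".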

For scripting languages, like Python or Perl or Tcl or such, a "wrapper generator" like SWIG can be a big help, since it understands both the 'extern "C"' business and the gory details of getting data in and out of your scripting language.