Calling Functions and Passing Parameters in Assembly
CS 301 Lecture, Dr. Lawlor
Calling C++ Functions from Assembly
You use the "call" instruction to call functions. You can
actually call C++'s "cout" if you're sufficiently dedicated, but the builtin NetRun
functions are designed to be easier to call. There are two steps in calling a function from assembly.
First, you
need to tell the assembler that "read_input" is an external
function. All you do is say "extern
read_input". This is the assembly equivalent of a
"#include" statement, although it only applies to a single function.
Second, you call that function, with the "call" instruction. When it sees "call read_input", the CPU will execute the
read_input function until it returns. Before read_input returns,
it will put the read-in value into eax, where you can grab it.
So this assembly program reads an integer and returns it:
extern read_input
call read_input
; ... read_input puts its value into eax, where we leave it for main ...
ret
(executable NetRun link)
Be careful, though! The read_input function can and will use all
the other scratch registers for its own purposes. In particular, it's
tricky to call read_input twice to read two numbers, since you need to
stash the first number somewhere other than registers during the second
call! (You'll hear more than you ever want to hear about this, when we talk about memory in about a month.)
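For comparison, here is a minimal C++ sketch of the same two-read problem. The function read_input_from is a hypothetical stand-in for NetRun's read_input (it reads from a stream so the sketch runs anywhere), and sum_two_inputs is a name invented for this example. In C++ the compiler keeps the first value safe across the second call; in assembly, that job is yours.

```cpp
#include <istream>
#include <sstream>

// Hypothetical stand-in for NetRun's read_input: reads one integer from `in`.
int read_input_from(std::istream &in) {
    int value = 0;
    in >> value;
    return value;
}

// Reads two numbers and adds them. The variable `first` has to survive
// the second call, so the compiler must keep it somewhere the callee
// will not overwrite.
int sum_two_inputs(std::istream &in) {
    int first = read_input_from(in);
    int second = read_input_from(in);
    return first + second;
}
```

Feeding this a stream containing "3 4" should give back 7. Looking at the disassembly of sum_two_inputs is a decent preview of how the stashing is done.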
The whole list of NetRun builtin functions is in the help. You can also call any of the C standard library functions, such as getchar.
Passing Parameters in Assembly
In 64-bit x86 code, you pass the first few parameters in registers.
Annoyingly, exactly which registers you use depends on the operating system's calling convention:
- On x86-64 UNIX systems, including Linux and default NetRun, the first six parameters go into rdi, rsi, rdx, rcx, r8, and r9.
- On Windows 64, the first four parameters go into rcx, rdx, r8, and r9.
- On 32-bit x86 systems, no parameters go in registers; they're all on "the stack" (wait a month to hear about this!).
For example, on NetRun's 64-bit Linux, I can call the one-parameter
NetRun builtin function "print_int" with one integer like this:
mov edi,0xdeadbeef
extern print_int
call print_int
ret
(Try this in NetRun now!)
Similarly, if somebody calls my function, and they pass me
parameters, those parameters will be in registers rdi, rsi, and so on.
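The register lists above can be written down as a small lookup table. This is only a sketch in C++: the names Abi and param_register are made up for illustration, it covers integer and pointer parameters only, and it assumes that parameters past the register list end up on the stack.

```cpp
#include <cstddef>
#include <string>

// Which 64-bit calling convention we are asking about (illustrative names).
enum class Abi { SysV64, Win64 };

// Returns the register that holds integer parameter number `index`
// (0-based), or "stack" once the register parameters run out.
std::string param_register(Abi abi, std::size_t index) {
    static const char *const sysv[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    static const char *const win[]  = {"rcx", "rdx", "r8", "r9"};
    if (abi == Abi::SysV64) {
        return index < 6 ? sysv[index] : "stack";
    }
    return index < 4 ? win[index] : "stack";
}
```

For example, param_register(Abi::SysV64, 0) gives "rdi", which is why the print_int example loads its value into edi (the low 32 bits of rdi).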
Calling Assembly Functions From C++
Here's some C++ code that calls an external function "bar". Note
that this code gives a link error when you try to run it in NetRun,
because "bar" is never defined.
The 'extern "C"' tells C++ to just look for a
C-style plain function "bar", instead of a fancy overloaded C++
function "bar(int,int,int)".
extern "C" int bar(int a,int b,int c);
int foo(void) {
return bar(0xA0B1C2D3, 0xE0E1E2E3, 0xF0F1F2F3);
}
(executable NetRun link)
We can actually write this "bar" function in assembly, like this:
global bar
bar:
mov eax,edi
ret
(Try this in NetRun now!)
When we get called, our first parameter is sitting in register edi as usual.
The "global" keyword in assembly tells the assembler to make a symbol, in this case bar, visible from outside the file.
The "Link With:" box tells NetRun to link together two different projects, in this case one in C++ and the other in assembly.
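If you'd rather check the C++ side before writing any assembly, a C++ stand-in for bar works too. This is only a sketch of what the two-instruction assembly version does, which is to hand back the first parameter; the second and third parameters are ignored here, just as they are in the assembly.

```cpp
// Prototype exactly as the C++ caller sees it.
extern "C" int bar(int a, int b, int c);

int foo(void) {
    return bar(0xA0B1C2D3, 0xE0E1E2E3, 0xF0F1F2F3);
}

// Stand-in for the assembly version: "mov eax,edi" then "ret" just
// returns the first parameter and never looks at the other two.
extern "C" int bar(int a, int /*b*/, int /*c*/) {
    return a;
}
```

Swapping this definition out for the assembly bar should not change what foo returns.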
Name Mangling, plain C, and extern "C"
C++ "mangles" the linker names of its functions to include the data
types of the function arguments. This is good, because it lets you
overload function names; but it's bad, because plain C and assembly don't
do anything special to the linker names of functions.
In plain C or assembly, a function "foo"
shows up as just plain "foo" in the linker. In C++, a function foo shows
up as "foo()" or "foo(int,double,void *)". (Check out the disassembly to
be sure how your linker names are coming out.)
So if you call C or assembly code from C++,
you have to turn off C++'s name mangling by declaring the
C or assembly routines 'extern "C"', like this:
extern "C" void some_assembly_routine(int param1,char *param2);
or wrapped in curly braces like this:
extern "C" {
void one_assembly_routine(int x);
void another_assembly_routine(char c);
}
In fact, it's common to provide a "magic" header file for C code that
automatically provides 'extern "C"' prototypes for C++, but just works
normally in plain C:
#ifdef __cplusplus /* only defined in C++ code */
extern "C" {
#endif
void one_assembly_routine(int x);
void another_assembly_routine(char c);
#ifdef __cplusplus /* only defined in C++ code */
}
#endif
Definitely try these things out yourself:
Plain C bar routine:
#include <stdio.h> /* for printf */
int bar(int i,int j) {
    printf("bar(%d,%d)\n",i,j);
    return i;
}
(executable NetRun link)
C++ foo routine that calls bar:
extern "C" int bar(int i,int j);
int foo(void)
{
return bar(2,3);
}
(executable NetRun link)
Try:
- Remove the 'extern "C"' from the bar prototype. Note that the
link error complains about a missing "bar(int,int)", which means the linker is looking for
a C++ bar; but bar is C.
- Make bar C++. Without extern "C", everything works, because C++ is calling C++.
- Make bar C++, and add 'extern "C"' back to foo's prototype. It won't link, because it's looking for the "C" bar.
- Make bar C++, but add 'extern "C"' to bar's declaration in *both*
routines. Now you're linking the C++ bar using the C name.
- Make foo C, but leave 'extern "C"' in bar's implementation.
- Make both routines C, and remove the extern "C"s. Everything works fine, because C is calling C.
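The "C++ bar with extern "C" in both places" experiment can be sketched in a single file. The point it tries to show is that extern "C" changes only the linker name: the body of bar is still compiled as C++, so it can use C++ features such as std::string. Putting both routines in one file is a simplification made for this sketch; in the experiments above they live in two separate NetRun projects.

```cpp
#include <cstdio>
#include <string>

// Prototype, as foo's file would declare it.
extern "C" int bar(int i, int j);

int foo(void)
{
    return bar(2, 3);
}

// Definition, as bar's file would provide it. The linker name is plain
// "bar", but the body is ordinary C++.
extern "C" int bar(int i, int j) {
    std::string label = "bar";
    std::printf("%s(%d,%d)\n", label.c_str(), i, j);
    return i;
}
```

One side effect worth knowing: because an extern "C" function has no parameter types in its linker name, you can't overload it with a second extern "C" bar.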
Code written in | With name | Has linker name
C++ | int bar(int a,int b) | bar(int,int) <- But "mangled" to be alphanumeric...
C++ | extern "C" int bar(int a,int b) | bar
C | int bar(int a,int b) | bar
Assembly | global bar, then the label bar: | bar
Fortran | SUBROUTINE bar() | bar_, BAR, BAR_, bar__, or some such. Disassemble to be sure!
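To see what "mangled to be alphanumeric" looks like in practice: with GCC or Clang on Linux, the C++ function bar(int,int) gets the linker name "_Z3barii". The sketch below turns such a name back into something readable. It assumes a GCC or Clang toolchain, since abi::__cxa_demangle comes from their <cxxabi.h> header and other compilers mangle differently; the wrapper name demangle is just for this example.

```cpp
#include <cstdlib>
#include <cxxabi.h>   // GCC/Clang only: provides abi::__cxa_demangle
#include <string>

// Turns a mangled linker name back into a readable C++ signature.
// Names that don't start with "_Z" (such as a plain C "bar") are
// returned unchanged.
std::string demangle(const std::string &linker_name) {
    if (linker_name.compare(0, 2, "_Z") != 0) {
        return linker_name;
    }
    int status = 0;
    char *readable = abi::__cxa_demangle(linker_name.c_str(),
                                         nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return linker_name;
    }
    std::string result(readable);
    std::free(readable);   // the demangler allocates the result with malloc
    return result;
}
```

Here demangle("_Z3barii") gives "bar(int, int)", while demangle("bar") just gives "bar" back. This is the same translation that "nm -C" does for you on the command line.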
Bottom line: to call code written in anything else (C, Assembly,
Fortran) from C++, or to call C++ from anything else, add extern "C" to
the C++ code. A disassembler, or even just a symbol-listing tool (on
UNIX, "nm"), can really help you with these sorts of problems.
It's very common that C++ is looking for "stuff()", while the library
provides merely "stuff" (no parentheses, so plain C) or vice versa.
It gets even worse on some operating systems, where even plain C code
puts underscores at the start or end of every function name, so you
have to repeat those underscores in assembly. Worst of all, in
Windows DLLs, you need a "__declspec" at just the right point in every
function's declaration. In short, mixing languages
is both possible and common in big projects, but it sure ain't easy!
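A common way to tame the "__declspec" problem is to hide it behind a macro that expands to nothing everywhere except Windows. This is a minimal sketch of that pattern for the exporting side of a DLL only: MYLIB_API and mylib_add are hypothetical names made up for this example, and a real library usually also switches to __declspec(dllimport) for code that uses the DLL.

```cpp
// MYLIB_API is a hypothetical macro name: on Windows it marks the function
// as exported from a DLL, and everywhere else it expands to nothing.
#if defined(_WIN32)
#  define MYLIB_API __declspec(dllexport)
#else
#  define MYLIB_API
#endif

// extern "C" keeps the linker name plain; MYLIB_API sits between the
// linkage specification and the return type.
extern "C" MYLIB_API int mylib_add(int a, int b) {
    return a + b;
}
```

With this in place the same source file should build on Linux and on Windows, and callers in C or assembly can find the plain name mylib_add.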
For scripting languages, like Python or Perl or Tcl or such, a "wrapper generator" like SWIG
can be a big help, since it understands both the 'extern "C"' business
and the gory details of getting data in and out of your scripting
language.