Mixing Assembly and C++ Code
CS 301 Lecture, Dr. Lawlor
Here's how you write an entire function in assembly. The "global
bar" keyword in assembly tells the assembler to make the label "bar"
visible from outside the file.
global bar
bar:
add rdi,1000
mov rax,rdi
ret
The "Link With:" box (under "Options") tells NetRun to link together
two different projects, in this case one in C++ and the other in
assembly. The C++ code calls the assembly here.
extern "C" int bar(int param);
int foo(void) {
return bar(6);
}
You can call C++ code from assembly almost as easily: make the
C++ function extern "C", declare it with "extern someName" in the assembly, and then call
the function normally.
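Here is a minimal sketch of the C++ half of that recipe. It assumes 64-bit Linux and a hypothetical function named times_two. Because of the extern "C", the linker name stays plain "times_two", so the NASM lines in the comments (also just an illustration of how you'd call it) can find the function by name:

```cpp
// C++ side: a function meant to be called from assembly.
// extern "C" turns off name mangling, so the linker name is just "times_two".
extern "C" int times_two(int x) {
    return x * 2;
}

// Assembly side (NASM syntax, 64-bit Linux calling convention), as a sketch:
//     extern times_two   ; tell NASM the name lives in another file
//     ...
//     mov edi, 21        ; first int argument goes in edi
//     call times_two     ; keep rsp 16-byte aligned at this call
//                        ; the int result comes back in eax (42 here)
```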
Mixed Assembly and C++ at the Command Line
The most portable way to include some assembly functions in your code is to
compile the assembly in a separate file, then link it with the
C++. For example, in a file named "foo.S":
section .text
global _foo
_foo:
mov eax,7
ret
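(Note the weird underscore in front of the function name--this is a Windows thing! 32-bit Windows compilers put an underscore in front of every C function's linker name, so a C++ call to "foo" actually links against "_foo".)
You'd assemble this into "foo.obj" on Windows with this command line: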
nasm -f win32 foo.S
Then in a file named "main.cpp", we call foo with an extern "C" prototype:
#include <iostream>
extern "C" int foo(void);
int main() {
std::cout<<"Foo returns "<<foo()<<"\n";
return 0;
}
We compile the C++ and link it to the assembly using the Microsoft Visual C++ compiler like this:
cl -EHsc main.cpp foo.obj
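(You may have to run Visual Studio's "vcvars32.bat" (or "vcvarsall.bat x86"), or open a Developer Command Prompt, to get a 32-bit "cl" into your PATH. The 32-bit compiler is the one that matches a win32 object file.)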
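We now have a functioning C++/Assembly executable! The same trick works on Linux or OS X with gcc, with a few changes: assemble with the matching output format (for example "nasm -f elf64 foo.S" on 64-bit Linux, or "nasm -f macho64 foo.S" on OS X), drop the leading underscore on Linux (OS X keeps it), and link with something like "g++ main.cpp foo.o".
If you don't like the command line, and few people do, you can also integrate NASM into a Visual C++ project, for example as a custom build step for your assembly files.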
Mixed Assembly and C++ in One Source File
It's even possible on most compilers to include a little bit of
assembly code right inside
your C or C++ file, called "inline assembly" because the assembly is
inside the C/C++. This is usually a
bit faster (there's no function call overhead) and simpler (less hassle
at build time) than having a separate ".S" file that you run through
NASM or YASM and then link to the C or C++ code. However, for long
stretches of assembly, a separate file still works better.
Here's a simple example in Microsoft Visual C++ inline assembly:
int foo(void) {
	__asm{
		mov eax,100
		leave
		ret
	};
}
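Note that now:
- The keyword is __asm
- The assembly code is wrapped in curly braces
- The destination register is on the *left*, just like NASM/YASM.
- This only works when compiling 32-bit x86 code; the 64-bit Visual C++ compiler doesn't support inline assembly at all.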
Note also that I've used the "leave" instruction to clean up foo's
stack frame (leave is equivalent to mov esp,ebp followed by pop ebp) before returning. The compiler secretly generates the
matching function prologue (push ebp; mov ebp,esp) at the start of the function.
Microsoft Outside Variable Access:
In Microsoft Visual C or C++, you can read and write variables from the program by just giving their names.
Simple example:
void *frame;
__asm mov frame,ebp ; copy ebp into frame (the destination is on the left!)
Complicated example:
int foo(void) {
	int joe=1234, fred;
	__asm{
		mov eax,joe ; eax = joe;
		add eax,2 ; eax += 2;
		mov fred,eax ; fred = eax
	};
	return fred;
}
This is clearly very convenient! But what happens if we try to do
the same thing with a variable named "al"? (Remember, "al" is a
register on x86!)
GCC Inline Assembly:
Here's an example of how to declare a little assembly snippet inside C++ code using the Linux/UNIX/MacOS gcc compiler:
int foo(void) {
	__asm__( /* Assembly function body */
		" mov $100,%eax\n" /* moves 100 into eax! */
		" leave\n"
		" ret\n"
	);
}
Note that:
- The keyword is __asm__
- The assembly code is wrapped in parentheses.
- The assembly code shows up as a string
- There are weird symbols in front of constants ($ means constant) and registers (% means register)
- DYSLEXIA ALERT: GCC sasembly is abckwards. The destination register goes at the *end* of the instruction.
To try this in NetRun, set the "Mode" to "Whole Subroutine"--this keeps NetRun
from pasting in the start and end of the foo subroutine. (The "leave" trick relies on
the compiler having set up an ebp/rbp stack frame, so it can crash if optimization omits the frame pointer.)
The bottom line is just to use the __asm__ keyword, which takes the assembly code as a big string.
GCC Outside Variable Access:
Accessing outside variables is truly hideous in gcc inline assembly.
Simple example:
void *frame; /* Frame pointer */
__asm__ ("mov %%ebp,%0":"=r"(frame)); /* 32-bit x86; on x86-64 use %%rbp */
Complicated example:
int foo(void) {
	int joe=1234, fred;
	__asm__(
		" mov %1,%%eax\n"
		" add $2,%%eax\n"
		" mov %%eax,%0\n"
		:"=r" (fred) /* %0: Out */
		:"r" (joe) /* %1: In */
		:"%eax" /* Overwrite */
	);
	return fred;
}
The __asm__ keyword can take up to four parts, separated by colons:
- The assembly code, as a string. Now registers need to be prefixed with "%%", not just "%", to distinguish them from numbered operands like %0 and %1.
- A comma-separated list of output arguments. These can go into registers ("=r"), memory ("=m"), etc.
- A comma-separated list of input arguments.
- A comma-separated list of overwritten registers ("trashed"
registers). The compiler then knows not to put anything important
in these registers.
See the gcc manual for so many hideous details, you'll want to cry.
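For comparison, here is a shorter sketch of the same "joe plus 2" computation. It uses GCC's "+r" constraint, which marks one operand as both input and output, so the code needs no scratch register and no %eax clobber. The name add_two is hypothetical, and the #if keeps non-x86 builds on a plain C++ fallback:

```cpp
// Add 2 to a variable using GCC extended inline assembly (x86 or x86-64 only).
int add_two(int value) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("add $2, %0"      // AT&T syntax: the destination (%0) comes last
            : "+r"(value)     // %0: read-write operand kept in some register
            :                 // no separate inputs
            : "cc");          // add changes the flags (GCC assumes this on x86 anyway)
#else
    value += 2;               // portable fallback for other compilers/CPUs
#endif
    return value;
}
```

With "+r", the compiler picks the register itself, so add_two(1234) returns 1236, the same answer as the longer example.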
GCC Whole Function in Assembly
Partly because GCC's inline assembly syntax is so horrible, it's often
easier to just write the whole function (argument access, frame
setup, and value return) in assembly. Visual C++ has no direct
equivalent (the closest is a 32-bit __declspec(naked) function, where the
compiler generates no prologue or epilogue), although with either compiler it's easy
enough to separately compile a whole file full of pure assembly code
and just link it in.
To write a function in assembly, just:
- Write a C function prototype. In C++, make the prototype 'extern "C"' to avoid a link error.
- Put your code in an "__asm__" block outside any subroutine.
- Put the function name at the start of the assembly block as a label.
- If you want to call the function from outside that file, use ".globl my_sub" to make the subroutine's name visible outside.
Here's a complete example, where my assembly function just returns 100:
extern "C" int my_sub(void); /* Prototype */
__asm__( /* Assembly function body */
	"my_sub:\n"
	" mov $100,%eax\n"
	" ret\n"
);
int foo(void) { return my_sub()+1; }
This is actually a pretty clean way to do inline assembly in gcc,
although you do have to remember the calling convention to find your
arguments (on 64-bit Linux: %rdi, %rsi, %rdx, and so on)!
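Here is a sketch of the same recipe applied to the NetRun bar example from the top of this page (add 1000 to the argument). It assumes x86-64 Linux, where the first int argument arrives in %edi and the result goes back in %eax. The name add_1000 is hypothetical. The .pushsection/.popsection pair goes beyond the steps above: it makes sure the instructions land in the code section no matter which section the compiler was in:

```cpp
// A whole function written in GCC top-level assembly: returns param + 1000.
// Assumes x86-64 Linux (ELF): argument in %edi, result in %eax, no underscore on C names.
#if defined(__x86_64__) && defined(__linux__)
extern "C" int add_1000(int param); /* Prototype */
__asm__(
    ".pushsection .text\n"     // switch to the code section
    "add_1000:\n"              // label = function name
    "    mov %edi, %eax\n"     // eax = param
    "    add $1000, %eax\n"    // eax += 1000
    "    ret\n"
    ".popsection\n"            // go back to whatever section the compiler was using
);
#else
extern "C" int add_1000(int param) { return param + 1000; } // portable fallback
#endif
```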
BONUS: C++ Name Mangling, mixing C and C++, and extern "C"
C++ "mangles" the linker names of its functions to include the data
types of the function arguments. This is good, because it lets you
overload function names; but it causes trouble when mixing languages, because plain C and assembly don't
do anything special to the linker names of functions.
Skipping the mangling makes life a little simpler, but it means you can't overload functions in plain C or assembly.
In C or assembly, a function "foo"
shows up as just plain "foo" in the linker. In C++, a function foo shows
up as "foo()" or "foo(int,double,void *)", encoded into an alphanumeric linker name
like "_Z3foov" or "_Z3fooidPv" with g++. (Check out the disassembly to
be sure how your linker names are coming out.)
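To see the mangling for yourself, you can use abi::__cxa_demangle from <cxxabi.h>, which ships with g++ and clang++ (not Visual C++). It turns a linker name back into a C++ signature, much like the c++filt command-line tool. A small sketch, where the demangle helper is a hypothetical name:

```cpp
#include <cstdlib>   // std::free
#include <string>
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang only)

// Turn a g++/clang++ linker name back into a readable C++ signature.
// Returns the input unchanged if demangling fails.
std::string demangle(const char *linker_name) {
    int status = 0;
    char *readable = abi::__cxa_demangle(linker_name, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return linker_name;
    }
    std::string result(readable);
    std::free(readable);   // __cxa_demangle allocates its result with malloc
    return result;
}
```

For example, demangle("_Z3barii") gives back "bar(int, int)".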
So if you call C or assembly code from C++,
you have to turn off C++'s name mangling by declaring the
C or assembly routines 'extern "C"', like this:
extern "C" void some_assembly_routine(int param1,char *param2);
or wrapped in curly braces like this:
extern "C" {
void one_assembly_routine(int x);
void another_assembly_routine(char c);
}
In fact, it's common to provide a "magic" header file for C code that
automatically provides 'extern "C"' prototypes for C++, but just works
normally in plain C:
#ifdef __cplusplus /* only defined in C++ code */
extern "C" {
#endif
void one_assembly_routine(int x);
void another_assembly_routine(char c);
#ifdef __cplusplus /* only defined in C++ code */
}
#endif
Definitely try these things out yourself:
Plain C bar routine:
#include <stdio.h>
int bar(int i,int j) {
printf("bar(%d,%d)\n",i,j);
return i;
}
C++ foo routine that calls bar:
extern "C" int bar(int i,int j);
int foo(void)
{
return bar(2,3);
}
Try:
- Remove the 'extern "C"' from the bar prototype. Note the
link error is looking for "bar(int,int)", which means it's looking for
a C++ bar; but bar is C.
- Make bar C++. Without extern "C", everything works, because C++ is calling C++.
- Make bar C++, and add 'extern "C"' back to the bar prototype in foo's file. It won't link, because foo is looking for the "C" bar.
- Make bar C++, but add 'extern "C"' to bar's declaration in *both*
routines. Now you're linking the C++ bar using the C name.
- Make foo C, but leave 'extern "C"' in bar's implementation.
- Make both routines C, and remove the extern "C"s. Everything works fine, because C is calling C.
Code written in | With name | Has linker name
C++ | int bar(int a,int b) | bar(int,int), but "mangled" to be alphanumeric...
C++ | extern "C" int bar(int a,int b) | bar
C | int bar(int a,int b) | bar
Assembly | global bar, then bar: | bar
Fortran | SUBROUTINE bar() | bar_, BAR, BAR_, bar__, or some such. Disassemble to be sure...
Bottom line: to call code written in anything else (C, Assembly,
Fortran) from C++, or to call C++ from anything else, add extern "C" to
the C++ code!
BONUS: Argument Passing in Fortran
C and C++ are kinda asymmetrical, because "int" parameters are placed
directly on the stack (like "push 3" in 32-bit code; 64-bit code puts the
first few directly into registers like rdi), while arrays are always passed
via pointer (like "push my_array", which pushes the *address* of the
array, not the actual integers in the array). C/C++ do this
because you can cheaply copy an "int", but copying an array might take
a lot of time and memory.
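Here is a minimal C++ sketch of that asymmetry (change_both is a hypothetical name). The int is copied, so the caller's variable is untouched. The array arrives as a pointer, so writes through it land in the caller's array:

```cpp
// C/C++ pass an int by value (the callee gets a copy), but an array
// "decays" to a pointer to its first element (the callee sees the original).
void change_both(int n, int arr[]) {   // arr is really an int* here
    n = 99;        // changes only the local copy of n
    arr[0] = 99;   // changes the caller's array through the pointer
}
```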
Fortran, curiously, passes *everything* via pointer--if a Fortran
function takes an int parameter, what gets pushed on the stack is a
*pointer* to an int, not the int itself!
To summarize:
When passing... | In C/C++, you... | In Fortran, you...
an int | pass the int | pass a pointer to the int
an array | pass a pointer to the first element of the array | pass a pointer to the first element of the array
a char | pass an int containing the character's value | pass a pointer to the character
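A C++ function meant to be called from Fortran follows the same pattern as NetRun's print_int__ wrapper at the end of this page: it receives every argument as a pointer. This sketch assumes a hypothetical Fortran-callable function IADD and a compiler that appends one trailing underscore, which gives the linker name iadd_. As the linker-name table above says, disassemble to be sure what your Fortran compiler actually uses:

```cpp
// Hypothetical Fortran-callable function: extern "C" avoids C++ mangling,
// and the trailing underscore matches the assumed Fortran linker name.
// The name starts with I, so Fortran's implicit typing treats IADD as INTEGER.
extern "C" int iadd_(const int *a, const int *b) {
    return *a + *b;   // dereference: Fortran passed addresses, not values
}
```

From Fortran, a call like k = iadd(i, j) would pass the addresses of i and j.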
Fortran 1D arrays are indexed using round brackets, like
"myarr(i)". And the index of the first array element in Fortran
is "myarr(1)", not "myarr[0]" like C/C++. But beyond those small
differences, arrays work exactly the same in Fortran as in C/C++, and
in fact it's not always possible to tell from the generated assembly
code whether the original code was written in C, C++, Fortran, or
Assembly!
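Here is a sketch of handling the 1-based indexing from the C++ side, assuming a hypothetical Fortran-callable isum_(arr, n). Fortran passes the array as a pointer to its first element and passes the length n as a pointer too, so myarr(i) in Fortran is arr[i-1] in C++:

```cpp
// Sum a Fortran array from C++. Both the array and its length arrive as pointers.
// Fortran's arr(1) is C++'s arr[0], so index i in 1..n maps to arr[i-1].
extern "C" int isum_(const int *arr, const int *n) {
    int total = 0;
    for (int i = 1; i <= *n; i++) {   // loop like a Fortran "do i=1,n"
        total += arr[i - 1];          // Fortran arr(i) == C++ arr[i-1]
    }
    return total;
}
```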
Fun With Fortran!
CS 301 isn't a computer languages course, but I think it's pretty interesting to look
at old-school Fortran, a language from 1956. Note how this little function returns 10,
like you'd expect. And the assembly code is pretty much exactly
what you'd get from C/C++!
function foo()
INTEGER foo
i = 7;
foo = i + 3;
end function
Here's a "do loop" (the Fortran equivalent of C/C++ "for"):
function foo()
INTEGER foo
do i=1,10
CALL print_int(i)
end do
foo = i + 3;
end function
Note that "print_int" is defined in NetRun's "inc.c" as:
CDECL void print_int__(int *i) {print_int(*i);}
Here,
- "CDECL" is a NetRun macro that expands to extern "C" in C++. This prevents C++ from screwing up the name.
- "print_int__" (note the extra underscores!) is the function name Fortran will try to link with.
- Fortran "int" is passed via a pointer, so you have to accept a pointer in C++.
- The Fortran interface for print_int just calls the C++ interface.
This sort of Fortran/C/C++ interfacing is really common in big projects.