Linking Assembly and C++ (or C or Fortran or...)

CS 301 Lecture, Dr. Lawlor

Here's how you write an entire function in assembly. The "global bar" directive tells the assembler to make the label "bar" visible from outside the file.

global bar
bar:
	add rdi,1000
	mov rax,rdi
	ret

(Try this in NetRun now!)

The "Link With:" box (under "Options") tells NetRun to link together two different projects, in this case one in C++ and the other in assembly. The C++ code calls the assembly here.

extern "C" int bar(int param);

int foo(void) {
	return bar(6);
}

(Try this in NetRun now!)
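To check the expected result without an assembler, here is a minimal sketch that swaps the assembly bar for a C++ stand-in with the same arithmetic. The stand-in is an assumption for illustration, not part of the original example; because it is declared extern "C", it keeps the plain linker name "bar" that the assembly version exports.

```cpp
// Stand-in for the assembly routine: same linker name, same arithmetic.
extern "C" int bar(int param) {
	return param + 1000; // mirrors "add rdi,1000" then "mov rax,rdi"
}

int foo(void) {
	return bar(6); // 6 + 1000
}
```

With this stand-in, foo() returns 1006, which is also what the assembly version should return.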

You can call C++ code from assembly almost as easily, by making the C++ code extern "C", declaring it with "extern someName" in assembly, and then calling the function normally.
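The C++ side of that arrangement might look like the sketch below. The name someName comes from the text above; its parameter and body are placeholders chosen for illustration, and the matching assembly is shown only in comments because it would live in a separate file.

```cpp
// C++ side: extern "C" keeps the linker name as plain "someName",
// so assembly can find it with "extern someName".
extern "C" long someName(long x) {
	return 2 * x; // placeholder body, chosen only for illustration
}

// Assembly side (NASM syntax, in its own file):
//   extern someName
//   ...
//   mov rdi,21      ; first argument goes in rdi
//   call someName   ; result comes back in rax
```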

Name Mangling and extern "C"

C++ "mangles" the linker names of its functions to include the data types of the function arguments. This is good, because it lets you overload function names; but it's bad, because plain C and assembly don't do anything special to the linker names of functions, so their names won't match what C++ is looking for. Leaving names alone makes life a little simpler in C and assembly, but it means you can't overload functions there.
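A small sketch of that tradeoff, using hypothetical function names (describe and plain_describe are not from the lecture): the two C++ overloads can coexist because each gets its own mangled linker name, while the extern "C" function gets exactly one unmangled name.

```cpp
#include <string>

// Two C++ functions with the same source name: allowed, because each one
// gets a different mangled linker name built from its parameter types.
std::string describe(int) { return "describe(int)"; }
std::string describe(double) { return "describe(double)"; }

// With extern "C" there is only one linker name, "plain_describe", so
// adding a second overload such as plain_describe(double) would not compile.
extern "C" const char *plain_describe(int) { return "plain_describe"; }
```

Calling describe(1) picks the int version and describe(1.5) picks the double version; the compiler chooses at compile time, and the linker just sees two different names.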

In C or assembly, a function "foo" shows up as just plain "foo" in the linker. In C++, a function foo shows up as "foo()" or "foo(int,double,void *)", which the linker actually stores in an encoded alphanumeric form (with g++, "foo()" becomes "_Z3foov"). (Check out the disassembly to be sure how your linker names are coming out.)
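If you have a mangled name from a disassembly or a link error, you can decode it programmatically. The sketch below assumes GCC or Clang, whose runtime provides abi::__cxa_demangle in <cxxabi.h>; other compilers use different mangling schemes and different tools. The helper name demangle is made up for this example.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang only)
#include <cstdlib>   // std::free
#include <string>

// Turn a mangled linker name back into a readable C++ signature.
// Returns the input unchanged if the demangler rejects it.
std::string demangle(const char *mangled) {
	int status = 0;
	char *readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
	if (status != 0 || readable == nullptr) return mangled;
	std::string result(readable);
	std::free(readable); // the demangler allocates the result with malloc
	return result;
}
```

Under these assumptions, demangle("_Z3barii") gives "bar(int, int)": the "3bar" part is the name and its length, and "ii" encodes the two int parameters.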

So if you call C or assembly code from C++, you have to turn off C++'s name mangling by declaring the C or assembly routines 'extern "C"', like this:

extern "C" void some_assembly_routine(int param1,char *param2);

or wrapped in curly braces like this:

extern "C" {
	void one_assembly_routine(int x);
	void another_assembly_routine(char c);
}

In fact, it's common to provide a "magic" header file for C code that automatically provides 'extern "C"' prototypes for C++, but just works normally in plain C:

#ifdef __cplusplus /* only defined in C++ code */
extern "C" {
#endif
	void one_assembly_routine(int x);
	void another_assembly_routine(char c);
#ifdef __cplusplus /* only defined in C++ code */
}
#endif

Definitely try these things out yourself:

Plain C bar routine:

#include <stdio.h>

int bar(int i,int j) {
	printf("bar(%d,%d)\n",i,j);
	return i;
}

(executable NetRun link)

C++ foo routine that calls bar:

extern "C" int bar(int i,int j);
int foo(void) 
{
	return bar(2,3);
}

(executable NetRun link)

Try:

Remove the 'extern "C"' from the bar prototype. Note the link error is looking for "bar(int,int)", which means it's looking for a C++ bar; but bar is C.
Make bar C++. Without extern "C", everything works, because C++ is calling C++.
Make bar C++, and add 'extern "C"' back to foo's prototype. It won't link, because it's looking for the "C" bar.
Make bar C++, but add 'extern "C"' to bar's declaration in *both* routines. Now you're linking the C++ bar using the C name.
Make foo C, but leave 'extern "C"' in bar's implementation.
Make both routines C, and remove the extern "C"s. Everything works fine, because C is calling C.

Code written in	With name	Has linker name
C++	int bar(int a,int b)	bar(int,int) <- But "mangled" to be alphanumeric, like _Z3barii with g++
C++	extern "C" int bar(int a,int b)	bar
C	int bar(int a,int b)	bar
Assembly	global bar bar:	bar
Fortran	SUBROUTINE bar()	bar_, BAR, BAR_, bar__, or some such. Disassemble to be sure...

Bottom line: to call code written in anything else (C, Assembly, Fortran) from C++, or to call C++ from anything else, add extern "C" to the C++ code!

Passing Functions as Arguments

You can even pass a pointer to a function as a function argument. The C++ syntax for doing so is rather hideous. "careful_add" takes three parameters: two integers, and a pointer to a function taking no arguments and returning int.

#include <iostream> // for std::cout
#include <cstdlib> // for exit

extern "C" {
	int careful_add(int a,int b,int (*errorfunction)(void));
}

extern "C" 
int adderr(void)
{
	std::cout<<"Program error detected: overflow during add.\n";
	exit(0); // should stop entire program now.
}

int foo(void) {
	int a=500000000,b=500000000,c=800000000,d=500000000;
	int sum=careful_add(
		careful_add(a,b,adderr),
		careful_add(c,d,adderr),
	adderr);
	return sum;
}

(Try this in NetRun now!)
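One common way to tame the function-pointer syntax is a type alias. This is a minimal sketch; the names ErrorFunction, call_handler, and report_seven are hypothetical and only serve the demonstration.

```cpp
// A type alias gives the function-pointer type a readable name.
using ErrorFunction = int (*)(void);

// Equivalent to: int call_handler(int (*errorfunction)(void));
int call_handler(ErrorFunction errorfunction) {
	return errorfunction(); // call through the pointer like a normal function
}

// Hypothetical handler for the demo: just returns a recognizable value.
int report_seven(void) { return 7; }
```

Passing a function is then just call_handler(report_seven); the function's name converts to a pointer automatically.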

From assembly, it looks pretty straightforward--the function pointer is passed like any other function argument (here, in rdx).

; int careful_add(int a,int b,int (*errorfunction)(void));
global careful_add ; it's a function
careful_add:
	add edi,esi
	jo somewhere
	mov eax,edi
	ret

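	somewhere:
	sub rsp,8 ; keep the stack 16-byte aligned for the call
	call rdx ; call the error function passed as the third argument
	add rsp,8
	ret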

(Try this in NetRun now!)
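For comparison, here is a rough C++ sketch of what the assembly does. C++ has no direct access to the CPU's overflow flag, so this version checks the ranges before adding; that is a swapped-in technique, not what the assembly does. The names careful_add_cpp and overflow_sentinel are hypothetical, and the handler returns a sentinel instead of exiting so the overflow path is easy to observe.

```cpp
#include <climits>

// C++ sketch of the assembly above: add two ints, and if the signed add
// would overflow, return whatever the error function returns instead.
int careful_add_cpp(int a, int b, int (*errorfunction)(void)) {
	// Check before adding, since signed overflow is undefined in C++.
	bool overflow = (b > 0 && a > INT_MAX - b) ||
	                (b < 0 && a < INT_MIN - b);
	if (overflow) return errorfunction(); // like "jo somewhere" / "call rdx"
	return a + b;                         // like "mov eax,edi" / "ret"
}

// Hypothetical handler: reports the error with a sentinel value.
int overflow_sentinel(void) { return -1; }
```

With the values from foo above, the two inner sums (1000000000 and 1300000000) fit in an int, but their total does not, so the outer call takes the error path.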

BONUS: Argument Passing in Fortran

C and C++ are kinda asymmetrical, because "int" parameters are passed by value (the integer itself goes in a register like rdi on 64-bit machines, or directly on the stack like "push 3" on 32-bit machines), while arrays are always passed via pointer (like "push my_array", which passes the *address* of the array, not the actual integers in the array). C/C++ do this because you can cheaply copy an "int", but copying an array might take a lot of time and memory.

Fortran, curiously, passes *everything* via pointer--if a Fortran function takes an int parameter, what gets passed is a *pointer* to an int, not the int itself!

To summarize:

When passing...	In C/C++, you...	In Fortran, you...
an int	pass the int	pass a pointer to the integer
an array	pass a pointer to the first element of the array	pass a pointer to the first element of the array
a char	pass an int containing the character's value	pass a pointer to the character

Fortran 1D arrays are indexed using round brackets, like "myarr(i)". And the index of the first array element in Fortran is "myarr(1)", not "myarr[0]" like C/C++. But beyond those small differences, arrays work exactly the same in Fortran as in C/C++, and in fact it's not always possible to tell from looking at the generated assembly code whether the original code was written in C, C++, Fortran, or Assembly!
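The table and the indexing rule above can be sketched from the C++ side. The function names sum_array_ and sum_array are hypothetical, and the single trailing underscore is only one of the naming conventions a Fortran compiler might use, so treat it as an assumption.

```cpp
// Fortran-style interface: every argument arrives as a pointer,
// including the scalar element count.
extern "C" int sum_array_(int *values, int *count) {
	int total = 0;
	for (int i = 1; i <= *count; i++) { // Fortran-style index: 1..count
		total += values[i - 1];         // Fortran values(i) is C values[i-1]
	}
	return total;
}

// C/C++-style interface: the int is passed by value, the array by pointer.
int sum_array(int *values, int count) {
	return sum_array_(values, &count); // take the address to match Fortran
}
```

Note that the array argument looks identical in both interfaces; only the scalar changes from "int" to "int *".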

Fun With Fortran!

CS 301 isn't a computer languages course, but I think it's pretty interesting to look at old-school Fortran, a language from 1956. Note how this little function returns 10, like you'd expect. And the assembly code is pretty much exactly what you'd get from C/C++!

       function foo()
       INTEGER foo

       i = 7;
       foo = i + 3;
       
       end function

(executable NetRun link)

Here's a "do loop" (the Fortran equivalent of C/C++ "for"):

       function foo()
       INTEGER foo

       do i=1,10
         CALL print_int(i)
       end do
       foo = i + 3;
       
       end function

(executable NetRun link)
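For comparison, a C++ sketch of the same loop. The vector named printed is a hypothetical stand-in for print_int, so the values can be inspected afterward instead of read off the screen.

```cpp
#include <vector>

std::vector<int> printed; // stand-in for print_int: records each value

int foo(void) {
	int i;
	for (i = 1; i <= 10; i++) { // Fortran "do i=1,10" includes both ends
		printed.push_back(i);
	}
	return i + 3; // i is 11 once the loop finishes, as in the Fortran version
}
```

The loop visits 1 through 10, and because the counter ends at 11, foo returns 14.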

Note that "print_int" is defined in NetRun's "inc.c" as:
CDECL void print_int__(int *i) {print_int(*i);}
Here,

"CDECL" is a NetRun macro that expands to extern "C" in C++. This prevents C++ from screwing up the name.
"print_int__" (note the extra underscores!) is the function name Fortran will try to link with.
A Fortran INTEGER is passed via a pointer, so you have to accept a pointer in C++.
The Fortran interface for print_int just calls the C++ interface.

This sort of Fortran/C/C++ interfacing is really common in big projects.
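The wrapper pattern described above can be sketched in a self-contained form. The CDECL definition here is an assumed equivalent, since NetRun's actual macro may differ, and last_printed is a hypothetical variable added only so the call can be checked.

```cpp
#include <cstdio>

// Assumed definition of a CDECL-style macro; NetRun's real one may differ.
#ifdef __cplusplus
#  define CDECL extern "C"
#else
#  define CDECL
#endif

int last_printed = 0; // hypothetical: remembers the most recent value

// Ordinary C/C++ interface: takes the int by value.
CDECL void print_int(int i) {
	last_printed = i;
	std::printf("%d\n", i);
}

// Fortran-facing interface: decorated name, argument by pointer.
CDECL void print_int__(int *i) { print_int(*i); }
```

Keeping the Fortran entry point as a one-line wrapper means the real work lives in one place, and each language gets the calling style it expects.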