Preprocessor Macros in C++ and NASM

CS 301 Lecture, Dr. Lawlor

Macros are an ancient and controversial way to transform source code. The controversy exists because, like "goto", macros are too powerful and general to be trusted.

Constant-like Macro

The standard use is pretty straightforward:
#define symbol replacement
makes the preprocessor replace every occurrence of the identifier "symbol" with "replacement" before compiling. Occurrences inside string literals or inside longer identifiers are left alone. So this works fine, and returns 17:

#define n 17

return n;
(Try this in NetRun now!)

Unlike ordinary C++ names, "n" is now defined as 17 from this point onward, ignoring scopes such as classes or functions. Declaring "int n=3;" gets rewritten to "int 17=3;", which won't compile! To avoid this, macros are conventionally written in capital letters, like "NUM_ELEMENTS", not just a bare n.
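The sketch below shows the capital-letters convention in a complete file. It also uses #undef, a standard preprocessor directive that these notes don't otherwise cover. #undef ends a macro's definition. The function name elementsPlusThree is made up for illustration.

```cpp
// Capitalized macro name, following the naming convention above.
#define NUM_ELEMENTS 17

int elementsPlusThree(void) {
	int n = 3;                // fine: "n" is not a macro, so nothing rewrites it
	return NUM_ELEMENTS + n;  // expands to 17 + n
}

// #undef ends the macro; after this line the name is an ordinary identifier again.
#undef NUM_ELEMENTS
int NUM_ELEMENTS = 5;         // a normal global variable, not a macro
```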

There's another problem with plain string replacement. Let's say you've defined a constant like:

#define n 10+10

return n*n;
(Try this in NetRun now!)

This returns 10+10*10+10 = 120. What?!

There are two well-known fixes for this bug:

Avoidance: never use macros. "enum" or "const int" provide basically the same functionality, and don't have this problem. Some see the preprocessor as an unwanted holdover from plain C that should never be used.
Workaround: if you #define something, wrap it in parentheses, like
```
#define n (10+10)
```
This works, but you must remember to do it every time!
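Here is a minimal sketch that puts the problem and both fixes side by side. The names N_BAD, N_GOOD, n_const and the three functions are made up for this example. Because of operator precedence, the bare macro squares to 120. The parenthesized macro and the const int both give 400.

```cpp
#define N_BAD 10+10      // bare replacement text
#define N_GOOD (10+10)   // parenthesized replacement text
const int n_const = 10+10;  // a real constant: no text substitution at all

int badSquare(void)   { return N_BAD*N_BAD; }      // expands to 10+10*10+10
int goodSquare(void)  { return N_GOOD*N_GOOD; }    // expands to (10+10)*(10+10)
int constSquare(void) { return n_const*n_const; }  // ordinary C++ arithmetic
```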

Function-like Macro

In plain C, before inline functions, it was pretty common to write short utility functions using macros:

#define times3(x) x*3
return times3(10);

(Try this in NetRun now!)

Again, the problem is that this is a plain text replacement. So calling the same macro with a sum:

    return times3(10+10);

(Try this in NetRun now!)

This returns... 40. That's 10+10*3, not (10+10)*3. Again, the fix is to wrap the expression in parentheses. Note that the argument and the overall expression both need parentheses, to protect against operators inside or outside the macro call:

#define times3(x) ((x)*3)
return times3(10+10);

(Try this in NetRun now!)

This works reliably here, although there's not much benefit to using a macro over a small function.
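For comparison, here is a sketch of the same job done by a small inline function. The name times3_fn is hypothetical. Both versions return 60 for 10+10. The function version computes its argument once and type-checks it, and there are no parentheses to remember.

```cpp
// The parenthesized macro from above.
#define times3(x) ((x)*3)

// A function doing the same job (times3_fn is a hypothetical name).
// Unlike the macro, it type-checks its argument and evaluates it exactly once.
inline int times3_fn(int x) { return x*3; }

int viaMacro(void)    { return times3(10+10); }     // expands to ((10+10)*3)
int viaFunction(void) { return times3_fn(10+10); }  // 10+10 computed first, then passed in
```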

Syntax-changing Macros

Macros can adjust syntax in truly arbitrary ways. For example, I can rename the curly braces with macros, like so:

#define begin {
#define end }

int x=5;
if (x>3) 
begin
	return 7;
end

(Try this in NetRun now!)

A FORTRAN programmer used to typing in ALL CAPS might be more comfortable using #define FOR for.
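Written as a complete function, the begin/end example looks like the sketch below. The name pascalStyle is made up. The two #undef lines at the end are an addition. They keep the macros from rewriting later code that uses the words begin or end, such as standard-library container methods.

```cpp
#define begin {
#define end }

int pascalStyle(int x) {
	if (x>3)
	begin          // becomes {
		return 7;
	end            // becomes }
	return 0;
}

// Remove the macros so later code (e.g. std::vector::begin) isn't rewritten.
#undef begin
#undef end
```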

People are merely annoyed by most of the above uses. It can get much worse, though:

#define BOOGA int x=9;  return (x&


BOOGA 7);

(Try this in NetRun now!)

In addition to screwing up any syntax-highlighting editor, this basically destroys the readability of the code, and is bad enough to make people seriously talk of banning all macros.
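The sketch below wraps the BOOGA example in a function, here given the made-up name booga. After expansion, the body is just int x=9; return (x& 7);. It therefore returns 9&7, which is 1.

```cpp
// The BOOGA macro from above, copied unchanged.
#define BOOGA int x=9;  return (x&

int booga(void) {
	BOOGA 7);   // after expansion: int x=9;  return (x& 7);
}
```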

Hairy C++ Macro Features

__FILE__ expands to a string with the filename of the current source code. Handy for debugging.
__LINE__ expands to an integer with the current line number in the source code. Handy for generating names that should be different for each call to the macro.
#b makes a quoted string version of the argument b, which is handy for print statements. This is "stringification".
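a##b sticks together constants or arguments a and b without any spaces. This "token pasting" is handy for generating new names, like "myClass_##name". Pasting __LINE__ needs one extra level of macro indirection, because the operands of ## are not macro-expanded first: "myClass_##__LINE__" just produces the identifier myClass___LINE__.
You can extend a macro across several lines by ending each line with a backslash. The backslash-newlines are only for your convenience in writing the macro, and don't make it out to the compiler: the lines are joined into one before comments are removed. A // comment inside a multi-line macro will thus comment out the whole rest of the macro! Use /* */ comments instead.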
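The sketch below uses these features together. All the macro and function names in it are made up for illustration. It uses the common two-step PASTE trick, so that __LINE__ becomes a number before token pasting. That lets one macro declare a fresh variable on each line where it is called.

```cpp
#include <string>

// Stringification: #x turns the argument's tokens into a string literal.
#define STRINGIFY(x) #x

// Token pasting: glue three tokens into one identifier.
#define MAKE_NAME(prefix, name) prefix##_##name

// The operands of ## are not macro-expanded, so __LINE__ must pass through
// one extra macro (PASTE) to become a number before PASTE_IMPL glues it on.
#define PASTE_IMPL(a, b) a##b
#define PASTE(a, b) PASTE_IMPL(a, b)
#define UNIQUE(prefix) PASTE(prefix, __LINE__)

// Declares a variable named after the current line, then adds it to total.
#define ADD_TERM(total, value) int UNIQUE(term) = (value); total += UNIQUE(term)

int MAKE_NAME(myClass, count) = 3;   // declares "int myClass_count = 3;"

std::string stringifyDemo(void) {
	return STRINGIFY(x * 3);         // the string "x * 3"; x is never evaluated
}

std::string whereAmI(void) {
	return __FILE__;                 // name of this source file
}

int uniqueDemo(void) {
	int total = 0;
	ADD_TERM(total, 1);  // declares term<N>, where N is this line's number
	ADD_TERM(total, 2);  // a different line, so a different variable name
	return total;
}
```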

Here's an example of stringification:

#define quote(string) #string
std::cout<<quote(This stuff goes in as a string...)<<std::endl;

(Try this in NetRun now!)

I use stringification all the time for GPU programming, where the graphics driver wants the GPU code as a string. I can have the same code work in C++ directly, then have a macro spit out a stringified version for the GPU to run.
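Here is a minimal sketch of that shared-code trick, assuming a hypothetical SHARED_CODE macro. The code is compiled as ordinary C++, and the same text is also kept as a string in shared_code_source, ready to hand to a driver. The sketch uses the C++11 variadic form __VA_ARGS__ (and #__VA_ARGS__), so that commas in the code don't split it into several macro arguments. Real GPU languages such as GLSL are not C++, so in practice this only works for code written in their common subset.

```cpp
#include <string>

// Hypothetical macro: paste the code in as real C++, and also keep a
// stringified copy of the same text in shared_code_source.
#define SHARED_CODE(...) __VA_ARGS__ \
	static const char *shared_code_source = #__VA_ARGS__;

SHARED_CODE(
	float scale(float x) { return x*2.0f; }
)
```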

Another place stringification is useful is in error checking. This not only checks for errors, but shows you the code and tells you the line number where they happened:

#define checkErrs(code) { int err=code; /* run */  if (err!=0) std::cout<<"Error in "<<#code<<" at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; }

int x=18;
checkErrs(x-18);
checkErrs(x-10);
return 0;
(Try this in NetRun now!)

This macro definition looks a little better using backslashes to separate the lines:

#define checkErrs(code) { \
	int err=code; /* run */  \
	if (err!=0) { \
		std::cout<<"Error "<<err<<" in '"<<#code<<"' at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; \
	} \
}

I need the curly braces to be able to declare "int err" repeatedly. But people naturally add a semicolon after the macro call, which leaves an extra empty statement after the closing brace. This is untidy, and it breaks an "if..else" statement with the macro call in the middle: the stray ";" ends the "if" before the "else" is reached. There's a bizarre but well-known solution, which is to add a worthless do{}while(0) that only exists to consume the semicolon:

#define checkErrs(code) do { \
	int err=code; /* run */  \
	if (err!=0) { \
		std::cout<<"Error "<<err<<" in '"<<#code<<"' at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; \
	} \
} while(0)
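The sketch below shows why the do{}while(0) matters. The if/else in checkOne only compiles because while(0) absorbs each semicolon. The global errorCount and the function checkOne are hypothetical additions, there only so the behavior can be checked. The macro argument is also wrapped in parentheses here.

```cpp
#include <iostream>

int errorCount = 0;  // hypothetical counter, added so the behavior can be checked

#define checkErrs(code) do { \
	int err=(code); /* run */ \
	if (err!=0) { \
		errorCount++; \
		std::cout<<"Error "<<err<<" in '"<<#code<<"' at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; \
	} \
} while(0)

// With the braces-only version, the ';' after the first checkErrs would end
// the "if" early, and the "else" below would be a syntax error.
void checkOne(int x, bool first) {
	if (first)
		checkErrs(x-18);
	else
		checkErrs(x-10);
}
```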

Another trick I use a lot is to generate classes inside a macro. For example, if my calculator needs an "operator" class for each of the basic operators, instead of typing them all out I'll generate them with a macro like this:

#define makeop(name,op) \
class calcop_##name { public: \
	int calculate(int a,int b) { return a op b; } \
}

makeop(add,+);
makeop(sub,-);
makeop(mul,*);
makeop(div,/);
makeop(and,&);
makeop(or,|);
makeop(left,<<);
makeop(right,>>);

int foo(void) {
	calcop_add a;
	return a.calculate(100,10);
}
(Try this in NetRun now!)

The nice part is now if you need the calcop classes to inherit from some base class, you can add it to the macro. Forgot the "const" in calculate? Just add it to the macro. To add a "getSymbol" method returning the operator's symbol, you can use stringification to add it to the macro definition like "const char *getSymbol(void) const { return #op; }". If each operator needs to be registered into the list of operators, you can add that as well.
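The sketch below folds the changes just described into the macro. It adds a hypothetical abstract base class calcop_base, puts const on calculate, and adds a stringified getSymbol. Only a few operators are listed, to keep it short.

```cpp
// Hypothetical common base class, so every generated operator can be used
// through one interface.
class calcop_base { public:
	virtual ~calcop_base() {}
	virtual int calculate(int a,int b) const =0;
	virtual const char *getSymbol(void) const =0;
};

#define makeop(name,op) \
class calcop_##name : public calcop_base { public: \
	int calculate(int a,int b) const override { return a op b; } \
	const char *getSymbol(void) const override { return #op; } \
}

makeop(add,+);
makeop(sub,-);
makeop(mul,*);
makeop(div,/);
makeop(left,<<);
makeop(right,>>);
```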

C++ macros can become fairly complex, which is bad, but they provide very useful "metaprogramming" abilities, especially in big programs. "Metaprogramming" is when the output of one program (here, the preprocessor) is the source code for another program (here, C++). You can even do explicit metaprogramming, where a program you write outputs the source code of a second program:
    1. Compile the generator code.
    2. Run the generator. Its output is the final code.
    3. Compile the final code.
    4. Run the final code.
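Here is a tiny sketch of such a generator, written in C++. The name generateOps is hypothetical. It builds the source code for a few calcop classes as a string. A main that prints generateOps() into a header file would be step 2 of the list above. Compiling a program that includes that header would be step 3.

```cpp
#include <string>

// Hypothetical generator: returns C++ source code as text.
std::string generateOps(void) {
	const char *names[] = {"add","sub","mul","div"};
	const char *ops[]   = {"+","-","*","/"};
	std::string out;
	for (int i=0;i<4;i++) {
		out += "class calcop_" + std::string(names[i]) + " { public:\n";
		out += "\tint calculate(int a,int b) const { return a " + std::string(ops[i]) + " b; }\n";
		out += "};\n";
	}
	return out;
}
```

Unlike the makeop macro, the generator is an ordinary program. It can use loops, read files, or compute tables, at the cost of an extra build step.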

For example, parser generators like "yacc", a "compiler compiler", read a grammar description and output the C source code for a parser. Many compilers and interpreters have been built this way!

Macros in NASM

The NASM "%define" works basically like the C++ "#define", just replacing text willy-nilly.

For example, a speaker of French might prefer to convert the names of registers and instructions like so:

%define deplacer     mov eax,
%define retour    ret

deplacer 3
retour

(Try this in NetRun now!)

Note that "deplacer" is only half an instruction; macros don't care about syntax.

The wide variety of macros supported by NASM is listed in Chapter 4 of the NASM manual. Briefly:

For single-line replacements, use %define myThingy 17, as above. Basically just like C++ but with a different letter in front.
For bringing stuff in from a file, %include "somefile.S", again similar to C++.
For multi-line function-like macros, use %macro myStuff nParameters ... %endmacro.
There's also conditional assembly, like %ifdef, %ifndef, and such.

Just a word of warning: every assembler does these things slightly differently, so you'll need to read the manual for your specific assembler.

For example:

%macro printit 1 ; calls print_long with its argument
	mov rdi,%1 ; copy macro argument 1 into parameter register
	extern print_long
	call print_long
%endmacro

mov rcx,23
printit rcx
ret
(Try this in NetRun now!)

I built a little stack tracing macro named "s" that you can use to watch the stack grow and shrink:

%include "lib/trace_s.S" ; defines a tricky stack tracing macro named "s"

s push 7
s push 3
s add rsp,16
s ret
(Try this in NetRun now!)

Various weird things that are handy in function-like macros include:

Local labels, like %%here, which only apply within that macro call. Defining a jump label the ordinary way inside a macro results in duplicate label definitions if you call the macro more than once.
You can stringify stuff in a macro using %defstr. The C++ "#arg" trick doesn't work.