Metaprogramming with Macros in Assembly and C++
CS 301 Lecture, Dr. Lawlor
"Metaprogramming" is when the output of your program is another
program. It's a common trick--for example, if your code needs a
table of primes, you can make the list at runtime (which is slow),
pregenerate the list and hardcode it in the code (which makes for lots
of code), or write a program to generate the code for the table, and
run the program before compiling the main program!
Basic C++ Macros
C++ inherited a pretty heavy-duty line-oriented preprocessor from
C. Traditionally it's a totally separate program (cpp) using a unique
string-rewriting language.
The standard uses of this are pretty straightforward:
#define symbol replacement
makes the preprocessor replace every occurrence of "symbol" with
"replacement" before compiling. So this works fine, and returns
17:
#define n 17
return n;
(Try this in NetRun now!)
Unlike everything else in C++, macros ignore scope: "n" is now defined as 17 from this point
onwards in the file, even across classes, functions, or anything. Declaring
"int n=3;" gets rewritten to "int 17=3;", which won't compile!
Macro Danger
Let's say you've defined a constant like:
#define n 10+10
return n*n;
(Try this in NetRun now!)
This returns 10+10*10+10 = 120. What?!
There are several well known fixes for this bug:
- Never use macros. "enum" or "const int" provide basically
the same functionality, and don't have this problem. Some see the
preprocessor as an unwanted holdover from plain C that should never be
used.
- If you #define something, wrap it in parentheses, like
#define n (10+10)
This works, but you must remember to do it every time!
- If you use a macro, wrap the uses in parentheses just in case,
like "return (n)*(n);". This is annoying, and you again have to
remember to do it every time.
Macros with arguments
A macro can take an argument, sort of like a preprocessor-time
function. The text of the argument gets pasted in wherever the
parameter name appears. Here's a straightforward usage:
#define square(x) ((x)*(x))
return square(10);
Note the extra parentheses, used to avoid the bug above.
Because macro parameters are just string replaced, you can do weirdly
powerful metaprogramming: this "twice" macro works with any operator:
#define twice(op,x) ((x) op (x))
return twice(*,10);
(Try this in NetRun now!)
Hairy C++ Macro Features
- __FILE__ expands to a string with the filename of the current source code. Handy for debugging.
- __LINE__ expands to an integer with the current line number in the source code. Handy for generating names that should be different for each call to the macro.
- #b makes a quoted string version of the argument b, which is handy for print statements. This is "stringification".
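- a##b sticks together constants or arguments a and b without any
spaces. This "token pasting" is handy for generating new names,
like "myClass_##name" or "myClass_##__LINE__" (the latter needs an extra helper macro, so __LINE__ gets expanded to a number before it is pasted).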
- You can extend a macro across several lines by ending each line with a backslash.
The backslashes are only for your convenience in writing the macro, and don't
make it out to the compiler. Because the lines get joined back into one, a // comment
inside a macro will typically kill off the whole rest of the macro! Use /* */ comments there instead.
Here's an example of stringification:
/* "trace" macro prints the argument to the screen as a string, then executes it.
   Printing first means even "return x;" shows up before the function exits. */
#define trace(code) std::cout<<#code<<"\n"; code
trace( int x=3; )
trace( x+=2; )
trace( return x; )
(Try this in NetRun now!)
I use stringification all the time for GPU programming, where the
graphics driver wants the GPU code as a string. I can have the same
code work in C++ directly, then have a macro spit out a stringified
version for the GPU to run.
Another place stringification is useful is in error checking.
This not only checks for errors, but shows you the code and tells you
the line number where they happened:
#define checkerrs(code) { int err=code; /* run */ if (err!=0) std::cout<<"Error in "<<#code<<" at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; }
int x=18;
checkerrs(x-18);
checkerrs(x-10);
return 0;
(Try this in NetRun now!)
This code looks a little better using backslashes to separate the lines:
#define checkerrs(code) { \
int err=code; /* run */ \
if (err!=0) { \
std::cout<<"Error "<<err<<" in '"<<#code<<"' at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; \
} \
}
I need the curly braces so each use of the macro gets its own scope, which lets me declare "int err" repeatedly. But
people typically add a semicolon at the end of the macro
call, which leaves an empty statement after the closing brace; this is untidy, and will break an "if..else" statement with
the macro in the middle. There's a bizarre well-known solution,
which is to wrap the body in a worthless do{}while(0) that only exists to consume
the semicolon:
#define checkerrs(code) do { \
int err=code; /* run */ \
if (err!=0) { \
std::cout<<"Error "<<err<<" in '"<<#code<<"' at line "<<__LINE__<<" of file "<<__FILE__<<"\n"; \
} \
} while(0)
Another trick I use a lot is to generate classes inside a macro.
For example, if my calculator needs ten "operator" classes, one for each of
the basic operators, I'll generate them with a macro like this:
#define makeop(name,op) \
class calcop_##name { public: \
int calculate(int a,int b) { return a op b; } \
}
makeop(add,+);
makeop(sub,-);
makeop(mul,*);
makeop(div,/);
makeop(and,&);
makeop(or,|);
makeop(left,<<);
makeop(right,>>);
int foo(void) {
calcop_add a;
return a.calculate(100,10);
}
(Try this in NetRun now!)
The nice part is now if you need the calcop classes to inherit from
some base class, you can add it to the macro. To add a
"getSymbol" method returning the operator's symbol, you can use
stringification to add it to the macro definition like "const char
*getSymbol(void) const { return #op; }". If each operator needs
to be registered into the list of operators, you can add that as well.
C++ macros can become fairly complex, which is bad, but they provide very interesting abilities, especially in big programs.
NASM Macros
The wide variety of macros supported by NASM is listed in Chapter 4 of the NASM manual. Briefly, these are:
- For single-line replacements, use %define myThingy 17, basically just like C++ but with a different character in front (% instead of #).
- For bringing stuff in from a file, %include "somefile.S", again similar to C++.
- For multi-line function-like macros, start with %macro myStuff nParameters and finish with %endmacro. Inside the macro, %1, %2, and so on stand for the parameters.
- NASM also has conditional assembly, like %ifdef, %ifndef and such.
Just a word of warning: every assembler does these things slightly
differently, so you'll need to read the manual for your specific
assembler.
For example:
%define n 10
mov rax,n
ret
(Try this in NetRun now!)
%macro printit 1 ; calls print_long with its argument
mov rdi,%1 ; copy argument into parameter register
extern print_long
call print_long
%endmacro
mov rcx,23
printit rcx
ret
(Try this in NetRun now!)
I built a little stack tracing macro named "s" that you can use to watch the stack grow and shrink:
%include "lib/trace_s.S" ; defines a tricky stack tracing macro named "s"
s push 7
s push 3
s add rsp,16
s ret
(Try this in NetRun now!)
Various weird things that are handy in function-like macros include:
- Local labels, like %%here, which only apply within that macro call. Defining a jump label the ordinary way usually results in multiple definitions if you call the macro several times.
- You can stringify stuff in a macro using %defstr. The C++ "#arg" trick doesn't work.