const unsigned char table[]={
0xb0, /*set x = ... */
73, /* ... this byte */
0xc3 /* exit */
};
int foo(void) {
int x=0; /* our "register" (temporary storage, and return value) */
int i=0; /* our location in the table */
while (1) { /* always keep looping through the table */
int instruction=table[i++];
if (instruction==0xb0) { /* set-x instruction */
x=table[i++]; /* next byte is the new value for x */
}
else if (instruction==0xc3) {
return x; /* stop looping through the table */
}
else {
std::cout<<"Illegal instruction:" <<std::hex<<instruction<<"\n";
return -999;
}
}
}
What's amazing is that I can tell the CPU to execute the bytes above,
and those bytes act like a function that returns 73--the CPU is just
table-driven hardware! For example, the byte "0xc3" tells an x86 CPU to
return from the current function. The byte "0xb0" is followed by
a one-byte value to load into the register that holds the return value. So this code actually works!
const unsigned char commands[]={ /* unsigned, so 0xb0 and 0xc3 fit in one byte */
0xb0,73, /* load a value to return */
0xc3 /* return from the current function */
};
int foo(void) {
typedef int (*fnptr)(void); // pointer to a function returning an int
fnptr f=(fnptr)commands; // typecast the command array to a function
return f(); // call the new function!
}
These raw byte commands that the CPU executes are called "machine
code". "Assembly language" is just a human-readable translation
of machine code. An "assembler", like NASM, reads assembly language and writes executable machine code. A "disassembler", like PE Explorer or IDA Pro (for Windows), or objdump
(for Linux or Mac OS X), reads an executable and writes assembly
language (in NetRun, check the "Disassemble" box under "Options").
If you just want to look at the machine code inside a function, you
can typecast a pointer to the function into a pointer to bytes (the opposite of what we did above!) and start printing bytes of
machine code:
int bar(void) { /* some random function: we look at bar's machine code below! */
return 3;
}
int foo(void) {
const unsigned char *data=(unsigned char *)(&bar);
for (int i=0;i<10;i++) /* print out the bytes of the bar function */
std::cout<<"0x"<<std::hex<<(int)data[i]<<"\n";
return 0;
}
This prints out the same bytes inside bar that you see in the "Disassembler" tab. Which instructions is the compiler using? 0xb8 is the 32-bit version of the load-a-constant instruction 0xb0 above, so the next four bytes are all representing the constant 3 (0x00000003 is stored as 0x03 0x00 0x00 0x00). 0xc3 is just the return instruction, like we used above.
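To make the byte-by-byte reading above concrete, here is a minimal sketch of a toy disassembler for just the three opcodes discussed so far. The function name `toy_disassemble` is hypothetical (it is not part of NetRun or any real tool), and it assumes 0xb0 is "mov al, imm8", 0xb8 is "mov eax, imm32" with the constant stored low byte first, and 0xc3 is "ret". A real disassembler handles hundreds of opcodes and many more encodings.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper for illustration only: turn a few x86 opcodes
// back into assembly text, one instruction per output line.
std::string toy_disassemble(const unsigned char *code, std::size_t len) {
    std::ostringstream out;
    std::size_t i = 0;
    while (i < len) {
        unsigned char op = code[i++];
        if (op == 0xb0 && i + 1 <= len) {        // load a one-byte constant
            out << "mov al," << (int)code[i] << "\n";
            i += 1;
        } else if (op == 0xb8 && i + 4 <= len) { // load a four-byte constant
            std::uint32_t value = 0;
            for (int b = 0; b < 4; b++)          // little-endian: low byte comes first
                value |= (std::uint32_t)code[i + b] << (8 * b);
            out << "mov eax," << value << "\n";
            i += 4;
        } else if (op == 0xc3) {                 // return from the function
            out << "ret\n";
        } else {                                 // anything else is beyond this sketch
            out << "(unknown byte 0x" << std::hex << (int)op << std::dec << ")\n";
        }
    }
    return out.str();
}
```

Given the six bytes 0xb8 0x03 0x00 0x00 0x00 0xc3, this sketch returns the two lines "mov eax,3" and "ret".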
mov eax,5
ret
The assembler (NASM, in this case) will then spit out the following machine code:
00000000 <foo>:
0: b8 05 00 00 00 mov eax,0x5
5: c3 ret
Note the middle column contains the same 0xb8 and so on that the compiler generates, or we could even write by hand. (NetRun always puts in a spare "ret" instruction at the end, in case you forget.)
mov eax,
5
Yup, line-oriented stuff is indeed annoying. Be careful that your editor doesn't mistakenly add newlines to long lines of text! I usually leave off the semicolons for lines without comments, because otherwise I find myself tempted to do this:
mov ecx, 5; mov eax, 3; Whoops!
It doesn't look like it, but the semicolon makes that second instruction A COMMENT!
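The semicolon trap can be sketched in a few lines of C++. The helper `strip_asm_comment` below is hypothetical: it only models the rule that everything from the first semicolon to the end of the line is a comment, and it ignores details a real assembler handles, such as semicolons inside quoted strings.

```cpp
#include <string>

// Hypothetical helper: return the part of one source line that would
// actually be assembled, i.e. everything before the first ';'.
std::string strip_asm_comment(const std::string &line) {
    std::string::size_type semi = line.find(';');
    std::string code = (semi == std::string::npos) ? line : line.substr(0, semi);
    // Trim trailing spaces and tabs left in front of the comment.
    std::string::size_type last = code.find_last_not_of(" \t");
    return (last == std::string::npos) ? std::string() : code.substr(0, last + 1);
}
```

Passing in "mov ecx, 5; mov eax, 3; Whoops!" returns just "mov ecx, 5": the second mov never reaches the CPU.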
int a = 3;
int c = 7;
a += c;
return a;
And finally here's the assembly code:
mov eax, 3
mov ecx, 7
add eax, ecx
ret
Opcode | Does | Example
add    | +    | add eax,ecx
sub    | -    | sub eax,ecx
imul   | *    | imul eax,ecx
idiv   | /    | idiv ecx <- Warning! Weirdness! (see below)
and    | &    | and eax,ecx
or     | |    | or eax,ecx
xor    | ^    | xor eax,ecx
not    | ~    | not eax
mov eax,73; top
mov ecx,10; bottom
mov edx,0 ; high bits of top
idiv ecx ; divide eax by ecx
; now eax = 73/10, edx=73%10
What a strange instruction!
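One way to see why idiv is set up like this is to model it in C++. The sketch below assumes the usual x86 behavior: "idiv ecx" treats edx:eax as a single signed 64-bit number (edx is the high half), divides it by ecx, and leaves the quotient in eax and the remainder in edx. The `Regs` struct and the `idiv_ecx` function are hypothetical names used only for this illustration, and the function returns false where a real CPU would raise a divide error instead.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical model of the three registers involved; not real CPU state.
struct Regs {
    std::uint32_t eax, ecx, edx;
};

// Sketch of "idiv ecx". Returns false for divide-by-zero or a quotient
// that does not fit in 32 bits (a real CPU raises a divide error there).
bool idiv_ecx(Regs &r) {
    // Build the 64-bit dividend: edx is the high half, eax is the low half.
    // (The unsigned-to-signed casts assume two's complement wraparound,
    // which is what common compilers do.)
    std::int64_t dividend = (std::int64_t)(((std::uint64_t)r.edx << 32) | r.eax);
    std::int64_t divisor = (std::int32_t)r.ecx;
    if (divisor == 0) return false;
    if (dividend == std::numeric_limits<std::int64_t>::min() && divisor == -1)
        return false; // this division would overflow even in 64 bits
    std::int64_t quotient = dividend / divisor;  // C++ truncates toward zero
    std::int64_t remainder = dividend % divisor;
    if (quotient < std::numeric_limits<std::int32_t>::min() ||
        quotient > std::numeric_limits<std::int32_t>::max())
        return false; // quotient too big for eax
    r.eax = (std::uint32_t)quotient;
    r.edx = (std::uint32_t)remainder;
    return true;
}
```

Starting from eax=73, ecx=10, edx=0, the sketch leaves eax=7 and edx=3, matching the comment in the assembly above. If edx holds leftover junk such as 10, the 64-bit dividend becomes huge and the quotient no longer fits in eax, which is why edx is cleared first.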