Switch Statements and CPU Machine Code

CS 301 Lecture, Dr. Lawlor

Here's a table-driven cat:

const int commands[]={
	0, /* sleeps */
	1, /* meow */
	1,
	2,73, /* eats */
	0, /* sleep */
	9 /* exit */
};

int foo(void) {
	int index=0;
	while (true) {
		switch (commands[index]) {
		case 0: std::cout<<"sleep\n"; break;
		case 1: std::cout<<"meow\n"; break;
		case 2: /* need a parameter so we know what to eat */
			{
				index++; /* skip over the 2 */
				int what=commands[index];
				std::cout<<"eat "<<what<<"\n"; 
				break;
			}
		case 9: return 0;
		default: std::cout<<"Unrecognized cat-cmd! cmd="<<commands[index]<<"\n";
		}
		index++;
	}
}

(Try this in NetRun now!)

The only interesting thing happening here is the "eat" case: this command needs a parameter, so we put the parameter right in the array of commands.
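
To watch the parameter ride along in the same array, here is a minimal sketch of the same loop that collects its output in a string instead of printing it. The name run_cat and the bounds checks are additions for illustration, not part of the lecture code.

```cpp
#include <cassert>   // for the assert in the usage note below
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: run a cat command table and return everything
// the cat "did" as one string.
std::string run_cat(const std::vector<int> &commands) {
	std::ostringstream out;
	std::size_t index = 0;
	while (index < commands.size()) {
		int cmd = commands[index];
		switch (cmd) {
		case 0: out << "sleep\n"; break;
		case 1: out << "meow\n"; break;
		case 2: /* the next array entry is what to eat */
			index++; /* step from the 2 onto its parameter */
			if (index >= commands.size()) {
				out << "eat is missing its parameter\n";
				return out.str();
			}
			out << "eat " << commands[index] << "\n";
			break;
		case 9: return out.str();
		default: out << "Unrecognized cat-cmd! cmd=" << cmd << "\n";
		}
		index++; /* move on to the next command */
	}
	return out.str(); /* ran off the end of the table without an exit */
}
```

For instance, assert(run_cat({2, 9}) == "eat 9\n") holds: the 9 is consumed as the thing to eat, so it never gets a chance to act as the exit command.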

CPU Machine Code

The CPU is not a cat. So as you might expect, running the above command table on the CPU just crashes horribly:

const char commands[]={
	0, /* sleeps */
	1, /* meow */
	1,
	2,73, /* eats */
	0, /* sleep */
	9 /* exit */
};
int foo(void) {
	typedef int (*fnptr)(void); // pointer to a function returning an int
	fnptr f=(fnptr)commands; // typecast the command array to a function
	return f(); // call the new function!
}

(Try this in NetRun now!)

(Don't worry about the hideous C++ syntax for function pointer stuff.)

However, the CPU really is a table-driven machine; it just uses different values for its commands. For example, the byte "0xc3" tells an x86 CPU to return from the current function. The byte "0xb0" is followed by a one-byte parameter, which is loaded into the low byte of the register that holds the return value. So this works!

const unsigned char commands[]={ /* unsigned: 0xb0 and 0xc3 do not fit in a signed char */
	0xb0,73, /* load a value to return */
	0xc3 /* return from the current function */
};
int foo(void) {
	typedef int (*fnptr)(void); // pointer to a function returning an int
	fnptr f=(fnptr)commands; // typecast the command array to a function
	return f(); // call the new function!
}

(Try this in NetRun now!)
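
One way to picture "table-driven" is to write the CPU's job as a switch statement, just like the cat. The sketch below is a toy that understands only the two bytes used above. The name emulate and the -1 error value are assumptions of this sketch; a real x86 CPU has far more commands and registers.

```cpp
#include <cassert>
#include <cstddef>

// Toy interpreter for the two byte commands from the lecture.
// Returns the loaded value (0..255) when it hits a return command,
// or -1 if the bytes are unrecognized, truncated, or never return.
int emulate(const unsigned char *code, std::size_t length) {
	assert(code != nullptr);
	int value = 0;      /* stands in for the return-value register */
	std::size_t pc = 0; /* index of the next byte to execute */
	while (pc < length) {
		switch (code[pc]) {
		case 0xb0: /* load: the next byte is the parameter */
			if (pc + 1 >= length) return -1;
			value = code[pc + 1];
			pc += 2; /* skip the command and its parameter */
			break;
		case 0xc3: /* return from the current function */
			return value;
		default:
			return -1; /* not a command this toy knows */
		}
	}
	return -1; /* ran off the end without a return */
}
```

Feeding it the bytes 0xb0,73,0xc3 gives back 73, while the cat's command table gives -1 because byte 0 is not one of the two commands the toy knows.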

These raw byte commands that the CPU executes are called "machine code". "Assembly language" is just a human-readable translation of machine code. An "assembler", like NASM, reads assembly language and writes executable machine code. A "disassembler", like PE Explorer or IDA Pro (for Windows), or objdump (for Linux or Mac OS X), reads an executable and writes assembly language.
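
As a rough sketch of what a disassembler does, the function below translates just the two byte commands from this lecture into their usual x86 assembly spellings ("mov al," and "ret"). The name disassemble and the "db" fallback for bytes it does not recognize are choices made for this example; real tools such as objdump handle the full instruction set.

```cpp
#include <cassert>   // for the assert in the usage note below
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Toy disassembler: machine code bytes in, one line of text per command out.
std::string disassemble(const std::vector<unsigned char> &code) {
	std::ostringstream out;
	std::size_t i = 0;
	while (i < code.size()) {
		if (code[i] == 0xb0 && i + 1 < code.size()) {
			/* 0xb0 takes a one-byte parameter */
			out << "mov al," << static_cast<int>(code[i + 1]) << "\n";
			i += 2;
		} else if (code[i] == 0xc3) {
			out << "ret\n";
			i += 1;
		} else {
			/* unknown byte: show it as raw data */
			out << "db " << static_cast<int>(code[i]) << "\n";
			i += 1;
		}
	}
	return out.str();
}
```

For instance, assert(disassemble({0xb0, 73, 0xc3}) == "mov al,73\nret\n") holds for the working command table above.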

If you just want to look at the machine code inside a function, you can typecast a pointer to the function into a pointer to bytes, and start printing bytes of machine code:

int bar(void) { /* some random function: we look at bar's machine code below! */
	return 17;
}

int foo(void) {
	const unsigned char *data=(unsigned char *)(&bar);
	for (int i=0;i<10;i++) /* print out the bytes of the bar function */
		std::cout<<"0x"<<std::hex<<(int)data[i]<<"\n";
	return 0;
}

(Try this in NetRun now!)
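
The loop above prints single-digit bytes as "0x5" rather than "0x05". If you want every byte padded to two hex digits, a small helper along these lines could do the formatting. The name hex_bytes is made up for this sketch, and it works on any array of bytes, not just a function's machine code.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Format count bytes as space-separated two-digit hex values.
std::string hex_bytes(const unsigned char *data, std::size_t count) {
	assert(data != nullptr || count == 0);
	std::ostringstream out;
	out << std::hex << std::setfill('0'); /* both settings stick for the whole stream */
	for (std::size_t i = 0; i < count; i++) {
		if (i > 0) out << ' ';
		/* setw applies only to the next item, so repeat it for every byte */
		out << "0x" << std::setw(2) << static_cast<int>(data[i]);
	}
	return out.str();
}
```

Pointing it at a function, as in hex_bytes((const unsigned char *)(&bar), 10), relies on the same typecast as the loop above. The C++ standard does not promise that this cast works, although it typically does on x86 machines.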

We'll be learning about assembly language and machine code for the rest of the semester!