Course Review and Cheat Sheet

CS 301 Lecture, Dr. Lawlor

Briefly, you should know both the how and the why of everything we did in class or on the homeworks.

Writing Numbers in Binary, Decimal, Hexadecimal

Place/bit Number	i	...	4	3	2	1	0
Decimal: Base-10	10ⁱ	...	10000	1000	100	10	1
Binary: Base-2	2ⁱ	...	16 = 2⁴	8 = 2³	4 = 2²	2	1
Hex: Base-16	16ⁱ	...	65536 = 2	4096 = 16³	256 = 16²	16	1
Base-n	nⁱ	...	n⁴	n³	n²	n	1 = n⁰

Bitwise Operations

Name	Purpose	C++	Assembly
AND	Turn bits off, extract bits.	x=x&0xf0;	and rax,0xf0 (Try this in NetRun now!)
OR	Turn bits on, combine fields.	x=x\|0xf0;	or rax,0xf0 (Try this in NetRun now!)
NOT	Invert all bits.	x=~x;	not rax (Try this in NetRun now!)
XOR	Invert selected bits, encryption.	x=x^0xf0;	xor rax,0xf0 (Try this in NetRun now!)
left shift	Multiply by 2ⁿ. Reassemble bits.	x=x<<2;	shl rax,2 (Try this in NetRun now!)
right shift (unsigned)	Divide unsigned by 2ⁿ. Brings in zero bits.	unsigned x; x=x>>2;	shr rax,2 (Try this in NetRun now!)
right shift (signed)	Divide signed by 2ⁿ. Brings in copies of the sign bit.	int x; x=x>>2;	sar rax,2 (Try this in NetRun now!)

Sizes of Stuff

Name	C++	Asm Register	Asm Memory	Asm Data	Bits	Bytes (8 bits)	Hex Digits (4 bits)	Unsigned Range	Signed Range
Bit	n/a	n/a	n/a	n/a	1	less than 1	less than 1	0..1	-1..0
Byte	char	al	BYTE	db	8	1	2	255	-128 .. 127
short	short	ax	WORD	dw	16	2	4	65535	-32768 .. +32767
32-bit int	int	eax	DWORD	dd	32	4	8	>4 billion	-2G .. +2G
64-bit int	"long" (or "long long")	rax	QWORD	dq	64	8	16	>16 quadrillion	-8Q .. +8Q

You can determine the usable range of a value by experimentally measure the overflow point, for example with code like this:

int value=1; /* value to test, starts at first (lowest) bit */
int bit=0;
while (value!=0) {
	value=value+value; /* moves over by one bit (value=value<<1 would work too) */
	bit++;
}
return bit;

(Try this in NetRun now!)

Signed versus Unsigned Integers

two's complement signed integers. The sign bit. To flip the sign, flip all the bits and then add one.

Concept of Machine Code

Or, why this code works, and returns 73:

const char commands[]={
	0xb0,73, /* load a value to return */
	0xc3 /* return from the current function */
};
int foo(void) {
	typedef int (*fnptr)(void); // pointer to a function returning an int
	fnptr f=(fnptr)commands; // typecast the command array to a function
	return f(); // call the new function!
}

(Try this in NetRun now!)

The Registers

Here's the full list of x86 registers. The new 64 bit registers are shown in red.

Notes	64-bit	32-bit	16-bit	8-bit
Values are returned from functions in this register. Multiply instructions put the low bits of the result here too.	rax	eax	ax	ah and al
Typical scratch register. Some instructions use it as a counter (such as bit shifts).	rcx	ecx	cx	ch and cl
Scratch register. Multiply instructions put the high bits of the result here, and divide demands that this contain zero.	rdx	edx	dx	dh and dl
Preserved register: don't use it without saving it!	rbx	ebx	bx	bh and bl
The stack pointer. Points to the top of the stack	rsp	esp	sp	spl
Preserved register. Sometimes used to store the old value of the stack pointer, or the "base".	rbp	ebp	bp	bpl
Scratch register. Also used to pass function argument #2 (on 64-bit Linux).	rsi	esi	si	sil
Scratch register. Function argument #1 (on 64-bit Linux).	rdi	edi	di	dil
Scratch register. These were added in 64-bit mode, so the names are slightly different.	r8	r8d	r8w	r8b
Scratch register.	r9	r9d	r9w	r9b
Scratch register.	r10	r10d	r10w	r10b
Scratch register.	r11	r11d	r11w	r11b
Preserved register.	r12	r12d	r12w	r12b
Preserved register.	r13	r13d	r13w	r13b
Preserved register.	r14	r14d	r14w	r14b
Preserved register.	r15	r15d	r15w	r15b

Conditional Jumps

These only work after a "cmp" instruction (or another instruction that sets the flags properly).

Instruction	Useful to...	Flags (see below)
jmp	Always jump	None
ja	Unsigned > (signed "x>n \|\| x<0")	CF=0 and ZF=0
jae	Unsigned >= (signed "x>=n \|\| x<0")	CF=0
jb	Unsigned < (signed "x<n && x>=0")	CF=1
jbe	Unsigned <= (signed "x<=n && x>=0")	CF=1 or ZF=1
jc	Unsigned overflow checking	CF=1
jecxz	Compare ecx with 0	ecx=0
je or jz	Equality	ZF=1
jg	Signed >	ZF=0 and SF=OF
jge	Signed >=	SF=OF
jl	Signed <	SF!=OF
jle	Signed <=	ZF=1 or SF!=OF
jne or jnz	Inequality	ZF=0
jo	Signed overflow checking	OF=1
jp or jpe	Parity check (even)	PF=1
jpo	Parity check (odd)	PF=0
js	Jump if negative	SF=1

There are also "n" NOT versions for each jump; for example "jno" jumps if there is NOT overflow.

Often you can avoid a branch entirely using funky arithmetic, like "blended = (1-doit)*A+doit*B;".

Every C flow-control construct can be written using just "if" and "goto", which usually map one-to-one to a compare-and-jump sequence in assembly.

Normal C	Expanded C
if (A) { ... }	if (!A) goto END; { ... } END:
if (!A) { ... }	if (A) goto END; { ... } END:
if (A&&B) { ... }	if (!A) goto END; if (!B) goto END; { ... } END:
if (A\|\|B) { ... }	if (A) goto STUFF; if (B) goto STUFF; goto END; STUFF: { ... } END:
while (A) { ... }	goto TEST; START: { ... } TEST: if (A) goto START;
do { ... } while (A)	START: { ... } if (A) goto START;
for (i=0;i<n;i++) { ... }	i=0; /* Version A */ goto TEST; START: { ... } i++; TEST: if (i<n) goto START;
for (i=0;i<n;i++) { ... }	i=0; /* Version B */ START: if (i>=n) goto END; { ... } i++; goto START; END:
switch (x) { case 0: ...; break; case 1: ...; break; case 2: ...; break;	jmp [table + rax*8] label0: ... jmp end label1: ... jmp end label2: ... jmp end end: table: dq label0, label1, label2

Call, Return, and The Stack

push == subtract the stack pointer, and move QWORD there.
pop == move QWORD off stack, and add stack pointer. Must match with a push!
call == push address to come back to, and jmp to new function.
ret == pop address to return to, and jmp back there.

Note that if you leave junk on the stack, "ret" will happily try to return there, usually resulting in a crash.
If your bad C++ "char buf[100];" grows too long, it might overwrite the return address, possibly handing your machine to evil hackers.

Typical use of push and pop to save a preserved register:
push rbp
... happily trash rbp in my function ...
pop rbp ; restore his version
ret ; <- he'll have no idea that I trashed his register!

Or, to preserve a scratch register across a function call:
push rcx
call ....; he'll trash rcx
pop rcx; bring my version back!new

You can save as many registers as you want, but you need to pop them in the opposite order--otherwise you've flipped their values around!

Accessing Memory in Arrays

Array element i of the "int" array arr is stored at [arr+4*i].
Array element i of the "long" array arr is stored at [arr+8*i].
Character i of the "char" array arr is stored at [arr+i].

Accessing Memory in a Class or Struct

A pointer to a class is a pointer to the first thing inside the class. You can figure out the "offset" of anything inside the class by counting the size of the stuff before it, plus padding inserted for alignment. Or you can call "offsetof(classname,fieldname)" and let the compiler do the work.

Alignment: a pointer to an object using n bytes must be a multiple of n. E.g., a pointer to an int (4 bytes) will be a multiple of 4, so 0x1234 is OK, 0x1235 isn't!

Memory Allocation: Static, Malloc, and the Stack

Static allocation with "dd" (data DWORD, permanently reserves program memory):

mov DWORD[myInt],7 ; overwrite our int
mov eax,DWORD[myInt] ; copy the modified int into eax
ret

section .data
myInt:
	dd 2 ; "data DWORD" containing this value

(Try this in NetRun now!)

Dynamic allocation with malloc (call "free" to deallocate the space when you're done):

mov rdi, 4; malloc's first (and only) parameter: number of bytes to allocate
extern malloc
call malloc
; on return, rax points to our newly-allocated memory

mov DWORD [rax],7; write constant into memory
mov eax,DWORD [rax]; read it back from memory
ret

(Try this in NetRun now!)

Dynamic allocation on the stack (be sure to hand the space back afterwards!):

sub rsp,8 ; I claim the next eight bytes in the name of... me!

mov DWORD [rsp],1492 ; store an integer into our stack space
mov eax,DWORD [rsp] ; read integer from where we stored it

add rsp,8 ; Hand back the stack space
ret

(Try this in NetRun now!)

The stack must always be 8 byte aligned, which is why I'm grabbing 8 bytes, although I only need 4 bytes. Actually, if you call any functions that do floating point work, the stack actually needs to be 16 byte alignedonce you're inside the function!

Defining Functions in Assembly, mixing C++ and Assembly

To make an assembly function visible from outside, from your assembly say:

global myfunc
myfunc:

You may need to add underscores, or capital letters, or some other crud depending on the system.

To call this from C++, declare a function prototype:

extern "C" int myfunc(int someParameter);

Your function will find its parameters in registers (64-bit mode) or on the stack (32-bit mode). The exact registers depend on the machine's calling convention:

	64-bit Linux, Mac	64-bit Windows	32-bit
Return value	rax	rax	eax
First parameter	rdi	rcx	DWORD[rsp+4]
Second parameter	rsi	rdx	DWORD[rsp+8]
Third parameter	rdx	r8	DWORD[rsp+12]
More parameters	see docs

You can also intermix assembly language inside your C++ code using "inline assembly". The syntax is nice on Windows: __asm { mov eax,13 }, but hideous and dyslexic on Linux or Mac.