Course Review for the Midterm Exam

CS 301 Lecture, Dr. Lawlor

Briefly, you should know both the how and the why of everything we did in class or on the homeworks.  Yes, this covers an enormous amount of stuff!

The test will be in class and on paper, probably 2-3 pages and about ten hard questions.  Example questions:

Writing Numbers in Binary, Decimal, Hexadecimal

Place/bit Numberi
...
4
3
2
1
0
Decimal: Base-10
10i
...
10000
1000
100
10
1
Binary: Base-2
2i ...
16 = 24
8 = 23
4 = 22
2
1
Hex: Base-16
16i ...
65536 = 164
4096 = 163 256 = 162
16
1
Base-n
ni ...
n4
n3 n2 n
1 = n0

Bitwise Operations

Name
Purpose
C++
Assembly
AND
Turn bits off, extract bits.
x=x&0xf0;
and rax,0xf0  (Try this in NetRun now!)
OR
Turn bits on, combine fields.
x=x|0xf0;
or rax,0xf0   (Try this in NetRun now!)
NOT
Invert all bits.
x=~x;
not rax     (Try this in NetRun now!)
XOR
Invert selected bits, encryption.
x=x^0xf0;
xor rax,0xf0    (Try this in NetRun now!)
left shift
Multiply by 2n.   Reassemble bits.
x=x<<2;
shl rax,2    (Try this in NetRun now!)
right shift
(unsigned)
Divide unsigned by 2n.
Brings in zero bits.
unsigned x;
x=x>>2;
shr rax,2   (Try this in NetRun now!)
right shift
(signed)
Divide signed by 2n.
Brings in copies of the sign bit.
int x;
x=x>>2;
sar rax,2   (Try this in NetRun now!)

You can combine these in weird and crazy ways, for example with the (mask&trueval)|((~mask)&falseval) trick.

Sizes of Stuff

Name C++ Asm Register Asm Memory Asm Data Bits Bytes
(8 bits)
Hex Digits
(4 bits)
Unsigned Range Signed Range
Bit n/a n/a n/a n/a 1 less than 1 less than 1 0..1 -1..0
Byte char al BYTE db 8 1 2 255 -128 .. 127
short short ax WORD dw 16 2 4 65535 -32768 .. +32767
32-bit int int eax DWORD dd 32 4 8 >4 billion -2G .. +2G
64-bit int "long" (or "long long") rax QWORD dq 64 8 16 >16 quadrillion -8Q .. +8Q

You can determine the usable range of a value by experimentally measuring the overflow point, for example with code like this:
int value=1; /* value to test, starts at first (lowest) bit */
int bit=0;
while (value!=0) {
value=value+value; /* moves over by one bit (value=value<<1 would work too) */
bit++;
}
return bit;

(Try this in NetRun now!)

Signed versus Unsigned Integers

two's complement signed integers.  The sign bit.  To flip the sign, flip all the bits and then add one.

Concept of Machine Code

Or, why this code works, and returns 73:

const char commands[]={
0xb0,73, /* load a value to return */
0xc3 /* return from the current function */
};
int foo(void) {
typedef int (*fnptr)(void); // pointer to a function returning an int
fnptr f=(fnptr)commands; // typecast the command array to a function
return f(); // call the new function!
}

(Try this in NetRun now!)

The Registers

Here's the full list of x86 registers.  The new 64 bit registers are shown in red.

Notes
64-bit
32-bit
16-bit
8-bit
Values are returned from functions in this register. 
Multiply instructions put the low bits of the result here too.
rax
eax
ax
ah and al
Typical scratch register.  Some instructions use it as a counter (such as bit shifts).
rcx
ecx
cx
ch and cl
Scratch register.  Multiply instructions put the high bits of the result here,
and divide demands that this contain zero.
rdx
edx
dx
dh and dl
Preserved register: don't use it without saving it!
rbx
ebx
bx
bh and bl
The stack pointer.  Points to the top of the stack
rsp
esp
sp
spl
Preserved register.  Sometimes used to store the old value of the stack pointer, or the "base".
rbp
ebp
bp
bpl
Scratch register.  Also used to pass function argument #2 (on 64-bit Linux).
rsi
esi
si
sil
Scratch register.  Function argument #1 (on 64-bit Linux). rdi
edi
di
dil
Scratch register.  These were added in 64-bit mode, so the names are slightly different.
r8
r8d
r8w
r8b
Scratch register.
r9
r9d
r9w
r9b
Scratch register.
r10
r10d
r10w
r10b
Scratch register. r11
r11d
r11w
r11b
Preserved register.
r12
r12d
r12w
r12b
Preserved register. r13 r13d r13w r13b
Preserved register. r14 r14d r14w r14b
Preserved register. r15 r15d r15w r15b

Conditional Jumps

These only work after a "cmp" instruction (or another instruction that sets the flags properly).

Instruction
Useful to...
jmp
Always jump
ja Unsigned >   (signed "x>n || x<0")
jae
Unsigned >= (signed "x>=n || x<0")
jb
Unsigned <   (signed "x<n && x>=0")
jbe
Unsigned <=  (signed "x<=n && x>=0")
jc
Unsigned overflow checking
jecxz
Compare ecx with 0   (what?!)
je or jz
Equality
jg
Signed >
jge
Signed >=
jl
Signed <
jle
Signed <=
jne or jnz
Inequality
jo
Signed overflow checking
jp or jpe
Parity check (even)
jpo
Parity check (odd)
js
Jump if negative
There are also "n" NOT versions for each jump; for example "jno" jumps if there is NOT overflow.

Often you can avoid a branch entirely using funky arithmetic, like "blended = (1-doit)*A+doit*B;".

Every C flow-control construct can be written using just "if" and "goto", which usually map one-to-one to a compare-and-jump sequence in assembly.

Normal C
Expanded C
if (A) {
  ...
}
if (!A) goto END;
{
  ...
}
END:
if (!A) {
  ...
}
if (A) goto END;
{
  ...
}
END:
if (A&&B) {
  ...
}
if (!A) goto END;
if (!B) goto END;
{
  ...
}
END:
if (A||B) {
  ...
}
if (A) goto STUFF;
if (B) goto STUFF;
goto END;
STUFF:
{
  ...
}
END:
while (A)  {
  ...
}
goto TEST;
START:
{
  ...
}
TEST: if (A) goto START;
do {
  ...
} while (A)
START:
{
  ...
}
if (A) goto START;
for (i=0;i<n;i++)
{
  ...
}
i=0;         /* Version A */
goto TEST;
START:
{
  ...
}
i++;
TEST: if (i<n) goto START;
for (i=0;i<n;i++)
{
  ...
}

i=0;          /* Version B */
START: if (i>=n) goto END;
{
  ...
}
i++;
goto START;
END:

Call, Return, and The Stack

Note that if you leave junk on the stack, "ret" will happily try to return there, usually resulting in a crash.
If your bad C++ "char buf[100];" gets a big string slammed into it, it will eventually overwrite the return address, possibly handing your machine to evil hackers.

Typical use of push and pop to save a preserved register:
push rbp
... happily trash rbp in my function ...
pop rbp ; restore his version
ret ; <- he'll have no idea that I trashed his register!

Or, to preserve a scratch register across a function call:
push rcx
call ....; he'll trash rcx
pop rcx; bring my version back!new

You can save as many registers as you want, but you need to pop them in the opposite order--otherwise you've flipped their values around!

Accessing Memory in Arrays

Array element i of the "int" array arr is stored at DWORD [arr+4*i].
Array element i of the "long" array arr is stored at QWORD [arr+8*i].
Character i of the "char" array arr is stored at BYTE [arr+i].

Accessing Memory in a Class or Struct

A pointer to a class is a pointer to the first thing inside the class.  You can figure out the "offset" of anything inside the class by counting the size of the stuff before it, plus padding inserted for alignment.  Or you can call "offsetof(classname,fieldname)" and let the compiler do the work.

Alignment: a pointer to an object using n bytes must be a multiple of n.  E.g., a pointer to an int (4 bytes) will be a multiple of 4, so 0x1234 is OK, 0x1235 isn't!

Memory Allocation: Static, Malloc, and the Stack

Static allocation with "dd" (data DWORD, permanently reserves program memory):
mov DWORD[myInt],7 ; overwrite our int
mov eax,DWORD[myInt] ; copy the modified int into eax
ret

section .data ; <- makes the following stored in writeable memory
myInt:
dd 2 ; "data DWORD" containing this value

(Try this in NetRun now!)

Dynamic allocation with malloc (call "free" to deallocate the space when you're done):

mov rdi, 4; malloc's first (and only) parameter: number of bytes to allocate
extern malloc
call malloc
; on return, rax points to our newly-allocated memory

mov DWORD [rax],7; write constant into memory
mov eax,DWORD [rax]; read it back from memory
ret

(Try this in NetRun now!)

Dynamic allocation on the stack (be sure to hand the space back afterwards!):

sub rsp,8 ; I claim the next eight bytes in the name of... me!

mov DWORD [rsp],1492 ; store an integer into our stack space
mov eax,DWORD [rsp] ; read integer from where we stored it

add rsp,8 ; Hand back the stack space
ret

(Try this in NetRun now!)

The stack must always be 8 byte aligned, which is why I'm grabbing 8 bytes, although I only need 4 bytes.  Actually, if you call any functions that do floating point work, the stack actually needs to be 16 byte aligned once you're inside the function!

Defining Functions in Assembly, mixing C++ and Assembly

To make an assembly function visible from outside, from your assembly say:

    global myfunc
    myfunc:

You may need to add underscores, or capital letters, or some other crud depending on the system.

To call this from C++, declare a function prototype:

   extern "C" int myfunc(int someParameter);

Your function will find its parameters in registers (64-bit mode) or on the stack (32-bit mode).  The exact registers depend on the machine's calling convention:


64-bit Linux, Mac 64-bit Windows 32-bit
Return value
rax
rax
eax
First parameter rdi rcx DWORD[esp+4]
Second parameter rsi rdx DWORD[esp+8]
Third parameter rdx r8 DWORD[esp+12]
More parameters see docs

You can also intermix assembly language inside your C++ code using "inline assembly".  The syntax is nice on Windows: __asm { mov eax,13 }, but hideous and dyslexic on Linux or Mac.