Course Review for the Final Exam
CS 301 Lecture, Dr. Lawlor
The final exam is this Friday, 3:15pm, in the usual classroom (Chapman
106). It's nominally two hours long, but in reality it shouldn't
take much longer than the midterm, an hour at most.
Everything you should know for the final exam is covered in the lecture notes and homeworks:
- Basics of assembly language:
  - bits, bytes, and bitwise operations
  - sizes of char, short, int, long, and pointers
  - preserved and scratch registers for x86-64
  - how the stack works, especially around a function call
  - how flags work, and how "cmp ecx,4" and "jle somewhere" actually work together (see the sketch after this list)
  - the existence and purpose of machine code
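To make the flags item concrete, here's a minimal C++ sketch; the function name and loop bound are invented, and the assembly in the comments is roughly what a compiler emits for x86-64, not an exact listing. The sizeof line also prints the type sizes from the list above (the usual 64-bit answers are 1, 2, 4, 8, and 8 bytes).

    #include <cstdio>

    /* The compare instruction computes ecx-4, throws away the numeric result,
       and keeps only the flags; the conditional jump then reads those flags:
           cmp ecx,4    ; sets the sign/zero/carry/overflow flags from ecx-4
           jle loop_top ; taken if the flags say "signed less than or equal"
    */
    long sum_small(void) {
        long total = 0;
        for (int i = 0; i <= 4; ++i) /* this "i <= 4" test is a cmp/jle pair */
            total += i;
        return total; /* 0+1+2+3+4 == 10 */
    }

    int main(void) {
        printf("sum_small() == %ld\n", sum_small());
        printf("sizes: char=%d short=%d int=%d long=%d pointer=%d bytes\n",
            (int)sizeof(char), (int)sizeof(short), (int)sizeof(int),
            (int)sizeof(long), (int)sizeof(void *));
        return 0;
    }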
- How classes get laid out in memory (see the layout sketch below):
  - size in bytes of a class
  - byte offsets of each member from the start of the class
  - padding for alignment
  - the existence and purpose of the vtable pointer
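A hedged sketch of those layout rules; the struct names and fields are invented, and the sizes and offsets in the comments assume the usual 64-bit x86 ABI.

    #include <cstdio>
    #include <cstddef>

    struct plain {
        char c;   /* offset 0 */
        int  i;   /* offset 4: 3 bytes of padding after c so i is 4-byte aligned */
        long l;   /* offset 8 */
    };            /* sizeof(plain) == 16 on a typical 64-bit machine */

    struct with_vtable {
        virtual void speak(void) {} /* any virtual function adds a hidden vtable pointer */
        int i;                      /* lands at offset 8, after the 8-byte vtable pointer */
    };                              /* sizeof(with_vtable) == 16, padded for 8-byte alignment */

    int main(void) {
        printf("sizeof(plain)=%d  offsetof(i)=%d  offsetof(l)=%d\n",
            (int)sizeof(struct plain),
            (int)offsetof(struct plain, i),
            (int)offsetof(struct plain, l));
        printf("sizeof(with_vtable)=%d\n", (int)sizeof(struct with_vtable));
        return 0;
    }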
- Floating-point numbers:
  - SSE instructions, like "addss", and how to use them to solve float problems
  - Parallel SSE instructions, like "addps", and how to use them to solve bigger problems (both sketched after this list)
  - Bits used to represent the sign (1 bit), exponent (8 bits), and mantissa (23 bits, plus the implicit leading 1); see the bit-extraction sketch below
  - Floating-point roundoff (demo below):
    - small + big - big == 0 ?!
    - big + small == big ?!
  - Fixes for roundoff:
    - use a bigger datatype (replace float with double)
    - switch to integer (but watch out for overflow!)
    - rearrange floating-point operations to add all the small numbers first, then the bigger ones
  - Performance impact of denormal and NaN floats
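In lecture and on the homeworks "addss" and "addps" show up as raw assembly; the sketch below spells the same two instructions as compiler intrinsics in C++ so it runs as an ordinary program. The array values are made up.

    #include <xmmintrin.h> /* SSE intrinsics: __m128 holds 4 packed floats */
    #include <cstdio>

    int main(void) {
        float a[4] = { 1.0f,  2.0f,  3.0f,  4.0f};
        float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
        float c[4];

        /* addss: add one float (the "scalar single" lane) */
        __m128 s = _mm_add_ss(_mm_load_ss(&a[0]), _mm_load_ss(&b[0]));
        _mm_store_ss(&c[0], s);
        printf("addss: %f\n", c[0]); /* 11.0 */

        /* addps: add all 4 floats at once ("packed single") */
        __m128 p = _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
        _mm_storeu_ps(c, p);
        printf("addps: %f %f %f %f\n", c[0], c[1], c[2], c[3]); /* 11 22 33 44 */
        return 0;
    }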
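Here's a sketch of pulling the sign, exponent, and mantissa bits out of a float, assuming the standard 32-bit IEEE layout; the test value -6.125 is just a convenient example.

    #include <cstdio>
    #include <cstring>
    #include <cstdint>

    int main(void) {
        float f = -6.125f; /* == -1.10001 binary * 2^2 */
        uint32_t bits;
        memcpy(&bits, &f, 4); /* copy out the raw bit pattern */

        uint32_t sign     = bits >> 31;          /* 1 bit */
        uint32_t exponent = (bits >> 23) & 0xFF; /* 8 bits, biased by 127 */
        uint32_t mantissa = bits & 0x7FFFFF;     /* 23 bits, implicit leading 1 */

        /* expect: sign=1  exponent=129 (unbiased 2)  mantissa=0x440000 */
        printf("sign=%u  exponent=%u (unbiased %d)  mantissa=0x%06X\n",
            sign, exponent, (int)exponent - 127, mantissa);
        return 0;
    }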
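And a sketch of the roundoff surprises plus two of the fixes; the magnitudes 1.0 and 1.0e8 are invented, chosen so the small value falls below a float's roughly 7 significant decimal digits.

    #include <cstdio>

    int main(void) {
        float big = 1.0e8f, small = 1.0f;

        /* small gets absorbed by big's limited mantissa: */
        printf("small + big - big = %f\n", small + big - big);      /* prints 0 */
        printf("big + small == big?  %d\n", (big + small) == big);  /* prints 1 */

        /* Fix: a bigger datatype (double) keeps the small part */
        double dbig = 1.0e8, dsmall = 1.0;
        printf("with doubles: small + big - big = %f\n", dsmall + dbig - dbig); /* 1.0 */

        /* Fix: rearrange to add the small numbers to each other first */
        float onto_big = big, smalls_first = 0.0f;
        for (int i = 0; i < 1000; ++i) onto_big += small;     /* each 1.0 is absorbed */
        for (int i = 0; i < 1000; ++i) smalls_first += small; /* adds up exactly to 1000 */
        printf("one at a time onto big: %f\n", onto_big);           /* still 100000000 */
        printf("smalls first, then big: %f\n", smalls_first + big); /* 100001000 */
        return 0;
    }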
- High performance computing:
  - Algebraic rearrangement, like replacing x=x*x*x*x*x*x*x*x; with x=x*x; x=x*x; x=x*x;
  - Strength reduction, like replacing integer x%2^n with x&(2^n-1) as in HW7.2, or floating-point a/b with a*(1.0/b) as in HW7.5 (both sketched after this list)
  - Cache optimizations, like:
    - moving your data structures closer together in memory (e.g., eliminate intermediate allocations, or malloc overhead, or just shrink the size of your classes)
    - rearranging memory accesses for better locality (e.g., interchange x and y loops; see the loop-interchange sketch below)
    - avoiding big power-of-two jumps (to be nicer to the L1 cache's memory-to-cache mapping)
  - GPU computing, like GLSL, requires a complete rewrite but gives excellent performance for floating-point-intensive code with few branches.
  - Multicore computing, like OpenMP, works with most plain old C++ code and is better for branch-, network-, or disk-intensive operations (see the OpenMP sketch below).
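A hedged sketch of the algebraic-rearrangement and strength-reduction items; the function names and values are invented, and this is not the actual HW7 code.

    #include <cstdio>

    /* Algebraic rearrangement: x^8 with three squarings instead of seven multiplies. */
    float pow8(float x) {
        x = x * x; /* x^2 */
        x = x * x; /* x^4 */
        x = x * x; /* x^8 */
        return x;
    }

    /* Strength reduction: for unsigned x and a power of two 2^n,
       x % 2^n == x & (2^n - 1), a cheap AND instead of a divide. */
    unsigned mod16(unsigned x) { return x & (16 - 1); }

    /* Strength reduction for floating point: divide once, multiply many times. */
    void scale(float *arr, int n, float b) {
        float inv_b = 1.0f / b;      /* one divide... */
        for (int i = 0; i < n; ++i)
            arr[i] = arr[i] * inv_b; /* ...then only multiplies in the loop */
    }

    int main(void) {
        printf("pow8(2) = %f\n", pow8(2.0f));  /* 256 */
        printf("37 mod 16 = %u\n", mod16(37)); /* 5 */
        float a[4] = {2.0f, 4.0f, 6.0f, 8.0f};
        scale(a, 4, 2.0f);
        printf("scaled: %f %f %f %f\n", a[0], a[1], a[2], a[3]); /* 1 2 3 4 */
        return 0;
    }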
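A sketch of the loop-interchange idea for cache locality; the 1024x1024 image array is invented. Note that the bad version's inner loop also strides by a big power of two (4KB per access), which is exactly the pattern the last cache bullet warns about.

    #include <cstdio>

    enum { N = 1024 };
    static float img[N][N]; /* stored row by row (row-major) in memory */

    int main(void) {
        /* Cache-unfriendly: the inner loop jumps N floats (4KB) per access. */
        for (int x = 0; x < N; ++x)
            for (int y = 0; y < N; ++y)
                img[y][x] += 1.0f;

        /* Interchanged: the inner loop walks consecutive floats, so nearly
           every access hits a cache line that's already loaded. */
        for (int y = 0; y < N; ++y)
            for (int x = 0; x < N; ++x)
                img[y][x] += 1.0f;

        printf("img[7][7] = %f\n", img[7][7]); /* 2.0: touched once by each version */
        return 0;
    }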
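Finally, a minimal OpenMP sketch, assuming you compile with -fopenmp (g++) or your compiler's equivalent flag; the array and the work inside the loop are placeholders.

    #include <cstdio>
    #include <omp.h>

    int main(void) {
        enum { N = 1000000 };
        static float data[N];

        /* OpenMP splits this loop's iterations across the machine's cores;
           the rest of the program stays plain old single-threaded C++. */
        #pragma omp parallel for
        for (int i = 0; i < N; ++i)
            data[i] = i * 0.5f;

        printf("ran with up to %d threads, data[10]=%f\n",
            omp_get_max_threads(), data[10]); /* data[10] == 5.0 */
        return 0;
    }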