Course Review for Final Exam
CS 441 Lecture, Dr. Lawlor
Things to know for the final exam:
- SSE/AVX/SIMD instructions: basic idea, how to convert loops (unroll + cleanup loop), how to convert branches (SIMD bitwise branch trick), general performance limits (number of SIMD units), and pitfalls (branch-heavy or divergent code).
- OpenMP/pthreads multithreaded shared memory: basic idea, how to
convert loops (divide up loop iterations), performance limits (core
count and memory bus), and pitfalls (race conditions, false sharing).
- Sockets/MPI distributed memory message passing: basic idea, how
to write an application (divide up work, pass messages), performance
limits (core count, network speed), and pitfalls (deadlock: waiting for
messages that will never arrive).
- GPU programming: basic idea, how to write an application (call a kernel, move memory around), performance limits (SIMD*cores*pipeline depth), and pitfalls (sequential code).
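As a refresher on the SIMD items above, here is a minimal sketch of both loop conversion (process four floats per iteration, then a scalar cleanup loop for the leftovers) and the bitwise branch trick (compute a mask, then blend the "then" and "else" values with AND/ANDNOT/OR). The function names clamp_scalar and clamp_simd and the clamping task itself are hypothetical, chosen only for illustration. The SSE path is assumed to be available on x86; on other machines the sketch falls back to the scalar loop.

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_*_ps
#define HAVE_SSE 1
#endif

// Scalar reference: the branchy loop we want to convert.
void clamp_scalar(float *v, int n, float limit) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (v[i] > limit) v[i] = limit;
    }
}

// SIMD version (illustrative): four floats per iteration, no branch in the body.
void clamp_simd(float *v, int n, float limit) {
    assert(n >= 0);
    int i = 0;
#ifdef HAVE_SSE
    __m128 lim = _mm_set1_ps(limit);          // limit copied into all 4 lanes
    for (; i + 4 <= n; i += 4) {
        __m128 x = _mm_loadu_ps(&v[i]);       // unaligned load of 4 floats
        __m128 mask = _mm_cmpgt_ps(x, lim);   // all-ones bits where x > limit
        // Bitwise branch trick: (mask & then_value) | (~mask & else_value)
        __m128 r = _mm_or_ps(_mm_and_ps(mask, lim),
                             _mm_andnot_ps(mask, x));
        _mm_storeu_ps(&v[i], r);
    }
#endif
    // Cleanup loop: handles the last n % 4 elements
    // (or every element when SSE is not available).
    for (; i < n; i++) {
        if (v[i] > limit) v[i] = limit;
    }
}
```

Both functions should produce identical arrays for any length, including lengths that are not a multiple of four. For this particular task _mm_min_ps would be simpler; the mask-and-blend form is shown because it generalizes to arbitrary if/else bodies.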
Cross-cutting parallel performance issues:
- Programming model: machines must be easy to program, or nobody will use the fancy parallel stuff.
- Load balance: keep all the parallel hardware busy doing useful work, or performance suffers. Amdahl's Law: the serial fraction of the work caps the achievable speedup.
- Data movement: balance accesses between network, RAM, coherent caches, local caches, and registers. Unsynchronized access to shared data causes race conditions.
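To make Amdahl's Law concrete, here is a small sketch that evaluates the usual form of the law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the runtime that can be parallelized and n is the number of workers. The function name amdahl_speedup and its parameter names are hypothetical, not from the lecture notes.

```cpp
#include <cassert>

// Amdahl's Law: speedup = 1 / ((1 - p) + p / n)
//   parallel_fraction (p): share of the runtime that parallelizes, in [0, 1]
//   workers (n): number of cores, SIMD lanes, or nodes applied to that share
double amdahl_speedup(double parallel_fraction, int workers) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(workers >= 1);
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / workers);
}
```

For example, with 90% of the work parallel, 10 workers give a speedup of about 5.26x, and no number of workers can exceed 10x, because the remaining 10% serial part still takes the same time.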
Everything you need to know is in the lecture notes!