Review for Midterm: Crosscutting Issues

CS 441/641 Lecture, Dr. Lawlor

Many of the same issues come up again and again at different levels of the system.
Here's how these issues are solved at various levels of the machine.

The four recurring issues are data dependencies, branching, old code, and load imbalance. Each level of the machine below answers them in that order.

Pipelining
    Data dependencies: Solve with operand forwarding (register file bypass).
    Branching: Predict or flush.
    Old code: May have poor branch behavior.
    Load imbalance: Breaking up the pipeline stages so they take approximately equal time.
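
To make "predict or flush" concrete, here is a minimal sketch of one common prediction scheme, a 2-bit saturating counter. This is an illustration under assumptions, not necessarily the predictor discussed in lecture: the names TwoBitPredictor and count_mispredicts are hypothetical, and each counted miss stands in for one pipeline flush.

```cpp
#include <vector>

// Hypothetical 2-bit saturating counter predictor (one common textbook scheme).
// States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
struct TwoBitPredictor {
    int state = 2;  // assumption: start in the "weakly taken" state

    bool predict() const { return state >= 2; }

    // Move one step toward the actual outcome, saturating at 0 and 3.
    void update(bool taken) {
        if (taken) {
            if (state < 3) ++state;
        } else {
            if (state > 0) --state;
        }
    }
};

// Count mispredictions over a sequence of branch outcomes.
// In a real pipeline, each misprediction would cost a flush.
int count_mispredicts(const std::vector<bool>& outcomes) {
    TwoBitPredictor p;
    int misses = 0;
    for (bool taken : outcomes) {
        if (p.predict() != taken) ++misses;
        p.update(taken);
    }
    return misses;
}
```

A loop branch that is taken nine times and then falls through costs this predictor a single miss, while a branch that alternates taken/not-taken is mispredicted about half the time. That is the kind of "poor branch behavior" old code may show.
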
Superscalar
    Data dependencies: Solve WAW/WAR with register renaming; tolerate RAW with out-of-order execution.
    Branching: Predict or die!
    Old code: May not have enough instruction-level parallelism.
    Load imbalance: Keeping all execution units busy (e.g., big instruction window).
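
As a rough illustration of "not enough instruction-level parallelism", the sketch below sums an array two ways. The function names are hypothetical. With one accumulator, every add has a RAW dependency on the previous add; with four accumulators there are four independent chains that a superscalar core could overlap. No speedup is claimed here, since the compiler may already transform either loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One accumulator: each add reads the result of the previous add (RAW),
// so the adds form a single serial dependency chain.
std::uint64_t sum_one_chain(const std::vector<std::uint64_t>& v) {
    std::uint64_t s = 0;
    for (std::size_t i = 0; i < v.size(); ++i) s += v[i];
    return s;
}

// Four accumulators: four independent dependency chains, which gives
// the hardware more instruction-level parallelism to work with.
std::uint64_t sum_four_chains(const std::vector<std::uint64_t>& v) {
    std::uint64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i) s0 += v[i];  // leftover elements
    return s0 + s1 + s2 + s3;
}
```

Integers are used so both versions return exactly the same answer; with floating point, reordering the adds can change the rounding.
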
SIMD
    Data dependencies: Between instructions: see above. Within a register: need data shuffling.
    Branching: Take both branches, mux into one answer. Branch locality is important.
    Old code: Needs to be rewritten to use weird SIMD datatypes and branches. See Intel's ArBB (or NVIDIA CUDA) for automatic translation.
    Load imbalance: Not a problem, since data-parallel work is automatically balanced.
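
"Take both branches, mux into one answer" can be sketched in portable C++ by modeling a register as a small array of lanes. Vec, LANES, branchy, and muxed are hypothetical names for this illustration. Real SIMD code would use a vector compare plus a blend or bitwise select; the loop over lanes here only stands in for one vector instruction.

```cpp
#include <array>
#include <cstdint>

constexpr int LANES = 4;  // assumption: a hypothetical 4-lane register
using Vec = std::array<std::int32_t, LANES>;

// Ordinary scalar code: each lane branches on its own.
Vec branchy(const Vec& x) {
    Vec out{};
    for (int i = 0; i < LANES; ++i) {
        if (x[i] < 0) out[i] = -x[i];
        else          out[i] = x[i] * 2;
    }
    return out;
}

// SIMD-style code: compute BOTH sides for every lane, then mux with a mask.
Vec muxed(const Vec& x) {
    Vec out{};
    for (int i = 0; i < LANES; ++i) {
        std::int32_t then_val = -x[i];     // "if" side, always computed
        std::int32_t else_val = x[i] * 2;  // "else" side, always computed
        // Models a SIMD compare: all ones where the condition holds, else zero.
        std::uint32_t mask = (x[i] < 0) ? 0xFFFFFFFFu : 0u;
        out[i] = static_cast<std::int32_t>(
            (static_cast<std::uint32_t>(then_val) & mask) |
            (static_cast<std::uint32_t>(else_val) & ~mask));
    }
    return out;
}
```

Both sides are always paid for, which is why branch locality matters: if every lane goes the same way, an implementation can skip the unused side.
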
Multicore
    Data dependencies: Solve WAW/WAR with privatization; tolerate RAW with locks or atomics.
    Branching: Not a problem, since cores can branch independently.
    Old code: May have shared-data problems between threads, making it difficult to run correctly on multicore.
    Load imbalance: Keeping all the threads busy (e.g., dynamic scheduling).
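
The multicore entries can be combined in one small sketch, assuming std::thread and std::atomic from the C++ standard library. The function name parallel_sum and its parameters are hypothetical. Each thread keeps a private running sum (privatization), publishes it with an atomic add, and grabs its next chunk of work from a shared atomic counter (a simple form of dynamic scheduling).

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

std::uint64_t parallel_sum(const std::vector<std::uint64_t>& data,
                           unsigned num_threads, std::size_t chunk) {
    if (num_threads == 0) num_threads = 1;  // guard against bad arguments
    if (chunk == 0) chunk = 1;

    std::atomic<std::size_t> next{0};     // shared work counter
    std::atomic<std::uint64_t> total{0};  // shared result

    auto worker = [&]() {
        std::uint64_t local = 0;  // privatized: no other thread touches it
        for (;;) {
            // Dynamic scheduling: claim the next unprocessed chunk.
            std::size_t begin = next.fetch_add(chunk);
            if (begin >= data.size()) break;
            std::size_t end = std::min(begin + chunk, data.size());
            for (std::size_t i = begin; i < end; ++i) local += data[i];
        }
        total.fetch_add(local);  // one atomic update per thread
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
    return total.load();
}
```

Adding directly into a plain shared variable from every thread would be a data race; the private sum keeps the shared update down to one atomic operation per thread. Smaller chunks balance load better but hit the shared counter more often.
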

Other things to know:
641 students should be familiar with the broad outlines of the topics in each of the research papers I've been sending out.