| | Data Dependencies | Branching | Old Code | Load Imbalance |
|---|---|---|---|---|
| Pipelining | Solve with operand forwarding (register file bypass) | Predict or flush | May have poor branch behavior | Breaking up the pipeline stages so they take approximately equal time. |
| Superscalar | Solve WAW/WAR with register renaming; Tolerate RAW with out-of-order execution. | Predict or die! | May not have enough instruction-level parallelism | Keeping all execution units busy (e.g., big instruction window) |
| SIMD | Between instructions: see above. Within a register: need data shuffling. | Take both branches, mux into one answer. Branch locality is important. | Needs to be rewritten to use weird SIMD datatypes and branches. See Intel's ArBB (or NVIDIA CUDA) for automatic translation. | Not a problem, since data-parallel is automatically balanced |
| Multicore | Solve WAW/WAR with privatization; Tolerate RAW with locks or atomics. | Not a problem, since cores can branch independently. | May have multithread shared data problems, making it difficult to run correctly in multicore. | Keeping all the threads busy (e.g., dynamic scheduling) |
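
The SIMD branching entry ("take both branches, mux into one answer") can be sketched in portable C++ without any real SIMD intrinsics. This is only an illustration under stated assumptions: the 4-lane `Lanes` type, the constant `kLanes`, and both function names are hypothetical, and the per-lane loop stands in for what a vector unit would do to all lanes at once. The example branch is "if negative, negate; otherwise double".

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 4-lane "SIMD register"; real hardware widths vary.
constexpr std::size_t kLanes = 4;
using Lanes = std::array<std::int32_t, kLanes>;

// Scalar reference: each element follows exactly one side of the branch.
Lanes abs_or_double_scalar(const Lanes& x) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (x[i] < 0) {
            out[i] = -x[i];      // "then" side
        } else {
            out[i] = 2 * x[i];   // "else" side
        }
    }
    return out;
}

// SIMD-style: every lane computes BOTH sides, then a per-lane mask
// selects (muxes) the answer. There is no data-dependent branch.
// Assumes inputs are small enough that -x and 2*x do not overflow.
Lanes abs_or_double_mux(const Lanes& x) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        const std::int32_t then_val = -x[i];
        const std::int32_t else_val = 2 * x[i];
        // The comparison yields 1 or 0; negating gives all-ones or all-zeros.
        const std::int32_t mask = -static_cast<std::int32_t>(x[i] < 0);
        out[i] = (then_val & mask) | (else_val & ~mask);
    }
    return out;
}
```

Both sides are always evaluated, so the cost is the sum of the two paths. That is why branch locality matters: if all lanes in a register agree on the condition, real hardware or a compiler can skip the unused side entirely.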