T-I-D-A |
Description |
|
Sequential Programming Model |
||
1-1-1-1: Sequential |
A sequential CPU reads a single instruction, pulls its data, and executes it. This is the model programmers have been using since 1948. | |
1-p-p-1: Pipelined |
Here we've split the CPU's execution path into p pipeline stages. You need at least p instructions, and at least p independent data items. Only one arithmetic unit actually completes a result each clock cycle, though. The point of pipelining a CPU is to do the fetching, decoding, and operand reads in parallel with actual execution. Typical pipelines are 3 to 30 stages long, with a current typical value of about 10 stages. | |
1-k-k-k: Superscalar |
A superscalar CPU simultaniously
fetches a set of k instructions, reads all their data, and executes
them all at once. This only
works if all k instructions and data are utterly independent, which is
rare, so k is typically 2 to 4 for real CPUs. In a search for
more independent instructions, superscalar machines typically need register renaming, out-of-order execution, and sophisticated branch prediction. |
|
Multithreaded Programming Model |
||
s-1-s-s: SMP (Multicore) |
This is just s replicated copies of a single CPU. Unlike superscalar, multicore requires the programmer to specify s independent execution threads, but the benefit is the CPU doesn't need to do dependency analysis, so s can reach into dozens or even hundreds. |
|
h-1-h-1: SMT (Hyperthreading) |
Here we have h
replicated registers and decoders, but they all share a single set of
arithmetic units. This has the same programming model as
multicore, but the hardware is cheaper. |
|
SIMD Programming Model |
||
1-1-m-m: SIMD |
Single Instruction Multiple Data:
the programmer issues a single instruction, like "addps", and it runs
several data items through a set of arithmetic units. The
advantage is fewer instructions, which means less work fetching and
decoding. |
|
1-1-v-1: Vector |
Vector machines,
such as the Cray Y-MP, could add vector registers with a single
instruction. But unlike full SIMD, the hardware only had one
arithmetic circuit, so the vector's values had to go in one at a
time. Again, this is the same programming model as SIMD, but the
hardware is cheaper. |
Single threaded |
||||||||||
One Instruction at a time |
|
|||||||||
Many Instructions simultaniously |
|