Parallel Computing and MPI
CS 321 Lecture,
Dr. Lawlor, 2006/03/10
This material is "bonus" material, in the sense that I won't ask
about it on homeworks or tests. If this sort of thing interests
you, talk to me and I'll see about offering a class on parallel
computing--possibly CS 421 (Distributed Operating Systems) or a CS 493
(Special Topics).
Motivation
Every year, chip designers are able to pack more and more transistors
onto each piece of silicon. But it's getting tougher and tougher
to figure out where to add transistors to make a single fast processor
faster. You could add more cache, or more arithmetic logic, or
more prediction/speculation/control logic, but single-processor
machines are mostly limited today by the dependencies between
instructions.
An obvious way to add value to a piece of silicon, then, is to put more
processors on it. This was first tried back in the late 1990's
with "Simultaneous Multithreading" (SMT), which Intel sells under the HyperThreading brand
name, where two hardware threads share one processor's arithmetic units. The
trouble with HyperThreading is that sharing creates a more complicated
chip, and introduces pipelining problems that don't exist for
independent processors.
Intel's latest chip is the Core series. The name comes from the
term "multicore", where two or more totally separate processors sit on
one piece of silicon. Apple is using multi-core chips, including the Core Duo, in all their latest designs.
Intel has publicly discussed moving to multicore systems since 1989.
History of Parallel Computing
Multiple processors running simultaneously means you're doing parallel computing.
The first major parallel machine was the ILLIAC-IV, designed at the University of Illinois.
There's a "Top 500" list updated every 6 months of the fastest computers in the world.
Machines are measured in "flops": floating-point operations per
second. A typical desktop nowadays might get around 8 gigaflops
with SSE instructions: 0.5ns per SSE add means 2 billion SSE adds per
second, and since each SSE add operates on 4 floats, that's 8 gigaflops.
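To make that concrete, here's a minimal sketch (not from the lecture) of a single SSE add written with the compiler intrinsics in &lt;xmmintrin.h&gt;; the one _mm_add_ps call adds 4 floats at once, which is where the factor of 4 above comes from.

#include &lt;stdio.h&gt;
#include &lt;xmmintrin.h&gt; /* SSE intrinsics: __m128, _mm_add_ps, ... */

int main(void) {
    float a[4] = { 1.0f,  2.0f,  3.0f,  4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];
    __m128 va = _mm_loadu_ps(a);    /* load 4 floats from a */
    __m128 vb = _mm_loadu_ps(b);    /* load 4 floats from b */
    __m128 vc = _mm_add_ps(va, vb); /* one SSE add == 4 float adds */
    _mm_storeu_ps(c, vc);           /* store the 4 results */
    printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]);
    return 0;
}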
In 2002, the fastest machine in the world was the Japanese Earth Simulator,
an NEC SX-6 style vector processor capable of 35 teraflops (TF).
The Earth Simulator cost around $400 million, and had "just" 5,120
processors.
In 2005, IBM's Blue Gene/L
eclipsed the Earth Simulator. Blue Gene uses custom (low-end!)
IBM PowerPC 440 microprocessors on a special very high-density
package--it's got 131,072 processors, and is capable of 280 teraflops.
IBM is also putting a Power processor into the Cell chips for the PlayStation 3.
MPI: the Message Passing Interface
MPI is the standard interface for talking over the network on a parallel machine. Every
big parallel machine has MPI installed, ready to go, and running every
day. Back in the late 1990's people were still using other
interfaces like PVM, and OpenMP remains fairly popular for shared-memory Fortran (and C) programs, but the interface used on the biggest machines is invariably MPI.
The official source of MPI documentation is the standard. You can also find tons of MPI tutorials on the web. Here's a typical MPI tutorial.
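To give a flavor of the interface, here's a minimal sketch of an MPI program (my own example, not taken from the standard or any particular tutorial): every process reports its rank, and then process 1 sends one integer to process 0 using MPI_Send and MPI_Recv.

#include &lt;stdio.h&gt;
#include &lt;mpi.h&gt;

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);               /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* which processor am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* how many processors total? */
    printf("Hello from processor %d of %d\n", rank, size);

    if (size > 1) {
        int value = 0;
        if (rank == 1) {
            value = 42;
            /* send one int, tagged 0, to processor 0 */
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0) {
            /* receive one int, tagged 0, from processor 1 */
            MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Processor 0 received %d from processor 1\n", value);
        }
    }
    MPI_Finalize();                       /* shut down MPI */
    return 0;
}

You'd typically compile this with mpicc and launch it with something like "mpirun -np 4 ./a.out", which starts 4 copies of the program that can all send messages to each other.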