The Message Passing Interface: MPI
CS 441 Lecture, Dr. Lawlor
First, read this basic MPI tutorial, which has a good survey of all the MPI routines. The gory details are in the MPI 1.1 standard,
which includes some examples and is fairly readable for a standard (that's not saying much: most standards are
hideously unreadable).
Pay particular attention to the "big eight" MPI
functions:
- MPI_Init and MPI_Finalize:
these set up and tear down MPI. MPI_Init sets up your environment: lots of stuff,
from command-line arguments to file I/O, doesn't work right until you call MPI_Init,
so call it before you do anything else. Also, you absolutely MUST call MPI_Finalize
on every possible exit path, or many MPI implementations won't kill the other
processes correctly, leaving zombies to prey on living processes.
Because of these dangers, every MPI main program should start with
*exactly* this code:
#include <mpi.h>
#include <stdlib.h> /* for atexit */
void my_exit_fn(void) {MPI_Finalize();}
int main(int argc,char *argv[])
{
  MPI_Init(&argc,&argv);
  atexit(my_exit_fn);
  ... now start actual work ...
}
- MPI_Comm_size and MPI_Comm_rank:
MPI_Comm_size returns the number of processes (the "size") and MPI_Comm_rank returns
your own process number (your "rank", from 0 to size-1) in a "communicator", which is
almost always just the whole machine, called MPI_COMM_WORLD. Here's how you get your
rank and size:
int rank,size;
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
- MPI_Send and MPI_Recv:
these "point to point" functions just send bytes from one place to another.
They're the meat and potatoes of MPI. The arguments give a contiguous list of data
in memory, of a fixed length, of a given data type, being sent to a given destination
rank, with any integer "tag" you like (a tag of zero works fine), on a communicator
(almost always MPI_COMM_WORLD). MPI_Recv takes one extra argument, an "MPI_Status"
pointer, where MPI can stash the actual received message length and source process.
I remember the first six arguments of the send/receive calls,
MPI_Send(void *buffer, int count, MPI_Datatype datatype, int processor_dest, int tag, MPI_Comm comm),
using the mnemonic "Bob Can't Do Peanuts with That Crap" (no offence to Bob).
- MPI_Bcast and MPI_Reduce:
these "collective" functions broadcast data from one processor out to every processor,
or reduce data from all processors down to one "master" processor; see the short
sketch right after this list.
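For example, here's a minimal self-contained sketch of both collectives: rank 0 broadcasts one
integer out to everybody, then every rank's number gets summed back onto rank 0 with MPI_Reduce.
(The value 100 and the MPI_SUM operation are just illustrative choices, not anything from the
lecture code.)
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h> /* for atexit */
void call_finalize(void) {MPI_Finalize();}
int main(int argc,char *argv[]) {
  MPI_Init(&argc,&argv);
  atexit(call_finalize);
  int rank=0,size=1;
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);

  int val=0;
  if (rank==0) val=100; /* only rank 0 knows the value to start with */
  /* Broadcast: after this call, every rank's val is 100 */
  MPI_Bcast(&val,1,MPI_INT, 0,MPI_COMM_WORLD);

  /* Reduce: sum every rank's contribution onto rank 0 */
  int mine=rank, total=0;
  MPI_Reduce(&mine,&total,1,MPI_INT, MPI_SUM, 0,MPI_COMM_WORLD);
  if (rank==0) printf("Broadcast %d to everybody; sum of all ranks is %d\n",val,total);
  return 0;
}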
Those are really the only functions you learn in MPI 1.1; the rest are just small
variations on those themes.
For example, here's an idiomatic MPI program where the first process sends one integer to the last process:
#include <mpi.h> /* for MPI_ functions */
#include <stdio.h> /* for printf */
#include <stdlib.h> /* for atexit */
void call_finalize(void) {MPI_Finalize();}
int main(int argc,char *argv[]) {
  MPI_Init(&argc,&argv);
  atexit(call_finalize); /*<- important to avoid weird errors! */
  int rank=0,size=1;
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);
  int tag=17; /*<- random integer ID for this message exchange */
  if (rank==0) {
    int val=1234;
    MPI_Send(&val,1,MPI_INT, size-1,tag,MPI_COMM_WORLD);
    printf("Rank %d sent value %d\n",rank,val);
  }
  if (rank==size-1) {
    MPI_Status sts;
    int val=0;
    MPI_Recv(&val,1,MPI_INT, MPI_ANY_SOURCE,tag,MPI_COMM_WORLD,&sts);
    printf("Rank %d received value %d\n",rank,val);
  }
  return 0;
}
(Try this in NetRun now!)
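The MPI_Status argument in the receive above goes unused, but it's how the receiver finds out
where a message actually came from and how much data actually arrived: the status fields
MPI_SOURCE and MPI_TAG give the sender and tag, and MPI_Get_count gives the received length.
Here's a small self-contained sketch of that (the tag 23 and the data being sent are arbitrary
choices for illustration):
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h> /* for atexit */
void call_finalize(void) {MPI_Finalize();}
int main(int argc,char *argv[]) {
  MPI_Init(&argc,&argv);
  atexit(call_finalize);
  int rank=0,size=1;
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);

  if (rank!=0) { /* every other rank sends its rank number to rank 0 */
    MPI_Send(&rank,1,MPI_INT, 0,23,MPI_COMM_WORLD);
  } else { /* rank 0 receives from anybody, then asks the status who it was */
    for (int i=1;i<size;i++) {
      int val=0, count=0;
      MPI_Status sts;
      MPI_Recv(&val,1,MPI_INT, MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,&sts);
      MPI_Get_count(&sts,MPI_INT,&count); /* how many MPI_INTs arrived */
      printf("Got %d (%d ints) from rank %d, tag %d\n",
        val,count,sts.MPI_SOURCE,sts.MPI_TAG);
    }
  }
  return 0;
}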
Here's a more complex program that renders parts of the Mandelbrot set on each MPI process and assembles the pieces on rank 0:
/*
Mandelbrot renderer in MPI
Dr. Orion Lawlor, 2010-11-30 (Public Domain)
*/
#include <mpi.h>
#include <iostream>
#include <fstream>
#include <complex>
#include <cstring> /* for memcpy */
/**
 A linear function in 2 dimensions: returns a double as a function of (x,y).
*/
class linear2d_function {
public:
  double a,b,c;
  void set(double a_,double b_,double c_) {a=a_;b=b_;c=c_;}
  linear2d_function(double a_,double b_,double c_) {set(a_,b_,c_);}
  double evaluate(double x,double y) const {return x*a+y*b+c;}
};

const int wid=1000, ht=1000;

// Set up coordinate system to render the Mandelbrot Set:
double scale=3.0/wid;
linear2d_function fx(scale,0.0,-1.0); // returns c given pixels
linear2d_function fy(0.0,scale,-1.0);

char render_mset(int x,int y) {
  /* Walk this Mandelbrot Set pixel */
  typedef std::complex<double> COMPLEX;
  COMPLEX c(fx.evaluate(x,y),fy.evaluate(x,y));
  COMPLEX z(0.0);
  int count;
  enum {max_count=256};
  for (count=0;count<max_count;count++) {
    z=z*z+c;
    if ((z.real()*z.real()+z.imag()*z.imag())>4.0) break;
  }
  return count; /* note: count==max_count (point is in the set) wraps to 0 in a char */
}

class row {
public:
  char data[wid];
};
int main(int argc,char *argv[]) {
  /* MPI's args, MPI's random working directory */
  MPI_Init(&argc,&argv);
  /* Your command line args, your working directory */
  int size,rank;
  MPI_Comm_size(MPI_COMM_WORLD,&size);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  std::cout<<"I am "<<rank<<" of "<<size<<"\n";

  int procpiece=ht/size; /* rows per process (assumes ht is a multiple of size) */
  int gystart=rank*procpiece; /* global y where our piece starts */
  row limg[procpiece]; /* local piece of the final image */

  double start=MPI_Wtime();
  /* Render our piece of the image */
  for (int y=0;y<procpiece;y++)
  {
    for (int x=0;x<wid;x++) limg[y].data[x]=render_mset(x,gystart+y);
  }
  double elapsed_compute=MPI_Wtime()-start;

  int tag=12378;
  if (rank>0)
  { /* send our partial piece to rank 0 */
    //skt_sendN(s[0],&limg[0].data[0],sizeof(row)*procpiece);
    MPI_Send(&limg[0].data[0],sizeof(row)*procpiece,MPI_CHAR,
      0,tag,MPI_COMM_WORLD);
  }
  else
  { /* rank 0: receive partial pieces from the other ranks */
    row gimg[ht];
    for (int r=0;r<size;r++)
      if (r==0) {
        memcpy(gimg,limg,sizeof(row)*procpiece);
      } else {
        //skt_recvN(s[r],&gimg[r*procpiece].data[0],
        //  sizeof(row)*procpiece);
        MPI_Status status;
        MPI_Recv(&gimg[r*procpiece].data[0],sizeof(row)*procpiece,MPI_CHAR,
          r,tag,MPI_COMM_WORLD,&status);
      }

    /* Write out the assembled image */
    std::ofstream of("out.ppm",std::ios_base::binary);
    of<<"P5\n"; // greyscale, binary
    of<<wid<<" "<<ht<<"\n"; // image size
    of<<"255\n"; // byte image
    of.write(&gimg[0].data[0],sizeof(row)*ht);
  }
  double elapsed_send=MPI_Wtime()-start;
  std::cout<<"Rank "<<rank<<": "<<1000.0*elapsed_compute<<"ms compute, "<<
    1000.0*elapsed_send<<"ms total\n";

  MPI_Finalize();
  return 0;
}
(Try this in NetRun now!)
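The send-and-reassemble loop at the end of that program is exactly the pattern the collective
MPI_Gather automates: every rank hands in its piece, and rank 0 receives all the pieces back to
back in rank order. In the Mandelbrot code above, the whole if(rank>0)-send / else-receive
section could collapse to a single MPI_Gather of sizeof(row)*procpiece MPI_CHARs per rank.
Here's a minimal self-contained sketch of MPI_Gather on its own (gathering one made-up int
per rank instead of image rows):
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h> /* for atexit and malloc */
void call_finalize(void) {MPI_Finalize();}
int main(int argc,char *argv[]) {
  MPI_Init(&argc,&argv);
  atexit(call_finalize);
  int rank=0,size=1;
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  MPI_Comm_size(MPI_COMM_WORLD,&size);

  int mine=rank*rank; /* each rank's contribution (just made up) */
  int *all=NULL;
  if (rank==0) all=(int *)malloc(size*sizeof(int)); /* only the root needs the receive buffer */

  /* Gather one int from every rank into rank 0's array, in rank order */
  MPI_Gather(&mine,1,MPI_INT, all,1,MPI_INT, 0,MPI_COMM_WORLD);

  if (rank==0) {
    for (int r=0;r<size;r++) printf("all[%d]=%d\n",r,all[r]);
    free(all);
  }
  return 0;
}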
As of Tuesday night, you can now run MPI code in NetRun!