Practical Problems in Network Communication, and PUP
CS 441/641 Lecture, Dr. Lawlor
Consider two processes exchanging data over a network socket. To
make it easy to run, I'll start both processes myself, using fork.
#include <sys/wait.h> /* for wait() */
#include "osl/socket.h"
#include "osl/socket.cpp"
/* Run child process's code. Socket connects to parent. */
void run_child(SOCKET s) {
cout<<"Child alive! Sending data to parent."<<std::endl;
std::string str="Cybertron";
skt_sendN(s,&str,sizeof(str));
}
/* Run parent process's code. Socket connects to child */
void run_parent(SOCKET s) {
cout<<"Parent alive! Getting data from child"<<std::endl;
std::string str="";
skt_recvN(s,&str,sizeof(str));
cout<<"Parent done. got="<<str<<std::endl;
}
int foo(void) {
unsigned int port=12345; cout.flush();
int newpid=fork();
if (newpid!=0) { /* I'm the parent */
SERVER_SOCKET serv=skt_server(&port);
SOCKET s=skt_accept(serv,0,0); /* connection from child */
run_parent(s);
skt_close(s);
int status=0;
wait(&status); /* wait for child to finish */
} else { /* I'm the child */
SOCKET s=skt_connect(skt_lookup_ip("127.0.0.1"),port,1); /* connect to parent */
usleep(1000); /* slow down child, to avoid corrupted cout! */
run_child(s);
skt_close(s);
exit(0); /* close out child process when done */
}
return 0;
}
(Try this in NetRun now!)
Drat! This crashes! Yet it works fine if we send and
receive integers, floats, or simple flat classes. The problem
with sending a std::string (or std::vector, or map, etc) is this basic
fact:
You can't send pointers over the network.
The problem is that my pointer is just an address in my process's memory.
If you've got your own separate memory, then dereferencing my pointer on
your side reads whatever happens to live at that address in your
memory--the best you could hope for is a crash.
And inside a std::string is a pointer to the data. On my machine,
this pointer is "basic_string::_M_dataplus._M_p", in the nearly
unreadable
/usr/include/c++/4.4.3/bits/basic_string.h. Inside
std::vector? Also a pointer.
This is really annoying, because real applications use complicated data
structures like std::vector<std::string> all over the place, and
you'd like to just send them, not break them up into little sendable
pointer-free pieces.
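To see the pointer without digging through the headers, here's a small sketch that is separate from the socket code above. The helper name data_is_outside_object is made up for illustration. It checks whether a string's characters are stored inside the bytes of the std::string object itself. For a long string they can't be, so sending sizeof(str) bytes ships only a pointer plus some bookkeeping, never the characters.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: true if the string's characters are stored
// somewhere other than inside the std::string object itself.
bool data_is_outside_object(const std::string &s) {
    std::uintptr_t obj_begin = reinterpret_cast<std::uintptr_t>(&s);
    std::uintptr_t obj_end = obj_begin + sizeof(s);
    std::uintptr_t data = reinterpret_cast<std::uintptr_t>(s.data());
    return data < obj_begin || data >= obj_end;
}

// Exercise the helper on a string far too long to fit inside the object.
void check_long_string() {
    std::string big(1000, 'x');
    assert(sizeof(big) < big.size());     // the object is smaller than its text
    assert(data_is_outside_object(big));  // so the text must live elsewhere
}
```

Short strings are a different story on some standard libraries, which store a few characters inline, so the result for short strings varies by compiler.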
For example, here's one correct way to send a string: first send the length, then send the data.
// Send side:
std::string str="Cybertron";
int length=str.length();
skt_sendN(s,&length,sizeof(length)); // OK because length is an integer
skt_sendN(s,&str[0],length); // OK because now we're sending the string *data*
// Receive side:
std::string str="";
int length=0;
skt_recvN(s,&length,sizeof(length)); // OK because length is an integer
str.resize(length);
skt_recvN(s,&str[0],length); // OK because we reallocated the string
(Try this in NetRun now!)
This works fine, but:
1. It's error-prone, because a mismatch between the send and receive sides only shows up as an error at runtime.
2. It's ugly, because we need to explicitly break up every object being sent into its constituent parts.
3. It's slow, because we do multiple tiny send operations instead of
one big send. Often this results in lots of tiny network packets,
which is very bad for performance.
PUP: A New Hope
Luckily, there's a design pattern called "pup", short for pack/unpack,
that addresses all of these issues. The basic idea is that a single
function named "pup" can both send and receive an object. Since the
same code runs on both sides, a mismatch between send and receive is
much harder to introduce, which fixes the first problem. We fix the
second by using "structural recursion" to break up complex compound
objects: for an object A with parts B and C, A's pup function just
calls B's pup function first, then calls C's pup function. (The third
problem, speed, is handled further down by a PUPer that buffers its
data.) For example:
class A {
B b;
C c;
public:
...
friend void pup(...,A &a) {
pup(...,a.b);
pup(...,a.c);
}
};
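Here's one way the "..." above might be filled in, as a sketch rather than the only option: make each pup a template on the PUPer type. To keep it runnable without a network, it uses a made-up text_PUPer that just records each basic value as text. The members x and tag, the constructors, and the describe function are also invented for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for a socket PUPer: records each basic value as text.
class text_PUPer {
    std::ostringstream out;
public:
    friend void pup(text_PUPer &p, int &v) { p.out << "int=" << v << ";"; }
    friend void pup(text_PUPer &p, char &v) { p.out << "char=" << v << ";"; }
    std::string str() const { return out.str(); }
};

class B {
    int x;
public:
    B(int x_) : x(x_) {}
    template <class PUPer>
    friend void pup(PUPer &p, B &b) { pup(p, b.x); }
};

class C {
    char tag;
public:
    C(char t) : tag(t) {}
    template <class PUPer>
    friend void pup(PUPer &p, C &c) { pup(p, c.tag); }
};

class A {
    B b;
    C c;
public:
    A(int x, char t) : b(x), c(t) {}
    template <class PUPer>
    friend void pup(PUPer &p, A &a) {
        pup(p, a.b);  // structural recursion into part B
        pup(p, a.c);  // then into part C
    }
};

// Run an A through the text PUPer and return what it recorded.
std::string describe(A &a) {
    text_PUPer p;
    pup(p, a);
    return p.str();
}

void check_describe() {
    A a(7, 'z');
    assert(describe(a) == "int=7;char=z;");
}
```

Note that A never mentions int or char: it only knows it has a B and a C, and each part knows how to pup itself.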
Here's a complete example:
/*********** network pup "library" code ****************/
/* Sends data out across the network immediately */
class send_PUPer {
SOCKET s;
public:
send_PUPer(SOCKET s_) :s(s_) {}
// The global "pup" function just sends basic types across the network.
friend void pup(send_PUPer &p,int &v) { skt_sendN(p.s,&v,sizeof(v)); }
friend void pup(send_PUPer &p,char &v) { skt_sendN(p.s,&v,sizeof(v)); }
// and so on for bool, float, etc. You can convert to network byte order too!
};
/* Receives data from the network immediately */
class recv_PUPer {
SOCKET s;
public:
recv_PUPer(SOCKET s_) :s(s_) {}
// The global "pup" function just receives basic types from the network.
friend void pup(recv_PUPer &p,int &v) { skt_recvN(p.s,&v,sizeof(v)); }
friend void pup(recv_PUPer &p,char &v) { skt_recvN(p.s,&v,sizeof(v)); }
// and so on for bool, float, etc. You can convert back from network byte order too!
};
// Explain how to pup a std::string.
// This is a little mind-bending, since the same code is used for both send and recv.
template <class PUPer>
void pup(PUPer &p,std::string &v) {
int length=v.length(); // send: actual length. recv: initial length.
pup(p,length);
v.resize(length); // send: does nothing. recv: reallocates array
for (int i=0;i<length;i++) pup(p,v[i]);
}
/************ user code ***********/
/* Run child process's code. Socket connects to parent. */
void run_child(SOCKET s) {
send_PUPer p(s);
cout<<"Child alive! Sending data to parent."<<std::endl;
std::string str="Cybertron";
pup(p,str);
}
/* Run parent process's code. Socket connects to child */
void run_parent(SOCKET s) {
recv_PUPer p(s);
cout<<"Parent alive! Getting data from child"<<std::endl;
std::string str="";
pup(p,str);
cout<<"Parent done. got="<<str<<std::endl;
}
(Try this in NetRun now!)
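The PUPer comments above mention converting to network byte order. As a rough sketch of what that conversion involves, here's a big-endian encoder and decoder for 32-bit values written with plain shifts. The names pack_be32 and unpack_be32 are made up for this example, and it assumes the value fits in 32 bits. On POSIX systems, htonl and ntohl from <arpa/inet.h> do the same job.

```cpp
#include <cassert>
#include <cstdint>

// Write a 32-bit value into buf[0..3], most significant byte first.
void pack_be32(std::uint32_t v, unsigned char *buf) {
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)(v);
}

// Rebuild the 32-bit value from four big-endian bytes.
std::uint32_t unpack_be32(const unsigned char *buf) {
    return ((std::uint32_t)buf[0] << 24) | ((std::uint32_t)buf[1] << 16) |
           ((std::uint32_t)buf[2] << 8)  |  (std::uint32_t)buf[3];
}

void check_be32() {
    unsigned char buf[4];
    pack_be32(0x01020304u, buf);
    assert(buf[0] == 1 && buf[1] == 2 && buf[2] == 3 && buf[3] == 4);
    assert(unpack_be32(buf) == 0x01020304u);
}
```

In a PUPer, the int overload on the send side would pack into a 4-byte buffer and pass that to skt_sendN, and the receive side would read 4 bytes and unpack them. Because the shifts work on values rather than memory layout, the bytes on the wire come out the same on any machine.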
For a bare string, this isn't very convincing. Let's add std::vector support.
// Explain how to pup a std::vector. This just recurses to the element pup functions,
// so we can automatically pup std::vector<int>, std::vector<string>, std::vector<std::vector<char>>, etc.
template <class PUPer,typename T>
void pup(PUPer &p,std::vector<T> &v) {
int length=v.size(); // send: actual size. recv: initial size.
pup(p,length);
v.resize(length); // send: does nothing. recv: reallocates storage
for (int i=0;i<length;i++) pup(p,v[i]); // might even be recursive!
}
/************ user code ***********/
/* Run child process's code. Socket connects to parent. */
void run_child(SOCKET s) {
send_PUPer p(s);
cout<<"Child alive! Sending data to parent."<<std::endl;
std::vector<std::string> strs;
strs.push_back("Cybertron"); strs.push_back("G6");
pup(p,strs);
}
/* Run parent process's code. Socket connects to child */
void run_parent(SOCKET s) {
recv_PUPer p(s);
cout<<"Parent alive! Getting data from child"<<std::endl;
std::vector<std::string> strs;
pup(p,strs);
for (unsigned int i=0;i<strs.size();i++)
cout<<"got="<<strs[i]<<std::endl;
}
(Try this in NetRun now!)
OK, but what if we want to do one big send instead of dozens of smaller
sends? We just need a new PUPer that accumulates data before
sending it.
/* Sends data out across the network in one big block */
class send_delayed_PUPer {
std::vector<char> data;
public:
send_delayed_PUPer() {}
// The global "pup" function just accumulates our data.
friend void pup(send_delayed_PUPer &p,char &v) { p.data.push_back(v); }
friend void pup(send_delayed_PUPer &p,int &v) {
int i=p.data.size(); // store our old end
p.data.resize(i+sizeof(int)); // make room for one more int
memcpy(&p.data[i],&v,sizeof(int)); // copy new value into data buffer (memcpy is from <cstring>; unlike a pointer cast, it's safe at unaligned offsets)
}
// and so on for bool, float, etc.
// send off all our buffered data, and clear it
void send(SOCKET s) {
if (!data.empty()) skt_sendN(s,&data[0],data.size()); // &data[0] is only valid if there's data
data.clear();
}
};
... everything else as before ...
/************ user code ***********/
/* Run child process's code. Socket connects to parent. */
void run_child(SOCKET s) {
send_delayed_PUPer p;
cout<<"Child alive! Sending data to parent."<<std::endl;
std::vector<std::string> strs;
strs.push_back("Cybertron"); strs.push_back("G6");
pup(p,strs); // accumulates data locally
p.send(s); // sends across network
}
(Try this in NetRun now!)
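The receiving side can mirror this: read one big block, then unpack from memory. Here's a sketch under some assumptions. The unpack_PUPer class is made up for illustration, and pack_PUPer is a socket-free copy of the buffering PUPer so the example runs on its own. How the block actually arrives (for example, a length prefix followed by one skt_recvN) is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Socket-free version of the buffering PUPer: appends raw bytes to a vector.
class pack_PUPer {
public:
    std::vector<char> data;
    friend void pup(pack_PUPer &p, char &v) { p.data.push_back(v); }
    friend void pup(pack_PUPer &p, int &v) {
        const char *src = reinterpret_cast<const char *>(&v);
        p.data.insert(p.data.end(), src, src + sizeof(int));
    }
};

// Hypothetical counterpart: reads values back out of a received block.
class unpack_PUPer {
    const std::vector<char> &data;
    std::size_t pos;  // index of the next unread byte
    void read(void *dest, std::size_t n) {
        if (n > data.size() - pos) throw std::runtime_error("unpack past end of buffer");
        std::memcpy(dest, &data[pos], n);
        pos += n;
    }
public:
    explicit unpack_PUPer(const std::vector<char> &d) : data(d), pos(0) {}
    friend void pup(unpack_PUPer &p, char &v) { p.read(&v, sizeof(v)); }
    friend void pup(unpack_PUPer &p, int &v) { p.read(&v, sizeof(v)); }
};

// Same string and vector pup templates as in the listings above.
template <class PUPer>
void pup(PUPer &p, std::string &v) {
    int length = (int)v.length();
    pup(p, length);
    v.resize(length);
    for (int i = 0; i < length; i++) pup(p, v[i]);
}
template <class PUPer, typename T>
void pup(PUPer &p, std::vector<T> &v) {
    int length = (int)v.size();
    pup(p, length);
    v.resize(length);
    for (int i = 0; i < length; i++) pup(p, v[i]);
}

// Pack on one side, unpack on the other, with no network in between.
std::vector<std::string> round_trip(std::vector<std::string> strs) {
    pack_PUPer out;
    pup(out, strs);
    std::vector<std::string> result;
    unpack_PUPer in(out.data);
    pup(in, result);
    return result;
}

void check_round_trip() {
    std::vector<std::string> strs;
    strs.push_back("Cybertron"); strs.push_back("G6");
    std::vector<std::string> got = round_trip(strs);
    assert(got.size() == 2 && got[0] == "Cybertron" && got[1] == "G6");
}
```

This sketch trusts the lengths it reads. Real network code would want to sanity-check each length before calling resize, since a corrupted or hostile length could request an enormous allocation.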
Things I've done with PUPers include:
- Total up the size of the needed data buffer, allocate once, make
a second pass to copy the data in, and then send it. In my experience
this is usually the fastest way to send network data, since there's
only one allocation and one send.
- Read and write the objects from disk, using a read_PUPer and write_PUPer.
- Monitor objects for changes, using a checksum_PUPer.
- Randomly inject bit errors (to determine cosmic ray error tolerance), using a fault_injection_PUPer.
- Convert C++ objects to and from XML or JSON serialized representations.
- Build an HTML web page listing the object values, and allow browser-based changes to any object field.
It's a surprisingly flexible trick!
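As a sketch of the first item in that list, here's what the sizing pass might look like. The sizing_PUPer class and the packed_size helper are made-up names, and the string and vector pup templates are the same ones used earlier. The PUPer copies nothing: it only adds up how many bytes a real send would produce.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical PUPer that only totals up how many bytes would be sent.
class sizing_PUPer {
public:
    std::size_t bytes;
    sizing_PUPer() : bytes(0) {}
    friend void pup(sizing_PUPer &p, char &v) { p.bytes += sizeof(v); }
    friend void pup(sizing_PUPer &p, int &v) { p.bytes += sizeof(v); }
};

// Same string and vector pup templates as in the listings above.
template <class PUPer>
void pup(PUPer &p, std::string &v) {
    int length = (int)v.length();
    pup(p, length);
    v.resize(length);  // sizing pass: same length, so nothing changes
    for (int i = 0; i < length; i++) pup(p, v[i]);
}
template <class PUPer, typename T>
void pup(PUPer &p, std::vector<T> &v) {
    int length = (int)v.size();
    pup(p, length);
    v.resize(length);
    for (int i = 0; i < length; i++) pup(p, v[i]);
}

// First pass: how big a buffer does this object need?
template <class T>
std::size_t packed_size(T &obj) {
    sizing_PUPer p;
    pup(p, obj);
    return p.bytes;
}

void check_packed_size() {
    std::vector<std::string> strs;
    strs.push_back("Cybertron"); strs.push_back("G6");
    // One int for the vector length, one int per string length, plus 9+2 chars.
    assert(packed_size(strs) == 3 * sizeof(int) + 11);
}
```

A second pass with a buffering PUPer can then reserve exactly packed_size bytes up front, so the buffer never has to grow while the data is copied in.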
Structural Recursion in JavaScript
Most scripting languages don't need a design idiom like "pup" because
the language allows you to loop over the pieces of any object.
For example, here's structural recursion in JavaScript:
/* This recursive function dumps everything inside v */
function printIt(v) {
if (typeof v === "object")
{ /* it's got subobjects */
print("{");
for (var f in v) { /* loop over the fields in the object (var keeps f from leaking into a global) */
print(f+":"); /* print string name of object subfield */
var newV=v[f]; /* extract object subfield's value */
printIt(newV); /* structural recursion! */
}
print("}");
}
else { /* it's a primitive type, like a number or string */
print(v+",\n");
}
}
/* Build a complicated object */
var d = {x:3, y:4};
d.woggle={clanker:"ping", z:8};
/* Print it */
printIt(d);
(Try this in NetRun now!)
Like pup, this approach allows you to disassemble and reassemble
arbitrarily complex objects. But unlike the C++ version, you don't
need to write a pup function for each class, because the language
itself can enumerate any object's fields.