Basic Network Programming with Sockets
CS 441/641
Lecture, Dr. Lawlor
One can imagine lots of programming interfaces for talking to the network, and there are in fact lots of totally different interfaces for talking via NetBIOS, AppleTalk, etc. But surprisingly there's basically only one major programming interface used for talking on a TCP/IP network, and that's "Berkeley sockets", the original UNIX interface as implemented by the good folks at UC Berkeley.
The Berkeley sockets interface is implemented in:
- All flavors of UNIX, including Linux, Mac OS X, Solaris, all BSD flavors, etc.
- Windows 95 and higher, as "winsock".
Brian Hall, or "Beej", maintains the definitive readable introduction to Berkeley sockets programming, Beej's Guide to Network Programming. He's got a zillion examples and a readable style. Go there.
Sadly, bare Berkeley sockets are fairly tricky and ugly, especially for creating connections. The problem is that Berkeley sockets support all sorts of other protocols, addressing modes, and other features like "raw sockets" (used for packet sniffers). But when I write network code, I find it a lot easier to use my own little library of public domain utility routines called "osl/socket.h". I'll give examples here using my library.
Writing UDP Code
UDP is the User Datagram Protocol. Unlike its more complex cousin, the connection-oriented TCP, UDP is "connectionless". This means you can just send data to an IP address at any time, which has some advantages and drawbacks:
- UDP itself provides no acknowledgement that the other side even exists, is running a compatible service, or is willing to talk. You need to build this yourself.
- UDP does nothing about packet loss: when the network is congested, individual datagrams just get dropped, and nobody resends them. If you respond to dropped datagrams by losing data, that's your problem. If you respond to dropped datagrams by transmitting more datagrams, it's very easy to contribute to the congestion that might be causing the packet loss, leading to "congestive collapse". TCP, by contrast, has retransmission timeouts with congestion control backoff, so it scales much more smoothly to real networks; but these timeouts add latency.
- UDP allows broadcast (to the network broadcast address) and multicast (to a multicast group address), so several receivers can hear data from one transmitter.
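If you do build your own retransmission on top of UDP, the usual way to avoid feeding congestive collapse is exponential backoff: after each unanswered datagram, wait twice as long before trying again, up to some cap. (TCP does the same thing to its retransmission timeout.) Here's a minimal sketch; backoff_ms and the 100 ms / 5000 ms numbers are made up for illustration and are not part of osl/socket.h, and real protocols usually add some random jitter as well.

```cpp
#include <cassert>

// Hypothetical helper (not part of osl/socket.h): how long to wait before
// retransmission number `attempt` (0 = first retry). The wait starts at
// base_ms, doubles on each attempt, and never exceeds cap_ms.
unsigned int backoff_ms(unsigned int attempt, unsigned int base_ms,
                        unsigned int cap_ms) {
    assert(base_ms > 0 && base_ms <= cap_ms);
    unsigned int wait = base_ms;
    for (unsigned int i = 0; i < attempt; i++) {
        if (wait > cap_ms - wait) return cap_ms;  // doubling would pass the cap
        wait *= 2;                                // 100, 200, 400, 800, ...
    }
    return wait;
}
```

With a 100 ms base and a 5000 ms cap, successive retries wait 100, 200, 400, 800, 1600, and 3200 ms, then 5000 ms for every retry after that.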
Generally, UDP is more like a postcard (an inherently one-way, unreliable shot in the dark), while TCP is more like a telephone connection (a two-way, reliable connection).
My library uses a few funny datatypes to represent the communication:
- SOCKET: datatype for a "socket", which represents one end of a network connection. On UNIX this is actually just an int (a file descriptor).
- skt_ip_t: datatype for an IP address. It's just 4 bytes for IPv4. The underlying sockets code stores this address, together with a port number, in a struct sockaddr_in.
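To see what actually ends up inside a sockaddr_in, here's a sketch of filling one in with plain POSIX calls (inet_pton for the address, htons for the port). build_ipv4_addr is a hypothetical name for illustration; this is not necessarily how osl/socket.h does it internally.

```cpp
#include <arpa/inet.h>   // inet_pton, htons, ntohs
#include <netinet/in.h>  // struct sockaddr_in
#include <sys/socket.h>  // AF_INET
#include <cassert>
#include <cstring>       // memset

// Hypothetical helper: fill in an IPv4 socket address from a dotted-quad
// string and a port. Returns false if the string isn't a valid IPv4 address.
bool build_ipv4_addr(const char *dotted, unsigned short port,
                     struct sockaddr_in *out) {
    assert(dotted != nullptr && out != nullptr);
    std::memset(out, 0, sizeof(*out));  // zero everything, including padding
    out->sin_family = AF_INET;          // IPv4
    out->sin_port = htons(port);        // port in network (big-endian) byte order
    // inet_pton writes the 4 address bytes, already in network byte order;
    // it returns 1 on success and 0 for a malformed string.
    return inet_pton(AF_INET, dotted, &out->sin_addr) == 1;
}
```

After build_ipv4_addr("127.0.0.1", 37331, &sin), the 4 bytes at sin.sin_addr are 127, 0, 0, 1, and ntohs(sin.sin_port) gives back 37331.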
Here I'm using my library to set up a UDP ("datagram") socket, and then I use that socket to send and receive data. Since UDP is connectionless, there's no real problem sending data to myself.
#include <cstdio>  /* printf */
#include <cstring> /* strlen */
#include "osl/socket.h"
#include "osl/socket.cpp"
int foo(void) {
    skt_ip_t ip=skt_lookup_ip("127.0.0.1");
    unsigned int sport=37331, dport=37331;
    SOCKET s=skt_datagram(&sport,70000); /* set port number. 70000 is the kernel data buffer size. */
    { /* Send UDP packet to ip */
        struct sockaddr_in sin=skt_build_addr(ip,dport);
        const char *data="There are many like it, but this is my message.";
        sendto(s,data,strlen(data),0,
            (const struct sockaddr *)&sin,sizeof(sin));
    }
    printf("Sent off data. Receiving:\n");
    { /* Receive one UDP packet */
        struct sockaddr_in sout; socklen_t soutL=sizeof(sout);
        int len=1000;
        char *data=new char[len];
        int n=recvfrom(s,data,len,0,
            (struct sockaddr *)&sout,&soutL);
        printf("Incoming data of %d bytes from addr len %d:\n",
            n,(int)soutL);
        if (n>0 && n<len) {
            data[n]=0; /* nul-terminate so we can print it as a string */
            printf(" Data: '%s'\n",data);
        }
        delete[] data; /* free the receive buffer */
    }
    return 0;
}
You can also separate the send and receive sides of this program, and send UDP packets from one machine to another machine. If you try this among several machines, you'll quickly realize an annoying problem: network firewalls often filter UDP, especially incoming UDP packets. Working around firewalls is a fact of life.
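For comparison with the library version of the send-to-myself UDP program, here's a sketch of the same round trip in bare Berkeley sockets (POSIX only; winsock would also need WSAStartup and closesocket). Two choices here are assumptions of this sketch, not things the library example does: it binds to port 0 so the kernel picks a free port, and it sets a 2-second receive timeout (SO_RCVTIMEO) so a lost datagram can't hang it forever. The 1500-byte receive buffer is also an arbitrary choice.

```cpp
#include <arpa/inet.h>   // htons, htonl
#include <netinet/in.h>  // struct sockaddr_in, INADDR_LOOPBACK
#include <sys/socket.h>  // socket, bind, sendto, recvfrom, setsockopt
#include <sys/time.h>    // struct timeval
#include <unistd.h>      // close
#include <cassert>
#include <cstring>       // memset
#include <string>

// Send one UDP datagram to ourselves over loopback and read it back.
// Returns the received text, or "" on any error (including a timeout).
std::string udp_loopback_roundtrip(const std::string &msg) {
    char buf[1500];                          // receive buffer for one datagram
    assert(msg.size() <= sizeof(buf));
    int s = socket(AF_INET, SOCK_DGRAM, 0);  // a UDP socket
    if (s < 0) return "";

    struct sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       // 0: kernel picks a free port
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // 127.0.0.1
    socklen_t addrLen = sizeof(addr);
    struct timeval tv = {2, 0};                     // give up after 2 seconds
    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(s, (struct sockaddr *)&addr, &addrLen) < 0 ||  // learn our port
        setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        close(s);
        return "";
    }

    // addr now holds 127.0.0.1 plus our own port, so this sends to ourselves.
    if (sendto(s, msg.data(), msg.size(), 0,
               (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(s);
        return "";
    }

    struct sockaddr_in from;                 // filled in with the sender's address
    socklen_t fromLen = sizeof(from);
    ssize_t n = recvfrom(s, buf, sizeof(buf), 0,
                         (struct sockaddr *)&from, &fromLen);
    close(s);
    return n < 0 ? std::string() : std::string(buf, (size_t)n);
}
```

Calling udp_loopback_roundtrip("There are many like it, but this is my message.") should hand back the same string, since the datagram never leaves the machine.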
Writing TCP Code
TCP has several advantages over UDP:
- TCP is reliable and stream-oriented. This means you can hand TCP half a gig of data in one go, and it will either deliver every byte to the other end intact and in order, or report an error on the connection. UDP, by contrast, has a maximum datagram size of just under 64KiB (65,507 bytes of payload over IPv4), and even that might not make it.
- TCP allows bidirectional communication through NAT firewalls, since the firewall sees the connection being set up and acknowledged. This only works for *outgoing* TCP connections (client inside firewall, server outside firewall), since the client initiates the communication. This doesn't work with most forms of UDP because when some bozo UDP packet shows up from the internet uninvited, the firewall has no confirmation that it's a reply packet.
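The "half a gig in one go" promise has a catch at the system-call level: a single send() or recv() on a stream socket may move fewer bytes than you asked for. Calls like skt_sendN and skt_recvN exist to hide that, presumably with a loop something like the hypothetical send_all and recv_all below (the real library code may differ).

```cpp
#include <sys/socket.h>  // send, recv, socketpair
#include <sys/types.h>   // ssize_t
#include <unistd.h>      // close(), for cleaning up sockets
#include <cassert>
#include <cerrno>        // errno, EINTR
#include <cstddef>       // size_t

// Hypothetical helper: keep calling send() until all n bytes are out.
// Returns false on error.
bool send_all(int fd, const char *data, size_t n) {
    assert(fd >= 0);
    while (n > 0) {
        ssize_t sent = send(fd, data, n, 0);
        if (sent < 0) {
            if (errno == EINTR) continue;   // interrupted by a signal: retry
            return false;
        }
        data += sent;                       // skip past the bytes already sent
        n -= (size_t)sent;
    }
    return true;
}

// Hypothetical helper: keep calling recv() until exactly n bytes arrive.
// Returns false on error, or if the other side closes the connection early.
bool recv_all(int fd, char *data, size_t n) {
    assert(fd >= 0);
    while (n > 0) {
        ssize_t got = recv(fd, data, n, 0);
        if (got < 0 && errno == EINTR) continue;
        if (got <= 0) return false;         // error, or 0 = peer closed
        data += got;
        n -= (size_t)got;
    }
    return true;
}
```

To try these without any network at all, socketpair(AF_UNIX, SOCK_STREAM, 0, sv) gives you two connected stream sockets inside one process: send_all(sv[0], "hello", 5) followed by recv_all(sv[1], buf, 5) leaves "hello" in buf.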
To connect to a server "serverName" at TCP port 80, and send some data to it, you'd call:
- skt_ip_t ip=skt_lookup_ip(serverName); to look up the server's IP address.
- SOCKET s=skt_connect(ip,80,2); to connect to that server. "80" is the TCP port number. "2" is the timeout in seconds.
- skt_sendN(s,"hello",5); to send the 5-byte string "hello" to the other side. You can now repeatedly send and receive data with the other side.
- skt_close(s); to close the socket afterwards.
Here's an example TCP client in NetRun:
#include "osl/socket.h" /* <- Dr. Lawlor's funky networking library */
#include "osl/socket.cpp"
int foo(void) {
    skt_ip_t ip=skt_lookup_ip("127.0.0.1");
    unsigned int port=80;
    SOCKET s=skt_connect(ip,port,2);
    skt_sendN(s,"hello",5);
    skt_close(s);
    return 0;
}
Easy, right? The same program is a great deal longer in pure Berkeley sockets, since you've got to deal with error handling (and not all errors are fatal!), a long and complicated address setup process, etc.
This same code works in Windows, too. On NetRun, use "Download this file as a .tar archive" to get the socket.h and socket.cpp files.
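To give a feel for the "great deal longer" pure Berkeley sockets version mentioned above, here's a sketch of just the lookup-and-connect part, using getaddrinfo. connect_tcp is a hypothetical name, it is POSIX-only, and it leaves out skt_connect's timeout, since a connect timeout needs a non-blocking socket plus select or poll.

```cpp
#include <netdb.h>       // getaddrinfo, freeaddrinfo, struct addrinfo
#include <sys/socket.h>  // socket, connect
#include <sys/types.h>
#include <unistd.h>      // close
#include <cassert>
#include <string>

// Hypothetical helper: look up `host` (a name or a dotted quad) and open a
// TCP connection to it. Returns a connected socket, or -1 on failure.
// Unlike skt_connect there is no timeout: connect() blocks as long as the OS lets it.
int connect_tcp(const char *host, unsigned short port) {
    assert(host != nullptr);
    struct addrinfo hints = {};
    hints.ai_family = AF_INET;        // IPv4 only, like skt_ip_t
    hints.ai_socktype = SOCK_STREAM;  // TCP
    struct addrinfo *res = nullptr;
    std::string service = std::to_string(port);
    if (getaddrinfo(host, service.c_str(), &hints, &res) != 0)
        return -1;                    // name lookup failed
    int s = -1;
    // A name can map to several addresses, so try each until one connects.
    for (struct addrinfo *p = res; p != nullptr; p = p->ai_next) {
        s = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (s < 0) continue;
        if (connect(s, p->ai_addr, p->ai_addrlen) == 0) break;  // connected
        close(s);                     // this address failed; try the next one
        s = -1;
    }
    freeaddrinfo(res);
    return s;
}
```

Note that even this skips the winsock differences and most of the error reporting a real program would want, which is exactly the busywork a small wrapper library saves you.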
To listen on a socket, you create a server socket and then accept connections from incoming clients. This program accepts exactly one client, but you typically have a loop (or multiple threads) between accept and close, to keep accepting clients indefinitely.
#include <iostream>
#include <string>
#include "osl/socket.h"
#include "osl/socket.cpp" /* include body for easy linking */
int foo(void)
{
    unsigned int port=8888;
    SERVER_SOCKET serv=skt_server(&port);
    std::cout<<"Waiting for connections on port "
        <<port<<"\n";
    skt_ip_t client_ip; unsigned int client_port;
    SOCKET s=skt_accept(serv,&client_ip,&client_port);
    std::cout<<"Connection from "
        <<skt_print_ip(client_ip)
        <<":"<<client_port<<"!\n";
    /* Receive some data from the client */
    std::string buf(3,'?');
    skt_recvN(s,(char *)&buf[0],3);
    std::cout<<"Client sent data '"<<buf<<"'\n";
    /* Send some data back to the client */
    skt_sendN(s,"gdaymate\n",9);
    skt_close(s);
    std::cout<<"Closed socket to client\n";
    skt_close(serv);
    return 0;
}
If you're on campus and fast enough, you can actually connect to this server from a web browser!
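For comparison with skt_server and skt_accept, here's a sketch of the bare Berkeley sockets setup for a listening TCP socket: socket, bind, listen, and then accept once per client. make_server is a hypothetical name. Writing the kernel-chosen port back through the pointer when *port is 0 mirrors the skt_server(&port) signature, but whether the real library does that is an assumption; SO_REUSEADDR and the backlog of 10 are common choices, not requirements.

```cpp
#include <arpa/inet.h>   // htons, htonl, ntohs
#include <netinet/in.h>  // struct sockaddr_in, INADDR_ANY
#include <sys/socket.h>  // socket, setsockopt, bind, listen, accept
#include <unistd.h>      // close
#include <cassert>

// Hypothetical helper: create a TCP socket listening on *port on every
// network interface. If *port is 0, the kernel picks a free port and we
// write the chosen number back. Returns the listening socket, or -1.
int make_server(unsigned int *port) {
    assert(port != nullptr && *port <= 65535);
    int serv = socket(AF_INET, SOCK_STREAM, 0);
    if (serv < 0) return -1;
    int yes = 1;  // let a restarted server reuse the port right away
    setsockopt(serv, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    struct sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);    // listen on every interface
    addr.sin_port = htons((unsigned short)*port);
    socklen_t len = sizeof(addr);
    if (bind(serv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(serv, 10) < 0 ||                  // queue up to 10 pending clients
        getsockname(serv, (struct sockaddr *)&addr, &len) < 0) {
        close(serv);
        return -1;
    }
    *port = ntohs(addr.sin_port);                // report the port actually used
    return serv;
}
```

Once make_server(&port) succeeds, each int s = accept(serv, nullptr, nullptr); returns a new socket for one client, much like skt_accept (pass a sockaddr_in instead of nullptr to learn the client's address).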
Example Protocol: HTTP
HTTP is the protocol used by web pages (that's why URLs usually start with "http://"). HTTP servers listen on port 80 by default, but you can use the :port syntax to connect to any port you like (for example, "http://lawlor.cs.uaf.edu:8888/some_url").
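Before it can do anything else, a client has to pull the host, port, and path back out of a URL like that. Here's a hypothetical helper that does it for plain http:// URLs only (no https, no user:password@ part, no IPv6 literals), falling back to port 80 when there's no :port.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type: the pieces of an http:// URL.
struct Url {
    std::string host;
    unsigned int port;
    std::string path;
};

// Split "http://host[:port][/path]" into its parts. Returns false for
// anything else, including a port that isn't a number from 1 to 65535.
bool parse_http_url(const std::string &url, Url *out) {
    assert(out != nullptr);
    const std::string scheme = "http://";
    if (url.compare(0, scheme.size(), scheme) != 0) return false;
    std::string rest = url.substr(scheme.size());
    size_t slash = rest.find('/');
    std::string hostport = rest.substr(0, slash);  // whole string if no '/'
    out->path = (slash == std::string::npos) ? "/" : rest.substr(slash);
    size_t colon = hostport.find(':');
    out->host = hostport.substr(0, colon);
    out->port = 80;                                // HTTP's default port
    if (colon != std::string::npos) {
        std::string digits = hostport.substr(colon + 1);
        if (digits.empty() || digits.size() > 5) return false;
        unsigned int p = 0;
        for (char c : digits) {
            if (c < '0' || c > '9') return false;
            p = p * 10 + (unsigned int)(c - '0');
        }
        if (p == 0 || p > 65535) return false;
        out->port = p;
    }
    return !out->host.empty();
}
```

For "http://lawlor.cs.uaf.edu:8888/some_url" this gives host "lawlor.cs.uaf.edu", port 8888, and path "/some_url"; for "http://example.com" it gives port 80 and path "/".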
An HTTP client, like a web browser, starts by doing a DNS lookup on the server name. That's the "resolving host name" message you see in your browser. The browser then makes a TCP connection to the server's port (80 unless the URL says otherwise), which is the "Connecting to server" message.
Once connected, the HTTP client usually sends a "GET" request. Here's the simplest possible GET request:
"GET / HTTP/1.0\r\n\r\n"
Note the DOS newlines ("\r\n", a carriage return plus a line feed), and the extra newline at the end: that blank line marks the end of the request headers. You can list a bunch of optional header lines in your GET request, like the languages you're willing to accept ("Accept-Language: en-us\r\n") and so on. HTTP 1.1 (not 1.0) requires a Host header in the request ("Host: www.foobar.com\r\n"), which lets one server host several websites ("virtual hosts") on the same IP address.
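Putting those rules together, here's a sketch of a hypothetical helper that builds the smallest valid HTTP/1.1 GET request: the request line, the required Host header, and the blank line, each ending in "\r\n".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a minimal HTTP/1.1 GET request for `path`
// on `host`. Every line ends in "\r\n", and an empty line ends the request.
std::string make_get_request(const std::string &host, const std::string &path) {
    assert(!path.empty() && path[0] == '/');  // URL paths start with a slash
    return "GET " + path + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "\r\n";                            // blank line: end of request
}
```

make_get_request("www.foobar.com", "/") produces "GET / HTTP/1.1\r\nHost: www.foobar.com\r\n\r\n". When the server is on a non-default port, browsers include it in the Host header, as in "Host: lawlor.cs.uaf.edu:8888".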
The HTTP server then sends back some sort of reply. Officially, this is supposed to be an "HTTP/1.1 200 OK\r\n" status line, followed by another set of line-oriented ASCII optional headers, such as the Content-Length in bytes ("Content-Length: 187\r\n"), and then a blank line. But many browsers will print out plain ASCII text if the server skips all that and just returns the text.
Here's an example of a real HTTP exchange between Firefox and Apache:
Firefox connects to server. Apache accepts the connection.
Firefox, the client, sends this ASCII data, with DOS newlines:
GET /my_name_is_url.html HTTP/1.1
Host: lawlor.cs.uaf.edu:8888
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.9) Gecko/20070126 Ubuntu/dapper-security Firefox/1.5.0.9
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: __utmz=62224958.1163103248.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utma=62224958.570638686.1163103248.1163107343.1164832326.3
<- blank line at end of HTTP request headers
Apache, the server, sends this data back:
HTTP/1.1 200 OK
Date: Fri, 06 Apr 2007 20:20:50 GMT
Server: Apache/2.0.55 (Ubuntu)
Accept-Ranges: bytes
Content-Length: 9443
Connection: close
Content-Type: text/html; charset=UTF-8
<- blank line at end of HTTP response headers
<html><head><title>UAF Department of ... rest of web page, total of 9443 bytes after blank line
This is a pretty simple, ASCII-based protocol. The only binary data is the contents of the web resource, transmitted by the server after the blank line. The "Content-Length:" field tells the client how many bytes to expect.
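Because the response headers are just text lines ending in a blank line, picking them apart is ordinary string processing. Here's a sketch of a hypothetical parse_response_head that finds the blank line, reads the status code, and pulls out Content-Length. It compares header names case-insensitively, since HTTP header names are case-insensitive ("content-length:" is just as legal as "Content-Length:").

```cpp
#include <cassert>
#include <cctype>    // tolower
#include <cstdlib>   // atoi, atol
#include <string>

// Hypothetical result type for the head of an HTTP response.
struct ResponseInfo {
    int status;            // e.g. 200
    long content_length;   // -1 if there is no Content-Length header
    size_t body_start;     // offset in raw where the body begins
};

// Lower-case a copy of s, for case-insensitive header name comparison.
static std::string to_lower(std::string s) {
    for (char &c : s) c = (char)std::tolower((unsigned char)c);
    return s;
}

// Returns false if the blank line ending the headers hasn't arrived yet,
// or if the status line is malformed.
bool parse_response_head(const std::string &raw, ResponseInfo *out) {
    assert(out != nullptr);
    size_t end = raw.find("\r\n\r\n");           // blank line ends the headers
    if (end == std::string::npos) return false;  // need more data
    size_t eol = raw.find("\r\n");               // end of the status line
    std::string status_line = raw.substr(0, eol);  // e.g. "HTTP/1.1 200 OK"
    size_t sp = status_line.find(' ');
    if (sp == std::string::npos) return false;
    out->status = std::atoi(status_line.c_str() + sp + 1);
    out->content_length = -1;
    out->body_start = end + 4;
    // Walk the "Name: value" lines between the status line and the blank line.
    for (size_t pos = eol + 2; pos < end;) {
        size_t next = raw.find("\r\n", pos);
        std::string line = raw.substr(pos, next - pos);
        size_t colon = line.find(':');
        if (colon != std::string::npos &&
            to_lower(line.substr(0, colon)) == "content-length")
            out->content_length = std::atol(line.c_str() + colon + 1);
        pos = next + 2;
    }
    return true;
}
```

Fed the Apache reply above, this would report status 200 and a content_length of 9443, with body_start pointing at the "<html>" that follows the blank line.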
A typical very simple custom web client might look like this:
#include <cstdlib>   /* atoi */
#include <cstring>   /* strlen */
#include <iostream>
#include <string>
#include <vector>
#include "osl/socket.h"
#include "osl/socket.cpp"
int foo(void) {
    skt_ip_t ip=skt_lookup_ip("137.229.25.247"); // lawlor.cs.uaf.edu
    SOCKET s=skt_connect(ip,80,2);
    /* Send off HTTP request to server (with URL) */
    const char *req=
        "GET / HTTP/1.1\r\n" // the "/" is the URL I'm requesting
        "Host: lawlor.cs.uaf.edu\r\n" // hostname is required for HTTP 1.1
        "User-Agent: Raw socket example code (lawlor@alaska.edu)\r\n" // web browser ID string
        "\r\n"; // blank line == end of HTTP request
    skt_sendN(s,req,strlen(req));
    /* Receive HTTP response headers, up to the blank line */
    std::string response;
    int length=0;
    while ((response=skt_recv_line(s))!="")
    {
        std::cout<<response<<"\n";
        if (response.substr(0,15)=="Content-Length:")
            length=atoi(response.c_str()+15); // atoi skips the space after the colon
    }
    /* Receive HTTP response data, and print it */
    std::cout<<"-- bottom line: "<<length<<" bytes of data\n";
    if (length>0 && length<10000) { // sanity check
        std::vector<char> page(length); // place to store data
        skt_recvN(s,&page[0],length); // grab data from server
        for (int i=0;i<length;i++) std::cout<<page[i]; // print to screen
    }
    skt_close(s);
    return 0;
}
A typical similarly simplified web server might look like this. Note it just keeps being a webserver forever:
#include <cstdio>    /* snprintf */
#include <cstring>   /* strlen */
#include <iostream>
#include <string>
#include "osl/socket.h"
#include "osl/socket.cpp"
int foo(void) {
    /* Make a TCP socket listening on port 8888 */
    unsigned int port=8888;
    SERVER_SOCKET serv=skt_server(&port);
    /* Keep servicing clients */
    while (1) {
        std::cout<<"Waiting for connections on port "<<port<<"\n";
        skt_ip_t client_ip; unsigned int client_port;
        SOCKET s=skt_accept(serv,&client_ip,&client_port);
        std::cout<<"Connection from "<<skt_print_ip(client_ip)<<":"<<client_port<<"!\n";
        // Grab HTTP request line, typically GET /url HTTP/1.1
        std::string req=skt_recv_line(s);
        std::cout<<"Client request: "<<req<<"\n";
        // Grab rest of HTTP header info (mostly useless)
        std::string hdr;
        while ((hdr=skt_recv_line(s))!="") std::cout<<"Client header: "<<hdr<<"\n";
        // Prepare HTTP response header
        std::string page="<html><body>IT WORKED!</body></html>";
        char response[1024];
        snprintf(response,sizeof(response), // needed to get the page length into the string
            "HTTP/1.1 200 OK\r\n"
            "Server: Random example code (lawlor@alaska.edu)\r\n"
            "Content-Type: text/html\r\n"
            "Content-Length: %d\r\n"
            "Connection: close\r\n" // we hang up after each reply
            "\r\n" // blank line: end of header
            ,(int)page.length());
        skt_sendN(s,response,strlen(response));
        skt_sendN(s,&page[0],page.length());
        skt_close(s);
    }
    return 0; /* never reached */
}
Note that using these for real production servers would be a terrible idea (they handle one client at a time, and the code itself does almost no error checking), but they do work for simple testing.
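One easy way to grow the toy web server into something that serves more than one page is to look at the URL in the request line it already reads. Here's a sketch of a hypothetical parse_request_line helper that splits a line like "GET /my_name_is_url.html HTTP/1.1" into method, path, and version. A handy side effect of reading words with >> is that a stray trailing '\r' counts as whitespace and gets dropped.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical result type: the three words of an HTTP request line.
struct RequestLine {
    std::string method, path, version;
};

// Split "METHOD /path HTTP/x.y" into its words. Returns false for anything
// that isn't exactly three whitespace-separated words ending in an HTTP version.
bool parse_request_line(const std::string &line, RequestLine *out) {
    assert(out != nullptr);
    std::istringstream in(line);
    std::string extra;
    if (!(in >> out->method >> out->path >> out->version)) return false;
    if (in >> extra) return false;                   // too many words
    return out->version.compare(0, 5, "HTTP/") == 0;
}
```

The server loop could then check whether the path is "/" or something else and pick which page string to send back.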