Network Socket Communication
CS 463
Lecture, Dr. Lawlor
One can imagine lots of programming interfaces for talking to the
network, and there are in fact lots of totally different
interfaces for talking via NetBIOS, AppleTalk, etc. But
surprisingly there's basically only one major programming interface
used for talking on a TCP/IP network, and that's "Berkeley
sockets", the original UNIX interface as implemented by the good
folks at UC Berkeley.
The Berkeley sockets interface is implemented in:
- All flavors of UNIX, including Linux, Mac OS X, Solaris, all
BSD flavors, etc.
- Windows 95 and higher, as "winsock".
Brian Hall, or "Beej", maintains the definitive readable
introduction to Berkeley sockets programming, Beej's Guide to Network
Programming. He's got a zillion examples and a readable
style. Go there.
Sadly, bare Berkeley sockets are fairly tricky and ugly, especially
for creating connections. The problem is Berkeley sockets
support all sorts of other protocols, addressing modes, and other
features like "raw sockets" (used for packet sniffers). But
when I write network code, I find it a lot easier to use my own
little library of public domain utility routines called
"osl/socket.h". I'll give examples here using my library; feel
free to read up on the underlying socket calls if you like.
Writing UDP Code
UDP is the User Datagram Protocol. Unlike its more complex
cousin, the connection-oriented TCP, UDP is "connectionless".
This means you can just send data to an IP address at any time,
which has some advantages and drawbacks:
- UDP itself provides no acknowledgement that the other side
even exists, is running a compatible service, or is willing to
talk. You need to build this yourself.
- UDP does nothing about packet loss, such as loss from congestion:
individual datagrams are simply dropped. If you respond to dropped
datagrams by losing data, that's your problem. If you
respond to dropped datagrams by transmitting more datagrams,
it's very easy to contribute to the congestion that might be
causing the packet loss, leading to "congestive collapse".
TCP, by contrast, has retransmission timeouts with congestion
control backoff, so it scales much more smoothly to real
networks; but these timeouts add latency.
- UDP allows broadcast (to the network broadcast address) and
multicast, so several receivers can hear data from one transmitter.
Generally, UDP is more like a postcard (inherently one-way
unreliable shot in the dark); while TCP is more like a telephone
connection (two-way reliable connection).
My library uses a few funny datatypes to represent the
communication:
- SOCKET: datatype for a "socket", which represents one end of a
network connection. This is actually implemented as just
an int.
- skt_ip_t: datatype for an IP address. It's just 4 bytes
for IPv4. The underlying sockets code uses a sockaddr_in
for this.
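For reference, this is roughly what an IPv4 address plus port looks like at the raw Berkeley sockets level. It is a minimal sketch using the standard POSIX calls inet_pton and htons, not the code inside osl/socket.h; the helper name make_ipv4_addr is made up for illustration.

```cpp
#include <arpa/inet.h>   // inet_pton, htons, ntohs
#include <netinet/in.h>  // sockaddr_in
#include <cassert>
#include <cstring>       // memset

// Hypothetical helper (not part of osl/socket.h): fill in a sockaddr_in
// from a dotted-decimal IPv4 string and a port in host byte order.
// Returns false if the string is not a valid IPv4 address.
bool make_ipv4_addr(const char *dotted, unsigned short port, sockaddr_in *out) {
    std::memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = htons(port);  // ports are stored in network (big-endian) byte order
    return inet_pton(AF_INET, dotted, &out->sin_addr) == 1;
}
```

The 4 address bytes end up in sin_addr in network order, so "127.0.0.1" is stored as the bytes 127, 0, 0, 1 regardless of the machine's endianness.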
Here I'm using my library to set up a UDP ("datagram") socket, and
then I use that socket to send and receive data. Since UDP is
connectionless, there's no real problem sending data to myself.
#include <stdio.h>
#include <string.h>
#include "osl/socket.h"
#include "osl/socket.cpp" /* include body for easy linking */
int foo(void) {
    skt_ip_t ip=skt_lookup_ip("127.0.0.1");
    unsigned int sport=37331, dport=37331;
    SOCKET s=skt_datagram(&sport,70000); /* set port number. 70000 is the kernel data buffer size. */
    { /* Send UDP packet to ip */
        struct sockaddr_in sin=skt_build_addr(ip,dport);
        const char *data="There are many like it, but this is my message.";
        sendto(s,data,strlen(data),0,
            (const struct sockaddr *)&sin,sizeof(sin));
    }
    printf("Sent off data. Receiving:\n");
    { /* Receive one UDP packet */
        struct sockaddr_in sout; socklen_t soutL=sizeof(sout);
        const int len=1000;
        char data[len]; /* stack buffer, so there is nothing to leak */
        int n=recvfrom(s,data,len,0,
            (struct sockaddr *)&sout,&soutL);
        printf("Incoming data of %d bytes from addr len %d:\n",
            n,(int)soutL);
        if (n>0 && n<len) {
            data[n]=0; /* nul-terminate so printf can treat it as a string */
            printf("  Data: '%s'\n",data);
        }
    }
    skt_close(s);
    return 0;
}
(Try this in NetRun now!)
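If you are curious what the same send-to-myself experiment looks like without the library, here is a sketch in plain POSIX sockets. It assumes a UNIX-like system (Windows would need the winsock setup calls), binds to port 0 so the kernel picks a free port instead of 37331, and sets a receive timeout since UDP never promises delivery. The function name udp_loopback_roundtrip is made up for this example.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <cassert>
#include <cstring>
#include <string>

// Send one datagram to ourselves over loopback and return what arrives.
// Returns an empty string on any error or timeout.
std::string udp_loopback_roundtrip(const std::string &msg) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) return "";

    // Bind to 127.0.0.1 with port 0, so the kernel picks a free port.
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    socklen_t addr_len = sizeof(addr);
    if (bind(s, (const sockaddr *)&addr, sizeof(addr)) != 0 ||
        getsockname(s, (sockaddr *)&addr, &addr_len) != 0) {  // learn the chosen port
        close(s);
        return "";
    }

    // UDP gives no delivery guarantee, so never block forever on receive.
    timeval tv;
    tv.tv_sec = 2;
    tv.tv_usec = 0;
    setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    std::string result;
    ssize_t sent = sendto(s, msg.data(), msg.size(), 0,
                          (const sockaddr *)&addr, sizeof(addr));
    if (sent == (ssize_t)msg.size()) {
        char buf[1500];
        sockaddr_in from;
        socklen_t from_len = sizeof(from);
        ssize_t n = recvfrom(s, buf, sizeof(buf), 0, (sockaddr *)&from, &from_len);
        if (n > 0) result.assign(buf, (size_t)n);
    }
    close(s);
    return result;
}
```

Note there is no connect or accept anywhere: one socket both sends and receives, which is exactly the "postcard" model.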
You can also separate the send and receive sides of this program,
and send UDP packets from one machine to another machine. If
you try this among several machines, you'll quickly realize an
annoying problem: network firewalls often filter UDP, especially
incoming UDP packets. Working around firewalls is a fact of
life.
Writing TCP Code
TCP has several advantages over UDP:
- TCP is reliable and stream-oriented. This means you can
issue one TCP send with a half gig of data, and the kernel will
guarantee that either the data will make it to the other end
perfectly, or it will return you an error. UDP, by
contrast, has a maximum packet size of 64KiB, and even that
might not make it.
- TCP allows bidirectional communication through NAT firewalls,
since the firewall sees the connection being set up and
acknowledged. This only works for *outgoing* TCP
connections (client inside firewall, server outside firewall),
since the client initiates the communication. This doesn't
work with most forms of UDP because when some bozo UDP packet
shows up from the internet uninvited, the firewall has no
confirmation that it's a reply packet.
To connect to a server "serverName" at TCP port 80, and send some
data to it, you'd call:
- skt_ip_t ip=skt_lookup_ip(serverName); to look up the
server's IP address.
- SOCKET s=skt_connect(ip,80,2); to connect to that
server. "80" is the TCP port number. "2" is the
timeout in seconds.
- skt_sendN(s,"hello",5); to send the 5-byte string
"hello" to the other side. You can now repeatedly send and
receive data with the other side.
- skt_close(s); to close the socket afterwards.
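At the raw socket level, send() and recv() on a TCP stream are allowed to transfer fewer bytes than you asked for, so "send exactly N bytes" is usually a loop. The following is a sketch of that idea, not the actual skt_sendN source; the names send_all, recv_all, and stream_roundtrip are invented here, and the demo uses a local socketpair (POSIX only) so it needs no network.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>

// Keep calling send() until all n bytes are written. Returns false on error.
bool send_all(int fd, const char *buf, size_t n) {
    while (n > 0) {
        ssize_t k = send(fd, buf, n, 0);
        if (k < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: just retry
            return false;
        }
        buf += k;
        n -= (size_t)k;
    }
    return true;
}

// Keep calling recv() until exactly n bytes arrive.
// Returns false on error, or if the other side closes early.
bool recv_all(int fd, char *buf, size_t n) {
    while (n > 0) {
        ssize_t k = recv(fd, buf, n, 0);
        if (k < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        if (k == 0) return false;  // connection closed before we got n bytes
        buf += k;
        n -= (size_t)k;
    }
    return true;
}

// Demo: push a short message through a connected pair of local stream sockets.
// (Keep msg small: with one thread, a message larger than the kernel buffer
// would block in send_all before anything reads it.)
std::string stream_roundtrip(const std::string &msg) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "";
    std::string got(msg.size(), '?');
    bool ok = send_all(fds[0], msg.data(), msg.size()) &&
              recv_all(fds[1], &got[0], got.size());
    close(fds[0]);
    close(fds[1]);
    return ok ? got : "";
}
```

The EINTR check is one example of an error that is not fatal: the call was merely interrupted, and retrying is the right response.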
Here's an example TCP client in NetRun:
#include "osl/socket.h" /* <- Dr. Lawlor's funky networking library */
#include "osl/socket.cpp"
int foo(void) {
    skt_ip_t ip=skt_lookup_ip("127.0.0.1");
    unsigned int port=80;
    SOCKET s=skt_connect(ip,port,2);
    skt_sendN(s,"hello",5);
    skt_close(s);
    return 0;
}
(executable NetRun link)
Easy, right? The same program is a great deal longer in pure
Berkeley sockets, since you've got to deal with error handling (and
not all errors are fatal!), a long and complicated address setup
process, etc.
This same code works in Windows, too. On NetRun, "Download
this file as a .tar archive" to get the socket.h and socket.cpp files.
To listen on a socket, you create a server socket and then accept
connections from incoming clients. This program accepts
exactly one client, but you typically have a loop (or multiple
threads) between accept and close, to keep accepting clients
indefinitely.
#include <iostream>
#include <string>
#include "osl/socket.h"
#include "osl/socket.cpp" /* include body for easy linking */
int foo(void)
{
    unsigned int port=8888;
    SERVER_SOCKET serv=skt_server(&port);
    std::cout<<"Waiting for connections on port "
        <<port<<"\n";
    skt_ip_t client_ip; unsigned int client_port;
    SOCKET s=skt_accept(serv,&client_ip,&client_port);
    std::cout<<"Connection from "
        <<skt_print_ip(client_ip)
        <<":"<<client_port<<"!\n";
    /* Receive some data from the client */
    std::string buf(3,'?');
    skt_recvN(s,&buf[0],3);
    std::cout<<"Client sent data '"<<buf<<"'\n";
    /* Send some data back to the client */
    skt_sendN(s,"gdaymate\n",9);
    skt_close(s);
    std::cout<<"Closed socket to client\n";
    skt_close(serv);
    return 0;
}
(executable NetRun link)
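For comparison, here is a sketch of the same listen, connect, accept, send, receive sequence in plain POSIX sockets, with one process playing both client and server over loopback. It is not the osl/socket.h implementation; the function names are invented, it uses a kernel-chosen port rather than 8888, and it skips the timeout that skt_connect provides.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstring>
#include <string>

// Create a TCP socket listening on 127.0.0.1. If *port is 0 the kernel
// picks a free port; *port is updated to the port actually in use.
// Returns the listening socket, or -1 on error.
int tcp_listen_loopback(unsigned short *port) {
    int serv = socket(AF_INET, SOCK_STREAM, 0);
    if (serv < 0) return -1;
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(*port);
    socklen_t len = sizeof(addr);
    if (bind(serv, (const sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(serv, 5) != 0 ||
        getsockname(serv, (sockaddr *)&addr, &len) != 0) {
        close(serv);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return serv;
}

// Connect to 127.0.0.1:port. Returns the connected socket, or -1 on error.
int tcp_connect_loopback(unsigned short port) {
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0) return -1;
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(s, (const sockaddr *)&addr, sizeof(addr)) != 0) {
        close(s);
        return -1;
    }
    return s;
}

// Demo: the kernel completes the TCP handshake as soon as the server is
// listening, so connect() can return before accept() is even called.
std::string tcp_loopback_demo() {
    unsigned short port = 0;
    int serv = tcp_listen_loopback(&port);
    if (serv < 0) return "";
    int client = tcp_connect_loopback(port);
    int conn = (client >= 0) ? accept(serv, NULL, NULL) : -1;
    std::string got;
    if (conn >= 0 && send(client, "hey", 3, 0) == 3) {
        char buf[3];
        ssize_t n = recv(conn, buf, sizeof(buf), MSG_WAITALL);  // wait for all 3 bytes
        if (n > 0) got.assign(buf, (size_t)n);
    }
    if (conn >= 0) close(conn);
    if (client >= 0) close(client);
    close(serv);
    return got;
}
```

Notice that accept returns a brand new socket for the conversation, while the listening socket stays open for further clients.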
If you're on campus (so the firewall will let port 8888 traffic
through), and you're fast enough (NetRun times out after 2 seconds),
you can actually connect to this server from a web browser, at http://sandy.cs.uaf.edu:8888/.
Note we're just sending back bare data without an HTTP header, so
some browsers (like Chrome) won't accept this; Firefox seems to
work.
There are a few other caveats to server sockets:
- Only one program can listen on a given TCP port (like web port
80) at a time. This is because the OS needs to know where
to route incoming connections. The standard solution is to
have one front-end server that hands requests off to whatever
backend is needed--a web server like Apache can be configured to
forward requests like this.
- On UNIX systems, only root can create servers on ports with
numbers below 1024, as a way of preventing ordinary users from
running rogue web or email servers. You can work
around this by running the program as root, such as via
setuid (be sure to switch to a less powerful user as soon as the
port is open), or using "sudo setcap cap_net_bind_service=+ep
<yourprogram>".
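The "one listener per port" rule is easy to see directly. This sketch, in plain POSIX sockets with a made-up function name, binds one TCP socket to a kernel-chosen loopback port and then tries to bind a second socket to the same port; on the systems I know of, the second bind fails with EADDRINUSE.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstring>

// Bind two TCP sockets to the same loopback port. Returns the errno from
// the second bind, 0 if it unexpectedly succeeded, or -1 if setup failed.
int second_bind_errno() {
    int first = socket(AF_INET, SOCK_STREAM, 0);
    int second = socket(AF_INET, SOCK_STREAM, 0);
    int result = -1;
    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);  // let the kernel choose a free port
    socklen_t len = sizeof(addr);
    if (first >= 0 && second >= 0 &&
        bind(first, (const sockaddr *)&addr, sizeof(addr)) == 0 &&
        listen(first, 5) == 0 &&
        getsockname(first, (sockaddr *)&addr, &len) == 0) {
        // addr now holds the port the first socket owns; try to take it again.
        if (bind(second, (const sockaddr *)&addr, sizeof(addr)) == 0) result = 0;
        else result = errno;
    }
    if (first >= 0) close(first);
    if (second >= 0) close(second);
    return result;
}
```

This is the error you will typically hit when a previous copy of your server is still running.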
Example Protocol: HTTP
HTTP
is the protocol used by web pages (that's why URLs usually start
with "http://"). HTTP servers listen on port 80 by default,
but you can actually use the :port syntax to connect to any port you
like (for example, "http://lawlor.cs.uaf.edu:8888/some_url").
An HTTP client, like a web browser, starts by doing a DNS lookup on
the server name. That's the "resolving host name" message you
see in your browser. The browser then does a TCP connection to
that port on the server ("Connecting to server").
Once connected, the HTTP client usually sends a "GET" request (or
sometimes "POST" if there's form data to upload). Here's the
simplest possible GET request:
"GET / HTTP/1.0\r\n\r\n"
Note the DOS \r\n newlines, and the extra newline at the end of the
request--a blank line signifies the end of the headers. You
can list a bunch of optional data in your GET request, like
the languages you're willing to accept ("Accept-Language:
en-us\r\n") and so on. HTTP 1.1 (not 1.0) requires a Host to
be listed in the request ("Host: www.foobar.com\r\n"), which is used
by virtual hosts.
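As a small illustration of that request format, here is a sketch that assembles a minimal HTTP/1.1 GET request as a string. The helper name build_get_request is hypothetical; it emits only the request line, the required Host header, and the terminating blank line.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a minimal HTTP/1.1 GET request.
// Every line ends in \r\n, and an empty line marks the end of the headers.
std::string build_get_request(const std::string &host, const std::string &path) {
    std::string req;
    req += "GET " + path + " HTTP/1.1\r\n";  // request line
    req += "Host: " + host + "\r\n";         // required by HTTP 1.1
    req += "\r\n";                           // blank line == end of request
    return req;
}
```

For example, build_get_request("www.foobar.com", "/") yields "GET / HTTP/1.1\r\nHost: www.foobar.com\r\n\r\n", ready to hand to a send call.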
The HTTP server then sends back some sort of reply.
Officially, this is supposed to be a "HTTP/1.1 200 OK\r\n" followed
by another set of line-oriented ASCII optional data, such as the
Content-Length in bytes ("Content-Length: 187\r\n"). But many
browsers will print out plain ASCII text if you just return that.
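A client that wants to know whether the request worked has to pick the numeric code out of that first reply line. Here is one way to do it, as a sketch using sscanf; the function name parse_status_code is made up, and returning -1 for anything that does not look like a status line is my own convention.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns the numeric status code from a line like "HTTP/1.1 200 OK",
// or -1 if the line does not look like an HTTP status line.
int parse_status_code(const std::string &line) {
    int major = 0, minor = 0, code = 0;
    if (std::sscanf(line.c_str(), "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    return code;
}
```

A reply that skips the status line entirely, like bare text, comes back as -1, which is one way a client could detect a server that is not really speaking HTTP.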
Here's an example of a real HTTP exchange between Firefox and
Apache:
Firefox connects to server. Apache accepts the connection.
Firefox, the client, sends this ASCII data, with DOS newlines:
GET /my_name_is_url.html HTTP/1.1
Host: lawlor.cs.uaf.edu:8888
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.9) Gecko/20070126 Ubuntu/dapper-security Firefox/1.5.0.9
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: __utmz=62224958.1163103248.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utma=62224958.570638686.1163103248.1163107343.1164832326.3
<- blank line at end of HTTP request headers
Apache, the server, sends this data back:
HTTP/1.1 200 OK
Date: Fri, 06 Apr 2007 20:20:50 GMT
Server: Apache/2.0.55 (Ubuntu)
Accept-Ranges: bytes
Content-Length: 9443
Connection: close
Content-Type: text/html; charset=UTF-8
<- blank line at end of HTTP response headers
<html><head><title>UAF Department of ... rest of web page, total of 9443 bytes after blank line
This is a pretty simple, ASCII-based protocol. The only binary
data is the contents of the web resource, transmitted by the server
after the blank line. The "Content-Length:" field tells the
client how many bytes to expect.
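Pulling the byte count out of a header line can be sketched like this. HTTP header names are case-insensitive and the space after the colon is optional, so this version compares the name without regard to case and lets strtol skip any leading whitespace. The function name and the -1 "not found" convention are assumptions of this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

// Returns the Content-Length value if this header line carries one, else -1.
long content_length_from_header(const std::string &line) {
    const std::string name = "content-length:";
    if (line.size() < name.size()) return -1;
    for (size_t i = 0; i < name.size(); i++)
        if (std::tolower((unsigned char)line[i]) != name[i]) return -1;
    const char *start = line.c_str() + name.size();
    char *end = 0;
    long value = std::strtol(start, &end, 10);  // skips leading spaces itself
    if (end == start || value < 0) return -1;   // no digits found, or nonsense
    return value;
}
```

A client would call this on each header line until the blank line, then read exactly that many bytes of body.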
A typical very simple custom web client might look like this:
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>
#include "osl/socket.h"
#include "osl/socket.cpp"
int foo(void) {
    skt_ip_t ip=skt_lookup_ip("137.229.25.247"); // lawlor.cs.uaf.edu
    SOCKET s=skt_connect(ip,80,2);
    /* Send off HTTP request to server (with URL) */
    const char *req=
        "GET / HTTP/1.1\r\n" // the "/" is the URL I'm requesting
        "Host: lawlor.cs.uaf.edu\r\n" // hostname is required for HTTP 1.1
        "User-Agent: Raw socket example code (lawlor@alaska.edu)\r\n" // web browser ID string
        "\r\n"; // blank line == end of HTTP request
    skt_sendN(s,req,strlen(req));
    /* Receive HTTP response headers, up to the blank line */
    std::string response;
    int length=0;
    while ((response=skt_recv_line(s))!="")
    {
        std::cout<<response<<"\n";
        if (response.substr(0,15)=="Content-Length:")
            length=atoi(response.substr(15).c_str()); // atoi skips the space after the colon
    }
    /* Receive HTTP response data, and print it */
    std::cout<<"-- bottom line: "<<length<<" bytes of data\n";
    if (length>0 && length<10000) { // sanity check
        std::vector<char> page(length); // place to store data
        skt_recvN(s,&page[0],length); // grab data from server
        for (int i=0;i<length;i++) std::cout<<page[i]; // print to screen
    }
    skt_close(s);
    return 0;
}
A typical similarly simplified web server might look like
this. Note it just keeps being a webserver forever:
#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>
#include "osl/socket.h"
#include "osl/socket.cpp"
int foo(void) {
    /* Make a TCP socket listening on port 8888 */
    unsigned int port=8888;
    SERVER_SOCKET serv=skt_server(&port);
    /* Keep servicing clients */
    while (1) {
        std::cout<<"Waiting for connections on port "<<port<<"\n";
        skt_ip_t client_ip; unsigned int client_port;
        SOCKET s=skt_accept(serv,&client_ip,&client_port);
        std::cout<<"Connection from "<<skt_print_ip(client_ip)<<":"<<client_port<<"!\n";
        // Grab HTTP request line, typically GET /url HTTP/1.1
        std::string req=skt_recv_line(s);
        std::cout<<"Client request: "<<req<<"\n";
        // Grab rest of HTTP header info (mostly useless)
        std::string hdr;
        while ((hdr=skt_recv_line(s))!="") std::cout<<"Client header: "<<hdr<<"\n";
        // Prepare HTTP response header
        std::string page="<html><body>IT WORKED!</body></html>";
        char response[1024];
        snprintf(response,sizeof(response), // needed to get the page length into the string
            "HTTP/1.1 200 OK\r\n"
            "Server: Random example code (lawlor@alaska.edu)\r\n"
            "Content-Length: %d\r\n"
            "\r\n" // blank line: end of header
            ,(int)page.length());
        skt_sendN(s,&response[0],strlen(response));
        skt_sendN(s,&page[0],page.length());
        skt_close(s);
    }
    return 0; // never reached: the loop above runs forever
}
(Try this in NetRun now!)
Note that using these for real production servers sounds like a
terrible idea, but they do work for simple testing.
Command Line Socket Fun
No web browser? "telnet lawlor.cs.uaf.edu 80" and manually
type in an HTTP request like "GET / HTTP/1.0", followed by a blank line.
No web server? "nc -l 8888" and manually type out HTTP response headers.
Need to just grab a web page from the command line?
wget https://lawlor.cs.uaf.edu/
Want to see the SSL certificate verification process?
openssl s_client -connect www.google.com:443