Semester Project: Part 1

Background Research

CS 321 Homework, Dr. Lawlor, 2006/02/03. Due Friday, Feb 10 at 5pm.

You'll be working on a single software development project all semester long, and you get to choose the topic and the scope. It should be a topic that can be scoped big enough to make sense to work on it for months; but it shouldn't be so big you can't get anything running in one semester. Figuring out the scope of a given task is probably the hardest problem in computer science! Luckily, knowing something about the topic makes it a lot easier--and that's the real point of this assignment.

I'd like you to give me one text file, including one paragraph describing your topic--generally what kind of a thing you'd like to build. See the example projects below for ideas.

In your text file, I'd like you to include two links to websites that describe projects on similar topics. For each link, include a one-sentence summary of what it does, and another sentence describing how your project will be different or better.

I'd also like you to include links to five "reference" websites that include information you think will be specifically useful in completing the project. Don't just give me a meta reference like "http://google.com" or a website describing processor instructions or assembly programming in general--if there's no obvious and direct link to your project, it doesn't count!

This assignment is not about the scope of the project. I don't want you to start planning exactly what to do, or how you plan to do it. It's about the topic--figure out what's out there, and how you can add something. This is actually the first step you should take in any real project!

Turn in your text file by attaching it to Blackboard under "proj_bg". 30% of the grade is for the one paragraph topic description, 20% is for the links to two similar projects with short analysis, and the remaining 50% is for the five bare reference links (so they better be good links!). All seven links have to be different from each other (where "different" means at least having different domain names).

Example Projects

Choose one, or make up your own!

Write any interesting Linux or Windows kernel module, to do something useful in kernel space. You've got to figure out how to get data into the kernel module, how to do your processing, and how to get the results back out.
Implement anything interesting using memory mapping: implement your own paged virtual memory, your own software distributed shared memory, your own bigger-than-memory disk-backed hashtable or data structure.
Implement a device driver for any operating system. If you choose to write a user-space driver, it must talk to real hardware (e.g., a USB device--google "libusb"). If you choose to write a kernel-space driver, the "device driver" can actually generate hardcoded or random data, but it must be accessible from outside the kernel (google "linux character device").
Implement your own dynamic allocation routines. These can use any style you like: first-fit like malloc, tree-based, mmap-based, etc. You might do this to add a layer of error checking to find memory leaks, detect heap corruption, etc.
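As a rough illustration of the first-fit style mentioned above, here is a minimal sketch of an allocator that hands out pieces of one fixed static arena. The names my_malloc and my_free, the 4096-byte arena, and the header layout are all assumptions made for this example; a real project would likely get its memory from the OS (e.g. via mmap) and add the error checking described above.

```c
#include <stddef.h>

#define ARENA_BYTES 4096  /* assumed arena size, for illustration only */

/* A union of "big" types, used to pick a conservative alignment. */
typedef union { long double ld; long long ll; void *p; } align_t;
#define ALIGNMENT (sizeof(align_t))

/* Header stored in front of every block, free or in use. */
typedef struct block {
    size_t size;         /* payload bytes that follow the header */
    int is_free;         /* 1 if the block can be handed out */
    struct block *next;  /* next block, in address order */
} block_t;

static align_t arena[ARENA_BYTES / sizeof(align_t)];
static block_t *first_block = NULL;

/* Round n up to a multiple of ALIGNMENT. */
static size_t round_up(size_t n) {
    return (n + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
}
#define HEADER_BYTES (round_up(sizeof(block_t)))

/* First fit: return the first free block that is big enough. */
void *my_malloc(size_t n) {
    block_t *b;
    if (n == 0 || n > sizeof(arena)) return NULL;
    n = round_up(n);
    if (first_block == NULL) { /* first call: the arena is one big free block */
        first_block = (block_t *)arena;
        first_block->size = sizeof(arena) - HEADER_BYTES;
        first_block->is_free = 1;
        first_block->next = NULL;
    }
    for (b = first_block; b != NULL; b = b->next) {
        if (!b->is_free || b->size < n) continue;
        if (b->size >= n + HEADER_BYTES + ALIGNMENT) {
            /* Split off the unused tail as a new free block. */
            block_t *rest = (block_t *)((unsigned char *)b + HEADER_BYTES + n);
            rest->size = b->size - n - HEADER_BYTES;
            rest->is_free = 1;
            rest->next = b->next;
            b->size = n;
            b->next = rest;
        }
        b->is_free = 0;
        return (unsigned char *)b + HEADER_BYTES;
    }
    return NULL; /* no free block is large enough */
}

/* Mark the block free and merge it with any free blocks right after it. */
void my_free(void *p) {
    block_t *b;
    if (p == NULL) return;
    b = (block_t *)((unsigned char *)p - HEADER_BYTES);
    b->is_free = 1;
    while (b->next != NULL && b->next->is_free) {
        b->size += HEADER_BYTES + b->next->size;
        b->next = b->next->next;
    }
}
```

Because every block carries a header, this is also a natural place to hang leak or corruption checks (for example, a magic number in the header that my_free verifies). Note that this sketch only merges a freed block with the blocks after it, so it can still fragment.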
Implement a "file system info" routine, to extract some useful piece of information from the on-disk layout of a filesystem of your choice. For example, you could walk the FAT table on a FAT volume, print the total used inodes in an ext2 volume, or display the size of an HFS volume. I can provide binary files with images of these filesystems for you to test your code. (Google: "FAT filesystem header", "ext2 superblock", "HFS volume header")
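To give a feel for the ext2 case, here is a small sketch that computes the number of used inodes from an in-memory copy of a volume image. It relies on the standard ext2 layout: the superblock starts 1024 bytes into the volume, the total and free inode counts are little-endian 32-bit fields at superblock offsets 0 and 16, and the 16-bit magic number 0xEF53 sits at offset 56. The function name ext2_used_inodes and its return convention are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT2_SUPERBLOCK_OFFSET 1024  /* superblock starts 1024 bytes in */
#define EXT2_MAGIC 0xEF53u           /* value of the s_magic field */

/* On-disk ext2 fields are little-endian; decode them byte by byte so the
   code works regardless of the host's byte order. */
static uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: store the used inode count in *used_out and return 0,
   or return -1 if the image is too short or does not look like ext2. */
int ext2_used_inodes(const unsigned char *image, size_t image_bytes,
                     uint32_t *used_out) {
    const unsigned char *sb;
    uint32_t total, free_count;
    if (image_bytes < EXT2_SUPERBLOCK_OFFSET + 58) return -1;
    sb = image + EXT2_SUPERBLOCK_OFFSET;
    if (read_le16(sb + 56) != EXT2_MAGIC) return -1; /* s_magic */
    total = read_le32(sb + 0);                       /* s_inodes_count */
    free_count = read_le32(sb + 16);                 /* s_free_inodes_count */
    if (free_count > total) return -1;               /* corrupt superblock */
    *used_out = total - free_count;
    return 0;
}
```

In a real tool you would read the first couple of kilobytes of the image file into a buffer and pass that buffer to the function; checking the magic number first guards against being handed a FAT or HFS image by mistake.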
Implement a program or library that does something interesting with mapped memory (google "mmap").
Write a "plugin" library, to take a dynamically linked library file name, load the library, and call one of its routines. This exact interface is used by, e.g., browser plugins. (Google: "dlopen", "LoadLibrary").
Write a cross-platform kernel thread library, that can at least create and coordinate threads. You must make the library work on Linux (using pthreads, see pthread_create) and Windows (see CreateThread). This library would be useful to anyone writing cross-platform threaded code. The biggest design decision is to choose a set of synchronization primitives: you may choose to implement locks (mutual exclusion routines).
Very simple example code:
```
#include "mythread.h"
void function1(void *myData) { ... }
void function2(void *myData) { ... }
int main() {
	void *myData=...;
	mythread_start(function1,myData); 
	mythread_start(function2,myData); 
	... /* function1 and function2 should now both be running */
}
```
Write a cross-platform "start this other program alongside me" library. This is only slightly different from the standard "system" call, which suspends the caller, in that both the old and new programs would run simultaneously. This library would be useful to, for example, start a long-running network file copy via ssh, while allowing the calling program to continue running. You must make the library work on both UNIX (using fork) and Windows (using CreateProcess). Decide on how to handle command-line arguments, input/output, and how to deal with the termination of either the creating or created program.
Complete example code:
```
#include "myprocess.h"
int main() {
	myprocess_start("program1 argument1"); 
	myprocess_start("program2 argument2"); 
	... /* program1 and program2 should now both be running */
}
```
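For the UNIX half only, one possible shape for myprocess_start is sketched below. It makes two assumptions that are not part of the assignment: the command string is handed to /bin/sh (which sidesteps argument parsing), and a second, hypothetical routine myprocess_wait lets the caller collect the child's exit code later. A Windows version built on CreateProcess would need its own implementation behind the same interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start `command` in a child process and return immediately.
   Returns the child's process id, or -1 if fork failed. */
pid_t myprocess_start(const char *command) {
    pid_t pid = fork();
    if (pid == 0) {
        /* Child: replace ourselves with a shell that runs the command. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127); /* only reached if execl itself failed */
    }
    return pid; /* parent: child's pid, or -1 on error */
}

/* Hypothetical companion: block until the child finishes.
   Returns its exit code, or -1 if it was killed or waitpid failed. */
int myprocess_wait(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Going through the shell is a design choice: it is convenient, but the shell then interprets quotes and wildcards in the command, which may or may not be what you want. A child that is never waited for lingers as a zombie until the parent exits, which is one of the termination questions the project description asks you to decide.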
Write a deadlock-detecting lock library. The interface could be a wrapper around the lock/unlock routines of some thread library (such as pthreads). Whenever acquiring a lock would lead to deadlock, instead of hanging, your library must print an error message and abort. Be sure to handle all possible types of deadlock, including locking a lock you already hold (self-dependency), and multi-way deadlock (where A waits for B, B waits for C, and C waits for A). This library would be useful if you suspect your multithreaded code may have a deadlock problem. You can restrict yourself to only one type of synchronization primitive, such as locks, and ignore other sorts of synchronization (flags, files, etc.).
Complete example code:
```
#include "mylock.h"
int main() {
	mylock_t l; /* a lock */
	mylock_create(&l); /* initialize the lock */
	mylock_lock(&l); /* lock it */
	mylock_lock(&l); /* whoops!  Locked it twice!  Should cause error. */
}
```
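One way to think about the detection step is as a walk along a "waits-for" chain: the lock I want is held by some thread, which may itself be blocked on another lock, and so on. If the walk ever arrives back at me, acquiring the lock would deadlock. The sketch below shows only that bookkeeping, with no real threads involved; the lockinfo_t type, the small-integer thread ids, and the waiting_for table are assumptions made for the example, and a real library would have to protect these tables with a lock of its own.

```c
#include <stddef.h>

#define NO_THREAD (-1)
#define MAX_THREADS 16

/* Bookkeeping for one lock: which thread holds it, or NO_THREAD. */
typedef struct { int owner; } lockinfo_t;

/* waiting_for[t] is the lock thread t is currently blocked on, or NULL. */
static const lockinfo_t *waiting_for[MAX_THREADS];

/* Return 1 if `thread` blocking on `lock` could never be satisfied. */
int mylock_would_deadlock(int thread, const lockinfo_t *lock) {
    const lockinfo_t *l = lock;
    int steps;
    for (steps = 0; steps <= MAX_THREADS; steps++) {
        int owner = l->owner;
        if (owner == NO_THREAD) return 0; /* lock is free: no deadlock */
        if (owner == thread) return 1;    /* chain leads back to us */
        l = waiting_for[owner];           /* what is the owner stuck on? */
        if (l == NULL) return 0;          /* owner is running: no deadlock */
    }
    /* The chain never ended, so it loops among other threads; waiting on
       it would hang forever too, so report it conservatively. */
    return 1;
}
```

The lock routine would call this check before blocking, then record the wait in waiting_for while blocked and clear it once the lock is acquired. Note that the self-dependency case from the example above falls out of the same walk: the owner of the requested lock is the requesting thread itself.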
Write a library to dump out the contents of some executable format. For example, you might choose to work with Windows EXE PE format executables, or UNIX 32-bit ELF executables. The library should at least be able to print out the approximate size of the executable program code (machine code), and the size of the initialized global data (like strings). This library would be useful to estimate the memory cost of loading a program. This project will mostly consist of reading the executable format documentation and doing testing.
Example program run:
```
C:/> dir
 someprogram.exe   mydump.exe
C:/> mydump someprogram.exe
Contents of executable someprogram.exe:
   - 7132 bytes of executable code
   - 452 bytes of initialized global data.
```
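For the 32-bit ELF case, a first approximation can come from the program headers alone: add up the file sizes of the loadable segments, counting executable segments as code and writable ones as data. The sketch below does this for a little-endian ELF32 file already read into memory. The field offsets are the standard ELF32 ones (e_phoff at 28, e_phentsize at 42, e_phnum at 44; p_filesz at 16 and p_flags at 24 within each 32-byte program header), while the function and type names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t code_bytes, data_bytes; } elf_sizes_t;

/* ELF fields are decoded byte by byte (little-endian files only). */
static uint16_t elf_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}
static uint32_t elf_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: fill *out and return 0, or return -1 if the buffer
   is not a well-formed little-endian 32-bit ELF file. */
int elf32_sizes(const unsigned char *file, size_t n, elf_sizes_t *out) {
    uint32_t phoff;
    uint16_t phentsize, phnum, i;
    if (n < 52 || memcmp(file, "\177ELF", 4) != 0) return -1;
    if (file[4] != 1 || file[5] != 1) return -1; /* ELFCLASS32, ELFDATA2LSB */
    phoff = elf_le32(file + 28);     /* e_phoff: program header table */
    phentsize = elf_le16(file + 42); /* e_phentsize: bytes per entry */
    phnum = elf_le16(file + 44);     /* e_phnum: number of entries */
    if (phnum > 0 && phentsize < 32) return -1;
    out->code_bytes = 0;
    out->data_bytes = 0;
    for (i = 0; i < phnum; i++) {
        uint64_t off = (uint64_t)phoff + (uint64_t)i * phentsize;
        const unsigned char *ph;
        uint32_t filesz, flags;
        if (off + 32 > n) return -1;        /* header runs off the end */
        ph = file + off;
        if (elf_le32(ph) != 1) continue;    /* p_type: only PT_LOAD */
        filesz = elf_le32(ph + 16);         /* p_filesz */
        flags = elf_le32(ph + 24);          /* p_flags */
        if (flags & 1) out->code_bytes += filesz;      /* PF_X: executable */
        else if (flags & 2) out->data_bytes += filesz; /* PF_W: writable */
    }
    return 0;
}
```

This is deliberately approximate: an executable segment often also contains the ELF headers and read-only data, so walking the section headers (.text, .data) instead would give tighter numbers at the cost of more parsing.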

Write a segfault response library. Your library should provide a way to execute some client code whenever a program accesses an out-of-bounds memory address, passing that code at least the offending address. This library would be useful for debugging programs where a regular debugger can't be used, such as on a large parallel machine or in a shipped application. For UNIX, see siginfo & signal(SIGSEGV); for Windows, see SetUnhandledExceptionFilter.

Complete example code:
```
#include <stdio.h>
#include <stdlib.h>
#include "myfault.h"

void handleFault(void *theBadAddress) {
	fprintf(stderr,"Tried to access bad memory at %p\n",theBadAddress);
	/* could, e.g., save really important file here, 
	  or send error report out over network. */
	exit(1);
}

void doStuff(void) {
	int i, len=1000000;
	int *array=(int *)malloc(len); /* caution! len bytes, not len ints! */
	printf("Allocated array at %p\n",(void *)array);
	for (i=0;i<len;i++) array[i]=0; /* will hit bad address */
	printf("Done.\n");
}

int main() {
	myfault_handler(handleFault); /* call handleFault on segfault */
	doStuff();
}
```

Write a library to allow easy access to the machine's timer interrupt. For example, the library could provide a routine that will execute some client code after the specified number of microseconds have elapsed, without just waiting for that many microseconds. This could be a wrapper around the UNIX asynchronous signal interface setitimer & signal(SIGALRM) or the Windows equivalents.

Example code:
```
#include <stdio.h>
#include <stdlib.h>
#include "mytimer.h"

class timeHandler : public mytimer_class {
public:
	int stuffDone;
	/* Runs after 2 seconds have passed. */
	virtual void execute(void) {
		if (!stuffDone) {
			printf("This stuff is taking too long!  Goodbye!\n");
			exit(1);
		}
	}
};

int main() {
	timeHandler h;
	h.stuffDone=0;
	mytimer_callafter(h,2.0); /* call h.execute after 2 seconds */
	doStuff(); /* if this takes too long, exit! */
	h.stuffDone=1;
	...
}
```
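
On the UNIX side, the underlying mechanism might look roughly like the plain C sketch below: install a SIGALRM handler with sigaction, then arm a one-shot timer with setitimer. The routine name mytimer_callafter_us, the function-pointer callback, and the mark_too_slow example client are assumptions for illustration and do not match the class-based interface shown above.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>

typedef void (*mytimer_fn)(void);

static mytimer_fn client_fn = NULL; /* client code to run when the timer fires */

/* Runs asynchronously when SIGALRM is delivered. */
static void on_alarm(int sig) {
    (void)sig;
    if (client_fn != NULL) client_fn();
}

/* Arrange for fn to be called once, usec microseconds from now.
   Returns 0 on success, -1 on failure. */
int mytimer_callafter_us(mytimer_fn fn, long usec) {
    struct sigaction sa;
    struct itimerval tv;
    if (fn == NULL || usec <= 0) return -1;
    client_fn = fn; /* set before the timer is armed */
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) != 0) return -1;
    tv.it_interval.tv_sec = 0; /* zero interval: fire once, do not repeat */
    tv.it_interval.tv_usec = 0;
    tv.it_value.tv_sec = usec / 1000000;
    tv.it_value.tv_usec = usec % 1000000;
    return setitimer(ITIMER_REAL, &tv, NULL);
}

/* Example client: just record that the deadline passed. */
static volatile sig_atomic_t too_slow = 0;
void mark_too_slow(void) { too_slow = 1; }
```

Because the client code runs inside a signal handler, it should restrict itself to async-signal-safe operations, such as setting a flag like too_slow that the main program checks later. Calling printf from the handler, as the example above does, is not strictly safe, and deciding how to deal with that restriction is part of designing the library.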

Write your own implementation of threads. For example, threads can be created and switched from inside ordinary code using the SYSV Unix makecontext/swapcontext routines (I also have a Windows version of these routines available if you need it); or for the truly hardcore, your own context-switching assembly code. Threads can also be simulated via a big "switch (program_counter)" statement in C/C++. Such a library would be useful to avoid some of the silly limitations of kernel threads--for example, both Linux and Windows have hard limits on the number of kernel threads you can create. You'll have to decide exactly how threads are created and switched, but you can ignore synchronization.
Complete example code:
```
#include <stdio.h>
#include "mythread.h"

class sequencer : public myevent_class {
public:
	const char *what; /* what we're doing */
	int n; /* number of steps it takes to do it */
	sequencer(const char *what_,int n_) {what=what_; n=n_;}
	virtual void execute(void) {
		for (int i=0;i<n;i++) {
			printf("%s: step %d of %d\n",what,i,n); /* or real code here... */
			mythread_schedule(); /* run other threads if possible */
		}
	}
};

int main() {
	sequencer s1("foo",6), s2("bar",3);
	mythread_create(&s1); /* will run s1.execute() once */
	mythread_create(&s2); /* will run s2.execute() once */
	mythread_run(); /* keep running threads until they all exit */
}
```
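To show the makecontext/swapcontext route in miniature, here is a plain C sketch of a round-robin scheduler. It assumes a platform that still ships the ucontext routines (glibc on Linux does), and it uses a hypothetical function-pointer interface rather than the class-based one above. The demo_task routine and the trace buffer exist only to make the interleaving visible.

```c
#include <stdlib.h>
#include <ucontext.h>

#define MAX_THREADS 8
#define STACK_BYTES (64 * 1024) /* assumed per-thread stack size */

typedef void (*mythread_fn)(void *);

static ucontext_t main_ctx, ctx[MAX_THREADS];
static mythread_fn fn[MAX_THREADS];
static void *arg[MAX_THREADS];
static int done[MAX_THREADS];
static int nthreads = 0, current = -1;

/* First code run on each new stack: call the client routine, mark the
   thread finished, then return, which resumes uc_link (the scheduler). */
static void trampoline(void) {
    fn[current](arg[current]);
    done[current] = 1;
}

/* Register a new thread; it starts running inside mythread_run. */
int mythread_create(mythread_fn f, void *a) {
    int id = nthreads;
    void *stack;
    if (id == MAX_THREADS || getcontext(&ctx[id]) != 0) return -1;
    stack = malloc(STACK_BYTES); /* never freed in this sketch */
    if (stack == NULL) return -1;
    ctx[id].uc_stack.ss_sp = stack;
    ctx[id].uc_stack.ss_size = STACK_BYTES;
    ctx[id].uc_link = &main_ctx;
    fn[id] = f; arg[id] = a; done[id] = 0;
    makecontext(&ctx[id], trampoline, 0);
    nthreads++;
    return id;
}

/* Called from inside a thread: hand control back to the scheduler. */
void mythread_schedule(void) {
    if (current >= 0) swapcontext(&ctx[current], &main_ctx);
}

/* Round-robin over the threads until every one has finished. */
void mythread_run(void) {
    int live = 1, i;
    while (live) {
        live = 0;
        for (i = 0; i < nthreads; i++) {
            if (done[i]) continue;
            live = 1;
            current = i;
            swapcontext(&main_ctx, &ctx[i]);
        }
    }
    current = -1;
}

/* Demo thread body: log the first letter of its name twice, yielding
   to the other threads after each step. */
static char trace[16];
static int trace_len = 0;
void demo_task(void *name) {
    int i;
    for (i = 0; i < 2; i++) {
        trace[trace_len++] = ((const char *)name)[0];
        mythread_schedule();
    }
}
```

Creating two demo_task threads named "a" and "b" and then calling mythread_run leaves "abab" in trace, since each thread yields after every step. The fixed-size tables and the leaked stacks are shortcuts; lifting limits like these is exactly the kind of design decision the project asks you to make.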
Write a library to list the files in a directory--a directory traversal library. On UNIX, you can walk a directory using the opendir/readdir/closedir routines (in dirent.h). Under Windows, you can walk them with _findfirst, _findnext, _findclose (in io.h). You'll have to decide if you want to report just files, or files and directories--ideally, you should be able to ask for both.
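
On the UNIX side, a bare-bones version of such a library could be as small as the sketch below, which calls a client-supplied function once per directory entry. The name mydir_list and the callback signature are assumptions for illustration. It reports files and subdirectories alike, by name only, and skips the "." and ".." entries.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Client callback: receives each entry's name plus an opaque pointer. */
typedef void (*mydir_visit_fn)(const char *name, void *ctx);

/* Call visit (if not NULL) once per entry in the directory `path`.
   Returns the number of entries seen, or -1 if it cannot be opened. */
int mydir_list(const char *path, mydir_visit_fn visit, void *ctx) {
    DIR *d = opendir(path);
    struct dirent *e;
    int count = 0;
    if (d == NULL) return -1;
    while ((e = readdir(d)) != NULL) {
        /* Skip the entries for the directory itself and its parent. */
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        if (visit != NULL) visit(e->d_name, ctx);
        count++;
    }
    closedir(d);
    return count;
}
```

To let callers ask for just files or just directories, one option is to call stat on each entry and pass the result along to the callback. A Windows version would wrap _findfirst/_findnext/_findclose behind the same interface.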