Media and Nesting File Formats

CS 321 2007 Lecture, Dr. Lawlor

There's a beautiful idea inside of most media (audio, video, image) file formats. This idea keeps being re-invented at many different levels, so I'm going to approach it in a weird way--please bear with me!

IN THE BEGINNING WAS THE THINGY

So you write a small class that does some work. This is step one.

class thingy {
public:
	std::string stuff;
	void print(void) {std::cout<<"Thingy "<<this<<" contains '"<<stuff<<"'\n";}
	thingy(std::string s) :stuff(s) {}
};

int foo(void) {
	thingy *t=new thingy("lone thingy");
	t->print();
	return 0;
}

(executable NetRun link)

(Here, "thingy" can be any concrete class.)

AND IT WAS GOOD. THEN WE NEEDED SEVERAL THINGYS

Invariably, you find a situation where you need to keep track of more than one instance of this class. So you make a container class (a vector, a list, and so on) that points to each of your thingys.

class thingy {public: /* One thing */
	std::string stuff;      thingy(std::string s) :stuff(s) {}
	void print(void) {std::cout<<"Thingy "<<this<<" contains '"<<stuff<<"'\n";}
};

class tlist {public: /* A list of thingys */
	std::vector<thingy *> v;
	void print(void) {
		std::cout<<"Thingy-list "<<this<<" contains: {\n";
		for (unsigned int i=0;i<v.size();i++)
			v[i]->print();
		std::cout<<"}\n";
	}
};

int foo(void) {
	tlist *l=new tlist;
	l->v.push_back(new thingy("lone thingy"));
	l->print();
	return 0;
}

(executable NetRun link)

THEN WE NEEDED LISTS OF LISTS OF THINGYS

If you need more lists, you can continue creating containers--adding new classes to store pointers to pointers to pointers to thingys, but this gets complicated and error-prone quickly.

There's another curious and powerful solution--make a list itself be a thingy. This lets you put lists-of-thingys as an item inside the list class, which instantly gives you lists-of-lists-of-thingys, lists-of-lists-of-lists-of-...-thingys, and so on, with zero additional code.

class thingy {public: /* One thing */
	std::string stuff;
	thingy(void) {}   thingy(std::string s) :stuff(s) {}
	virtual void print(void) 
		{std::cout<<"Thingy "<<this<<" contains '"<<stuff<<"'\n";}
	virtual ~thingy() {}
};

class tlist : public thingy {public: /* A list of thingys */
	std::vector<thingy *> v;
	void print(void) {
		std::cout<<"Thingy-list "<<this<<" contains {\n";
		for (unsigned int i=0;i<v.size();i++)
			v[i]->print();
		std::cout<<"}\n";
	}
};

int foo(void) {
	tlist *m=new tlist; 
		tlist *l=new tlist; 
			l->v.push_back(new thingy("lone thingy"));
			l->v.push_back(new thingy("loner's buddy"));
		m->v.push_back(l); 
		m->v.push_back(new thingy("The other thingy"));
	m->print();
	return 0;
}

(executable NetRun link)

Note the key feature that makes this work--a list can now contain other lists, because a list is a thingy.

Places where the recursive-list paradigm is used

Directories and files. Files are thingys. Directories are actually files deep down, that contains lists of other files, some of which are marked as directories.
In the language LISP, there are two kinds of object: "atoms" like integers and characters, and lists. A list can contain other lists or atoms inside it. Everything in the language--structs, arrays, even source code--is built from lists of lists and atoms.
In many media file formats, you've got lots of little chunks of assorted information to keep track of. Sometimes one big chunk, like a header, will contain lots of little chunks. These chunks are thus one version of the recursive-list idea.

Tagged Media File Formats

Most media file formats nowdays are "tagged" formats. Basically, the file consists of a bunch of tagged pieces of data, that look like this:

A readable tag name, like "fmt ". This is almost always 4 ASCII characters long, although JPEG uses 2 binary bytes.
The length of the data to follow. This is usually a binary number, either little or big endian (Microsoft and other PC formats are little endian, Mac or UNIX formats are big-endian), and usually just 4 bytes long. This means there's a 4GB limit on these formats, such as AVI movies and TIFF images!
The data stored in the tag. What's in this portion depends on the tag.

The beautiful part about this is the "data" can be anything. You can even make up your own, new tags, put your own data in them, and your data won't mess up old programs that read your modified files, but you can still read your modified data.

The "data" portion of a tagged chunk can even contain other tagged chunks!

Tagged File Example: .wav

For example, a .wav sound file is stored in Microsoft RIFF format, a simple tagged format. Tag names are 4 ASCII bytes, and lengths are 4 little-endian bytes. Every RIFF file contains one top-level tag, the "RIFF" tag, whose length is the length of the whole file. The data part of the file's main RIFF tag consists of a 4-byte file type, followed by a list of other tagged chunks. For a .wav file, there are only two other chunks: a "fmt " (note the space!) chunk, which describes the sound data format (8 bit or 16 bit, mono or stereo or Dolby 5-channel), and the "data" chunk, which has the actual sound data.

> ./catriff ./in.wav
Tag at start of file 'RIFF'
Length of file == 70212 bytes
Tag for file type 'WAVE'
        Tag of chunk 'fmt '
        Length of chunk == 16 bytes
                -- -- -- -- -- +  -- -- -- +  -- -- -- -- -- -- 
                01 00 01 00 11 2B 00 00 11 2B 00 00 01 00 08 00 
        Tag of chunk 'data'
        Length of chunk == 70176 bytes
                t  x  y  v  w  y  w  w  w  t  t  u  s  t  v  t  
                v  x  v  v  x  w  t  v  x  x  z  y  y  y  y  y  
                z  {  }  ++ ++ ~  ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ 
                ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ 
                74 78 79 76 77 79 77 77 77 74 74 75 73 74 76 74 
                76 78 76 76 78 77 74 76 78 78 7A 79 79 79 79 79 
                7A 7B 7D 7F 7F 7E 80 82 82 87 88 89 8D 8A 8C 8B 
                8A 8D 8E 90 94 95 95 95 95 96 95 94 94 95 97 9A 
                ... skipping 70112 bytes ...

Here's a program I wrote that spits out the contents of any RIFF file. Here are some sample RIFF files, including the audio file displayed above.

Tagged File Example: .avi

AVI stands for "Audio Video Interleaved". .avi files are actually just RIFF files, formatted exactly like .wav files except using different RIFF tags. So you can run the "catriff" program on an AVI file as well:

Tag at start of file 'RIFF'
Length of file == 9234802 bytes
Tag for file type 'AVI '
	Tag of chunk 'LIST'
	Length of chunk == 326 bytes
	Tag of sublist 'hdrl'
		Tag of chunk 'avih'
		Length of chunk == 56 bytes
			++ ++ -- -- ++ ++ -- -- -- -- -- -- -- -- -- -- 
			++ -- -- -- -- -- -- -- -- -- -- -- ++ >  -- -- 
			++ -- -- -- ++ -- -- -- -- -- -- -- -- -- -- -- 
			-- -- -- -- -- -- -- -- 
			A0 86 01 00 B2 9F 0C 00 00 00 00 00 10 00 01 00 
			90 00 00 00 00 00 00 00 02 00 00 00 DD 3E 01 00 
			80 02 00 00 E0 01 00 00 00 00 00 00 00 00 00 00 
			00 00 00 00 00 00 00 00 
		Tag of chunk 'LIST'
		Length of chunk == 116 bytes
		Tag of sublist 'strl'
			Tag of chunk 'strh'
			Length of chunk == 56 bytes
				v  i  d  s  m  j  p  g  -- -- -- -- -- -- -- -- 
				-- -- -- -- ++ ++ -- -- @  B  -- -- -- -- -- -- 
				++ -- -- -- ++ >  -- -- -- '  -- -- -- -- -- -- 
				-- -- -- -- ++ -- ++ -- 
				76 69 64 73 6D 6A 70 67 00 00 00 00 00 00 00 00 
				00 00 00 00 A0 86 01 00 40 42 0F 00 00 00 00 00 
				90 00 00 00 DD 3E 01 00 10 27 00 00 00 00 00 00 
				00 00 00 00 80 02 E0 01 
			Tag of chunk 'strf'
			Length of chunk == 40 bytes
				(  -- -- -- ++ -- -- -- ++ -- -- -- -- -- -- -- 
				M  J  P  G  -- -- -- -- -- -- -- -- -- -- -- -- 
				-- -- -- -- -- -- -- -- 
				28 00 00 00 80 02 00 00 E0 01 00 00 01 00 18 00 
				4D 4A 50 47 00 10 0E 00 00 00 00 00 00 00 00 00 
				00 00 00 00 00 00 00 00 
		Tag of chunk 'LIST'
		Length of chunk == 92 bytes
		Tag of sublist 'strl'
			Tag of chunk 'strh'
			Length of chunk == 56 bytes
				a  u  d  s  -- -- -- -- -- -- -- -- -- -- -- -- 
				-- -- -- -- -- -- -- -- -- +  -- -- -- -- -- -- 
				-- l  -- -- -- +  -- -- -- '  -- -- -- -- -- -- 
				-- -- -- -- -- -- -- -- 
				61 75 64 73 00 00 00 00 00 00 00 00 00 00 00 00 
				00 00 00 00 01 00 00 00 10 2B 00 00 00 00 00 00 
				1A 6C 02 00 10 2B 00 00 10 27 00 00 01 00 00 00 
				00 00 00 00 00 00 00 00 
			Tag of chunk 'strf'
			Length of chunk == 16 bytes
				-- -- -- -- -- +  -- -- -- +  -- -- -- -- -- -- 
				01 00 01 00 10 2B 00 00 10 2B 00 00 01 00 08 00 
		Tag of chunk 'IDIT'
		Length of chunk == 26 bytes
			S  a  t     F  e  b     1  8     1  3  :  0  7  
			:  5  1     2  0  0  6  -- -- 
			53 61 74 20 46 65 62 20 31 38 20 31 33 3A 30 37 
			3A 35 31 20 32 30 30 36 0A 00 
	Tag of chunk 'LIST'
	Length of chunk == 24 bytes
	Tag of sublist 'INFO'
		Tag of chunk 'ISFT'
		Length of chunk == 12 bytes
			C  a  n  o  n  M  V  I  0  1  -- -- 
			43 61 6E 6F 6E 4D 56 49 30 31 00 00 
	Tag of chunk 'JUNK'
	Length of chunk == 1662 bytes
		-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
		-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
		-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
		-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
		00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
		... skipping 1598 bytes ...
	Tag of chunk 'LIST'
	Length of chunk == 9230202 bytes
	Tag of sublist 'movi'
		Tag of chunk '01wb'
		Length of chunk == 11024 bytes
			y  {  {  {  |  }  ++ ++ ++ ++ ++ ++ ++ ++ |  {  
			{  z  z  {  {  z  {  |  ++ ++ ++ ++ ++ ++ ++ ++ 
			++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ 
			++ ++ ++ ++ ++ }  }  |  |  {  z  }  ~  ~  ++ ++ 
			79 7B 7B 7B 7C 7D 80 81 81 81 81 80 80 7F 7C 7B 
			7B 7A 7A 7B 7B 7A 7B 7C 7F 81 81 82 84 83 83 85 
			85 86 85 86 86 85 85 85 86 86 86 86 86 86 85 83 
			82 81 82 7F 7F 7D 7D 7C 7C 7B 7A 7D 7E 7E 7F 80 
			... skipping 10960 bytes ...
		Tag of chunk '00dc'
		Length of chunk == 30282 bytes
			++ ++ ++ ++ -- -- A  V  I  1  -- -- -- -- -- -- 
			-- -- -- -- ++ ++ -- -- -- -- ++ ++ -- ++ -- -- 
			-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
			-- -- -- -- -- -- !  -- !  -- -- -- )  )  #  #  
			FF D8 FF E0 00 10 41 56 49 31 00 00 00 00 00 00 
			00 00 00 00 FF DD 00 04 00 00 FF DB 00 84 00 06 
			06 06 08 0A 08 0A 0A 0A 0A 0A 0E 0E 0E 0A 10 12 
			12 12 12 10 15 1F 21 14 21 1F 15 19 29 29 23 23 
			... skipping 30218 bytes ...
		Tag of chunk '00dc'
		Length of chunk == 29368 bytes
			++ ++ ++ ++ -- -- A  V  I  1  -- -- -- -- -- -- 
			-- -- -- -- ++ ++ -- -- -- -- ++ ++ -- ++ -- -- 
			-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
			-- -- -- -- -- -- !  -- !  -- -- -- )  )  #  #  
			FF D8 FF E0 00 10 41 56 49 31 00 00 00 00 00 00 
			00 00 00 00 FF DD 00 04 00 00 FF DB 00 84 00 06 
			06 06 08 0A 08 0A 0A 0A 0A 0A 0E 0E 0E 0A 10 12 
			12 12 12 10 15 1F 21 14 21 1F 15 19 29 29 23 23 
			... skipping 29304 bytes ...
... lots and lots of image frames later ...
	Tag of chunk 'idx1'
	Length of chunk == 2544 bytes
		0  1  w  b  -- -- -- -- -- -- -- -- -- +  -- -- 
		0  0  d  c  -- -- -- -- -- +  -- -- J  v  -- -- 
		0  0  d  c  -- -- -- -- n  ++ -- -- ++ r  -- -- 
		0  0  d  c  -- -- -- -- .  -- -- -- *  ++ -- -- 
		30 31 77 62 00 00 00 00 04 00 00 00 10 2B 00 00 
		30 30 64 63 10 00 00 00 1C 2B 00 00 4A 76 00 00 
		30 30 64 63 10 00 00 00 6E A1 00 00 B8 72 00 00 
		30 30 64 63 10 00 00 00 2E 14 01 00 2A AE 00 00 
		... skipping 2480 bytes ...

The "00dc" tags seem to be JPEG-compressed video frames ("Motion JPEG" or "MJPG"), and the "01wb" frames seem to be 8-bit mono audio samples. Note the complex header, which describes the video and audio compression format. I don't even know what the "JUNK" tag is supposed to contain, but luckily with tagged file formats unknown stuff is easy to skip!

This is an AVI captured by my digital camera, a Canon PowerShot A520. Most other digital cameras capture in approximately the same format, although the latest few seem to be moving to MPEG-4.

Virtually all media file formats have tags and byte counts like this, including MPEG (video and audio MP3), JPEG, Mac PICT, QuickTime, WMV, TIFF, and so on. About the only formats that *aren't* built completely around tags are ancient formats like BMP, PCX, or TARGA.