Media and Nesting File Formats
CS 321 2007 Lecture, Dr. Lawlor
There's a beautiful idea inside of most media (audio, video, image)
file formats. This idea keeps being re-invented at many different
levels, so I'm going to approach it in a weird way--please bear with me!
IN THE BEGINNING WAS THE THINGY
So you write a small class that does some work. This is step one.
class thingy {
public:
std::string stuff;
void print(void) {std::cout<<"Thingy "<<this<<" contains '"<<stuff<<"'\n";}
thingy(std::string s) :stuff(s) {}
};
int foo(void) {
thingy *t=new thingy("lone thingy");
t->print();
return 0;
}
(executable NetRun link)
(Here, "thingy" can be any concrete class.)
AND IT WAS GOOD. THEN WE NEEDED SEVERAL THINGYS
Invariably, you find a situation where you need to keep track of more
than one instance of this class. So you make a container class (a vector, a list,
and so on) that points to each of your thingys.
class thingy {public: /* One thing */
std::string stuff; thingy(std::string s) :stuff(s) {}
void print(void) {std::cout<<"Thingy "<<this<<" contains '"<<stuff<<"'\n";}
};
class tlist {public: /* A list of thingys */
std::vector<thingy *> v;
void print(void) {
std::cout<<"Thingy-list "<<this<<" contains: {\n";
for (unsigned int i=0;i<v.size();i++)
v[i]->print();
std::cout<<"}\n";
}
};
int foo(void) {
tlist *l=new tlist;
l->v.push_back(new thingy("lone thingy"));
l->print();
return 0;
}
(executable NetRun link)
THEN WE NEEDED LISTS OF LISTS OF THINGYS
If you need more lists, you can continue creating containers--adding new classes
to store pointers to pointers to pointers to thingys, but this gets
complicated and error-prone quickly.
There's another curious and powerful
solution--make a list itself be
a thingy. This lets you put lists-of-thingys as an item inside the
list class, which instantly gives you lists-of-lists-of-thingys, lists-of-lists-of-lists-of-...-thingys,
and so on, with zero additional code.
class thingy {public: /* One thing */
std::string stuff;
thingy(void) {} thingy(std::string s) :stuff(s) {}
virtual void print(void)
{std::cout<<"Thingy "<<this<<" contains '"<<stuff<<"'\n";}
virtual ~thingy() {}
};
class tlist : public thingy {public: /* A list of thingys */
std::vector<thingy *> v;
void print(void) {
std::cout<<"Thingy-list "<<this<<" contains {\n";
for (unsigned int i=0;i<v.size();i++)
v[i]->print();
std::cout<<"}\n";
}
};
int foo(void) {
tlist *m=new tlist;
tlist *l=new tlist;
l->v.push_back(new thingy("lone thingy"));
l->v.push_back(new thingy("loner's buddy"));
m->v.push_back(l);
m->v.push_back(new thingy("The other thingy"));
m->print();
return 0;
}
(executable NetRun link)
Note the key feature that makes this work--a list can now contain other lists, because a list is a thingy.
Places where the recursive-list paradigm is used
- Directories and files. Files are thingys.
Directories are actually files deep down, that contains lists of other
files, some of which are marked as directories.
- In the language LISP,
there are two kinds of object: "atoms" like integers and characters,
and lists. A list can contain other lists or atoms inside
it. Everything in the language--structs, arrays, even source
code--is built from lists of lists and atoms.
- In many media
file formats, you've got lots of little chunks of assorted information
to keep track of. Sometimes one big chunk, like a header, will
contain lots of little chunks. These chunks are thus one version
of the recursive-list idea.
Tagged Media File Formats
Most media file formats nowdays are "tagged" formats. Basically,
the file consists of a bunch of tagged pieces of data, that look like
this:
- A readable tag name, like "fmt ". This is almost always 4 ASCII characters long, although JPEG uses 2 binary bytes.
- The length of the data to follow. This is usually a binary
number, either little or big endian (Microsoft and other PC formats are
little endian, Mac or UNIX formats are big-endian), and usually just 4
bytes long. This means there's a 4GB limit on these formats, such
as AVI movies and TIFF images!
- The data stored in the tag. What's in this portion depends on the tag.
The beautiful part about this is the "data" can be anything. You
can even make up your own, new tags, put your own data in them, and
your data won't mess up old programs that read your modified files, but
you can still read your modified data.
The "data" portion of a tagged chunk can even contain other tagged chunks!
Tagged File Example: .wav
For example, a .wav sound file is stored in Microsoft RIFF
format, a simple tagged format. Tag names are 4 ASCII
bytes, and lengths are 4 little-endian bytes. Every RIFF file
contains one top-level tag, the "RIFF" tag, whose length is the length
of the whole file. The data part of the file's main RIFF tag
consists of a 4-byte file type, followed by a list of other
tagged chunks. For a .wav file, there are only two other chunks:
a "fmt " (note the space!) chunk, which describes the sound data format
(8 bit or 16 bit, mono or stereo or Dolby 5-channel), and the "data"
chunk, which has the actual sound data.
> ./catriff ./in.wav
Tag at start of file 'RIFF'
Length of file == 70212 bytes
Tag for file type 'WAVE'
Tag of chunk 'fmt '
Length of chunk == 16 bytes
-- -- -- -- -- + -- -- -- + -- -- -- -- -- --
01 00 01 00 11 2B 00 00 11 2B 00 00 01 00 08 00
Tag of chunk 'data'
Length of chunk == 70176 bytes
t x y v w y w w w t t u s t v t
v x v v x w t v x x z y y y y y
z { } ++ ++ ~ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++
++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++
74 78 79 76 77 79 77 77 77 74 74 75 73 74 76 74
76 78 76 76 78 77 74 76 78 78 7A 79 79 79 79 79
7A 7B 7D 7F 7F 7E 80 82 82 87 88 89 8D 8A 8C 8B
8A 8D 8E 90 94 95 95 95 95 96 95 94 94 95 97 9A
... skipping 70112 bytes ...
Here's a program I wrote that spits out the contents of any RIFF file. Here are some sample RIFF files, including the audio file displayed above.
Tagged File Example: .avi
AVI stands for "Audio Video Interleaved". .avi files are actually
just RIFF files, formatted exactly like .wav files except using
different RIFF tags. So you can run the "catriff" program on an
AVI file as well:
Tag at start of file 'RIFF'
Length of file == 9234802 bytes
Tag for file type 'AVI '
Tag of chunk 'LIST'
Length of chunk == 326 bytes
Tag of sublist 'hdrl'
Tag of chunk 'avih'
Length of chunk == 56 bytes
++ ++ -- -- ++ ++ -- -- -- -- -- -- -- -- -- --
++ -- -- -- -- -- -- -- -- -- -- -- ++ > -- --
++ -- -- -- ++ -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- --
A0 86 01 00 B2 9F 0C 00 00 00 00 00 10 00 01 00
90 00 00 00 00 00 00 00 02 00 00 00 DD 3E 01 00
80 02 00 00 E0 01 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
Tag of chunk 'LIST'
Length of chunk == 116 bytes
Tag of sublist 'strl'
Tag of chunk 'strh'
Length of chunk == 56 bytes
v i d s m j p g -- -- -- -- -- -- -- --
-- -- -- -- ++ ++ -- -- @ B -- -- -- -- -- --
++ -- -- -- ++ > -- -- -- ' -- -- -- -- -- --
-- -- -- -- ++ -- ++ --
76 69 64 73 6D 6A 70 67 00 00 00 00 00 00 00 00
00 00 00 00 A0 86 01 00 40 42 0F 00 00 00 00 00
90 00 00 00 DD 3E 01 00 10 27 00 00 00 00 00 00
00 00 00 00 80 02 E0 01
Tag of chunk 'strf'
Length of chunk == 40 bytes
( -- -- -- ++ -- -- -- ++ -- -- -- -- -- -- --
M J P G -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- --
28 00 00 00 80 02 00 00 E0 01 00 00 01 00 18 00
4D 4A 50 47 00 10 0E 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00
Tag of chunk 'LIST'
Length of chunk == 92 bytes
Tag of sublist 'strl'
Tag of chunk 'strh'
Length of chunk == 56 bytes
a u d s -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- -- -- + -- -- -- -- -- --
-- l -- -- -- + -- -- -- ' -- -- -- -- -- --
-- -- -- -- -- -- -- --
61 75 64 73 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 01 00 00 00 10 2B 00 00 00 00 00 00
1A 6C 02 00 10 2B 00 00 10 27 00 00 01 00 00 00
00 00 00 00 00 00 00 00
Tag of chunk 'strf'
Length of chunk == 16 bytes
-- -- -- -- -- + -- -- -- + -- -- -- -- -- --
01 00 01 00 10 2B 00 00 10 2B 00 00 01 00 08 00
Tag of chunk 'IDIT'
Length of chunk == 26 bytes
S a t F e b 1 8 1 3 : 0 7
: 5 1 2 0 0 6 -- --
53 61 74 20 46 65 62 20 31 38 20 31 33 3A 30 37
3A 35 31 20 32 30 30 36 0A 00
Tag of chunk 'LIST'
Length of chunk == 24 bytes
Tag of sublist 'INFO'
Tag of chunk 'ISFT'
Length of chunk == 12 bytes
C a n o n M V I 0 1 -- --
43 61 6E 6F 6E 4D 56 49 30 31 00 00
Tag of chunk 'JUNK'
Length of chunk == 1662 bytes
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
... skipping 1598 bytes ...
Tag of chunk 'LIST'
Length of chunk == 9230202 bytes
Tag of sublist 'movi'
Tag of chunk '01wb'
Length of chunk == 11024 bytes
y { { { | } ++ ++ ++ ++ ++ ++ ++ ++ | {
{ z z { { z { | ++ ++ ++ ++ ++ ++ ++ ++
++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++ ++
++ ++ ++ ++ ++ } } | | { z } ~ ~ ++ ++
79 7B 7B 7B 7C 7D 80 81 81 81 81 80 80 7F 7C 7B
7B 7A 7A 7B 7B 7A 7B 7C 7F 81 81 82 84 83 83 85
85 86 85 86 86 85 85 85 86 86 86 86 86 86 85 83
82 81 82 7F 7F 7D 7D 7C 7C 7B 7A 7D 7E 7E 7F 80
... skipping 10960 bytes ...
Tag of chunk '00dc'
Length of chunk == 30282 bytes
++ ++ ++ ++ -- -- A V I 1 -- -- -- -- -- --
-- -- -- -- ++ ++ -- -- -- -- ++ ++ -- ++ -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- ! -- ! -- -- -- ) ) # #
FF D8 FF E0 00 10 41 56 49 31 00 00 00 00 00 00
00 00 00 00 FF DD 00 04 00 00 FF DB 00 84 00 06
06 06 08 0A 08 0A 0A 0A 0A 0A 0E 0E 0E 0A 10 12
12 12 12 10 15 1F 21 14 21 1F 15 19 29 29 23 23
... skipping 30218 bytes ...
Tag of chunk '00dc'
Length of chunk == 29368 bytes
++ ++ ++ ++ -- -- A V I 1 -- -- -- -- -- --
-- -- -- -- ++ ++ -- -- -- -- ++ ++ -- ++ -- --
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
-- -- -- -- -- -- ! -- ! -- -- -- ) ) # #
FF D8 FF E0 00 10 41 56 49 31 00 00 00 00 00 00
00 00 00 00 FF DD 00 04 00 00 FF DB 00 84 00 06
06 06 08 0A 08 0A 0A 0A 0A 0A 0E 0E 0E 0A 10 12
12 12 12 10 15 1F 21 14 21 1F 15 19 29 29 23 23
... skipping 29304 bytes ...
... lots and lots of image frames later ...
Tag of chunk 'idx1'
Length of chunk == 2544 bytes
0 1 w b -- -- -- -- -- -- -- -- -- + -- --
0 0 d c -- -- -- -- -- + -- -- J v -- --
0 0 d c -- -- -- -- n ++ -- -- ++ r -- --
0 0 d c -- -- -- -- . -- -- -- * ++ -- --
30 31 77 62 00 00 00 00 04 00 00 00 10 2B 00 00
30 30 64 63 10 00 00 00 1C 2B 00 00 4A 76 00 00
30 30 64 63 10 00 00 00 6E A1 00 00 B8 72 00 00
30 30 64 63 10 00 00 00 2E 14 01 00 2A AE 00 00
... skipping 2480 bytes ...
The "00dc" tags seem to be JPEG-compressed video frames ("Motion JPEG"
or "MJPG"), and the "01wb" frames seem to be 8-bit mono audio
samples. Note the complex header, which describes the video and
audio compression format. I don't even know what the "JUNK" tag
is supposed to contain, but luckily with tagged file formats unknown
stuff is easy to skip!
This is an AVI captured by my digital camera, a Canon PowerShot
A520. Most other digital cameras capture in approximately the
same format, although the latest few seem to be moving to MPEG-4.
Virtually all media file formats have tags and byte counts like this,
including MPEG (video and audio MP3), JPEG, Mac PICT, QuickTime, WMV,
TIFF, and so on. About the only formats that *aren't* built
completely around tags are ancient formats like BMP, PCX, or
TARGA.