Fixed-Width and Binary File I/O
Dr. Lawlor, CS 202, CS, UAF
There are several frustrating things about writing ordinary text files.
First, you need to remember to put a space between consecutive integers.
That is, "cout<<10<<12<<13" writes "101213" into the
file, which reads as one big integer, not three little ones!
Second, each value written into a text file can use a different number
of bytes, which makes it almost impossible to use "seekg" to jump to a
given value in a file. For example, if I've got a million
integers stored in a text file, the integers are all at different byte
offsets, as follows ("_" stands for a space here):
Byte | Data | Length
0 | 7_ | 2
2 | 117_ | 4
6 | 42_ | 3
9 | 6_ | 2
And so on. At least the code is easy to write:
fstream f; f.open("test.txt",ios::out);
const int n=4;
int arr[n]={7,117,42,6};
for (int i=0;i<n;i++) {
    f<<arr[i]<<" "; // don't forget the space!
}
f.close();
cat("test.txt");
(Try this in NetRun now!)
But we really can't seek to a given integer, because everything is a
different size. This is a bummer--if you've got a billion ints
stored in a file, it's silly to have to read them all just to find the
last one!
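To make the cost concrete, here's a minimal sketch of what reading one integer out of a space-separated text file involves. The helper name read_text_int and its "return -1 on failure" convention are made up for illustration; the point is only that every earlier integer has to be parsed and thrown away.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: return the integer at position "index" (0-based)
// in a space-separated text file, or -1 if the file is too short.
// We can't seek straight to it, so each earlier integer is read and discarded.
int read_text_int(const std::string &filename, int index) {
    std::ifstream f(filename.c_str());
    int val = -1;
    for (int i = 0; i <= index; i++) {
        if (!(f >> val)) return -1; // ran out of data, or hit a parse error
    }
    return val;
}
```

With the test.txt written above, read_text_int("test.txt",2) returns 42, but only after parsing 7 and 117 first. For the last of a billion ints, that loop runs a billion times.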
Fixed-Width Data allows Seeking
You can get random access in a file if all your data is the same
size. For example, if every int is exactly 5 bytes in the
file, then int "i" is stored at byte "5*i", so you can just
"f.seekg(5*i);" and then read the integer. This is an extremely
powerful method to speed up random access in long files!
Byte | Data | Length
0 | ____7 | 5
5 | __117 | 5
10 | ___42 | 5
15 | ____6 | 5
C++ includes the "setw" manipulator (declared in <iomanip>), which sets the
minimum width of the next value written. Here's the code that writes this
type of "fixed width" file:
fstream f; f.open("test.txt",ios::out);
const int n=4;
int arr[n]={7,117,42,6};
for (int i=0;i<n;i++) {
    f<<setw(5)<<arr[i]; // no space needed, IF arr[i]<=9999
}
f.close();
cat("test.txt");
(Try this in NetRun now!)
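The "IF arr[i]<=9999" caveat matters: setw only sets a minimum width, so a value that is too big silently takes more room and shifts everything after it. Here's a minimal sketch of one way to guard against that. The helper fixed_width is hypothetical (not part of the standard library), and it uses the std:: prefix instead of NetRun's implicit "using namespace std".

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format "value" right-aligned in exactly "width" characters.
// Returns an empty string if the value does not fit with at least one
// leading space, since that space is what separates it from its neighbor.
std::string fixed_width(int value, int width) {
    std::ostringstream s;
    s << std::setw(width) << value;
    std::string text = s.str();
    if ((int)text.size() != width || text[0] != ' ') return "";
    return text;
}
```

For example, fixed_width(7,5) gives "    7" and fixed_width(9999,5) gives " 9999", but fixed_width(10000,5) gives an empty string. A writer could check for that and report an error instead of corrupting the file layout.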
You can then seek to a given location in the file, and read any integer:
fstream f; f.open("test.txt",ios::out);
const int n=4;
int arr[n]={7,117,42,6};
for (int i=0;i<n;i++) {
    f<<setw(5)<<arr[i]; // no space needed, IF arr[i]<=9999
}
f.close();
f.open("test.txt",ios::in); // open file for reading
int i=2; // index of the integer to read
f.seekg(5*i); // seek to integer i
int val=-1; f>>val;
f.close();
return val;
(Try this in NetRun now!)
One downside of fixed-width plain-text files is that people want to go
in and edit the files by hand, for example in notepad, which destroys
the fixed-width property. Plain text is also fairly
space-inefficient, especially with fixed-width mode; for example, an
ordinary integer can hold values in the billions, up to ten digits
worth, so you need at least eleven characters per integer counting the
space character between numbers, or twelve counting a minus sign!
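The arithmetic above can be checked with a short sketch. It assumes a C++11 compiler (for std::to_string) and the usual 32-bit int; the two function names are made up for this example.

```cpp
#include <limits>
#include <string>

// Characters needed to print the widest possible int, plus one
// separating space. With a 32-bit int the widest value is -2147483648:
// ten digits and a minus sign.
int text_bytes_per_int() {
    std::string widest = std::to_string(std::numeric_limits<int>::min());
    return (int)widest.size() + 1; // +1 for the space between numbers
}

// Bytes the same int occupies in memory.
int binary_bytes_per_int() {
    return (int)sizeof(int);
}
```

With a 32-bit int, text_bytes_per_int() is 12 and binary_bytes_per_int() is 4, so a fixed-width text file that can hold any int is three times larger than the raw in-memory data.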
Binary File I/O
Writing your fixed-width file in binary format has a few effects:
- Binary formats are almost always fixed-size, so seeking in the file is almost always possible.
- Binary formats are often space-efficient, with no delimiters or spaces needed.
- Binary format speeds up I/O substantially (fourfold or more in my
measurements), because the binary file format exactly matches the
machine's natural in-memory format.
- However, different machines may use different in-memory
formats; in particular, some non-x86 CPUs store the bytes of an integer
in the opposite order (big-endian, where x86 is little-endian). This can
require byte order conversion, or switching back to a text format.
- A binary format prevents people from editing the files with
ordinary text editors. You need a program or a special "hex
editor" to do anything useful with a binary file.
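The byte order conversion mentioned in the list can be sketched as follows. This is one common way to do it for a 32-bit value, not the only one; swap_bytes32 is a name chosen for this example, and <cstdint> assumes a C++11 compiler.

```cpp
#include <cstdint>

// Reverse the order of the four bytes in a 32-bit value.
// This converts a little-endian value to big-endian, and vice versa.
std::uint32_t swap_bytes32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

For example, swap_bytes32(0x11223344) is 0x44332211, and swapping twice gives back the original value. A program reading a file written on a machine with the other byte order would apply this to each integer after reading it.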
The syntax for binary output is pointer-based, and looks a little bit weird:
"f.write((char *)&i,sizeof(i));". Here are the parts of that
statement:
- The value you want to write is "i".
- A pointer to the value you want to write is "&i". We're
accessing the raw memory inside the object directly here, so we need a
pointer.
- We typecast this pointer to a "char *" for the fstream, so the argument is "(char *)&i".
- "sizeof(i)" gives the number of bytes inside the variable i.
Here's an example where we write a short binary file, and then read it back.
fstream f;
// Write binary data:
f.open("that_file.dat",ios::out|ios::binary);
int i=3;
f.write((char *)&i,sizeof(i));
f.close();
cat("that_file.dat");
// Read binary data:
f.open("that_file.dat",ios::in|ios::binary);
int v=0;
f.read((char *)&v,sizeof(v));
if (f) cout<<"I read the integer "<<v<<" from the file!\n";
f.close();
(Try this in NetRun now!)
Here's a more complex example where we write several integers into a
binary file, and then use seek to read them back several times:
fstream f;
f.open("that_file.dat",ios::out|ios::binary);
const int n=4;
int arr[n]={7,117,42,6};
for (int i=0;i<n;i++) {
    int val=arr[i];
    f.write((char *)&val,sizeof(val));
}
f.close();
cat("that_file.dat");
f.open("that_file.dat",ios::in|ios::binary);
for (int pass=0;pass<2;pass++) // make several passes through the file
{
    f.seekg(0); // back to start of the file
    while (f) {
        int i=-1;
        f.read((char *)&i,sizeof(i));
        if (f) cout<<"I read the integer "<<i<<" from the file!\n";
    }
    f.clear(); // reset error state of f after hitting EOF
}
f.close();
(Try this in NetRun now!)
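The same seek trick from the fixed-width text file works here, with sizeof(int) in place of 5. Here's a minimal sketch that jumps straight to one integer in the binary file. The helper read_binary_int is hypothetical, and returning -1 on failure is just a convenience for this example (it can't be told apart from a stored -1).

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: read the integer at position "index" (0-based)
// from a binary file of raw ints. Returns -1 if the read fails.
int read_binary_int(const std::string &filename, int index) {
    std::ifstream f(filename.c_str(), std::ios::in | std::ios::binary);
    // Integer "index" starts at byte index*sizeof(int)
    std::streamoff offset =
        static_cast<std::streamoff>(index) * static_cast<std::streamoff>(sizeof(int));
    f.seekg(offset);
    int val = -1;
    if (!f.read((char *)&val, sizeof(val))) return -1;
    return val;
}
```

With the that_file.dat written above, read_binary_int("that_file.dat",2) returns 42 without touching the other integers, and an index past the end of the file returns -1.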
(I feel guilty about leaving out the error checking in these
examples. We'll see a better way to do error checking called
exception handling before spring break!)