Directories
CS 321 2007 Lecture, Dr. Lawlor
So a directory (a folder) is just a list of files and other
directories. This list is stored as a set of bytes, usually
with one fixed-size structure per file (recording things like where that
file's data lives) plus the file's variable-length name. So a directory is
just a bunch of bytes, and you can store those bytes (a directory's list
of files) inside another file! That is, a directory is just a
file that's marked "this file's bytes represent other files and
directories". Curious, no?
So every time you create, delete, or rename something in a directory, the OS is rewriting this list of files. It's just moving bytes around!
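To make "a directory is just bytes" concrete, here is a toy encoding. It is loosely modeled on ext2's directory records but simplified, and it is not any real filesystem's on-disk format: each entry is a 4-byte little-endian inode number, a 1-byte name length, then the name bytes. The names DirEntry, append_entry, and parse_entries are hypothetical, chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified directory entry (loosely like ext2's records).
struct DirEntry {
    uint32_t inode;   // which file this name refers to
    std::string name; // the file's name, at most 255 bytes here
};

// Append one entry's bytes to the end of the directory's byte buffer:
// 4-byte inode (little-endian), 1-byte name length, then the name itself.
void append_entry(std::vector<unsigned char> &dir, const DirEntry &e) {
    assert(e.name.size() <= 255); // the length must fit in one byte
    for (int b = 0; b < 4; b++)
        dir.push_back((unsigned char)((e.inode >> (8 * b)) & 0xFF));
    dir.push_back((unsigned char)e.name.size());
    dir.insert(dir.end(), e.name.begin(), e.name.end());
}

// Walk the buffer from the start and decode every entry in storage order.
std::vector<DirEntry> parse_entries(const std::vector<unsigned char> &dir) {
    std::vector<DirEntry> out;
    size_t pos = 0;
    while (pos + 5 <= dir.size()) {
        uint32_t inode = 0;
        for (int b = 0; b < 4; b++)
            inode |= (uint32_t)dir[pos + b] << (8 * b);
        size_t len = dir[pos + 4];
        if (pos + 5 + len > dir.size()) break; // truncated entry: stop
        std::string name(dir.begin() + pos + 5, dir.begin() + pos + 5 + len);
        out.push_back({inode, name});
        pos += 5 + len; // the next entry starts right after this name
    }
    return out;
}
```

Finding one name in such a buffer means walking it entry by entry from the start, which is the linear search behind the performance problem in the unsorted-list section.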
Reading the List of Files in a Directory
Reading the list of files in a directory is a lot like reading a file,
although the names have been changed to protect you from the details of
the filesystem.
In UNIX systems: Linux, Mac OS X, etc.
Start with opendir, which takes a directory name and returns a "DIR *".
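List each entry with readdir, which takes a "DIR *" and returns a
"struct dirent *" (or NULL once there are no more entries). Its "d_name"
field holds the entry's name, without the directory part. The list also
includes the "." and ".." self links.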
Finish up with closedir, which frees the "DIR *".
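#include <dirent.h> /* UNIX directory-list header */
#include <time.h> /* for "timespec", used in bits/stat.h (& whined about by icpc) */
#include <sys/stat.h> /* to tell if an item is a file or directory */
#include <stdio.h> /* snprintf */
#include <string.h> /* strcmp */
/* Callbacks: your program defines what to do with each entry */
void hit_file(const char *dirName,const char *name);
void hit_directory(const char *dirName,const char *name);
void unix_list(const char *dirName)
{
    DIR *d=opendir(dirName);
    if (d==0) return; /* missing directory, or no permission */
    struct dirent *de;
    while (NULL!=(de=readdir(d))) {
        const char *name=de->d_name;
        if (strcmp(name,".")==0 || strcmp(name,"..")==0)
            continue; /* Bogus self links */
        char path[1024]; /* d_name lacks the directory part, so build the full path */
        snprintf(path,sizeof(path),"%s/%s",dirName,name);
        struct stat s;
        if (stat(path,&s)==0 && S_ISDIR(s.st_mode))
            hit_directory(dirName,name);
        else
            hit_file(dirName,name);
    }
    closedir(d);
}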
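For experimenting without writing the hit_file/hit_directory callbacks, here is a self-contained POSIX sketch built on the same opendir/readdir/closedir calls that collects the names into a std::vector instead. The function name list_names and the trailing "/" on subdirectories (borrowed from "ls -F") are illustrative choices, not part of any standard API. Because readdir returns entries in whatever order the filesystem stores them, the sketch sorts the result.

```cpp
#include <dirent.h>   // opendir, readdir, closedir
#include <sys/stat.h> // stat, S_ISDIR
#include <algorithm>  // std::sort
#include <string>
#include <vector>

// Return the entries of dirName sorted by name, with "/" appended to
// subdirectories. "." and ".." are skipped. A missing or unreadable
// directory gives an empty list.
std::vector<std::string> list_names(const std::string &dirName)
{
    std::vector<std::string> names;
    DIR *d = opendir(dirName.c_str());
    if (d == nullptr) return names;
    while (dirent *de = readdir(d)) {
        std::string name = de->d_name;
        if (name == "." || name == "..") continue; // self links
        std::string path = dirName + "/" + name;    // d_name is only the last part
        struct stat s;
        if (stat(path.c_str(), &s) == 0 && S_ISDIR(s.st_mode))
            name += "/";
        names.push_back(name);
    }
    closedir(d);
    std::sort(names.begin(), names.end()); // readdir order is unspecified
    return names;
}
```

For example, list_names("/tmp/project") might return something like {"build/", "main.cpp"}, depending on what that directory holds.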
In Windows
Start with FindFirstFile, which takes a directory plus a filename
pattern (for example, the directory name followed by "\*"), returns a
HANDLE, and fills in a WIN32_FIND_DATA struct. That struct holds the
name of the first matching file in "cFileName", and the file's attribute
flags (read-only, hidden, directory, and so on) in "dwFileAttributes".
Each call to FindNextFile fills in the next matching file, and returns zero once there are none left.
Call FindClose when done.
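#include <windows.h>
#include <stdio.h> /* snprintf */
#include <string.h> /* strcmp */
/* Callbacks: your program defines what to do with each entry */
void hit_file(const char *dirName,const char *name);
void hit_directory(const char *dirName,const char *name);
void win_list(const char *dirName)
{
    char dirNamePat[1024];
    snprintf(dirNamePat,sizeof(dirNamePat),"%s\\*",dirName); /* dirName, with trailing backslash-star */
    WIN32_FIND_DATAA f; /* "A" (char) versions, so names stay char even if UNICODE is defined */
    HANDLE h=FindFirstFileA(dirNamePat,&f);
    if (h==INVALID_HANDLE_VALUE) return;
    do {
        const char *name=f.cFileName;
        if (strcmp(name,".")==0 || strcmp(name,"..")==0)
            continue; /* Bogus self links (continue still runs FindNextFileA below) */
        if (f.dwFileAttributes&FILE_ATTRIBUTE_DIRECTORY)
            hit_directory(dirName,name);
        else
            hit_file(dirName,name);
    } while (FindNextFileA(h,&f));
    FindClose(h);
}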
Performance Impact of Unsorted-List Directories
Deep down, many filesystems (FAT and classic ext2, for example) store the list of files as a literal list, and the list isn't even sorted.
So try a little program like this:
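#include <fstream>
#include <sstream>
#include <iostream>
#include <string>
#include <chrono>
/* NetRun provides time_in_seconds(); outside NetRun, this C++11 stand-in does the same job */
double time_in_seconds(void) {
    using namespace std::chrono;
    return duration<double>(steady_clock::now().time_since_epoch()).count();
}
/* NetRun calls foo(); elsewhere, call foo() from main() */
int foo(void)
{
    for (int thou=0;thou<10;thou++) {
        double start=time_in_seconds();
        for (int i=0;i<100;i++) { /* create the next 100 small files */
            std::ostringstream ns;
            ns<<"file"<<thou<<"thou"<<i<<".dat";
            std::string name=ns.str();
            std::ofstream of(name.c_str());
            of<<"Ugnh.";
        }
        double end=time_in_seconds();
        std::cout<<" Created "<<thou+1<<"th hundred files: "<<end-start<<" sec\n";
    }
    return 0;
}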
Because creating, opening, or deleting a file means the OS must first
search the directory's list of existing files for that name, every
additional file slows down later directory accesses yet further!
Created 1th hundred files: 0.00534678 sec
Created 2th hundred files: 0.00684118 sec
Created 3th hundred files: 0.00835299 sec
Created 4th hundred files: 0.010195 sec
Created 5th hundred files: 0.011657 sec
Created 6th hundred files: 0.0133369 sec
Created 7th hundred files: 0.014894 sec
Created 8th hundred files: 0.0167079 sec
Created 9th hundred files: 0.0177431 sec
Created 10th hundred files: 0.040544 sec
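In this run, creating the tenth hundred files (with 900 files already
in the directory) took about 7.5 times as long as creating the first
hundred, and even the ninth hundred took over three times as long. Just
a thousand files in a directory make each new file several times slower
to create than in a directory with a few dozen files. This is
ridiculous, and we shouldn't accept it, but every OS I know of does this!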
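A toy model helps explain why the timings climb. The sketch below treats a directory as an unsorted list in which every create compares the new name against every existing name before appending it. ListDirectory and batch_costs are hypothetical names, and the model counts only name comparisons, not real disk work. Creating file number j (counting from 0) costs j comparisons, so the k-th batch of 100 (k counting from 0) costs 100 * 100k + (0 + 1 + ... + 99) = 10000k + 4950 comparisons, and creating n files in total costs about n²/2.

```cpp
#include <string>
#include <vector>

// Toy unsorted-list directory: creating a name scans every existing
// entry for a duplicate, then appends the new name at the end.
struct ListDirectory {
    std::vector<std::string> names;
    long comparisons = 0; // total name comparisons performed so far

    // Returns false (and adds nothing) if the name is already taken.
    bool create(const std::string &name) {
        for (const std::string &n : names) {
            comparisons++;
            if (n == name) return false;
        }
        names.push_back(name);
        return true;
    }
};

// Comparisons spent on each of 10 batches of 100 new, distinct files,
// named the same way as the timing program's files.
std::vector<long> batch_costs() {
    ListDirectory dir;
    std::vector<long> costs;
    for (int batch = 0; batch < 10; batch++) {
        long before = dir.comparisons;
        for (int i = 0; i < 100; i++)
            dir.create("file" + std::to_string(batch) + "thou" +
                       std::to_string(i) + ".dat");
        costs.push_back(dir.comparisons - before);
    }
    return costs;
}
```

In this model the last batch costs 94950 comparisons against 4950 for the first, about 19 times as much. That is steeper than the measured timings, which also include the fixed cost of opening and writing each file.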
One workaround is to create subdirectories, and store a fraction of the
total set of files in each subdirectory. For example, instead of
having a million files named like "foo123456.txt", make a thousand
subdirectories with a thousand files each, like "foo123/456.txt".
A *lot* of real programs end up doing this to work around this old,
common OS bug!
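The renaming scheme above can be sketched as a small path-mapping function. shard_path is a hypothetical helper, and it assumes numeric IDs of at most six digits (0 to 999999): the thousands pick one of 1000 subdirectories and the last three digits name the file inside it. Zero-padding keeps small IDs in the same layout, so ID 5 becomes "foo000/005.txt".

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Map a numeric ID to a two-level path, e.g. ("foo", 123456) -> "foo123/456.txt".
std::string shard_path(const std::string &prefix, long id)
{
    assert(id >= 0 && id < 1000000); // the scheme assumes six-digit IDs
    std::ostringstream p;
    p << prefix << std::setw(3) << std::setfill('0') << id / 1000 // subdirectory
      << '/' << std::setw(3) << std::setfill('0') << id % 1000    // file within it
      << ".txt";
    return p.str();
}
```

Each subdirectory then holds at most a thousand files, so every directory search stays short.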
Creating New Directories
You make a new directory with the MS-DOS or UNIX shell command
"mkdir". On UNIX systems, the C/C++ function to call to make a
new directory is ... "mkdir". On Windows, it's
"CreateDirectory". In both cases, you can specify the permissions
you want: on UNIX a mode like 0777 (which the process umask then
restricts), and on Windows an optional SECURITY_ATTRIBUTES pointer
(0 gives the default security). I usually write a common interface
to Windows and Linux like this:
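#ifdef _WIN32 /* predefined by Windows compilers */
#include <windows.h>
namespace osl {
    /* Returns true if the directory was created. */
    inline bool mkdir(const char *pathname) {
        /* CreateDirectoryA returns nonzero on success; 0 = default security */
        return 0!=CreateDirectoryA(pathname,0);
    }
}
#else /* UNIX-like system */
#include <sys/stat.h>
namespace osl {
    /* Returns true if the directory was created. */
    inline bool mkdir(const char *pathname) {
        /* ::mkdir returns 0 on success; the umask filters the 0777 mode */
        return 0==::mkdir(pathname,0777);
    }
}
#endif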
This lets me do osl::mkdir("foo"); on any system!
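The subdirectory workaround also needs each parent directory to exist before a file like "foo123/456.txt" can be created. Below is a POSIX-only sketch of a "mkdir -p" style helper. make_dirs is a hypothetical name, and it calls ::mkdir directly so it stands on its own. It treats EEXIST as success for each path prefix, then checks that the final path really is a directory.

```cpp
#include <sys/stat.h> // mkdir, stat, S_ISDIR
#include <cerrno>     // errno, EEXIST
#include <string>

// Create every missing directory along path, like "mkdir -p".
// Returns true if path exists as a directory afterwards.
bool make_dirs(const std::string &path)
{
    // Start at index 1 so a leading "/" (the root) is never passed to mkdir.
    for (size_t i = 1; i <= path.size(); i++) {
        if (i == path.size() || path[i] == '/') {
            std::string prefix = path.substr(0, i);
            if (::mkdir(prefix.c_str(), 0777) != 0 && errno != EEXIST)
                return false; // a real failure, e.g. permission denied
        }
    }
    struct stat s;
    return stat(path.c_str(), &s) == 0 && S_ISDIR(s.st_mode);
}
```

Calling make_dirs("data/foo123") before opening "data/foo123/456.txt" creates "data" and "data/foo123" as needed. Running it again is harmless.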