Linux Magazine | June 1999 | COMPILE TIME | The Linux File Access Primitives

One of the most important abstractions of the POSIX API is the file. While nearly all operating systems provide files for permanent storage, all versions of UNIX provide access to most system resources through the file abstraction.

More concretely, this means that Linux uses the same set of system calls to provide access to devices (such as floppy disks and tape drives), networking resources (most commonly TCP/IP connections), system terminals, and even kernel status information. Because files are so pervasive, fluency with the file-related system calls is important for every Linux programmer. Let's examine the basic concepts behind the file API and describe the most important file-related system calls.
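To make the point concrete, here is a minimal sketch built around a hypothetical helper named read_prefix(). It reads from whatever a pathname names using nothing but open(), read(), and close(), so the same code works on a regular file such as /etc/passwd, a device such as /dev/zero, or a kernel status file under /proc. It is an illustration rather than production code: it makes a single read() call and does not retry.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from the object named by
 * path (regular file, device node, or /proc entry) using only the
 * generic file calls. Returns the number of bytes read, or -1. */
ssize_t read_prefix(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, len);  /* same call for files and devices */
    close(fd);
    return n;
}
```

For instance, read_prefix("/dev/zero", buf, sizeof buf) fills buf with zero bytes, while read_prefix("/proc/uptime", buf, sizeof buf) returns text the kernel generates on the fly.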

Linux provides many different kinds of files. The most common type is simply called a regular file, which stores chunks of data for later access. The vast majority of files you work with, such as executables (e.g., /bin/vi), data files (e.g., /etc/passwd), and system libraries (e.g., /lib/libc.so.6), are regular files. Usually these reside somewhere on disk, but that is not necessarily the case (as we'll see later).

Another type of file is the directory, which contains a list of file names along with references to the files they name. When you use the ls command to list the files in a directory, it opens the file for that directory and prints out information on all of the files listed in it.
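A stripped-down version of that directory walk might look like the sketch below. One caveat: on Linux, programs do not normally call read() on a directory's file descriptor; they go through the C library's opendir(), readdir(), and closedir() routines, which open the directory and decode its entries. The function name list_directory() is hypothetical, and the real ls does far more (sorting, calling stat() on each entry, and so on).

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print the names stored in a directory, one per
 * line, and return how many were printed ("." and ".." are skipped).
 * Returns -1 if the directory cannot be opened. */
int list_directory(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror(path);
        return -1;
    }

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;
        printf("%s\n", entry->d_name);
        count++;
    }

    closedir(dir);
    return count;
}
```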

Other kinds of files include block devices (which represent filesystem-cached devices such as hard drives), character devices (which represent uncached devices like tape drives, mice, and system terminals), pipes and sockets (which allow processes to talk to one another), and symbolic links (which point to another pathname, allowing a file to be reached by more than one name in the directory hierarchy).
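The kernel records which of these types each file is, and a program can ask. The following sketch, built around a hypothetical file_type() helper, uses lstat() and the standard S_IS*() macros from <sys/stat.h>. It calls lstat() rather than stat() so that a symbolic link is reported as a link instead of as the file it points to.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical helper: classify the file named by path. lstat()
 * describes a symbolic link itself rather than its target. */
const char *file_type(const char *path)
{
    struct stat st;

    if (lstat(path, &st) < 0)
        return "error";

    if (S_ISREG(st.st_mode))  return "regular file";
    if (S_ISDIR(st.st_mode))  return "directory";
    if (S_ISBLK(st.st_mode))  return "block device";
    if (S_ISCHR(st.st_mode))  return "character device";
    if (S_ISFIFO(st.st_mode)) return "pipe (FIFO)";
    if (S_ISSOCK(st.st_mode)) return "socket";
    if (S_ISLNK(st.st_mode))  return "symbolic link";
    return "unknown";
}
```

Called on / it should report a directory, and on /dev/null a character device.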

Most files have one or more symbolic names that refer to them. These names are sequences of components separated by the / character, and they identify the file to the kernel. These are the pathnames with which all Linux users are quite familiar; for example, the pathname /home/ewt/article refers to the file that contains the text of this article on my laptop. No two files share the same name (although a single file can have more than one name), so a pathname uniquely identifies a single file.
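Because a file can have several names, the kernel needs an identity for the file that does not depend on any one of them. On Linux that identity is the pair of device and inode numbers that stat() reports. The sketch below, using a hypothetical same_file() helper, compares two pathnames on that basis. Since stat() follows symbolic links, a link and its target compare equal, as do two hard links to the same file.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical helper: return 1 if two pathnames name the same file,
 * 0 if they name different files, or -1 if either cannot be stat()ed.
 * The comparison uses the (device, inode) pair, not the names. */
int same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) < 0 || stat(b, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, same_file("/", "/..") should return 1, because the root directory's .. entry refers back to the root itself.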

Each file a process has open is identified by a small nonnegative integer, called a "file descriptor". File descriptors are created by the system calls that open files, and they are inherited by new processes forked off from the current process. They also normally survive when a process starts a new program, so the new program inherits the original process's open files.
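The following sketch, built around a hypothetical inherit_demo() function, shows descriptor inheritance in action: the parent opens a file and calls fork(), and the child writes through the very same descriptor number without opening anything itself. Because the child's descriptor refers to the same open file as the parent's, including the current file offset, the parent's later write lands after the child's rather than overwriting it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: the child writes through a descriptor it inherited
 * from the parent; parent and child share one file offset.
 * Returns 0 on success, -1 on failure. */
int inherit_demo(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: fd is the same small integer it was in the parent. */
        ssize_t n = write(fd, "child\n", 6);
        _exit(n == 6 ? 0 : 1);
    }

    /* Parent: wait for the child, then write at the shared offset. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }
    ssize_t n = write(fd, "parent\n", 7);
    close(fd);
    return n == 7 ? 0 : -1;
}
```

After a successful call, the file should contain the line "child" followed by the line "parent".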

By convention, most programs reserve the first three file descriptors (0, 1, and 2) for special purposes: access to the so-called standard input, standard output, and standard error streams. File descriptor 0 is standard input, from which many programs expect to receive input from the outside world. File descriptor 1 is standard output, where most programs display their normal output. For output related to error conditions, programs use file descriptor 2 (standard error).
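<unistd.h> also defines the names STDIN_FILENO, STDOUT_FILENO, and STDERR_FILENO for descriptors 0, 1, and 2. As a minimal sketch (copy_fd() is a hypothetical name), here is a routine that copies one descriptor to another and reports trouble on standard error. Calling it as copy_fd(STDIN_FILENO, STDOUT_FILENO) gives a bare-bones cat that reads standard input and writes standard output.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy everything readable from descriptor in to
 * descriptor out, reporting problems on standard error (descriptor 2).
 * Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in, int out)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {   /* write() may accept fewer bytes than asked */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                const char *msg = "copy_fd: write failed\n";
                write(STDERR_FILENO, msg, strlen(msg));
                return -1;
            }
            done += w;
        }
        total += n;
    }
    if (n < 0) {
        const char *msg = "copy_fd: read failed\n";
        write(STDERR_FILENO, msg, strlen(msg));
        return -1;
    }
    return total;
}
```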

Anyone comfortable with Linux shells has seen the standard input, output, and error file descriptors at work. Normally, the shell runs commands with file descriptors 0, 1, and 2 all referring to the shell's terminal. When the > character is used to instruct the shell to send a program's output to a file, the shell opens that file as file descriptor 1 before invoking the new program. The program then sends its output to the given file rather than to the user's terminal, and the beauty is that this is completely transparent to the program itself!
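Here is a rough sketch of that sequence, using a hypothetical run_with_output_to() helper rather than any real shell's code. The child opens the target file, uses dup2() to move it onto descriptor 1, and then calls execvp() to start the program, which simply inherits the new standard output. A real shell adds a good deal more (better error reporting, job control, and so on).

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper approximating `argv[0] argv[1] ... > path`.
 * Returns the program's exit status, or -1 if the child could not be
 * created or did not exit normally. */
int run_with_output_to(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: make path descriptor 1, then run the program. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
        if (fd < 0)
            _exit(126);
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) < 0)
                _exit(126);
            close(fd);
        }
        execvp(argv[0], argv);   /* returns only on failure */
        _exit(127);              /* status shells use for "command not found" */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A call such as run_with_output_to("out.txt", (char *[]){ "echo", "hello", NULL }) behaves roughly like the shell command echo hello > out.txt.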

Similarly, the < character instructs the shell to use a particular file as file descriptor 0, so the program reads its input from that file. In both cases, any error messages from the program still appear on the terminal, because they are sent to standard error on file descriptor 2. (In bash and other Bourne-style shells, you can redirect standard error by using 2> instead of >.) This kind of file redirection is one of the most powerful features of the Linux command line.
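The same open-then-dup2() pattern covers <, >, and 2>; only the open() flags and the target descriptor change. The sketch below uses a hypothetical redirect_fd() helper that a shell-like program could call in the child between fork() and exec(), and its comment shows how each redirection operator maps onto a call.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open path with flags and make it descriptor
 * target, as a shell does in the child before running a command:
 *   cmd < file   ->  redirect_fd(file, O_RDONLY, STDIN_FILENO)
 *   cmd > file   ->  redirect_fd(file, O_WRONLY | O_CREAT | O_TRUNC, STDOUT_FILENO)
 *   cmd 2> file  ->  redirect_fd(file, O_WRONLY | O_CREAT | O_TRUNC, STDERR_FILENO)
 * Returns 0 on success, -1 on failure. */
int redirect_fd(const char *path, int flags, int target)
{
    int fd = open(path, flags, 0666);   /* mode is ignored without O_CREAT */
    if (fd < 0)
        return -1;
    if (fd != target) {
        if (dup2(fd, target) < 0) {
            close(fd);
            return -1;
        }
        close(fd);   /* target now refers to the file; drop the extra copy */
    }
    return 0;
}
```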

Before using any file-related system calls, programs should include <fcntl.h> and <unistd.h>; these provide the function prototypes and constants for the most common file routines (open() and its O_ flags come from <fcntl.h>, while read(), write(), and close() are declared in <unistd.h>). In the example code below, we'll assume that each program begins with:

#include <fcntl.h>
#include <unistd.h>