File Descriptors

A File Descriptor (FD) is a number which refers to an open file. Each process has its own private set of FDs, but FDs are inherited by child processes from the parent process.

Every process should inherit three open FDs from its parent: 0 ("standard input"), open for reading; and 1 ("standard output") and 2 ("standard error"), open for writing. A process that is started without one or more of these may behave unpredictably. (So never close stderr. To discard its output, redirect it to /dev/null instead.)
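
As a rough illustration of why discarding stderr is safer than closing it, the sketch below runs a small function both ways and records the exit status. The function name noisy and the two status variables are made up for this example.

```shell
# Hypothetical function that reports a problem on stderr.
noisy() { echo "warning: something happened" >&2; }

# Discarding stderr: FD 2 is still open, so the write succeeds.
status_devnull=0
noisy 2>/dev/null || status_devnull=$?

# Closing stderr: FD 2 does not exist inside the function, so the
# write fails (and the shell has nowhere to print its complaint).
status_closed=0
noisy 2>&- || status_closed=$?

echo "status with /dev/null: $status_devnull"
echo "status with closed stderr: $status_closed"
```

The first status should be zero and the second nonzero. A real program that finds FD 2 closed may do worse than fail: the next file it opens can be assigned FD 2, and error messages then end up inside that file.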

Processes may open additional FDs as needed (up to whatever limit the operating system imposes). In most languages, when you open a new file, you are given back the FD number that the operating system selects (or a library manages the FD number for you and hides the details). In shell scripts, however, the paradigm is different: you select the FD number first, and then open the file using that FD. This means you, as the script writer, must keep track of which FDs you are using for each task. (In bash 4.1 and later, you can let bash select a free file descriptor for you and store its number in a variable of your choice. But we'll skip that for now.)

Shells use redirection to work with FDs. For already-existing FDs, output can be sent to them, or input can be read from them, by using file descriptor duplication syntax:

echo "unexpected error: $foo" 1>&2

while read -r line 0<&3; do ...

The Bash Guide and Bash FAQ 55 give an introductory explanation, so we'll remain concise here. In the echo example, we know that echo normally writes to stdout (FD 1), so we override FD 1 to point to where FD 2 is pointing. Thus, echo will be tricked into writing to our stderr (FD 2). In the read example, we know that read normally reads from stdin (FD 0), but we override FD 0 to point to wherever our FD 3 is pointing; and so read will pull its input from there, instead of from our FD 0. These redirections are transient, only applying to the single commands where they appear.
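
The effect of these transient redirections can be observed with command substitution, which captures only what arrives on stdout. This is a minimal sketch; the helper warn and the variable names are invented for the illustration.

```shell
# Hypothetical helper: send a message to stderr by pointing FD 1 at FD 2.
warn() { echo "warning: $*" 1>&2; }

# Nothing arrives on stdout, so the command substitution captures nothing.
on_stdout=$(warn "disk almost full" 2>/dev/null)

# Pointing FD 2 at the capture pipe shows where the message really went.
on_stderr=$(warn "disk almost full" 2>&1)

# The 1>&2 applies to the first echo only; the second still uses stdout.
both=$( { echo "to stderr" 1>&2; echo "to stdout"; } 2>/dev/null )

printf 'on_stdout=[%s]\non_stderr=[%s]\nboth=[%s]\n' \
    "$on_stdout" "$on_stderr" "$both"
```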

In order to create new FDs, we must open files for them to point to. Typically we want the FD to be available within our shell, so that it can be reused or passed to children as needed. To open a file in a shell script, we use the exec command:

exec 3> myfifo

As we said earlier, you must know at the time you're writing the script which FD number you want to use for each task. In the example above, we open FD 3 for output to a file (or something) named myfifo. Presumably we will use this open FD later, to write information to the file.
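
Here is a small sketch of that life cycle, using a temporary regular file in place of myfifo so that it can be run anywhere. The variable names are assumptions made for the example. It also shows that a child process inherits the open FD.

```shell
logfile=$(mktemp)                 # scratch file standing in for "myfifo"

exec 3> "$logfile"                # open FD 3 for writing in the current shell
echo "written by the shell" >&3
sh -c 'echo "written by a child" >&3'   # the child inherits FD 3
exec 3>&-                         # close FD 3 once we are done with it

contents=$(cat "$logfile")
rm -f "$logfile"
printf '%s\n' "$contents"
```

Both lines land in the same file, in order, because the shell and its child share the same open file and therefore the same write position.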

Input redirection works the same way:

exec 4< /etc/passwd
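
A typical use of an input FD is a read loop that leaves stdin free for other purposes. The sketch below reads from a temporary file rather than /etc/passwd so that its result is predictable; the file contents and variable names are made up.

```shell
datafile=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$datafile"

exec 4< "$datafile"               # open FD 4 for reading
count=0
while IFS= read -r line <&4; do   # read takes its input from FD 4, not stdin
    count=$((count + 1))
    last=$line
done
exec 4<&-                         # close FD 4
rm -f "$datafile"

echo "read $count lines, last one was: $last"
```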

An FD can also be opened for both reading and writing:

# Bash
exec 3<> /dev/tcp/www.google.com/80

This is necessary for most socket I/O applications (send a message to a service, and receive a response from it, over a single socket). You'll almost never use it with regular files.

Once an FD has been opened, it can be used for reading/writing using the redirection techniques described earlier on this page. Here is a complete HTTP request in bash:

#!/usr/bin/env bash
exec 3<> /dev/tcp/www.google.com/80 || exit 1
printf 'HEAD / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n' >&3
cat <&3

Here, we open FD 3 to point to a TCP socket; then we write an HTTP request to the socket; then we read the response from the socket.

Working with named pipes (FIFOs) generally involves similar techniques -- creating the FIFO first, then setting up a reader and a writer. If the writer is a script that needs to write to the FIFO more than once without triggering an EOF condition for the reader, the script should open an FD and keep it open:

exec 3> myfifo
echo "something" >&3
...
echo "something else" >&3
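
For a version that can be run end to end, the following sketch adds a background reader and cleans up after itself. The directory layout and the use of cat as the reader are assumptions of the example, not part of the technique.

```shell
workdir=$(mktemp -d)
mkfifo "$workdir/myfifo"

# Reader: copies everything from the FIFO until it sees EOF.
cat "$workdir/myfifo" > "$workdir/received" &

# Writer: one long-lived FD, so the reader sees a single stream.
exec 3> "$workdir/myfifo"
echo "something" >&3
echo "something else" >&3
exec 3>&-                 # closing the last writer is what delivers EOF

wait                      # let the reader finish
received=$(cat "$workdir/received")
rm -rf "$workdir"
printf '%s\n' "$received"
```

Had each echo opened the FIFO itself (echo "something" > myfifo), the reader would have seen EOF as soon as the first echo finished, and the second echo could block waiting for a new reader.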

When we are finished with an FD, we can close it. We only need to know the FD number.

exec 3>&-   # Close FD 3
exec 4<&-   # Close FD 4

It doesn't matter whether you use >&- or <&- to close a file descriptor. They both do the same thing.
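
This can be checked with a quick experiment that deliberately uses the "wrong" operator each time. Probing an FD with true is just a convenience chosen for this sketch.

```shell
scratch=$(mktemp)

exec 3> "$scratch"        # opened for writing...
exec 3<&-                 # ...closed with the "input" form

exec 4< "$scratch"        # opened for reading...
exec 4>&-                 # ...closed with the "output" form

# Redirecting to or from a closed FD fails, so the probes below succeed
# only if the FD is still open.
fd3_state=closed
{ true >&3; } 2>/dev/null && fd3_state=open
fd4_state=closed
{ true <&4; } 2>/dev/null && fd4_state=open

rm -f "$scratch"
echo "FD 3 is $fd3_state, FD 4 is $fd4_state"
```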

All our FDs will be closed when we exit, but it is a good practice to close them ourselves anyway. (If we wish to close the FD to free the resources before exiting, then we must also do it explicitly.)

Juggling FDs

Compound commands and functions provide something analogous to block-level variable scope for file descriptors. When you enter a compound command that has redirections attached, the effect is similar to starting a subshell with its own independent file descriptor table: upon leaving, the original FDs are restored (though you can still close, move, or otherwise manipulate FDs belonging to an outer "scope" by using exec and different redirections). Without forking, the OS maintains only one set of FDs for the entire process, so the shell must maintain its own stack of FD mappings in order to simulate nested FD scope.
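
A simple way to watch this scoping at work is to open FD 3 on one file with exec, then temporarily shadow it with a redirection on a brace group. The file and variable names below are invented for the sketch.

```shell
outer=$(mktemp)
inner=$(mktemp)

exec 3> "$outer"                          # FD 3 in the outer "scope"
{ echo "in block" >&3; } 3> "$inner"      # FD 3 shadowed inside the block
echo "after block" >&3                    # the outer FD 3 is back
exec 3>&-

outer_text=$(cat "$outer")
inner_text=$(cat "$inner")
rm -f "$outer" "$inner"
echo "outer file: $outer_text"
echo "inner file: $inner_text"
```

Each message ends up in the file that FD 3 referred to at the moment of the write, even though everything ran in a single process.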

In this example we're forcing the shell to open and manage two different files on FD 3 at the same time within a single process. Of course, it is impossible to have two files open on the same FD at once, yet things still behave as though you could.

# ksh93
 ~ $ builtin cat fds; type cat
cat is a shell builtin
 ~ $ function f { fds -l; cat /dev/fd/3; }
 ~ $ { f 3<&4; f; } <<<33333 3<&0 <<<44444 4<&0 <&2
00 rw- crw--w---- /dev/pts/8
01 rw- crw--w---- /dev/pts/8
02 rw- crw--w---- /dev/pts/8
03 rw- -rw-r----- /dev/inode/22/53172158
04 rw- -rw-r----- /dev/inode/22/53172158
10 rwx -rw-r----- /dev/inode/22/53172157
44444
00 rw- crw--w---- /dev/pts/8
01 rw- crw--w---- /dev/pts/8
02 rw- crw--w---- /dev/pts/8
03 rw- -rw-r----- /dev/inode/22/53172157
04 rw- -rw-r----- /dev/inode/22/53172158
33333

The cat builtin is asked to open a special file (under Linux) that is guaranteed to refer to the given FD of the current process, so the shell has no opportunity to use internal trickery (as it could with the </dev/fd/* redirects). As you can see, the shell pulls this off by duplicating FD 3 onto FD 10 (some free FD >= 10 chosen by the shell), and it remembers to restore FD 3 and close the temporary FD after returning from the function. Entering and leaving a compound command, function, or builtin command is really just sugar for this uglier, more explicit form:

$ { exec {savefd}<&3 3<&4; f; exec 3<&"$savefd"-; f; } <<<...

When performing complex tasks it is perfectly possible to trample on these temporary FDs especially when indiscriminately reading from / manipulating "high-numbered" FDs when the shell might be using them. This is one of the reasons ksh93 and Bash 4.1 have the {var}-style redirects to allow the shell to allocate a free FD for you and place it directly into a variable which you can then reference instead of using hard-coded FD numbers.
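
A minimal sketch of the {var}-style redirect, assuming Bash 4.1 or later (it will not work in plain sh). The variable name logfd and the scratch file are made up for the example.

```shell
scratch=$(mktemp)

exec {logfd}> "$scratch"      # bash picks a free FD (10 or above) and stores it
echo "bash allocated FD $logfd"
echo "hello" >&"$logfd"       # use the variable instead of a hard-coded number
exec {logfd}>&-               # close it through the same variable

contents=$(cat "$scratch")
rm -f "$scratch"
echo "file contains: $contents"
```

The exact number varies from run to run, which is the point: the script never needs to know it.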

Examples

Here's an example from FAQ 32, with explanations:

# Keep both stdout and stderr unmolested.
exec 3>&1 4>&2
foo=$( { time bar 1>&3 2>&4; } 2>&1 )  # Captures time only.
exec 3>&- 4>&-

The goal of this code is to capture the results from bash's time command in a variable, while letting the timed command's stdout and stderr go wherever they were originally supposed to go. This is tricky because time also writes to stderr, not to a separate FD. However, since bash's time is a magic keyword that uses its own "scope" for redirections (much like a curly-brace command grouping), we can apply redirections at different levels to get the results we want.

So, the first thing we do is save the shell's current stdout (FD 1) and stderr (FD 2) in two new file descriptors, so we can use them later.

Next, we set up a command substitution, which temporarily redirects FD 1 to an internal pipe for capturing stdout. But we want time's results to go to this destination, and time writes to FD 2, so we use 2>&1 to send FD 2 to the capture pipe. This redirection is applied outside of a block that we set up specifically to isolate time, which is, again, magic. It would be neither necessary nor possible with other, standard commands.

We set up the curly-brace command grouping to sandbox the time command, whose stdout and stderr are currently going to be captured. time does not write to stdout, but the command we're going to time presumably does, so we want to take care of that.

Inside the braced sandbox, we have the command bar that is going to be timed, and we want that command's stdout and stderr to go to wherever they normally would have gone if we weren't setting up this Rube Goldberg device. Fortunately, we saved those destinations earlier. So, inside the sandbox, we simply tell bar to write to our saved destinations using 1>&3 2>&4, et voila!

Once the timed command has finished, we no longer need FDs 3 and 4, so we close them. And we're all done.
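
To try the whole thing out, the sketch below supplies a stand-in for bar, which the original example leaves undefined. The function body and the variable name timing are assumptions. It needs bash, since time must be the shell keyword.

```shell
# Hypothetical command to time; it writes to both stdout and stderr.
bar() { echo "bar: normal output"; echo "bar: diagnostics" >&2; }

exec 3>&1 4>&2                              # save the real stdout and stderr
timing=$( { time bar 1>&3 2>&4; } 2>&1 )    # capture only time's report
exec 3>&- 4>&-                              # drop the saved copies

printf 'captured: %s\n' "$timing"
```

The two "bar:" lines appear on the terminal as usual, while the variable holds nothing but the timing report.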



FileDescriptor (last edited 2018-07-26 21:42:10 by GreyCat)