Differences between revisions 1 and 2
Revision 1 as of 2011-10-27 18:40:53
Size: 4228
Editor: GreyCat
Comment:
Revision 2 as of 2012-08-29 03:44:43
Size: 6793
Editor: ormaaj
Comment: How FDs with compound commands work when there are conflicts.
File Descriptors

A File Descriptor (FD) is a number which refers to an open file. Each process has its own private set of FDs, but FDs are inherited by child processes from the parent process.

Every process should inherit three open FDs from its parent: 0 ("standard input"), open for reading; and 1 ("standard output") and 2 ("standard error"), open for writing. A process that is started without one or more of these may behave unpredictably. (So never close stderr. Always redirect to /dev/null instead.)

Processes may open additional FDs as needed (up to whatever limit the operating system imposes). In most languages, when you open a new file, you are given back the FD number that the operating system selects (or a library manages the FD number for you and hides the details). In shell scripts, however, the paradigm is different: you select the FD number first, and then open the file using that FD. This means you, as the script writer, must keep track of which FDs you are using for each task.

Shells use redirection to work with FDs. For already-existing FDs, output can be sent to them, or input can be read from them, by using file descriptor duplication syntax:

echo "unexpected error: $foo" 1>&2

while read -r line 0<&3; do ...

The Bash Guide and Bash FAQ 55 give an introductory explanation, so we'll remain concise here. In the echo example, we know that echo normally writes to stdout (FD 1), so we override FD 1 to point to where FD 2 is pointing. Thus, echo will be tricked into writing to our stderr (FD 2). In the read example, we know that read normally reads from stdin (FD 0), but we override FD 0 to point to wherever our FD 3 is pointing; and so read will pull its input from there, instead of from our FD 0. These redirections are transient, only applying to the single commands where they appear.
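As a rough illustration of the two duplication forms above, here is a minimal Bash sketch. The scratch file and the variable names (tmp, line1, line2, msg) are assumptions made up for this example; FD 3 is opened by a redirect on the group so that it exists only while the group runs.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)                        # hypothetical scratch file
printf 'first\nsecond\n' > "$tmp"

{
  # read normally uses FD 0; for each command below, FD 0 is a copy of FD 3.
  read -r line1 0<&3
  read -r line2 0<&3                 # same open file, so reading continues
  # echo normally uses FD 1; point it at FD 2, which is captured here.
  msg=$( { echo "unexpected error: $line1" 1>&2; } 2>&1 )
} 3< "$tmp"

rm -f "$tmp"
printf '%s\n' "$line1" "$line2" "$msg"
```

Each 0<&3 applies only to the read it is attached to; the script's own stdin is never changed.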

In order to create new FDs, we must open files for them to point to. Typically we want the FD to be available within our shell, so that it can be reused or passed to children as needed. To open a file in a shell script, we use the exec command:

exec 3> myfifo

As we said earlier, you must know at the time you're writing the script which FD number you want to use for each task. In the example above, we open FD 3 for output to a file (or something) named myfifo. Presumably we will use this open FD later, to write information to the file.

Input redirection works the same way:

exec 4< /etc/passwd
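Putting the two forms together, a sketch along these lines opens an FD for writing, uses it for several commands, closes it, and then reads the data back on another FD. The scratch file (standing in for myfifo or /etc/passwd) and the variable names are assumptions for illustration only.

```shell
#!/usr/bin/env bash
log=$(mktemp)                 # hypothetical scratch file

exec 3> "$log"                # FD 3: open for writing (truncates the file)
echo "first entry"  >&3
echo "second entry" >&3       # same open file, so this lands after the first
exec 3>&-                     # close FD 3

exec 4< "$log"                # FD 4: open for reading
read -r a <&4
read -r b <&4
exec 4<&-                     # close FD 4

rm -f "$log"
printf '%s / %s\n' "$a" "$b"
```

Because the FD stays open between commands, the second echo does not truncate the file the way a second "> file" redirect would.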

An FD can also be opened for both reading and writing:

# Bash
exec 3<> /dev/tcp/www.google.com/80

This is necessary for most socket I/O applications (send a message to a service, and receive a response from it, over a single socket).

Once an FD has been opened, it can be used for reading and writing using the redirection techniques described earlier on this page. Here is a complete HTTP request in bash:

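#!/usr/bin/env bash
# Open FD 3 for reading and writing on a TCP connection (Bash-only /dev/tcp).
exec 3<> /dev/tcp/www.google.com/80 || exit 1
# HTTP header lines end in CRLF; an empty line ends the request.
printf 'HEAD / HTTP/1.1\r\nHost: www.google.com\r\nConnection: close\r\n\r\n' >&3
# Read the response until the server closes the connection.
cat <&3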

Here, we open FD 3 to point to a TCP socket; then we write an HTTP request to the socket; then we read the response from the socket.

Working with NamedPipes generally involves similar techniques -- creating the FIFO first, then setting up a reader and a writer. If the writer will be a script, and wishes to write more than once to the FIFO without triggering an EOF condition for the reader, then the script will open an FD:

exec 3> myfifo
echo "something" >&3
...
echo "something else" >&3
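To see the effect end to end, here is a self-contained sketch that plays both roles in one script. The temporary directory, the background cat acting as the reader, and the variable names are assumptions for this example, not part of the original recipe.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)              # hypothetical scratch directory
fifo=$dir/myfifo
out=$dir/out
mkfifo "$fifo"

cat "$fifo" > "$out" &        # reader: runs until every writer has closed the FIFO
reader=$!

exec 3> "$fifo"               # blocks until the reader has opened the FIFO
echo "something" >&3
echo "something else" >&3     # still the same open FD, so no EOF in between
exec 3>&-                     # closing the last writer gives the reader EOF

wait "$reader"
received=$(cat "$out")
rm -rf "$dir"
printf '%s\n' "$received"
```

Had each echo used its own "> myfifo" redirect instead, the reader could have seen EOF after the first message and exited early.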

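When we are finished with an FD, we can close it. We need to know its number. Either n>&- or n<&- closes FD n regardless of how it was opened (the direction only decides which FD is assumed when the number is omitted), but by convention we match the direction that was used to open it.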

exec 3>&-   # Close FD 3 which was open for writing
exec 4<&-   # Close FD 4 which was open for reading

All our FDs will be closed when we exit, but it is a good practice to close them ourselves anyway. (If we wish to close the FD to free the resources before exiting, then we must also do it explicitly.)
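A small sketch of what closing does in practice, assuming a scratch file and made-up variable names: once FD 3 is closed, any later redirect that refers to it fails.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)                 # hypothetical scratch file

exec 3> "$tmp"
echo "before close" >&3
exec 3>&-                     # close FD 3

# Redirecting to a closed FD is an error, so the command fails.
if { echo "after close" >&3; } 2>/dev/null; then
  status="still open"
else
  status="closed"
fi

contents=$(cat "$tmp")
rm -f "$tmp"
printf '%s: %s\n' "$status" "$contents"
```

The group with 2>/dev/null is only there to hide the shell's "Bad file descriptor" message.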

Juggling FDs

Compound commands and functions provide something analogous to block-level variable scope for file descriptors. When you enter a compound command that has redirections attached, the effect is similar to starting a subshell with its own independent file descriptor table: upon leaving, the original FDs are restored (though you can still close, move, or otherwise manipulate FDs belonging to an outer "scope" by using exec with different redirections). Because no fork takes place and the OS maintains only one set of FDs per process, the shell must maintain its own stack of FD mappings in order to simulate nested FD scope.

In this example we're forcing the shell to open and manage two different files on FD 3 at the same time within a single process. Of course, it is impossible to have two files open on the same FD at once, yet things still behave as though you could.

# ksh93
 ~ $ builtin cat fds; type cat
cat is a shell builtin
 ~ $ function f { fds -l; cat /dev/fd/3; }
 ~ $ { f 3<&4; f; } <<<33333 3<&0 <<<44444 4<&0 <&2
00 rw- crw--w---- /dev/pts/8
01 rw- crw--w---- /dev/pts/8
02 rw- crw--w---- /dev/pts/8
03 rw- -rw-r----- /dev/inode/22/53172158
04 rw- -rw-r----- /dev/inode/22/53172158
10 rwx -rw-r----- /dev/inode/22/53172157
44444
00 rw- crw--w---- /dev/pts/8
01 rw- crw--w---- /dev/pts/8
02 rw- crw--w---- /dev/pts/8
03 rw- -rw-r----- /dev/inode/22/53172157
04 rw- -rw-r----- /dev/inode/22/53172158
33333

The cat builtin is asked to open a special file (available under Linux) that is guaranteed to refer to the given FD of the current process, so the shell cannot rely on internal trickery (as it could with the </dev/fd/* redirects). As the first listing shows, the shell pulls this off by duplicating the original FD 3 onto FD 10 (some free FD chosen by the shell that's >= 10), and it remembers to restore FD 3 and close the temporary FD after the function returns. Entering and leaving a compound command, function, or builtin command that has redirections is really just sugar for this uglier, more explicit form:

$ { exec {savefd}<&3 3<&4; f; exec 3<&"$savefd"-; f; } <<<...
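The same idea can be tried in Bash (4.1 or later) with a sketch like the following, which runs the implicit form and the explicit save/restore form side by side. The two scratch files and the variable names are assumptions for this example; the ksh93 fds builtin is not used.

```shell
#!/usr/bin/env bash
a=$(mktemp) b=$(mktemp)              # hypothetical scratch files
echo "outer file" > "$a"
printf 'inner line 1\ninner line 2\n' > "$b"

exec 3< "$a" 4< "$b"

# Implicit: the redirect on the group applies only inside the group.
{ read -r implicit_inner <&3; } 3<&4

# Explicit: save FD 3, point it at FD 4, use it, then move the copy back.
exec {savefd}<&3 3<&4
read -r explicit_inner <&3
exec 3<&"$savefd"-

read -r outer <&3                    # FD 3 refers to the outer file again
exec 3<&- 4<&-
rm -f "$a" "$b"
printf '%s\n' "$implicit_inner" "$explicit_inner" "$outer"
```

The two inner reads return successive lines because FD 3 is a duplicate of FD 4 in both cases and duplicates share one file offset.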

When performing complex tasks, it is entirely possible to trample on these temporary FDs, especially when indiscriminately reading from or manipulating "high-numbered" FDs that the shell might be using. This is one of the reasons ksh93 and Bash 4.1 have the {var}-style redirects: they let the shell allocate a free FD for you and store its number in a variable, which you can then reference instead of hard-coding FD numbers.
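A minimal Bash sketch of the {var}-style redirects, assuming Bash 4.1 or later; the scratch file and the variable names logfd, infd and got are made up for this example.

```shell
#!/usr/bin/env bash
tmp=$(mktemp)                 # hypothetical scratch file

exec {logfd}> "$tmp"          # the shell picks a free FD and stores it in logfd
echo "hello" >&"$logfd"
exec {logfd}>&-               # close whichever FD logfd holds

exec {infd}< "$tmp"
read -r got <&"$infd"
exec {infd}<&-

rm -f "$tmp"
printf '%s\n' "$got"
```

Since the script never names an FD number itself, it cannot collide with an FD the shell is using internally.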


CategoryShell

FileDescriptor (last edited 2018-07-26 21:42:10 by GreyCat)