Input And Output

The flow of input and output is a basic principle of computer science, and it applies just as well to applications started through BASH. BASH makes it fairly easy to play around with the input and output of commands, which gives us great flexibility and plenty of opportunities for automation.


1. File Descriptors

Input to and output from processes always happens through so-called File Descriptors (FDs for short). FDs are a bit like pointers to sources of data: when something reads from or writes to an FD, the data is read from or written to whatever that FD points to. FDs can point to regular files, but they can also point to more abstract data sources, such as your terminal or the input and output of another process.

By default, every new process has three FDs. They are called Standard Input, Standard Output and Standard Error, or stdin, stdout and stderr for short. Standard Input is where the characters you type on your keyboard usually come from. Standard Output is where the program sends most of its normal information so that the user can see it, and Standard Error is where the program sends its error messages. Be aware that GUI applications have these FDs too, but the actual GUI doesn't work through them. GUI applications can still read from and write to the standard FDs, but they usually don't; they do all their user interaction through the GUI, which makes them hard to control from BASH. As a result, we'll stick to simple console applications. Those we can easily feed data on their Standard Input, and read data from on their Standard Output and Standard Error.

Let's make these definitions a little more concrete. Here's a demonstration of how "Standard Input" and "Standard Output" work:

    $ read -rp "What is your name? " name; echo "Good day, $name.  Would you like some tea?"
    What is your name? lhunath
    Good day, lhunath.  Would you like some tea?

read is a command that reads a line of information from stdin and stores it in a variable; we specified name to be that variable (the -r option stops read from treating backslashes in the input specially). Once read has read a line from stdin, it finishes and echo displays a message. echo sends its output to stdout. stdin is connected to your terminal's input device, which is probably your keyboard. stdout is connected to your terminal's output device, which I assume is a computer monitor. As a result, you can type in your name and are then greeted with a friendly message on your monitor, offering you a cup of tea.

So what is stderr? Let's demonstrate:

    $ rm secrets
    rm: cannot remove `secrets': No such file or directory

Unless you have a file called secrets in your current directory, that rm command will fail and show an error message explaining what went wrong. By convention, error messages like these are written to stderr. stderr is also connected to your terminal's output device, just like stdout, so error messages show up on your monitor just like the messages on stdout. Still, this separation makes it easy to keep errors apart from the application's normal messages. Some people like to use wrappers that color all the output on stderr red, so that they can spot error messages more easily. This is not generally advisable, but it is a simple example of the many options this separation gives us.


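When you write your own scripts, send your error messages to stderr as well. The >&2 at the end of this echo command does exactly that (redirection is explained in the next chapter):

    echo "Uh oh.  Something went really wrong." >&2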
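As a minimal sketch of that habit, here is a hypothetical helper function called error (it is not a BASH builtin) that sends its message to stderr. A command substitution, $( ), captures only stdout, so it gets nothing from the helper, while the message itself still reaches the terminal.

```shell
#!/usr/bin/env bash
# error: a hypothetical helper that prints its arguments on stderr (FD 2).
error() {
    printf 'Error: %s\n' "$*" >&2
}

# $( ) captures stdout only. The error message bypasses it and goes
# straight to the terminal through stderr, so "captured" stays empty.
captured=$(error "something went really wrong")
printf 'Captured from stdout: [%s]\n' "$captured"
```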



2. Redirection

The most basic form of input/output manipulation in BASH is Redirection. Redirection is used to change the data source or destination of an application's FDs. That way, you can send the application's output to a file instead of the terminal, or have the application read from a file instead of from the keyboard.

Redirection, too, comes in different shapes. There's File Redirection, File Descriptor manipulation, Heredocs and Herestrings.




2.1. File Redirection

File Redirection is probably the most basic form of redirection. I'll start with this so you can grasp the concept of redirection well.

    $ echo "The story of William Tell.
    >
    > It was a cold december night.  Too cold to write." > story
    $ cat story
    The story of William Tell.

    It was a cold december night.  Too cold to write.

This time, the echo command does not send its output to the terminal: the > story operation changes the destination of echo's stdout FD so that it points to a file called story instead. Be aware that before the echo command is executed, BASH opens story for writing: if that file doesn't exist yet, it is created as an empty file, so that the FD can be pointed to it. Whether > may overwrite a file that already exists can be controlled with the noclobber Shell Option (see later).

We then use the application cat to print out the contents of that file. cat is an application that reads the contents of all the files you pass it as arguments. It then outputs each file one after another on stdout. In essence, it concatenates the contents of all the files you pass it as arguments.

Warning: Far too many code examples and shell tutorials on the Internet tell you to use cat whenever you need to read the contents of a file. This is highly ill-advised! cat only serves well to concatenate the contents of multiple files, or as a quick tool on the shell prompt to see what's inside a file. You should NOT use cat to read from files in your scripts; there are almost always far better ways to do this. Please keep this warning in mind. A useless use of cat costs an extra process, and it often makes reading slower: the data has to be copied through cat and a pipe, and the command that finally reads it can no longer treat it as a regular file (for instance, it can't seek in it or see its size).
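For example, when a script needs to process a file line by line, you can redirect the file straight into a while read loop instead of piping cat into it. A sketch, with a hypothetical temporary file standing in for real input:

```shell
#!/usr/bin/env bash
# Hypothetical input: a temporary file with three lines.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' "first line" "second line" "third line" > "$tmp/lines"

# Instead of: cat "$tmp/lines" | while IFS= read -r line; do ...
# let BASH connect the file to the loop's stdin. No cat process is started,
# and the loop runs in the current shell, so "count" keeps its value.
count=0
while IFS= read -r line; do
    printf 'Read: %s\n' "$line"
    count=$((count + 1))
done < "$tmp/lines"
echo "Lines read: $count"
```

The IFS= and -r parts keep each line exactly as it appears in the file, including leading whitespace and backslashes.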

When we run cat without any arguments, it has no files to read. In that case, cat simply reads from stdin instead (much like read). Since stdin is normally connected to your terminal, starting cat without any arguments will seem to do nothing:

    $ cat

It doesn't even give you back your shell prompt! What's going on? cat is still reading from stdin, which is your keyboard. Anything you type now will be sent to cat. As soon as you hit the Enter key, cat will do what it normally does; it will display what it reads on stdout, just the same way as when it displayed our story on stdout:

    $ cat
    test?
    test?

Why does it say test? twice now? Well, as you type, your terminal shows you all the characters that you send to stdin before sending them there. That results in the first test? that you see. As soon as you hit Enter, cat has read a line from stdin and shows it on stdout, which is also your terminal; hence the second line: test?. You can press Ctrl+D at the start of a line to send cat an End of File. That makes cat conclude that there is no more data on stdin, so it stops reading and you get your prompt back. Now let's use file redirection to attach a file to stdin, so that stdin no longer reads from our keyboard, but from the file instead:

    $ cat < story
    The story of William Tell.

    It was a cold december night.  Too cold to write.

The result of this is exactly the same as the result from our previous cat story; except this time, the way it works is a little different. In our first example, cat opened an FD to the file story and read its contents through that FD. In this recent example, cat simply reads from stdin, just like it did when it was reading from our keyboard. However, this time, the < story operation has modified stdin so that its data source is the file story rather than our keyboard.
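One way to see this difference is with wc -l, which counts lines. When wc opens the file itself, it knows the file's name and prints it next to the count; when it reads from stdin, it only sees a stream of data and has no name to print. A sketch, using a temporary copy of the story:

```shell
#!/usr/bin/env bash
# Recreate the story file in a throwaway directory.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
printf '%s\n' "The story of William Tell." "" "It was a cold december night.  Too cold to write." > story

by_name=$(wc -l story)      # wc opens story itself, so it knows the name
by_stdin=$(wc -l < story)   # BASH opens story and attaches it to wc's stdin
echo "wc -l story   -> $by_name"
echo "wc -l < story -> $by_stdin"
```

The exact spacing of wc's output differs between implementations, but only the first form mentions story.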

Let's summarize:

  • command > file sends the stdout of command to file.
  • command < file makes command read its stdin from file.

Redirection operators can be preceded by a number, which denotes the FD they change. If the number is omitted, > changes FD 1, because that is the number for stdout, and < changes FD 0, because that is the number for stdin. The number for stderr is 2. So, let's try sending stderr to a file:

    $ for homedir in /home/*
    > do rm "$homedir/secret"
    > done 2> errors

In this example, we loop over each directory in /home and try to delete the file secret in each of them. Some home directories may not have a secret file; for those, rm fails and sends an error message to stderr.

You may have noticed that our redirection operator isn't on rm, but it's on that done thing. Why is that? Well, this way, the redirection applies to all output to stderr made inside the whole loop.

Let's see what our loop produced:

    $ cat errors
    rm: cannot remove `/home/axxo/secret': No such file or directory
    rm: cannot remove `/home/lhunath/secret': No such file or directory

Two error messages in our error log file. Two people that didn't have a secret file in their home directory.
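Here is a self-contained version of that loop that you can run without touching real home directories. It builds a few hypothetical home directories in a temporary location, and only one of them contains a secret file:

```shell
#!/usr/bin/env bash
# Hypothetical fake home directories; only "guest" has a secret file.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/home/axxo" "$tmp/home/lhunath" "$tmp/home/guest"
touch "$tmp/home/guest/secret"

# The 2> on "done" redirects stderr for every rm the loop runs.
for homedir in "$tmp"/home/*; do
    rm "$homedir/secret"
done 2> "$tmp/errors"

# Expect one error line per home directory that had no secret file.
error_count=$(wc -l < "$tmp/errors")
echo "rm complained $((error_count)) times:"
while IFS= read -r line; do echo "  $line"; done < "$tmp/errors"
```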

If you're writing a script, and you expect that running a certain command may fail on occasion, but don't want the script's user to be bothered by the possible error messages that command may produce, you can silence an FD. Silencing it is as easy as normal File Redirection. We're just going to send all output to that FD into the system's black hole:

    $ for homedir in /home/*
    > do rm "$homedir/secret"
    > done 2> /dev/null

The file /dev/null always stays empty, no matter what you write to it, and reading from it immediately returns End of File. So when we write our error messages to it, they simply disappear, and /dev/null remains as empty as ever. That's because it's not a normal file; it's a virtual device.
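A short sketch showing that 2> /dev/null discards only stderr: ls is asked to list one file that exists and one that doesn't (both hypothetical names in a temporary directory), and only the normal output survives.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
touch "$tmp/present"   # hypothetical file that exists; "missing" does not

# stdout is captured; the complaint about "missing" on stderr is thrown away.
output=$(ls "$tmp/present" "$tmp/missing" 2> /dev/null)
echo "stdout was: $output"
```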

There is one last thing you should learn about File Redirection. It's nice that you can make error log files like this to keep your error messages; but as I mentioned before, BASH makes sure that the file exists before redirecting to it. BASH also makes sure the file is empty before redirecting to it. As a result, each time we run our loop to delete secret files, our log file is truncated to empty before we fill it up again with new error messages. What if we'd like to keep a record of all the error messages our loop has ever generated? What if we don't want that file to be truncated each time we start our loop? The solution is to double the redirection operator: > becomes >>. >> does not empty the file; it just appends new data to the end of it!

    $ for homedir in /home/*
    > do rm "$homedir/secret"
    > done 2>> errors

Hooray!
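The difference between the two operators, in a sketch that writes to a hypothetical log file in a temporary directory:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
log=$tmp/errors   # hypothetical log file

echo "run 1" > "$log"     # > creates the file
echo "run 2" > "$log"     # > truncates it first: "run 1" is gone
echo "run 3" >> "$log"    # >> appends: "run 2" is kept
while IFS= read -r line; do echo "$line"; done < "$log"
```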




2.2. File Descriptor Manipulation

Now that you know how to manipulate process input and output by sending it to and reading it from files, let's make it a little more interesting still.

As you know, it's possible to change the source and destination of FDs so that they point to or from files. It's also possible to copy one FD to another. Let's prepare a simple testbed:

    $ echo "I am a proud sentence." > file

We've made a file called file and written a proud sentence into it. It's time I introduce a new application to you. Its name is grep, and it's incredibly powerful. grep is that one thing that you need more than anything else in your household. It basically takes a search string as its first argument and one or more files as extra arguments. Just like cat, grep also uses stdin if you don't specify any files as extra arguments. grep reads the files (or stdin if none were provided), searches them for the search string you gave it, and prints every line that contains it. Most versions of grep even support a -r switch, which makes them accept directories as well as files as extra arguments, and then search all the files inside those directories and their subdirectories. Here's an example of how grep can work:

    $ ls house/
    house/drawer  house/closet  house/dustbin  house/sofa
    $ grep -r socks house/
    house/sofa:socks

In this silly example we have a directory called house with several pieces of furniture in it as files. If we're looking for our socks in each of those files, we send grep to search the directory house/. grep will search everything in there, open each file and look through its contents. In our example, grep finds socks in the file house/sofa; presumably tucked away under a pillow. You want a more realistic example? Sure:

    $ grep "$HOSTNAME" /etc/*
    /etc/hosts:127.0.0.1       localhost Lyndir

Here we instruct grep to search for whatever $HOSTNAME expands to in whatever /etc/* expands to. It finds my hostname, Lyndir, in the file /etc/hosts, and shows me the line in that file that contains the search string.
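If you'd like to try the house example yourself, this sketch rebuilds that (entirely made-up) directory in a temporary location and searches it with grep -r. It assumes a grep that supports -r, as GNU and BSD grep do.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1

# Hypothetical furniture files; only the sofa hides the socks.
mkdir house
echo "spoons" > house/drawer
echo "coats" > house/closet
echo "banana peel" > house/dustbin
printf '%s\n' "pillow" "socks" > house/sofa

result=$(grep -r socks house/)
echo "$result"
```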

OK, now that you understand grep, let's continue with our File Descriptor Manipulation. Remember that we created a file called file, and wrote a proud sentence to it? Let's use grep to find where that proud sentence is now:

    $ grep proud *
    file:I am a proud sentence.

Good! grep found our sentence in file. It writes the result of its operation to stdout which is shown on our terminal. Now let's see if we can make grep send an error message, too:

    $ grep proud file 'not a file'
    file:I am a proud sentence.
    grep: not a file: No such file or directory

This time, we instruct grep to search for the string proud in the files 'file' and 'not a file'. file exists, and the sentence is in there, so grep happily writes the result to stdout. It moves on to the next file to scan, which is 'not a file'. grep can't open this file to read its content, because it doesn't exist. As a result, grep emits an error message on stderr which is still connected to our terminal.
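To convince yourself that the two messages really travel over different FDs, you can send each one to its own file. This sketch does so in a temporary directory; the names proud.out and proud.err are hypothetical.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
echo "I am a proud sentence." > file

# stdout (FD 1) and stderr (FD 2) go to two different files.
grep proud file 'not a file' > proud.out 2> proud.err
echo "stdout got: $(< proud.out)"
echo "stderr got: $(< proud.err)"
```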

Now, how would you go about silencing this grep statement completely? We'd like to send all the output that appears on the terminal to a file instead; let's call it proud.log:

    $ grep proud file 'not a file' > proud.log 2> proud.log

Does that look about right? We first use > to send stdout to proud.log, and then use 2> to send stderr to proud.log as well. Almost, but not quite. If you run this command and then look in proud.log, you'll most likely see only the error message, not the output from stdout. The problem is that each redirection opens proud.log separately, so stdout and stderr each get their own independent position in the file, and both start writing at the very beginning of it. Here, grep's result goes out through stdout first; the error message then goes out through stderr, also at the start of the file, overwriting what stdout wrote. Depending on the order and the length of the writes, one FD's output can clobber part or all of the other's.

We need to avoid having two separately opened FDs work on the same file. We can do this by duplicating an FD, so that both FDs share a single handle:

    $ grep proud file 'not a file' > proud.log 2>&1

You need to remember to always read file redirections from left to right. This is the order in which BASH assigns and processes them. First, stdout is changed so that it points to our proud.log. Then, we use the >& syntax to duplicate FD 1 and put this duplicate in FD 2's stead. If this is hard for you to grasp, you could read this as: stdout becomes proud.log and stderr becomes stdout (which is proud.log). As a result, stdout obviously writes its information to proud.log, but stderr writes its information to whatever stdout was when the >& redirector was used; which was proud.log as well. In this case, the handle that writes to proud.log is the same for both stdout and stderr, and no collisions occur.
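A sketch of the correct version, run in a temporary directory: afterwards, proud.log holds both grep's result and its error message.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
echo "I am a proud sentence." > file

# First point stdout at proud.log, then make stderr a copy of stdout.
grep proud file 'not a file' > proud.log 2>&1
line_count=$(wc -l < proud.log)
echo "proud.log holds $((line_count)) lines:"
while IFS= read -r line; do echo "  $line"; done < proud.log
```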

Be careful not to confuse the order:

    $ grep proud file 'not a file' 2>&1 > proud.log

This would read as stderr becomes stdout (which is the terminal), and then stdout becomes proud.log. As a result, stdout's messages will be logged, but the error messages will still go to the terminal. Oops.
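You can watch the wrong order misbehave in a script, too. In this sketch, a command substitution stands in for the terminal, so that the stray error message can be inspected instead of just scrolling past:

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
echo "I am a proud sentence." > file

# Wrong order: stderr copies stdout while stdout still points at $( ),
# and only afterwards is stdout moved to proud.log.
not_logged=$(grep proud file 'not a file' 2>&1 > proud.log)
echo "Not logged: $not_logged"
echo "Logged:     $(< proud.log)"
```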

Note:
BASH also offers yet another form of redirection. The &> redirection operator is simply a shorter way of doing what we did here, redirecting both stdout and stderr to a file:

    $ grep proud file 'not a file' &> proud.log

This is the same as > proud.log 2>&1, but it is not portable to the Bourne shell (sh). It is therefore not recommended practice.

TODO: Moving FDs and Opening FDs RW.




2.3. Heredocs And Herestrings

Files aren't all that. They're boring, really. Strings are so much more interesting. They're not permanent like files on a hard disk, but they're easy to work with, easy to make and easy to manipulate.

Heredocs and Herestrings allow you to perform Redirection as you would with files, just by using strings instead. Let's try it out!

    $ grep proud <<END
    > I am a proud sentence.
    > END
    I am a proud sentence.

This is a Heredoc (or Here Document). Heredocs aren't really useful unless you're trying to embed long strings of several lines inside your scripts, which is bad practice. You should keep your logic (your code) and your input (your data) separated, preferably in different files.

A Heredoc works by adding the <<STRING operator at the end of a command. This makes the command's stdin read from the lines that follow in the script (or on the command line, if you're not in a script). The input of the Heredoc ends at the first line that consists of nothing but the string you put after the <<. In the example above, I used the string END, but it can really be anything (as long as you quote it if it contains whitespace).

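Beware that all text following the Heredoc operator is sent to the command's stdin almost literally: $ and backticks are still special, so they have to be escaped with \ or they will be expanded (unless you quote the terminator string, as in <<'END', which turns off expansion for the whole Heredoc). That also means any whitespace you use to indent your script is sent along. The terminator string (in our case END) must be at the very beginning of its line.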

    echo "Let's test abc:"
    if [[ abc = a* ]]; then
        cat <<END
            abc seems to start with an a!
    END
    fi

Will result in:

    Let's test abc:
            abc seems to start with an a!

You can avoid this by removing the indentation from the lines of your Heredocs, but that distorts your pretty and consistent indentation. There is an alternative. If you use <<-END instead of <<END as your Heredoc operator, BASH removes any leading tab characters from each line of your Heredoc content before sending it to the command. That way you can still use tabs to indent your Heredoc content along with the rest of your code, and those tabs will not be sent to the command that receives your Heredoc. This also means you can use tabs to indent your terminator string. Only tabs are stripped this way, not spaces.
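Coming back to the expansion rule: in this sketch, which uses a hypothetical variable called name, the unescaped $name in the Heredoc is replaced by its value, while the escaped \$name reaches the command literally. The Heredoc is fed to cat inside a command substitution so the result can be inspected.

```shell
#!/usr/bin/env bash
name=lhunath   # hypothetical value
text=$(cat <<END
Hello, $name.
The literal text \$name is not expanded.
END
)
printf '%s\n' "$text"
```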

Let's check out the very similar but more interesting Herestrings:

    $ grep proud <<<"I am a proud sentence."
    I am a proud sentence.

This time, stdin reads its information straight from the string you put after the <<< operator. This is very convenient for sending the contents of variables to processes:

    $ grep proud <<<"$USER sits proudly on his throne in $HOSTNAME."
    lhunath sits proudly on his throne in Lyndir.

Herestrings are shorter, less intrusive and overall more convenient than their bulky Heredoc counterpart.
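Herestrings also pair well with read. This sketch, using a hypothetical record variable, splits a variable's contents into fields without any pipe:

```shell
#!/usr/bin/env bash
record="lhunath Lyndir /home/lhunath"   # hypothetical data
# read takes its stdin from the herestring and splits it on whitespace.
read -r user host home <<< "$record"
echo "user=$user host=$host home=$home"
```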

Later on, you will learn about pipes and how they can be used to send the output of a command into another command's stdin. Many people pipe the output of echo into a command to send it the contents of a variable. For this purpose, however, Herestrings should be preferred: they don't need an extra echo or a pipeline subshell, and they are lighter both for the shell and for the style of your shell script:

    $ echo 'Wrap this silly sentence.' | fmt -t -w 20
    Wrap this silly
       sentence.
    $ fmt -t -w 20 <<< 'Wrap this silly sentence.'
    Wrap this silly
       sentence.


  • Good Practice:
    Long heredocs are usually a bad idea because scripts should contain logic, not data. If you have a large document that your script needs, you should ship it in a separate file along with your script. Herestrings, however, come in handy quite often, especially for sending variable content to processes like grep or sed instead of files.



3. Pipes

Now that you can effortlessly manipulate File Descriptors to direct certain types of output to certain files, it's time you learn some more ingenious tricks available through I/O redirection.

You can use File Redirection to write output to files or read input from files. But what if you want to connect the output of one application directly to the input of another? That way, you could build a sort of chain to process output. If you already know about FIFOs, you could use something like this to that end:

    $ ls
    $ mkfifo myfifo; ls
    myfifo
    $ grep bea myfifo &
    [1] 32635
    $ echo "rat
    > cow
    > deer
    > bear
    > snake" > myfifo
    bear

We use the mkfifo command to create a new file in the current directory named 'myfifo'. This is no ordinary file, however, but a FIFO. FIFOs are files that serve data on a First In, First Out basis. When you read from a FIFO, you only receive data once another process writes to it. As such, a FIFO never really stores any data the way a regular file does. As long as no data has been written to it, a read from the FIFO blocks while it waits for data to become available. Writing works the other way around: a process that opens the FIFO for writing blocks until another process opens it for reading.

In our example, the FIFO called myfifo is read from by grep. grep waits for data to become available on the FIFO. That's why we end the grep command with the & operator, which puts it in the background. That way, we can continue typing and executing commands while grep runs and waits for data. Our echo statement feeds data to the FIFO. As soon as this data becomes available, the running grep command reads it in and processes it. The result is displayed. We have successfully sent data from the echo command to the grep command.
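The same experiment as a script, sketched under the assumption that you can create files in a temporary directory: the FIFO lives there, the background grep writes its result to a file, and wait makes the script pause until grep has finished.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
mkfifo "$tmp/myfifo" || exit 1

# The reader blocks until a writer opens the FIFO, so start it in the background.
grep bea "$tmp/myfifo" > "$tmp/result" &

# The writer: once it has written its lines and closed the FIFO, grep sees End of File.
printf '%s\n' rat cow deer bear snake > "$tmp/myfifo"

wait   # let the background grep finish before reading its result
result=$(< "$tmp/result")
echo "grep found: $result"
```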

But these temporary files are a real annoyance. You may not have write permissions. You need to remember to clean up any temporary files you create. You need to make sure that data is going in and out, or the FIFO might just end up blocking for no reason.

For these reasons, another feature is made available. This feature is called Pipes. It basically just connects the stdout of one process to the stdin of another; effectively piping the data from one process into another. Let's try our above example again, but using pipes:

    $ echo "rat
    > cow
    > deer
    > bear
    > snake" | grep bea
    bear

The pipe is created by putting the | operator between the two commands it connects. The former command's stdout is connected to the latter command's stdin. As a result, grep can read echo's output and display the result of its operation, which is bear.

Pipes are widely used as a means of postprocessing application output. FIFOs are, in fact, also referred to as named pipes. They accomplish the same results as the pipe operator, but through a filename.

Note:
By default, BASH runs each command of a pipeline in its own subshell environment, so the second command here runs in a subshell too. This is important to know, because any variables that you modify or initialize inside that command will appear unmodified outside of it. Let's illustrate:

    $ message=Test
    $ echo "Salut, le monde!" | { read message; echo "The message is: $message"; }
    The message is: Salut, le monde!
    $ echo "The message is: $message"
    The message is: Test

Once the piped command ends, so does the subshell that was created for it. Along with that subshell, any modifications made in that subshell are lost. So be careful!
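When you do need the value afterwards, one option, sketched below, is to skip the pipe and give read its input through a herestring, so that read runs in the current shell. This assumes BASH's default behaviour, in which the lastpipe shell option is off.

```shell
#!/usr/bin/env bash
message=Test
# read runs in a pipeline subshell; its assignment is lost afterwards.
echo "Salut, le monde!" | read -r message
echo "After the pipe:       $message"

# With a herestring, read runs in the current shell, so the assignment sticks.
read -r message <<< "Salut, le monde!"
echo "After the herestring: $message"
```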


  • Good Practice:
    Pipes are a very attractive means of post-processing application output. You should, however, be careful not to over-use pipes. If you end up making a pipe-chain that consists of three or more applications, it is time to ask yourself whether you're doing things a smart way. You might be able to use more application features of one of the post-processing applications you've used earlier in the pipe. Each new element in a pipe chain causes a new subshell and a new application to be loaded. It also makes it very hard to follow the logic in your script!
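A small example of that advice, using a hypothetical word list: instead of piping grep into wc -l to count matching lines, you can ask grep to count them itself with its -c option.

```shell
#!/usr/bin/env bash
animals=$'rat\ncow\ndeer\nbear\nsnake\nbeaver'   # hypothetical data

# Two commands joined by a pipe:
piped=$(grep bea <<< "$animals" | wc -l)
# One command: grep counts the matching lines itself.
counted=$(grep -c bea <<< "$animals")

echo "grep | wc -l: $((piped))   grep -c: $counted"
```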




4. Miscellaneous Operators (stub)

Feel free to complete this section.