Differences between revisions 10 and 11
Revision 10 as of 2018-03-12 18:45:49
Size: 7498
Editor: GreyCat
Comment: remove "timestamp
Revision 11 as of 2018-04-24 13:18:06
Size: 7852
Editor: GreyCat
Comment: bash 4.4 can wait for procsub
Deletions are marked like this. Additions are marked like this.
Line 35: Line 35:
The process substitution syntax creates a named pipe (or something analogous) and runs the `tee` program ''in the background'', reading from that pipe. `tee` makes two copies of everything it reads -- one for the `mylog` file (which it opens), and one for stdout, which was inherited from the script. Finally, [[http://www.gnu.org/software/bash/manual/bash.html#index-exec-113|exec]] redirects the shell's stdout to the pipe. The ProcessSubstitution syntax creates a named pipe (or something analogous) and runs the `tee` program ''in the background'', reading from that pipe. `tee` makes two copies of everything it reads -- one for the `mylog` file (which it opens), and one for stdout, which was inherited from the script. Finally, [[http://www.gnu.org/software/bash/manual/bash.html#index-exec-113|exec]] redirects the shell's stdout to the pipe.
Line 59: Line 59:
Also, notice that the contents of the file were printed ''after'' the next shell prompt. Some people find this disturbing. Again, there's no clean way to avoid that, since the `tee` is done in a background process, but not one that's under our control. Even adding a `wait` command to the script has no effect. Some people add a `sleep 1` to the end of the script, to give the background `tee` a chance to finish. This works (generally), but some people find it more offensive than the original problem. Also, notice that the contents of the file were printed ''after'' the next shell prompt. Some people find this disturbing. Again, there's no clean way to avoid that, since the `tee` is done in a background process, but not one that's under our control. Even adding a `wait` command to the script has no effect (prior to bash 4.4 -- see below). Some people add a `sleep 1` to the end of the script, to give the background `tee` a chance to finish. This works (generally), but some people find it more offensive than the original problem.
Line 89: Line 89:
This leads to the second variant of this question: '''I want to log both stdout and stderr together, keeping the lines in sync.''' In bash 4.4, a ProcessSubstitution sets the `$!` parameter, so we can explicitly `wait` for that, if we wish. It's still rather clumsy, though:

{{{
# Bash 4.4
exec > >(tee myfile)
pspid=$!
: ... stuff ...

# All done. Close stdout so the proc sub will terminate, and wait for it.
exec >&-
wait $pspid
}}}

Now, consider the second variant of this question: '''I want to log both stdout and stderr together, keeping the lines in sync.'''

Logging! I want to send all of my script's output to a log file. But I want to do it from inside the script. And I want to see it on the terminal too!

Normally, if you want to run a script and send its output to a logfile, you'd simply use Redirection: myscript >log 2>&1. Or to see the output on the screen and also redirect to a file: myscript 2>&1 | tee log (or better still, run your script within the script(1) command if your system has it). If you want to insert commands into a script that cause it to do this kind of logging internally, without altering your invocation, then it gets trickier.

Setting up duplicate logging (same output in two places) within a script requires some file descriptor manipulation, and either a named pipe or Bash's ProcessSubstitution. We're going to focus on the Bash syntax.

Regular logging

As mentioned above, if you just want to run the script and log the output, you would use:

myscript >log 2>&1

If you want this to be done from within the script, then make the top of the script look like this:

#!/usr/bin/env bash
# or whichever shebang you prefer

exec >log 2>&1

Now, all of the commands in the rest of your script will inherit log as their stdout and stderr. See FileDescriptor and Redirection for more details.

Simple duplicate logging

Let's start with what seems like the simplest case: I want to tee my stdout to a logfile, as well as to the screen.

This means we want two copies of everything that gets sent to stdout -- one copy for the screen (or wherever stdout was pointing when the script started), and one for the logfile. The tee program is used for this:

# Bash
exec > >(tee mylog)    

The ProcessSubstitution syntax creates a named pipe (or something analogous) and runs the tee program in the background, reading from that pipe. tee makes two copies of everything it reads -- one for the mylog file (which it opens), and one for stdout, which was inherited from the script. Finally, exec redirects the shell's stdout to the pipe.

Because there is a background job that has to read and process all our output before we see it, this introduces some asynchronous delay. Consider a case like this:

# Bash
exec > >(tee mylog)

echo "A" >&2
cat file
echo "B" >&2

The lines A and B that are written to stderr don't go through the tee process - they are sent directly to stderr. However, the file that we get from cat is sent through our pipe and tee before we see it. If we run this script in a terminal, without any redirections, we're likely (but not guaranteed!) to see something like this:

~$ ./foo
A
B
~$ hi mom

There is really no way to avoid this alteration of the output order. We could slow down stderr in a similar fashion, hoping to get lucky, but there's no guarantee that the lines of each stream will be delayed equally.

Also, notice that the contents of the file were printed after the next shell prompt. Some people find this disturbing. Again, there's no clean way to avoid that, since the tee is done in a background process, but not one that's under our control. Even adding a wait command to the script has no effect (prior to bash 4.4 -- see below). Some people add a sleep 1 to the end of the script, to give the background tee a chance to finish. This works (generally), but some people find it more offensive than the original problem.

If we avoid the Bash syntax, and set up our own named pipe and background process, then we do get control:

# Bash
mkdir -p ~/tmp || exit 1
trap 'rm -f ~/tmp/pipe$$; exit' EXIT
mkfifo ~/tmp/pipe$$
tee mylog < ~/tmp/pipe$$ & pid=$!
exec > ~/tmp/pipe$$

echo A >&2
cat bar
echo B >&2

exec >&-
wait $pid

There's still a desynchronization between stdout and stderr, but at least it no longer writes to the terminal after the script has exited:

~$ ./foo
A
B
hi mom
~$

In bash 4.4, a ProcessSubstitution sets the $! parameter, so we can explicitly wait for that, if we wish. It's still rather clumsy, though:

# Bash 4.4
exec > >(tee myfile)
pspid=$!
: ... stuff ...

# All done.  Close stdout so the proc sub will terminate, and wait for it.
exec >&-
wait $pspid

Now, consider the second variant of this question: I want to log both stdout and stderr together, keeping the lines in sync.

This one is relatively easy, as long as we don't care about destroying the separation between stdout and stderr on the terminal. We just duplicate one of the file descriptors:

# Bash
exec > >(tee mylog) 2>&1

echo A >&2
cat file
echo B >&2

In fact, that's even easier than the first variant. Everything is synchronized correctly, both on the terminal and in the log file:

~$ ./foo
A
hi mom
B
~$ cat mylog
A
hi mom
B
~$ 

There is still a chance of part of the output coming after the next shell prompt, though:

~$ ./foo
A
hi mom
~$ B

(This can be solved with the same named pipe and background process solution we showed before.)

The third variant of this question is also relatively simple: I want to log stdout to one file, and stderr to another file.

This is simple because we don't have the additional restriction that we must maintain synchronization of the two streams on the terminal. We just set up two log writers:

exec > >(tee mylog.stdout) 2> >(tee mylog.stderr >&2)

echo A >&2
cat bar
echo B >&2

And now our streams are logged separately. Since the logs are separate, there's no concern about the order in which the lines are written between them. However, on the terminal, we will get mixed results:

~$ ./foo
A
hi mom
B
~$ ./foo
hi mom
A
B
~$

Complex duplicate logging

Now, some people won't accept either the loss of separation between stdout and stderr, or the desynchronization of lines. They are purists, and so they ask for the most difficult form of all -- I want to log stdout and stderr together into a single file, BUT I also want them to maintain their original, separate destinations.

In order to do this, we first have to make a few notes:

  • If there are going to be two separate stdout and stderr streams, then some process has to write each of them.
  • There is no way to write a process in shell script that reads from two separate FDs whenever one of them has input available, because the shell has no poll(2) or select(2) interface.

  • Therefore, we'll need two separate writer processes.

  • The only way to keep output from two separate writers from destroying each other is to make sure they both open their output in append mode. A FD that is opened in append mode has the guaranteed property that every time data is written to it, it will jump to the end first.

So:

# Bash
> mylog
exec > >(tee -a mylog) 2> >(tee -a mylog >&2)

echo A >&2
cat file
echo B >&2

This ensures that the log file is correct. It does not guarantee that the writers finish before the next shell prompt:

~$ ./foo
A
hi mom
B
~$ cat mylog
A
hi mom
B
~$ ./foo
A
hi mom
~$ B

We could use the same named-pipe-plus-wait trick we did before (left as an exercise for the reader).

This leaves the question of whether the lines which appear on the terminal are guaranteed to appear in the correct order. At this point: I simply don't know.

It's not complicated enough. I want to add timestamps too!

Please see FAQ 107.

BashFAQ/106 (last edited 2023-09-24 17:44:15 by 76)