Diff for "BashFAQ/106"

Differences between revisions 7 and 8

Logging! I want to send all of my script's output to a log file. But I want to do it from inside the script. And hey, let's add timestamps too! Let's make it as hard as possible!

Normally, if you want to run a script and send its output to a logfile, you'd simply use Redirection: myscript >log 2>&1. Or to see the output on the screen and also redirect to a file: myscript 2>&1 | tee log (or better still, run your script within the script(1) command if your system has it). If you want to insert commands into a script that cause it to do this kind of logging internally, without altering your invocation, then it gets trickier.

Setting up duplicate logging (same output in two places) within a script requires some file descriptor manipulation, and either a named pipe or Bash's ProcessSubstitution. We're going to focus on the Bash syntax.

Regular logging

As mentioned above, if you just want to run the script and log the output, you would use:

myscript >log 2>&1

If you want this to be done from within the script, then make the top of the script look like this:

#!/usr/bin/env bash
# or whichever shebang you prefer

exec >log 2>&1

Now, all of the commands in the rest of your script will inherit log as their stdout and stderr. See FileDescriptor and Redirection for more details.

Simple duplicate logging

Let's start with what seems like the simplest case: I want to tee my stdout to a logfile, as well as to the screen.

This means we want two copies of everything that gets sent to stdout -- one copy for the screen (or wherever stdout was pointing when the script started), and one for the logfile. The tee program is used for this:

# Bash
exec > >(tee mylog)

The process substitution syntax creates a named pipe (or something analogous) and runs the tee program in the background, reading from that pipe. tee makes two copies of everything it reads -- one for the mylog file (which it opens), and one for stdout, which was inherited from the script. Finally, exec redirects the shell's stdout to the pipe.

Because there is a background job that has to read and process all our output before we see it, this introduces some asynchronous delay. Consider a case like this:

# Bash
exec > >(tee mylog)

echo "A" >&2
cat file
echo "B" >&2

The lines A and B that are written to stderr don't go through the tee process - they are sent directly to stderr. However, the file that we get from cat is sent through our pipe and tee before we see it. If we run this script in a terminal, without any redirections, we're likely (but not guaranteed!) to see something like this:

~$ ./foo
A
B
~$ hi mom

There is really no way to avoid this alteration of the output order. We could slow down stderr in a similar fashion, hoping to get lucky, but there's no guarantee that the lines of each stream will be delayed equally.

Also, notice that the contents of the file were printed after the next shell prompt. Some people find this disturbing. Again, there's no clean way to avoid that, since the tee is done in a background process, but not one that's under our control. Even adding a wait command to the script has no effect. Some people add a sleep 1 to the end of the script, to give the background tee a chance to finish. This works (generally), but some people find it more offensive than the original problem.

If we avoid the Bash syntax, and set up our own named pipe and background process, then we do get control:

# Bash
mkdir -p ~/tmp || exit 1
trap 'rm -f ~/tmp/pipe$$; exit' EXIT
mkfifo ~/tmp/pipe$$
tee mylog < ~/tmp/pipe$$ & pid=$!
exec > ~/tmp/pipe$$

echo A >&2
cat bar
echo B >&2

exec >&-
wait $pid

There's still a desynchronization between stdout and stderr, but at least it no longer writes to the terminal after the script has exited:

~$ ./foo
A
B
hi mom
~$

This leads to the second variant of this question: I want to log both stdout and stderr together, keeping the lines in sync.

This one is relatively easy, as long as we don't care about destroying the separation between stdout and stderr on the terminal. We just duplicate one of the file descriptors:

# Bash
exec > >(tee mylog) 2>&1

echo A >&2
cat file
echo B >&2

In fact, that's even easier than the first variant. Everything is synchronized correctly, both on the terminal and in the log file:

~$ ./foo
A
hi mom
B
~$ cat mylog
A
hi mom
B
~$

There is still a chance of part of the output coming after the next shell prompt, though:

~$ ./foo
A
hi mom
~$ B

(This can be solved with the same named pipe and background process solution we showed before.)

The third variant of this question is also relatively simple: I want to log stdout to one file, and stderr to another file.

This is simple because we don't have the additional restriction that we must maintain synchronization of the two streams on the terminal. We just set up two log writers:

exec > >(tee mylog.stdout) 2> >(tee mylog.stderr >&2)

echo A >&2
cat bar
echo B >&2

And now our streams are logged separately. Since the logs are separate, there's no concern about the order in which the lines are written between them. However, on the terminal, we will get mixed results:

~$ ./foo
A
hi mom
B
~$ ./foo
hi mom
A
B
~$

Complex duplicate logging

Now, some people won't accept either the loss of separation between stdout and stderr, or the desynchronization of lines. They are purists, and so they ask for the most difficult form of all -- I want to log stdout and stderr together into a single file, BUT I also want them to maintain their original, separate destinations.

In order to do this, we first have to make a few notes:

If there are going to be two separate stdout and stderr streams, then some process has to write each of them.
There is no way to write a process in shell script that reads from two separate FDs whenever one of them has input available, because the shell has no poll(2) or select(2) interface.
Therefore, we'll need two separate writer processes.
The only way to keep output from two separate writers from destroying each other is to make sure they both open their output in append mode. A FD that is opened in append mode has the guaranteed property that every time data is written to it, it will jump to the end first.

So:

# Bash
> mylog
exec > >(tee -a mylog) 2> >(tee -a mylog >&2)

echo A >&2
cat file
echo B >&2

This ensures that the log file is correct. It does not guarantee that the writers finish before the next shell prompt:

~$ ./foo
A
hi mom
B
~$ cat mylog
A
hi mom
B
~$ ./foo
A
hi mom
~$ B

We could use the same named-pipe-plus-wait trick we did before (left as an exercise for the reader).

This leaves the question of whether the lines which appear on the terminal are guaranteed to appear in the correct order. At this point: I simply don't know.

It's not complicated enough. I want to add timestamps too!

Adding timestamps to a stream is a challenge all by itself, because there aren't any standarized tools to do it. You either have to install something specifically for it (e.g. multilog from daemontools), or write a filter in some programming language. Ideally, you do not want to fork a date(1) command for every line of input that you're logging, because it's too slow. You want to use builtins. Older versions of bash cannot do this. You need at least Bash 4.2 for the printf %(...)T option. Otherwise, you could write something in Perl, Python, Tcl, etc. to read lines and write them out with timestamps.

Here's a simple timestamper using Bash 4.2:

# Bash 4.2 or later
stamper() {
  while read -r; do
    printf '%(%Y-%m-%d %H:%M:%S)T %s\n' -1 "$REPLY"
  done
}

You could also use %(%s)T on systems that support %s in strftime(3), for Unix epoch timestamps. Sadly, this is not portable.

Here's a simple example using multilog:

stamper() {
  multilog t e 2>&1
}

The e option of multilog tells it to write each line to stderr, instead of to a file in an automatically rotated directory. multilog writes timestamps in TAI64N format, which is similar in nature to Unix epoch format. Use a conversion tool (e.g. tai64nlocal) to make them human-readable.

Here's an example using Tcl:

#!/usr/bin/env tclsh
while {[gets stdin line] >= 0} {
    set now [clock microseconds]
    set frac [expr {$now % 1000000}]
    set now [expr {$now / 1000000}]
    puts "[clock format $now -format {%Y-%m-%d %H:%M:%S}].$frac $line"
}

That works best as a separate, standalone script. Embedding it in a bash function is also possible, albeit ugly:

stamper() {
  tclsh <(cat <<\EOF
    while {[gets stdin line] >= 0} {
      set now [clock microseconds]
      set frac [expr {$now % 1000000}]
      set now [expr {$now / 1000000}]
      puts "[clock format $now -format {%Y-%m-%d %H:%M:%S}].$frac $line"
    }
EOF
  )
}

Pick one of these, or write another one in your preferred language. Let's call whichever one you chose stamper. For the purposes of this section, let's say we're not completely insane, so we don't care about separation of stdout and stderr. We just want it all in one file, in order, with stdout and stderr mixed together.

Now, it's somewhat easy to run a script so that its output and stderr go to the screen normally, while timestamped lines go to a logfile:

./myscript 2>&1 | tee >(stamper >logfile)

Putting this same thing inside a script is slightly more involved:

#!/bin/bash
exec > >(tee >(stamper >logfile)) 2>&1

while sleep 0.2s; do
    echo "TPS report $RANDOM"
done

And of course, buffering issues may exist, if the parts of the script that write to stamper use stdout buffering. Take that into consideration.

I don't care how slow it is! I want to call date(1) a thousand times a second! My PC doubles as my space-heater!

This is inefficient and horrible.

stamper() {
  while read -r; do
    printf '%s %s\n' "$(date +'%Y-%m-%d %H:%M:%S')" "$REPLY"
  done
}

Don't use this. It's stupid. You were warned.

-  ⇤ ← Revision 7 as of 2018-01-30 17:57:04 → 
  Size: 9952
  Editor: jetchisel
  Comment:
+   ← Revision 8 as of 2018-01-30 20:50:47 → ⇥
  Size: 10907
  Editor: GreyCat
  Comment: *sigh* If you omit the stupid way, someone will add it to the wiki. Then you have to move it, isolate it, explain why it's stupid....
-Deletions are marked like this.
+Additions are marked like this.
 Line 197:
-Adding timestamps to a stream is a challenge all by itself, because there aren't any standarized tools to do it.  You either have to install something specifically for it (e.g. [[http://cr.yp.to/daemontools/multilog.html|multilog]] from daemontools), or write a filter in some programming language.  Older versions of bash ''cannot do this''.  You need at least [[BashFAQ/061|Bash 4.2]] for the `printf %(...)T` option.  Otherwise, try Perl, Python, Tcl, etc.
+Adding timestamps to a stream is a challenge all by itself, because there aren't any standarized tools to do it.  You either have to install something specifically for it (e.g. [[http://cr.yp.to/daemontools/multilog.html|multilog]] from daemontools), or write a filter in some programming language.  Ideally, you '''do not want''' to fork a `date(1)` command for every line of input that you're logging, because it's too slow.  You want to use builtins.  Older versions of bash ''cannot do this''.  You need at least [[BashFAQ/061|Bash 4.2]] for the `printf %(...)T` option.  Otherwise, you could write something in Perl, Python, Tcl, etc. to read lines and write them out with timestamps.
 Line 203:
-while read -r; do
+stamper() {
  while read -r; do
-Line 205:
+Line 206:
-done
}}}

If GNU date is available then one can use it if bash version is lower than 4.2
{{{
printf '%s %s\n' "$(date +['%Y-%m-%d %H:%M:%S'])" "$REPLY"
+  done
}
-Line 218:
+Line 215:
-multilog t e 2>&1
}}}

The `e` option of `multilog` tells it to write each line to stderr, instead of to a file in an automatically rotated directory.  `multilog` writes in [[http://cr.yp.to/libtai/tai64.html|TAI64N]] format, which is similar in nature to Unix epoch format.  Use a conversion tool (e.g. `tai64nlocal`) to make them human-readable.
+stamper() {
  multilog t e 2>&1
}
}}}

The `e` option of `multilog` tells it to write each line to stderr, instead of to a file in an automatically rotated directory.  `multilog` writes timestamps in [[http://cr.yp.to/libtai/tai64.html|TAI64N]] format, which is similar in nature to Unix epoch format.  Use a conversion tool (e.g. `tai64nlocal`) to make them human-readable.
-Line 235:
+Line 234:
-Pick one of these, or write another one in your preferred language.  Let's call whichever one you chose '''stamper''', and let's say it's an external command, not a function, so we have some point of reference.  For the purposes of this section, let's say we're not completely insane, so we don't care about separation of stdout and stderr.  We just want it all in one file, in order, with stdout and stderr mixed together.
+That works best as a separate, standalone script.  Embedding it in a bash function is also ''possible'', albeit ugly:

{{{
stamper() {
  tclsh <(cat <<\EOF
    while {[gets stdin line] >= 0} {
      set now [clock microseconds]
      set frac [expr {$now % 1000000}]
      set now [expr {$now / 1000000}]
      puts "[clock format $now -format {%Y-%m-%d %H:%M:%S}].$frac $line"
    }
EOF
  )
}
}}}

Pick one of these, or write another one in your preferred language.  Let's call whichever one you chose '''stamper'''.  For the purposes of this section, let's say we're not completely insane, so we don't care about separation of stdout and stderr.  We just want it all in one file, in order, with stdout and stderr mixed together.
-Line 253:
+Line 268:
+And of course, [[BashFAQ/009|buffering issues]] may exist, if the parts of the script that write to '''stamper''' use stdout buffering.  Take that into consideration.

==== I don't care how slow it is!  I want to call date(1) a thousand times a second!  My PC doubles as my space-heater! ====

This is inefficient and horrible.

{{{
stamper() {
  while read -r; do
    printf '%s %s\n' "$(date +'%Y-%m-%d %H:%M:%S')" "$REPLY"
  done
}
}}}

Don't use this.  It's stupid.  You were warned.