== Logging! I want to send all of my script's output to a log file. But I want to do it from inside the script. And hey, let's add timestamps too! Let's make it as hard as possible! ==
Normally, if you want to run a script and send its output to a logfile, you'd simply use [[Redirection]]: `myscript >log 2>&1`. Or to see the output on the screen ''and also'' redirect to a file: `myscript 2>&1 | tee log` (or better still, run your script within the `script(1)` command if your system has it). If you want to insert commands ''into'' a script that cause it to do this kind of logging internally, without altering your invocation, then it gets trickier.
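
For reference, here are those one-off invocations side by side (the `script -c` form assumes the util-linux version of `script(1)`; BSD `script` takes its arguments differently):

{{{
myscript >log 2>&1          # log only
myscript 2>&1 | tee log     # screen and log
script -c myscript log      # util-linux script(1): screen and log, via a pty
}}}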

Setting up duplicate logging (same output in two places) within a script requires some [[FileDescriptor|file descriptor]] manipulation, and either a [[NamedPipes|named pipe]] or Bash's ProcessSubstitution. We're going to focus on the Bash syntax.

=== Regular logging ===
As mentioned above, if you just want to ''run'' the script and log the output, you would use:

{{{
myscript >log 2>&1
}}}

If you want this to be done from within the script, then make the top of the script look like this:

{{{#!format plain
#!/usr/bin/env bash
# or whichever shebang you prefer

exec >log 2>&1
}}}

Now, all of the commands in the rest of your script will inherit `log` as their stdout and stderr. See FileDescriptor and [[Redirection]] for more details.
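
Two common variations on this, as a sketch (FD 3 is an arbitrary choice here, and `>>` makes repeated runs append to the log instead of truncating it):

{{{
# Bash
exec 3>&1          # save the original stdout (usually the terminal) on FD 3
exec >>log 2>&1    # append everything from here on to "log"

echo "this line goes to the log"
echo "this line still reaches the terminal" >&3
}}}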

=== Simple duplicate logging ===
Let's start with what seems like the simplest case: '''I want to tee my stdout to a logfile, as well as to the screen.'''

This means we want two copies of everything that gets sent to stdout -- one copy for the screen (or wherever stdout was pointing when the script started), and one for the logfile. The `tee` program is used for this:

{{{
# Bash
exec > >(tee mylog)
}}}

The process substitution syntax creates a named pipe (or something analogous) and runs the `tee` program ''in the background'', reading from that pipe. `tee` makes two copies of everything it reads -- one for the `mylog` file (which it opens), and one for stdout, which was inherited from the script. Finally, [[http://www.gnu.org/software/bash/manual/bash.html#index-exec-113|exec]] redirects the shell's stdout to the pipe.

Because there is a background job that has to read and process all our output before we see it, this introduces some asynchronous delay. Consider a case like this:

{{{
# Bash
exec > >(tee mylog)

echo "A" >&2
cat file
echo "B" >&2
}}}

The lines A and B that are written to stderr don't go through the `tee` process -- they are sent directly to stderr. However, the `file` that we get from `cat` is sent through our pipe and `tee` before we see it. If we run this script in a terminal, without any redirections, we're ''likely'' (but [[RaceCondition|not guaranteed]]!) to see something like this:

{{{
~$ ./foo
A
B
~$ hi mom
}}}

There is really no way to avoid this alteration of the output order. We could slow down stderr in a similar fashion, hoping to get lucky, but there's no guarantee that the lines of each stream will be delayed equally.

Also, notice that the contents of the file were printed ''after'' the next shell prompt. Some people find this disturbing. Again, there's no clean way to avoid that, since the `tee` is done in a background process, but not one that's under our control. Even adding a `wait` command to the script has no effect. Some people add a `sleep 1` to the end of the script, to give the background `tee` a chance to finish. This works (generally), but some people find it more offensive than the original problem.
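
If you do take the `sleep` route, the sketch looks like this (the one-second figure is a guess, not a guarantee):

{{{
# Bash
exec > >(tee mylog)

echo "hi mom"

sleep 1    # give the background tee a chance to drain the pipe before we exit
}}}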

If we avoid the Bash syntax, and set up our own named pipe and background process, then we do get control:

{{{
# Bash
mkdir -p ~/tmp || exit 1
trap 'rm -f ~/tmp/pipe$$; exit' EXIT    # clean up the fifo on exit
mkfifo ~/tmp/pipe$$
tee mylog < ~/tmp/pipe$$ & pid=$!       # a background writer that we control
exec > ~/tmp/pipe$$                     # send our stdout into the fifo

echo A >&2
cat file
echo B >&2

exec >&-                                # close stdout, so tee sees EOF
wait $pid                               # and wait for it to finish
}}}

There's still a desynchronization between stdout and stderr, but at least it no longer writes to the terminal after the script has exited:

{{{
~$ ./foo
A
B
hi mom
~$
}}}

This leads to the second variant of this question: '''I want to log both stdout and stderr together, keeping the lines in sync.'''

This one is relatively easy, as long as we don't care about destroying the separation between stdout and stderr on the terminal. We just duplicate one of the file descriptors:

{{{
# Bash
exec > >(tee mylog) 2>&1

echo A >&2
cat file
echo B >&2
}}}

In fact, that's even ''easier than the first variant''. Everything is synchronized correctly, both on the terminal and in the log file:

{{{
~$ ./foo
A
hi mom
B
~$ cat mylog
A
hi mom
B
~$
}}}

There is still a chance of part of the output coming after the next shell prompt, though:

{{{
~$ ./foo
A
hi mom
~$ B
}}}

(This can be solved with the same named pipe and background process solution we showed before.)
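
For completeness, here is a sketch of that adaptation -- the same fifo trick as before, with stderr duplicated into the pipe as well (this assumes `~/tmp` is usable, as in the earlier example):

{{{
# Bash
mkdir -p ~/tmp || exit 1
trap 'rm -f ~/tmp/pipe$$; exit' EXIT
mkfifo ~/tmp/pipe$$
tee mylog < ~/tmp/pipe$$ & pid=$!
exec > ~/tmp/pipe$$ 2>&1    # this time, stderr goes into the fifo too

echo A >&2
cat file
echo B >&2

exec >&- 2>&-    # close both copies, so tee sees EOF
wait $pid
}}}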

The third variant of this question is also relatively simple: '''I want to log stdout to one file, and stderr to another file.'''

This is simple because we ''don't'' have the additional restriction that we must maintain synchronization of the two streams on the terminal. We just set up two log writers:

{{{
#!/bin/bash
exec > >(tee mylog.stdout) 2> >(tee mylog.stderr >&2)

echo A >&2
cat file
echo B >&2
}}}

And now our streams are logged separately. Since the logs are separate, there's no concern about the order in which the lines are written between them. However, on the terminal, we will get mixed results:

{{{
~$ ./foo
A
hi mom
B
~$ ./foo
hi mom
A
B
~$
}}}

=== Complex duplicate logging ===
Now, some people won't accept either the loss of separation between stdout and stderr, or the desynchronization of lines. They are purists, and so they ask for the most difficult form of all -- '''I want to log stdout and stderr together into a single file, BUT I also want them to maintain their original, separate destinations.'''

In order to do this, we first have to make a few notes:
 * If there are going to be two separate stdout and stderr streams, then some process has to write each of them.
 * There is no way to write a process in shell script that reads from ''two'' separate FDs whenever one of them has input available, because the shell has no `poll(2)` or `select(2)` interface.
 * Therefore, we'll need ''two'' separate writer processes.
 * The only way to keep output from two separate writers from destroying each other is to make sure they both open their output in ''append'' mode. An FD that is opened in append mode has the guaranteed property that every time data is written to it, it will jump to the end first.

So:

{{{
# Bash
> mylog
exec > >(tee -a mylog) 2> >(tee -a mylog >&2)

echo A >&2
cat file
echo B >&2
}}}

This ensures that the log file is correct. It does not guarantee that the writers finish before the next shell prompt:

{{{
~$ ./foo
A
hi mom
B
~$ cat mylog
A
hi mom
B
~$ ./foo
A
hi mom
~$ B
}}}

We could use the same named-pipe-plus-wait trick we did before (left as an exercise for the reader).
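
Here is one sketch of that exercise, under the same assumptions as before -- two fifos, two append-mode writers, and an explicit `wait`:

{{{
# Bash
mkdir -p ~/tmp || exit 1
trap 'rm -f ~/tmp/out$$ ~/tmp/err$$; exit' EXIT
mkfifo ~/tmp/out$$ ~/tmp/err$$
> mylog
tee -a mylog < ~/tmp/out$$ & outpid=$!        # stdout writer
tee -a mylog < ~/tmp/err$$ >&2 & errpid=$!    # stderr writer
exec > ~/tmp/out$$ 2> ~/tmp/err$$

echo A >&2
cat file
echo B >&2

exec >&- 2>&-
wait $outpid $errpid
}}}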

This leaves the question of whether the lines which appear on the terminal are guaranteed to appear in the correct order. At this point: I simply don't know.

=== It's not complicated enough. I want to add timestamps too! ===
Adding timestamps to a stream is a challenge all by itself, because there aren't any standardized tools to do it. You either have to install something specifically for it (e.g. [[http://cr.yp.to/daemontools/multilog.html|multilog]] from daemontools), or write a filter in some programming language. Older versions of bash ''cannot do this''. You need at least [[BashFAQ/061|Bash 4.2]] for the `printf %(...)T` option. Otherwise, try Perl, Python, Tcl, etc.

Here's a simple timestamper using Bash 4.2:

{{{
# Bash 4.2 or later
while read -r; do
    printf '%(%Y-%m-%d %H:%M:%S)T %s\n' -1 "$REPLY"
done
}}}

If your Bash is older than 4.2, you can call `date(1)` instead, at the cost of forking a `date` process for every line; replace the `printf` in the loop above with:

{{{
printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$REPLY"
}}}

You could also use `%(%s)T` on systems that support `%s` in `strftime(3)`, for [[BashFAQ/070|Unix epoch]] timestamps. Sadly, this is not portable.
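
On such a system, the loop body becomes:

{{{
printf '%(%s)T %s\n' -1 "$REPLY"
}}}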

Here's a simple example using multilog:

{{{
multilog t e 2>&1
}}}

The `e` option of `multilog` tells it to write each line to stderr, instead of to a file in an automatically rotated directory. `multilog` writes in [[http://cr.yp.to/libtai/tai64.html|TAI64N]] format, which is similar in nature to Unix epoch format. Use a conversion tool (e.g. `tai64nlocal`) to make them human-readable.
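
If you let `multilog` write to a log directory instead, conversion is a one-liner afterwards (a sketch -- `./logdir` is an arbitrary name; `multilog` creates it, and keeps the current log in `./logdir/current`):

{{{
./myscript 2>&1 | multilog t ./logdir
tai64nlocal < ./logdir/current
}}}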

Here's an example using Tcl:

{{{#!format plain
#!/usr/bin/env tclsh
while {[gets stdin line] >= 0} {
    set now [clock microseconds]
    set frac [format %06d [expr {$now % 1000000}]]   ;# zero-pad the microseconds
    set now [expr {$now / 1000000}]
    puts "[clock format $now -format {%Y-%m-%d %H:%M:%S}].$frac $line"
}
}}}

Pick one of these, or write another one in your preferred language. Let's call whichever one you chose '''stamper''', and let's say it's an external command, not a function, so we have some point of reference. For the purposes of this section, let's say we're not completely insane, so we don't care about separation of stdout and stderr. We just want it all in one file, in order, with stdout and stderr mixed together.

Now, it's ''somewhat'' easy to run a script so that its output and stderr go to the screen normally, while timestamped lines go to a logfile:

{{{
./myscript 2>&1 | tee >(stamper >logfile)
}}}

Putting this same thing ''inside'' a script is slightly more involved:

{{{#!format plain
#!/bin/bash
exec > >(tee >(stamper >logfile)) 2>&1

while sleep 0.2s; do
    echo "TPS report $RANDOM"
done
}}}
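
When run, the terminal shows the plain lines while `logfile` collects stamped copies. Illustratively (your timestamps and numbers will differ):

{{{
~$ tail -n 2 logfile
2018-01-30 17:57:04 TPS report 18841
2018-01-30 17:57:05 TPS report 3476
}}}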
