Differences between revisions 6 and 35 (spanning 29 versions)
Revision 6 as of 2010-04-08 16:09:28
Size: 3598
Editor: WillDye
Comment: Added a reference to 'coproc', and rephrased some of the text
Revision 35 as of 2023-02-17 13:07:23
Size: 6117
Editor: emanuele6
Comment: fix typo: i meant to write "in the next version of POSIX"; let's just say "in POSIX Issue 8"
Deletions are marked like this. Additions are marked like this.
Line 2: Line 2:
== My command line produces no output: tail -f logfile | grep 'foo bar' ==
Most standard Unix commands buffer their output when used non-interactively.
This means that they don't write each character (or even each line) as the input arrives,
but instead collect a larger number of characters
(often 4 kilobytes) before printing anything at all.
In the case above, the {{{tail}}} command buffers its output,
and therefore {{{grep}}} only gets its input in e.g. 4K blocks.
== What is buffering? Or, why does my command line produce no output: tail -f logfile | grep 'foo bar' | awk ... ==
Most standard Unix commands buffer their output when used non-interactively. This means that they don't write each character (or even each line) immediately, but instead collect a larger number of characters (often 4 kilobytes) before printing anything at all. In the case above, the `grep` command buffers its output, and therefore `awk` only gets its input in large chunks.
Line 10: Line 5:
Buffering greatly increases the efficiency of I/O operations,
and it's usually done in a way that doesn't visibly affect the user.
A simple "tail -f" from an interactive terminal session works just fine,
but when commands are in scripts, functions, or part of a complicated set of pipes,
the command might not recognize that the final output is being used interactively.
Fortunately, there are several techniques available for controlling I/O buffering behavior.
Buffering greatly increases the efficiency of I/O operations, and it's usually done in a way that doesn't visibly affect the user. A simple `tail -f` from an interactive terminal session works just fine, but when a command is part of a complicated pipeline, the command might not recognize that the final output is needed in (near) real time. Fortunately, there are several techniques available for controlling I/O buffering behavior.

The most important thing to understand about buffering is that it's the ''writer'' who's doing it, not the reader.

==== Eliminate unnecessary commands ====
In the question, we have the pipeline `tail -f logfile | grep 'foo bar' | awk ...` (with the actual AWK command being unspecified). There is no problem if we simply run `tail -f logfile`, because `tail -f` never buffers its output. Nor is there a problem if we run `tail -f logfile | grep 'foo bar'` interactively, because `grep` does not buffer its output if its standard output is a terminal. However, if the output of `grep` is being piped into something else (such as an AWK command), it starts buffering to improve efficiency.

In this particular example, the `grep` is actually redundant. We can remove it, and have AWK perform the filtering in addition to whatever else it's doing:

{{{#!highlight bash
tail -f logfile | awk '/foo bar/ ...'
}}}
In other cases, this sort of consolidation may not be possible. But you should always look for the simplest solution first.
Line 19: Line 21:
||awk (GNU awk, nawk, busybox awk, mawk) ||use the `fflush()` function. [[https://austingroupbugs.net/view.php?id=634|It will be defined in POSIX Issue 8]]||
||awk (mawk) ||`-W interactive` ||
||find (GNU) ||use `-printf` with the `\c` escape ||
||grep (e.g. GNU version 2.5.1) ||`--line-buffered` ||
||jq ||`--unbuffered` ||
||python ||`-u` ||
||sed (e.g. GNU version 4.0.6) ||`-u,--unbuffered` ||
||tcpdump, tethereal ||`-l` ||
Line 20: Line 30:
||grep (e.g. GNU version 2.5.1)||{{{--line-buffered}}}||
||sed (e.g. GNU version 4.0.6)||{{{-u,--unbuffered}}}||
||awk (some GNU versions)||{{{-W interactive, or use the fflush() function}}}||
||tcpdump, tethereal||{{{-l}}}||

Each command that writes to a pipe would have to be told to disable buffering, in order for the entire pipeline to run in (near) real time. The last command in the pipeline, if it's writing to a terminal, will not typically need any special consideration.

==== Disabling buffering in a C application ====
If the buffering application is written in C, and is either your own or one whose source you can modify, you can disable the buffering with:

{{{#!highlight c
setvbuf(stdout, 0, _IONBF, 0);
}}}
Often, you can simply add this at the top of the `main()` function, but if the application closes and reopens stdout, or explicitly calls `setvbuf()` later, you may need to exercise more discretion.
Line 26: Line 42:
The {{{expect}}} package has an
[[http://expect.nist.gov/example/unbuffer.man.html|unbuffer]]
program which effectively tricks other programs into always behaving
as if they were being used interactively (which should disable buffering).
Here's a simple example:
{{{
    unbuffer tail -f logfile | grep 'foo bar'
The [[http://expect.sourceforge.net/|expect]] package has an [[http://expect.sourceforge.net/example/unbuffer.man.html|unbuffer]] program which effectively tricks other programs into always behaving as if they were being used interactively (which may often disable buffering). Here's a simple example:

{{{#!highlight bash
tail -f logfile | unbuffer grep 'foo bar' | awk ...
Line 34: Line 47:
{{{expect}}} and {{{unbuffer}}} may already be installed on your system.
If not, the {{{expect}}} package can be found at: http://expect.nist.gov/
==== tee ====
At least the GNU version of {{{tee}}} appears to produce unbuffered output. For example:
{{{
   $ program | tee -a program.log
`expect` and `unbuffer` are not standard POSIX tools, but they may already be installed on your system.
Line 41: Line 49:
   In another window:
   $ tail -f program.log | grep whatever
==== stdbuf ====
Recent versions of [[http://www.gnu.org/software/coreutils/|GNU coreutils]] (from 7.5 onwards) come with a nice utility called [[http://www.gnu.org/software/coreutils/manual/coreutils.html#stdbuf-invocation|stdbuf]], which can be used among other things to "unbuffer" the standard output of a command. Here's the basic usage for our example:

{{{#!highlight bash
tail -f logfile | stdbuf -oL grep 'foo bar' | awk ...
Line 44: Line 55:
This has only been tested on GNU {{{tee}}}, so [[http://en.wiktionary.org/wiki/your_mileage_may_vary|YMMV]].
In the above code, `-oL` makes stdout line buffered; you can even use `-o0` to entirely disable buffering. The man and info pages have all the details.

`stdbuf` is not a standard POSIX tool, but it may already be installed on your system (if you're using a recent GNU/Linux distribution, it will probably be present).
Line 47: Line 60:
If you simply wanted to highlight the search term,
rather than filter out non-matching lines, you can use the {{{less}}} program instead of Bash:
{{{
   $ less program.log
If you simply wanted to highlight the search term, rather than filter out non-matching lines, you can use the `less` program instead of a filtered `tail -f`:

{{{#!highlight bash
less program.log
Line 52: Line 65:
 * Inside {{{less}}}, start a search with the '/' command (similar to searching in vi).
 * Inside `less`, start a search with the '/' command (similar to searching in vi). Or start less with the `-p pattern` option.
Line 54: Line 67:
 * Now put {{{less}}} into "follow" mode, which by default is bound to shift+f.
 * Now put `less` into "follow" mode, which by default is bound to shift+f.
Line 57: Line 70:
"follow" mode is stopped with an interrupt, which is probably control+c on your system.
The '/' command accepts regular expressions,
so you could do things like highlight the entire line on which a term appears.
For details, consult {{{man less}}}.
"Follow" mode is stopped with an interrupt, which is probably control+c on your system. The '/' command accepts regular expressions, so you could do things like highlight the entire line on which a term appears. For details, consult `man less`.
Line 63: Line 73:
If you're using ksh or Bash 4.0+,
whatever you're really trying to do with {{{tail -f}}} might benefit from using
[[http://bash-hackers.org/wiki/doku.php/syntax/keywords/coproc|coproc]]
and fflush() to create a coprocess.
Note well that {{{coproc}}} does '''not''' itself address buffering issues
(in fact it's prone to buffering problems -- hence the reference to fflush).
{{{coproc}}}
is only mentioned here because whenever someone is trying to
continuously monitor and react to a still-growing file (or pipe),
they might be trying to do something which would benefit from coprocesses.
If you're using ksh or Bash 4.0+, whatever you're really trying to do with `tail -f` might benefit from using [[http://wiki.bash-hackers.org/syntax/keywords/coproc|coproc]] and fflush() to create a coprocess. Note well that `coproc` does '''not''' itself address buffering issues (in fact it's prone to buffering problems -- hence the reference to fflush). `coproc` is only mentioned here because whenever someone is trying to continuously monitor and react to a still-growing file (or pipe), they might be trying to do something which would benefit from coprocesses.

==== Further reading ====
 * http://www.pixelbeat.org/programming/stdio_buffering

== What is buffering? Or, why does my command line produce no output: tail -f logfile | grep 'foo bar' | awk ... ==

Most standard Unix commands buffer their output when used non-interactively. This means that they don't write each character (or even each line) immediately, but instead collect a larger number of characters (often 4 kilobytes) before printing anything at all. In the case above, the grep command buffers its output, and therefore awk only gets its input in large chunks.

Buffering greatly increases the efficiency of I/O operations, and it's usually done in a way that doesn't visibly affect the user. A simple tail -f from an interactive terminal session works just fine, but when a command is part of a complicated pipeline, the command might not recognize that the final output is needed in (near) real time. Fortunately, there are several techniques available for controlling I/O buffering behavior.

The most important thing to understand about buffering is that it's the writer who's doing it, not the reader.
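
If you want to see the effect for yourself, here is a small demonstration you can paste into an interactive shell (a sketch; it assumes a stdio-based grep such as GNU grep, and the exact buffer size depends on your system):

{{{#!highlight bash
# grep writes straight to the terminal here, so the first match appears immediately
( echo 'foo bar 1'; sleep 2; echo 'foo bar 2' ) | grep 'foo bar'

# add one more pipe and grep block-buffers: nothing appears until grep exits
# (or until it has collected a full buffer, often about 4 kilobytes)
( echo 'foo bar 1'; sleep 2; echo 'foo bar 2' ) | grep 'foo bar' | cat
}}}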

==== Eliminate unnecessary commands ====

In the question, we have the pipeline tail -f logfile | grep 'foo bar' | awk ... (with the actual AWK command being unspecified). There is no problem if we simply run tail -f logfile, because tail -f never buffers its output. Nor is there a problem if we run tail -f logfile | grep 'foo bar' interactively, because grep does not buffer its output if its standard output is a terminal. However, if the output of grep is being piped into something else (such as an AWK command), it starts buffering to improve efficiency.

In this particular example, the grep is actually redundant. We can remove it, and have AWK perform the filtering in addition to whatever else it's doing:

{{{#!highlight bash
tail -f logfile | awk '/foo bar/ ...'
}}}

In other cases, this sort of consolidation may not be possible. But you should always look for the simplest solution first.
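
For instance, if the unspecified AWK part were just picking out a couple of fields, the consolidated command might look like this (the fields are made up purely for illustration):

{{{#!highlight bash
# filter for 'foo bar' and print the first and last field of each matching line
tail -f logfile | awk '/foo bar/ { print $1, $NF }'
}}}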

==== Your command may already support unbuffered output ====

Some programs provide special command line options specifically for this sort of problem:

||awk (GNU awk, nawk, busybox awk, mawk) ||use the `fflush()` function. [[https://austingroupbugs.net/view.php?id=634|It will be defined in POSIX Issue 8]] ||
||awk (mawk) ||`-W interactive` ||
||find (GNU) ||use `-printf` with the `\c` escape ||
||grep (e.g. GNU version 2.5.1) ||`--line-buffered` ||
||jq ||`--unbuffered` ||
||python ||`-u` ||
||sed (e.g. GNU version 4.0.6) ||`-u,--unbuffered` ||
||tcpdump, tethereal ||`-l` ||

Each command that writes to a pipe would have to be told to disable buffering, in order for the entire pipeline to run in (near) real time. The last command in the pipeline, if it's writing to a terminal, will not typically need any special consideration.
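
Applied to the original pipeline, that might look like the following (assuming GNU grep and an awk that provides fflush(); the awk body is only an illustration):

{{{#!highlight bash
# grep line-buffers its output; awk flushes after every line it prints
# (the fflush() only matters if awk's own output goes into yet another pipe)
tail -f logfile | grep --line-buffered 'foo bar' | awk '{ print "MATCH:", $0; fflush() }'
}}}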

==== Disabling buffering in a C application ====

If the buffering application is written in C, and is either your own or one whose source you can modify, you can disable the buffering with:

{{{#!highlight c
setvbuf(stdout, 0, _IONBF, 0);
}}}

Often, you can simply add this at the top of the main() function, but if the application closes and reopens stdout, or explicitly calls setvbuf() later, you may need to exercise more discretion.

==== unbuffer ====

The expect package has an unbuffer program which effectively tricks other programs into always behaving as if they were being used interactively (which may often disable buffering). Here's a simple example:

{{{#!highlight bash
tail -f logfile | unbuffer grep 'foo bar' | awk ...
}}}

expect and unbuffer are not standard POSIX tools, but they may already be installed on your system.
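
If several stages of a longer pipeline are buffering, each of them can be wrapped. The unbuffer man page describes a -p option for commands that read their input from a pipe, so something along these lines should work (the sed stage is made up for illustration):

{{{#!highlight bash
# wrap every stage that writes into a pipe; -p keeps stdin connected as a pipe
tail -f logfile | unbuffer -p grep 'foo bar' | unbuffer -p sed 's/foo/FOO/' | awk '{ print }'
}}}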

==== stdbuf ====

Recent versions of GNU coreutils (from 7.5 onwards) come with a nice utility called stdbuf, which can be used among other things to "unbuffer" the standard output of a command. Here's the basic usage for our example:

{{{#!highlight bash
tail -f logfile | stdbuf -oL grep 'foo bar' | awk ...
}}}

In the above code, -oL makes stdout line buffered; you can even use -o0 to entirely disable buffering. The man and info pages have all the details.

stdbuf is not a standard POSIX tool, but it may already be installed on your system (if you're using a recent GNU/Linux distribution, it will probably be present).
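
As with unbuffer, every stage that writes into a pipe needs the treatment; a longer (purely illustrative) pipeline might look like this:

{{{#!highlight bash
# -oL line-buffers stdout, -o0 disables stdout buffering entirely;
# the final awk writes to the terminal, so it needs no wrapper
tail -f logfile | stdbuf -oL grep 'foo bar' | stdbuf -oL tr 'a-z' 'A-Z' | awk '{ print }'
}}}

Note that stdbuf works by pre-loading a small library that adjusts stdio buffering before the command starts, so it cannot help with programs that set their own buffering (the coreutils documentation mentions `tee` as an example) or that do not use C stdio at all.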

==== less ====

If you simply wanted to highlight the search term, rather than filter out non-matching lines, you can use the less program instead of a filtered tail -f:

{{{#!highlight bash
less program.log
}}}

 * Inside `less`, start a search with the '/' command (similar to searching in vi). Or start less with the `-p pattern` option.
 * This should highlight any instances of the search term.
 * Now put `less` into "follow" mode, which by default is bound to shift+f.
 * You should get an unfiltered tail of the specified file, with the search term highlighted.

"Follow" mode is stopped with an interrupt, which is probably control+c on your system. The '/' command accepts regular expressions, so you could do things like highlight the entire line on which a term appears. For details, consult man less.

==== coproc ====

If you're using ksh or Bash 4.0+, whatever you're really trying to do with tail -f might benefit from using coproc and fflush() to create a coprocess. Note well that coproc does not itself address buffering issues (in fact it's prone to buffering problems -- hence the reference to fflush). coproc is only mentioned here because whenever someone is trying to continuously monitor and react to a still-growing file (or pipe), they might be trying to do something which would benefit from coprocesses.
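
Here is a minimal, self-contained sketch of the idea (bash 4.0+; the name FILTER and the awk program are made up for illustration). Note the explicit fflush(): without it, awk's replies would sit in its output buffer because its stdout is a pipe, and the read below would hang.

{{{#!highlight bash
#!/usr/bin/env bash
# start awk as a coprocess; FILTER[0] is its output, FILTER[1] is its input
coproc FILTER { awk '{ print "seen:", $0; fflush() }'; }

# send one line and read the reply back in lock-step,
# so nothing is ever left sitting in a pipe
for line in 'foo bar one' 'foo bar two'; do
    printf '%s\n' "$line" >&"${FILTER[1]}"
    IFS= read -r reply <&"${FILTER[0]}"
    printf '%s\n' "$reply"
done

# close our end of the coprocess's stdin so awk sees end-of-file and exits
# (copy the fd to a plain variable first; some bash versions reject array
# references in {varname} redirections)
fd=${FILTER[1]}
exec {fd}>&-
wait "$FILTER_PID"
}}}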

==== Further reading ====

 * http://www.pixelbeat.org/programming/stdio_buffering

CategoryShell
