Differences between revisions 5 and 6
Revision 5 as of 2008-11-22 14:09:30
Size: 1402
Editor: localhost
Comment: converted to 1.6 markup
Revision 6 as of 2009-05-26 20:29:42
Size: 3247
Editor: GreyCat
Comment: add more examples and explanation

Process Substitution

Process substitution is a very useful Bash extension. It is similar to awk's "command" | getline and is especially important for working around the subshells caused by pipelines.

Process substitution comes in two forms: <(some command) and >(some command). Each form either causes a FIFO (named pipe) to be created under /tmp, or uses a file descriptor special device (/dev/fd/N), depending on the operating system. The substitution syntax is replaced by the name of the FIFO or FD device, and the command inside it is run in the background.
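
To see what the substitution is replaced with, it can simply be printed. This is a minimal sketch; the exact name depends on the system, and the use of true as the inner command is only an illustrative choice:

```shell
#!/usr/bin/env bash
# echo receives the *name* that replaces <(...), not the command's output.
path=$(echo <(true))
echo "$path"   # on systems with /dev/fd this is something like /dev/fd/63
```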

One of the most common uses of this feature is to avoid the creation of temporary files, e.g. when using diff(1):

diff <(sort < list1) <(sort < list2)
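
A self-contained variant of the same idea, as a sketch: the two lists are hypothetical sample data held in variables rather than files, and only diff's exit status is used.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: same items, different order.
list1=$'pear\napple\nfig'
list2=$'fig\napple\npear'

# Each <(...) is replaced by a file name that diff can open and read.
if diff <(sort <<<"$list1") <(sort <<<"$list2") >/dev/null; then
  result=same
else
  result=different
fi
echo "$result"
```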

Another common use is avoiding the loss of variables inside a loop that is part of a pipeline. For example, this will fail:

# This example will fail, unless run in ksh88/ksh93
i=0
sort list1 | while read line; do
  i=$(($i + 1))
  ...
done
echo "$i lines processed"   # FAILS: i is still 0 here, the loop ran in a subshell

But this works:

# Working example, using bash syntax.
i=0
while read line; do
  ((i++))
  ...
done < <(sort list1)
echo "$i lines processed"
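
Both loops can be run side by side to compare the counters. This sketch assumes bash with default settings (the lastpipe option off); the three input lines are made-up sample data.

```shell
#!/usr/bin/env bash
input=$'c\nb\na'   # hypothetical sample data

# Pipeline: the while loop runs in a subshell, so its counter is discarded.
piped=0
printf '%s\n' "$input" | sort | while IFS= read -r line; do
  piped=$((piped + 1))
done

# Process substitution: the loop runs in the current shell.
subst=0
while IFS= read -r line; do
  subst=$((subst + 1))
done < <(printf '%s\n' "$input" | sort)

echo "pipeline: $piped, process substitution: $subst"
```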

The difference between <(...) and >(...) is merely which way the redirections are done. With <(...) one is expected to read from the substitution, and the command is set up to write to it. With >(...) one is expected to write to the substitution, and the command inside is set up to read from it.
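
The two directions can be illustrated with a short sketch. The commands printf and tr are arbitrary choices for the example. The second case is wrapped in a command substitution only so that the shell waits for tr to finish before the result is used.

```shell
#!/usr/bin/env bash
# <(...): the shell reads, the inner command writes.
read -r first < <(printf '%s\n' one two)

# >(...): the shell writes, the inner command reads.
upper=$(printf '%s\n' hello > >(tr '[:lower:]' '[:upper:]'))

echo "$first $upper"
```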

>(...) is used less frequently; the most common situation is in conjunction with tee(1).

exec 3>&1 > >(tee logfile >&3) 3>&-

# Rest of script goes here
# Stdout of everything is logged, and also falls through to real stdout
# Beware of buffering issues, especially if you also have stderr (e.g.
# prompts for user input may appear before the previous line of stdout).
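
A simplified, runnable variant of the logging idea follows. It is a sketch, not the exact idiom above: it omits the extra file descriptor, uses a temporary file as a hypothetical log location, and runs inside a command substitution so that the example waits for tee to finish before comparing the two copies.

```shell
#!/usr/bin/env bash
logfile=$(mktemp)   # hypothetical log location

captured=$(
  # From here on, stdout of this subshell goes through tee.
  exec > >(tee "$logfile")
  echo "step 1"
  echo "step 2"
)

logged=$(cat "$logfile")
rm -f "$logfile"
printf 'stdout copy:\n%s\nlog copy:\n%s\n' "$captured" "$logged"
```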

Here's a more complicated example:

hasFile='Note: the (top-|highly )?secret plans are backed up at:(.*)'
criticalFile=
while read -r line
do
    [[ $line ]] || continue
    case $line in
        '!!! '*)
            errMsg "${line#'!!! '}" ;;
        *important* )
            echo "$line" ;;
        * )
            if [[ $line =~ $hasFile ]]; then
                criticalFile=${BASH_REMATCH[2]}
                warn "File at $criticalFile"
            else
                spin
            fi
            ;;
    esac
done < <(command $options "${param[@]}" 2>&1 | tee "$logfile")
[[ $criticalFile ]] || abort 'File not found'

Piping the command into the while loop would run the loop in a subshell, so any variables set inside it would be lost. Note that the command inside the substitution can itself be a pipeline; in fact, you can write an entire script there. Be aware that it runs in a subshell, and that it may continue to run after your script exits (unless you manage your child processes).
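
A reduced, runnable version of the loop shows the variable surviving. Everything here is a stand-in: produce is a hypothetical function replacing the real command, and the path in its output is invented sample data.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real command and its output.
produce() {
  printf '%s\n' 'starting' '' \
    'Note: the top-secret plans are backed up at:/srv/backup' 'done'
}

hasFile='Note: the (top-|highly )?secret plans are backed up at:(.*)'
criticalFile=
while IFS= read -r line; do
  [[ $line ]] || continue
  if [[ $line =~ $hasFile ]]; then
    criticalFile=${BASH_REMATCH[2]}
  fi
done < <(produce 2>&1 | grep -v '^done$')   # the inner command may be a pipeline

echo "criticalFile=$criticalFile"
```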

In the above example the regex could as easily be done with a case:

        'Note: the '*'secret plans are backed up at:'*) criticalFile=${line#*'secret plans are backed up at:'}
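
As a sketch, the case form can be tried on a single line of input. The line below is made-up sample data.

```shell
#!/usr/bin/env bash
line='Note: the highly secret plans are backed up at:/srv/backup'   # sample data
criticalFile=
case $line in
  'Note: the '*'secret plans are backed up at:'*)
    # Strip everything up to and including the fixed text.
    criticalFile=${line#*'secret plans are backed up at:'} ;;
esac
echo "$criticalFile"
```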

Process substitution is particularly powerful and flexible when the external command is awk.
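
For instance, awk can filter and reshape the input while the loop keeps its variables in the current shell. This is only a sketch: the file names and sizes are invented, and the threshold of 100 is arbitrary.

```shell
#!/usr/bin/env bash
total=0
while read -r name size; do
  total=$((total + size))
done < <(printf '%s\n' 'a.txt 50' 'b.txt 150' 'c.txt 300' |
         awk '$2 > 100 { print $1, $2 }')   # keep only entries above the threshold

echo "total=$total"
```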

Portability

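Process substitution is definitely not portable. It is not specified by POSIX and is not available in plain sh implementations such as dash; bash, zsh and ksh93 support it.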
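
Where only a POSIX sh is available, a similar effect can be had by creating the FIFO by hand. The following is a sketch under assumptions: mktemp -d is widely available but not itself required by POSIX, and the names tmpdir and fifo are illustrative.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)   # assumption: mktemp is installed
fifo=$tmpdir/sorted
mkfifo "$fifo"

# The writer runs in the background, like the command inside <(...).
printf '%s\n' c b a | sort > "$fifo" &

i=0
while IFS= read -r line; do
  i=$((i + 1))
done < "$fifo"

wait                 # reap the background writer
rm -rf "$tmpdir"
echo "$i lines processed"
```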

ProcessSubstitution (last edited 2016-09-22 12:53:39 by geirha)