Differences between revisions 14 and 15
Revision 14 as of 2010-07-30 15:00:45
Size: 4040
Editor: GreyCat
Comment: coprocesses in bash 4 also
Revision 15 as of 2011-09-10 10:54:14
Size: 6352
Editor: ormaaj
Comment: Refactor. De-emphasize loops somewhat as this is a general problem with pipelines. Dilute all the random shell examples with some Bash. This is a bash wiki after all.

I set variables in a loop. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?

The problem

In most shells, each command in a pipeline of two or more commands runs asynchronously in its own subshell, where a "command" can be a simple command, a compound command, or another pipeline. Put more simply: each chunk of code separated by a pipe operator, including compound commands such as while/for/until loops, is forked off and runs concurrently in a separate SubShell process. Like every subshell, each one has its own isolated environment and variable scope.

Non-working example:

    # Works as intended only in ksh88/ksh93
    typeset -i linecnt=0
    printf '%s\n' foo bar | while read -r line
    do
        linecnt=$((linecnt+1))
    done
    printf 'total number of lines: %s\n' "$linecnt" # prints 0

The reason for this potentially surprising behaviour, as described above, is that each SubShell introduces a new variable context and environment. The while loop above is executed in a new subshell with its own copy of the variable linecnt, created with the initial value of '0' taken from the parent shell. This copy is then used for counting. When the while loop finishes, the subshell copy is discarded, and the original variable linecnt of the parent (whose value hasn't changed) is used in the final printf command.
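A minimal sketch of that copy-and-discard behaviour, using only POSIX syntax. The variable name count and the three input lines are made up for illustration, and the value printed after the loop depends on which shell runs it.

```shell
count=0
printf '%s\n' a b c | while read -r line; do
    count=$((count+1))
    # This message comes from whichever process executes the loop
    echo "inside the loop: count=$count"
done
# In shells that fork the loop, the parent's count is still 0 here;
# shells that run the last pipeline element in the current shell keep 3.
echo "after the loop: count=$count"
```

The messages printed inside the loop always count up to 3; only the final line reveals whether the counting happened in a copy.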

Different shells exhibit different behaviors in this situation:

  • BourneShell creates a subshell when the input or output of anything other than a simple command (loops, case, etc.) is redirected, either by using a pipeline or by a redirection operator ('<', '>').

  • BASH creates a new process only if the loop is part of a pipeline.

  • KornShell creates it only if the loop is part of a pipeline, but not if the loop is the last part of it. The read example above actually works in ksh88 and ksh93 (but not in mksh)!

  • POSIX specifies the bash behaviour, but as an extension allows any or all of the parts of the pipeline to run without a subshell (thus permitting the KornShell behaviour, as well).
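One way to find out which of these behaviours the running shell implements is to pipe a value into read and check whether it survives. This is a small probe under the assumption that nothing beyond POSIX syntax is available; the names probe and last_in_current_shell are invented for this sketch.

```shell
probe=parent
# If read runs in a subshell, the assignment below never reaches this shell
echo child | read -r probe

if [ "$probe" = child ]; then
    last_in_current_shell=yes   # KornShell-style behaviour
else
    last_in_current_shell=no    # Bash-style behaviour (without lastpipe)
fi
echo "last pipeline element runs in the current shell: $last_in_current_shell"
```

A script that must run under several shells could use such a probe to decide whether it needs one of the workarounds at all, although writing the code so that it never depends on the answer is usually simpler.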

More broken stuff:

    # Bash 4
    # The problem also occurs without a loop
    printf '%s\n' foo bar | mapfile -t line  
    printf 'total number of lines: %s\n' "${#line[@]}" # prints 0

    # f prints its argument when stdin is a terminal;
    # otherwise it reads one line from stdin into var
    f() {
        if [[ -t 0 ]]; then
            echo "$1"
        else
            read -r var
        fi
    }

    f 'hello' | f
    echo "$var" # prints nothing

Again, in both cases the pipeline causes read or some containing command to run in a subshell, so its effect is never witnessed in the parent process.

  • It should be stressed that this issue isn't specific to loops. It's a general property of all pipelines, though the "while/read" loop might be considered the canonical example. It crops up over and over when people read the help or manpage description of the read builtin and notice that it accepts data on stdin. They might recall that data redirected into a compound command is available throughout that command, but not understand why all the fancy process substitutions and redirects they run across in places like FAQ #1 are necessary. Naturally they proceed to put their code directly into a pipeline, and confusion ensues.
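The same isolation applies to every kind of shell state, not only to loop counters. The sketch below pipes into a plain brace group that changes the working directory and sets a variable; the names start_dir and greeting are hypothetical, and the result again depends on whether the shell forks the last pipeline element.

```shell
start_dir=$PWD
greeting=unset

# No loop involved: a brace group on the right-hand side of a pipe
echo / | { read -r dir; cd "$dir" && greeting="hello from $PWD"; }

# In shells that fork the group, both changes were made in a subshell and are gone
echo "working directory: $PWD"
echo "greeting: $greeting"

# Return to the starting directory in case the shell did run the group itself
cd "$start_dir"
```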

Workarounds

  • If the input is a file, a simple redirect will suffice:
        # POSIX
        linecnt=0
        while read -r line; do linecnt=$((linecnt+1)); done < file
        echo "$linecnt"

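    Unfortunately, this doesn't work with a Bourne shell, which runs a redirected loop in a subshell as well; see sh(1) from the Heirloom Bourne Shell (http://heirloom.sourceforge.net/sh/sh.1.html#20) for a workaround.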

  • Use command grouping and do everything in the subshell:

        # POSIX
        linecnt=0
        cat /etc/passwd | {
            while read -r line; do
                linecnt=$((linecnt+1))
            done
            echo "total number of lines: $linecnt"
        }
    This doesn't really change the subshell situation, but if nothing from the subshell is needed in the rest of your code then destroying the local environment after you're through with it could be just what you want anyway.
  • Use ProcessSubstitution (Bash only):

        # Bash
        while read -r line; do
            ((linecnt++))
        done < <(grep PATH /etc/profile)
        echo "total number of lines: $linecnt"
    This is essentially identical to the first workaround above. We still redirect a file, only this time the file happens to be a named pipe (or a /dev/fd entry, depending on the system) temporarily created by our process substitution to transport the output of grep.
  • Use a named pipe:

        # POSIX
        mkfifo mypipe
        grep PATH /etc/profile > mypipe &
        linecnt=0
        while read -r line; do
            linecnt=$((linecnt+1))
        done < mypipe
        rm -f mypipe    # remove the FIFO once the loop has consumed it
        echo "total number of lines: $linecnt"
  • Use a coprocess (ksh, even pdksh, oksh, mksh; Bash 4 has coprocesses too, but through the coproc keyword rather than the syntax shown here):

        # ksh
        grep PATH /etc/profile |&
        while read -r -p line; do
            linecnt=$((linecnt+1))
        done
        echo "total number of lines: $linecnt"
  • Use a HereString (Bash only):

         # Bash
         read -ra words <<< 'hi ho hum'
         printf 'total number of words: %d\n' "${#words[@]}"

    The <<< operator is a non-POSIX extension (available in Bash 2.05b and later). It is a very clean and handy way to specify a small string of literal input to a command.

  • With a POSIX shell, or for longer multi-line data, you can use a here document instead:
        # POSIX
        linecnt=0
        while read -r line; do
            linecnt=$((linecnt+1))
        done <<EOF
        hi
        ho
        hum
        EOF
        printf 'total number of lines: %d\n' "$linecnt"
  • Use lastpipe (Bash 4.2):
         # Bash 4.2
         set +m
         shopt -s lastpipe

         # Append in the loop body so that only successfully read lines are stored
         printf '%s\n' hi{,,,,,} | while read -r line; do lines+=("$line"); done
         printf 'total number of lines: %d\n' "${#lines[@]}"
    Bash 4.2 brings the aforementioned ksh-like behavior to Bash through the lastpipe option, which runs the last command of a pipeline in the current shell. The one caveat is that job control must not be enabled, which limits its usefulness in an interactive shell.
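The two redirect-based workarounds that need no Bash features can be checked side by side. The following is a sketch rather than a definitive recipe: the sample data and the names tmpdir, from_file and from_fifo are invented for illustration, and mktemp is assumed to be installed (it is common but not specified by POSIX).

```shell
# Three sample lines used by both variants
sample='alpha
beta
gamma'

# Private scratch directory so that no fixed file names are left behind
tmpdir=$(mktemp -d) || exit 1

# Variant 1: redirect from a regular file
printf '%s\n' "$sample" > "$tmpdir/input"
from_file=0
while read -r line; do
    from_file=$((from_file+1))
done < "$tmpdir/input"

# Variant 2: named pipe; the writer runs in the background,
# the loop stays in the current shell
mkfifo "$tmpdir/fifo"
printf '%s\n' "$sample" > "$tmpdir/fifo" &
from_fifo=0
while read -r line; do
    from_fifo=$((from_fifo+1))
done < "$tmpdir/fifo"
wait                # reap the background writer
rm -rf "$tmpdir"    # clean up the file and the FIFO

printf 'file: %d, named pipe: %d\n' "$from_file" "$from_fifo"
```

Because neither loop is part of a pipeline, both counters keep their values in any POSIX shell.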

For more related examples of how to read input and break it into words, see FAQ #1.


CategoryShell

BashFAQ/024 (last edited 2023-12-12 13:15:33 by 195)