Diff for "BashFAQ/024"

Differences between revisions 4 and 31 (spanning 27 versions)

I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?

In most shells, each command of a pipeline is executed in a separate SubShell. Non-working example:

    # Works only in ksh88/ksh93, or bash 4.2 with lastpipe enabled
    # In other shells, this will print 0
    linecount=0

    printf '%s\n' foo bar |
    while read -r line
    do
        linecount=$((linecount + 1))
    done

    echo "total number of lines: $linecount"

The reason for this potentially surprising behaviour, as described above, is that each SubShell introduces a new variable context and environment. The while loop above is executed in a new subshell with its own copy of the variable linecount created with the initial value of '0' taken from the parent shell. This copy then is used for counting. When the while loop is finished, the subshell copy is discarded, and the original variable linecount of the parent (whose value hasn't changed) is used in the echo command.
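
The same copy-and-discard behaviour can be seen without any pipeline at all. The following is a minimal sketch using an explicit `( ... )` subshell; the variable name `count` is only illustrative:

```sh
# POSIX sh sketch: an explicit subshell gets its own copy of "count"
count=0

( count=5; echo "inside subshell: $count" )   # prints: inside subshell: 5

# The parent's variable was never touched
echo "in parent: $count"                      # prints: in parent: 0
```

A pipeline element that runs in a subshell behaves like the parenthesised command here: assignments succeed, but only on the copy.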

Different shells exhibit different behaviors in this situation:

BourneShell creates a subshell when the input or output of anything (loops, case, etc.) but a simple command is redirected, either by using a pipeline or by a redirection operator ('<', '>').
BASH creates a new process only if the loop is part of a pipeline.
KornShell creates it only if the loop is part of a pipeline, but not if the loop is the last part of it. The read example above actually works in ksh88 and ksh93 (but not in mksh)!
POSIX specifies the bash behaviour, but as an extension allows any or all of the parts of the pipeline to run without a subshell (thus permitting the KornShell behaviour, as well).
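
Because the behaviour differs between shells, it may help to test the shell at hand. The probe below is a rough sketch, not a definitive test; the variable name `probe` and the messages are made up for illustration:

```sh
# POSIX sh sketch: does the last element of a pipeline run in the current shell?
probe=parent
echo child | read -r probe

if [ "$probe" = child ]; then
    echo "last pipeline element ran in the current shell"
else
    echo "last pipeline element ran in a subshell"
fi
```

The result depends on the shell running it, and in bash also on whether lastpipe is in effect.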

More broken stuff:

    # Bash 4
    # The problem also occurs without a loop
    printf '%s\n' foo bar | mapfile -t line
    printf 'total number of lines: %s\n' "${#line[@]}" # prints 0

    f() {
        if [[ -t 0 ]]; then
            echo "$1"
        else
            read -r var
        fi
    };

    f 'hello' | f
    echo "$var" # prints nothing

Again, in both cases the pipeline causes read or some containing command to run in a subshell, so its effect is never witnessed in the parent process.
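
For comparison, here is a sketch of the mapfile case with the pipe replaced by a redirection from a process substitution (a Bash-only technique covered under the workarounds later on this page). mapfile now runs in the current shell, so the array survives:

```bash
# Bash 4 sketch: no pipeline, so mapfile runs in the current shell
mapfile -t line < <(printf '%s\n' foo bar)
printf 'total number of lines: %s\n' "${#line[@]}" # prints 2
```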

It should be stressed that this issue isn't specific to loops. It's a general property of all pipes, though the "while/read" loop might be considered the canonical example that crops up over and over when people read the help or manpage description of the read builtin and notice that it accepts data on stdin. They might recall that data redirected into a compound command is available throughout that command, but not understand why all the fancy process substitutions and redirects they run across in places like FAQ #1 are necessary. Naturally they proceed to put their funstuff directly into a pipeline, and confusion ensues.

Workarounds

If the input is a file, a simple redirect will suffice:
```
    # POSIX
    linecount=0
    while read -r line; do linecount=$((linecount + 1)); done < file
    echo "$linecount"
```
Unfortunately, this doesn't work with a Bourne shell; see sh(1) from the Heirloom Bourne Shell for a workaround.

Use command grouping and do everything in the subshell:

    # POSIX
    linecount=0

    cat /etc/passwd |
    {
        while read -r line
        do
            linecount=$((linecount + 1))
        done

        echo "total number of lines: $linecount" ;
    }

This doesn't really change the subshell situation, but if nothing from the subshell is needed in the rest of your code then destroying the local environment after you're through with it could be just what you want anyway.
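
If a single value is needed back in the parent, one common variation is to have the group print it and capture that output with command substitution. This is a sketch of that idea, not part of the original workaround; the inner counter `n` is an illustrative name:

```sh
# POSIX sh sketch: count inside the subshell, hand the result back on stdout
linecount=$(
    printf '%s\n' foo bar |
    {
        n=0
        while read -r line
        do
            n=$((n + 1))
        done
        echo "$n"
    }
)

echo "total number of lines: $linecount"   # prints: total number of lines: 2
```

This only carries text back, so it suits a single number or string rather than several variables or an array.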

Use ProcessSubstitution (Bash only):
```
    # Bash
    while read -r line
    do
        ((linecount++))
    done < <(grep PATH /etc/profile)

    echo "total number of lines: $linecount"
```
This is essentially identical to the first workaround above. We still redirect a file, only this time the file is a named pipe or /dev/fd entry (depending on the system) that the process substitution creates temporarily to transport the output of grep.

Use a named pipe:

    # POSIX
    mkfifo mypipe
    grep PATH /etc/profile > mypipe &

    while read -r line
    do
        linecount=$((linecount + 1))
    done < mypipe

    echo "total number of lines: $linecount"
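
The example above leaves mypipe behind in the current directory. A slightly more defensive sketch, assuming the widely available (but not strictly POSIX) mktemp utility, creates the FIFO in a private temporary directory and removes it afterwards; the names `dir` and `fifo` are illustrative:

```sh
# sh sketch: named pipe in a private temp directory, cleaned up afterwards
dir=$(mktemp -d) || exit 1      # assumption: mktemp -d is available
fifo=$dir/mypipe
mkfifo "$fifo"

printf '%s\n' foo bar baz > "$fifo" &   # writer runs in the background

linecount=0
while read -r line
do
    linecount=$((linecount + 1))
done < "$fifo"

wait                 # reap the background writer
rm -rf "$dir"        # remove the FIFO and its directory

echo "total number of lines: $linecount"   # prints: total number of lines: 3
```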

Use a coprocess (ksh, including pdksh, oksh, mksh, etc.):

    # ksh
    grep PATH /etc/profile |&

    while read -r -p line
    do
        linecount=$((linecount + 1))
    done

    echo "total number of lines: $linecount"

Bash 4 also has coproc, but its syntax is very different from ksh's syntax, and not really applicable for this task.

Use a HereString (Bash only):

     # Options:
     # -r Backslash does not act as an escape character; \n is not taken as LF.
     # -a The words are assigned to sequential indices of the array "words"

     read -ra words <<< 'hi ho hum'
     printf 'total number of words: %d\n' "${#words[@]}"

The <<< operator is specific to bash (2.05b and later), but it is a very clean and handy way to specify a small string of literal input to a command.

With a POSIX shell, or for longer multi-line data, you can use a here document instead:

    # POSIX
    linecount=0

    while read -r line; do
        linecount=$((linecount + 1))
    done <<EOF
    hi
    ho
    hum
    EOF

    printf 'total number of lines: %d\n' "$linecount"

Use lastpipe (Bash 4.2):

     # Bash 4.2
     # +m: Disable monitor mode (job control). lastpipe only takes effect while
     #     job control is off. In monitor mode, background processes also display
     #     their exit status upon completion (we don't want that).
     set +m
     shopt -s lastpipe

     printf '%s\n' hi{,,,,,} | while read -r "lines[x++]"; do :; done
     printf 'total number of lines: %d\n' "${#lines[@]}"

Bash 4.2 introduces the aforementioned ksh-like behavior to Bash. The one caveat is that job control must not be enabled, thereby limiting its usefulness in an interactive shell.
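
In a script, job control is off by default, so enabling lastpipe is usually all that is needed. The following sketch applies it to the counting loop from the top of this page; it assumes Bash 4.2 or later and that the code runs non-interactively:

```bash
#!/usr/bin/env bash
# Bash 4.2+ sketch: with lastpipe, the final pipeline element runs in the
# current shell (job control is already off in a non-interactive script).
shopt -s lastpipe

linecount=0

printf '%s\n' foo bar |
while read -r line
do
    linecount=$((linecount + 1))
done

echo "total number of lines: $linecount"   # prints: total number of lines: 2
```

Only the last element is affected; earlier elements of the pipeline still run in subshells.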

For more related examples of how to read input and break it into words, see FAQ #1.

CategoryShell

-  Revision 4 as of 2008-05-15 19:09:34
  Size: 4746
  Editor: GreyCat
  Comment: clean up
+  Revision 31 as of 2013-06-07 18:44:06
  Size: 6549
  Editor: GreyCat
  Comment: +m *disables* monitor mode, not enables
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-[[Anchor(faq24)]]
== I set variables in a loop. Why do they suddenly disappear after the loop terminates? Or, why can't I pipe data to read? ==

The following command always prints "total number of lines: 0", although the variable {{{linecnt}}} has a larger value in the {{{while}}} loop:
+<<Anchor(faq24)>>
== I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read? ==
In most shells, each command of a pipeline is executed in a separate SubShell.  Non-working example:
-Line 7:
+Line 6:
-    # Non-working example (except in ksh88/ksh93)
    linecnt=0
    cat /etc/passwd | while read line
+    # Works only in ksh88/ksh93, or bash 4.2 with lastpipe enabled
    # In other shells, this will print 0
    linecount=0

    printf '%s\n' foo bar |
    while read -r line
-Line 11:
+Line 13:
-        linecnt=`expr $linecnt + 1`
+        linecount=$((linecount + 1))
-Line 13:
+Line 15:
-    echo "total number of lines: $linecnt"
+    echo "total number of lines: $linecount"
-Line 16:
+Line 19:
-The reason for this surprising behaviour is that a {{{while/for/until}}} loop runs in a SubShell when it's part of a pipeline. For the {{{while}}} loop above, a new subshell with its own copy of the variable {{{linecnt}}} is created (initial value, taken from the parent shell: "0"). This copy then is used for counting. When the {{{while}}} loop is finished, the subshell copy is discarded, and the original variable {{{linecnt}}} of the parent (whose value has not changed) is used in the {{{echo}}} command.
+The reason for this potentially surprising behaviour, as described above, is that each SubShell introduces a new variable context and environment. The {{{while}}} loop above is executed in a new subshell with its own copy of the variable {{{linecount}}} created with the initial value of '0' taken from the parent shell. This copy then is used for counting. When the {{{while}}} loop is finished, the subshell copy is discarded, and the original variable {{{linecount}}} of the parent (whose value hasn't changed) is used in the {{{echo}}} command.
-Line 18:
+Line 21:
-Different shells behave differently when using redirection or pipes with a loop:
 * BourneShell creates a subshell when the input or output of a loop is redirected, either by using a pipeline or by a redirection operator ('<', '>').
 * ["BASH"] creates a new process only if the loop is part of a pipeline
 * KornShell creates it only if the loop is part of a pipeline, but ''not'' if the loop is the last part of it.  (The example above actually ''works'' in ksh88 and ksh93!)
+Different shells exhibit different behaviors in this situation:
 * BourneShell creates a subshell when the input or output of anything (loops, case etc..) but a simple command is redirected, either by using a pipeline or by a redirection operator ('<', '>').
 * [[BASH]] creates a new process only if the loop is part of a pipeline.
 * KornShell creates it only if the loop is part of a pipeline, but ''not'' if the loop is the last part of it. The read example above actually ''works'' in ksh88 and ksh93! (but not mksh)
 * POSIX specifies the bash behaviour, but as an extension allows any or all of the parts of the pipeline to run without a subshell (thus permitting the KornShell behaviour, as well).
-Line 23:
+Line 27:
-To solve this, either use a method that works without a subshell, or make sure you do all processing inside that subshell (a bit of a kludge, but often easier to work with):
+More broken stuff:
{{{
    # Bash 4
    # The problem also occurs without a loop
    printf '%s\n' foo bar | mapfile -t line
    printf 'total number of lines: %s\n' "${#line[@]}" # prints 0
}}}
-Line 26:
+Line 36:
-    # POSIX
    linecnt=0
    cat /etc/passwd |
    (
        while read line ; do
                linecnt=$(($linecnt+1))
        done
        echo "total number of lines: $linecnt"
    )
+    f() {
        if [[ -t 0 ]]; then
            echo "$1"
        else
            read -r var
        fi
    };

    f 'hello' | f
    echo "$var" # prints nothing
-Line 37:
+Line 48:
-To avoid the subshell completely (not easily possible if the other part of the pipe is a command!), use redirection, which does not have this problem (at least for ["BASH"] and KornShell):
+Again, in both cases the pipeline causes {{{read}}} or some containing command to run in a subshell, so its effect is never witnessed in the parent process.
-Line 39:
+Line 50:
-{{{
+ It should be stressed that this issue isn't specific to loops. It's a general property of all pipes, though the "{{{while/read}}}" loop might be considered the canonical example that crops up over and over when people read the help or manpage description of the {{{read}}} builtin and notice that it accepts data on stdin. They might recall that data redirected into a compound command is available throughout that command, but not understand why all the fancy process substitutions and redirects they run across in places like [[BashFAQ/001|FAQ #1]] are necessary. Naturally they proceed to put their funstuff directly into a pipeline, and confusion ensues.

=== Workarounds ===

 * If the input is a file, a simple redirect will suffice:

 {{{
-Line 41:
+Line 58:
-    linecnt=0
    while read line ; do
        linecnt=$(($linecnt+1))
   done < /etc/passwd
   echo "total number of lines: $linecnt"
}}}
+    while read -r line; do linecount=$((linecount + 1)); done < file
    echo $linecount
 }}}
-Line 48:
+Line 62:
-For ["BASH"], when the input of the pipe is a command rather than a file, you can use ProcessSubstitution:
+ Unfortunately, this doesn't work with a Bourne shell; see [[http://heirloom.sourceforge.net/sh/sh.1.html#20|sh(1) from the Heirloom Bourne Shell]] for a workaround.
-Line 50:
+Line 64:
-{{{
+ * Use [[BashGuide/CompoundCommands#Command_grouping|command grouping]] and do everything in the subshell:

 {{{
    # POSIX
    linecount=0

    cat /etc/passwd |
    {
 while read -r line
 do
     linecount=$((linecount + 1))
 done

 echo "total number of lines: $linecount" ;
    }
 }}}

 This doesn't really change the subshell situation, but if nothing from the subshell is needed in the rest of your code then destroying the local environment after you're through with it could be just what you want anyway.

 * Use ProcessSubstitution (Bash only):

 {{{
-Line 52:
+Line 87:
-    while read LINE; do
        echo "-> $LINE"
+    while read -r line
    do
        ((linecount++))
-Line 55:
+Line 91:
-}}}
-Line 57:
+Line 92:
-If you're reading from a plain file, a portable and common work-around is to redirect the standard input of the script using {{{exec}}}:
+    echo "total number of lines: $linecount"
 }}}
-Line 59:
+Line 95:
-{{{
    # Bourne
    linecnt=0
    exec < /etc/passwd    # redirect standard input from the file /etc/passwd
    while read line       # "read" gets its input from the file /etc/passwd
+ This is essentially identical to the first workaround above. We still redirect a file, only this time the file happens to be a named pipe temporarily created by our process substitution to transport the output of grep.

 * Use a [[NamedPipes|named pipe]]:

 {{{
    # POSIX
    mkfifo mypipe
    grep PATH /etc/profile > mypipe &

    while read -r line
-Line 65:
+Line 106:
-        linecnt=`expr $linecnt + 1`
    done
    echo "total number of lines: $linecnt"
}}}
+        linecount=$((linecount + 1))
    done < mypipe
-Line 70:
+Line 109:
-This works as expected, and prints a line count for the file {{{/etc/passwd}}}. But the input is redirected from that file permanently. What if we need to read the original standard input sometime later again? In that case we have to save a copy of the original standard input file descriptor, which we later can restore:
+    echo "total number of lines: $linecount"
 }}}
-Line 72:
+Line 112:
-{{{
    # Bourne
    exec 3<&0             # save original stdin file descriptor 0 as FD 3
    exec 0</etc/passwd    # redirect stdin from the file /etc/passwd
+ * Use a [[http://wiki.bash-hackers.org/syntax/keywords/coproc|coprocess]] (ksh, even pdksh, oksh, mksh..):
-Line 77:
+Line 114:
-    linecnt=0
    while read line       # "read" gets its input from the file /etc/passwd
+ {{{
    # ksh
    grep PATH /etc/profile |&

    while read -r -p line
-Line 80:
+Line 120:
-        linecnt=`expr $linecnt + 1`
+        linecount=$((linecount + 1))
-Line 83:
+Line 123:
-    exec 0<&3             # restore saved stdin (FD 0) from FD 3
    exec 3<&-             # close the no-longer-needed FD 3
+    echo "total number of lines: $linecount"
 }}}
 Bash 4 also has coproc, but its syntax is very different from ksh's syntax, and not really applicable for this task.
-Line 86:
+Line 127:
-    echo "total number of lines: $linecnt"
}}}
+ * Use a HereString (Bash only):
-Line 89:
+Line 129:
-Subsequent {{{exec}}} commands can be combined into one line, which is interpreted left-to-right:
+ {{{
     # Options:
     # -r Backslash does not act as an escape character; \n is not taken as LF.
     # -a The words are assigned to sequential indices of the array "words"
-Line 91:
+Line 134:
-{{{
    exec 3<&0
    exec 0</etc/passwd
    _...read redirected standard input..._
    exec 0<&3
    exec 3<&-
}}}
+     read -ra words <<< 'hi ho hum'
     printf 'total number of words: %d' "${#words[@]}"
 }}}
-Line 99:
+Line 138:
-is equivalent to
+ The {{{<<<}}} operator is specific to bash (2.05b and later), however it is a very clean and handy way to specify a small string of literal input to a command.
-Line 101:
+Line 140:
-{{{
    exec 3<&0 0</etc/passwd
    _...read redirected standard input..._
    exec 0<&3 3<&-
}}}
+ * With a POSIX shell, or for longer multi-line data, you can use a here document instead:
-Line 107:
+Line 142:
-Another useful trick (using Bash syntax) is breaking a variable into words using {{{read}}}:
+ {{{
    # Bash
    declare -i linecount
-Line 109:
+Line 146:
-{{{
    # Bash
    echo "$foo" | read a b c      # this doesn't work
    read a b c <<< "$foo"         # but this does
}}}
+    while read -r; do
        ((linecount++))
    done <<EOF
    hi
    ho
    hum
    EOF
-Line 115:
+Line 154:
-Again, the pipeline causes the {{{read}}} command in the first example to run in a subshell, so its effect is never witnessed in the parent process.  The second example does not create any subshells, so it works as we expect.  The {{{<<<}}} operator is specific to bash (2.05b and later), and the input which follows it is usually called a "here string".
+    printf 'total number of lines: %d' "$linecount"
 }}}
-Line 117:
+Line 157:
-For more examples of how to break input into words, see [:BashFAQ/001:FAQ #1].
+ * Use lastpipe (Bash 4.2)

 {{{
     # Bash 4.2
     # +m: Disable monitor mode (job control). Background processes display their
     #     exit status upon completion when in monitor mode (we don't want that).
     set +m
     shopt -s lastpipe

     printf '%s\n' hi{,,,,,} | while read -r "lines[x++]"; do :; done
     printf 'total number of lines: %d' "${#lines[@]}"
 }}}

 Bash 4.2 introduces the aforementioned ksh-like behavior to Bash. The one caveat is that job control must not be enabled, thereby limiting its usefulness in an interactive shell.

For more related examples of how to read input and break it into words, see [[BashFAQ/001|FAQ #1]].

----
CategoryShell