I set variables in a loop. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?
The problem
Each command in a pipeline of two or more commands (where each "command" may be a simple command, a compound command, or another pipeline) is executed in its own subshell. Put more simply: in most shells, every chunk of code separated by a pipe operator, including compound commands such as while/for/until loops, is forked off and run concurrently in a separate SubShell process. Like all subshells, each has its own isolated environment and variable scope.
Non-working example:
```bash
# Works only in ksh88/ksh93
typeset -i linecnt=0
printf '%s\n' foo bar | while read -r line
do
    linecnt=$((linecnt+1))
done
printf 'total number of lines: %s\n' "$linecnt" # prints 0
```
The reason for this potentially surprising behaviour, as described above, is that each SubShell introduces a new variable context and environment. The while loop above is executed in a new subshell with its own copy of the variable linecnt, created with the initial value 0 taken from the parent shell. This copy is then used for counting. When the while loop finishes, the subshell's copy is discarded, and the original variable linecnt of the parent (whose value hasn't changed) is used in the printf command.
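A quick way to see the separate process is bash's $BASHPID (bash 4 and later), which, unlike $$, reports the PID of the process actually running the code. This sketch assumes bash with the lastpipe option off, which is the default; the PIDs it prints differ from run to run.

```bash
# Bash 4+ (for $BASHPID); assumes lastpipe is off (the default)
count=0
printf '%s\n' a b c | while read -r _; do
    count=$((count + 1))
    echo "inside loop (PID $BASHPID): count=$count"
done
echo "after loop  (PID $BASHPID): count=$count"   # count is still 0
```

The three lines printed inside the loop show a different PID from the final line, and the parent's count is untouched.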
Different shells exhibit different behaviors in this situation:
- BourneShell creates a subshell whenever the input or output of anything other than a simple command (loops, case, etc.) is redirected, either by a pipeline or by a redirection operator ('<', '>').
- BASH creates a new process only if the loop is part of a pipeline.
- KornShell creates it only if the loop is part of a pipeline, but not if the loop is the last part of it. The read example above actually works in ksh88 and ksh93 (but not in mksh)!
- POSIX specifies the bash behaviour, but as an extension allows any or all of the parts of the pipeline to run without a subshell (thus also permitting the KornShell behaviour).
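To check which of these behaviours a particular shell exhibits, a small probe like the one below can be run under each shell of interest. It uses only POSIX syntax, so it runs unchanged in sh, bash, ksh and mksh; the variable name probe is arbitrary.

```bash
# POSIX: does the last element of a pipeline run in the current shell?
probe=outer
echo inner | read -r probe
if [ "$probe" = inner ]; then
    echo "last pipeline element ran in the current shell (ksh88/ksh93-like)"
else
    echo "last pipeline element ran in a subshell (bash default)"
fi
```

Run under bash with default options, it takes the subshell branch and probe keeps the value outer.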
More broken stuff:
```bash
# Bash 4
# The problem also occurs without a loop
printf '%s\n' foo bar | mapfile -t line
printf 'total number of lines: %s\n' "${#line[@]}" # prints 0
```
```bash
f() {
    if [[ -t 0 ]]; then
        echo "$1"
    else
        read -r var
    fi
}
f 'hello' | f
echo "$var" # prints nothing
```
Again, in both cases the pipeline causes read or some containing command to run in a subshell, so its effect is never witnessed in the parent process.
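For comparison, here is a sketch of the same two commands with the pipe replaced by a redirection from a process substitution, so that mapfile and read run in the current shell. Bash 4 or later is assumed for mapfile.

```bash
# Bash 4+: same commands, fed by a redirection instead of a pipe
mapfile -t line < <(printf '%s\n' foo bar)
printf 'total number of lines: %s\n' "${#line[@]}"   # prints 2

read -r var < <(printf '%s\n' hello)
echo "$var"                                          # prints hello
```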
It should be stressed that this issue isn't specific to loops. It's a general property of all pipelines, though the "while/read" loop might be considered the canonical example. It crops up over and over when people read the help or manpage description of the read builtin and notice that it accepts data on stdin. They might recall that data redirected into a compound command is available throughout that command, but not understand why all the fancy process substitutions and redirects they run across in places like FAQ #1 are necessary. Naturally they proceed to put their fun stuff directly into a pipeline, and confusion ensues.
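The point about redirecting into a compound command can be seen directly. A redirection attached to a brace group feeds every command inside it, and in bash (as in other POSIX shells, though not the historical Bourne shell) no subshell is created, so the assignments survive. The temporary file only makes the sketch self-contained; mktemp is a common utility but is not specified by POSIX.

```bash
tmp=$(mktemp) || exit 1
printf '%s\n' first second > "$tmp"
# Both reads consume the same redirected input, in order.
{
    read -r a
    read -r b
} < "$tmp"
rm -f "$tmp"
echo "$a / $b"   # prints: first / second
```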
Workarounds
- If the input is a file, a simple redirect will suffice:
```bash
# POSIX
linecnt=0
while read -r line; do linecnt=$((linecnt+1)); done < file
echo "$linecnt"
```
Unfortunately, this doesn't work with a Bourne shell, which runs any redirected loop in a subshell; see sh(1) from the Heirloom Bourne Shell for a workaround.
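A common Bourne-compatible workaround is to redirect the shell's own standard input with exec instead of redirecting the loop. The original stdin is saved on a spare descriptor (FD 3 here) so it can be restored afterwards. The sketch creates its own input file with mktemp (an assumption; any readable file works) and sticks to backquotes and expr so that it stays Bourne-compatible.

```bash
# Bourne-compatible: no redirection on the loop itself
tmpfile=`mktemp` || exit 1
printf '%s\n' one two three > "$tmpfile"

exec 3<&0            # save the original stdin as FD 3
exec 0< "$tmpfile"   # stdin now reads from the file

linecnt=0
while read line      # the historical Bourne read has no -r
do
    linecnt=`expr $linecnt + 1`
done

exec 0<&3 3<&-       # restore stdin from FD 3, then close FD 3
rm -f "$tmpfile"
echo "total number of lines: $linecnt"   # prints 3
```

Multiple redirections in one exec are processed left to right, which is why the restore and close can share a line.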
- Use command grouping and do everything in the subshell:
```bash
# POSIX
linecnt=0
cat /etc/passwd | {
    while read -r line ; do
        linecnt=$((linecnt+1))
    done
    echo "total number of lines: $linecnt"
}
```
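When just one value has to leave the grouped subshell, a common pattern is to have the group print it and capture that output with command substitution in the parent. A sketch, with printf standing in for whatever command produces the input:

```bash
# POSIX
linecnt=$(
    printf '%s\n' foo bar baz | {
        n=0
        while read -r line; do
            n=$((n + 1))
        done
        echo "$n"    # the group's only output
    }
)
echo "total number of lines: $linecnt"   # prints 3
```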
This doesn't really change the subshell situation, but if nothing from the subshell is needed in the rest of your code, then destroying the local environment after you're through with it could be just what you want anyway.
- Use ProcessSubstitution (Bash only):
```bash
# Bash
while read -r line; do
    ((linecnt++))
done < <(grep PATH /etc/profile)
echo "total number of lines: $linecnt"
```
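Because a process substitution simply expands to a filename, you can print that name to see what the loop is actually redirected from. The exact path is system-dependent (often /dev/fd/63 on Linux), so treat that comment as illustrative only.

```bash
# Bash
echo <(true)          # prints a path such as /dev/fd/63
# Like any file, it can be the target of an input redirection:
read -r first < <(printf '%s\n' alpha beta)
echo "$first"         # prints alpha
```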
This is essentially identical to the first workaround above. We still redirect a file; this time the file happens to be a pipe created by the process substitution to transport the output of grep (a /dev/fd/N entry on systems that support it, otherwise a temporary named pipe).
- Use a named pipe:
```bash
# POSIX
mkfifo mypipe
grep PATH /etc/profile > mypipe &
linecnt=0
while read -r line; do
    linecnt=$((linecnt+1))
done < mypipe
rm mypipe
echo "total number of lines: $linecnt"
```
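The example above creates mypipe in the current directory. A slightly more defensive sketch puts the FIFO in a private temporary directory and removes it when the script exits. It assumes mktemp -d is available, which is common but not specified by POSIX.

```bash
# POSIX shell plus mktemp -d
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
mkfifo "$dir/pipe"

printf '%s\n' foo bar baz > "$dir/pipe" &   # writer blocks until a reader opens the FIFO

linecnt=0
while read -r line; do
    linecnt=$((linecnt + 1))
done < "$dir/pipe"
wait                                        # reap the background writer
echo "total number of lines: $linecnt"      # prints 3
```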
- Use a coprocess (ksh, even pdksh, bash 4, oksh, mksh, ...):
```ksh
# ksh
grep PATH /etc/profile |&
while read -r -p line; do
    linecnt=$((linecnt+1))
done
echo "total number of lines: $linecnt"
```
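Bash 4 coprocesses use different syntax from ksh's |&: coproc NAME command starts the command with its output readable on ${NAME[0]} and its input writable on ${NAME[1]}. The sketch below makes a few assumptions: bash 4.1 or later for the {fd} redirection, and FILTER, out and pid are arbitrary names. Because bash may close those descriptors and unset the array once the coprocess exits, the sketch copies the read end first and only then lets the coprocess finish.

```bash
# Bash 4.1+
coproc FILTER { grep -v '^#'; }      # reads our input, writes results to ${FILTER[0]}
pid=$FILTER_PID
exec {out}<&"${FILTER[0]}"           # private copy of the read end

printf '%s\n' '# comment' foo bar >&"${FILTER[1]}"
eval "exec ${FILTER[1]}>&-"          # close the write end so grep sees EOF

linecnt=0
while read -r line; do
    linecnt=$((linecnt + 1))
done <&"$out"
exec {out}<&-
wait "$pid"
echo "total number of lines: $linecnt"   # prints 2
```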
- Use a HereString (Bash only):
```bash
# Bash
read -ra words <<< 'hi ho hum'
printf 'total number of words: %d\n' "${#words[@]}"
```
The <<< operator is not POSIX (bash has supported it since 2.05b), but it is a very clean and handy way to pass a short string of literal input to a command.
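A common use is splitting a variable's contents into words with read. In bash this cannot work through a pipe, but it does work with a here string, because no subshell is involved. The variable name foo is only for illustration.

```bash
# Bash
foo='alpha beta gamma delta'
echo "$foo" | read -r a b c          # read runs in a subshell; no effect here
read -r a b c <<< "$foo"             # read runs in the current shell
printf '%s|%s|%s\n' "$a" "$b" "$c"   # prints alpha|beta|gamma delta
```

As usual with read, the last variable receives the remainder of the line.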
- With a POSIX shell, or for longer multi-line data, you can use a here document instead:
```bash
# POSIX
linecnt=0
while read -r line; do
    linecnt=$((linecnt+1))
done <<EOF
hi
ho
hum
EOF
printf 'total number of lines: %d\n' "$linecnt"
```
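In a POSIX shell a here document can also stand in for process substitution. An unquoted here document expands command substitutions, so a command's output can reach the loop without a pipe. The trade-offs of this sketch: the whole output is held in memory first, trailing newlines are stripped by the command substitution, and empty output still yields one empty line.

```bash
# POSIX
linecnt=0
while IFS= read -r line; do
    linecnt=$((linecnt + 1))
done <<EOF
$(printf '%s\n' foo bar baz)
EOF
echo "total number of lines: $linecnt"   # prints 3
```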
- Use lastpipe (Bash 4.2 and later):
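```bash
# Bash 4.2
set +m            # lastpipe requires job control to be off (it already is in scripts)
shopt -s lastpipe

printf '%s\n' hi{,,,,,} | while IFS= read -r line; do lines+=("$line"); done
printf 'total number of lines: %d\n' "${#lines[@]}"   # prints 6
```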
Bash 4.2 introduces the aforementioned ksh-like behavior to Bash. The one caveat is that job control must not be enabled, thereby limiting its usefulness in an interactive shell.
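With lastpipe enabled, the non-working example from the top of this page behaves as it does in ksh. This sketch assumes a non-interactive bash 4.2 or later script, where job control is off by default.

```bash
# Bash 4.2+, non-interactive script
shopt -s lastpipe
linecnt=0
printf '%s\n' foo bar | while read -r line; do
    linecnt=$((linecnt + 1))
done
printf 'total number of lines: %s\n' "$linecnt"   # prints 2
```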
For more related examples of how to read input and break it into words, see FAQ #1.