Differences between revisions 31 and 32
Revision 31 as of 2013-06-07 18:44:06
Size: 6549
Editor: GreyCat
Comment: +m *disables* monitor mode, not enables
Revision 32 as of 2013-07-25 14:34:43
Size: 6392
Comment: a few corrections. The job control issue with lastpipe is now fixed.
Line 3: Line 3:
In most shells, each command of a pipeline is executed in a separate SubShell. Non-working example: Components of a pipeline are executed concurrently, so in different processes. In many shells, '''all''' the components are executed in a child SubShell. Non-working example:
Line 6: Line 6:
    # Works only in ksh88/ksh93, or bash 4.2 with lastpipe enabled     # Works only in AT&T implementations of ksh, zsh, or bash 4.2 with lastpipe enabled
Line 19: Line 19:
The reason for this potentially surprising behaviour, as described above, is that each SubShell introduces a new variable context and environment. The {{{while}}} loop above is executed in a new subshell with its own copy of the variable {{{linecount}}} created with the initial value of '0' taken from the parent shell. This copy then is used for counting. When the {{{while}}} loop is finished, the subshell copy is discarded, and the original variable {{{linecount}}} of the parent (whose value hasn't changed) is used in the {{{echo}}} command. The reason for this potentially surprising behaviour, as described above, is that each SubShell being a different process introduces a new variable context and environment. The {{{while}}} loop above is executed in a new subshell with its own copy of the variable {{{linecount}}} created with the initial value of '0' taken from the parent shell. This copy then is used for counting. When the {{{while}}} loop is finished, the subshell copy is discarded, and the original variable {{{linecount}}} of the parent (whose value hasn't changed) is used in the {{{echo}}} command.
Line 24: Line 24:
 * KornShell creates it only if the loop is part of a pipeline, but ''not'' if the loop is the last part of it. The read example above actually ''works'' in ksh88 and ksh93! (but not mksh)
 * POSIX specifies the bash behaviour, but as an extension allows any or all of the parts of the pipeline to run without a subshell (thus permitting the KornShell behaviour, as well).
 * AT&T implementations of the KornShell (not the public domain based ones) and zsh create it only if the loop is part of a pipeline, but ''not'' if the loop is the last part of it.
 * POSIX leaves unspecified which component of a pipeline, if any may be executed in the current shell process, allowing `bash` or `ksh` behavior but not guaranteeing either.
Line 83: Line 83:
 * Use ProcessSubstitution (Bash only):  * Use ProcessSubstitution (Bash/Zsh/ksh93 only):
Line 127: Line 127:
 * Use a HereString (Bash only):  * Use a HereString (Bash/zsh/ksh93 only):
Line 143: Line 143:
    # Bash
    declare -i linecount
    # POSIX
    linecount=0
Line 147: Line 147:
        ((linecount++))         linecount=$((linecount+1))
Line 154: Line 154:
    printf 'total number of lines: %d' "$linecount"     printf 'total number of lines: %d\n' "$linecount"
Line 161: Line 161:
     # +m: Disable monitor mode (job control). Background processes display their
     # exit status upon completion when in monitor mode (we don't want that).
     set +m
Line 167: Line 165:
     printf 'total number of lines: %d' "${#lines[@]}"      printf 'total number of lines: %d\n' "${#lines[@]}"
Line 170: Line 168:
 Bash 4.2 introduces the aforementioned ksh-like behavior to Bash. The one caveat is that job control must not be enabled, thereby limiting its usefulness in an interactive shell.  Bash 4.2 introduces the aforementioned ksh-like behavior to Bash.

I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?

The components of a pipeline are executed concurrently, and therefore in different processes. In many shells, all the components are executed in child SubShells. Non-working example:

    # Works only in AT&T implementations of ksh, zsh, or bash 4.2 with lastpipe enabled
    # In other shells, this will print 0
    linecount=0

    printf '%s\n' foo bar |
    while read -r line
    do
        linecount=$((linecount + 1))
    done

    echo "total number of lines: $linecount"

The reason for this potentially surprising behaviour, as described above, is that each SubShell, being a different process, introduces a new variable context and environment. The while loop above is executed in a new subshell with its own copy of the variable linecount, created with the initial value of 0 taken from the parent shell. This copy is then used for counting. When the while loop finishes, the subshell's copy is discarded, and the echo command uses the original variable linecount of the parent, whose value hasn't changed.
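
The copy-and-discard behaviour described above can be seen without any pipeline at all. The following is only a minimal sketch using an explicit ( ) subshell; the value 5 is arbitrary and chosen purely for illustration.

```shell
# POSIX sh sketch: an explicit subshell gets a copy of the parent's variables.
linecount=0

(
    # This assignment changes only the subshell's copy.
    linecount=5
    echo "inside the subshell: $linecount"
)

# The parent's variable was never touched, so this still reports 0.
echo "after the subshell: $linecount"
```

A pipeline component that runs in a subshell behaves the same way: whatever it assigns is gone once that process exits.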

Different shells exhibit different behaviors in this situation:

  • BourneShell creates a subshell when the input or output of anything other than a simple command (loops, case, etc.) is redirected, either by a pipeline or by a redirection operator ('<', '>').

  • BASH creates a new process only if the loop is part of a pipeline.

  • AT&T implementations of the KornShell (not the public domain based ones) and zsh create a subshell only if the loop is part of a pipeline, but not if the loop is the last part of it.

  • POSIX leaves unspecified which component of a pipeline, if any, may be executed in the current shell process, allowing either the bash or the ksh behavior but guaranteeing neither.
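
Since the behavior differs between shells, it can be handy to test which one you are dealing with. The following is a rough sketch, not an official test; the variable name probe is made up for this illustration. It pipes a word into read and checks whether the assignment survived.

```shell
# POSIX sh sketch: find out where this shell runs the last pipeline component.
# The variable name "probe" is arbitrary and only used for this illustration.
probe=parent

echo child | read -r probe

if [ "$probe" = child ]; then
    echo "the last pipeline component ran in the current shell"
else
    echo "the last pipeline component ran in a subshell"
fi
```

Which branch is taken depends on the shell, and in bash on whether lastpipe is in effect, so no particular output should be relied upon.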

More broken stuff:

    # Bash 4
    # The problem also occurs without a loop
    printf '%s\n' foo bar | mapfile -t line
    printf 'total number of lines: %s\n' "${#line[@]}" # prints 0

    f() {
        if [[ -t 0 ]]; then
            echo "$1"
        else
            read -r var
        fi
    };

    f 'hello' | f
    echo "$var" # prints nothing

Again, in both cases the pipeline causes read, mapfile, or some containing command to run in a subshell, so its effect is never seen in the parent process.

  • It should be stressed that this issue isn't specific to loops. It's a general property of all pipes, though the "while/read" loop might be considered the canonical example that crops up over and over when people read the help or manpage description of the read builtin and notice that it accepts data on stdin. They might recall that data redirected into a compound command is available throughout that command, but not understand why all the fancy process substitutions and redirects they run across in places like FAQ #1 are necessary. Naturally they proceed to put their funstuff directly into a pipeline, and confusion ensues.

Workarounds

  • If the input is a file, a simple redirect will suffice:
        # POSIX
        linecount=0
        while read -r line; do linecount=$((linecount + 1)); done < file
        echo "$linecount"

    Unfortunately, this doesn't work with a Bourne shell; see sh(1) from the Heirloom Bourne Shell for a workaround.

  • Use command grouping and do everything in the subshell:

        # POSIX
        linecount=0
    
        cat /etc/passwd |
        {
            while read -r line
            do
                linecount=$((linecount + 1))
            done
    
            echo "total number of lines: $linecount" ;
        }
    This doesn't really change the subshell situation, but if nothing from the subshell is needed in the rest of your code then destroying the local environment after you're through with it could be just what you want anyway.
  • Use ProcessSubstitution (Bash/Zsh/ksh93 only):

        # Bash
        while read -r line
        do
            ((linecount++))
        done < <(grep PATH /etc/profile)
    
        echo "total number of lines: $linecount"
    This is essentially identical to the first workaround above. We still redirect a file, only this time the file happens to be a named pipe (or a /dev/fd entry, on systems that provide them) temporarily created by our process substitution to transport the output of grep.
  • Use a named pipe:

        # POSIX
        linecount=0
        mkfifo mypipe
        grep PATH /etc/profile > mypipe &
    
        while read -r line
        do
            linecount=$((linecount + 1))
        done < mypipe
    
        rm mypipe
        echo "total number of lines: $linecount"
  • Use a coprocess (ksh, even pdksh, oksh, mksh..):

        # ksh
        grep PATH /etc/profile |&
    
        while read -r -p line
        do
            linecount=$((linecount + 1))
        done
    
        echo "total number of lines: $linecount"
    Bash 4 also has coproc, but its syntax is very different from ksh's syntax, and not really applicable for this task.
  • Use a HereString (Bash/zsh/ksh93 only):

         # Bash
         # Options:
         # -r Backslashes are kept literally rather than treated as escape characters.
         # -a The words are assigned to sequential indices of the array "words"
    
         read -ra words <<< 'hi ho hum'
         printf 'total number of words: %d\n' "${#words[@]}"

    The <<< operator is not available in every shell (bash has it since 2.05b, and zsh and ksh93 support it too), but it is a very clean and handy way to specify a small string of literal input to a command.

  • With a POSIX shell, or for longer multi-line data, you can use a here document instead:
        # POSIX
        linecount=0
    
        while read -r line; do
            linecount=$((linecount+1))
        done <<EOF
        hi
        ho
        hum
        EOF
    
        printf 'total number of lines: %d\n' "$linecount"
  • Use lastpipe (Bash 4.2 and later):
         # Bash 4.2
    
         shopt -s lastpipe
    
         printf '%s\n' hi{,,,,,} | while read -r "lines[x++]"; do :; done
         printf 'total number of lines: %d\n' "${#lines[@]}"
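    Bash 4.2 introduces the aforementioned ksh-like behavior to Bash. According to the Bash manual, lastpipe takes effect only when job control is not active, which is normally the case in scripts but not in interactive shells.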
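
One more option in the same spirit as the command-grouping workaround above: if only a single value is needed from the subshell, it can be printed there and captured by the parent with command substitution. This is just a sketch under that assumption; the inner variable name count and the sample input are made up for illustration.

```shell
# POSIX sh sketch: the loop still runs in a subshell, but its result is
# printed and captured by the parent through command substitution.
linecount=$(
    printf '%s\n' foo bar baz |
    {
        count=0
        while read -r line
        do
            count=$((count + 1))
        done
        echo "$count"
    }
)

echo "total number of lines: $linecount"
```

This only carries text back to the parent, so it suits a single number or string better than several variables or an array.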

For more related examples of how to read input and break it into words, see FAQ #1.


CategoryShell

BashFAQ/024 (last edited 2023-12-12 13:15:33 by 195)