One of the fundamental concepts of shell programming is the subshell.
Quick summary
Unofficial (but realistic) definition: a subshell happens when a shell fork()s a child process, and that child process does not exec anything.
The following commands create subshells:
- (cmd)
- cmd1 | cmd2
- $(cmd) or `cmd`
- cmd &
- <(cmd) or >(cmd)
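A quick way to check the list above is to assign to a variable inside each construct and see whether the assignment survives. The following is a minimal sketch, not an exhaustive test. The variable name x is chosen only for illustration, and process substitution is left out because it is not available in a plain POSIX sh.

```sh
x=original

(x=parens)                              # explicit subshell
{ x=pipeline; echo; } | cat >/dev/null  # first command of a pipeline
: "$(x=cmdsubst)"                       # command substitution
{ x=background; } &                     # background job
wait

echo "$x"   # prints: original
```

None of the four assignments reaches the main shell, which is what the definition above predicts.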
Detailed explanations
Every process on a Unix system has its own parcel of memory for holding its variables, its file descriptors, its copy of the environment inherited from its parent process, and so on. Changes to the variables (and other private information) in one process do not affect any other process running on the system.
This becomes important when writing shell scripts, because so much of the work gets done by child processes of the script. There are obvious cases. Suppose the script runs a perl program that sets the environment variable foo and then starts another command with system.
The perl program is run as a child of the shell, and therefore the change perl makes to foo is seen by perl and by perl's child process (the one invoked by system), but not by perl's parent process (our example script). However, that isn't what we generally mean by "subshell".
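The perl case described above might look something like the sketch below. It assumes perl is installed. The variable child_saw and the fallback text are additions made for this illustration only.

```sh
foo=old

# perl changes its own environment, then starts a child with system().
# The fallback only keeps the sketch from failing where perl is absent.
child_saw=$(perl -e '$ENV{foo}="bar"; system "echo \$foo"' 2>/dev/null) ||
  child_saw="(perl not available)"

echo "perl's child saw: $child_saw"
echo "the script has:   $foo"
```

With perl present, the first line should report bar and the second old. The change travels down to perl's child but never back up to the script.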
A subshell is a child process, like perl, but it's one that inherits more than a normal external command does. It can see all the variables of your script, not just the ones that have been exported to the environment; more on this below.
Here's an example of a forced subshell:
foo=old
(foo=bar)
echo "$foo"
The parentheses around the second command in the script force the command to be run in a subshell, and therefore the change it makes to variable foo is not seen by the main script.
This can be extremely useful. For example, you might want to suppress an environment variable temporarily:
(unset -v http_proxy; wget ...)
Or you might have a task that must run in a specific working directory, and you want the remainder of the script (or your interactive shell) not to be affected by that:
(cd /foo || exit 1; tar ...)
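Both idioms can be tried without wget or tar. In this rough sketch the proxy URL and the nonexistent directory are made-up values, and pwd stands in for the real work.

```sh
start=$(pwd)

# Hypothetical proxy value, exported only for the demonstration.
http_proxy=http://proxy.invalid:3128
export http_proxy

# The unset is confined to the parentheses.
(unset -v http_proxy; echo "inside: ${http_proxy-(unset)}")
echo "outside: $http_proxy"

# So is the cd. If cd fails, exit 1 ends only the subshell.
(cd / || exit 1; pwd)
status=0
(cd /no/such/directory/assumed 2>/dev/null || exit 1; echo "not reached") || status=$?

echo "subshell status: $status"
[ "$(pwd)" = "$start" ] && echo "still in the starting directory"
```

The exit 1 after a failed cd matters because without it the following command would run in the wrong directory. Inside parentheses it stops only the subshell, so the script can inspect the status and carry on.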
Subshells can also be a pitfall for the unwary. There are many instances in which a shell creates a subshell even though the programmer wrote no parentheses. The most common ones are pipelines and command substitutions. First, the pipeline:
echo hello | read -r a
echo "$a"
In this example, the variable a is assigned a value by read, but only in the process that runs read. In the Bourne shell, pdksh, Bash (with its default settings), and most other POSIX-style shells, the read takes place in a subshell, because every command in a pipeline is run in its own subshell, and therefore the variable a in the main script is unaffected.
However, in ksh88 or ksh93, the last command in a pipeline is not run in a subshell. In those shells, the example above would cause the variable a in the main script to hold the value hello.
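Because the behavior differs between shells, one portable habit is to read the value and use it inside the same pipeline element. This is a minimal sketch of that idea. The variable grouped is a name invented for this example.

```sh
# Read the value and use it inside the same pipeline element.
echo hello | { read -r a; echo "inside the group: $a"; }

# Capture the group's output when the main script needs the result.
grouped=$(echo hello | { read -r a; echo "$a"; })
echo "captured: $grouped"
```

Both lines should report hello whether or not the shell runs the last pipeline command in a subshell, since read and echo always share a process here.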
Command substitution is usually a problem only when it's combined with functions. For example,
f() {
  count=$((count+1))
  echo something
}
count=0
value=$(f)
echo "$count"
The global variable count is incremented whenever function f is called. However, when f is called in a command substitution as shown above, the function runs in a subshell. This means, as I'm sure you've guessed by now, the counter does not get incremented in the main part of the script.
For workarounds to these sorts of problems, see Bash FAQ #24.
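To give a flavor of what such workarounds can look like, here is a sketch of two common approaches. It does not claim to be what the FAQ recommends. The variable result is a hypothetical name, and f is changed to store its output instead of printing it.

```sh
# Pipeline case: redirect input from a here-document,
# so read runs in the main shell.
read -r a <<EOF
hello
EOF
echo "$a"        # hello

# Command substitution case: call f directly and pass the
# result back in a variable ("result" is a hypothetical name).
f() {
  count=$((count+1))
  result=something
}
count=0
f
value=$result
echo "$count"    # 1
```

Neither version creates a subshell around the assignment, so a, count, and value all end up set in the main script.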
Here is an example of the difference between a subshell and a child process that happens to be a shell:
unset -v a; a=1
(echo "a is $a in the subshell")
sh -c 'echo "a is $a in the child shell"'
In the subshell, the regular shell variable a is visible; but because it is not exported, the full child process does not see it.
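The same comparison can be extended one step to show what export changes. This is a small sketch. The in_* variable names are made up for the illustration, and a command substitution is used as the subshell so that its output can be captured.

```sh
unset -v a; a=1

in_subshell=$(echo "${a-unset}")        # command substitution is a subshell
in_child=$(sh -c 'echo "${a-unset}"')   # separate shell process, a not exported

export a
in_child_exported=$(sh -c 'echo "${a-unset}"')

echo "subshell: $in_subshell, child: $in_child, child after export: $in_child_exported"
```

This should report 1 for the subshell, unset for the child shell, and 1 for the child shell once a has been exported.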