Differences between revisions 1 and 14 (spanning 13 versions)
Revision 1 as of 2008-05-31 13:37:58
Size: 5651
Editor: GreyCat
Comment: first version
Revision 14 as of 2023-09-21 06:22:37
Size: 8950
Word Splitting

Introduction

The shell's parser performs several operations on your commands before finally executing them. Understanding how your original command will be transformed by the shell is of paramount importance in writing robust scripts. From the bash man page:

  • The order of expansions is: brace expansion, tilde expansion, parameter, variable and arithmetic expansion and command substitution (done in a left-to-right fashion), word splitting, and pathname expansion.

For additional information on word splitting and argument handling in Bash, consider reading Arguments.

What is Word Splitting?

This page will focus on word splitting, of course. Before we get into the technical details, let's write a little helper script that will show us the arguments as passed by the shell:

#!/bin/sh -
printf "%d args:" "$#"
[ "$#" -eq 0 ] || printf " <%s>" "$@"
echo

If you create a file named args with the above contents, make it executable with chmod a+x args, and put it in one of the directories listed by echo "$PATH", then you could run the following command with the following output:

griffon:~$ args hello world "how are you?"
3 args: <hello> <world> <how are you?>

Our helper program above receives the argument list as constructed by the shell, and shows it to us.
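
For quick experiments without installing a file in PATH, the same helper can be sketched as a shell function. The function form is an assumption made for convenience here, not part of the original helper script, though it prints the same output:

```shell
# Function equivalent of the args helper script (an illustrative stand-in,
# not the installed script itself).
args() {
  printf "%d args:" "$#"
  [ "$#" -eq 0 ] || printf " <%s>" "$@"
  echo
}

args hello world "how are you?"
# 3 args: <hello> <world> <how are you?>
args
# 0 args:
```

The `[ "$#" -eq 0 ] ||` guard matters: without it, printf would still print one empty `<>` when there are no arguments at all.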

The ultimate result of most shell commands is to execute some program with a specific set of arguments (as well as setting up environment variables, opening file descriptors, etc.).

The above command is code in the shell language, where spaces delimit words/tokens and quotes remove the special role of some characters (in this instance, of the space as a word delimiter and of ? as a glob operator). Splitting that line into words is not what we call Word Splitting; it is just shell syntax parsing.

Word splitting is a separate process that happens after the syntax has been parsed into tokens and before pathname expansion (aka filename generation or globbing). It is performed on the results of almost all unquoted expansions. The result of the expansion is broken into separate words based on the characters of the IFS variable. If IFS is not set, splitting is performed as if IFS contained a space, a tab, and a newline. For example:

griffon:~$ var="This is a variable"
griffon:~$ args $var
4 args: <This> <is> <a> <variable>
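
The default separators are not only spaces. As a small sketch (using a function stand-in for the args helper, which is an assumption of this example), a value containing a tab and a newline splits at those characters as well:

```shell
# Function stand-in for the args helper script.
args() {
  printf "%d args:" "$#"
  [ "$#" -eq 0 ] || printf " <%s>" "$@"
  echo
}

unset IFS                                # behave as if IFS were space, tab, newline
var=$(printf 'one two\tthree\nfour')     # contains a space, a tab and a newline
args $var
# 4 args: <one> <two> <three> <four>
```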

An example where both word splitting and pathname expansion are applied to the result of a parameter expansion:

griffon:~$ var="Some wildcard /b* characters"
griffon:~$ args $var
5 args: <Some> <wildcard> </bin> </boot> <characters>

An example using IFS:

griffon:~$ log=/var/log/qmail/current IFS=/
griffon:~$ args $log
5 args: <> <var> <log> <qmail> <current>
griffon:~$ unset IFS

An example with CommandSubstitution:

griffon:/music/Yello$ ls -l
total 2864
-rw-r--r-- 1 greg greg 2919154 2001-05-23 00:48 Yello - Oh Yeah.mp3

griffon:/music/Yello$ args $(ls -l)
11 args: <-rw-r--r--> <1> <greg> <greg> <2919154> <2001-05-23> <00:48> <Yello> <-> <Oh> <Yeah.mp3>

An example with ArithmeticExpansion:

griffon:~$ IFS=2
griffon:~$ args $(( 11 * 11 ))
2 args: <1> <1>

Controlling Word Splitting

As you can see above, we usually do not want either word splitting or pathname expansion to occur on expansions when filenames are involved. (See BashPitfalls for a discussion of this particular issue.)

Double quoting an expansion suppresses both word splitting and pathname expansion. The special cases "$@" and "${array[@]}" still expand to multiple words, one per parameter or element:

griffon:~$ var="This is a variable"; args "$var"
1 args: <This is a variable>

griffon:~$ array=(testing, testing, "1 2 3"); args "${array[@]}"
3 args: <testing,> <testing,> <1 2 3>

"$@" causes each positional parameter to be expanded to a separate word; its array equivalent likewise causes each element of the array to be expanded to a separate word.
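
The difference between the quoted and unquoted forms can be sketched with two small wrapper functions. The names forward_quoted and forward_unquoted are hypothetical, chosen only for this illustration, and args is a function stand-in for the helper script:

```shell
# Function stand-in for the args helper script.
args() {
  printf "%d args:" "$#"
  [ "$#" -eq 0 ] || printf " <%s>" "$@"
  echo
}

unset IFS   # default splitting on space, tab and newline

# Hypothetical wrappers that pass their parameters on to args.
forward_quoted()   { args "$@"; }   # one word per positional parameter
forward_unquoted() { args $@; }     # each parameter is split again

forward_quoted "how are you" fine
# 2 args: <how are you> <fine>
forward_unquoted "how are you" fine
# 4 args: <how> <are> <you> <fine>

# Double quotes around a command substitution keep its output as one word.
args "$(printf 'Yello - Oh Yeah.mp3\n')"
# 1 args: <Yello - Oh Yeah.mp3>
```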

There are very complicated rules involving whitespace characters in IFS. Quoting the man page again:

  • If IFS is unset, or its value is exactly <space><tab><newline>, the default, then any sequence of IFS characters serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters space and tab are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter. If the value of IFS is null, no word splitting occurs.

We won't explore those rules in depth here, except to note the part about sequences of non-whitespace characters. If IFS contains non-whitespace characters, then empty words can be generated:

griffon:~$ getent passwd sshd
sshd:x:100:65534::/var/run/sshd:/usr/sbin/nologin

griffon:~$ IFS=:; args $(getent passwd sshd)
7 args: <sshd> <x> <100> <65534> <> </var/run/sshd> </usr/sbin/nologin>
griffon:~$ unset IFS

There was another empty word generated in one of our previous examples, where IFS was set to /. The observant reader will have noticed, therefore, that non-whitespace IFS characters are not ignored at the beginning of expansions, the way whitespace IFS characters are.

Whitespace IFS characters get consolidated. Multiple spaces in a row, for example, have the same effect as a single space, when IFS contains a space (or is not set at all). Newlines also count as whitespace for this purpose, which has important ramifications when attempting to load an array with lines of input.
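
A short sketch of the consolidation rules, again using a function stand-in for the args helper (an assumption of this example). It contrasts whitespace and non-whitespace separators, and shows why blank lines disappear when text is split on newlines:

```shell
# Function stand-in for the args helper script.
args() {
  printf "%d args:" "$#"
  [ "$#" -eq 0 ] || printf " <%s>" "$@"
  echo
}

spaced='a   b'                           # three spaces in a row
colons='a:::b'                           # three colons in a row
mixed='a : b'                            # a colon surrounded by spaces
lines=$(printf 'first\n\n\nlast')        # two blank lines in the middle

IFS=' '
args $spaced    # runs of IFS whitespace act as one separator
# 2 args: <a> <b>

IFS=:
args $colons    # every non-whitespace separator delimits a field
# 4 args: <a> <> <> <b>

IFS=': '
args $mixed     # the colon plus adjacent IFS whitespace is one delimiter
# 2 args: <a> <b>

IFS='
'
args $lines     # newline is IFS whitespace, so the blank lines vanish
# 2 args: <first> <last>

unset IFS
```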

Finally, we note that pathname expansion happens after word splitting, and can produce some very shocking results.

griffon:~$ getent passwd qmaild
qmaild:*:994:998::/var/qmail:/sbin/nologin

griffon:~$ IFS=:; args $(getent passwd qmaild)
737 args: <qmaild> <00INDEX.lsof> <03> <037_ftpd.patch> ...
griffon:~$ unset IFS

The * word, produced by the shell's word splitting, was then expanded as a glob, resulting in several hundred new and exciting words. This can be disastrous if it happens unexpectedly. As with most of the dangerous features of the shell, it is retained because "it's always worked that way". In fact, it could be used for good, if you're very careful:

griffon:/music/Yello$ files='*.mp3 *.ogg'
griffon:/music/Yello$ args $files
2 args: <Yello - Oh Yeah.mp3> <*.ogg>

In a shell with array support such as Bash, though, you'd generally want $files to be an array containing the files rather than a space-separated list of patterns:

griffon:/music/Yello$ files=(*.mp3 *.ogg)
griffon:/music/Yello$ args "${files[@]}"
2 args: <Yello - Oh Yeah.mp3> <*.ogg>

If word splitting is desired, then you generally also need to disable pathname expansion with set -o noglob or set -f.
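
The effect of set -f can be sketched in a scratch directory with known contents. The file names one and two and the use of mktemp are assumptions of this example, and args is a function stand-in for the helper script:

```shell
# Function stand-in for the args helper script.
args() {
  printf "%d args:" "$#"
  [ "$#" -eq 0 ] || printf " <%s>" "$@"
  echo
}

# Work in a fresh scratch directory so the glob has known files to match.
dir=$(mktemp -d) && cd "$dir" || exit 1
touch one two

entry='qmaild:*:994:998::/var/qmail:/sbin/nologin'

IFS=:
args $entry     # the * field is replaced by the file names
# 8 args: <qmaild> <one> <two> <994> <998> <> </var/qmail> </sbin/nologin>

set -f          # disable pathname expansion; splitting still happens
args $entry
# 7 args: <qmaild> <*> <994> <998> <> </var/qmail> </sbin/nologin>
set +f

unset IFS
cd / && rm -rf "$dir"
```

Note that set -f and IFS are changed globally here and have to be restored by hand, which is easy to forget.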

With bash 4.4 or newer, option settings can be made local to a function with local - (copied from the Almquist shell, where the idea is to make $- local). This allows the noglob option, in addition to the $IFS parameter, to be set temporarily for a single splitting operation without affecting the overall behaviour of the shell:

getuser() {
  local - IFS=: user="$1"
  set -o noglob
  user_fields=( $(getent -- passwd "$user") )
}
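
A variant of the same idea that does not depend on the passwd database can be sketched as follows. The function name split_fields is hypothetical, and the sketch assumes bash 4.4 or newer (or an Almquist-derived shell) for local -:

```shell
# split_fields is a hypothetical name used only for this sketch.
split_fields() {
  local - IFS=:          # make the shell options and IFS local to the function
  set -o noglob          # no pathname expansion while splitting
  set -- $1              # split the first argument on colons
  printf '<%s>\n' "$@"   # print one field per line
}

split_fields 'qmaild:*:994'
# <qmaild>
# <*>
# <994>

# The caller's options are untouched once the function has returned.
case $- in
  *f*) echo "noglob leaked into the caller" ;;
  *)   echo "noglob still off in the caller" ;;
esac
# noglob still off in the caller
```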

Notes

  • Word splitting is not performed on expansions inside Bash keywords such as [[ ... ]] and case.

  • Word splitting is not performed on expansions in scalar variable assignments. Thus, one does not need to quote anything in commands like these:
    • foo=$bar

    • bar=$(a command)

    • logfile=$logdir/foo-$(date +%Y%m%d)

    • PATH=/usr/local/bin:$PATH ./myscript

    • But one does need to quote in an array assignment such as foo=( "$var1" "$var2" ).

  • In either case, quoting anyway will not break anything. So if in doubt, quote!

  • When using the read command (which you generally don't want to use without -r), word splitting is performed on the input. That's generally wanted when read -a is used (to populate an array) or when passing more than one variable to read. Some amount of word splitting is still done when only one variable is given: leading and trailing IFS whitespace characters are stripped, and one trailing non-whitespace IFS character (possibly surrounded by IFS whitespace characters) is stripped if there is no other separator in the input (as in IFS=": " read -r var <<< "input : "). Quoting is irrelevant here, though this behavior can be disabled by emptying IFS, typically with IFS= read -r var.
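
Several of the notes above can be checked with a short sketch. The variable names and sample input are assumptions of this example, and here-documents are used instead of <<< so that the sketch also runs in a plain POSIX shell:

```shell
unset IFS
bar='two  words *'

# Scalar assignment: no word splitting and no pathname expansion.
foo=$bar
[ "$foo" = "$bar" ] && echo "assignment kept the value intact"

# case: the unquoted expansion after "case" is not split either.
case $bar in
  'two  words *') echo "case saw a single word" ;;
esac

# read with several names splits the line on IFS; the last name
# receives the remainder, separators included.
IFS=: read -r user pw uid rest <<EOF
sshd:x:100:65534::/var/run/sshd
EOF
echo "$user $uid $rest"
# sshd 100 65534::/var/run/sshd

# read with one name still strips leading IFS whitespace,
# unless IFS is emptied for the command.
read -r stripped <<EOF
  padded line
EOF
IFS= read -r verbatim <<EOF
  padded line
EOF
echo "[$stripped] [$verbatim]"
# [padded line] [  padded line]
```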


CategoryShell

WordSplitting (last edited 2023-09-21 06:22:37 by StephaneChazelas)