How do I do string manipulations in bash?
Bash can do string operations. LOTS of string operations. This is an introduction to bash string manipulations and related techniques. It overlaps with the Parameter Expansion question, but the information here is presented in a more beginner-friendly manner (we hope).
Parameter expansion syntax
A parameter in bash is a term that covers both variables (storage places with names, that you can read and write by using their name) and special parameters (things you can only read from, not write to). For example, if we have a variable named fruit we can assign the value apple to it by writing:
fruit=apple
And we can read that value back by using a parameter expansion:
$fruit
Note, however, that $fruit is an expression -- a noun, not a verb -- and so normally we need to put it in some sort of command. Also, the results of an unquoted parameter expansion will be split into multiple words and expanded into filenames, which we generally don't want. So, we should always quote our parameter expansions unless we're dealing with a special case.
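To make the word-splitting hazard concrete, here is a small sketch. The variable name title and its value are made up for illustration, and set -- is used only as a convenient way to count the words the shell produces.

```shell
title='two  words'          # hypothetical value containing two spaces

set -- $title               # unquoted: split on whitespace into separate words
unquoted_count=$#

set -- "$title"             # quoted: stays a single word, spaces preserved
quoted_count=$#

printf '%s\n' "unquoted: $unquoted_count word(s), quoted: $quoted_count word(s)"
```

With the default IFS, the unquoted expansion yields two words and the quoted one yields exactly one.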
So, to see the value of a parameter (such as a variable):
printf '%s\n' "$fruit" # not using echo which can't be used with arbitrary data
Or, we can use these expansions as part of a larger expression:
printf '%s\n' "I like to eat $fruit"
If we want to put an s on the end of our variable's content, we run into a dilemma:
printf '%s\n' "I like to eat $fruits"
This command tries to expand a variable named fruits, rather than a variable named fruit. We need to tell the shell that we have a variable name followed by a bunch of other letters that are not part of the variable name. We can do that like this:
printf '%s\n' "I like to eat ${fruit}s"
And while we're inside the curly braces, we also have the opportunity to manipulate the variable's content in various exciting and occasionally even useful ways, which we're about to describe.
It should be pointed out that in Bash, contrary to Zsh, these tricks only work on parameter expansions. You can't operate on a constant string (or a command substitution, etc.) using them, because the syntax requires a parameter name inside the curly braces. (You can, of course, stick your constant string or command substitution into a temporary variable and then use that.)
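The temporary-variable workaround might look like the following sketch. The printf call merely stands in for whatever command substitution you actually have, and the names tmp and stem are illustrative; the % operator used here is explained further down.

```shell
# We cannot apply %.txt directly to $(...); store the result in a variable first.
tmp=$(printf '%s' 'report.txt')   # stand-in for any command substitution
stem=${tmp%.txt}                  # now the expansion has a parameter name to work on
printf '%s\n' "$stem"
```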
Length of a string
This one's easy, so we'll get it out of the way first.
printf '%s\n' "The string <$var> is ${#var} characters long."
Note that since bash 3.0, the length is counted in characters, not bytes, which is a significant difference in multi-byte locales. If you need the number of bytes, set LC_ALL=C before expanding ${#var}.
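A minimal sketch of the character/byte distinction, assuming the script file is saved as UTF-8. The sample value is made up, and the LC_ALL change is confined to a command substitution so it does not affect the rest of the script.

```shell
var='héllo'                       # 5 characters; 6 bytes when encoded as UTF-8

chars=${#var}                     # counted in characters if the locale is UTF-8

# $(...) runs in a subshell, so the LC_ALL assignment does not leak out.
bytes=$(LC_ALL=C; printf '%s' "${#var}")

printf '%s\n' "chars=$chars bytes=$bytes"
```

The value of chars depends on the locale the script runs in; bytes should not.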
Checking for substrings
This overlaps FAQ #41 but we'll repeat it here. To check for a (known, static) substring and act upon its presence or absence, just use the standard case construct:
case $var in
  (*substring*) printf '%s\n' "<$var> contains <substring>";;
  (*) printf '%s\n' "<$var> does not contain <substring>"
esac
In Bash, you can also use the Korn-style [[...]] construct:
if [[ $var = *substring* ]]; then
  printf '%s\n' "<$var> contains <substring>"
else
  printf '%s\n' "<$var> does not contain <substring>"
fi
If the substring you want to look for is in a variable, and you want to prevent it from being treated as a glob pattern, you can quote that part:
case $var in (*"$substring"*) ...
It also applies for the = (aka ==) and != operators of the [[...]] construct:
if [[ $var = *"$substring"* ]]; then # substring will be treated as a literal string, even if it contains glob chars
If you want it to be treated as a glob pattern, remove the quotes:
if [[ $var = *$substring* ]]; then # substring will be treated as a glob pattern
There is also a RegularExpression capability, involving the =~ operator. For compatibility with all versions of Bash from 3.0 up and other shells, be sure to put the regular expression into a variable -- don't put it directly into the [[ command. And don't quote it, either -- or else it may be treated as a literal string.
my_re='^fo+.*bar'
if [[ $var =~ $my_re ]]; then
# my_re will be treated as an Extended Regular Expression (ERE)
Beware that on many systems, regular expressions choke on strings that are not valid text in the user's locale, while bash glob patterns can somewhat deal with them, so in cases where either = or =~ can be used, = may be preferable.
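As a sketch, the quoted-substring test can be wrapped in a small helper. The function name contains is hypothetical, not something bash provides. Because the second argument is quoted inside the pattern, glob characters in it are matched literally.

```shell
# contains STRING SUBSTRING
# Succeeds if SUBSTRING occurs literally anywhere in STRING.
contains() {
    case $1 in
        (*"$2"*) return 0 ;;
        (*)      return 1 ;;
    esac
}

if contains 'foo*bar' '*'; then
    printf '%s\n' "found a literal asterisk"
fi
```

Since it only uses case, this helper should also work in plain sh.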
Substituting part of a string
A common need is to replace some part of a string with something else. (Let's call the old and new parts "words" for now.) If we know what the old word is, and what the new word should be, but not necessarily where in the string it appears, then we can do this:
$ var="She favors the bold. That's cold."
$ printf '%s\n' "${var/old/new}"
She favors the bnew. That's cold.
That replaces just the first occurrence of the word old. If we want to replace all occurrences of the word, we double up the first slash:
$ var="She favors the bold. That's cold."
$ printf '%s\n' "${var//old/new}"
She favors the bnew. That's cnew.
We may not know the exact word we want to replace. If we can express the kind of word we're looking for with a glob pattern, then we're still in good shape:
$ var="She favors the bold. That's cold."
$ printf '%s\n' "${var//b??d/mold}"
She favors the mold. That's cold.
We can also anchor the word we're looking for to either the start or end of the string (but not both). In other words, we can tell bash that it should only perform the substitution if it finds the word at the start, or at the end, of the string, rather than somewhere in the middle.
$ var="She favors the bold. That's cold."
$ printf '%s\n' "${var/#bold/mold}"
She favors the bold. That's cold.
$ printf '%s\n' "${var/#She/He}"
He favors the bold. That's cold.
$ printf '%s\n' "${var/%cold/awful}"
She favors the bold. That's cold.
$ printf '%s\n' "${var/%cold?/awful}"
She favors the bold. That's awful
Note that nothing happened in the first command, because bold did not appear at the beginning of the string; and also in the third command, because cold did not appear at the end of the string. The # anchors the pattern (plain word or glob) to the beginning, and the % anchors it to the end. In the fourth command, the pattern cold? matches the word cold. (including the period) at the end of the string.
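A slightly more practical sketch of the same operators, using a made-up filename: the // form replaces every space, and the /% form swaps the extension only when it sits at the very end of the string.

```shell
name='my summer photo.jpg'        # hypothetical filename

safe=${name// /_}                 # replace every space with an underscore
png=${safe/%.jpg/.png}            # replace .jpg only if it is at the end

printf '%s\n' "$safe" "$png"
```

This prints my_summer_photo.jpg followed by my_summer_photo.png.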
Removing part of a string
We can use the ${var/old/} or ${var//old/} syntax (or even ${var/old}, ${var//old}) to replace a word with nothing if we want. That's one way to remove part of a string. But there are some other ways that come in handy more often than you might guess.
The first involves removing something from the beginning of a string. Again, the part we're going to remove might be a constant string that we know in advance, or it might be something we have to describe with a glob pattern.
$ var="/usr/local/bin/tcpserver"
$ printf '%s\n' "${var##*/}"
tcpserver
The ## means "remove the largest possible matching string from the beginning of the variable's contents". The */ is the pattern that we want to match -- any number of characters ending with a (literal) forward slash. The result is essentially the same as the basename command, with one notable exception: If the string ends with a slash (or several), basename would return the name of the last path element, while the above would return an empty string. Use with caution.
If we only use one # then we remove the shortest possible matching string. This is less commonly needed, so we'll skip the example for now and give a really cool one later.
As you might have guessed, we can also remove a string from the end of our variable's contents. For example, to mimic the dirname command, we remove everything starting at the last slash:
$ var="/usr/local/bin/tcpserver"
$ printf '%s\n' "${var%/*}"
/usr/local/bin
The % means "remove the shortest possible match from the end of the variable's contents", and /* is a glob that begins with a literal slash character, followed by any number of characters. Since we require the shortest match, bash isn't allowed to match /bin/tcpserver or anything else that contains multiple slashes. It has to remove /tcpserver only.
Here again, there is a notable difference with dirname in that for instance with var=file, dirname would return . while ${var%/*} would expand to file. And in var=dir/, dirname also returns . while ${var%/*} expands to dir.
Likewise, %% means "remove the longest possible match from the end of the variable's contents".
Those operators, unlike the ${var/pattern/replacement} operator from ksh93, are standard, so they can also be used in sh scripts.
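To see the four removal operators side by side, here is a sketch applied to a made-up filename with two dots. The variable names are illustrative only.

```shell
file='archive.tar.gz'

after_first_dot=${file#*.}        # shortest match removed from the start
after_last_dot=${file##*.}        # longest match removed from the start
before_last_dot=${file%.*}        # shortest match removed from the end
before_first_dot=${file%%.*}      # longest match removed from the end

printf '%s\n' "$after_first_dot" "$after_last_dot" \
              "$before_last_dot" "$before_first_dot"
```

The four results are tar.gz, gz, archive.tar and archive, in that order.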
Now let's try something harder: what if we wanted a sort of double basename -- the last two parts of a pathname, instead of just the last part?
$ var=/home/someuser/projects/q/quark
$ tmp=${var%/*/*}
$ printf '%s\n' "${var#"$tmp/"}"
q/quark
This is a bit trickier. Here's how it works:
1. Look for the shortest possible string matching /*/* at the end of the pathname. In this case, it would match /q/quark.
2. Remove that from the end of the original string. The result of this is the thing we don't want. We store this in tmp.
3. Remove the thing we don't want (plus an extra /) from the original variable. We're left with the last two parts of the pathname.
It's also worth pointing out that, as we just demonstrated, the pattern to be removed (after # or % or ## or %%) doesn't have to be a constant -- it can be another substitution. This isn't the most common case in real life, but it's sometimes handy.
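The double-basename steps could be packaged as a function, roughly as follows. The name last_two is hypothetical, and the sketch assumes a shell that supports local (bash does).

```shell
# last_two PATH
# Prints the last two components of PATH.
last_two() {
    local tmp
    tmp=${1%/*/*}                 # everything before the last two components
    printf '%s\n' "${1#"$tmp/"}"  # strip that prefix, plus one slash
}

last_two /home/someuser/projects/q/quark
```

If the pathname has fewer than two slashes, the first expansion matches nothing and the input is printed unchanged.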
Extracting parts of strings
We can combine the # and % operations to produce some interesting results, too. For example, we might know that our variable contains something in square brackets, somewhere, with an unknown amount of "garbage" on both sides. We can use this to extract the part we want:
$ var='garbage in [42] garbage out'
$ tmp=${var##*[}
$ printf '%s\n' "${tmp%%]*}"
42
Note that we used a temporary variable to hold the results of one parameter expansion, and then fed that result to the second one. We can't do two parameter expansions to the same variable at once (the syntax simply doesn't permit it).
If the delimiter is the same both times (for instance, double quotes) then we need to be a bit more careful and use only one # or %:
$ var='garbage in "42" garbage out'
$ tmp=${var#*\"}
$ printf '%s\n' "${tmp%\"*}"
42
Sometimes, however, we don't have useful delimiters. If we know that the good part resides in a certain set of columns, we can extract it that way. We can use range notation to extract a substring by specifying starting position and length:
var='CONFIG  .SYS'
left=${var:0:8}
right=${var:(-3)}
Here, the input is an MS-DOS "8.3" filename, space-padded to its full length. If for some reason we need to separate into its two parts, we have several possible ways to go about it. We could split the name into fields at the dot (we'll show that approach later). Or we could use ${var##*.} to get the "extension" (the part after the last dot) and ${var%.*} to get the left-hand part. Or we could count the characters, as we showed here.
In the ${var:0:8} example, the 0 is the starting position (0 is the first character) and 8 is the length of the piece we want in characters. If we omit the length, or if the length is greater than the rest of the string, then we get the rest of the string as the result. In the ${var:(-3)} example, we omitted the length. We specified a starting position of -3 (negative three), which means three from the end. We have to use parentheses or a space between the : and the negative number to avoid a syntactic inconvenience (we'll discuss that later). We could also have used ${var:8} to get the rest of the string starting at character offset 8 (which is the ninth character) in this case, since we know the length is constant; but in many cases, we might not know the length in advance, and specifying a negative starting position lets us avoid some unnecessary work.
Character-counting is an even stronger technique when there is no delimiter at all between the pieces we want:
var='CONFIG  SYS'
left=${var:0:8}
right=${var:8}
We can't use ${var#*.} or similar techniques here!
The ${var:offset:length} operator is also from ksh93 and is not standard sh.
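Another place where character-counting fits is a fixed-width timestamp. The following sketch assumes a made-up YYYYMMDD string; note the space before the negative offset in the last expansion.

```shell
stamp='20240131'                  # hypothetical YYYYMMDD string

year=${stamp:0:4}                 # 4 characters starting at offset 0
month=${stamp:4:2}                # 2 characters starting at offset 4
day=${stamp: -2}                  # last 2 characters (space avoids the :- syntax)

printf '%s\n' "$year-$month-$day"
```

This prints 2024-01-31.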
Splitting a string into fields
Sometimes your input might naturally consist of various fields with some sort of delimiter between them. In these cases, a natural approach to handling the input is to divide it into its component fields, so that each one can be handled on its own.
If the delimiter is a single character (or one character of a set -- so long as it's never more than one) then bash offers several viable approaches.
The first, which works only in the special case where the variable never contains newline characters and doesn't end with the delimiter, is to read the input directly into an array:
var=192.168.1.3
IFS=. read -r -a octets <<< "$var"
We're no longer in the realm of parameter expansion here at all. We've combined several features at once:
The IFS variable tells the read command what field delimiters to use. In this case, we only want to use the dot. If we had specified more than one character, then it would have meant any one of those characters would qualify as a delimiter.
The notation var=value command means we set the variable only for the duration of this single command. The IFS variable goes back to whatever it was before, once read is finished.
read puts its results into an array named octets.
<<< "$var" means we use the contents of var as standard input to the read command (fed via a temporary file in older versions of bash and via a pipe in newer versions for short strings only).
After this command, the result is an array named octets whose first element (element 0) is 192, and whose second element (element 1) is 168, and so on. If we want a fixed set of variables instead of an array, we can do that as well:
IFS=, read lastname firstname rest <<< "$name"
We can also "skip" fields we don't want by assigning them to a variable we don't care about such as x or junk; or to _ which is overwritten by each command:
while IFS=: read -r user x uid gid x home shell; do
...
done < /etc/passwd
(For portability, it's best to avoid _, as it's a read-only variable in some shells.)
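To try the field-skipping idea without touching the real /etc/passwd, here is a small sketch that reads passwd-style lines from standard input. The function name list_users and the sample records are hypothetical.

```bash
# Hypothetical helper: read passwd-style lines from stdin and report three of the seven fields.
list_users() {
  local user x uid gid home shell
  # Both the password field and the comment field land in x, which we never look at.
  while IFS=: read -r user x uid gid x home shell; do
    printf '%s has UID %s and shell %s\n' "$user" "$uid" "$shell"
  done
}

list_users <<'EOF'
root:x:0:0:root:/root:/bin/bash
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
EOF
# prints:
# root has UID 0 and shell /bin/bash
# nobody has UID 65534 and shell /usr/sbin/nologin
```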
Another approach to the same sort of problem involves the intentional use of WordSplitting to retrieve fields one at a time. This is more cumbersome than the array approach we just saw, but it does have several advantages:
It works in sh as well as bash.
It works even if the string ends in a delimiter.
It works even if strings contain newline characters.
var=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:
found=no
set -o noglob
IFS=:
for dir in $var''
do
  if test -x "$dir"/foo; then found=yes; fi
done
set +o noglob; unset IFS
This example is similar to one on FAQ 81. Bash offers better ways to determine whether a command exists in your PATH, but this illustrates the concept quite clearly. Points of note:
set -o noglob (or set -f) disables glob expansion. You should always disable globs when using unquoted parameter expansions, unless you specifically want to allow globs in the parameter's contents to be expanded.
We use set +o noglob (or set +f) and unset IFS at the end of the code to return the shell to a default state. However, this is not necessarily the state the shell was in when the code started. Returning the shell to its previous (possibly non-default) state is more trouble than it's worth in most cases, so we won't discuss it in depth here.
Again, IFS contains a list of field delimiters. We want to split our parameter at each colon. We add a '' at the end so that an empty trailing element is not discarded. This also means that an empty $var is treated as containing one empty element (which is how the $PATH variable works: an empty $PATH means searching only in the current working directory).
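The effect of the trailing '' is easiest to see by printing the fields. The sketch below wraps the word-splitting loop in a function whose body is a subshell, which is one way (an assumption of this example, not something the code above does) to keep the noglob and IFS changes from leaking into the caller. The name print_fields is hypothetical.

```bash
# Hypothetical helper: print each colon-separated field of $1 on its own line, in angle brackets.
print_fields() (
  # The parentheses make the function body a subshell, so these settings vanish on return.
  set -o noglob
  IFS=:
  for field in $1''; do
    printf '<%s>\n' "$field"
  done
)

print_fields 'a:b:'
# prints:
# <a>
# <b>
# <>

print_fields ''
# prints:
# <>
```

Because of noglob, a field such as x* is printed literally rather than being expanded against the files in the current directory.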
If your field delimiter is a multi-character string, then unfortunately bash does not offer any simple ways to deal with that. Your best bet is to handle the task in awk instead.
$ cat inputfile
apple::0.75::21
banana::0.50::43
cherry::0.15::107
date::0.30::20
$ awk -F '::' '{print $1 " qty " $3 " @" $2 " = " $2*$3; total+=$2*$3} END {print "Total: " total}' inputfile
apple qty 21 @0.75 = 15.75
banana qty 43 @0.50 = 21.5
cherry qty 107 @0.15 = 16.05
date qty 20 @0.30 = 6
Total: 59.3
awk's -F allows us to specify a field delimiter as an extended regular expression. awk also allows floating point arithmetic, associative arrays, and a wide variety of other features that many shells lack.
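If calling awk is not an option, splitting on a multi-character delimiter can still be done in bash with a loop built from the prefix and suffix removal expansions. This is only a sketch of that idea, not a built-in feature; the function name split_multi and the result array fields are hypothetical.

```bash
# Hypothetical helper: split $1 on the literal delimiter $2, storing the pieces in the array "fields".
split_multi() {
  local rest=$1 delim=$2
  fields=()
  # An empty delimiter would never shrink the string, so refuse it.
  [[ -n $delim ]] || return 1
  while [[ $rest == *"$delim"* ]]; do
    fields+=("${rest%%"$delim"*}")   # everything before the first delimiter
    rest=${rest#*"$delim"}           # everything after the first delimiter
  done
  fields+=("$rest")                  # the final field, possibly empty
}

split_multi 'apple::0.75::21' '::'
printf '<%s>\n' "${fields[@]}"
# prints:
# <apple>
# <0.75>
# <21>
```

Quoting "$delim" inside the patterns makes it match literally, so delimiters containing glob characters are handled as plain text.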
Joining fields together
The simplest way to concatenate values is to use them together, with nothing in between:
printf '%s\n' "$foo$bar"
If we have an array instead of a fixed set of variables, then we can print the array with a single character (or nothing) between fields using IFS:
$ array=(1 2 3)
$ (IFS=/; printf '%s\n' "${array[*]}")
1/2/3
Notable points here:
We can't use IFS=/ printf '%s\n' ... because of how the parser works.
Therefore, we have to set IFS first, in a separate command. This would make the assignment persist for the rest of the shell. Since we don't want that, and because we aren't assigning to any variables that we need to keep, we use an explicit SubShell (using parentheses) to set up an environment where the change to IFS is not persistent. Another option would be to use a function in which we declare IFS as local with local IFS.
If IFS is not set, we get a space between elements. If it's set to the empty string, there is nothing between elements.
The delimiter is not printed after the final element.
If we wanted more than one character between fields, we would have to use a different approach; see below.
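The function-with-local-IFS alternative mentioned in the points above might look like the following sketch. The name join_by_char is hypothetical; only the first character of its first argument is used as the separator, because that is how "$*" treats IFS.

```bash
# Hypothetical helper: join the remaining arguments using the first character of $1.
join_by_char() {
  local IFS="$1"   # local, so the caller's IFS is untouched
  shift
  printf '%s\n' "$*"
}

array=(1 2 3)
join_by_char / "${array[@]}"    # prints: 1/2/3
join_by_char '' "${array[@]}"   # prints: 123
```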
A more general approach to "joining" an array involves iterating through the fields, either explicitly (using a for loop) or implicitly (using printf). We'll start with a for loop. This example joins the elements of an array with :: between elements, producing the joined string on stdout:
array=(1 2 3)
first=1
for element in "${array[@]}"; do
  if ((! first)); then printf "::"; fi
  printf "%s" "$element"
  first=0
done
echo
This example uses the implicit looping of printf to print all the script's arguments, with angle brackets around each one:
#!/bin/sh
printf "$# args:"
[ "$#" -eq 0 ] || printf " <%s>" "$@"
echo
The case where $# is 0 has to be treated specially as printf still goes through the format once if not passed any argument.
A named array can also be used in place of @ (e.g. "${array[@]}" expands to all the elements of array).
If we wanted to join the strings into another variable, instead of dumping them out, then we have a few choices:
A string can be built up a piece at a time using var="$var$newthing" (portable) or var+=$newthing (bash 3.1). For example,
output=$1; shift
while (($#)); do output+="::$1"; shift; done
If the joining can be done with a single printf command, it can be assigned to a variable using printf -v var FORMAT FIELDS... (bash 3.1). For example,
printf -v output "%s::" "$@"
output=${output%::}   # Strip extraneous delimiter from end of string.
If the joining requires multiple commands, and a piecemeal string build-up isn't desirable, CommandSubstitution can be used to assign a function's output: var=$(myjoinfunction). It can also be used with a chunk of commands:
var=$(
  command
  command
)
The disadvantage of command substitution is that it discards all trailing newlines. See the CommandSubstitution page for a workaround.
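To make the trailing-newline loss concrete, the sketch below compares a plain command substitution with one common workaround: print a sentinel character last and strip it afterwards. This may or may not be the exact workaround described on the CommandSubstitution page; the variable names are made up.

```bash
# Plain command substitution: both trailing newlines are lost.
stripped=$(printf 'a::b\n\n')

# Workaround sketch: emit a sentinel character last, then strip it off again.
kept=$(printf 'a::b\n\n'; printf x)
kept=${kept%x}

printf 'stripped has %d characters, kept has %d\n' "${#stripped}" "${#kept}"
# prints: stripped has 4 characters, kept has 6
```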
Upper/lower case conversion
In bash 4, we have some new parameter expansion features:
${var^} capitalizes the first letter of var
${var^[aeiou]} capitalizes the first letter of var if it is a vowel
${var^^} capitalizes all the letters in var
${var,} lower-cases the first letter of var
${var,[abc]} lower-cases the first letter of var if it is a, b or c
${var,,} lower-cases all the letters in var
These are more efficient alternatives to invoking tr.
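A few of these expansions in action, as a minimal sketch that assumes bash 4 or later (older versions report a bad substitution). The sample strings are arbitrary, and the tr pipeline is included only for comparison.

```bash
var='hello world'
printf '%s\n' "${var^}"           # prints: Hello world
printf '%s\n' "${var^^}"          # prints: HELLO WORLD
printf '%s\n' "${var^[aeiou]}"    # prints: hello world (h is not a vowel)

fruit='apple'
printf '%s\n' "${fruit^[aeiou]}"  # prints: Apple

shout='APPLE'
printf '%s\n' "${shout,}"         # prints: aPPLE
printf '%s\n' "${shout,,}"        # prints: apple

# The external-command equivalent of ${var^^}, which costs a fork and a pipe:
printf '%s\n' "$var" | tr '[:lower:]' '[:upper:]'   # prints: HELLO WORLD
```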
Default or alternate values
The oldest parameter expansion features of all (every Bourne-family shell has the basic form of these) involve the use or assignment of default values when a parameter is not set. These are fairly straightforward:
"${EDITOR-vi}" "$filename"
If the EDITOR variable isn't set, use vi instead. There's a variant of this:
"${EDITOR:-vi}" "$filename"
This one uses vi if the EDITOR variable is unset or empty. You may use a : in front of any of the operators in this section to treat empty variables the same as unset variables.
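The difference between the two forms only shows up when the variable is set but empty. The sketch below uses a throwaway variable named editor (hypothetical, chosen so that a real EDITOR setting is left alone) and prints both expansions in each of the three states.

```bash
unset -v editor
printf '<%s> <%s>\n' "${editor-vi}" "${editor:-vi}"   # prints: <vi> <vi>

editor=
printf '<%s> <%s>\n' "${editor-vi}" "${editor:-vi}"   # prints: <> <vi>

editor=nano
printf '<%s> <%s>\n' "${editor-vi}" "${editor:-vi}"   # prints: <nano> <nano>
```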
Previously, we mentioned a syntactic infelicity that required parentheses or whitespace to work around:
var='a bunch of junk089'
value=${var:(-3)}
If we were to use ${var:-3} here, it would be interpreted as "use 3 as the default if var is unset or empty", because that syntax has been in use longer than bash has existed. Hence the need for a workaround.
We can also assign a default value to a variable if it's not already set:
: "${PATH=/usr/bin:/bin}"
: "${PATH:=/usr/bin:/bin}"
In the first one, if PATH is set, nothing happens. If it's not set, then it is assigned the value /usr/bin:/bin. In the second one, the assignment also happens if PATH is set to an empty value. Since ${...} is an expression and not a command, it has to be used in a command. Traditionally, the : command (which does nothing, and is a builtin command even in the most ancient shells) is used for this purpose.
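The same two assignments can be tried safely on a scratch variable instead of PATH. In this sketch the variable logdir and the directory names are hypothetical.

```bash
unset -v logdir
: "${logdir=/var/tmp}"        # unset, so the default is assigned
printf '<%s>\n' "$logdir"     # prints: </var/tmp>

logdir=
: "${logdir=/tmp/other}"      # set (though empty), so nothing happens
printf '<%s>\n' "$logdir"     # prints: <>

: "${logdir:=/tmp/other}"     # the colon form also replaces an empty value
printf '<%s>\n' "$logdir"     # prints: </tmp/other>
```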
Finally, we have this expression:
${var+foo}
This one means use foo if the variable is set; otherwise, use nothing. It's an extremely primitive conditional check, and it has three main uses:
The expression ${1+"$@"} is used to work around broken behavior of "$@" in old or buggy shells when writing a WrapperScript.
A test such as if test "${var+defined}" can be used to determine whether a variable is set.
One may conditionally pass optional arguments like: cmd ${opt_x+-x "$opt_x"} ...
It's almost never used outside of those three contexts.
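Two of those uses can be sketched as follows. The function show_args is a hypothetical stand-in for a real command and simply reports the arguments it received; opt_x is the same illustrative variable used above. The example assumes the default IFS.

```bash
# Hypothetical stand-in for a real command: shows the arguments it received.
show_args() {
  printf '%d args:' "$#"
  [ "$#" -eq 0 ] || printf ' <%s>' "$@"
  echo
}

# Optional arguments: nothing at all is passed when opt_x is unset.
unset -v opt_x
show_args ${opt_x+-x "$opt_x"} file     # prints: 1 args: <file>

opt_x='some value'
show_args ${opt_x+-x "$opt_x"} file     # prints: 3 args: <-x> <some value> <file>

# Checking whether a variable is set, even if it is empty.
opt_x=
if test "${opt_x+defined}"; then
  echo 'opt_x is set'
fi
# prints: opt_x is set
```

The inner quotes around "$opt_x" keep a value containing spaces together as one argument, while the unquoted outer expansion lets -x and the value become separate words.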
Quick glance table:
${var-word} | Expands to the contents of var if var is set; otherwise, word. |
${var:-word} | Expands to the contents of var if var is set but not empty; otherwise, word. |
${var+word} | Expands to word if var is set; otherwise, nothing. |
${var:+word} | Expands to word if var is set but not empty; otherwise, nothing. |
${var=word} | Assigns word to var if var is unset; then expands to the contents of var. |
${var:=word} | Assigns word to var if var is unset or empty; then expands to the contents of var. |
${var?word} | Expands to the contents of var if var is set; otherwise, write word to stderr and exit the shell. |
${var:?word} | Expands to the contents of var if var is set but not empty; otherwise, write word to stderr and exit the shell. |
Nobody ever uses ${var?word} or ${var:?word}. Please pretend they don't exist, just like you pretend set -e and set -u don't exist.
See Also
Parameter expansion (terse version, with handy tables).