Differences between revisions 23 and 39 (spanning 16 versions)
Revision 23 as of 2010-09-01 01:27:47
Size: 18785
Editor: GreyCat
Comment: finish rewrite
Revision 39 as of 2016-03-23 06:21:03
Size: 243
Editor: MirtaFrods
Comment:
<<Anchor(faq100)>>
== How do I do string manipulations in bash? ==
Bash can do string operations. LOTS of string operations. This is an introduction to bash string manipulations and related techniques. It overlaps with the [[BashFAQ/073|Parameter Expansion]] question, but the information here is presented in a more beginner-friendly manner (we hope).

=== Parameter expansion syntax ===

A ''parameter'' in bash is a term that covers both variables (storage places with names, that you can read and write by using their name) and ''special parameters'' (things you can only read from, not write to). For example, if we have a variable named `fruit` we can assign the value `apple` to it by writing:

{{{
fruit=apple
}}}

And we can read that value back by using a ''parameter expansion'':

{{{
$fruit
}}}

Note, however, that `$fruit` is an ''expression'' -- a noun, not a verb -- and so normally we need to put it in some sort of command. Also, the results of an unquoted parameter expansion will be [[WordSplitting|split into multiple words]] and [[glob|expanded into filenames]], which we generally don't want. So, we should always [[Quotes|quote]] our parameter expansions unless we're dealing with a special case.

So, to see the value of a parameter (such as a variable):

{{{
echo "$fruit"

# more generally, printf "%s\n" "$fruit"
# but we'll keep it simple for now
}}}
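
To see why the quotes matter, here is a small bash sketch. The value `golden   delicious` is just an illustration. It uses `printf` to wrap each resulting word in angle brackets, so the difference between unquoted and quoted expansion is visible:

{{{
fruit='golden   delicious'

# Unquoted: the value is split on whitespace into separate words
# (and each word would also be subject to filename expansion).
unquoted=$(printf '<%s>' $fruit)

# Quoted: the value is passed along as one word, spaces intact.
quoted=$(printf '<%s>' "$fruit")

echo "unquoted: $unquoted"   # unquoted: <golden><delicious>
echo "quoted:   $quoted"     # quoted:   <golden   delicious>
}}}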

Or, we can use these expansions as part of a larger expression:

{{{
echo "I like to eat $fruit"
}}}

If we want to put an `s` on the end of our variable's content, we run into a dilemma:

{{{
echo "I like to eat $fruits"
}}}

This command tries to expand a variable named `fruits`, rather than a variable named `fruit`. We need to tell the shell that we have a variable name followed by a bunch of other letters that are ''not'' part of the variable name. We can do that like this:

{{{
echo "I like to eat ${fruit}s"
}}}

And while we're inside the curly braces, we also have the opportunity to manipulate the variable's content in various exciting and occasionally even useful ways, which we're about to describe.

It should be pointed out that these tricks only work on ''parameter expansions''. You can't operate on a constant string (or a command substitution, etc.) using them, because the syntax requires a parameter name inside the curly braces. (You can, of course, stick your constant string or command substitution into a temporary variable and then use that.)
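
A minimal sketch of the temporary-variable workaround. The `printf` command substitution here stands in for whatever command's output you actually need to trim:

{{{
# Something like ${"report-2024.txt"%.txt} is not valid syntax: there is
# no parameter name. So store the string in a variable first.
tmp=$(printf 'report-%s.txt' 2024)
base=${tmp%.txt}
echo "$base"    # report-2024
}}}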

=== Length of a string ===

This one's easy, so we'll get it out of the way first.

{{{
echo "The string <$var> is ${#var} characters long."
}}}
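
With a concrete value (chosen just for illustration), including the edge case of an empty string:

{{{
var='hello world'
len=${#var}
echo "The string <$var> is $len characters long."   # ... is 11 characters long.

empty=''
echo "${#empty}"                                    # 0
}}}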

=== Substituting part of a string ===

A common need is to replace some part of a string with something else. (Let's call the old and new parts "words" for now.) If we know what the old word is, and what the new word should be, but not necessarily ''where'' in the string it appears, then we can do this:

{{{
$ var="She favors the bold. That's cold."
$ echo "${var/old/new}"
She favors the bnew. That's cold.
}}}

That replaces just the first occurrence of the word `old`. If we want to replace ''all'' occurrences of the word, we double up the first slash:

{{{
$ var="She favors the bold. That's cold."
$ echo "${var//old/new}"
She favors the bnew. That's cnew.
}}}

We may not know the ''exact'' word we want to replace. If we can express the kind of word we're looking for with a [[glob]] pattern, then we're still in good shape:

{{{
$ var="She favors the bold. That's cold."
$ echo "${var//b??d/mold}"
She favors the mold. That's cold.
}}}

We can also ''anchor'' the word we're looking for to either the start or end of the string. In other words, we can tell bash that it should only perform the substitution if it finds the word at the start, or at the end, of the string, rather than somewhere in the middle.

{{{
$ var="She favors the bold. That's cold."
$ echo "${var/#bold/mold}"
She favors the bold. That's cold.
$ echo "${var/#She/He}"
He favors the bold. That's cold.
$ echo "${var/%cold/awful}"
She favors the bold. That's cold.
$ echo "${var/%cold?/awful}"
She favors the bold. That's awful
}}}

Nothing happened in the first command, because `bold` does not appear at the beginning of the string. Likewise, nothing happened in the third command, because `cold` does not appear at the very end of the string (a period follows it). The `#` anchors the pattern (plain word or glob) to the beginning, and the `%` anchors it to the end. In the fourth command, the pattern `cold?` matches `cold.` (including the period) at the end of the string.

=== Removing part of a string ===

We can use the `${var/old/}` or `${var//old/}` syntax to replace a word with ''nothing'' if we want. That's one way to remove part of a string. But there are some other ways that come in handy more often than you might guess.

The first involves removing something from the ''beginning'' of a string. Again, the part we're going to remove might be a constant string that we know in advance, or it might be something we have to describe with a glob pattern.

{{{
$ var="/usr/local/bin/tcpserver"
$ echo "${var##*/}"
tcpserver
}}}

The `##` means "remove the largest possible matching string from the beginning of the variable's contents". The `*/` is the pattern that we want to match -- any number of characters ending with a (literal) forward slash. The result is essentially the same as the `basename` command.

If we only use one `#` then we remove the ''shortest'' possible matching string. This is less commonly needed, so we'll skip the example for now and give a really cool one later.

As you might have guessed, we can also remove a string from the ''end'' of our variable's contents. For example, to mimic the `dirname` command, we remove everything starting at the ''last'' slash:

{{{
$ var="/usr/local/bin/tcpserver"
$ echo "${var%/*}"
/usr/local/bin
}}}

The `%` means "remove the shortest possible match from the end of the variable's contents", and `/*` is a glob that begins with a literal slash character, followed by any number of characters. Since we require the ''shortest'' match, bash isn't allowed to match `/bin/tcpserver` or anything else that contains multiple slashes. It has to remove `/tcpserver` only.

Likewise, `%%` means "remove the longest possible match from the end of the variable's contents".
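
A filename with two "extensions" (a made-up example) shows the difference between the shortest and longest match from the end:

{{{
var='archive.tar.gz'
shortest=${var%.*}    # removes ".gz"      -> archive.tar
longest=${var%%.*}    # removes ".tar.gz"  -> archive
echo "$shortest"
echo "$longest"
}}}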

We can combine these operations to produce some interesting results, too. For example, we might know that our variable contains something in square brackets, somewhere, with an unknown amount of "garbage" on both sides. We can use this to extract the part we want:

{{{
$ var='garbage in [42] garbage out'
$ tmp=${var##*[}
$ echo "${tmp%%]*}"
42
}}}

Note that we used a temporary variable to hold the results of one parameter expansion, and then fed that result to the second one. We can't do two parameter expansions to the same variable at once (the syntax simply doesn't permit it).

Now let's try something harder: what if we wanted a sort of ''double basename'' -- the last ''two'' parts of a pathname, instead of just the last part?

{{{
$ var=/home/someuser/projects/q/quark
$ tmp=${var%/*/*}
$ echo "${var#$tmp/}"
q/quark
}}}

This is a bit trickier. Here's how it works:

 * Look for the shortest possible string matching `/*/*` at the ''end'' of the pathname. In this case, it would match `/q/quark`.
 * Remove that from the ''end'' of the original string. The result of this is the thing we ''don't'' want. We store this in `tmp`.
 * Remove the thing we don't want (plus an extra `/`) from the original variable.
 * We're left with the last two parts of the pathname.

It's also worth pointing out that, as we just demonstrated, the pattern to be removed (after `#` or `%` or `##` or `%%`) doesn't have to be a constant -- it can be another substitution. This isn't the most common case in real life, but it's sometimes handy.
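
One caveat when the pattern comes from another expansion: if that expansion is unquoted, any glob characters in its value (`*`, `?`, `[`) act as pattern characters. Quoting it inside the braces makes bash match the value literally. A small sketch with made-up values:

{{{
prefix='*'
var='*important'

# Unquoted: the pattern is the glob *, whose shortest match at the start
# is the empty string, so nothing is removed.
loose=${var#$prefix}

# Quoted: the pattern is a literal asterisk, which is removed.
strict=${var#"$prefix"}

echo "$loose"    # *important
echo "$strict"   # important
}}}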

=== Extracting parts of strings ===

We've already seen one way to extract (keep) the good part of a string, and discard the junk parts, by combining a `##` and a `%%`. That technique works any time we have a pair of known delimiters (special markers) to tell us where the good part is.

Sometimes, however, we don't have useful delimiters. If we know that the good part resides in a certain set of ''columns'', we can extract it that way. We can use range notation to extract a substring by specifying starting position and length:

{{{
var='CONFIG  .SYS'
left=${var:0:8}
right=${var:(-3)}
}}}

Here, the input is an MS-DOS "8.3" filename, space-padded to its full length. If for some reason we need to separate it into its two parts, we have several possible ways to go about it. We could split the name into ''fields'' at the dot (we'll show that approach later). Or we could use `${var#*.}` to get the "extension" (the part after the dot) and `${var%.*}` to get the left-hand part. Or we could count the columns, as we showed here.

In the `${var:0:8}` example, the `0` is the starting position (0 is the first column) and `8` is the length of the piece we want. If we omit the length, or if the length is greater than the rest of the string, then we get the rest of the string as output. In the `${var:(-3)}` example, we omitted the length. We specified a starting position of `-3` (negative three), which means ''three from the end''. We have to use parentheses or a space between the `:` and the negative number to avoid a syntactic inconvenience (we'll discuss that later). We could also have used `${var:9}` to get the rest of the string starting at column number 9 (which is the ''tenth'' column) in this case, since we know the length is constant; but in many cases, we might not know the length in advance, and specifying a negative starting position lets us avoid some unnecessary work.
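
Here are the offset and length forms side by side on an arbitrary sample string:

{{{
var='Hello, world'
first5=${var:0:5}    # Hello  (start at 0, take 5)
from7=${var:7}       # world  (start at 7, take the rest)
last5=${var: -5}     # world  (the space stops ":-" being read as a default)
mid3=${var:(-5):3}   # wor    (start 5 from the end, take 3)
printf '%s\n' "$first5" "$from7" "$last5" "$mid3"
}}}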

Column-counting is an even stronger technique when there is no delimiter ''at all'' between the pieces we want:

{{{
var='CONFIG  SYS'
left=${var:0:8}
right=${var:8}
}}}

We can't use `${var#*.}` or similar techniques here!

=== Splitting a string into fields ===

Sometimes your input might naturally consist of various ''fields'' with some sort of delimiter between them. In these cases, a natural approach to handling the input is to divide it into its component fields, so that each one can be handled on its own.

If the delimiter is a single character (or one character of a set -- so long as it's never ''more than one'') then bash offers several viable approaches. The first is to read the input directly into an [[BashFAQ/005|array]]:

{{{
var=192.168.1.3
IFS=. read -r -a octets <<< "$var"
}}}

We're no longer in the realm of ''parameter expansion'' here at all. We've combined several features at once:

 * The [[IFS]] variable tells the `read` command what field delimiters to use. In this case, we only want to use the dot. If we had specified more than one character, then ''any one'' of those characters would have qualified as a delimiter.
 * The notation `var=value command` means we set the variable only for the duration of this single command. The `IFS` variable goes back to whatever it was before, once `read` is finished.
 * `read` puts its results into an array named `octets`.
 * `<<< "$var"` means we use the contents of `var` as ''standard input'' to the `read` command.

After this command, the result is an array named `octets` whose first element (element 0) is `192`, and whose second element (element 1) is `168`, and so on. If we want a fixed set of variables instead of an array, we can do that as well:

{{{
IFS=, read -r lastname firstname <<< "$name"
}}}
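
A sketch with a sample value. It also shows what happens when there are more fields than variable names: the last variable receives the rest of the line, delimiters included.

{{{
name='Smith,John'
IFS=, read -r lastname firstname <<< "$name"
echo "$firstname $lastname"    # John Smith

# More fields than names: the last name gets everything that is left.
IFS=, read -r first rest <<< 'a,b,c'
echo "$first | $rest"          # a | b,c
}}}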

We can also "skip" fields we don't want by assigning them to a variable we don't care about, such as `x` or `junk`, or to `_`, which bash overwrites after every command:

{{{
while IFS=: read -r user _ uid gid _ home shell; do
 ...
done < /etc/passwd
}}}
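
To see the field skipping without touching the real file, here is the same `read` applied to a single hypothetical line in `/etc/passwd` format:

{{{
# A made-up passwd-style line: name:password:uid:gid:gecos:home:shell
line='alice:x:1000:1000:Alice Example:/home/alice:/bin/bash'

# The password and gecos fields land in _ and are ignored.
IFS=: read -r user _ uid gid _ home shell <<< "$line"
echo "$user: uid=$uid gid=$gid home=$home shell=$shell"
}}}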

Another approach to the same sort of problem involves the intentional use of WordSplitting to retrieve fields one at a time. This is not any more powerful than the array approach we just saw, but it does have two advantages:

 * It works in `sh` as well as bash.
 * It's a bit simpler.

{{{
var=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
found=no
set -f
IFS=:
for dir in $var
do
  if test -x "$dir"/foo; then found=yes; fi
done
set +f; unset IFS
}}}

This example is similar to one in [[BashFAQ/081|FAQ 81]]. Bash offers better ways to determine whether a command exists in your `PATH`, but this illustrates the concept quite clearly. Points of note:

 * `set -f` disables [[glob]] expansion. You should always disable globs when using unquoted parameter expansion, ''unless'' you specifically want to allow globs in the parameter's contents.
 * We use `set +f` and `unset IFS` at the end of the code to return the shell to a ''default'' state. However, this is not necessarily the state the shell was in when the code started. Returning the shell to its previous (possibly non-default) state is more trouble than it's worth in most cases, so we won't discuss it in depth here.
 * Again, [[IFS]] contains a list of field delimiters. We want to split our parameter at each colon.

If your field delimiter is a multi-character string, then unfortunately bash does not offer any simple ways to deal with that. Your best bet is to handle the task in awk instead.

{{{
$ cat inputfile
apple::0.75::21
banana::0.50::43
cherry::0.15::107
date::0.30::20
$ awk -F '::' '{print $1 " qty " $3 " @" $2 " = " $2*$3; total+=$2*$3} END {print "Total: " total}' inputfile
apple qty 21 @0.75 = 15.75
banana qty 43 @0.50 = 21.5
cherry qty 107 @0.15 = 16.05
date qty 20 @0.30 = 6
Total: 59.3
}}}

awk's `-F` allows us to specify a field delimiter of any length. awk also allows [[BashFAQ/022|floating point arithmetic]], associative arrays, and a wide variety of other features that many shells lack.

=== Joining fields together ===

The simplest way to concatenate values is to use them together, with nothing in between:

{{{
echo "$foo$bar"
}}}

If we have an array instead of a fixed set of variables, then we can print the array with a single character (or nothing) between fields using [[IFS]]:

{{{
$ array=(1 2 3)
$ (IFS=/; echo "${array[*]}")
1/2/3
}}}

Notable points here:

 * We can't use `IFS=/ echo ...` because of [[BashFAQ/104|how the parser works]].
 * Therefore, we have to set `IFS` first, in a separate command. Done in the current shell, that assignment would persist for the rest of the script. Since we don't want that, and because we aren't assigning to any variables that we need to keep, we use an explicit SubShell (using parentheses) to set up an environment where the change to `IFS` is not persistent.
 * If `IFS` is not set, we get a space between elements. If it's set to the empty string, there is nothing between elements.
 * The delimiter is not printed after the final element.
 * If we wanted more than one character between fields, we would have to use a different approach.

A more general approach to "joining" an array involves iterating through the fields, either explicitly (using a `for` loop) or implicitly (using `printf`). We'll start with the `for` loop:

{{{
array=(1 2 3)
first=1
for element in "${array[@]}"; do
  if ((! first)); then printf "::"; fi
  printf "%s" "$element"
  first=0
done
echo
}}}

And using `printf`:

{{{#!highlight text numbers=disable
#!/bin/sh
printf "$# args:"
printf " <%s>" "$@"
echo
}}}

This script dumps the positional parameters, which act like an array. A named array can be used in place of `@` (e.g. `"${array[@]}"` expands to all the elements of `array`).
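
The same dump with a named array (sample contents) looks like this. `${#array[@]}` gives the element count, which is the named-array counterpart of `$#`. Here the count is passed to `printf` as an argument rather than placed in the format string, which is the safer habit in general.

{{{
array=('first item' 'second' 'third')
count=$(printf '%s args:' "${#array[@]}")
dump=$(printf ' <%s>' "${array[@]}")   # the format is reused for each element
echo "$count$dump"                     # 3 args: <first item> <second> <third>
}}}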

If we wanted to join the strings into another variable, instead of dumping them out, then we have a few choices:

 * A string can be built up a piece at a time using `var="$var$newthing"` (portable) or `var+=$newthing` (bash 3.1 and later).
 * If the joining can be done with a single `printf` command, it can be assigned to a variable using `printf -v var FORMAT FIELDS...` (bash 3.1 and later).
 * If the joining requires multiple commands, and a piecemeal string build-up isn't desirable, CommandSubstitution can be used to assign a function's output: `var=$(myjoinfunction)`. It can also be used with a chunk of commands:

 {{{
var=$(
  command
  command
)
 }}}

 . The disadvantage of command substitution is that it discards all trailing newlines. See the CommandSubstitution page for a workaround.
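
A short sketch of the first two choices, joining a sample array into a variable. With `printf -v`, each element gets a trailing separator and the last one is stripped with `%`; with `+=`, each element gets a leading separator and the first one is stripped with `#`.

{{{
array=(red green blue)

# printf -v: apply '%s,' to every element, store the result, then drop
# the one trailing comma.
printf -v joined '%s,' "${array[@]}"
joined=${joined%,}
echo "$joined"    # red,green,blue

# +=: put the separator in front of every element, then remove the
# leading one.
built=''
for element in "${array[@]}"; do
  built+="::$element"
done
built=${built#::}
echo "$built"     # red::green::blue
}}}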

=== Default or alternate values ===

The oldest parameter expansion features of all (''every'' Bourne-family shell has the basic form of these) involve the use or assignment of ''default values'' when a parameter is not set. These are fairly straightforward:

{{{
"${EDITOR-vi}" "$filename"
}}}

If the `EDITOR` variable isn't set, use `vi` instead. There's a variant of this:

{{{
"${EDITOR:-vi}" "$filename"
}}}

This one uses `vi` if the `EDITOR` variable is unset ''or empty''. Previously, we mentioned a syntactic infelicity that required parentheses or whitespace to work around:

{{{
var='a bunch of junk089'
value=${var:(-3)}
}}}

If we were to use `${var:-3}` here, it would be interpreted as ''use 3 as the default if var is unset or empty'', because that default-value syntax has been in use longer than bash has existed. Hence the need for a workaround.
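
To see the difference between `-` and `:-` directly, here is a sketch that tries both on an unset, an empty, and a non-empty variable (the names and values are illustrative):

{{{
unset v
r1="${v-default}|${v:-default}"   # unset:     default|default
v=''
r2="${v-default}|${v:-default}"   # empty:     |default
v='vim'
r3="${v-default}|${v:-default}"   # non-empty: vim|vim
printf '%s\n' "$r1" "$r2" "$r3"
}}}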

We can also ''assign'' a default value to a variable if it's not already set:

{{{
: "${PATH=/usr/bin:/bin}"
: "${PATH:=/usr/bin:/bin}"
}}}

In the first one, if `PATH` is set, nothing happens. If it's not set, then it is assigned the value `/usr/bin:/bin`. In the second one, the assignment also happens if `PATH` is set to an empty value. Since `${...}` is an ''expression'' and not a command, it has to be used in a command. Traditionally, the `:` command (which does nothing, and is a builtin command even in the most ancient shells) is used for this purpose.
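
A sketch of the assignment form with a hypothetical variable `name`. The expansion is quoted so that the value is not word-split or glob-expanded on its way to `:`.

{{{
unset name
: "${name:=guest}"   # name is unset, so it is assigned "guest"
echo "$name"         # guest
: "${name:=admin}"   # name is already non-empty, so nothing changes
echo "$name"         # guest
}}}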

Finally, we have this expression:

{{{
${var+foo}
}}}

This one means ''use foo if the variable is set; otherwise, use nothing''. It's an extremely primitive conditional check, and it has two main uses:

 * The expression `${1+"$@"}` is used to work around broken behavior of `"$@"` in old or buggy shells when writing a WrapperScript.
 * A test such as `if test "${var+defined}"` can be used to determine [[BashFAQ/083|whether a variable is set]].

It's almost never used outside of these two contexts.
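
A sketch of the second use, distinguishing "unset" from "set but empty" (which `-z "$var"` alone cannot do):

{{{
unset var
if test "${var+defined}"; then s1=set; else s1=unset; fi   # var is unset
var=''
if test "${var+defined}"; then s2=set; else s2=unset; fi   # var is set, though empty
echo "$s1 $s2"    # unset set
}}}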

=== See Also ===

[[BashFAQ/073|Parameter expansion]] (terse version, with handy tables).

----
CategoryShell

BashFAQ/100 (last edited 2023-06-26 10:03:19 by StephaneChazelas)