How do I do string manipulations in bash?
Bash can do string operations. LOTS of string operations. This is an introduction to bash string manipulations and related techniques. It overlaps with the Parameter Expansion question, but the information is presented in a more beginner-friendly manner (we hope).
Parameter expansion syntax
A parameter in bash is a term that covers both variables (storage places with names, that you can read and write by using their name) and special parameters (things you can only read from, not write to). For example, if we have a variable named fruit we can assign the value apple to it by writing:
fruit=apple
And we can read that value back by using a parameter expansion:
$fruit
Note, however, that $fruit is an expression -- a noun, not a verb -- and so normally we need to put it in some sort of command. Also, the results of an unquoted parameter expansion will be split into multiple words and expanded into filenames, which we generally don't want. So, we should always quote our parameter expansions unless we're dealing with a special case.
So, to see the value of a parameter (such as a variable):
echo "$fruit"
Or, we can use these expansions as part of a larger expression:
echo "I like to eat $fruit"
If we want to put an s on the end of our variable's content, we run into a dilemma:
echo "I like to eat $fruits"
This command tries to expand a variable named fruits, rather than a variable named fruit. We need to tell the shell that we have a variable name followed by a bunch of other letters that are not part of the variable name. We can do that like this:
echo "I like to eat ${fruit}s"
And while we're inside the curly braces, we also have the opportunity to manipulate the variable's content in various exciting and occasionally even useful ways, which we're about to describe.
It should be pointed out that these tricks only work on parameter expansions. You can't operate on a constant string (or a command substitution, etc.) using them, because the syntax requires a parameter name inside the curly braces. (You can, of course, stick your constant string or command substitution into a temporary variable and then use that.)
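Here is a minimal sketch of that temporary-variable workaround. The variable names (tmp, underscored, words, joined) are only illustrative, and printf stands in for whatever command you would really be capturing.

```bash
#!/usr/bin/env bash
# Something like ${"hello world"// /_} is rejected by bash: the braces need
# a parameter name. So put the constant in a temporary variable first.
tmp="hello world"
underscored=${tmp// /_}                # replace every space with an underscore
echo "$underscored"                    # hello_world

# Same idea for a command substitution: capture its output, then expand it.
words=$(printf '%s\n' one two three)   # $(...) strips the trailing newline
joined=${words//$'\n'/,}               # replace each remaining newline with a comma
echo "$joined"                         # one,two,three
```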
Length of a string
This one's easy, so we'll get it out of the way first.
echo "The string <$var> is ${#var} characters long."
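The length can also be stored and compared like any other number. This sketch uses a made-up sample string. Note that ${#var} counts characters according to the current locale, which is not always the same as counting bytes.

```bash
#!/usr/bin/env bash
var="hello, world"
len=${#var}                      # number of characters in $var
echo "The string <$var> is $len characters long."   # ... is 12 characters long.

empty=""
echo "${#empty}"                 # 0 for an empty variable
```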
Substituting part of a string
A common need is to replace some part of a string with something else. (Let's call the old and new parts "words" for now.) If we know what the old word is, and what the new word should be, but not necessarily where in the string it appears, then we can do this:
$ var="She favors the bold. That's cold."
$ echo "${var/old/new}"
She favors the bnew. That's cold.
That replaces just the first occurrence of the old word. If we want to replace all occurrences of the old word, we double up the first slash:
$ var="She favors the bold. That's cold."
$ echo "${var//old/new}"
She favors the bnew. That's cnew.
We may not know the exact word we want to replace. If we can express the kind of word we're looking for with a glob pattern, then we're still in good shape:
$ var="She favors the bold. That's cold."
$ echo "${var//b??d/mold}"
She favors the mold. That's cold.
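Bracket expressions and character classes are glob syntax too, so they also work in the pattern part. A small sketch with an invented sample string:

```bash
#!/usr/bin/env bash
var="Order 66 shipped on 2024-05-17"
masked=${var//[0-9]/#}              # every digit becomes '#'
echo "$masked"                      # Order ## shipped on ####-##-##
underscored=${var//[[:space:]]/_}   # every whitespace character becomes '_'
echo "$underscored"                 # Order_66_shipped_on_2024-05-17
```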
We can also anchor the word we're looking for to either the start or end of the string. In other words, we can tell bash that it should only perform the substitution if it finds the word at the start, or at the end, of the string, rather than somewhere in the middle.
$ var="She favors the bold. That's cold."
$ echo "${var/#bold/mold}"
She favors the bold. That's cold.
$ echo "${var/#She/He}"
He favors the bold. That's cold.
$ echo "${var/%cold/awful}"
She favors the bold. That's cold.
$ echo "${var/%cold?/awful}"
She favors the bold. That's awful
Note that nothing happened in the first command, because bold did not appear at the beginning of the string; and also in the third command, because cold did not appear at the end of the string. The # anchors the pattern (plain word or glob) to the beginning, and the % anchors it to the end. In the fourth command, the pattern cold? matches the word cold. (including the period) at the end of the string.
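The anchored forms are handy for filenames. This sketch (with made-up file names) swaps an extension only when it really is at the end of the string, and uses an empty pattern anchored with # to prepend text:

```bash
#!/usr/bin/env bash
file="report.txt"
renamed=${file/%.txt/.md}       # ".txt" is at the end, so it is replaced
echo "$renamed"                 # report.md

other="notes.txt.bak"
unchanged=${other/%.txt/.md}    # ".txt" is not at the end: no change
echo "$unchanged"               # notes.txt.bak

prefixed=${file/#/old_}         # empty pattern anchored at the start: prepends
echo "$prefixed"                # old_report.txt
```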
Removing part of a string
We can use the ${var/old/} or ${var//old/} syntax to replace a word with nothing if we want. That's one way to remove part of a string. But there are some other ways that come in handy more often than you might guess.
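As a small illustration of replacing with nothing, this sketch (sample data invented) deletes every non-digit using a negated bracket expression, and then shows the single-slash form removing only the first match:

```bash
#!/usr/bin/env bash
phone="(555) 123 4567"
digits=${phone//[!0-9]/}     # delete every character that is not a digit
echo "$digits"               # 5551234567
nospace=${phone/ /}          # single slash: delete only the first space
echo "$nospace"              # (555)123 4567
```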
The first involves removing something from the beginning of a string. Again, the part we're going to remove might be a string that we know in advance, or it might be something we have to describe with a glob pattern.
$ var="/usr/local/bin/tcpserver"
$ echo "${var##*/}"
tcpserver
The ## means "remove the longest possible matching string from the beginning of the variable's contents". The */ is the pattern that we want to match -- any number of characters ending with a (literal) forward slash. The result is essentially the same as the basename command.
If we only use one # then we remove the shortest possible matching string. This is less commonly needed, so I'm going to skip the example for now and give a really cool one later.
As you might have guessed, we can also remove a string from the end of our variable's contents. For example, to mimic the dirname command, we remove everything starting at the last slash:
$ var="/usr/local/bin/tcpserver"
$ echo "${var%/*}"
/usr/local/bin
The % means "remove the shortest possible match from the end of the variable's contents", and /* is a glob that begins with a literal slash character, followed by any number of characters. Since we require the shortest match, bash isn't allowed to match /bin/tcpserver or anything else that contains multiple slashes. It has to remove /tcpserver only.
Likewise, %% means "remove the longest possible match from the end of the variable's contents".
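To see the shortest/longest difference side by side, here is a sketch using an invented filename that contains several dots:

```bash
#!/usr/bin/env bash
file="backup.2024.tar.gz"
short_end=${file%.*}      # shortest match of ".*" at the end (".gz")
long_end=${file%%.*}      # longest match of ".*" at the end (".2024.tar.gz")
short_start=${file#*.}    # shortest match of "*." at the start ("backup.")
long_start=${file##*.}    # longest match of "*." at the start ("backup.2024.tar.")
printf '%s\n' "$short_end" "$long_end" "$short_start" "$long_start"
# backup.2024.tar
# backup
# 2024.tar.gz
# gz
```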
We can combine these operations to produce some interesting results, too. For example, we might know that our variable contains something in square brackets, somewhere, with an unknown amount of "garbage" on both sides. We can use this to extract the part we want:
$ var='garbage in [42] garbage out'
$ tmp=${var##*[}
$ echo "${tmp%%]*}"
42
Note that we used a temporary variable to hold the results of one parameter expansion, and then fed that result to the second one. We can't do two parameter expansions to the same variable at once (the syntax simply doesn't permit it).
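If the same trick is needed in several places, it can be wrapped in a function. The helper below is hypothetical (the name bracketed is ours). It adds a check so that a string without brackets is reported as a failure instead of being echoed back unchanged, which is what the bare expansions would do. It strips up to the first [ rather than the last, which gives the same answer when there is only one bracketed part:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the text between the first "[" and the next "]".
# Returns 1 (printing nothing) if the string has no "[...]" part.
bracketed() {
  local s=$1 tmp
  [[ $s == *\[*\]* ]] || return 1   # no bracketed part at all
  tmp=${s#*\[}                      # remove everything up to and including the first "["
  printf '%s\n' "${tmp%%\]*}"       # remove the first "]" and everything after it
}

bracketed 'garbage in [42] garbage out'              # 42
bracketed 'no brackets here' || echo "no brackets found"
```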
Now let's try something harder: what if we wanted a sort of double basename -- the last two parts of a pathname, instead of just the last part?
$ var=/home/someuser/projects/q/quark
$ tmp=${var%/*/*}
$ echo "${var#"$tmp"/}"   # quote $tmp so its contents match literally, not as a glob
q/quark
This is a bit trickier. Here's how it works:
- Look for the shortest possible string matching /*/* at the end of the pathname. In this case, it would match /q/quark.
- Remove that from the end of the original string. The result of this is the thing we don't want. We store this in tmp.
- Remove the thing we don't want (plus an extra /) from the original variable.
- We're left with the last two parts of the pathname.
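The same steps can be wrapped in a function. This is a sketch with a hypothetical name, last_two. It quotes $tmp inside the second expansion so that any glob characters in the directory names (such as [ or *) are matched literally:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last two components of a pathname.
last_two() {
  local path=$1 tmp
  tmp=${path%/*/*}                  # what we DON'T want: all but the last two parts
  printf '%s\n' "${path#"$tmp"/}"   # quoted "$tmp" matches literally, not as a glob
}

last_two /home/someuser/projects/q/quark   # q/quark
last_two '/tmp/x[1]/a/b'                   # a/b
```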
Other stuff goes here later
XXX
Original page content
Here is a list of some typical string manipulation functions/subroutines from other languages that you may already be familiar with:
strlen | returns the length of the string
leftstr | returns a string N chars long starting from the left hand side
rightstr | returns a string N chars long starting from the right hand side
midstr | returns a string N chars long starting from an offset K chars from the beginning/end
substr | substitutes all instances of a pattern with a new string
basename | returns the last component of a pathname (everything after the last "/")
dirname | returns everything in the pathname up to, but not including the last "/"
Two that you may not have heard of, but will probably want to use all the time when scripting on *NIX:
getext | returns a filename's extension (e.g. "txt", "mp3", "doc", "sxc", "html")
dropext | returns the filename with the extension stripped off the end of the name
This article will cover how to do all of these in Bash, and will introduce the more powerful actions available with Bash's parameter expansions (PEs). Please note there is already a BashFAQ about PEs: FAQ #73 covers a larger scope of PE capabilities, whereas this one focuses on string operations.
Filename manipulation
Let's say we have a bash variable named fullpath that contains /usr/home/JosephBaldwin/Its_only_Rock_and_Roll.mp3
Often in scripting we want to manipulate certain pieces of the path, like just the file name, which is the last component of the full path. So let's get just the filename from the full path. In *NIX we have the command basename, which does this very nicely for us: basename "$fullpath" returns "Its_only_Rock_and_Roll.mp3". In Bash we can do that much faster, without starting an external program, with this command: echo "${fullpath##*/}"
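For comparison, here is a sketch that runs both versions on the example path. They agree here, but they are not identical in every case: basename strips trailing slashes first, while ${var##*/} on a path ending in / yields an empty string.

```bash
#!/usr/bin/env bash
fullpath="/usr/home/JosephBaldwin/Its_only_Rock_and_Roll.mp3"
external=$(basename "$fullpath")  # starts a separate basename process
expanded=${fullpath##*/}          # done entirely inside bash
echo "$external"                  # Its_only_Rock_and_Roll.mp3
echo "$expanded"                  # Its_only_Rock_and_Roll.mp3

# One difference: a trailing slash.
dir="/usr/lib/"
echo "$(basename "$dir")"         # lib
echo "<${dir##*/}>"               # <>  (empty: nothing follows the last "/")
```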
"WHAT? What the heck is that? That's not a command! That's just a bunch of garbage someone made by whacking some of the stranger keys on the keyboard! I mean really! dollar curly pound pound star slash curly What IS that?"
Um, OK - uh, just calm down for a moment. I know it doesn't look like the typical programming language keyword or library call, but consider a language like Perl. See? To shoehorn new features into Bash, you have to find ways to do it without creating keywords (or anything else) that might cause older scripts to break, so all these string manipulation functions got placed inside the syntax from an old sh feature where there just happened to be room for them: "Parameter Expansion".
For a basename we use the PE expression: ${fullpath##*/} which returns "Its_only_Rock_and_Roll.mp3".
To find the dirname we use the PE expression: ${fullpath%/*} which produces "/usr/home/JosephBaldwin".
To drop the filename extension, we use the PE expression: ${fullpath%.*}, giving "/usr/home/JosephBaldwin/Its_only_Rock_and_Roll".
To get the filename's extension we use the PE expression: ${fullpath##*.} generating only "mp3".
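Those two expressions assume the name really has an extension. With no dot in the filename, ${fullpath##*.} returns the whole string, and ${fullpath%.*} can chop at a dot in a directory name instead. The functions below are a hypothetical sketch of more careful getext and dropext (the names come from the list above; they are not Bash builtins). They treat a leading dot, as in .bashrc, as part of the name rather than as an extension, which is a design choice you may want to change:

```bash
#!/usr/bin/env bash
# Hypothetical helpers named after getext/dropext above; not Bash builtins.
# A name counts as having an extension only if a dot follows at least one
# other character, so ".bashrc" and "README" have none.

getext() {
  local name=${1##*/}                  # look at the last path component only
  if [[ $name == ?*.* ]]; then
    printf '%s\n' "${name##*.}"        # everything after the last dot
  fi                                   # no extension: print nothing
}

dropext() {
  local path=$1 dir="" name=${1##*/}
  if [[ $path == */* ]]; then
    dir=${path%/*}/                    # keep the directory part untouched
  fi
  if [[ $name == ?*.* ]]; then
    name=${name%.*}                    # strip only the last extension
  fi
  printf '%s\n' "$dir$name"
}

getext  /usr/home/JosephBaldwin/Its_only_Rock_and_Roll.mp3   # mp3
dropext /etc/v1.2/README                                     # /etc/v1.2/README
dropext archive.tar.gz                                       # archive.tar
```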
To find the strlen, we use the PE expression ${#fullpath}, which gives 50.
To get a leftstr, the PE expression ${fullpath:0:20} grabs the first 20 chars of fullpath to make "/usr/home/JosephBald".
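To perform a rightstr in bash we use the following PE expression: ${fullpath:(-20)}, which gets the last 20 chars, "ly_Rock_and_Roll.mp3". The parentheses are needed because ${fullpath:-20} would instead be read as the "use a default value" expansion and return all of fullpath. There are a couple of other ways to write it that also work, such as ${fullpath: -20} with a space before the minus sign.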
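Here is a short sketch comparing the ways of writing a negative offset, plus the look-alike that does something else entirely:

```bash
#!/usr/bin/env bash
fullpath="/usr/home/JosephBaldwin/Its_only_Rock_and_Roll.mp3"
a=${fullpath:(-20)}             # parentheses: -20 is an arithmetic offset
b=${fullpath: -20}              # a space before the minus sign also works
c=${fullpath:${#fullpath}-20}   # or compute the offset from the length (50-20)
printf '%s\n' "$a" "$b" "$c"    # ly_Rock_and_Roll.mp3, three times

d=${fullpath:-20}               # ":-" is "use default value", not a substring!
echo "$d"                       # the whole path, since fullpath is set and non-empty
```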
To perform a midstr in bash we use the following PE expression: ${fullpath:10:20}, making "JosephBaldwin/Its_on" (offsets count from 0, so offset 10 is the 11th character).
To perform a substr in bash we use the following PE expression: ${fullpath//Rock/Roll}, rolling it into "/usr/home/JosephBaldwin/Its_only_Roll_and_Roll.mp3".
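To try these yourself, here is a sketch that computes each of the string values described above for the example path and prints them:

```bash
#!/usr/bin/env bash
fullpath="/usr/home/JosephBaldwin/Its_only_Rock_and_Roll.mp3"
len=${#fullpath}              # strlen
left=${fullpath:0:20}         # leftstr: 20 characters from offset 0
mid=${fullpath:10:20}         # midstr: 20 characters from offset 10 (the "J")
sub=${fullpath//Rock/Roll}    # substr: every "Rock" becomes "Roll"
printf '%s\n' "$len" "$left" "$mid" "$sub"
# 50
# /usr/home/JosephBald
# JosephBaldwin/Its_on
# /usr/home/JosephBaldwin/Its_only_Roll_and_Roll.mp3
```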
Why aren't the PE things named more nicely
Can't I just have these as regular functions with nice names?
Not totally generalizable.
What can I do with PEs that I couldn't do with the string functions above?
Todo: offset code examples in code boxes as done in the rest of the wiki.