How do I do string manipulations in bash?

Bash can do string operations. LOTS of string operations. This is an introduction to bash string manipulations and related techniques. It overlaps with the Parameter Expansion question, but the information is presented in a more beginner-friendly manner (we hope).

Parameter expansion syntax

A parameter in bash is a term that covers both variables (storage places with names, that you can read and write by using their name) and special parameters (things you can only read from, not write to). For example, if we have a variable named fruit we can assign the value apple to it by writing:

fruit=apple

And we can read that value back by using a parameter expansion:

$fruit

Note, however, that $fruit is an expression -- a noun, not a verb -- and so normally we need to put it in some sort of command. Also, the results of an unquoted parameter expansion will be split into multiple words and expanded into filenames, which we generally don't want. So, we should always quote our parameter expansions unless we're dealing with a special case.
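
To make the word-splitting point concrete, here is a minimal sketch. It assumes the default IFS, and the value "red   apple" is made up for illustration. printf repeats its format once per argument, so the angle brackets show how many words the shell actually passed.

```shell
#!/usr/bin/env bash
# Hypothetical value containing a run of spaces.
fruit="red   apple"

# Unquoted: the shell splits the value into two words before printf sees it.
unquoted=$(printf '<%s>' $fruit)

# Quoted: the value arrives as a single argument with its spaces intact.
quoted=$(printf '<%s>' "$fruit")

echo "unquoted: $unquoted"   # unquoted: <red><apple>
echo "quoted:   $quoted"     # quoted:   <red   apple>
```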

So, to see the value of a parameter (such as a variable):

echo "$fruit"

Or, we can use these expansions as part of a larger expression:

echo "I like to eat $fruit"

If we want to put an s on the end of our variable's content, we run into a dilemma:

echo "I like to eat $fruits"

This command tries to expand a variable named fruits, rather than a variable named fruit. We need to tell the shell that we have a variable name followed by a bunch of other letters that are not part of the variable name. We can do that like this:

echo "I like to eat ${fruit}s"

And while we're inside the curly braces, we also have the opportunity to manipulate the variable's content in various exciting and occasionally even useful ways, which we're about to describe.

It should be pointed out that these tricks only work on parameter expansions. You can't operate on a constant string (or a command substitution, etc.) using them, because the syntax requires a parameter name inside the curly braces. (You can, of course, stick your constant string or command substitution into a temporary variable and then use that.)
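
As a small sketch of that temporary-variable workaround: the printf below is only a stand-in for whatever command produces the string, and the names output and stem are chosen for illustration.

```shell
#!/usr/bin/env bash
# A command substitution cannot go inside ${...} directly, so capture it first.
# printf here is a placeholder for any command that produces output.
output=$(printf '%s' "report-2024.csv")

# Now the usual expansions work on the temporary variable.
stem=${output%.csv}

echo "$stem"   # report-2024
```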

Length of a string

This one's easy, so we'll get it out of the way first.

echo "The string <$var> is ${#var} characters long."

Substituting part of a string

A common need is to replace some part of a string with something else. (Let's call the old and new parts "words" for now.) If we know what the old word is, and what the new word should be, but not necessarily where in the string it appears, then we can do this:

$ var="She favors the bold.  That's cold."
$ echo "${var/old/new}"
She favors the bnew.  That's cold.

That replaces just the first occurrence of the old word. If we want to replace all occurrences of the old word, we double up the first slash:

$ var="She favors the bold.  That's cold."
$ echo "${var//old/new}"
She favors the bnew.  That's cnew.

We may not know the exact word we want to replace. If we can express the kind of word we're looking for with a glob pattern, then we're still in good shape:

$ var="She favors the bold.  That's cold."
$ echo "${var//b??d/mold}"
She favors the mold.  That's cold.

We can also anchor the word we're looking for to either the start or end of the string. In other words, we can tell bash that it should only perform the substitution if it finds the word at the start, or at the end, of the string, rather than somewhere in the middle.

$ var="She favors the bold.  That's cold."
$ echo "${var/#bold/mold}"
She favors the bold.  That's cold.
$ echo "${var/#She/He}"
He favors the bold.  That's cold.
$ echo "${var/%cold/awful}"
She favors the bold.  That's cold.
$ echo "${var/%cold?/awful}"
She favors the bold.  That's awful

Note that nothing happened in the first command, because bold did not appear at the beginning of the string; and also in the third command, because cold did not appear at the end of the string. The # anchors the pattern (plain word or glob) to the beginning, and the % anchors it to the end. In the fourth command, the pattern cold? matches the word cold. (including the period) at the end of the string.
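
One place where the anchors earn their keep is renaming, where the same text may appear in the middle of a name as well as at one end. The following is a sketch with a made-up filename, not a recipe for any particular tool:

```shell
#!/usr/bin/env bash
# Hypothetical filename in which "img" appears at the start and in the middle.
file="img_of_img.jpeg"

# /# only matches at the beginning, so the inner "img" is left alone.
renamed_prefix=${file/#img/photo}

# /% only matches at the very end of the string.
renamed_suffix=${file/%.jpeg/.jpg}

echo "$renamed_prefix"   # photo_of_img.jpeg
echo "$renamed_suffix"   # img_of_img.jpg
```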

Removing part of a string

We can use the ${var/old/} or ${var//old/} syntax to replace a word with nothing if we want. That's one way to remove part of a string. But there are some other ways that come in handy more often than you might guess.

The first involves removing something from the beginning of a string. Again, the part we're going to remove might be a string that we know in advance, or it might be something we have to describe with a glob pattern.

$ var="/usr/local/bin/tcpserver"
$ echo "${var##*/}"
tcpserver

The ## means "remove the largest possible matching string from the beginning of the variable's contents". The */ is the pattern that we want to match -- any number of characters ending with a (literal) forward slash. The result is essentially the same as the basename command.

If we only use one # then we remove the shortest possible matching string. This is less commonly needed, so I'm going to skip the example for now and give a really cool one later.

As you might have guessed, we can also remove a string from the end of our variable's contents. For example, to mimic the dirname command, we remove everything starting at the last slash:

$ var="/usr/local/bin/tcpserver"
$ echo "${var%/*}"
/usr/local/bin

The % means "remove the shortest possible match from the end of the variable's contents", and /* is a glob that begins with a literal slash character, followed by any number of characters. Since we require the shortest match, bash isn't allowed to match /bin/tcpserver or anything else that contains multiple slashes. It has to remove /tcpserver only.

Likewise, %% means "remove the longest possible match from the end of the variable's contents".
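
To see all four operators side by side, it helps to pick a string where the shortest and longest matches differ. The filename below is hypothetical and was chosen because it contains two dots.

```shell
#!/usr/bin/env bash
# Hypothetical filename with two dots, so short and long matches differ.
file="archive.tar.gz"

short_front=${file#*.}    # shortest match of "*." removed from the front
long_front=${file##*.}    # longest match of "*." removed from the front
short_back=${file%.*}     # shortest match of ".*" removed from the end
long_back=${file%%.*}     # longest match of ".*" removed from the end

echo "$short_front"   # tar.gz
echo "$long_front"    # gz
echo "$short_back"    # archive.tar
echo "$long_back"     # archive
```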

We can combine these operations to produce some interesting results, too. For example, we might know that our variable contains something in square brackets, somewhere, with an unknown amount of "garbage" on both sides. We can use this to extract the part we want:

$ var='garbage in [42] garbage out'
$ tmp=${var##*[}
$ echo "${tmp%%]*}"
42

Note that we used a temporary variable to hold the results of one parameter expansion, and then fed that result to the second one. We can't do two parameter expansions to the same variable at once (the syntax simply doesn't permit it).
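
If we need this more than once, the two steps can be wrapped in a function. This is only a sketch: between_brackets is a name invented here, and the brackets are backslash-escaped simply to make it obvious that they are literal characters. Note that if the input contains no brackets at all, neither pattern matches and the input comes back unchanged.

```shell
#!/usr/bin/env bash
# between_brackets is a hypothetical helper name, not a bash builtin.
# It prints whatever sits between the last "[" and the first "]" after it.
between_brackets() {
  local tmp
  tmp=${1##*\[}                 # drop everything through the last "["
  printf '%s\n' "${tmp%%\]*}"   # drop everything from the first "]" onward
}

result=$(between_brackets 'garbage in [42] garbage out')
echo "$result"   # 42
```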

Now let's try something harder: what if we wanted a sort of double basename -- the last two parts of a pathname, instead of just the last part?

$ var=/home/someuser/projects/q/quark
$ tmp=${var%/*/*}
$ echo "${var#$tmp/}"
q/quark

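This is a bit trickier. Here's how it works: the first expansion removes the shortest match of /*/* from the end of the string, which is the last two components (/q/quark), so tmp holds /home/someuser/projects. The second expansion then removes that prefix, plus the slash that follows it, from the beginning of the original value, leaving q/quark.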
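
The same double basename can be written out one step at a time, as a sketch with the intermediate value shown. The only change from the commands above is that $tmp is quoted inside the second expansion, so that any glob characters in the directory names would be taken literally rather than as a pattern. The variable name last_two is made up for illustration.

```shell
#!/usr/bin/env bash
var=/home/someuser/projects/q/quark

# Step 1: strip the last two components from the end.
tmp=${var%/*/*}
echo "$tmp"        # /home/someuser/projects

# Step 2: strip that prefix and the following slash from the front.
# Quoting "$tmp" makes the shell treat its contents literally, not as a glob.
last_two=${var#"$tmp"/}
echo "$last_two"   # q/quark
```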

Original page content

Here is a list of some typical string manipulation functions/subroutines from other languages that you may already be familiar with:

strlen

returns the length of the string

leftstr

returns a string N chars long starting from the left hand side

rightstr

returns a string N chars long starting from the right hand side

midstr

returns a string N chars long starting from an offset K chars from the beginning/end

substr

substitutes all instances of a pattern with a new string

basename

returns the last component of a pathname (everything after the last "/")

dirname

returns everything in the pathname up to, but not including the last "/"

Two that you may not have heard of but would want to use all the time when scripting on *NIX:

getext

returns a filename's extension (eg. "txt", "mp3", "doc", "sxc", "html", etc. ... )

dropext

returns the filename with the extension stripped off the end of the name.

This article will cover how to do all of these using Bash and will introduce the more powerful actions available with Bash's parameter expansions (PEs). Please note there is a BashFAQ about PEs already. FAQ #73 covers a larger scope of PE capabilities, whereas this one focuses on string operations.

Filename manipulation

Let's say we have a bash variable named fullpath that contains /usr/home/JosephBaldwin/Its_only_Rock_and_Roll.mp3

Often in scripting we want to manipulate certain pieces of the path, like just the file name, which is the last component of the full path. So let's get just the filename from the full path. In *NIX we have the command basename, which does this very nicely for us: basename "$fullpath" returns "Its_only_Rock_and_Roll.mp3". In Bash we can do that much faster, because no external program has to be started, with this command: echo "${fullpath##*/}"

"WHAT? What the heck is that? That's not a command! That's just a bunch of garbage someone made by whacking some of the stranger keys on the keyboard! I mean really! dollar curly pound pound star slash curly What IS that?"

Um, OK - uh, just calm down for a moment. I know it doesn't look like the typical programming language keyword or library call, but consider a language like Perl. See? To shoehorn new features into Bash, you have to find ways to do it without creating keywords (or anything else) that might cause older scripts to break, so all these string manipulation functions got placed inside the syntax from an old sh feature where there just happened to be room for them: "Parameter Expansion".

For a basename we use the PE expression: ${fullpath##*/} which returns "Its_only_Rock_and_Roll.mp3".

To find the dirname we use the PE expression: ${fullpath%/*} which produces "/usr/home/JosephBaldwin".

To drop the filename extension, we use the PE expression: ${fullpath%.*} giving out "/usr/home/JosephBaldwin/Its_only_Rock_and_Roll"

To get the filename's extension we use the PE expression: ${fullpath##*.} generating only "mp3".

To find the strlen, the PE expression: ${#fullpath} finds it, and it's 50.

To get a leftstr, the PE expression: ${fullpath:0:20} grabs the first 20 chars of fullpath to make "/usr/home/JosephBald".

To perform a rightstr in bash we use the following PE expression: ${fullpath:(-20)} which gets the last 20 chars, "ly_Rock_and_Roll.mp3". The parentheses are needed, although there are a couple other ways to write it that also work.
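
For reference, here is a sketch of three spellings that give the same result. The thing to avoid is ${fullpath:-20} with nothing between the colon and the minus sign, because bash reads that as the "use a default value" expansion instead.

```shell
#!/usr/bin/env bash
fullpath=/usr/home/JosephBaldwin/Its_only_Rock_and_Roll.mp3

a=${fullpath:(-20)}             # parentheses keep ":-" from being formed
b=${fullpath: -20}              # a space after the colon does the same job
c=${fullpath:${#fullpath}-20}   # offset computed from the length: 50-20

echo "$a"   # ly_Rock_and_Roll.mp3
echo "$b"   # ly_Rock_and_Roll.mp3
echo "$c"   # ly_Rock_and_Roll.mp3
```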

To perform a midstr in bash we use the following PE expression: ${fullpath:10:20} making "JosephBaldwin/Its_on".

To perform a substr in bash we use the following PE expression: ${fullpath//Rock/Roll} rolling it into "/usr/home/JosephBaldwin/Its_only_Roll_and_Roll.mp3".
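
All of the expressions in this section can be tried in one go with a short script like the following sketch. The variable names on the left are made up to match the function names listed earlier; they are not bash features.

```shell
#!/usr/bin/env bash
fullpath=/usr/home/JosephBaldwin/Its_only_Rock_and_Roll.mp3

base=${fullpath##*/}          # basename
dir=${fullpath%/*}            # dirname
noext=${fullpath%.*}          # dropext
ext=${fullpath##*.}           # getext
len=${#fullpath}              # strlen
left=${fullpath:0:20}         # leftstr
right=${fullpath:(-20)}       # rightstr
mid=${fullpath:10:20}         # midstr
subst=${fullpath//Rock/Roll}  # substr

echo "$base"    # Its_only_Rock_and_Roll.mp3
echo "$dir"     # /usr/home/JosephBaldwin
echo "$noext"   # /usr/home/JosephBaldwin/Its_only_Rock_and_Roll
echo "$ext"     # mp3
echo "$len"     # 50
echo "$left"    # /usr/home/JosephBald
echo "$right"   # ly_Rock_and_Roll.mp3
echo "$mid"     # JosephBaldwin/Its_on
echo "$subst"   # /usr/home/JosephBaldwin/Its_only_Roll_and_Roll.mp3
```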

Why aren't the PE things named more nicely?

Can't I just have these as regular functions with nice names?

Not totally generalizable.

What can I do with PEs that I couldn't do with the string functions above?
