

Patterns

BASH offers three different kinds of pattern matching. Pattern matching serves two roles in the shell: selecting filenames within a directory, or determining whether a string conforms to a desired format.

On the command line you will mostly use globs. These are a fairly straightforward form of pattern that can easily be used to match a range of files, or to check variables against simple rules.

The second type of pattern matching involves extended globs, which allow more complicated expressions than regular globs.

Since version 3.0, BASH also supports regular expression patterns. These will be useful mainly in scripts to test user input or parse data.



Glob Patterns

Globs are a very important concept in BASH, if only for their incredible convenience. Properly understanding globs will benefit you in many ways. Globs are basically patterns that can be used to match filenames or other strings.

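Globs are composed of normal characters and meta characters. Meta characters are characters that have a special meaning. The basic meta characters are * (matches any string, including the empty string), ? (matches any single character) and [...] (matches any one of the enclosed characters).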

Globs are implicitly anchored at both ends. What this means is that a glob must match a whole string (filename or data string). A glob of a* will not match the string cat, because it only matches the at, not the whole string. A glob of ca*, however, would match cat.
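
The anchoring rule can be checked directly with the [[ keyword. The following is a minimal sketch: matches_glob is a hypothetical helper written for this illustration, and it assumes the usual meanings of * (any string), ? (any single character) and [...] (any one of the enclosed characters).

```bash
#!/usr/bin/env bash
# matches_glob STRING PATTERN (hypothetical helper, not part of BASH):
# succeeds only if the whole STRING matches the glob PATTERN.
matches_glob() {
    # The right-hand side stays unquoted so [[ treats it as a glob.
    [[ $1 == $2 ]]
}

# The patterns are quoted here so they are not expanded to filenames.
for pattern in 'a*' 'ca*' '*a*' 'c?t' 'c?' '[bc]at'; do
    if matches_glob cat "$pattern"; then
        echo "$pattern matches cat"
    else
        echo "$pattern does not match cat"
    fi
done
```

Only a* and c? are reported as not matching: each of them covers just part of the string cat, and a glob has to cover all of it.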

Here's an example of how we can use glob patterns to expand to filenames:

    $ ls
    a  abc  b  c
    $ echo *
    a abc b c
    $ echo a*
    a abc

BASH sees the glob, for example a*, and expands it by looking in the current directory and matching it against all the filenames there. Any filenames that match the glob are enumerated and used in place of the glob. As a result, the statement echo a* is replaced by the statement echo a abc, which is then executed.

BASH performs filename expansion after word splitting has already been done; therefore, filenames generated by a glob are never split into separate words, even if they contain whitespace. For example:

    $ touch "a b.txt"
    $ ls
    a b.txt
    $ rm *
    $ ls

Here, * is expanded into the single filename "a b.txt". This filename will be passed as a single argument to rm. It is important to understand that using globs to enumerate files is always a better idea than using `ls` for that purpose. Here's an example with some more complex syntax which we will cover later on, but it will illustrate the reason very well:

    $ ls
    a b.txt
    $ for file in `ls`; do rm "$file"; done
    rm: cannot remove `a': No such file or directory
    rm: cannot remove `b.txt': No such file or directory
    $ for file in *; do rm "$file"; done
    $ ls

Here we use the for command to go through the output of the ls command. The ls command prints the string a b.txt. The for command splits that string into words over which it iterates. As a result, for iterates over first a, and then b.txt. Naturally, this is not what we want. The glob, however, expands in the proper form. It results in the file "a b.txt", which for takes as a single argument.
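
The same difference shows up when you simply count the arguments each approach produces. This is a small sketch that works in a temporary directory; the file names and the variable names glob_count and ls_count are chosen only for this illustration, and $(ls) is used as the equivalent of the backtick form.

```bash
#!/usr/bin/env bash
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
touch "a b.txt" "c.txt"

# The glob yields one argument per file, whatever characters the names contain.
set -- *
glob_count=$#

# The unquoted output of ls is split on whitespace, so "a b.txt" becomes two words.
set -- $(ls)
ls_count=$#

echo "glob: $glob_count arguments, ls: $ls_count arguments"

# Go back and clean up.
cd "$OLDPWD" && rm -rf "$tmpdir"
```

With two files in the directory, the glob gives two arguments while the ls version gives three.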

BASH also supports a feature called Extended Globs. These globs are more powerful in nature; technically, they are equivalent to regular expressions, although the syntax looks different than most people are used to. This feature is turned off by default, but can be turned on with the shopt command, which is used to toggle shell options:

    $ shopt -s extglob

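Extended globs add five operators: ?(list) matches zero or one occurrence of the given patterns, *(list) matches zero or more occurrences, +(list) matches one or more occurrences, @(list) matches exactly one of the given patterns, and !(list) matches anything except the given patterns. The list inside the parentheses is a list of regular or extended globs separated by the | character. Here's an example: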

    $ ls
    names.txt  tokyo.jpg  california.bmp
    $ echo !(*jpg|*bmp)
    names.txt

Our glob now expands to anything that does not match the *jpg or the *bmp pattern. Only the text file qualifies, so the glob expands to its name.
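
Here is a slightly larger sketch that also tries @(list), which matches exactly one of the listed patterns, and ?(list), which matches zero or one occurrence. The extra file notes.jpeg and the array names are assumptions made for this illustration.

```bash
#!/usr/bin/env bash
# extglob has to be on before BASH parses a line that uses the extended syntax.
shopt -s extglob

# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
touch names.txt tokyo.jpg california.bmp notes.jpeg

not_images=( !(*jpg|*bmp) )   # anything not ending in jpg or bmp
jpg_or_bmp=( *.@(jpg|bmp) )   # exactly one of the alternatives
any_jpeg=( *.jp?(e)g )        # optional "e": .jpg as well as .jpeg

echo "not jpg/bmp: ${not_images[*]}"
echo "jpg or bmp:  ${jpg_or_bmp[*]}"
echo "jpg or jpeg: ${any_jpeg[*]}"

# Go back and clean up.
cd "$OLDPWD" && rm -rf "$tmpdir"
```

Note that notes.jpeg shows up in the first list: it ends in jpeg, which neither *jpg nor *bmp matches.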

In addition to filename expansion, globs may also be used to check whether data matches a specific format. For example, we might be given a filename, and need to take different actions depending on its extension:

    $ filename="somefile.jpg"
    $ if [[ $filename = *.jpg ]]; then
    > echo "$filename is a jpeg"
    > fi
    somefile.jpg is a jpeg

The [[ keyword and the case keyword (which we will discuss in more detail later) both offer the opportunity to check a string against a glob -- either regular globs, or extended globs, if the latter have been enabled.
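
As a quick preview of case, the sketch below checks a name against several globs in turn. describe_file is a hypothetical helper invented for this illustration, and the list of extensions is arbitrary.

```bash
#!/usr/bin/env bash
# describe_file NAME (hypothetical helper): classify a name by its extension.
describe_file() {
    case $1 in
        *.jpg|*.jpeg) echo "$1 is a jpeg" ;;          # either glob may match
        *.txt)        echo "$1 is a text file" ;;
        *)            echo "$1 has an unknown type" ;; # fallback: matches anything
    esac
}

describe_file somefile.jpg
describe_file notes.txt
describe_file archive
```

The patterns are tried from top to bottom and the first one that matches the whole string wins, which is why the catch-all * comes last.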

Then, there is Brace Expansion. Brace Expansion technically does not fit in the category of Globs, but it is similar. Globs only expand to actual filenames, whereas brace expansion will expand to every possible combination of the pattern, whether or not matching files exist. Here's how it works:

    $ echo th{e,a}n
    then than
    $ echo {/home/*,/root}/.*profile
    /home/axxo/.bash_profile /home/lhunath/.profile /root/.bash_profile /root/.profile
    $ echo {1..9}
    1 2 3 4 5 6 7 8 9
    $ echo {0,1}{0..9}
    00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
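
To see the difference from globs side by side, here is a small sketch that uses a temporary directory containing a single file. The file name report1.txt and the array names are assumptions made for this illustration.

```bash
#!/usr/bin/env bash
# Work in a throwaway directory that contains exactly one file.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
touch report1.txt

# The glob only yields names of files that exist...
globbed=( report[123].txt )
# ...while brace expansion generates every combination, existing or not.
braced=( report{1,2,3}.txt )

echo "glob:  ${globbed[*]}"
echo "brace: ${braced[*]}"

# Go back and clean up.
cd "$OLDPWD" && rm -rf "$tmpdir"
```

The glob produces only report1.txt, while the brace expansion produces all three names even though two of the files do not exist.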






Regular Expressions

Regular expressions (regex) are similar to Glob Patterns, but cannot be used for filename matching in BASH. Since version 3.0, BASH supports the =~ operator for the [[ keyword. This operator matches the string that comes before it against the regex pattern that follows it. When the string matches the pattern, [[ returns with an exit code of 0 ("true"). If the string does not match the pattern, an exit code of 1 ("false") is returned. If the pattern's syntax is invalid, [[ will abort the operation and return an exit code of 2.
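
The three exit codes can be observed with a short sketch like the one below. regex_status is a hypothetical helper written for this illustration, and the sample strings and patterns are arbitrary.

```bash
#!/usr/bin/env bash
# regex_status STRING REGEX (hypothetical helper):
# print the exit code that [[ STRING =~ REGEX ]] returns.
regex_status() {
    # Any diagnostic about a bad pattern is discarded; only the status matters here.
    [[ $1 =~ $2 ]] 2>/dev/null
    echo $?
}

regex_status "abc123" '^[a-z]+[0-9]+$'   # the string matches
regex_status "abc"    '^[0-9]+$'         # valid pattern, but no match
regex_status "abc"    '('                # invalid pattern: unbalanced parenthesis
```

The three calls are expected to print 0, 1 and 2, in that order.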

BASH uses the Extended Regular Expression (ERE) dialect. We will not cover regexes in depth in this guide, but if you are interested in this concept, please read up on regular expressions in general, and on Extended Regular Expressions in particular.

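Regular Expression patterns that use capturing groups (parentheses) will have their captured strings assigned to the BASH_REMATCH array variable for later retrieval. Element 0 holds the part of the string that matched the whole pattern, element 1 the first group, element 2 the second group, and so on.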

Let's illustrate how regex can be used in BASH:

    $ if [[ $LANG =~ (..)_(..) ]]
    > then echo "You live in ${BASH_REMATCH[2]} and speak ${BASH_REMATCH[1]}."
    > else echo "Your locale was not recognised"
    > fi
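
Because the value of LANG differs from system to system, the sketch below uses a fixed sample string instead, so the contents of BASH_REMATCH are predictable. The variable name locale and its value are assumptions made for this illustration.

```bash
#!/usr/bin/env bash
# An assumed sample value, used instead of the real $LANG.
locale="en_GB.UTF-8"

# Two capturing groups: lowercase letters before the underscore, uppercase after it.
if [[ $locale =~ ^([a-z]+)_([A-Z]+) ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"
    echo "language:    ${BASH_REMATCH[1]}"
    echo "territory:   ${BASH_REMATCH[2]}"
else
    echo "Locale was not recognised"
fi
```

Element 0 contains en_GB, the part of the string matched by the whole pattern; the .UTF-8 suffix is not included because the regex is not anchored at the end.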

Be aware that regex parsing in BASH changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap your regex pattern in quotes, but since 3.2 any quoted part of the pattern is matched as a literal string. Regexes should therefore be left unquoted, and you should protect any characters that are special to the shell by escaping them with a backslash.

    $ [[ "My sentence" =~ My\ sentence ]]

Be careful to escape any characters that the shell could misinterpret, such as whitespace, dollar signs followed by text, braces, etc.
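
A common way to sidestep most of the escaping is to store the regex in a variable and expand that variable unquoted on the right-hand side. The sketch below shows this, together with the effect of quoting in BASH 3.2 and later. The variable names re and dots are assumptions made for this illustration.

```bash
#!/usr/bin/env bash
# Inside single quotes the shell leaves the regex alone, so the space needs no backslash.
re='^My sentence$'
[[ "My sentence" =~ $re ]] && echo "unquoted variable: treated as a regex"

# Quoting the right-hand side makes BASH (3.2 and later) match it as a literal string.
dots='a.c'
[[ "abc" =~ $dots ]]   && echo "unquoted: . matches any character"
[[ "abc" =~ "$dots" ]] || echo "quoted: . only matches a literal dot"
```

All three messages are printed: the last test fails because the string abc does not contain the literal text a.c.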





