Differences between revisions 1 and 21 (spanning 20 versions)
Revision 1 as of 2008-05-14 13:53:45
Size: 8011
Editor: Lhunath
Comment:
Revision 21 as of 2013-01-16 20:28:00
Size: 10917
Editor: geirha
Comment: Ease up on [[BASH]], and lose some uneccesary indentation
Line 1: Line 1:
## page was renamed from BashGuide/04.Patterns
[[BashGuide/Parameters|<- Parameters]] | [[BashGuide/TestsAndConditionals|Tests and Conditionals ->]]
----
<<Anchor(StartOfContent)>>
= Patterns =
Line 2: Line 7:
[[Anchor(Patterns)]]
== Patterns ==
[[BASH]] offers three different kinds of ''pattern matching''. Pattern matching serves two roles in the shell: selecting filenames within a directory, or determining whether a string conforms to a desired format.
Line 5: Line 9:
Patterns are strings that are used to match a whole range of strings. They have a special format depending on the pattern dialect which describes the kinds of strings that they match. ''Regular Expression'' patterns can even be used to grab certain pieces out of the strings they match.
On the command line you will mostly use ''globs''. These are a fairly straight-forward form of patterns that can easily be used to match a range of files, or to check variables against simple rules.
Line 7: Line 11:
On the command line you will mostly use ''Glob Patterns''. They are a fairly straight-forward form of patterns that can easily be used to match a range of files.
The second type of pattern matching involves ''extended globs'', which allow more complicated expressions than regular globs.
Line 9: Line 13:
Since version `3.0`, ["BASH"] also supports ''Regular Expression'' patterns. These will be useful mainly in scripts to test user input or parse data.
Since version `3.0`, Bash also supports ''regular expression'' patterns. These will be useful mainly in scripts to test user input or parse data.  (You can't use a regular expression to select filenames; only globs and extended globs can do that.)
Line 12: Line 16:
 . ''Pattern'': A pattern is a string with a special format designed to be a sort of key that matches several other strings of a kind.
 . ''Pattern'': A pattern is a string with a special format designed to match filenames, or to check, classify or validate data strings.
Line 17: Line 21:
[[Anchor(Glob_Patterns)]]
=== Glob Patterns ===
<<Anchor(Glob_Patterns)>>
== Glob Patterns ==
Line 20: Line 24:
Globs are a very important concept in ["BASH"], if only for their incredible convenience. Properly understanding globs will benefit you in many ways. Globs are basically patterns that can be used to match filenames or other strings.
[[glob|Globs]] are a very important concept in Bash, if only for their incredible convenience. Properly understanding globs will benefit you in many ways. Globs are basically patterns that can be used to match filenames or other strings.
Line 22: Line 26:
Globs are composed of normal characters and meta characters. Meta characters are characters that have a special meaning. These are the basic meta characters:
Globs are composed of normal characters and metacharacters. Metacharacters are characters that have a special meaning. These are the metacharacters that can be used in globs:
Line 27: Line 31:

Globs are implicitly ''anchored'' at both ends. What this means is that a glob must match a ''whole'' string (filename or data string). A glob of `a*` will not match the string `cat`, because it only matches the `at`, not the whole string. A glob of `ca*`, however, would match `cat`.
Line 30: Line 37:
    $ ls
    a abc b c
    $ echo *
    a abc b c
    $ echo a*
    a abc
$ ls
a abc b c
$ echo *
a abc b c
$ echo a*
a abc
Line 37: Line 44:
["BASH"] sees the glob, for example `a*`. It ''expands'' this glob, by looking in the current directory and matching it against all files there. Any filenames that match the glob, are enumerated and used in place of the glob. As a result, the statement `echo a*` is replaced by the statement `echo a abc`, and is then executed.
Bash sees the glob, for example `a*`. It ''expands'' this glob, by looking in the current directory and matching it against all files there. Any filenames that match the glob are gathered up and sorted, and then the list of filenames is used in place of the glob. As a result, the statement `echo a*` is replaced by the statement `echo a abc`, which is then executed.
Line 39: Line 46:
["BASH"] will always make sure that whitespace and special characters are escaped properly when expanding the glob. For example:
When a glob is used to match ''filenames'', the `*` and `?` characters cannot match a slash (`/`) character. So, for instance, the glob `*/bin` might match `foo/bin` but it cannot match `/usr/local/bin`. When globs match ''patterns'', the `/` restriction is removed.

Bash performs filename expansions ''after'' word splitting has already been done. Therefore, filenames generated by a glob will not be split; they will always be handled correctly. For example:
Line 42: Line 51:
    $ touch "a b.txt"
    $ ls
    a b.txt
    $ rm *
    $ ls
$ touch "a b.txt"
$ ls
a b.txt
$ rm *
$ ls
Line 48: Line 57:
Here, `rm *` is expanded into `rm a\ b.txt`. This makes sure that the string `a b.txt` is passed as a single argument to `rm`, since it represents a single file. It is important to understand that using globs to enumerate files is nearly '''always''' a better idea than using `ls` for that purpose. Here's an example with some more complex syntax which we will cover later on, but it will illustrate the problem very well:
Here, `*` is expanded into the single filename "`a b.txt`". This filename will be passed as a single argument to `rm`. Using globs to enumerate files is '''always''' a better idea than using {{{`ls`}}} for that purpose. Here's an example with some more complex syntax which we will cover later on, but it will illustrate the reason very well:
Line 51: Line 60:
    $ ls
    a b.txt
    $ for file in `ls`; do rm "$file"; done
    rm: cannot remove `a': No such file or directory
    rm: cannot remove `b.txt': No such file or directory
    $ for file in *; do rm "$file"; done
    $ ls
$ ls
a b.txt
$ for file in `ls`; do rm "$file"; done
rm: cannot remove `a': No such file or directory
rm: cannot remove `b.txt': No such file or directory
$ for file in *; do rm "$file"; done
$ ls
Line 59: Line 68:
Here we use the `for` command to go through the output of the `ls` command. The `ls` command results in a string `a b.txt`. The `for` command splits that string into arguments over which it iterates. As a result, for iterates over `a` and `b.txt`. Naturally, this is '''not''' what we want. The glob however expands in the proper form. It results in the string `a\ b.txt`, which `for` takes as a single argument.
Here we use the `for` command to go through the output of the `ls` command. The `ls` command prints the string `a b.txt`. The `for` command splits that string into words over which it iterates. As a result, `for` iterates over first `a`, and then `b.txt`. Naturally, this is '''not''' what we want. The glob, however, expands in the proper form. It results in the string "`a b.txt`", which `for` takes as a single argument.
Line 61: Line 70:
["BASH"] also supports a feature called `Extended Globs`. These globs are more powerful in nature. This feature is turned off by default, but can be turned on with the `shopt` command, which is used to toggle '''sh'''ell '''opt'''ions:
In addition to filename expansion, globs may also be used to check whether data matches a specific format. For example, we might be given a filename, and need to take different actions depending on its extension:
{{{
$ filename="somefile.jpg"
$ if [[ $filename = *.jpg ]]; then
> echo "$filename is a jpeg"
> fi
somefile.jpg is a jpeg
}}}

The `[[` keyword and the `case` builtin command (which we will discuss in more detail later) both offer the opportunity to check a string against a glob -- either regular globs, or extended globs, if the latter have been enabled.

--------
 . '''Good Practice: <<BR>> You should always use globs instead of `ls` (or similar) to enumerate files. Globs will always expand safely and minimize the risk for bugs. <<BR>> You can sometimes end up with some very weird filenames. Most scripts aren't tested against all the odd cases that they may end up being used with. Don't let your script be one of those!'''
----
 . '''In The Manual: [[http://www.gnu.org/software/bash/manual/bashref.html#Pattern-Matching|Pattern Matching]]'''
----
 . '''In the FAQ: <<BR>> [[BashFAQ/016|How can I use a logical AND/OR/NOT in a shell pattern (glob)?]]'''
----
 . ''Glob'': A glob is a string that can match certain strings or filenames.
--------

== Extended Globs ==

Bash also supports a feature called ''Extended Globs''. These globs are more powerful in nature; technically, they are equivalent to regular expressions, although the syntax looks different than most people are used to. This feature is turned off by default, but can be turned on with the `shopt` command, which is used to toggle '''sh'''ell '''opt'''ions:
Line 64: Line 96:
    $ shopt -s extglob
$ shopt -s extglob
Line 66: Line 98:
Line 71: Line 104:
The list inside the parentheses is a list of globs separated by the `|` character. Here's an example:
The list inside the parentheses is a list of globs or extended globs separated by the `|` character. Here's an example:
Line 74: Line 107:
    $ ls
    names.txt tokyo.jpg california.bmp
    $ echo !(*jpg|*bmp)
    names.txt
$ ls
names.txt tokyo.jpg california.bmp
$ echo !(*jpg|*bmp)
names.txt
Line 79: Line 112:
Our glob now expands to anything that does not match the `*jpg` or the `*bmp` pattern. Only the text file passes for that, so it is expanded.
Our extended glob expands to anything that does not match the `*jpg` or the `*bmp` pattern. Only the text file passes for that, so it is expanded.
Line 81: Line 114:
Then, there is ''Brace Expansion''. Brace Expansion technically does not fit in the category of Globs, but it is similar. Globs only expand to actual filenames, where brace expansion will expand to any permutation of the pattern. Here's how they work:
<<Anchor(Regular_Expressions)>>
== Regular Expressions ==

Regular expressions (regex) are similar to ''Glob Patterns'', but they can only be used for pattern matching, not for filename matching. Since 3.0, Bash supports the `=~` operator to the `[[` keyword. This operator matches the string that comes before it against the regex pattern that follows it. When the string matches the pattern, `[[` returns with an exit code of `0` ("true"). If the string does not match the pattern, an exit code of `1` ("false") is returned. In case the pattern's syntax is invalid, `[[` will abort the operation and return an exit code of `2`.

Bash uses the ''Extended Regular Expression'' (`ERE`) dialect. We will not cover regexes in depth in this guide, but if you are interested in this concept, please read up on RegularExpression, or [[http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_04|Extended Regular Expressions]].

''Regular Expression'' patterns that use capturing groups (parentheses) will have their captured strings assigned to the `BASH_REMATCH` variable for later retrieval.

Let's illustrate how regex can be used in Bash:
Line 84: Line 127:
    $ echo th{e,a}n
    then than
    $ echo {/home/*,/root}/.*profile
    /home/axxo/.bash_profile /home/lhunath/.profile /root/.bash_profile /root/.profile
    $ echo {1..9}
    1 2 3 4 5 6 7 8 9
    $ echo {0,1}{0..9}
    00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
$ langRegex='(..)_(..)'
$ if [[ $LANG =~ $langRegex ]]
> then
> echo "Your country code (ISO 3166-1-alpha-2) is ${BASH_REMATCH[2]}."
> echo "Your language code (ISO 639-1) is ${BASH_REMATCH[1]}."
> else
> echo "Your locale was not recognised"
> fi
Line 93: Line 136:
--------
 . '''Good Practice: [[BR]] You should always use globs instead of `ls` (or similar) to enumerate files. Globs will always expand safely and minimize the risk for bugs. [[BR]] You can sometimes end up with some very weird filenames. Generally speaking, scripts aren't always tested against all the odd cases that they may end up being used with.'''
----
 . '''In The Manual: [http://www.gnu.org/software/bash/manual/bashref.html#SEC35 Pattern Matching]'''
----
 . '''In the FAQ: [[BR]] [http://wooledge.org/mywiki/BashFAQ/016 How can I use a logical AND in a shell pattern (glob)?]'''
----
 . ''Glob'': A glob is a string composed of glob meta characters that can match certain strings or filenames.
--------



[[Anchor(Regular_Expressions)]]
=== Regular Expressions ===

''Regular Expressions'' (regex) are similar to ''Glob Patterns'' but cannot be used for filename matching in ["BASH"]. Since `3.0` ["BASH"] supports the `=~` operator to the `[[` built-in. This operator matches the string that comes before it against the regex pattern that follows it. When the string matches the pattern, `[[` returns with an exit code of `0` ("true"). If the string does not match the pattern, an exit code of `1` ("false") is returned. In case the pattern's syntax is invalid, `[[` will abort the operation and return an exit code of `2`.

["BASH"] uses the ''Extended Regular Expression'' (`ERE`) dialect. I will not teach you about regex in this guide, but if you are interested in this concept, please read up on [http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04 Extended Regular Expressions] or Google for a tutorial.

''Regular Expression'' patterns that use capturing groups will have their captured strings assigned to the `BASH_REMATCH` variable for later retrieval.

Let's illustrate how regex can be used in ["BASH"]:

{{{
    $ if [[ $LANG =~ (..)_(..) ]]
    > then echo "You live in ${BASH_REMATCH[2]} and speak ${BASH_REMATCH[1]}."
    > else echo "Your locale was not recognised"
    > fi
}}}
Be aware that regex parsing in ["BASH"] has changed between releases `3.1` and `3.2`. Before `3.2` it was safe to wrap your regex pattern in quotes but this has changed in `3.2`. Since then, regex should always be unquoted. You should protect any special characters by escaping it using a backslash.

{{{
    $ [[ "My sentence" =~ My\ sentence ]]
}}}
Be careful to escape any characters that the shell could misinterpret, such as whitespace, dollar signs followed by text, braces, etc.
Be aware that regex parsing in Bash has changed between releases `3.1` and `3.2`. Before `3.2` it was safe to wrap your regex pattern in quotes but this has changed in `3.2`. Since then, regex should always be unquoted. You should protect any special characters by escaping it using a backslash. The best way to always be compatible is to put your regex in a variable and expand that variable in `[[` without quotes, as we showed above.
Line 130: Line 139:
 . '''Good Practice: [[BR]] Since the way regex is used in `3.2` is also valid in `3.1` we ''highly'' recommend you just never quote your regex. Remember to keep special characters properly escaped!'''
 . '''Good Practice: <<BR>> Since the way regex is used in `3.2` is also valid in `3.1` we ''highly'' recommend you just never quote your regex. Remember to keep special characters properly escaped!'''
 . ''' For cross-compatibility (to avoid having to escape parentheses, pipes and so on) use a variable to store your regex, e.g. {{{re='^\*( >| *Applying |.*\.diff|.*\.patch)'; [[ $var =~ $re ]]}}} This is much easier to maintain since you only write ERE syntax and avoid the need for shell-escaping, as well as being compatible with all 3.x BASH versions.'''
 . See also [[http://tiswww.case.edu/php/chet/bash/FAQ|Chet Ramey's Bash FAQ]], section E14.
Line 132: Line 143:
 . '''In The Manual: [http://www.daemon-systems.org/man/regex.3.html Regex(3)]'''
 . '''In The Manual: [[http://www.daemon-systems.org/man/regex.3.html|Regex(3)]]'''
Line 134: Line 145:
 . '''In the FAQ: [[BR]] [http://wooledge.org/mywiki/BashFAQ/066 I want to check if [[ $var == foo || $var == bar || $var == more ... without repeating $var n times.]'''
 . '''In the FAQ: <<BR>> [[BashFAQ/066|I want to check if [[ $var == foo or $var == bar or $var == more ... without repeating $var n times.]]'''
Line 138: Line 149:

== Brace Expansion ==

Then, there is ''Brace Expansion''. Brace Expansion technically does not fit in the category of patterns, but it is similar. Globs only expand to actual filenames, but brace expansions will expand to any possible permutation of their contents. Here's how they work:

{{{
$ echo th{e,a}n
then than
$ echo {/home/*,/root}/.*profile
/home/axxo/.bash_profile /home/lhunath/.profile /root/.bash_profile /root/.profile
$ echo {1..9}
1 2 3 4 5 6 7 8 9
$ echo {0,1}{0..9}
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
}}}

The brace expansion is replaced by a list of words, just like a glob is. However, these words aren't necessarily filenames, and they are not sorted (`then` would have come after `than` if they were).

Brace expansion happens ''before'' filename expansion. In the second `echo` command above, we used a combination of brace expansion and globs. The brace expansion goes first, and we get:
{{{
$ echo /home/*/.*profile /root/.*profile
}}}

After the brace expansion, the globs are expanded, and we get the filenames as the final result.

Brace expansions can only be used to generate lists of words. They cannot be used for pattern matching.
<<Anchor(EndOfContent)>>
--------
[[BashGuide/Parameters|<- Parameters]] | [[BashGuide/TestsAndConditionals|Tests and Conditionals ->]]



Patterns

Bash offers three different kinds of pattern matching. Pattern matching serves two roles in the shell: selecting filenames within a directory, or determining whether a string conforms to a desired format.

On the command line you will mostly use globs. These are a fairly straight-forward form of patterns that can easily be used to match a range of files, or to check variables against simple rules.

The second type of pattern matching involves extended globs, which allow more complicated expressions than regular globs.

Since version 3.0, Bash also supports regular expression patterns. These will be useful mainly in scripts to test user input or parse data. (You can't use a regular expression to select filenames; only globs and extended globs can do that.)


  • Pattern: A pattern is a string with a special format designed to match filenames, or to check, classify or validate data strings.


Glob Patterns

Globs are a very important concept in Bash, if only for their incredible convenience. Properly understanding globs will benefit you in many ways. Globs are basically patterns that can be used to match filenames or other strings.

Globs are composed of normal characters and metacharacters. Metacharacters are characters that have a special meaning. These are the metacharacters that can be used in globs:

  • *: Matches any string, including the null string.

  • ?: Matches any single character.

  • [...]: Matches any one of the enclosed characters.

Globs are implicitly anchored at both ends. What this means is that a glob must match a whole string (filename or data string). A glob of a* will not match the string cat, because it only matches the at, not the whole string. A glob of ca*, however, would match cat.
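
The anchoring rule and the three metacharacters can be tried out on plain strings. The following is a minimal sketch; the helper name matches is made up for this illustration, and it relies on the [[ keyword treating an unquoted right-hand side as a glob:

```shell
# matches STRING GLOB: succeeds only if GLOB matches the whole STRING.
# The right-hand side is deliberately left unquoted so it acts as a pattern.
matches() {
    [[ $1 = $2 ]]
}

if matches cat 'a*';     then echo "a* matches cat"; else echo "a* does not match cat"; fi
if matches cat 'ca*';    then echo "ca* matches cat"; fi
if matches cat 'c?t';    then echo "c?t matches cat"; fi
if matches cat '[bc]at'; then echo "[bc]at matches cat"; fi
```

Only the first test fails: a* would have to match the leading c as well, and it cannot.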

Here's an example of how we can use glob patterns to expand to filenames:

$ ls
a  abc  b  c
$ echo *
a abc b c
$ echo a*
a abc

Bash sees the glob, for example a*. It expands this glob, by looking in the current directory and matching it against all files there. Any filenames that match the glob are gathered up and sorted, and then the list of filenames is used in place of the glob. As a result, the statement echo a* is replaced by the statement echo a abc, which is then executed.
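
One way to inspect what a glob expanded to is to capture the result in an array (arrays are covered later in the guide). This sketch works in a scratch directory created with mktemp, so the variable names and the directory are assumptions of the example only:

```shell
# Work in a scratch directory so the example does not touch real files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch c b abc a          # created out of order on purpose

# Each matching filename becomes one array element, already sorted.
files=(a*)
echo "${#files[@]} matches: ${files[*]}"   # 2 matches: a abc

all=(*)
echo "${all[*]}"                           # a abc b c
```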

When a glob is used to match filenames, the * and ? characters cannot match a slash (/) character. So, for instance, the glob */bin might match foo/bin but it cannot match /usr/local/bin. When a glob is matched against a string rather than against filenames, the / restriction is removed.
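
The difference between the two uses can be sketched as follows. The directory layout is invented for the example and lives under a scratch directory:

```shell
dir=$(mktemp -d)
mkdir -p "$dir/foo/bin" "$dir/usr/local/bin"
cd "$dir" || exit 1

# Filename expansion: * stops at each slash, so only one directory level matches.
paths=(*/bin)
echo "${paths[*]}"                  # foo/bin

# String matching: here * is free to span slashes.
if [[ /usr/local/bin = */bin ]]; then
    echo "matched as a string"
fi
```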

Bash performs filename expansions after word splitting has already been done. Therefore, filenames generated by a glob will not be split; they will always be handled correctly. For example:

$ touch "a b.txt"
$ ls
a b.txt
$ rm *
$ ls

Here, * is expanded into the single filename "a b.txt". This filename will be passed as a single argument to rm. Using globs to enumerate files is always a better idea than using `ls` for that purpose. Here's an example with some more complex syntax which we will cover later on, but it will illustrate the reason very well:

$ ls
a b.txt
$ for file in `ls`; do rm "$file"; done
rm: cannot remove `a': No such file or directory
rm: cannot remove `b.txt': No such file or directory
$ for file in *; do rm "$file"; done
$ ls

Here we use the for command to go through the output of the ls command. The ls command prints the string a b.txt. The for command splits that string into words over which it iterates. As a result, for iterates over first a, and then b.txt. Naturally, this is not what we want. The glob, however, expands in the proper form. It results in the string "a b.txt", which for takes as a single argument.
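
The same contrast can be shown without deleting anything, by counting loop iterations. This is a small sketch that assumes the default IFS and an otherwise empty scratch directory:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
touch "a b.txt"

# Looping over a glob: one iteration per file.
glob_count=0
for file in *; do
    glob_count=$((glob_count + 1))
done

# Looping over the output of ls: the output is split into words first.
ls_count=0
for file in $(ls); do
    ls_count=$((ls_count + 1))
done

echo "glob: $glob_count, ls: $ls_count"   # glob: 1, ls: 2
```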

In addition to filename expansion, globs may also be used to check whether data matches a specific format. For example, we might be given a filename, and need to take different actions depending on its extension:

$ filename="somefile.jpg"
$ if [[ $filename = *.jpg ]]; then
> echo "$filename is a jpeg"
> fi
somefile.jpg is a jpeg

The [[ keyword and the case builtin command (which we will discuss in more detail later) both offer the opportunity to check a string against a glob -- either regular globs, or extended globs, if the latter have been enabled.
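
As a preview of case, here is a minimal sketch that classifies a filename by its extension. The function name classify and the categories are hypothetical and only serve the illustration:

```shell
# classify FILENAME: print a rough description based on the extension.
classify() {
    case $1 in
        *.jpg|*.jpeg) echo "jpeg image" ;;
        *.png)        echo "png image" ;;
        *)            echo "unknown" ;;   # the catch-all glob matches anything
    esac
}

classify somefile.jpg    # jpeg image
classify notes.txt       # unknown
```

Each branch is an ordinary glob, tested from top to bottom against the whole string.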


  • Good Practice:
    You should always use globs instead of ls (or similar) to enumerate files. Globs will always expand safely and minimize the risk for bugs.
    You can sometimes end up with some very weird filenames. Most scripts aren't tested against all the odd cases that they may end up being used with. Don't let your script be one of those!




  • Glob: A glob is a string that can match certain strings or filenames.


Extended Globs

Bash also supports a feature called Extended Globs. These globs are more powerful in nature; technically, they are equivalent to regular expressions, although the syntax looks different than most people are used to. This feature is turned off by default, but can be turned on with the shopt command, which is used to toggle shell options:

$ shopt -s extglob

  • ?(list): Matches zero or one occurrence of the given patterns.

  • *(list): Matches zero or more occurrences of the given patterns.

  • +(list): Matches one or more occurrences of the given patterns.

  • @(list): Matches one of the given patterns.

  • !(list): Matches anything except one of the given patterns.

The list inside the parentheses is a list of globs or extended globs separated by the | character. Here's an example:

$ ls
names.txt  tokyo.jpg  california.bmp
$ echo !(*jpg|*bmp)
names.txt

Our extended glob expands to anything that does not match the *jpg or the *bmp pattern. Only the text file passes for that, so it is expanded.
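
Extended globs work for string tests as well as for filenames. The sketch below shows both; the helper names is_number and is_image are invented for the example, and the files are created in a scratch directory:

```shell
shopt -s extglob    # enable before bash parses the patterns below

# String tests with extended globs.
is_number() { [[ $1 = +([0-9]) ]]; }       # one or more digits, nothing else
is_image()  { [[ $1 = *.@(jpg|bmp) ]]; }   # ends in .jpg or .bmp

is_number 2013 && echo "2013 is a number"
is_image names.txt || echo "names.txt is not an image"

# Filename expansion with an extended glob.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch names.txt tokyo.jpg california.bmp
others=( !(*jpg|*bmp) )
echo "${others[*]}"                        # names.txt
```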

Regular Expressions

Regular expressions (regex) are similar to Glob Patterns, but they can only be used for pattern matching, not for filename matching. Since 3.0, Bash supports the =~ operator to the [[ keyword. This operator matches the string that comes before it against the regex pattern that follows it. When the string matches the pattern, [[ returns with an exit code of 0 ("true"). If the string does not match the pattern, an exit code of 1 ("false") is returned. In case the pattern's syntax is invalid, [[ will abort the operation and return an exit code of 2.
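
The three exit codes can be observed directly. In this sketch the variable names are arbitrary, and the lone opening parenthesis is used as an example of a pattern that the regex library is expected to reject:

```shell
re_ok='^[0-9]+$'
re_bad='('                 # unbalanced parenthesis: not a valid ERE

match_status=0;   [[ 123 =~ $re_ok ]]  || match_status=$?
nomatch_status=0; [[ abc =~ $re_ok ]]  || nomatch_status=$?
invalid_status=0; [[ abc =~ $re_bad ]] || invalid_status=$?

echo "$match_status $nomatch_status $invalid_status"   # 0 1 2
```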

Bash uses the Extended Regular Expression (ERE) dialect. We will not cover regexes in depth in this guide, but if you are interested in this concept, please read up on RegularExpression, or Extended Regular Expressions.

Regular Expression patterns that use capturing groups (parentheses) will have their captured strings assigned to the BASH_REMATCH variable for later retrieval.

Let's illustrate how regex can be used in Bash:

$ langRegex='(..)_(..)'
$ if [[ $LANG =~ $langRegex ]]
> then
>     echo "Your country code (ISO 3166-1-alpha-2) is ${BASH_REMATCH[2]}."
>     echo "Your language code (ISO 639-1) is ${BASH_REMATCH[1]}."
> else
>     echo "Your locale was not recognised"
> fi
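
Since the value of LANG differs from system to system, a variant with a fixed input string makes the contents of BASH_REMATCH easier to see. The sample locale and the slightly stricter regex here are assumptions of this sketch, not part of the example above:

```shell
locale='en_US.UTF-8'
re='^([a-z]{2})_([A-Z]{2})'

if [[ $locale =~ $re ]]; then
    echo "whole match: ${BASH_REMATCH[0]}"   # whole match: en_US
    echo "language:    ${BASH_REMATCH[1]}"   # language:    en
    echo "country:     ${BASH_REMATCH[2]}"   # country:     US
fi
```

Element 0 holds the text matched by the entire regex; the capturing groups are numbered from 1 in the order of their opening parentheses.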

Be aware that regex parsing in Bash has changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap your regex pattern in quotes but this has changed in 3.2. Since then, regex should always be unquoted. You should protect any special characters by escaping them with a backslash. The best way to always be compatible is to put your regex in a variable and expand that variable in [[ without quotes, as we showed above.
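
The effect of quoting can be sketched like this, assuming Bash 3.2 or later with default compatibility settings (the variable names are arbitrary):

```shell
re='a.c'    # as a regex: "a", any one character, "c"

# Unquoted expansion: the value is used as a regular expression.
if [[ abc =~ $re ]]; then unquoted="match"; else unquoted="no match"; fi

# Quoted expansion: the value is matched literally, so "." only
# matches an actual dot.
if [[ abc =~ "$re" ]]; then quoted="match"; else quoted="no match"; fi

echo "unquoted: $unquoted, quoted: $quoted"   # unquoted: match, quoted: no match
```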


  • Good Practice:
    Since the way regex is used in 3.2 is also valid in 3.1 we highly recommend you just never quote your regex. Remember to keep special characters properly escaped!

  • For cross-compatibility (to avoid having to escape parentheses, pipes and so on) use a variable to store your regex, e.g. re='^\*( >| *Applying |.*\.diff|.*\.patch)'; [[ $var =~ $re ]] This is much easier to maintain since you only write ERE syntax and avoid the need for shell-escaping, as well as being compatible with all 3.x BASH versions.

  • See also Chet Ramey's Bash FAQ, section E14.




  • Regular Expression: A regular expression is a more complex pattern that can be used to match specific strings (but unlike globs cannot expand to filenames).


Brace Expansion

Then, there is Brace Expansion. Brace Expansion technically does not fit in the category of patterns, but it is similar. Globs only expand to actual filenames, but brace expansions will expand to every possible combination of their contents, whether or not matching files exist. Here's how they work:

$ echo th{e,a}n
then than
$ echo {/home/*,/root}/.*profile
/home/axxo/.bash_profile /home/lhunath/.profile /root/.bash_profile /root/.profile
$ echo {1..9}
1 2 3 4 5 6 7 8 9
$ echo {0,1}{0..9}
00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19

The brace expansion is replaced by a list of words, just like a glob is. However, these words aren't necessarily filenames, and they are not sorted (then would have come after than if they were).
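
Capturing the generated words in arrays makes their number and order easy to check. A short sketch, with array names chosen only for this example:

```shell
words=(th{e,a}n)
echo "${words[*]}"                # then than (the order written, not sorted)

digits=({1..9})
echo "${digits[*]}"               # 1 2 3 4 5 6 7 8 9

# Two adjacent brace expressions give every pairing: 2 x 10 = 20 words.
pairs=({0,1}{0..9})
echo "${#pairs[@]}"               # 20
echo "${pairs[0]} ${pairs[19]}"   # 00 19
```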

Brace expansion happens before filename expansion. In the second echo command above, we used a combination of brace expansion and globs. The brace expansion goes first, and we get:

$ echo /home/*/.*profile /root/.*profile

After the brace expansion, the globs are expanded, and we get the filenames as the final result.

Brace expansions can only be used to generate lists of words. They cannot be used for pattern matching.
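
Both points, the ordering of the two expansions and the fact that braces are not a pattern, can be sketched in a scratch directory. The filenames are invented for the illustration:

```shell
dir=$(mktemp -d)
touch "$dir/a1.txt" "$dir/b1.txt" "$dir/c1.txt"

# Brace expansion first turns this into "$dir"/a*.txt "$dir"/b*.txt;
# only then are the two globs expanded to filenames.
files=("$dir"/{a,b}*.txt)
echo "${#files[@]}"       # 2

# Braces are not a pattern: inside [[ ]] they are literal characters.
if [[ then = th{e,a}n ]]; then brace_match=yes; else brace_match=no; fi
echo "$brace_match"       # no
```

To test a string against several alternatives, an extended glob such as th@(e|a)n or a regex is the appropriate tool.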



BashGuide/Patterns (last edited 2016-01-15 10:08:43 by google-proxy-66-249-93-205)