Differences between revisions 2 and 3
Revision 2 as of 2008-06-15 13:05:27
Size: 8352
Editor: MrIgli
Comment: [igli] Use a var for =~ cross-compatibility (*much* easier to maintain ;-)
Revision 3 as of 2008-11-22 14:08:33
Size: 8362
Editor: localhost
Comment: converted to 1.6 markup

Patterns

Patterns are strings that are used to match a whole range of strings. They have a special format depending on the pattern dialect which describes the kinds of strings that they match. Regular Expression patterns can even be used to grab certain pieces out of the strings they match.

On the command line you will mostly use Glob Patterns. They are a fairly straightforward form of pattern that can easily be used to match a range of files.

Since version 3.0, BASH also supports Regular Expression patterns. These will be useful mainly in scripts to test user input or parse data.


  • Pattern: A pattern is a string with a special format designed to be a sort of key that matches several other strings of a kind.


Glob Patterns

Globs are a very important concept in BASH, if only for their incredible convenience. Properly understanding globs will benefit you in many ways. Globs are basically patterns that can be used to match filenames or other strings.

Globs are composed of normal characters and meta characters. Meta characters are characters that have a special meaning. These are the basic meta characters:

  • *: Matches any string, including the null string.

  • ?: Matches any single character.

  • [...]: Matches any one of the enclosed characters.
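The same meta characters also work when matching plain strings rather than filenames. Here is a minimal sketch, assuming a hypothetical helper called glob_match (it is not part of BASH) that simply wraps the [[ string == pattern ]] test:

```shell
#!/usr/bin/env bash
# glob_match is a hypothetical helper for illustration; it is not a BASH built-in.
# It succeeds (exit status 0) when the string in $1 matches the glob in $2.
glob_match() {
    # The right-hand side is left unquoted so that it is treated as a pattern.
    [[ $1 == $2 ]]
}

glob_match "abc" "a*"    && echo "a* matches abc"        # * matches "bc"
glob_match "a"   "a*"    && echo "a* matches a"          # * also matches the null string
glob_match "ab"  "a?"    && echo "a? matches ab"         # ? matches exactly one character
glob_match "abc" "a?"    || echo "a? does not match abc" # two characters follow the a
glob_match "b"   "[abc]" && echo "[abc] matches b"       # any one of a, b or c
```

Note that the pattern has to match the whole string, which is why a? rejects abc.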

Here's an example of how we can use glob patterns to expand to filenames:

    $ ls
    a  abc  b  c
    $ echo *
    a abc b c
    $ echo a*
    a abc

BASH sees the glob, for example a*. It expands this glob by looking in the current directory and matching it against all the filenames there. Any filenames that match the glob are enumerated and used in place of the glob. As a result, the statement echo a* is replaced by the statement echo a abc, which is then executed.

BASH passes each filename that a glob expands to as a single argument, even when the name contains whitespace or special characters; in effect, the names behave as if they had been escaped properly. For example:

    $ touch "a b.txt"
    $ ls
    a b.txt
    $ rm *
    $ ls

Here, rm * is expanded into rm a\ b.txt. This makes sure that the string a b.txt is passed as a single argument to rm, since it represents a single file. It is important to understand that using globs to enumerate files is nearly always a better idea than using ls for that purpose. Here's an example with some more complex syntax which we will cover later on, but it will illustrate the problem very well:

    $ ls
    a b.txt
    $ for file in `ls`; do rm "$file"; done
    rm: cannot remove `a': No such file or directory
    rm: cannot remove `b.txt': No such file or directory
    $ for file in *; do rm "$file"; done
    $ ls

Here we use the for command to go through the output of the ls command. The ls command prints the string a b.txt. Because that output is not quoted, BASH splits it on whitespace into separate words, so for iterates over a and b.txt. Naturally, this is not what we want. The glob, however, expands properly: it results in the single word a\ b.txt, which for takes as one argument.
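The difference can also be counted. The following sketch assumes a throw-away directory created with mktemp and the default word-splitting settings; the variable names are made up for illustration:

```shell
#!/usr/bin/env bash
# Work in a throw-away directory so the example does not touch real files.
orig_dir=$PWD
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch "a b.txt"

# Word-splitting the output of ls yields two words: "a" and "b.txt".
ls_count=0
for file in $(ls); do
    ls_count=$((ls_count + 1))
done

# The glob yields one word per filename, whitespace included.
glob_count=0
for file in *; do
    glob_count=$((glob_count + 1))
done

echo "ls gave $ls_count words, the glob gave $glob_count filename"

# Clean up and return to where we started.
cd "$orig_dir" && rm -rf "$dir"
```

There is only one file in the directory, so only the glob loop counts correctly.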

BASH also supports a feature called Extended Globs. These globs are more powerful in nature. This feature is turned off by default, but can be turned on with the shopt command, which is used to toggle shell options:

    $ shopt -s extglob

  • ?(list): Matches zero or one occurrence of the given patterns.

  • *(list): Matches zero or more occurrences of the given patterns.

  • +(list): Matches one or more occurrences of the given patterns.

  • @(list): Matches one of the given patterns.

  • !(list): Matches anything except one of the given patterns.

The list inside the parentheses is a list of globs separated by the | character. Here's an example:

    $ ls
    names.txt  tokyo.jpg  california.bmp
    $ echo !(*jpg|*bmp)
    names.txt

Our glob now expands to every filename that does not match the *jpg or the *bmp pattern. Only the text file qualifies, so its name is the result of the expansion.
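Extended globs can be used for string matching as well. This is a small sketch rather than a definitive recipe: the variable names are invented, and the patterns are stored in variables so that they are only interpreted when the match runs, after extglob has been switched on:

```shell
#!/usr/bin/env bash
shopt -s extglob   # enable extended globs before they are used

# Hypothetical pattern variables, for illustration only.
not_image='!(*jpg|*bmp)'   # anything that does not end in jpg or bmp
is_number='+([0-9])'       # one or more digits and nothing else

for name in names.txt tokyo.jpg california.bmp; do
    if [[ $name == $not_image ]]; then
        echo "$name is not an image"
    fi
done

[[ 2008 == $is_number ]] && echo "2008 consists only of digits"
[[ 20x8 == $is_number ]] || echo "20x8 does not"
```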

Then, there is Brace Expansion. Brace Expansion technically does not fit in the category of Globs, but it is similar. Globs only expand to actual filenames, whereas brace expansion expands to every possible combination of the pattern, whether or not files with those names exist. Here's how it works:

    $ echo th{e,a}n
    then than
    $ echo {/home/*,/root}/.*profile
    /home/axxo/.bash_profile /home/lhunath/.profile /root/.bash_profile /root/.profile
    $ echo {1..9}
    1 2 3 4 5 6 7 8 9
    $ echo {0,1}{0..9}
    00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19
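Because brace expansion does not look at the filesystem, it is handy for generating names that do not exist yet. A minimal sketch, using made-up file names and an array to hold the result:

```shell
#!/usr/bin/env bash
# Brace expansion produces every combination, whether or not such files exist.
# The names below are made up for illustration.
names=( photo_{1..3}.{jpg,png} )

echo "${#names[@]} names generated"
printf '%s\n' "${names[@]}"
```

Three numbers combined with two extensions give six names, starting with photo_1.jpg and ending with photo_3.png.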


  • Good Practice:
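    You should always use globs instead of ls (or similar) to enumerate files. Globs will always expand safely and minimize the risk of bugs.
    You can sometimes end up with some very weird filenames. Generally speaking, scripts aren't always tested against all the odd cases that they may end up being used with.

  • In The Manual: Pattern Matching (http://www.gnu.org/software/bash/manual/bashref.html#SEC35)

  • In the FAQ:
    How can I use a logical AND in a shell pattern (glob)? (http://wooledge.org/mywiki/BashFAQ/016)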




  • Glob: A glob is a string composed of glob meta characters that can match certain strings or filenames.


Regular Expressions

Regular Expressions (regex) are similar to Glob Patterns but cannot be used for filename matching in BASH. Since 3.0, BASH supports the =~ operator of the [[ keyword. This operator matches the string that comes before it against the regex pattern that follows it. When the string matches the pattern, [[ returns with an exit code of 0 ("true"). If the string does not match the pattern, an exit code of 1 ("false") is returned. If the pattern's syntax is invalid, [[ aborts the operation and returns an exit code of 2.
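The three exit codes can be observed directly. This sketch uses invented variable names and keeps each pattern in a variable, so that the deliberately broken one is only seen by the regex engine at run time:

```shell
#!/usr/bin/env bash
# Hypothetical patterns for illustration.
good_re='^[0-9]+$'   # one or more digits
bad_re='('           # unbalanced parenthesis: not a valid ERE

# Record the exit code of each test; "|| var=$?" only runs when the test fails.
match_status=0;   [[ 123 =~ $good_re ]] || match_status=$?
nomatch_status=0; [[ abc =~ $good_re ]] || nomatch_status=$?
invalid_status=0; [[ abc =~ $bad_re ]]  || invalid_status=$?

echo "match: $match_status, no match: $nomatch_status, invalid: $invalid_status"
```

A script that accepts patterns from the user can check for exit code 2 to tell a broken pattern apart from a plain mismatch.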

BASH uses the Extended Regular Expression (ERE) dialect. I will not teach you about regex in this guide, but if you are interested in this concept, please read up on Extended Regular Expressions (http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html#tag_09_04) or Google for a tutorial.

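Regular Expression patterns that use capturing groups (parentheses) will have their captured strings assigned to the BASH_REMATCH array for later retrieval: element 0 holds the text matched by the whole pattern, and element n holds the text matched by the nth group.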

Let's illustrate how regex can be used in BASH:

    $ if [[ $LANG =~ (..)_(..) ]]
    > then echo "You live in ${BASH_REMATCH[2]} and speak ${BASH_REMATCH[1]}."
    > else echo "Your locale was not recognised"
    > fi
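To make the role of each BASH_REMATCH element visible, here is a sketch that uses a fixed, made-up locale string instead of $LANG; the variable names are chosen for illustration only:

```shell
#!/usr/bin/env bash
# A made-up locale string; on a real system this would come from $LANG.
locale='en_GB.UTF-8'
re='^([a-z]+)_([A-Z]+)'

if [[ $locale =~ $re ]]; then
    whole=${BASH_REMATCH[0]}       # the entire matched text
    language=${BASH_REMATCH[1]}    # first capture group
    territory=${BASH_REMATCH[2]}   # second capture group
    echo "matched '$whole': language=$language territory=$territory"
else
    echo "no match"
fi
```

The .UTF-8 suffix is not part of the match, because the pattern is only anchored at the start of the string.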

Be aware that regex parsing in BASH changed between releases 3.1 and 3.2. Before 3.2 it was safe to wrap your regex pattern in quotes, but since 3.2 a quoted pattern is matched as a literal string. A regex should therefore always be left unquoted, and any special characters should be protected by escaping them with a backslash.

    $ [[ "My sentence" =~ My\ sentence ]]

Be careful to escape any characters that the shell could misinterpret, such as whitespace, dollar signs followed by text, braces, etc.


  • Good Practice:
    Since the way regex is used in 3.2 is also valid in 3.1, we highly recommend you just never quote your regex. Remember to keep special characters properly escaped!

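  • For cross-compatibility (to avoid having to escape parentheses, pipes and so on), store your regex in a variable, e.g. re='^\*( >| *Applying |.*\.diff|.*\.patch)' and then test with [[ $var =~ $re ]]. This is much easier to maintain, since you only write ERE syntax and avoid the need for shell-escaping, and it is compatible with all 3.x versions of BASH.

  • In The Manual: Regex(3) (http://www.daemon-systems.org/man/regex.3.html)

  • In the FAQ:
    I want to check if [[ $var == foo || $var == bar || $var == more ... without repeating $var n times. (http://wooledge.org/mywiki/BashFAQ/066)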
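The advice to keep the regex in a variable can be tried out with a short sketch. The regex is the one from the tip; the sample lines are invented for illustration:

```shell
#!/usr/bin/env bash
# The regex is written once, in plain ERE syntax, inside single quotes.
re='^\*( >| *Applying |.*\.diff|.*\.patch)'

# The sample lines are invented for illustration.
for line in '* Applying fix.patch ...' '*notes.diff' 'fix.patch'; do
    # $re is left unquoted so that it is used as a regex, not a literal string.
    if [[ $line =~ $re ]]; then
        echo "match:    $line"
    else
        echo "no match: $line"
    fi
done
```

The last line does not match because it lacks the leading asterisk that the pattern requires.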




  • Regular Expression: A regular expression is a more complex pattern that can be used to match specific strings (but unlike globs cannot expand to filenames).


BashGuide/Patterns (last edited 2016-01-15 10:08:43 by google-proxy-66-249-93-205)