How do I determine whether a variable contains a substring?

There are many choices here: you can perform an exact substring match, a glob-style pattern match, or a regular expression match.

To match exact substrings, POSIX sh uses case:

# POSIX
case $bigvar in
    *substr*) ... ;;
esac

If the substring is in a variable, and if an exact substring match is wanted, then the substring variable should be quoted:

# POSIX
case $bigvar in
    *"$substr"*) ... ;;
esac
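The quoted form can be wrapped in a small helper. This is only a minimal sketch; the function name contains is a hypothetical name chosen for illustration, not something the shell provides.

```shell
# POSIX
# contains STRING SUBSTRING: succeed if SUBSTRING occurs literally in STRING.
# (Hypothetical helper name, for illustration only.)
contains() {
    case $1 in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

bigvar='price: 5*3 dollars'
substr='5*3'
# Because "$2" is quoted inside the pattern, the * in $substr is literal.
if contains "$bigvar" "$substr"; then
    echo "found"
fi
```

Because the substring is quoted inside the pattern, glob characters in it (such as the * above) are matched literally rather than as wildcards.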

In Bash, you may also use the [[...]] construct. It follows the same quoting semantics as case:

# Bash
if [[ $bigvar = *substr* ]]; then ...     # These are
if [[ $bigvar == *substr* ]]; then ...    # equivalent.

if [[ $bigvar = *"$substr"* ]]; then ...  # These are
if [[ $bigvar == *"$substr"* ]]; then ... # equivalent.

In both case and [[...]] you may also do glob-style pattern matching. Simply use unquoted glob characters in the pattern. If the pattern is in a variable, omit the double quotes, and it will be interpreted as a pattern instead of an exact substring.

# POSIX
case $filename in
    *.txt) ... ;;
esac

pattern='*.txt'
case $filename in
    $pattern) ... ;;
esac

# Bash
if [[ $filename = *.txt ]]; then ...
if [[ $filename == *.txt ]]; then ...

pattern='*.txt'
if [[ $filename = $pattern ]]; then ...
if [[ $filename == $pattern ]]; then ...
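The difference between a quoted and an unquoted pattern variable can be seen side by side. The sketch below uses case so that it runs in any POSIX shell; the variable names glob_result and literal_result are made up for this illustration.

```shell
# POSIX
filename='notes.txt'
pattern='*.txt'

# Unquoted: $pattern is interpreted as a glob, so *.txt matches notes.txt.
case $filename in
    $pattern) glob_result=match ;;
    *)        glob_result=nomatch ;;
esac

# Quoted: "$pattern" is the literal string *.txt, which notes.txt is not.
case $filename in
    "$pattern") literal_result=match ;;
    *)          literal_result=nomatch ;;
esac

echo "unquoted: $glob_result, quoted: $literal_result"
```

The same distinction applies to [[ $filename = $pattern ]] versus [[ $filename = "$pattern" ]] in Bash.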

Since Bash 4.1, ksh88 extended glob operators are recognised in [[...]] even when the extglob option is not enabled.
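As a rough illustration, assuming Bash 4.1 or later, the following sketch turns extglob off explicitly and still uses an @(...) operator inside [[...]]. The file name and the variable kind are made up for the example.

```shell
# Bash (4.1 or later assumed)
shopt -u extglob          # make sure the option is off

filename='photo.jpeg'
# @(a|b|c) matches exactly one of the alternatives.
if [[ $filename == *.@(jpg|jpeg|png) ]]; then
    kind=image
else
    kind=other
fi
echo "$kind"
```

Outside [[...]], for example in case or in pathname expansion, the extglob option still has to be enabled for these operators to be recognised.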

In Bash 3.x or later, [[...]] can also do Extended Regular Expression (ERE) matches using the =~ operator:

# Bash
# Matches ac, zabcz, xabbbbcq, etc.
re='ab*c'
if [[ $foo =~ $re ]]; then ...

Storing the regular expression in a variable and using =~ $variable (where $variable is not quoted) is strongly recommended, as it avoids many undesirable surprises.
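One such surprise is that quoting the right-hand side of =~ makes Bash treat it as a literal string rather than as a regular expression. The sketch below assumes Bash 3.2 or later with default compatibility settings; the variable names unquoted and quoted are made up for illustration.

```shell
# Bash (3.2 or later behaviour assumed)
foo='xabbbbcq'
re='ab*c'

# Unquoted variable: treated as an ERE, so ab*c matches abbbbc.
if [[ $foo =~ $re ]]; then unquoted=match; else unquoted=nomatch; fi

# Quoted variable: treated as the literal string ab*c, which does not occur in $foo.
if [[ $foo =~ "$re" ]]; then quoted=match; else quoted=nomatch; fi

echo "unquoted: $unquoted, quoted: $quoted"
```

Keeping the regular expression in a variable also means it is quoted only once, at assignment time, so spaces and backslashes in it need no extra escaping inside [[...]].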

POSIX sh has no built-in regular expression matching operator, but you can call standard utilities such as awk, expr or grep to do it. A shell may implement these utilities as builtins; Bash does not.

# POSIX
ere_match() { awk -- 'BEGIN{exit !(ARGV[1] ~ ARGV[2])}' "$@"; }
if ere_match "$foo" "$re"; then ...

# With expr, the regex is implicitly anchored at the start; a leading .* works
# around this. We also prefix both the subject and the regex with a character
# (here @) that is not - and does not begin any expr operator, present or future.
if expr "@$foo" : "@.*$re" > /dev/null; then ...

# grep can only be used for single-line strings, but can do case-insensitive matching
# with -i. Use -x to anchor at both start and end, or use the usual ^ and $.
if printf '%s\n' "$foo" | grep -q -- "$re"; then ...

expr uses Basic Regular Expressions (BRE). grep defaults to BRE, but uses ERE with the -E option. awk uses an ERE variant that also recognises some C-like escape sequences such as \n (at least in some awk implementations when the regexp is not a literal, as is the case here).
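The BRE/ERE distinction matters as soon as the expression uses operators such as +. The following sketch feeds the same expression to grep in both modes; the sample string and the variable names bre and ere are made up for illustration.

```shell
# POSIX
foo='caaat'

# BRE (grep default): + is an ordinary character, so a+t looks for a literal "a+t".
if printf '%s\n' "$foo" | grep -q -- 'a+t'; then bre=match; else bre=nomatch; fi

# ERE (grep -E): + means "one or more", so a+t matches aaat.
if printf '%s\n' "$foo" | grep -Eq -- 'a+t'; then ere=match; else ere=nomatch; fi

echo "BRE: $bre, ERE: $ere"
```

When an expression written for =~ or awk is reused with expr or plain grep, it may therefore need to be rewritten for BRE, or grep -E should be used instead.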

For more hints on string manipulations in Bash, see FAQ #100.



BashFAQ/041 (last edited 2025-04-19 15:51:39 by StephaneChazelas)