Diff for "BashFAQ/054"

Differences between revisions 19 and 50 (spanning 31 versions)

How can I tell whether a variable contains a valid number?

First, you have to define what you mean by "number". When people ask this, they most often mean "a non-negative integer, with no leading + sign", in other words, a string of all digits. Other times, people want to validate floating-point input, with an optional sign and an optional decimal point.

Hand parsing

If you're validating a simple "string of digits", you can do it with a glob:

# Bash / Ksh
if [[ -n $foo && $foo != *[!0123456789]* ]]; then
    printf '"%s" is strictly numeric\n' "$foo"
else
    printf '"%s" has a non-digit somewhere in it or is empty\n' "$foo"
fi >&2

Avoid [0-9] and [[:digit:]]. In some locales and on some systems they can match characters other than 0123456789.

The same thing can be done in POSIX shells as well, using case:

# POSIX
case $var in
    '')
        printf 'var is empty\n';;
    *[!0123456789]*)
        printf '%s has a non-digit somewhere in it\n' "$var";;
    *)
        printf '%s is strictly numeric\n' "$var";;
esac >&2

Of course, if all you care about is valid vs. invalid, you can combine cases:

# POSIX
case $var in
    '' | *[!0123456789]*)
        printf '%s\n' "$0: $var: invalid digit" >&2; exit 1;;
esac
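
If you want a yes/no answer rather than an exit, the same check fits in a small function that just returns a status. This is a sketch. is_uint is a hypothetical name, not a standard command.

```shell
# POSIX
# Sketch: succeed only if $1 is a non-empty string of ASCII digits.
# is_uint is a hypothetical helper name.
is_uint() {
    case $1 in
        '' | *[!0123456789]*) return 1;;
    esac
    return 0
}

for arg in 42 007 '' -5 4x2; do
    if is_uint "$arg"; then
        printf '"%s" is all digits\n' "$arg"
    else
        printf '"%s" is not\n' "$arg"
    fi
done
```

Here 42 and 007 are accepted, while the empty string, -5 and 4x2 are rejected. Note that 007 passes. A digit-string check says nothing about leading zeros, and those can matter if the value later goes into $(( )), where a leading 0 can mean octal.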

If you need to allow a leading sign, or if you want a valid floating-point number or something more complex, then there are a few possible ways. Standard globs aren't expressive enough to do this, but you can trim off any sign and then compare:

# POSIX
case ${var#[-+]} in   # notice ${var#prefix} substitution to trim sign
    '')
        printf 'var is empty or only a sign\n';;
    .)
        printf '"%s" has no digits, only a dot\n' "$var";;
    *.*.*)
        printf '"%s" has more than one decimal point in it\n' "$var";;
    *[!0123456789.]*)
        printf '"%s" has a non-digit somewhere in it\n' "$var";;
    *)
        printf '"%s" looks like a valid float\n' "$var";;
esac >&2
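
Wrapped in a function, the trim-the-sign approach might look like the following sketch. is_float is a hypothetical name. It accepts exactly the strings that reach the last branch of the case above.

```shell
# POSIX
# Sketch: optional leading sign, digits, at most one decimal point,
# and at least one character other than the sign and the dot.
# is_float is a hypothetical helper name.
is_float() {
    case ${1#[-+]} in          # drop one leading sign, if any
        '' | . | *.*.* | *[!0123456789.]*) return 1;;
    esac
    return 0
}

for arg in 3.14 -2 +.5 7. . - 1.2.3 1e5 ''; do
    if is_float "$arg"; then
        printf '"%s": accepted\n' "$arg"
    else
        printf '"%s": rejected\n' "$arg"
    fi
done
```

This definition accepts 7. and +.5 but rejects exponent notation such as 1e5.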

Or in Bash, we can use extended globs:

# Bash -- extended globs must be enabled explicitly in versions prior to 4.1.
# Check whether the variable is all digits.
shopt -s extglob
[[ $var = +([0123456789]) ]]
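
For the leading-sign case mentioned earlier, a single extended glob can express "optional sign, then one or more digits". Here is a minimal Bash sketch. is_int_glob is a hypothetical name.

```shell
# Bash
# Enable extglob before the function below is parsed (required before Bash 4.1).
shopt -s extglob

# Sketch: optional + or -, then at least one ASCII digit.
# is_int_glob is a hypothetical helper name.
is_int_glob() {
    [[ $1 = ?([-+])+([0123456789]) ]]
}

for arg in 42 -17 +3 - '' 1.5; do
    if is_int_glob "$arg"; then
        printf '"%s": integer\n' "$arg"
    else
        printf '"%s": not an integer\n' "$arg"
    fi
done
```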

A more complex case:

# Bash / ksh
shopt -s extglob   # Bash only; ksh has extended patterns enabled by default

if [[ $foo = *[0123456789]* && $foo = ?([+-])*([0123456789])?(.*([0123456789])) ]]; then
  echo 'foo is a floating-point number'
fi

Alternatively, case...esac can be used in shells with extended pattern matching. The leading test of $foo ensures that it contains at least one digit. Without that test, the second pattern alone would also match an empty string, a lone + or -, or a lone dot.
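
Here is a minimal sketch of the case...esac form, in Bash with extglob. In ksh93, drop the shopt line, because extended patterns are always available there. The sketch first requires at least one digit, then applies the same extended pattern used as the second test in the [[ example above. is_float_ext is a hypothetical name.

```shell
# Bash (in ksh93, omit the shopt line)
shopt -s extglob

# Sketch: at least one digit, then optional sign, digits, optional .digits.
# is_float_ext is a hypothetical helper name.
is_float_ext() {
    case $1 in
        *[0123456789]*) ;;          # contains a digit: keep checking
        *) return 1;;
    esac
    case $1 in
        # the final ")" closes the case pattern, not the glob
        ?([+-])*([0123456789])?(.*([0123456789]))) return 0;;
        *) return 1;;
    esac
}
```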

If your definition of "a valid number" is even more complex, or if you need a solution that works in legacy Bourne shells, you might prefer to use an external tool's regular expression syntax. Here is a portable version using awk (explained in detail at http://www.wplug.org/wiki/Meeting-20100612#EXERCISE_TWO). Do not use egrep for this. It is line-based, so it would be fooled by variables that contain newline characters:

# Bourne
 
if awk -- 'BEGIN {exit !(ARGV[1] ~ /^[-+]?([0123456789]+\.?|[0123456789]*\.[0123456789]+)$/)}' "$foo"; then
    printf '"%s" is a number\n' "$foo"
else
    printf '"%s" is not a number\n' "$foo"
fi
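
The awk test can be wrapped in a function as well. The sketch below also checks the newline case that rules out egrep. In awk, ^ and $ anchor to the start and end of the whole string, so a value with an embedded newline is rejected. is_number_awk is a hypothetical name.

```shell
# POSIX
# Sketch: the awk test above, wrapped in a function.
# is_number_awk is a hypothetical helper name.
is_number_awk() {
    awk -- 'BEGIN {exit !(ARGV[1] ~ /^[-+]?([0123456789]+\.?|[0123456789]*\.[0123456789]+)$/)}' "$1"
}

# A value containing a newline: each line looks numeric on its own,
# but the whole string is not a number.
nl='
'
if is_number_awk "12${nl}34"; then
    echo 'newline case accepted'
else
    echo 'newline case rejected'
fi
```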

Bash version 3 and above have regular expression support in the [[...]] construct.

# Bash
# The regexp must be stored in a var and expanded for backward compatibility with versions < 3.2

regexp='^[-+]?[0123456789]*(\.[0123456789]*)?$'
if [[ $foo = *[0123456789]* && $foo =~ $regexp ]]; then
    printf '"%s" looks rather like a number\n' "$foo"
else
    printf '"%s" does not look particularly numeric to me\n' "$foo"
fi
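
If you would rather not pair the regular expression with a separate glob, one option is to reuse the stricter expression from the awk example. That expression already requires at least one digit. This is a sketch, not a replacement for the example above. is_number_re is a hypothetical name.

```shell
# Bash
# Sketch: a single =~ test using the regex from the awk example.
# Keep the regex in a variable and leave $regexp unquoted after =~.
regexp='^[-+]?([0123456789]+\.?|[0123456789]*\.[0123456789]+)$'

# is_number_re is a hypothetical helper name.
is_number_re() {
    [[ $1 =~ $regexp ]]
}

for arg in 3.14 -5 .5 5. . + '' abc; do
    if is_number_re "$arg"; then
        printf '"%s": number\n' "$arg"
    else
        printf '"%s": not a number\n' "$arg"
    fi
done
```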

Using the parsing done by [ and printf (or "using eq")

# fails with ksh
if [ "$foo" -eq "$foo" ] 2>/dev/null; then
 echo "$foo is an integer"
fi

[ parses the variable and interprets it as a decimal integer because of the -eq. If the parsing succeeds, the test is trivially true. If it fails, [ prints an error message (hidden by 2>/dev/null) and returns a non-zero exit status. However, this method fails if the shell is ksh, because ksh evaluates the variable as an arithmetic expression (which would constitute an arbitrary command injection vulnerability).
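
As a function, the -eq check might look like the following sketch. It is meant for sh and Bash, not ksh, for the reason just given. is_int_test is a hypothetical name.

```shell
# POSIX sh / Bash (not ksh)
# Sketch: let [ parse the value; it errors out on a non-integer.
# is_int_test is a hypothetical helper name.
is_int_test() {
    [ "$1" -eq "$1" ] 2>/dev/null
}

for arg in 42 -7 4x2 ''; do
    if is_int_test "$arg"; then
        printf '"%s": integer\n' "$arg"
    else
        printf '"%s": not an integer\n' "$arg"
    fi
done
```

What else [ accepts varies between implementations. Bash's [, for instance, also accepts leading blanks, so combine this with one of the glob checks above if that matters.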

Be careful with the following trick using printf. It is not supported by all shells, the list of accepted float representations varies from shell to shell, and in ksh or zsh it is a command injection vulnerability:

if printf %f "$foo" >/dev/null 2>&1; then
  printf '"%s" is a float\n' "$foo"
fi

The trick is also broken. For the arguments of the a, A, e, E, f, F, g, or G conversions, POSIX specifies that if the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote. Hence the check is fooled when foo expands to a string with a leading single-quote or double-quote: the command above will happily validate such a string as a float. It also returns 0 when foo expands to a number with a leading 0x, which is a valid number in a shell script but may not be accepted elsewhere.

You can use %d to parse an integer in the same way, with similar caveats. POSIX applies the leading-quote rule to the integer conversions as well, and the parsing may be locale-dependent.
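
The following sketch uses %d and assumes that only the exit status matters. is_int_printf is a hypothetical name. POSIX says integer arguments are read like C integer constants, so common implementations such as Bash and dash also accept hexadecimal (0x1A) and octal (010) forms. The leading-quote rule also lets "'a" through as 97. Bash's printf converts an empty argument to 0 without an error, which is why the sketch rejects that case up front.

```shell
# POSIX
# Sketch: succeed if printf %d can convert $1 without an error.
# is_int_printf is a hypothetical helper name.
is_int_printf() {
    # Reject the empty string first: Bash's printf turns '' into 0 silently.
    [ -n "$1" ] || return 1
    printf %d "$1" >/dev/null 2>&1
}

for arg in 42 -7 4x2 '' 0x1A "'a"; do
    if is_int_printf "$arg"; then
        printf '%s: accepted\n' "$arg"
    else
        printf '%s: rejected\n' "$arg"
    fi
done
```

With this sketch, 0x1A and 'a are accepted even though they are not decimal digit strings.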

-  Revision 19 as of 2010-01-15 23:59:45
  Size: 6033
  Editor: GreyCat
  Comment: restore some deleted content; clean up the previous "clean-up"
+   Revision 50 as of 2020-10-15 10:20:40
  Size: 5521
  Editor: StephaneChazelas
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 9:
-# Bash
if [[ $foo = *[^0-9]* ]]; then
    echo "'$foo' has a non-digit somewhere in it"
+# Bash / Ksh
if [[ -n $foo && $foo != *[!0123456789]* ]]; then
    printf '"%s" is strictly numeric\n' "$foo"
 Line 13:
-    echo "'$foo' is strictly numeric"
fi
+    printf '"%s" has a non-digit somewhere in it or is empty\n' "$foo"
fi >&2
 Line 17:
-The same thing can be done in Korn and POSIX shells as well, using {{{case}}}:
+Avoid `[0-9]` or `[[:digit:]]` which in some locales and some systems can match characters other than 0123456789.

The same thing can be done in POSIX shells as well, using {{{case}}}:
-Line 20:
+Line 22:
-# ksh, POSIX
case "$foo" in
    *[!0-9]*) echo "'$foo' has a non-digit somewhere in it" ;;
    *) echo "'$foo' is strictly numeric" ;;
+# POSIX
case $var in
    '')
        printf 'var is empty\n';;
    *[!0123456789]*)
        printf '%s has a non-digit somewhere in it\n' "$var";;
    *)
        printf '%s is strictly numeric\n' "$var";;
esac >&2
}}}
Of course, if all you care about is valid vs. invalid, you can combine cases:

{{{
# POSIX
case $var in
    '' | *[!0123456789]*)
        printf '%s\n' "$0: $var: invalid digit" >&2; exit 1;;
-Line 26:
+Line 41:
-If you need to allow a leading negative sign, or if want a valid floating-point number or something else more complex, then there are a few possible ways.  Standard globs aren't expressive enough to do this, but we can use [[glob|extended globs]]:
+If you need to allow a leading negative sign, or if want a valid floating-point number or something else more complex, then there are a few possible ways.  Standard globs aren't expressive enough to do this, but you can trim off any sign and then compare:
-Line 30:
+Line 44:
-# Bash -- extended globs must be enabled.
+# POSIX
case ${var#[-+]} in   # notice ${var#prefix} substitution to trim sign
    '')
        printf 'var is empty\n';;
    .)
        printf 'var is just a dot\n';;
    *.*.*)
        printf '"%s" has more than one decimal point in it\n' "$var";;
    *[!0123456789.]*)
        printf '"%s" has a non-digit somewhere in it\n' "$var";;
    *)
        printf '"%s" looks like a valid float\n' "$var";;
esac >&2
}}}
Or in Bash, we can use [[glob|extended globs]]:

{{{
# Bash -- extended globs must be enabled explicitly in versions prior to 4.1.
-Line 33:
+Line 64:
-[[ $var == +([0-9]) ]]
+[[ $var = +([0123456789]) ]]
-Line 35:
+Line 66:
-Line 39:
+Line 69:
-# Bash
+# Bash / ksh
-Line 41:
+Line 71:
-[[ $foo = *[0-9]* && $foo = ?([+-])*([0-9])?(.*([0-9])) ]] &&
  echo "foo is a floating-point number"
+if [[ $foo = @(*[0123456789]*|!([+-]|)) && $foo = ?([+-])*([0123456789])?(.*([0123456789])) ]]; then
  echo 'foo is a floating-point number'
fi
-Line 44:
+Line 76:
+Optionally, `case..esac` may have been used in shells with extended pattern matching. The leading test of {{{$foo}}} is to ensure that it contains at least one digit, isn't empty, and contains more than just + or - by itself.
-Line 45:
+Line 78:
-The leading test of {{{$foo}}} is to ensure that it contains at least one digit.  The extended glob, by itself, would match the empty string, or a lone {{{+}}} or {{{-}}}, which may not be desirable behavior.

Korn shell has extended globs enabled by default, but lacks `[[`, so we must use `case` to do the glob-matching:

{{{
# Korn
case $foo in
  *[0-9]*)
    case $foo in
        ?([+-])*([0-9])?(.*([0-9]))) echo "foo is a number";;
    esac;;
esac
}}}

Note that this uses the same extended glob as the Bash example before it; the third closing parenthesis at the end of it is actually part of the case syntax.

If your definition of "a valid number" is even more complex, or if you need a solution that works in legacy Bourne shells, you might prefer to use an external tool's [[RegularExpression|regular expression]] syntax.  Here is a portable version, using {{{egrep}}}:
+If your definition of "a valid number" is even more complex, or if you need a solution that works in legacy Bourne shells, you might prefer to use an external tool's [[RegularExpression|regular expression]] syntax.  Here is a portable version (explained in detail [[http://www.wplug.org/wiki/Meeting-20100612#EXERCISE_TWO|here]]), using {{{awk}}} (not `egrep` which is line-based so would be tricked by variables that contain newline characters):
-Line 65:
+Line 82:
-if test "$foo" && echo "$foo" | egrep '^[-+]?[0-9]*(\.[0-9]*)?$' >/dev/null
then
    echo "'$foo' might be a number"
+  if awk -- 'BEGIN {exit !(ARGV[1] ~ /^[-+]?([0123456789]+\.?|[0123456789]*\.[0123456789]+)$/)}' "$foo"; then
    printf '"%s" is a number\n' "$foo"
-Line 69:
+Line 86:
-    echo "'$foo' might not be a number"
+    printf '"%s" is not a number\n' "$foo"
-Line 72:
+Line 89:
-(Like the extended globs, this [[RegularExpression|extended regular expression]] will match a lone {{{+}}} or {{{-}}}.  The initial {{{test}}} command only requires a non-empty string.  Closing the last "bug" is left as an exercise for the reader, mostly because GreyCat is too damned lazy to learn {{{expr(1)}}}.)

Bash version 3 and above have regular expression support in the [[ command.  Due to bugs and changes in the implementation of the `=~` feature throughout bash 3.x, we '''do not recommend''' using it, but people do it anyway, so we have to maintain this example (''and keep restoring this warning, too, when people delete it''):
+Bash version 3 and above have regular expression support in the `[[...]]` construct.
-Line 79:
+Line 93:
-# Put the RE in a var for backward compatibility with versions <3.2
regexp='^[-+]?[0-9]*(\.[0-9]*)?$' 
if [[ $foo = *[0-9]* && $foo =~ $var ]]; then
    echo "'$foo' looks rather like a number"
+# The regexp must be stored in a var and expanded for backward compatibility with versions < 3.2

regexp='^[-+]?[0123456789]*(\.[0123456789]*)?$'
if [[ $foo = *[0123456789]* && $foo =~ $regexp ]]; then
    printf '"%s" looks rather like a number\n' "$foo"
-Line 84:
+Line 99:
-    echo "'$foo' doesn't look particularly numeric to me"
+    printf '"%s" doesn't look particularly numeric to me\n' "$foo"
-Line 87:
+Line 102:
+=== Using the parsing done by [ and printf (or "using eq") ===
{{{
# fails with ksh
if [ "$foo" -eq "$foo" ] 2>/dev/null; then
 echo "$foo is an integer"
fi
}}}
`[` parses the variable and interprets it a decimal integer because of the `-eq`. If the parsing succeeds the test is trivially true; if it fails `[` prints an error message that `2>/dev/null` hides and sets a status different from 0.  However this method fails if the shell is ksh, because ksh evaluates the variable as an arithmetic expression (and that would constitute an arbitrary command injection vulnerability).
-Line 88:
+Line 111:
-=== Using the parsing done by [ and printf ===
+Be careful: the following trick with `printf` (not supported by all shells, and the list of supported float representations varies with the shell as well; not to mention the command injection vulnerability in ksh or zsh)
-Line 91:
+Line 114:
-# fails with ksh
if [ "$foo" -eq "$foo" ] 2>/dev/null;then
 echo "$foo is an integer"
fi
}}} 

`[` parses the variable and interprets it as in integer because of the `-eq`. If the parsing succeds the test is trivially true; if it fails `[` prints an error message that `2>/dev/null` hides and sets a status different from 0.  However this method fails if the shell is ksh, because ksh evaluates the variable as an arithmetic expression.

You can use a similar trick with `printf`:
{{{
# POSIX
if printf "%f" "$foo" >/dev/null 2>&1; then
  echo "$foo is a float"
+if printf %f "$foo" >/dev/null 2>&1; then
  printf '"%s" is a float\n' "$foo"
-Line 106:
+Line 118:
+is broken: about the arguments of the{{{ a}}}, {{{A}}}, {{{e}}}, {{{E}}}, {{{f}}}, {{{F}}}, {{{g}}}, or {{{G}}} format modifiers, POSIX specifies that ''if the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote.'' Hence this fails when {{{foo}}} expands to a string with a leading single-quote or double-quote: the previous command will happily validate the string as a float.
It also returns 0 when {{{foo}}} expands to a number with a leading {{{0x}}}, which is a valid number in a shell script but may not work elsewhere.
-Line 108:
+Line 122:
-=== Using the integer type ===

If you just want to guarantee ahead of time that a variable contains an integer, without actually checking, you can give the variable the "integer" attribute.

{{{
# Bash
declare -i foo
foo=-10+1; echo "$foo"    # prints -9

foo="hello"; echo "$foo"
# the value of the variable "hello" is evaluated; if unset, foo is 0

foo="Some random string"  # results in an error.
}}}

Any value assigned to a variable with the integer attribute set is evaluated as an [[ArithmeticExpression|arithmetic expression]] just like inside `$(( ))`.  Bash will raise an error if you try to assign an invalid arithmetic expression.

In Bash and ksh93, if a variable which has been declared integer is used in a `read` command, the user's input is treated as an [[ArithmeticExpression|arithmetic expression]],  as with assignment.  In particular, if the user types an identifier, the variable will be set to the value of the variable with that name, and `read` will give no other indication of a problem.

{{{
# Bash (and ksh93, if you replace declare with typeset)
$ declare -i foo
$ read foo
hello
$ echo $foo    # prints 0; 'hello' is unset, so is treated as 0 for arithmetic purposes
$ hello=5
$ read foo     # user types hello again
hello
$ echo $foo    # prints 5, the value of 'hello' as an arithmetic expression
}}}

Pretty useless if you want to read only integers.

In the older Korn shell (ksh88), if a variable is declared integer and used in a `read` command, and the user types an invalid integer, the shell complains, the read command returns an error status, and the value of the variable is unchanged.

{{{
# ksh88
$ typeset -i foo
$ foo=42
$ read foo
hello
ksh: hello: bad number
$ echo $?
1
$ echo $foo
42
}}}