Diff for "BashFAQ/054"

Differences between revisions 10 and 53 (spanning 43 versions)

How can I tell whether a variable contains a valid number?

First, you have to define what you mean by "number". When people ask this, the most common meaning seems to be "a non-negative integer, with no leading + sign", in other words, a string of all digits. Other times, people want to validate floating-point input, with an optional sign and an optional decimal point.

Hand parsing

If you're validating a simple "string of digits", you can do it with a glob:

    # Bash / Ksh
    if [[ -n $foo && $foo != *[!0123456789]* ]]; then
        printf '"%s" is strictly numeric\n' "$foo"
    else
        printf '"%s" has a non-digit somewhere in it or is empty\n' "$foo"
    fi >&2

Avoid [0-9] or [[:digit:]], which in some locales and on some systems can match characters other than 0123456789.

The same thing can be done in POSIX shells as well, using case:

    # POSIX
    case $var in
        '')
            printf 'var is empty\n';;
        *[!0123456789]*)
            printf '%s has a non-digit somewhere in it\n' "$var";;
        *)
            printf '%s is strictly numeric\n' "$var";;
    esac >&2

Of course, if all you care about is valid vs. invalid, you can combine cases:

    # POSIX
    case $var in
        '' | *[!0123456789]*)
            printf '%s\n' "$0: $var: invalid digit" >&2; exit 1;;
    esac
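The combined case above is easy to wrap in a function so that it can be reused for several values. The following is a minimal POSIX sketch. is_uint is a hypothetical helper name chosen for illustration, and the loop at the end is only a usage example:

```bash
# POSIX sketch; is_uint is a hypothetical helper name, not a standard command.
# Succeeds (status 0) only for a non-empty string made entirely of the
# ASCII digits 0-9; fails (status 1) otherwise.
is_uint() {
    case $1 in
        '' | *[!0123456789]*) return 1 ;;
    esac
    return 0
}

# Usage: classify a few sample values.
for arg in 42 007 -3 '' 4.2 '12 '; do
    if is_uint "$arg"; then
        printf '"%s": ok\n' "$arg"
    else
        printf '"%s": not a plain digit string\n' "$arg"
    fi
done
```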

If you need to allow a leading + or - sign, or if you want a valid floating-point number or something else more complex, there are a few possible ways. Standard globs aren't expressive enough to do this in a single pattern, but you can trim off any sign and then compare:

    # POSIX
    case ${var#[-+]} in   # notice ${var#prefix} substitution to trim sign
        '')
            printf 'var is empty or just a sign\n';;
        .)
            printf 'var is just a dot, possibly signed\n';;
        *.*.*)
            printf '"%s" has more than one decimal point in it\n' "$var";;
        *[!0123456789.]*)
            printf '"%s" has a non-digit somewhere in it\n' "$var";;
        *)
            printf '"%s" looks like a valid float\n' "$var";;
    esac >&2
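For a plain yes/no answer, the float checks above can be collapsed into a single pattern list in the same way. The following is a minimal POSIX sketch that uses a hypothetical helper named is_decimal. Like the case statement it mirrors, it accepts forms such as 3. and .5 and rejects exponent notation such as 1e5:

```bash
# POSIX sketch; is_decimal is a hypothetical helper name.
# Strip one optional leading sign, then reject: empty (or sign only),
# a lone dot, more than one dot, or any character other than 0-9 and '.'.
is_decimal() {
    case ${1#[-+]} in
        '' | . | *.*.* | *[!0123456789.]*) return 1 ;;
    esac
    return 0
}

# Usage
is_decimal -1.5 && echo 'accepted: -1.5'
is_decimal 1e5 || echo 'rejected: 1e5'
```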

Or in Bash, we can use extended globs:

    # Bash -- extended globs must be enabled explicitly in versions prior to 4.1.
    # Check whether the variable is all digits.
    shopt -s extglob
    [[ $var = +([0123456789]) ]]

A more complex case:

    # Bash / ksh
    shopt -s extglob # not necessary in ksh and bash 4.1 or newer

    if [[ $foo = *[0123456789]* && $foo = ?([+-])*([0123456789])?(.*([0123456789])) ]]; then
      echo 'foo is a floating-point number'
    fi

Alternatively, case..esac can be used in shells with extended pattern matching. The first test ensures that $foo contains at least one digit, so it is not empty and is not just a +, a - or a dot by itself. The extended glob alone would accept all of those.
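Here is a hedged sketch of the floating-point test written with case..esac in Bash. The helper name is_float is hypothetical. Unlike [[ ]] in bash 4.1 and newer, case patterns in bash always need extglob enabled before the code that uses them is parsed. That is why the shopt line comes first and sits on its own line:

```bash
# Bash sketch; is_float is a hypothetical helper name.
# extglob must already be on when the function body below is parsed.
shopt -s extglob

is_float() {
    case $1 in
        *[0123456789]*) ;;              # require at least one digit
        *) return 1 ;;
    esac
    # The third ')' at the end of the pattern closes the case item.
    case $1 in
        ?([+-])*([0123456789])?(.*([0123456789]))) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage
for v in 3.14 -2 .5 . - abc; do
    if is_float "$v"; then
        printf '%s: float\n' "$v"
    else
        printf '%s: not a float\n' "$v"
    fi
done
```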

If your definition of "a valid number" is even more complex, or if you need a solution that works in legacy Bourne shells, you might prefer to use an external tool's regular expression syntax. Here is a portable version (explained in detail at http://www.wplug.org/wiki/Meeting-20100612#EXERCISE_TWO) using awk. Avoid egrep here: it is line-based, so variables that contain newline characters would trick it.

    # Bourne

    if awk -- 'BEGIN {exit !(ARGV[1] ~ /^[-+]?([0123456789]+\.?|[0123456789]*\.[0123456789]+)$/)}' "$foo"; then
        printf '"%s" is a number\n' "$foo"
    else
        printf '"%s" is not a number\n' "$foo"
    fi
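To see why a line-based tool is the wrong choice, the sketch below wraps the awk test in a function and compares it with an equivalent grep -E check on a value that contains a newline. Both function names are hypothetical, and the grep version is included only to demonstrate the pitfall:

```bash
# POSIX sketch; is_number and grep_says_number are hypothetical names.
# is_number wraps the awk test above. In awk, ^ and $ anchor to the start
# and end of the whole string, so an embedded newline does not help.
is_number() {
    awk -- 'BEGIN {
        exit !(ARGV[1] ~ /^[-+]?([0123456789]+\.?|[0123456789]*\.[0123456789]+)$/)
    }' "$1"
}

# Line-based check, shown only to illustrate the problem:
# grep succeeds if ANY line of its input matches.
grep_says_number() {
    printf '%s\n' "$1" | grep -Eq '^[-+]?([0123456789]+\.?|[0123456789]*\.[0123456789]+)$'
}

tricky='42
not a number'

grep_says_number "$tricky" && echo 'grep: accepted (wrong)'
is_number "$tricky" || echo 'awk: rejected'
```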

Bash version 3 and above support regular expressions in the [[...]] construct, through the =~ operator.

    # Bash
    # The regexp must be stored in a var and expanded for backward compatibility with versions < 3.2

    regexp='^[-+]?[0123456789]*(\.[0123456789]*)?$'
    # The glob test requires at least one digit: the regexp alone also matches "", "." and a lone sign.
    if [[ $foo = *[0123456789]* && $foo =~ $regexp ]]; then
        printf '"%s" looks rather like a number\n' "$foo"
    else
        printf '"%s" does not look particularly numeric to me\n' "$foo"
    fi
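The same technique can be narrowed to integers. The following is a minimal sketch that assumes a hypothetical helper named is_int_re. Because + requires at least one digit, this regexp does not need the extra glob test used above:

```bash
# Bash sketch; is_int_re is a hypothetical helper name.
# Accepts an optional sign followed by one or more ASCII digits.
# The regexp is kept in a variable and expanded unquoted, as above.
is_int_re() {
    local re='^[-+]?[0123456789]+$'
    [[ $1 =~ $re ]]
}

# Usage
is_int_re -17 && echo 'accepted: -17'
is_int_re 4.2 || echo 'rejected: 4.2'
```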

Using the parsing done by [ and printf (or "using eq")

    # fails with ksh
    if [ "$foo" -eq "$foo" ] 2>/dev/null; then
        printf '"%s" is an integer\n' "$foo"
    fi

[ parses the variable and interprets it as a decimal integer because of the -eq. If the parsing succeeds, the test is trivially true. If it fails, [ prints an error message, which 2>/dev/null hides, and returns a non-zero exit status. However, this method fails if the shell is ksh, because ksh evaluates the operands as arithmetic expressions, which would also constitute an arbitrary command injection vulnerability.
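Wrapped in a function, the -eq trick might look like the following sketch (is_int_eq is a hypothetical name). It is meant for POSIX sh or bash, not ksh, for the reason just given:

```bash
# POSIX sh / bash sketch (not ksh); is_int_eq is a hypothetical name.
# [ returns a non-zero status when an operand of -eq is not an integer;
# 2>/dev/null discards the error message it prints in that case.
is_int_eq() {
    [ "$1" -eq "$1" ] 2>/dev/null
}

# Usage
for v in 42 -7 4.2 abc ''; do
    if is_int_eq "$v"; then
        printf '"%s" is an integer\n' "$v"
    else
        printf '"%s" is not an integer\n' "$v"
    fi
done
```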

Be careful with the following trick using printf. Not every shell's printf supports it, the set of accepted float representations varies from shell to shell, and in ksh or zsh it is a command injection vulnerability:

    if printf %f "$foo" >/dev/null 2>&1; then
        printf '"%s" is a float\n' "$foo"
    fi

This trick is broken. For the arguments of the a, A, e, E, f, F, g and G conversion specifiers, POSIX specifies that if the leading character is a single-quote or double-quote, the value shall be the numeric value in the underlying codeset of the character following the single-quote or double-quote. Hence the command above will happily validate as a float any string that starts with a single-quote or double-quote. It also returns 0 when foo expands to a number with a leading 0x, which is a valid number in a shell script but may not work elsewhere.
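The following short bash demonstration shows the problem. looks_like_float is a hypothetical name for the broken check, and other shells' printf may behave differently. In bash, 'abc is accepted because of the leading quote, and 0x10 is accepted as a hexadecimal float:

```bash
# Bash demonstration; looks_like_float is a hypothetical name for the
# broken printf-based check shown above.
looks_like_float() {
    printf %f "$1" >/dev/null 2>&1
}

for v in 3.14 abc "'abc" 0x10; do
    if looks_like_float "$v"; then
        printf '%-6s accepted\n' "$v"
    else
        printf '%-6s rejected\n' "$v"
    fi
done
```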

You can use %d to parse an integer in the same way. Beware that the parsing might be (is supposed to be?) locale-dependent.
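The following bash sketch uses %d as a check and shows the surprises it shares with %f. is_int_printf is a hypothetical name, and other shells may differ. In bash, a leading 0x is read as hexadecimal and a leading 0 as octal:

```bash
# Bash demonstration; is_int_printf is a hypothetical helper name.
# Succeeds when bash's printf can convert the whole argument with %d.
is_int_printf() {
    printf %d "$1" >/dev/null 2>&1
}

is_int_printf 42    && echo '42: accepted'
is_int_printf 12abc || echo '12abc: rejected (trailing garbage)'
is_int_printf 0x1A  && echo '0x1A: accepted, read as hexadecimal 26'
is_int_printf 010   && echo '010: accepted, read as octal 8'
is_int_printf "'x"  && echo "'x: accepted, value is the character code of x"
```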

Revision 10 as of 2008-03-06 04:59:38
  Size: 4676
  Editor: pgas
  Comment: move the declare -i stuff down as it is a bit different, change the examples
Revision 53 as of 2022-04-19 05:23:33
  Size: 5741
  Editor: emanuele6
  Comment: clarify that that enabling the extglob shopt is not necessary to use extglob patterns inside of [[ in bash4.1