Differences between revisions 12 and 16 (spanning 4 versions)
Revision 12 as of 2011-04-02 04:49:13
Size: 5781
Editor: sn18
Comment: printf %q quotes | character
Revision 16 as of 2019-05-20 20:45:14
Size: 7145
Editor: GreyCat
Comment: associative arrays, and fix a quoting issue... and quarantine everything that's wrong.
Line 5: Line 5:
First of all, let's get the terminology straight. Bash has no notion of "lists" or "sets" or any such. Bash has strings and [[BashFAQ/005|arrays]]. Strings are a "list" of '''characters''', arrays are a "list" of '''strings'''.
First of all, let's get the terminology straight. Bash has no real notion of "lists" or "sets". Bash has strings, [[BashFAQ/005|indexed arrays]], and associative arrays. So, what we're trying to do is not supported by the basic data structures available in the language.

The ''best'' choice for this problem is to use an associative array. Checking whether a key is set (or not set) in an associative array is much more efficient than checking whether a key exists as one of the values in an indexed array.

=== With an associative array ===
All we need to do is create one entry for each element of the set. Then, when we want to see whether our input is in that set, we just check whether the associative array contains an entry for our input.

{{{
# Bash 4 and higher
declare -A exists
for i in Bigfoot UFOs Republicans; do
  exists["$i"]=1
done

read -r input
if [[ ${exists["$input"]} ]]; then
  printf "%s exist!\\n" "$input"
else
  printf "%s doesn't exist.\\n" "$input"
fi
}}}

=== With an indexed array ===
Line 9: Line 30:
We can store a list of strings in the values of an indexed array.
Line 10: Line 33:
   {{{
   # Bash
   for element in "${foo[@]}"; do
      [[ $element = $bar ]] && echo "Found $bar."
   done
   }}}

{{{
# Bash
for element in "${foo[@]}"; do
   [[ $element = "$bar" ]] && echo "Found $bar."
done
}}}

And that's all there is to it. There are no other correct answers.

However, that never stopped anyone from contributing their incorrect answers to a wiki, so...

----

/!\ '''Everything below this point is silly and you should not use it.'''

=== Assorted wrong answers ===
Line 18: Line 51:
   {{{
   # Bash
   isIn() {
       local pattern="$1" element
       shift

       for element
       do
           [[ $element = $pattern ]] && return 0
       done

       return 1
   }

   if isIn "jacob" "${names[@]}"
   then
       echo "Jacob is on the list."
   fi
   }}}

{{{
# Bash
isIn() {
    local pattern="$1" element
    shift

    for element
    do
        [[ $element = $pattern ]] && return 0
    done

    return 1
}

if isIn "jacob" "${names[@]}"
then
    echo "Jacob is on the list."
fi
}}}
Line 39: Line 73:
   {{{
   # Bash 3.0 or higher
   indexOf() {
       local pattern=$1
       local index list
       shift

       list=("$@")
       for index in "${!list[@]}"
       do
           [[ ${list[index]} = $pattern ]] && {
               echo $index
               return 0
           }
       done

       echo -1
       return 1
   }

   if index=$(indexOf "jacob" "${names[@]}")
   then
       echo "Jacob is the ${index}th on the list."
   else
       echo "Jacob is not on the list."
   fi
   }}}
{{{
# Bash 3.0 or higher
indexOf() {
    local pattern=$1
    local index list
    shift

    list=("$@")
    for index in "${!list[@]}"
    do
        [[ ${list[index]} = $pattern ]] && {
            echo $index
            return 0
        }
    done

    echo -1
    return 1
}

if index=$(indexOf "jacob" "${names[@]}")
then
    echo "Jacob is the ${index}th on the list."
else
    echo "Jacob is not on the list."
fi
}}}
Line 68: Line 102:
   {{{
   # Bourne
   set -f
   for element in $foo; do
      if test x"$element" = x"$bar"; then
         echo "Found $bar."
      fi
   done
   set +f
   }}}

Here, a "word" is defined as any substring that is delimited by whitespace (or more specifically, the characters currently in IFS). The `set -f` prevents [[glob]] expansion of the words in the list. Turning glob expansions back on (`set +f`) is optional.

If you're working in bash 4 or ksh93, you have access to associative arrays. These will allow you to restructure the problem -- instead of making a list of words that are allowed, you can make an ''associative array'' whose keys are the words you want to allow. Their values could be meaningful, or not -- depending on the nature of the problem.

   {{{
   # Bash 4
   declare -A good
   for word in "goodword1" "goodword2" ...; do
     good["$word"]=1
   done

   # Check whether $foo is allowed:
   if ((${good[$foo]})); then ...
   }}}

Here's a hack that you shouldn't use, but which is presented for the sake of completeness:
   {{{
   # Bash
   if [[ " $foo " = *" $bar "* ]]; then
{{{
# Bourne
set -f
for element in $foo; do
   if test x"$element" = x"$bar"; then
Line 100: Line 109:
   }}}
done
set +f
}}}

Here, a "word" is defined as any substring that is delimited by whitespace (or more specifically, the characters currently in IFS). The `set -f` prevents [[glob]] expansion of the words in the list. Turning glob expansions back on (`set +f`) is optional.

Here's a hack that you shouldn't use, but which is presented for the sake of completeness:
{{{
# Bash
if [[ " $foo " = *" $bar "* ]]; then
   echo "Found $bar."
fi
}}}
Line 104: Line 126:
   {{{
   # Bourne
   case " $foo " in
    *" $bar "*) echo "Found $bar.";;
   esac
   }}}
{{{
# Bourne
case " $foo " in
   *" $bar "*) echo "Found $bar.";;
esac
}}}
Line 114: Line 136:
   {{{
   # Bash
   shopt -s extglob
   #convert array to glob
   printf -v glob '%q|' "${array[@]}"
   glob=${glob%|}
   [[ $word = @($glob) ]] && echo "Found $word"
   }}}
{{{
# Bash
shopt -s extglob
#convert array to glob
printf -v glob '%q|' "${array[@]}"
glob=${glob%|}
[[ $word = @($glob) ]] && echo "Found $word"
}}}
Line 126: Line 148:
GNU's grep has a {{{\b}}} feature which allegedly matches the edges of words. Using that, one may attempt to replicate the shorter approach used above, but it is fraught with peril:

   {{{
   # Is 'foo' one of the positional parameters?
   egrep '\bfoo\b' <<<"$@" >/dev/null && echo yes

   # This is where it fails: is '-v' one of the positional parameters?
   egrep '\b-v\b' <<<"$@" >/dev/null && echo yes
   # Unfortunately, \b sees "v" as a separate word.
   # Nobody knows what the hell it's doing with the "-".

   # Is "someword" in the array 'array'?
   egrep '\bsomeword\b' <<<"${array[@]}"
   # Obviously, you can't use this if someword is '-v'!
   }}}
GNU's grep has a {{{\b}}} feature which allegedly matches the edges of words (word "boundaries"). Using that, one may attempt to replicate the shorter approach used above, but it is fraught with peril:

{{{
# Is 'foo' one of the positional parameters?
egrep '\bfoo\b' <<<"$@" >/dev/null && echo yes

# This is where it fails: is '-v' one of the positional parameters?
egrep '\b-v\b' <<<"$@" >/dev/null && echo yes
# Unfortunately, \b sees "v" as a separate word.
# Nobody knows what the hell it's doing with the "-".

# Is "someword" in the array 'array'?
egrep '\bsomeword\b' <<<"${array[@]}"
# Obviously, you can't use this if someword is '-v'!
}}}
Line 147: Line 169:
   {{{
   # usage: if has "element" list of words; then ...; fi
   has() {
     local IFS=$'\a' t="$1"
     shift
     [[ $'\a'"$*"$'\a' == *$'\a'$t$'\a'* ]]
   }
   }}}
{{{
# usage: if has "element" list of words; then ...; fi
has() {
  local IFS=$'\a' t="$1"
  shift
  [[ $'\a'"$*"$'\a' == *$'\a'$t$'\a'* ]]
}
}}}

== Enumerated types ==

In ksh93t or later, one may create enum types/variables/constants using the `enum` builtin. These work similarly to C enums (and the equivalent feature of other languages). These may be used to restrict which values may be assigned to a variable so as to avoid the need for an expensive test each time an array variable is set or referenced. Like types created using `typeset -T`, the result of an `enum` command is a new declaration command that can be used to instantiate objects of that type.

{{{
# ksh93
 $ enum colors=(red green blue)
 $ colors foo=green
 $ foo=yellow
ksh: foo: invalid value yellow
}}}

`typeset -a` can also be used in combination with an enum type to allow enum constants as subscripts.

{{{
# ksh93
 $ typeset -a [colors] bar
 $ bar[blue]=test1
 $ typeset -p bar
typeset -a [colors] bar=([blue]=test1)
 $ bar[orange]=test
ksh: colors: invalid value orange
}}}

See `src/cmd/ksh93/tests/enum.sh` in the AST source for more examples.

I want to check to see whether a word is in a list (or an element is a member of a set).

If your real question was "How do I check whether one of my parameters was -v?" then please see FAQ #35 instead. Otherwise, read on....

First of all, let's get the terminology straight. Bash has no real notion of "lists" or "sets". Bash has strings, indexed arrays, and associative arrays. So, what we're trying to do is not supported by the basic data structures available in the language.

The best choice for this problem is to use an associative array. Checking whether a key is set (or not set) in an associative array is much more efficient than checking whether a key exists as one of the values in an indexed array.

With an associative array

All we need to do is create one entry for each element of the set. Then, when we want to see whether our input is in that set, we just check whether the associative array contains an entry for our input.

# Bash 4 and higher
declare -A exists
for i in Bigfoot UFOs Republicans; do
  exists["$i"]=1
done

read -r input
if [[ ${exists["$input"]} ]]; then
  printf "%s exist!\\n" "$input"
else
  printf "%s doesn't exist.\\n" "$input"
fi
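
One caveat with the check above: it tests whether the stored value is non-empty, so a key whose value happens to be the empty string would look absent. A minimal sketch, assuming you want to test for the key itself, uses the `${parameter+word}` expansion. The array name `seen` and the helper `in_set` are made up for this illustration and are not part of the original FAQ.

```shell
# Bash 4 and higher
declare -A seen
seen["Bigfoot"]=1
seen["UFOs"]=""      # the key exists, but its value is empty

# in_set KEY: succeed if KEY is a key of the (hypothetical) array "seen".
# ${seen[KEY]+set} expands to "set" when the entry exists, even if it is empty.
in_set() {
  [[ ${seen["$1"]+set} ]]
}

in_set "UFOs" && echo "UFOs is in the set"
in_set "Yeti" || echo "Yeti is not in the set"
```

If every value you store is non-empty (as with the `=1` assignments above), the simpler test is sufficient.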

With an indexed array

NOTE: In the general case, a string cannot possibly contain a list of other strings because there is no reliable way to tell where each substring begins and ends.

We can store a list of strings in the values of an indexed array.

Given a traditional array, the only proper way to do this is to loop over all elements in your array and check them for the element you are looking for. Say what we are looking for is in bar and our list is in the array foo:

# Bash
for element in "${foo[@]}"; do
   [[ $element = "$bar" ]] && echo "Found $bar."
done
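
If all you need is a yes/no answer, a variation on the same loop can record the result and stop at the first match. This is only a sketch: the sample data and the `found` flag are assumptions made for the illustration.

```shell
# Bash
foo=("apple pie" "banana" "cherry")   # sample list
bar="banana"                          # the string we are looking for

found=0
for element in "${foo[@]}"; do
  if [[ $element = "$bar" ]]; then
    found=1
    break    # no need to look at the remaining elements
  fi
done

if ((found)); then
  echo "Found $bar."
else
  echo "$bar is not in the list."
fi
```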

And that's all there is to it. There are no other correct answers.

However, that never stopped anyone from contributing their incorrect answers to a wiki, so...


/!\ Everything below this point is silly and you should not use it.

Assorted wrong answers

If you need to perform this several times in your script, you might want to extract the logic into a function:

# Bash
isIn() {
    local pattern="$1" element
    shift

    for element
    do
        [[ $element = $pattern ]] && return 0
    done

    return 1
}

if isIn "jacob" "${names[@]}"
then
    echo "Jacob is on the list."
fi

Or, if you want your function to return the index at which the element was found:

# Bash 3.0 or higher
indexOf() {
    local pattern=$1
    local index list
    shift

    list=("$@")
    for index in "${!list[@]}"
    do
        [[ ${list[index]} = $pattern ]] && {
            echo $index
            return 0
        }
    done

    echo -1
    return 1
}

if index=$(indexOf "jacob" "${names[@]}")
then
    echo "Jacob is the ${index}th on the list."
else
    echo "Jacob is not on the list."
fi

If your "list" is contained in a string, and for some half-witted reason you choose not to heed the warnings above, you can use the following code to search through "words" in a string. (The only real excuse for this would be that you're stuck in Bourne shell, which has no arrays.)

# Bourne
set -f
for element in $foo; do
   if test x"$element" = x"$bar"; then
      echo "Found $bar."
   fi
done
set +f

Here, a "word" is defined as any substring that is delimited by whitespace (or more specifically, the characters currently in IFS). The set -f prevents glob expansion of the words in the list. Turning glob expansions back on (set +f) is optional.

Here's a hack that you shouldn't use, but which is presented for the sake of completeness:

# Bash
if [[ " $foo " = *" $bar "* ]]; then
   echo "Found $bar."
fi

(The problem here is that it assumes a space can be used as a delimiter between words. Your elements might contain spaces, which would break this!)
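
To see the failure concretely, here is a small sketch with sample data (the `names` array and the `result` variable are made up for the illustration). Once the array is flattened into a string, the element boundaries are gone, and a word from inside an element matches as if it were an element of its own.

```shell
# Bash
names=("Mary Ann" "Bob")
foo="${names[*]}"     # "Mary Ann Bob": the element boundaries are lost
bar="Ann"

if [[ " $foo " = *" $bar "* ]]; then
  result="found"      # false positive: "Ann" is not an element of names
else
  result="not found"
fi
echo "$bar: $result"
```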

That same hack, for Bourne shells:

# Bourne
case " $foo " in
   *" $bar "*) echo "Found $bar.";;
esac

You can also use extended glob with printf to search for a word in an array. I haven't tested it enough, so it might break in some cases --sn18

# Bash
shopt -s extglob
#convert array to glob
printf -v glob '%q|' "${array[@]}"
glob=${glob%|}
[[ $word = @($glob) ]] && echo "Found $word"
  • It will break when an array element contains a | character. Hence, I moved it down here with the other hacks that work in a similar fashion and have a similar limitation. -- GreyCat

    • printf %q quotes a | character too, so it probably should not break --sn18

GNU's grep has a \b feature which allegedly matches the edges of words (word "boundaries"). Using that, one may attempt to replicate the shorter approach used above, but it is fraught with peril:

# Is 'foo' one of the positional parameters?
egrep '\bfoo\b' <<<"$@" >/dev/null && echo yes

# This is where it fails: is '-v' one of the positional parameters?
egrep '\b-v\b' <<<"$@" >/dev/null && echo yes
# Unfortunately, \b sees "v" as a separate word.
# Nobody knows what the hell it's doing with the "-".

# Is "someword" in the array 'array'?
egrep '\bsomeword\b' <<<"${array[@]}"
# Obviously, you can't use this if someword is '-v'!

Since this "feature" of GNU grep is both non-portable and poorly defined, we recommend not using it. It is simply mentioned here for the sake of completeness.

Bulk comparison

This method tries to compare the desired string to the entire contents of the array. It can potentially be very efficient, but it depends on a delimiter that must not be in the sought value or the array. Here we use $'\a', the BEL character, because it's extremely uncommon.

# usage: if has "element" list of words; then ...; fi
has() {
  local IFS=$'\a' t="$1"
  shift
  [[ $'\a'"$*"$'\a' == *$'\a'$t$'\a'* ]]
}
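
A short usage sketch, repeating the function so the example stands alone. The `words` array is sample data assumed for the illustration; the last call shows the delimiter assumption failing when an element itself contains BEL.

```shell
# Bash
has() {
  local IFS=$'\a' t="$1"
  shift
  [[ $'\a'"$*"$'\a' == *$'\a'$t$'\a'* ]]
}

words=("red fish" "blue" "green")

has "blue" "${words[@]}" && echo "blue: yes"
has "fish" "${words[@]}" || echo "fish: no"   # spaces inside elements are fine

# An element containing BEL looks like two separate elements.
has "b" "a"$'\a'"b" && echo "b: yes (false positive)"
```

Note that `$t` is unquoted inside the pattern, so a sought value containing glob characters is treated as a pattern rather than a literal string.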

Enumerated types

In ksh93t or later, one may create enum types/variables/constants using the enum builtin. These work similarly to C enums (and the equivalent feature of other languages). These may be used to restrict which values may be assigned to a variable so as to avoid the need for an expensive test each time an array variable is set or referenced. Like types created using typeset -T, the result of an enum command is a new declaration command that can be used to instantiate objects of that type.

# ksh93
 $ enum colors=(red green blue)
 $ colors foo=green
 $ foo=yellow
ksh: foo:  invalid value yellow

typeset -a can also be used in combination with an enum type to allow enum constants as subscripts.

# ksh93
 $ typeset -a [colors] bar
 $ bar[blue]=test1
 $ typeset -p bar
typeset -a [colors] bar=([blue]=test1)
 $ bar[orange]=test
ksh: colors:  invalid value orange

See src/cmd/ksh93/tests/enum.sh in the AST source for more examples.


CategoryShell

BashFAQ/046 (last edited 2023-04-29 04:33:04 by ormaaj)