Differences between revisions 21 and 32 (spanning 11 versions)
Revision 21 as of 2009-07-01 12:29:40
Size: 9893
Editor: localhost
Comment: [igli] ${#arr[*]} is faster than ${#arr[@]} and gives same result; nor should it be checked on every iteration unless you're adding to the array. for is quicker and preferred with loop incr
Revision 32 as of 2011-04-29 13:48:34
Size: 12735
Editor: GreyCat
Comment: more on @ as an array
Line 3: Line 3:
BASH and KornShell have one-dimensional arrays indexed by a numerical expression, e.g.:

 {{{
 # Bash
 host=(micky minnie goofy)
This answer assumes you have a basic understanding of what arrays ''are'' in the first place. If you're new to this kind of programming, you may wish to start with [[BashGuide/Arrays|the guide's explanation]]. BASH and KornShell have one-dimensional arrays indexed by a numerical expression, e.g.:

 {{{
 # Bash
 host=(mickey minnie goofy)
Line 14: Line 14:
The indexing always begins with 0.

Ksh93 and Bash 4.0 have [[AssociativeArray]]s as well. These are not available in Bourne, POSIX, ksh88 or older bash shells.

The awkward expression `${#host[*]}` or `${#host[@]}` returns the number of elements for the array {{{host}}}. Also noteworthy for BASH is the fact that inside the square brackets, {{{i++}}} works as a C programmer would expect. The square brackets in an array reference force an ArithmeticExpression. (That shortcut does not work in ksh88.)

When accessing the number of elements, * is quicker than @ in BASH in our testing on Bash-3, and gives the same result. (man bash: Arrays: ${#name[subscript]} expands to the length of ${name[subscript]}. If subscript is * or @, the expansion is the number of elements in the array.)

BASH and Korn shell arrays are also ''sparse''. Elements may be added and deleted out of sequence.
The indexing always begins with 0, unless you specifically choose otherwise. The awkward expression `${#host[*]}` or `${#host[@]}` returns the number of elements for the array {{{host}}}. (We'll go into more detail on syntax below.)

Ksh93 and Bash 4.0 have [[BashGuide/Arrays#Associative_Arrays|Associative Arrays]] as well. These are not available in Bourne, POSIX, ksh88 or older bash shells.

POSIX and Bourne shells are not guaranteed to have arrays at all (although a POSIX shell on any particular system may have them, they are not in the standard).

BASH and Korn shell arrays are ''sparse''. Elements may be added and deleted out of sequence.
Line 30: Line 28:
 unset arr[2]  unset 'arr[2]'
Line 40: Line 38:
Line 49: Line 48:
 # Bash  # Bash/ksh93
Line 58: Line 57:
You can also initialize an array using a [[glob]]:

 {{{
 # Bash
You can also initialize an array using a [[glob]] (see also NullGlob):

 {{{
 # Bash/ksh93
Line 68: Line 67:
(see also NullGlob), or a substitution of any kind: or using a substitution of any kind:
Line 73: Line 72:

 set -f; O=$IFS IFS=$'\n' lines=($(< myfile)) IFS=$O; set +f
Line 77: Line 73:
 }}}

When the `arrname=(...)` syntax is used, any substitutions inside the parentheses undergo WordSplitting according to the regular shell rules. Thus, in the second example above, if we want the lines of the input file to become individual array elements (even if they contain whitespace), we must set IFS appropriately (in this case: to a newline).

`set -f` and `set +f` disable and re-enable [[glob]] expansion, respectively, so that a line like `*` will not be expanded into filenames. (We could have used that in the `words=($sentence)` example too, just in case someone slipped a wildcard into a word.) In some scripts, `set -f` may be in effect already, and therefore running `set +f` may be undesirable. This is something you must manage properly yourself; there is no easy or elegant way to "store" the glob expansion switch setting and restore it later. (And don't try to say parsing the output of `set -o` is easy, because it's not.)

Here are some more ksh examples:
 {{{
Line 87: Line 76:
 }}}

When the `arrname=(...)` syntax is used, any unquoted substitutions inside the parentheses undergo WordSplitting and [[glob]] expansion according to the regular shell rules. In the first example above, if any of the words in `$sentence` contain glob characters, filename expansion may occur.

`set -f` and `set +f` may be used to disable and re-enable [[glob]] expansion, respectively, so that words like `*` will not be expanded into filenames. In some scripts, `set -f` may be in effect already, and therefore running `set +f` may be undesirable. This is something you must manage properly yourself; there is no easy or elegant way to "store" the glob expansion switch setting and restore it later. (And don't try to say parsing the output of `set -o` is easy, because it's not.)

==== Loading lines from a file or stream ====

In bash 4, the `mapfile` command (also known as `readarray`) accomplishes this:

{{{
 # Bash 4
 mapfile -t lines < myfile

 # or
 mapfile -t lines < <(some command)
}}}

See ProcessSubstitution and [[BashFAQ/024|FAQ #24]] for more details on the `<()` syntax.

`mapfile` handles blank lines (it inserts them as empty array elements), and it also handles missing final newlines from the input stream. Both those things become problematic when reading data in other ways, as we shall see momentarily.

`mapfile` does have one serious drawback: it can ''only'' handle newlines as line terminators. It can't, for example, handle NUL-delimited files from `find -print0`.

In other shells, we might start out like this:

{{{
 # These examples only work with certain kinds of input files.

 # Bash
 set -f; O=$IFS IFS=$'\n' lines=($(< myfile)) IFS=$O; set +f

 # Korn
Line 90: Line 111:
 }}}

That's a literal newline (and nothing else) between the single quotes in the second example. The `set -f` caveats apply to the first example here, just as they did to the bash version.
}}}

We use [[IFS]] (setting it to a newline) because we want each ''line'' of input to become an array element, not each ''word''. (This particular syntax may have undesired results with blank lines of input; see below for alternatives.)

That's a literal newline (and nothing else) between the single quotes in the Korn example.
Line 102: Line 125:
Line 107: Line 131:

Also noteworthy for BASH is the fact that inside the square brackets, `i++` works as a C programmer would expect. The square brackets in an array reference force an ArithmeticExpression. (That shortcut does not work in ksh88.)
Line 130: Line 156:
NOTE: it is necessary to quote the `'arr[i++]'` passed to read, so that the square brackets aren't interpreted as [[glob]]s. This is also true for other non-keyword builtins that take a subscripted variable name, such as `let` and `unset`.
Line 134: Line 162:
 while IFS= read -rd $'\0' 'arr[i++]'; do :; done < <(find . -name '*.ugly' -print0)  while IFS= read -rd '' 'arr[i++]'; do :; done < <(find . -name '*.ugly' -print0)
Line 136: Line 164:
Line 137: Line 166:
 while read -rd $'\0'; do arr[i++]=$REPLY; done < <(find . -name '*.ugly' -print0)  while read -rd ''; do arr[i++]=$REPLY; done < <(find . -name '*.ugly' -print0)
Line 140: Line 169:
See ProcessSubstitution and [[BashFAQ/024|FAQ #24]] for more details on that syntax.

NOTE: it is necessary to quote the `'arr[i++]'` passed to read, so that the square brackets aren't interpreted as [[glob]]s. This is also true for other non-keyword builtins that take a subscripted variable name, such as `let` and `unset`.

==== Appending to an existing array ====
Line 176: Line 204:
NOTE: the parentheses are required, just as when assigning to an array. (Or you will end up appending to `${arr[0]}` which `$arr` is a synonym for.)
Line 177: Line 207:
Line 179: Line 210:
Using array elements ''en masse'' is one of the key features. In exactly the same way that {{{"$@"}}} is expanded for positional parameters, {{{"${arr[@]}"}}} is expanded to a list of words, one array element per word. For example, `${#arr[*]}` or `${#arr[@]}` gives the number of elements in an array:

 {{{
 # Bash
 shopt -s nullglob
 oggs=(*.ogg)
 echo "There are ${#oggs[*]} Ogg files."
 }}}

When accessing the number of elements, * is quicker than @ in BASH in our testing on Bash-3, and gives the same result. (man bash: Arrays: `${#name[subscript]}` expands to the length of `${name[subscript]}`. If subscript is * or @, the expansion is the number of elements in the array.)

Using array elements ''en masse'' is one of the key features of shell arrays. In exactly the same way that {{{"$@"}}} is expanded for positional parameters, {{{"${arr[@]}"}}} is expanded to a list of words, one array element per word. For example,
Line 197: Line 239:
For more complex array-dumping, {{{"${arr[*]}"}}} will cause the elements to be concatenated together, with the first character of {{{IFS}}} (or a space if IFS isn't set) between them. As it happens, {{{"$*"}}} is expanded the same way for positional parameters. For slightly more complex array-dumping, {{{"${arr[*]}"}}} will cause the elements to be concatenated together, with the first character of {{{IFS}}} (or a space if IFS isn't set) between them. As it happens, {{{"$*"}}} is expanded the same way for positional parameters.
Line 206: Line 248:
Unfortunately, you can't put multiple characters in between array elements using that syntax. You would have to do something like this instead:

 {{{
 # Bash/ksh
 arr=(x y z)
 x=$(printf "%s<=>" "${arr[@]}")
 echo "${x%<=>}" # Remove the extra <=> from the end.
 # prints x<=>y<=>z
 }}}
Line 211: Line 263:
 unset arr[2]  unset 'arr[2]'
Line 216: Line 268:
Retrieving the indices is extremely important in certain kinds of tasks, such as maintaining two arrays with the same indices (a cheap way to mimic having an array of `struct`s in a language with no `struct`): Retrieving the indices is extremely important in certain kinds of tasks, such as maintaining parallel arrays with the same indices (a cheap way to mimic having an array of `struct`s in a language with no `struct`):
Line 219: Line 271:
 unset title artist i
 for f in *.mp3; do
 unset file title artist i
 for f in ./*.mp3; do
   file[i]=$f
Line 225: Line 278:
 # Later, iterate over every song
 for i in ${!title[*]}; do
   echo "${title[i]} is by ${artist[i]}"
 # Later, iterate over every song.
 # This works even if the arrays are sparse, just so long as they all have
 # the SAME holes.

 for i in ${!file[*]}; do
   echo "${file[i]} is ${title[i]} by ${artist[i]}"
Line 230: Line 285:

=== Other tricks ===
Line 240: Line 297:
Parameter Expansion can also be used to extract elements from an array: Parameter Expansion can also be used to extract elements from an array. Some people call this ''slicing'':
Line 250: Line 307:
The {{{@}}} array (the array of positional parameters) can be used just like any regularly named array. As we see above, the `@` array (the array of positional parameters) can be used almost like a regularly named array. This is the ''only'' array available for use in POSIX or Bourne shells. It has certain limitations: you cannot individually set or unset single elements, and it cannot be sparse. Nevertheless, it still makes certain POSIX shell tasks possible that would otherwise require external tools:

 {{{
 # POSIX
 set -- *.mp3
 if [ -e "$1" ]; then
   echo "there are $# MP3 files"
 else
   echo "there are 0 MP3 files"
 fi
 }}}

 {{{
 # POSIX
 ...
 # Add an option to our dynamically generated list of options
 set -- "$@" -f "$somefile"
 ...
 foocommand "$@"
 }}}

(Compare to [[BashFAQ/050|FAQ #50]]'s dynamically generated commands using named arrays.)

----
CategoryShell

How can I use array variables?

This answer assumes you have a basic understanding of what arrays are in the first place. If you're new to this kind of programming, you may wish to start with the guide's explanation. BASH and KornShell have one-dimensional arrays indexed by a numerical expression, e.g.:

  •  # Bash
     host=(mickey minnie goofy)
     n=${#host[*]}
     for ((i=0;i<n;i++)); do
         echo "host number $i is ${host[i]}"
     done

The indexing always begins with 0, unless you specifically choose otherwise. The awkward expression ${#host[*]} or ${#host[@]} returns the number of elements for the array host. (We'll go into more detail on syntax below.)

Ksh93 and Bash 4.0 have Associative Arrays as well. These are not available in Bourne, POSIX, ksh88 or older bash shells.

POSIX and Bourne shells are not guaranteed to have arrays at all (although a POSIX shell on any particular system may have them, they are not in the standard).

BASH and Korn shell arrays are sparse. Elements may be added and deleted out of sequence.

  •  # Bash/ksh
     arr[0]=0
     arr[1]=1
     arr[2]=2
     arr[42]="what was the question?"
     unset 'arr[2]'
     echo "${arr[*]}"
     # prints 0 1 what was the question?

You should try to write your code in such a way that it can handle sparse arrays, unless you know in advance that an array will never have holes.
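
To see why holes matter, here is a minimal sketch (the array name and values are made up for illustration). Once the array is sparse, the element count no longer tells you the highest index, so a loop counting from 0 up to the count would never reach element 42. It uses the `${!arr[*]}` index-list syntax, which needs Bash 3.0 or higher.

```shell
# Bash 3.0 or higher
unset arr
arr[0]=zero
arr[1]=one
arr[42]=answer
unset 'arr[1]'

count=${#arr[@]}        # number of elements that are actually set: 2
indices="${!arr[*]}"    # the indices that exist, joined by a space: 0 42
```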

1. Loading values into an array

Assigning one element at a time is simple, and portable:

  •  # Bash/ksh
     arr[0]=0
     arr[42]='the answer'

It's possible to assign multiple values to an array at once, but the syntax differs across shells.

  •  # Bash/ksh93
     array=(zero one two three four)
    
     # Korn
     set -A array -- zero one two three four

When initializing in this way, the first index will be 0.

You can also initialize an array using a glob (see also NullGlob):

  •  # Bash/ksh93
     oggs=(*.ogg)
    
     # Korn
     set -A oggs -- *.ogg

or using a substitution of any kind:

  •  # Bash
     words=($sentence)
     letters=({a..z})    # Bash 3.0 or higher
    
     # Korn
     set -A words -- $sentence

When the arrname=(...) syntax is used, any unquoted substitutions inside the parentheses undergo WordSplitting and glob expansion according to the regular shell rules. In the first example above, if any of the words in $sentence contain glob characters, filename expansion may occur.

set -f and set +f may be used to disable and re-enable glob expansion, respectively, so that words like * will not be expanded into filenames. In some scripts, set -f may be in effect already, and therefore running set +f may be undesirable. This is something you must manage properly yourself; there is no easy or elegant way to "store" the glob expansion switch setting and restore it later. (And don't try to say parsing the output of set -o is easy, because it's not.)
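
As a rough sketch of that approach (the value of `sentence` is hypothetical, and the example assumes glob expansion was enabled beforehand, so that running `set +f` at the end is harmless):

```shell
# Bash
sentence='files: * and ?'   # hypothetical input containing glob characters
set -f                      # disable glob expansion
words=($sentence)           # word splitting still happens
set +f                      # re-enable glob expansion
nwords=${#words[@]}         # 4: the * and ? stay literal
```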

1.1. Loading lines from a file or stream

In bash 4, the mapfile command (also known as readarray) accomplishes this:

 # Bash 4
 mapfile -t lines < myfile

 # or
 mapfile -t lines < <(some command)

See ProcessSubstitution and FAQ #24 for more details on the <() syntax.

mapfile handles blank lines (it inserts them as empty array elements), and it also handles missing final newlines from the input stream. Both those things become problematic when reading data in other ways, as we shall see momentarily.
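
A small sketch of both behaviours, using made-up input that contains a blank line and lacks a final newline:

```shell
# Bash 4
mapfile -t lines < <(printf 'first\n\nthird')
nlines=${#lines[@]}     # 3: the blank line becomes an empty element
```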

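mapfile does have one serious drawback: before bash 4.4, it can only handle newlines as line terminators. It can't, for example, handle NUL-delimited files from find -print0. (Bash 4.4 added a -d option that sets the delimiter; mapfile -d '' reads NUL-delimited input.)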

In other shells, we might start out like this:

 # These examples only work with certain kinds of input files.

 # Bash
 set -f; O=$IFS IFS=$'\n' lines=($(< myfile)) IFS=$O; set +f

 # Korn
 set -f; O=$IFS IFS='
 '; set -A lines -- $(< myfile); IFS=$O; set +f

We use IFS (setting it to a newline) because we want each line of input to become an array element, not each word. (This particular syntax may have undesired results with blank lines of input; see below for alternatives.)

That's a literal newline (and nothing else) between the single quotes in the Korn example.

Relying on IFS WordSplitting can cause issues if you have repeated whitespace delimiters that you wanted to be treated as multiple delimiters; e.g., a file with blank lines will have repeated newline characters. If you wanted the blank lines to be stored as empty array elements, IFS's behavior will backfire on you; the blank lines will disappear.
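
The effect is easy to reproduce. In this sketch (the sample data is made up), three lines go in but only two elements come out, because the two adjacent newlines are treated as a single delimiter:

```shell
# Bash
data=$'first\n\nthird'
set -f; O=$IFS; IFS=$'\n'
lines=($data)
IFS=$O; set +f
nlines=${#lines[@]}     # 2: the blank line has disappeared
```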

The solution to that is to read the elements one at a time, in a loop. Remember that in most shells (including bash), the subcommands of a pipeline are executed in subshells, so you might need to use something like this:

  •  # Bash
     unset arr i
     while read -r; do arr[i++]=$REPLY; done < yourfile
    
     # or
     while read -r; do arr[i++]=$REPLY; done < <(your command)

This is preferable to piping your command into a while read loop, which would cause the array to be set in a subshell -- not very useful in most cases.
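
A sketch of the difference, assuming bash's `lastpipe` option is off (the default): the array filled on the right-hand side of a pipe is gone once the pipeline ends, while the one filled through a redirection survives.

```shell
# Bash
unset piped direct i
printf 'x\ny\n' | while read -r; do piped[i++]=$REPLY; done
npiped=${#piped[@]}     # 0: the loop ran in a subshell

unset i
while read -r; do direct[i++]=$REPLY; done < <(printf 'x\ny\n')
ndirect=${#direct[@]}   # 2: the loop ran in the current shell
```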

Also noteworthy for BASH is the fact that inside the square brackets, i++ works as a C programmer would expect. The square brackets in an array reference force an ArithmeticExpression. (That shortcut does not work in ksh88.)

If your file or data stream might be missing its final delimiter (e.g. a text file that might be missing a closing newline), the final read command in the loop might "fail" (terminating the loop) even though it has read data. There are a couple of ways to work around that:

  •  # Bash
     unset arr i
     while read -r; do arr[i++]=$REPLY; done < <(your command)
     # Append unterminated data line if there was one.
     [[ $REPLY ]] && arr[i++]=$REPLY

Some people prefer reading directly into the array, which works great if there's an unterminated line (since the array element is populated with the partial data before the exit status of read is checked). Unfortunately, this puts an empty element on the end of the array if the data stream is correctly terminated:

  •  # Bash
     unset arr i
     while IFS= read -r 'arr[i++]'; do :; done < <(your command)
     # Remove trailing empty element, if any.
     if [[ ${arr[i-1]} = "" ]]; then unset 'arr[--i]'; fi

Whether you prefer to read too many and then have to remove one, or read too few and then have to add one, is a personal choice.

NOTE: it is necessary to quote the 'arr[i++]' passed to read, so that the square brackets aren't interpreted as globs. This is also true for other non-keyword builtins that take a subscripted variable name, such as let and unset.

If you are trying to deal with records that might have embedded newlines, you might be using an alternative delimiter such as the NUL character ( \0 ) to separate the records. In that case, you'll want to use the -d argument to read as well:

  •  # Bash
     unset arr i
     while IFS= read -rd '' 'arr[i++]'; do :; done < <(find . -name '*.ugly' -print0)
     if [[ ${arr[i-1]} = "" ]]; then unset 'arr[--i]'; fi
    
     # or
     while read -rd ''; do arr[i++]=$REPLY; done < <(find . -name '*.ugly' -print0)
     [[ $REPLY ]] && arr[i++]=$REPLY
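
To try the NUL-delimited loop without `find`, here is a sketch that feeds it made-up records from `printf`; the second record contains an embedded newline and still ends up in a single element:

```shell
# Bash
unset arr i
while IFS= read -rd ''; do arr[i++]=$REPLY; done < <(printf 'one\0two\nlines\0')
nrecords=${#arr[@]}     # 2
```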

1.2. Appending to an existing array

If you wish to append data to an existing array, there are several approaches. The most flexible is to keep a separate index variable:

  •  # Bash/ksh93
     arr[i++]="new item"

If you don't want to keep an index variable, but you happen to know that your array is not sparse, then you can use the number of elements as the next index (in an array without holes, it is one more than the highest existing index):

  •  # Bash/ksh
     # This will FAIL if the array has holes (is sparse).
     arr[${#arr[*]}]="new item"
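
The failure mode looks like this (a sketch with made-up values): with a hole at index 1, the element count is 2, so the "append" silently overwrites the existing element at index 2.

```shell
# Bash 3.0 or higher
arr=(a b c)
unset 'arr[1]'                # arr now has indices 0 and 2
arr[${#arr[*]}]="new item"    # count is 2, so this overwrites arr[2]
result="${!arr[*]}"           # still just: 0 2
```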

If you don't know whether your array is sparse or not, but you don't mind re-indexing the entire array (and also being very slow), then you can use:

  •  # Bash
     arr=("${arr[@]}" "new item")
    
     # Ksh
     set -A arr -- "${arr[@]}" "new item"

If you're in bash 3.1 or higher, then you can use the += operator:

  •  # Bash 3.1
     arr+=("new item")

NOTE: the parentheses are required, just as when assigning to an array. (Without them, you will end up appending to ${arr[0]}, for which $arr is a synonym.)
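
A quick sketch of both forms side by side, using made-up values:

```shell
# Bash 3.1 or higher
arr=(a b)
arr+=("c")      # with parentheses: appends a new element
arr+="d"        # without parentheses: appends the string to arr[0]
nelems=${#arr[@]}   # 3, and arr[0] is now "ad"
```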

For examples of using arrays to hold complex shell commands, see FAQ #50 and FAQ #40.

2. Retrieving values from an array

${#arr[*]} or ${#arr[@]} gives the number of elements in an array:

  •  # Bash
     shopt -s nullglob
     oggs=(*.ogg)
     echo "There are ${#oggs[*]} Ogg files."

When accessing the number of elements, * is quicker than @ in BASH in our testing on Bash-3, and gives the same result. (man bash: Arrays: ${#name[subscript]} expands to the length of ${name[subscript]}. If subscript is * or @, the expansion is the number of elements in the array.)

Using array elements en masse is one of the key features of shell arrays. In exactly the same way that "$@" is expanded for positional parameters, "${arr[@]}" is expanded to a list of words, one array element per word. For example,

  •  # Korn/Bash
     for x in "${arr[@]}"; do
       echo "next element is '$x'"
     done

This works even if the elements contain whitespace. You always end up with the same number of words as you have array elements.
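
The quotes are what make this work. In the sketch below, `count_args` is a hypothetical helper that just reports how many arguments it received; without the quotes, the element containing a space is split in two.

```shell
# Bash
count_args() { echo "$#"; }     # hypothetical helper
arr=("one two" three)
quoted=$(count_args "${arr[@]}")    # 2: one word per element
unquoted=$(count_args ${arr[@]})    # 3: "one two" was split
```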

If one simply wants to dump the full array, one element per line, this is the simplest approach:

  •  # Bash/ksh
     printf "%s\n" "${arr[@]}"

For slightly more complex array-dumping, "${arr[*]}" will cause the elements to be concatenated together, with the first character of IFS (or a space if IFS isn't set) between them. As it happens, "$*" is expanded the same way for positional parameters.

  •  # Bash
     arr=(x y z)
     IFS=/; echo "${arr[*]}"; unset IFS
     # prints x/y/z

Unfortunately, you can't put multiple characters in between array elements using that syntax. You would have to do something like this instead:

  •  # Bash/ksh
     arr=(x y z)
     x=$(printf "%s<=>" "${arr[@]}")
     echo "${x%<=>}"    # Remove the extra <=> from the end.
     # prints x<=>y<=>z

BASH 3.0 added the ability to retrieve the list of index values in an array, rather than just iterating over the elements:

  •  # Bash 3.0 or higher
     arr=(0 1 2 3) arr[42]='what was the question?'
     unset 'arr[2]'
     echo ${!arr[*]}
     # prints 0 1 3 42

Retrieving the indices is extremely important in certain kinds of tasks, such as maintaining parallel arrays with the same indices (a cheap way to mimic having an array of structs in a language with no struct):

  •  # Bash 3.0 or higher
     unset file title artist i
     for f in ./*.mp3; do
       file[i]=$f
       title[i]=$(mp3info -p %t "$f")
       artist[i++]=$(mp3info -p %a "$f")
     done
    
     # Later, iterate over every song.
     # This works even if the arrays are sparse, just so long as they all have
     # the SAME holes.
     for i in ${!file[*]}; do
       echo "${file[i]} is ${title[i]} by ${artist[i]}"
     done
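
If you don't have `mp3info` handy, the same pattern can be tried with made-up data. This sketch punches the same hole in two parallel arrays and then walks the surviving indices:

```shell
# Bash 3.0 or higher
unset name shell report
name=(alice bob carol)          # made-up sample data
shell=(bash ksh93 zsh)
unset 'name[1]' 'shell[1]'      # the same hole in both arrays

report=
for i in "${!name[@]}"; do
  report="$report${name[i]} uses ${shell[i]};"
done
```

Here `report` ends up as `alice uses bash;carol uses zsh;`, with index 1 skipped in both arrays.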

3. Other tricks

Bash's Parameter Expansions may be performed on array elements en masse:

  •  # Bash
     arr=(abc def ghi jkl)
     echo "${arr[@]#?}"          # prints bc ef hi kl
     echo "${arr[@]/[aeiou]/}"   # prints bc df gh jkl

Parameter Expansion can also be used to extract elements from an array. Some people call this slicing:

  •  # Bash
     echo "${arr[@]:1:3}"        # three elements starting at #1 (second element)
     echo "${arr[@]:(-2)}"       # last two elements
     echo "${@:(-1)}"            # last positional parameter
     echo "${@:(-2):1}"          # second-to-last positional parameter
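
For a self-contained version of the slicing examples, here is a sketch with the results captured in variables. `last_arg` is a hypothetical helper, used only so that the positional parameters can be set for the demonstration.

```shell
# Bash
arr=(abc def ghi jkl)
middle="${arr[*]:1:2}"      # two elements starting at index 1: def ghi
lasttwo="${arr[*]:(-2)}"    # last two elements: ghi jkl

last_arg() { echo "${@:(-1)}"; }    # hypothetical helper
last=$(last_arg one two "three four")
```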

As we see above, the @ array (the array of positional parameters) can be used almost like a regularly named array. This is the only array available for use in POSIX or Bourne shells. It has certain limitations: you cannot individually set or unset single elements, and it cannot be sparse. Nevertheless, it still makes certain POSIX shell tasks possible that would otherwise require external tools:

  •  # POSIX
     set -- *.mp3
     if [ -e "$1" ]; then
       echo "there are $# MP3 files"
     else
       echo "there are 0 MP3 files"
     fi

  •  # POSIX
     ...
     # Add an option to our dynamically generated list of options
     set -- "$@" -f "$somefile"
     ...
     foocommand "$@"
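
A runnable sketch of the option-list idea follows. The helper names and the file name are hypothetical, and `count_args` merely stands in for `foocommand`. Doing the work inside a function keeps the caller's own positional parameters untouched, because each function call gets its own set.

```shell
# POSIX
count_args() { echo "$#"; }       # hypothetical stand-in for foocommand
run_with_options() {
  somefile=$1                     # hypothetical input file name
  set -- -v                       # start the option list
  set -- "$@" -f "$somefile"      # append two more words
  count_args "$@"
}
nargs=$(run_with_options 'my file.txt')   # 3: -v, -f and the file name
```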

(Compare to FAQ #50's dynamically generated commands using named arrays.)


CategoryShell

BashFAQ/005 (last edited 2024-07-18 13:37:28 by GreyCat)