How can I use array variables?
BASH and KornShell have one-dimensional arrays indexed by a numerical expression, e.g.:
    # Bash
    host=(micky minnie goofy)
    n=${#host[*]}
    for ((i=0;i<n;i++)); do
      echo "host number $i is ${host[i]}"
    done
The indexing always begins with 0.
Ksh93 and Bash 4.0 have AssociativeArrays as well. These are not available in Bourne, POSIX, ksh88 or older bash shells.
The awkward expression ${#host[*]} or ${#host[@]} returns the number of elements for the array host. Also noteworthy for BASH is the fact that inside the square brackets, i++ works as a C programmer would expect. The square brackets in an array reference force an ArithmeticExpression. (That shortcut does not work in ksh88.)
When counting elements, ${#host[*]} was quicker than ${#host[@]} in our testing on Bash 3, and the two give the same result. From the Arrays section of man bash: "${#name[subscript]} expands to the length of ${name[subscript]}. If subscript is * or @, the expansion is the number of elements in the array."
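The difference between counting elements and measuring one element's length is easy to miss, so here is a small sketch that reuses the host array from the example above. The expected values in the comments follow from the man page text just quoted.

```shell
# Bash
host=(micky minnie goofy)

count_at=${#host[@]}      # number of elements
count_star=${#host[*]}    # same result
len_one=${#host[1]}       # length of the string "minnie"
len_bare=${#host}         # length of host[0], i.e. "micky"

echo "$count_at $count_star $len_one $len_bare"   # prints 3 3 6 5
```

Note that a bare ${#host} silently refers to element 0, which is rarely what was intended.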
BASH and Korn shell arrays are also sparse. Elements may be added and deleted out of sequence.
    # Bash/ksh
    arr[0]=0
    arr[1]=1
    arr[2]=2
    arr[42]="what was the question?"
    unset 'arr[2]'
    echo "${arr[*]}"
    # prints 0 1 what was the question?
You should try to write your code in such a way that it can handle sparse arrays, unless you know in advance that an array will never have holes.
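As a minimal sketch of what handling sparse arrays can mean in practice, compare a counting loop with a loop over the indices that actually exist. This assumes Bash 3.1 or higher (for the ${!arr[@]} index expansion and the += string append); the array contents and the variable names counting and by_index are invented for illustration.

```shell
# Bash 3.1 or higher
arr=(zero one two)
arr[42]='forty-two'
unset 'arr[1]'                  # indices are now 0, 2 and 42

# A counting loop assumes indices 0..n-1, so it hits the hole and misses arr[42].
n=${#arr[@]}
counting=
for ((i = 0; i < n; i++)); do
  counting+="[${arr[i]}]"
done

# Looping over the existing indices visits every element exactly once.
by_index=
for i in "${!arr[@]}"; do
  by_index+="[${arr[i]}]"
done

echo "$counting"    # prints [zero][][two]
echo "$by_index"    # prints [zero][two][forty-two]
```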
1. Loading values into an array
Assigning one element at a time is simple, and portable:
    # Bash/ksh
    arr[0]=0
    arr[42]='the answer'
It's possible to assign multiple values to an array at once, but the syntax differs across shells.
    # Bash
    array=(zero one two three four)

    # Korn
    set -A array -- zero one two three four
When initializing in this way, the first index will be 0.
You can also initialize an array using a glob:
    # Bash
    oggs=(*.ogg)

    # Korn
    set -A oggs -- *.ogg
(see also NullGlob), or a substitution of any kind:
    # Bash
    words=($sentence)

    set -f; O=$IFS IFS=$'\n'
    lines=($(< myfile))
    IFS=$O; set +f

    letters=({a..z})   # Bash 3.0 or higher
When the arrname=(...) syntax is used, any substitutions inside the parentheses undergo WordSplitting according to the regular shell rules. Thus, in the second example above, if we want the lines of the input file to become individual array elements (even if they contain whitespace), we must set IFS appropriately (in this case: to a newline).
set -f and set +f disable and re-enable glob expansion, respectively, so that a line like * will not be expanded into filenames. (We could have used that in the words=($sentence) example too, just in case someone slipped a wildcard into a word.) In some scripts, set -f may be in effect already, and therefore running set +f may be undesirable. This is something you must manage properly yourself; there is no easy or elegant way to "store" the glob expansion switch setting and restore it later. (And don't try to say parsing the output of set -o is easy, because it's not.)
Here are some more ksh examples:
    # Korn
    set -A words -- $sentence

    set -f; O=$IFS IFS='
'; set -A lines -- $(< myfile); IFS=$O; set +f
That's a literal newline (and nothing else) between the single quotes in the second example, which is why the line that closes the quote is not indented. The set -f caveats apply to the first example here, just as they did to the bash version.
Relying on IFS WordSplitting can cause issues if you have repeated whitespace delimiters that you wanted to be treated as multiple delimiters; e.g., a file with blank lines will have repeated newline characters. If you wanted the blank lines to be stored as empty array elements, IFS's behavior will backfire on you; the blank lines will disappear.
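To see the effect concretely, here is a small sketch that uses the same set -f and IFS idiom as above on a string instead of a file. The variable name data and its contents are invented for illustration.

```shell
# Bash
data=$'first\n\nthird'      # three lines; the middle one is blank

set -f; O=$IFS; IFS=$'\n'
lines=($data)               # word splitting collapses the repeated newlines
IFS=$O; set +f

echo "${#lines[@]}"         # prints 2, not 3: the blank line is gone
```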
The solution to that is to read the elements one at a time, in a loop. Remember that in most shells (including bash), the subcommands of a pipeline are executed in subshells, so you might need to use something like this:
    # Bash
    unset arr i
    while read -r; do arr[i++]=$REPLY; done < yourfile

    # or
    while read -r; do arr[i++]=$REPLY; done < <(your command)
Use one of these forms rather than piping your command into a while read loop. The pipe would cause the array to be set in a subshell, which is not very useful in most cases.
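The subshell problem can be demonstrated with a short sketch. It assumes a default bash setup in which the lastpipe option is not enabled; the array names piped and kept are invented for illustration.

```shell
# Bash
unset piped i
printf '%s\n' one two | while read -r; do piped[i++]=$REPLY; done
echo "${#piped[@]}"     # prints 0: the loop filled the array in a subshell

unset kept i
while read -r; do kept[i++]=$REPLY; done < <(printf '%s\n' one two)
echo "${#kept[@]}"      # prints 2: the loop ran in the current shell
```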
If your file or data stream might be missing its final delimiter (e.g. a text file that might be missing a closing newline), the final read command in the loop might "fail" (terminating the loop) but still contain data. There are a couple of ways to work around that:
    # Bash
    unset arr i
    while read -r; do arr[i++]=$REPLY; done < <(your command)

    # Append unterminated data line if there was one.
    [[ $REPLY ]] && arr[i++]=$REPLY
Some people prefer reading directly into the array, which works great if there's an unterminated line (since the array element is populated with the partial data before the exit status of read is checked). Unfortunately, this puts an empty element on the end of the array if the data stream is correctly terminated:
    # Bash
    unset arr i
    while IFS= read -r 'arr[i++]'; do :; done < <(your command)

    # Remove trailing empty element, if any.
    if [[ ${arr[i-1]} = "" ]]; then unset 'arr[--i]'; fi
Whether you prefer to read too many and then have to remove one, or read too few and then have to add one, is a personal choice.
If you are trying to deal with records that might have embedded newlines, you might be using an alternative delimiter such as the NUL character ( \0 ) to separate the records. In that case, you'll want to use the -d argument to read as well:
    # Bash
    unset arr i
    while IFS= read -rd $'\0' 'arr[i++]'; do :; done < <(find . -name '*.ugly' -print0)
    if [[ ${arr[i-1]} = "" ]]; then unset 'arr[--i]'; fi

    # or
    while read -rd $'\0'; do arr[i++]=$REPLY; done < <(find . -name '*.ugly' -print0)
    [[ $REPLY ]] && arr[i++]=$REPLY
See ProcessSubstitution and FAQ #24 for more details on that syntax.
NOTE: it is necessary to quote the 'arr[i++]' passed to read, so that the square brackets aren't interpreted as globs. This is also true for other non-keyword builtins that take a subscripted variable name, such as let and unset.
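A short sketch of what can go wrong without the quotes, using unset. It works in a scratch directory created with mktemp -d and assumes the default glob settings (nullglob and failglob off); the file name arr1 is chosen only because the pattern arr[1] matches it.

```shell
# Bash
cd "$(mktemp -d)" || exit   # scratch directory so the demo file is harmless
touch arr1                  # a filename that the glob pattern arr[1] matches

arr=(a b c)
unset arr[1]                # the glob expands to "arr1", so arr is untouched
echo "${#arr[@]}"           # prints 3

unset 'arr[1]'              # quoted: the subscript reaches unset intact
echo "${#arr[@]}"           # prints 2
```

When no matching file exists the unquoted form happens to work, which is exactly why the bug tends to show up only later.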
If you wish to append data to an existing array, there are several approaches. The most flexible is to keep a separate index variable:
    # Bash/ksh93
    arr[i++]="new item"
If you don't want to keep an index variable, but you happen to know that your array is not sparse, then you can use the number of elements as the next index (in an array with no holes, this is one more than the highest existing index):
    # Bash/ksh
    # This will FAIL if the array has holes (is sparse).
    arr[${#arr[*]}]="new item"
If you don't know whether your array is sparse or not, but you don't mind re-indexing the entire array (and also being very slow), then you can use:
    # Bash
    arr=("${arr[@]}" "new item")

    # Ksh
    set -A arr -- "${arr[@]}" "new item"
If you're in bash 3.1 or higher, then you can use the += operator:
    # Bash 3.1
    arr+=("new item")
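The following sketch contrasts the element-count approach with += on an array that has a hole. It assumes Bash 3.1 or higher; the array names a and b are invented for illustration.

```shell
# Bash 3.1 or higher
a=(x y z); unset 'a[0]'        # remaining indices: 1 2
a[${#a[*]}]="new item"         # the count is 2, so this overwrites a[2]
echo "${a[*]}"                 # prints y new item

b=(x y z); unset 'b[0]'        # remaining indices: 1 2
b+=("new item")                # stored at index 3, one past the highest index
echo "${b[*]}"                 # prints y z new item
echo "${!b[*]}"                # prints 1 2 3
```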
For examples of using arrays to hold complex shell commands, see FAQ #50 and FAQ #40.
2. Retrieving values from an array
Using array elements en masse is one of the key features. In exactly the same way that "$@" is expanded for positional parameters, "${arr[@]}" is expanded to a list of words, one array element per word. For example,
    # Korn/Bash
    for x in "${arr[@]}"; do
      echo "next element is '$x'"
    done
This works even if the elements contain whitespace. You always end up with the same number of words as you have array elements.
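To illustrate why the quotes matter, this sketch counts loop iterations with and without them. It assumes the default IFS; the element values and counter names are invented for illustration.

```shell
# Bash
arr=("two words" "three more words")

quoted=0
for x in "${arr[@]}"; do quoted=$((quoted + 1)); done

unquoted=0
for x in ${arr[@]}; do unquoted=$((unquoted + 1)); done   # elements get word-split

echo "$quoted $unquoted"    # prints 2 5
```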
If one simply wants to dump the full array, one element per line, this is the simplest approach:
    # Bash/ksh
    printf "%s\n" "${arr[@]}"
For more complex array-dumping, "${arr[*]}" will cause the elements to be concatenated together, with the first character of IFS (or a space if IFS isn't set) between them. As it happens, "$*" is expanded the same way for positional parameters.
    # Bash
    arr=(x y z)
    IFS=/; echo "${arr[*]}"; unset IFS
    # prints x/y/z
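If the script has its own IFS setting that must survive, one option is to confine the change to a function. The helper name join_by below is hypothetical (it is not a bash builtin); this is only a sketch of the "$*" technique described above.

```shell
# Bash
# join_by SEP ELEMENT...   (hypothetical helper, not part of bash)
join_by() {
  local IFS=$1             # only the first character of IFS is used by "$*"
  shift
  printf '%s\n' "$*"
}

arr=(x y z)
joined=$(join_by / "${arr[@]}")
echo "$joined"             # prints x/y/z
```

Because IFS is declared local, the caller's IFS is left exactly as it was, set or unset.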
BASH 3.0 added the ability to retrieve the list of index values in an array, rather than just iterating over the elements:
    # Bash 3.0 or higher
    arr=(0 1 2 3)
    arr[42]='what was the question?'
    unset 'arr[2]'
    echo "${!arr[*]}"
    # prints 0 1 3 42
Retrieving the indices is extremely important in certain kinds of tasks, such as maintaining two arrays with the same indices (a cheap way to mimic having an array of structs in a language with no struct):
    # Bash 3.0 or higher
    unset title artist i
    for f in *.mp3; do
      title[i]=$(mp3info -p %t "$f")
      artist[i++]=$(mp3info -p %a "$f")
    done

    # Later, iterate over every song
    for i in "${!title[@]}"; do
      echo "${title[i]} is by ${artist[i]}"
    done
Bash's Parameter Expansions may be performed on array elements en masse:
    # Bash
    arr=(abc def ghi jkl)
    echo "${arr[@]#?}"          # prints bc ef hi kl
    echo "${arr[@]/[aeiou]/}"   # prints bc df gh jkl
Parameter Expansion can also be used to extract elements from an array:
    # Bash
    echo "${arr[@]:1:3}"    # three elements starting at #1 (second element)
    echo "${arr[@]:(-2)}"   # last two elements

    echo "${@:(-1)}"        # last positional parameter
    echo "${@:(-2):1}"      # second-to-last positional parameter
The @ array (the array of positional parameters) can be used just like any regularly named array.
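One detail worth keeping in mind when slicing is that positional parameters count from 1 while named arrays count from 0. A small sketch with made-up values:

```shell
# Bash
set -- alpha beta gamma delta     # positional parameters for the demo
arr=("$@")

echo "${arr[@]:1:2}"    # prints beta gamma  (arr[1] is the second element)
echo "${@:2:2}"         # prints beta gamma  ($2 is the second parameter)
echo "${@:(-1)}"        # prints delta
```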