Differences between revisions 30 and 47 (spanning 17 versions)
Revision 30 as of 2010-04-15 15:36:05
Size: 11779
Editor: GreyCat
Comment: link to IFS
Revision 47 as of 2013-03-04 14:37:05
Size: 18138
Editor: ormaaj
Comment: Undo most of my last edit. Accidentally put it in the wrong section.
Line 1: Line 1:
#pragma section-numbers 2
Line 3: Line 4:
BASH and KornShell have one-dimensional arrays indexed by a numerical expression, e.g.:

 {{{
 # Bash
 host=(mickey minnie goofy)
 n=${#host[*]}
 for ((i=0;i<n;i++)); do
     echo "host number $i is ${host[i]}"
 done
 }}}

The indexing always begins with 0, unless you specifically choose otherwise. The awkward expression `${#host[*]}` or `${#host[@]}` returns the number of elements for the array {{{host}}}. (We'll go into more detail on syntax below.)

Ksh93 and Bash 4.0 have [[BashGuide/Arrays#Associative_Arrays|Associative Arrays]] as well. These are not available in Bourne, POSIX, ksh88 or older bash shells.

POSIX and Bourne shells are not guaranteed to have arrays at all (although a POSIX shell on any particular system may have them, they are not in the standard).

BASH and Korn shell arrays are ''sparse''. Elements may be added and deleted out of sequence.

 {{{
 # Bash/ksh
 arr[0]=0
 arr[1]=1
 arr[2]=2
 arr[42]="what was the question?"
 unset 'arr[2]'
 echo "${arr[*]}"
 # prints 0 1 what was the question?
 }}}

You should try to write your code in such a way that it can handle sparse arrays, unless you know in advance that an array will never have holes.
This answer assumes you have a basic understanding of what arrays ''are''. If you're new to this kind of programming, you may wish to start with [[BashGuide/Arrays|the guide's explanation]]. This page is more thorough. See [[#See_Also|links]] at the bottom for more resources.

<<TableOfContents>>

=== Intro ===
One-dimensional integer-indexed arrays are implemented by Bash, Zsh, and most KornShell varieties including AT&T ksh88 or later, mksh, and pdksh. Arrays are not specified by POSIX and not available in legacy or minimalist shells such as BourneShell and Dash. The POSIX-compatible shells that do feature arrays mostly agree on their basic principles, but there are some significant differences in the details. Advanced users of multiple shells should be sure to research the specifics. Ksh93, Zsh, and Bash 4.0 additionally have [[BashGuide/Arrays#Associative_Arrays|Associative Arrays]]. This article focuses on indexed arrays as they are the most common and useful type.

Here is a typical usage pattern featuring an array named {{{host}}}:

{{{
# Bash

# Assign the values "mickey", "minnie", and "goofy" to sequential indexes starting with zero.
host=(mickey minnie goofy)

# Iterate over the indexes of "host".
for idx in "${!host[@]}"; do
    printf 'Host number %d is %s\n' "$idx" "${host[idx]}"
done
}}}
`"${!host[@]}"` expands to the indices of the {{{host}}} array, each as a separate argument. (We'll go into more detail on syntax below.)

Indexed arrays are ''sparse'', and elements may be inserted and deleted out of sequence.

{{{
# Bash/ksh

# Simple assignment syntax.
arr[0]=0
arr[2]=2
arr[1]=1
arr[42]='what was the question?'

# Unset the element at index 2 of "arr"
unset -v 'arr[2]'

# Concatenate the values, to a single argument separated by spaces, and echo the result.
echo "${arr[*]}"
# outputs: "0 1 what was the question?"
}}}
It is good practice to write your code in such a way that it can handle sparse arrays, even if you think you can guarantee that there will never be any "holes". Only treat arrays as "lists" if you're certain, and the savings in complexity are significant enough to justify it.
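As a minimal sketch of what "handling sparse arrays" can look like in Bash, the loop below asks the shell which indexes exist instead of counting from zero. The array contents and the `seen` variable are made up for illustration.

```shell
# Bash
arr=([0]=zero [1]=one [42]=answer)
unset -v 'arr[1]'

# Fragile assumption: the element count says nothing about the highest index.
count=${#arr[@]}    # 2 elements, but the highest index is 42

# Sparse-safe: iterate over the indexes that are actually set.
seen=()
for idx in "${!arr[@]}"; do
    seen+=("$idx=${arr[idx]}")
done
printf '%s\n' "${seen[@]}"
```

A counting loop from 0 to `count - 1` would visit index 1, which is unset, and never reach index 42.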
Line 36: Line 47:
Line 39: Line 49:
 {{{
 # Bash/ksh
 arr[0]=0
 arr[42]='the answer'
 }}}

It's possible to assign multiple values to an array at once, but the syntax differs across shells.

 {{{
 # Bash/ksh93
 array=(zero one two three four)

 # Korn
 set -A array -- zero one two three four
 }}}

When initializing in this way, the first index will be 0.

You can also initialize an array using a [[glob]] (see also NullGlob):

 {{{
 # Bash/ksh93
 oggs=(*.ogg)

 # Korn
 set -A oggs -- *.ogg
 }}}

or using a substitution of any kind:

 {{{
 # Bash
 words=($sentence)
 letters=({a..z}) # Bash 3.0 or higher

 # Korn
 set -A words -- $sentence
 }}}

When the `arrname=(...)` syntax is used, any unquoted substitutions inside the parentheses undergo WordSplitting and [[glob]] expansion according to the regular shell rules. In the first example above, if any of the words in `$sentence` contain glob characters, filename expansion may occur.

`set -f` and `set +f` may be used to disable and re-enable [[glob]] expansion, respectively, so that words like `*` will not be expanded into filenames. In some scripts, `set -f` may be in effect already, and therefore running `set +f` may be undesirable. This is something you must manage properly yourself; there is no easy or elegant way to "store" the glob expansion switch setting and restore it later. (And don't try to say parsing the output of `set -o` is easy, because it's not.)
{{{
# Bash/ksh
arr[0]=0
arr[42]='the answer'
}}}
It's possible to assign multiple values to an array at once, but the syntax differs across shells. Bash supports only the {{{arrName=(args...)}}} syntax. ksh88 supports only the {{{set -A arrName -- args...}}} syntax. ksh93, mksh, and zsh support both. There are subtle differences in both methods between all of these shells if you look closely.

{{{
# Bash, ksh93, mksh, zsh
array=(zero one two three four)
}}}
{{{
# ksh88/93, mksh, zsh
set -A array -- zero one two three four
}}}
When initializing in this way, the first index will be 0 unless a different index is specified.
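A short Bash-only sketch of both cases follows. The array names are hypothetical, and the explicit-index form behaves differently in other shells, so treat this as an illustration for Bash rather than portable code.

```shell
# Bash
array=(zero one two)
printf '%s\n' "${!array[*]}"     # indexes start at 0

# Giving the first word an explicit index; the following
# unindexed words continue from there.
offset=([3]=three four five)
printf '%s\n' "${!offset[*]}"
```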

With compound assignment, the space between the parentheses is evaluated in the same way as the arguments to a command, including [[glob|pathname expansion]] and WordSplitting. Any type of expansion or substitution may be used. All the usual [[Quotes|quoting]] rules apply within.

{{{
# Bash/ksh93
oggs=(*.ogg)
}}}
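To illustrate how the usual quoting rules apply inside the parentheses of a compound assignment, here is a small sketch. The variable names are invented, and it assumes the default [[IFS]].

```shell
# Bash
sentence='the quick  brown fox'

split=($sentence)       # unquoted: WordSplitting yields four elements
whole=("$sentence")     # quoted: a single element, inner spacing preserved

printf '%d %d\n' "${#split[@]}" "${#whole[@]}"
```

The unquoted form would also undergo pathname expansion if `sentence` contained glob characters, which is one more reason to quote unless splitting is really what you want.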
With ksh88-style assignment using {{{set}}}, the arguments are just ordinary arguments to a command.

{{{
# Korn
set -A oggs -- *.ogg
}}}
{{{
# Bash (brace expansion requires 3.0 or higher)
homeDirs=(~{,root}) # brace expansion occurs in a different order in ksh, so this is bash-only.
letters=({a..z}) # Not all shells with sequence-expansion can use letters.
}}}
{{{
# Korn
set -A args -- "$@"
}}}
Line 83: Line 88:
Line 87: Line 91:
 # Bash 4
 mapfile -t lines < myfile

 # or
 mapfile -t lines < <(some command)
}}}

See ProcessSubstitution and [[BashFAQ/024|FAQ #24]] for more details on the `<()` syntax.

`mapfile` handles blank lines (it inserts them as empty array elements), and it also handles missing final newlines from the input stream. Both those things become problematic when reading data in other ways, as we shall see momentarily.

`mapfile` does have one serious drawback: it can ''only'' handle newlines as line terminators. It can't, for example, handle NUL-delimited files from `find -print0`.

In other shells, we might start out like this:

{{{
 # These examples only work with certain kinds of input files.

 # Bash
 set -f; O=$IFS IFS=$'\n' lines=($(< myfile)) IFS=$O; set +f

 # Korn
 set -f; O=$IFS IFS='
 '; set -A lines -- $(< myfile); IFS=$O; set +f
}}}

We use [[IFS]] (setting it to a newline) because we want each ''line'' of input to become an array element, not each ''word''. (This particular syntax may have undesired results with blank lines of input; see below for alternatives.)

That's a literal newline (and nothing else) between the single quotes in the Korn example.

Relying on IFS WordSplitting can cause issues if you have repeated whitespace delimiters that you wanted to be treated as multiple delimiters; e.g., a file with blank lines will have repeated newline characters. If you wanted the blank lines to be stored as empty array elements, IFS's behavior will backfire on you; the blank lines will disappear.

The solution to that is to read the elements one at a time, in a loop. Remember that in most shells (including bash), the subcommands of a pipeline are executed in [[SubShell|subshells]], so you might need to use something like this:

 {{{
 # Bash
 unset arr i
 while read -r; do arr[i++]=$REPLY; done < yourfile

 # or
 while read -r; do arr[i++]=$REPLY; done < <(your command)
 }}}

Rather than piping your command to a `while read` loop, which would cause the array to be [[BashFAQ/024|set in a subshell]] -- not very useful in most cases.

Also noteworthy for BASH is the fact that inside the square brackets, `i++` works as a C programmer would expect. The square brackets in an array reference force an ArithmeticExpression. (That shortcut does not work in ksh88.)

If your file or data stream might be missing its final delimiter (e.g. a text file that might be missing a closing newline), the final `read` command in the loop might "fail" (terminating the loop) but still contain data. There are a couple ways to work around that:

 {{{
 # Bash
 unset arr i
 while read -r; do arr[i++]=$REPLY; done < <(your command)
 # Append unterminated data line if there was one.
 [[ $REPLY ]] && arr[i++]=$REPLY
 }}}

Some people prefer reading directly into the array, which works great if there's an unterminated line (since the array element is populated with the partial data before the exit status of `read` is checked). Unfortunately, this puts an empty element on the end of the array if the data stream ''is'' correctly terminated:

 {{{
 # Bash
 unset arr i
 while IFS= read -r 'arr[i++]'; do :; done < <(your command)
 # Remove trailing empty element, if any.
 if [[ ${arr[i-1]} = "" ]]; then unset 'arr[--i]'; fi
 }}}
# Bash 4
mapfile -t lines <myfile

# or
mapfile -t lines < <(some command)
}}}
See ProcessSubstitution and [[BashFAQ/024|FAQ #24]] for more details on the `<(...)` syntax.

`mapfile` handles blank lines by inserting them as empty array elements, and also handles missing final newlines from the input stream. These can be problematic when reading data in other ways (see the next section). `mapfile` does have one serious drawback: it can ''only'' handle newlines as line terminators. Not all options supported by `read` are handled by `mapfile`, and vice versa. `mapfile` can't, for example, handle NUL-delimited files from `find -print0`. When `mapfile` isn't available, we have to work '''very hard''' to try to duplicate it. There are a great number of ways to ''almost'' get it right that fail in subtle ways.
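The following sketch feeds `mapfile` an input that has both a blank line and no final newline, to show the two behaviours just described. The input text is made up for the demonstration.

```shell
# Bash 4 or higher
# Input: "first", an empty line, then "third" with no trailing newline.
mapfile -t lines < <(printf 'first\n\nthird')

printf '%d\n' "${#lines[@]}"
printf '<%s>\n' "${lines[@]}"
```

The blank line is kept as an empty element and the unterminated last line is not lost.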

These examples will duplicate most of `mapfile`'s basic functionality:

{{{
# Bash, Ksh93, mksh
while IFS= read -r; do
    lines+=("$REPLY")
done <file
[[ $REPLY ]] && lines+=("$REPLY")
}}}
The `+=` operator, when used together with parentheses, appends the element to one greater than the current highest numbered index in the array.
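A small sketch of that rule on a sparse array, using made-up contents:

```shell
# Bash
lines=([0]=a [7]=b)
lines+=("c")                     # goes to index 8, not index 2
printf '%s\n' "${!lines[*]}"
```

The new element follows the highest existing index, so holes in the array are left alone.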

{{{
# Korn
# Ksh88 doesn't support pre/post increment/decrement. mksh and others do.
i=0
while IFS= read -r; do
    lines[i+=1,$i]=$REPLY
done <file
[[ $REPLY ]] && lines[i]=$REPLY
}}}
The square brackets create a [[ArithmeticExpression|math context]]. The result of the expression is the index used for assignment.

===== Handling newlines (or lack thereof) at the end of a file =====
`read` returns false when it reaches the end of the input before finding a newline. This presents a problem. If the file ends with a trailing newline, the last line of data is read successfully, and `read` then runs once more, finds nothing, and returns false. If the trailing newline is missing, `read` returns false while reading the last line of data, even though that data has still been assigned. Without a special check for these cases, no matter what logic is used, you will always end up either with an extra blank element in the resulting array, or a missing final element.

To be clear - most text files ''should'' contain a newline as the last character in the file. Newlines are added to the ends of files by most text editors, and also by [[HereDocument|Here documents]] and [[HereString|Here strings]]. Most of the time, this is only an issue when reading output from pipes or process substitutions, or from "broken" text files created with broken or misconfigured tools. Let's look at some examples.

This approach reads the elements one by one, using a loop.

{{{
# Doesn't work correctly!
unset -v arr i
while IFS= read -r 'arr[i++]'; do
    :
done < <(printf '%s\n' {a..d})
}}}
Unfortunately, if the file or input stream contains a trailing newline, a blank element is added at the end of the array, because the `read -r arr[i++]` is executed one extra time after the last line containing text before returning false.

{{{
# Still doesn't work correctly!
unset -v arr i
while read -r; do
    arr[i++]=$REPLY
done < <(printf %s {a..c}$'\n' d)
}}}
The square brackets create a [[ArithmeticExpression|math context]]. Inside them, `i++` works as a C programmer would expect (in all but ksh88).

This approach fails in the reverse case - it correctly handles blank lines and inputs terminated with a newline, but fails to record the last line of input if the file or stream is missing its final newline. So we need to handle that case specially:

{{{
# Bash, ksh93, mksh
unset -v arr i
while IFS= read -r; do
    arr[i++]=$REPLY
done <file
[[ $REPLY ]] && arr[i++]=$REPLY # Append unterminated data line, if there was one.
}}}
This is very close to the "final solution" we gave earlier -- handling both blank lines inside the file, and an unterminated final line. The null [[IFS]] is used to prevent `read` from stripping possible whitespace from the beginning and end of lines, in the event you wish to preserve them.
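The effect of the null [[IFS]] is easiest to observe when `read` is given a variable name. The sketch below reads the same made-up input twice, once with `IFS=` and once without; the names `kept`, `stripped`, and `line` are illustrative only.

```shell
# Bash
kept=() stripped=()

# With a null IFS, leading and trailing whitespace survives.
while IFS= read -r line; do
    kept+=("$line")
done < <(printf '  indented\ntrailing  \n')

# With the default IFS, read trims it.
while read -r line; do
    stripped+=("$line")
done < <(printf '  indented\ntrailing  \n')

printf '<%s>\n' "${kept[@]}" "${stripped[@]}"
```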

Another workaround is to remove the empty element after the loop:

{{{
# Bash
unset -v arr i
while IFS= read -r 'arr[i++]'; do
    :
done <file

# Remove trailing empty element, if any.
[[ ${arr[i-1]} ]] || unset -v 'arr[--i]'
}}}
Line 156: Line 174:
NOTE: it is necessary to quote the `'arr[i++]'` passed to read, so that the square brackets aren't interpreted as [[glob]]s. This is also true for other non-keyword builtins that take a subscripted variable name, such as `let` and `unset`.

If you are trying to deal with records that might have embedded newlines, you might be using an alternative delimiter such as the NUL character ( \0 ) to separate the records. In that case, you'll want to use the -d argument to read as well:
 {{{
 # Bash
 unset arr i
 while IFS= read -rd '' 'arr[i++]'; do :; done < <(find . -name '*.ugly' -print0)
 if [[ ${arr[i-1]} = "" ]]; then unset 'arr[--i]'; fi

 # or
 while read -rd ''; do arr[i++]=$REPLY; done < <(find . -name '*.ugly' -print0)
 [[ $REPLY ]] && arr[i++]=$REPLY
 }}}
NOTE: it is necessary to quote the `'arr[i++]'` passed to read, so that the square brackets aren't interpreted as [[glob|globs]]. This is also true for other non-keyword builtins that take a subscripted variable name, such as `let` and `unset`.

===== Other methods =====
Sometimes stripping blank lines actually is desirable, or you may know that the input will always be newline delimited, such as input generated internally by your script. It is possible in some shells to use the `-d` flag to set `read`'s line delimiter to null, then abuse the `-a` or `-A` (depending on the shell) flag normally used for reading the fields of a line into an array for reading lines. Effectively, the entire input is treated as a single line, and the fields are newline-delimited.

{{{
# Bash 4
    IFS=$'\n' read -rd '' -a lines <file
}}}
{{{
# mksh, zsh
    IFS=$'\n' read -rd '' -A lines <file
}}}
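As a rough demonstration of the Bash variant, the sketch below shows that blank lines are dropped, because consecutive newlines count as a single field separator. The `|| true` is an addition for this sketch: `read` reaches the end of the input without finding a NUL, so it reports failure even though the array has been filled.

```shell
# Bash 4
IFS=$'\n' read -rd '' -a lines < <(printf 'one\n\n\nfour\n') || true

printf '%d\n' "${#lines[@]}"
printf '<%s>\n' "${lines[@]}"
```

Only the two non-empty lines end up in `lines`.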

===== Don't read lines with for! =====
'''[[DontReadLinesWithFor|NEVER READ LINES USING for..in LOOPS]]!''' Relying on [[IFS]] WordSplitting causes issues if you have repeated whitespace delimiters, because they will be consolidated. It is not possible to preserve blank lines by having them stored as empty array elements this way. Even worse, special globbing characters will be expanded unless you go to the trouble of disabling globbing and then re-enabling it afterwards. Just never use this approach - it is problematic, the workarounds are all ugly, and not all problems are solvable.

Because this is such an incredibly common mistake, the example below illustrates close to the best possible version of this hack, and how much harder it is than just doing it correctly -- and it still can't preserve consecutive newlines! It only gets worse from here. See DontReadLinesWithFor for details.

{{{
# Bash
# WARNING: Don't do this!

evilReadLines() {
    [[ -e $2 ]] || return

    # Try hard to preserve the previous glob and trap states.
    # But if the caller sets ERR or DEBUG, we're still in trouble!
    if [[ $- != *f* ]]; then
        set -f
        local oReturn=$(trap -p RETURN)
        trap 'set +f; trap "${oReturn:--}" RETURN' RETURN
    fi

    local line idx IFS=$'\n'
    for line in ${1:+$(<"$2")}; do
        printf -v "${1}[idx++]" %s "$line"
    done

    # This is an equally bad alternative to the above for loop, albeit slightly faster:
    # IFS=$'\n' declare -a ${1:+"$1"'=( $(<"$2") )'} 2>/dev/null
}
declare -ft evilReadLines # Inherit traps from the caller.

# Pass in an array name and file name
evilReadLines myArray myFile
}}}
==== Reading NUL-delimited streams ====
If you are trying to deal with records that might have embedded newlines, you will be using an alternative delimiter such as the NUL character ( \0 ) to separate the records. In that case, you'll need to use the `-d` argument to `read` as well:

{{{
# Bash
unset -v arr i
while IFS= read -rd '' 'arr[i++]'; do
    :
done < <(find . -name '*.ugly' -print0)

# or
while read -rd ''; do
    arr[i++]=$REPLY
done < <(find . -name '*.ugly' -print0)

# or (bash 3.1 and up)
while read -rd ''; do
    arr+=("$REPLY")
done < <(find . -name '*.ugly' -print0)
}}}
`read -d ''` tells Bash to keep reading until a NUL byte instead of until a newline. This isn't certain to work in all shells with a `-d` feature.
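To try this without depending on `find` or on any particular files, the sketch below generates a NUL-delimited stream with `printf`. The record contents and the variable name `item` are invented for the example.

```shell
# Bash
unset -v arr
while IFS= read -rd '' item; do
    arr+=("$item")
done < <(printf '%s\0' 'plain' $'two\nlines' ' padded ')

printf '%d\n' "${#arr[@]}"
```

Each record arrives intact, including the one with an embedded newline and the one with surrounding spaces.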
Line 171: Line 244:

If you wish to append data to an existing array, there are several approaches. The most flexible is to keep a separate index variable:

 {{{
 # Bash/ksh93
 arr[i++]="new item"
 }}}

If you don't want to keep an index variable, but you happen to know that your array is ''not sparse'', then you can use the highest existing index:

 {{{
 # Bash/ksh
 # This will FAIL if the array has holes (is sparse).
 arr[${#arr[*]}]="new item"
 }}}

If you don't know whether your array is sparse or not, but you don't mind re-indexing the entire array (and also being very slow), then you can use:

 {{{
 # Bash
 arr=("${arr[@]}" "new item")

 # Ksh
 set -A arr -- "${arr[@]}" "new item"
 }}}
As previously mentioned, arrays are ''sparse'' - that is, numerically adjacent indexes are not guaranteed to be occupied by a value. This confuses what it means to "append" to an existing array. There are several approaches.

If you've been keeping track of the highest-numbered index with a variable (for example, as a side-effect of populating an array in a loop), and can guarantee it's correct, you can just use it and continue to ensure it remains in-sync.

{{{
# Bash/ksh93
arr[++i]="new item"
}}}
If you don't want to keep an index variable, but happen to know that your array is ''not sparse'', then you can use the number of elements to calculate the offset (not recommended):

{{{
# Bash/ksh
# This will FAIL if the array has holes (is sparse).
arr[${#arr[@]}]="new item"
}}}
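A small sketch of how this goes wrong on a sparse array, using made-up contents: the element count points at an index that is already occupied, so an existing value is silently overwritten.

```shell
# Bash
arr=([0]=a [2]=c [3]=d)

# Three elements, so this writes to index 3 and replaces "d".
arr[${#arr[@]}]='new item'

printf '%s\n' "${!arr[*]}"
printf '%s\n' "${arr[3]}"
```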
If you don't know whether your array is sparse or not, but don't mind re-indexing the entire array (very inefficient), then you can use:

{{{
# Bash
arr=("${arr[@]}" "new item")

# Ksh
set -A arr -- "${arr[@]}" "new item"
}}}
Line 199: Line 270:
 {{{
 # Bash 3.1
 arr+=("new item")
 }}}

NOTE: the parentheses are required, just as when assigning to an array. (Or you will end up appending to `${arr[0]}` which `$arr` is a synonym for.)
{{{
# Bash 3.1, ksh93, mksh, zsh
arr+=(item 'another item')
}}}
NOTE: the parentheses are required, just as when assigning to an array. Otherwise you will end up appending to `${arr[0]}` which `$arr` is a synonym for. If your shell supports this type of appending, it is the preferred method.
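The difference is easy to see in a short Bash sketch with placeholder values:

```shell
# Bash
arr=(alpha beta)

arr+=gamma       # no parentheses: string-appends to arr[0]
arr+=(gamma)     # parentheses: adds a new element

printf '<%s>\n' "${arr[@]}"
```

After both statements the array holds `alphagamma`, `beta`, and `gamma`.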
Line 209: Line 279:

`${#arr[*]}` or `${#arr[@]}` gives the number of elements in an array:

 {{{
 # Bash
 shopt -s nullglob
 oggs=(*.ogg)
 echo "There are ${#oggs[*]} Ogg files."
 }}}

When accessing the number of elements, * is quicker than @ in BASH in our testing on Bash-3, and gives the same result. (man bash: Arrays: `${#name[subscript]}` expands to the length of `${name[subscript]}`. If subscript is * or @, the expansion is the number of elements in the array.)

Using array elements ''en masse'' is one of the key features of shell arrays. In exactly the same way that {{{"$@"}}} is expanded for positional parameters, {{{"${arr[@]}"}}} is expanded to a list of words, one array element per word. For example,

 {{{
 # Korn/Bash
 for x in "${arr[@]}"; do
   echo "next element is '$x'"
 done
 }}}
`${#arr[@]}` or `${#arr[*]}` expand to the number of elements in an array:

{{{
# Bash
shopt -s nullglob
oggs=(*.ogg)
echo "There are ${#oggs[@]} Ogg files."
}}}
Single elements are retrieved by index:

{{{
echo "${foo[0]} - ${bar[j+1]}"
}}}
The square brackets are a [[ArithmeticExpression|math context]]. Within an arithmetic context, variables, including arrays, can be referenced by name. For example, in the expansion:

{{{
${arr[x[3+arr[2]]]}
}}}
`arr`'s index will be the value from the array `x` whose index is 3 plus the value of `arr[2]`.
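To make that evaluation order concrete, here is a sketch with small made-up arrays, chosen so that each step can be followed by hand:

```shell
# Bash
arr=(10 b 2)      # arr[2] is 2
x=([5]=1)         # x[5] is 1

# 3 + arr[2] is 5, x[5] is 1, so this expands ${arr[1]}.
result=${arr[x[3+arr[2]]]}
printf '%s\n' "$result"
```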

Using array elements ''en masse'' is one of the key features of shell arrays. In exactly the same way that `"$@"` is expanded for positional parameters, `"${arr[@]}"` is expanded to a list of words, one array element per word. For example,

{{{
# Korn/Bash
for x in "${arr[@]}"; do
  echo "next element is '$x'"
done
}}}
Line 234: Line 311:
 {{{
 # Bash/ksh
 printf "%s\n" "${arr[@]}"
 }}}

For slightly more complex array-dumping, {{{"${arr[*]}"}}} will cause the elements to be concatenated together, with the first character of {{{IFS}}} (or a space if IFS isn't set) between them. As it happens, {{{"$*"}}} is expanded the same way for positional parameters.

 {{{
 # Bash
 arr=(x y z)
 IFS=/; echo "${arr[*]}"; unset IFS
 # prints x/y/z
 }}}
{{{
# Bash/ksh
printf "%s\n" "${arr[@]}"
}}}
For slightly more complex array-dumping, `"${arr[*]}"` will cause the elements to be concatenated together, with the first character of [[IFS]] (or a space if IFS isn't set) between them. As it happens, `"$*"` is expanded the same way for positional parameters.

{{{
# Bash
arr=(x y z)
IFS=/; echo "${arr[*]}"; unset IFS
# prints x/y/z
}}}
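If you would rather not modify the global [[IFS]], one possible approach is to wrap the join in a function and make `IFS` local to it. `join_by` is a hypothetical helper name, not a builtin.

```shell
# Bash
# join_by SEP [ELEMENT...]: print the elements joined by the first character of SEP.
join_by() {
    local IFS=$1
    shift
    printf '%s\n' "$*"
}

arr=(x y z)
join_by / "${arr[@]}"
```

Because `IFS` is declared `local`, the caller's value is untouched when the function returns.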
Line 250: Line 325:
 {{{
 # Bash/ksh
 arr=(x y z)
 x=$(printf "%s<=>" "${arr[@]}")
 echo "${x%<=>}" # Remove the extra <=> from the end.
 # prints x<=>y<=>z
 }}}

BASH 3.0 added the ability to retrieve the list of index values in an array, rather than just iterating over the elements:

 {{{
 # Bash 3.0 or higher
 arr=(0 1 2 3) arr[42]='what was the question?'
 unset 'arr[2]'
 echo ${!arr[*]}
 # prints 0 1 3 42
 }}}

Retrieving the indices is extremely important in certain kinds of tasks, such as maintaining parallel arrays with the same indices (a cheap way to mimic having an array of `struct`s in a language with no `struct`):
 {{{
 # Bash 3.0 or higher
 unset file title artist i
 for f in ./*.mp3; do
   file[i]=$f
   title[i]=$(mp3info -p %t "$f")
   artist[i++]=$(mp3info -p %a "$f")
 done

 # Later, iterate over every song.
 # This works even if the arrays are sparse, just so long as they all have
 # the SAME holes.
 for i in ${!file[*]}; do
   echo "${file[i]} is ${title[i]} by ${artist[i]}"
 done
 }}}
{{{
# Bash/ksh
arr=(x y z)
tmp=$(printf "%s<=>" "${arr[@]}")
echo "${tmp%<=>}" # Remove the extra <=> from the end.
# prints x<=>y<=>z
}}}
Or using array slicing, described in the next section.

{{{
# Bash/ksh
typeset -a a=([0]=x [5]=y [10]=z)
printf '%s<=>' "${a[@]::${#a[@]}-1}"
printf '%s\n' "${a[@]:(-1)}"
}}}
This also shows how sparse arrays can be assigned multiple elements at once. Note that the `arr=([key]=value ...)` notation differs between shells. In ksh93, this syntax gives you an associative array by default unless you specify otherwise, and using it requires that every value be explicitly given an index. In bash, a value with no explicit index is assigned to the index following the previous one. This example was written in a way that's compatible between the two.


BASH 3.0 added the ability to retrieve the list of index values in an array:

{{{
# Bash 3.0 or higher
arr=(0 1 2 3) arr[42]='what was the question?'
unset 'arr[2]'
echo "${!arr[@]}"
# prints 0 1 3 42
}}}
Retrieving the indices is extremely important for certain kinds of tasks, such as maintaining parallel arrays with the same indices (a cheap way to mimic having an array of `struct`s in a language with no `struct`):

{{{
# Bash 3.0 or higher
unset file title artist i
for f in ./*.mp3; do
  file[i]=$f
  title[i]=$(mp3info -p %t "$f")
  artist[i++]=$(mp3info -p %a "$f")
done

# Later, iterate over every song.
# This works even if the arrays are sparse, just so long as they all have
# the SAME holes.
for i in "${!file[@]}"; do
  echo "${file[i]} is ${title[i]} by ${artist[i]}"
done
}}}
==== Retrieving with modifications ====
Line 288: Line 372:
 {{{
 # Bash
 arr=(abc def ghi jkl)
 echo "${arr[@]#?}" # prints bc ef hi kl
 echo "${arr[@]/[aeiou]/}" # prints bc df gh jkl
 }}}

Parameter Expansion can also be used to extract elements from an array:

 {{{
 # Bash
 echo "${arr[@]:1:3}" # three elements starting at #1 (second element)
 echo "${arr[@]:(-2)}" # last two elements
 echo "${@:(-1)}" # last positional parameter
 echo "${@:(-2):1}" # second-to-last positional parameter
 }}}

The {{{@}}} array (the array of positional parameters) can be used just like any regularly named array.
{{{
# Bash
arr=(abc def ghi jkl)
echo "${arr[@]#?}" # prints bc ef hi kl
echo "${arr[@]/[aeiou]/}" # prints bc df gh jkl
}}}
Parameter Expansion can also be used to extract elements from an array. Some people call this ''slicing'':

{{{
# Bash
echo "${arr[@]:1:3}" # three elements starting at #1 (second element)
echo "${arr[@]:(-2)}" # last two elements
echo "${@:(-1)}" # last positional parameter
echo "${@:(-2):1}" # second-to-last positional parameter
}}}
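A slice can also be captured into a new array. The sketch below assumes the four-element `arr` from the earlier example; `middle` and `last` are names chosen for illustration.

```shell
# Bash
arr=(abc def ghi jkl)

middle=("${arr[@]:1:2}")    # two elements starting at index 1
last=("${arr[@]:(-1)}")     # the final element

printf '%s\n' "${middle[*]}" "${!middle[*]}" "${last[0]}"
```

Note that the copy is re-indexed from zero: `middle` has indexes 0 and 1, not 1 and 2.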
=== Using @ as a pseudo-array ===
As we see above, the `@` array (the array of positional parameters) can be used almost like a regularly named array. This is the ''only'' array available for use in POSIX or Bourne shells. It has certain limitations: you cannot individually set or unset single elements, and it cannot be sparse. Nevertheless, it still makes certain POSIX shell tasks possible that would otherwise require external tools:

{{{
# POSIX
set -- *.mp3
if [ -e "$1" ]; then
  echo "there are $# MP3 files"
else
  echo "there are 0 MP3 files"
fi
}}}
{{{
# POSIX
...
# Add an option to our dynamically generated list of options
set -- "$@" -f "$somefile"
...
foocommand "$@"
}}}
(Compare to [[BashFAQ/050|FAQ #50]]'s dynamically generated commands using named arrays.)
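The basic list operations on the positional parameters can be sketched as follows. The option values are placeholders, and the code uses only POSIX features.

```shell
# POSIX
set --                          # start with an empty list
for opt in -v -f 'some file.txt'; do
    set -- "$@" "$opt"          # append one element
done
echo "$# arguments"

shift                           # drop the first element
echo "first is now: $1"
```

Appending with `set -- "$@" ...` and removing from the front with `shift` are cheap; anything else (removing from the middle, say) means rebuilding the whole list.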

== See Also ==
 * [[http://wiki.bash-hackers.org/syntax/arrays|Bash-hackers array documentation]]
 * [[BashGuide/Arrays]]
 * [[BashSheet#Arrays|BashSheet Array reference]]
 * [[BashFAQ/006#Associative_Arrays|BashFAQ 6 - explaining associative arrays]]

How can I use array variables?

This answer assumes you have a basic understanding of what arrays are. If you're new to this kind of programming, you may wish to start with the guide's explanation. This page is more thorough. See links at the bottom for more resources.

1. Intro

One-dimensional integer-indexed arrays are implemented by Bash, Zsh, and most KornShell varieties including AT&T ksh88 or later, mksh, and pdksh. Arrays are not specified by POSIX and not available in legacy or minimalist shells such as BourneShell and Dash. The POSIX-compatible shells that do feature arrays mostly agree on their basic principles, but there are some significant differences in the details. Advanced users of multiple shells should be sure to research the specifics. Ksh93, Zsh, and Bash 4.0 additionally have Associative Arrays. This article focuses on indexed arrays as they are the most common and useful type.

Here is a typical usage pattern featuring an array named host:

# Bash

# Assign the values "mickey", "minnie", and "goofy" to sequential indexes starting with zero.
host=(mickey minnie goofy)

# Iterate over the indexes of "host".
for idx in "${!host[@]}"; do
    printf 'Host number %d is %s\n' "$idx" "${host[idx]}"
done

"${!host[@]}" expands to the indices of the host array, each as a separate argument. (We'll go into more detail on syntax below.)

Indexed arrays are sparse, and elements may be inserted and deleted out of sequence.

# Bash/ksh

# Simple assignment syntax.
arr[0]=0
arr[2]=2
arr[1]=1
arr[42]='what was the question?'

# Unset the element at index 2 of "arr"
unset -v 'arr[2]'

# Concatenate the values, to a single argument separated by spaces, and echo the result.
echo "${arr[*]}"
# outputs: "0 1 what was the question?"

It is good practice to write your code in such a way that it can handle sparse arrays, even if you think you can guarantee that there will never be any "holes". Only treat arrays as "lists" if you're certain, and the savings in complexity are significant enough to justify it.

2. Loading values into an array

Assigning one element at a time is simple, and portable:

# Bash/ksh
arr[0]=0
arr[42]='the answer'

It's possible to assign multiple values to an array at once, but the syntax differs across shells. Bash supports only the arrName=(args...) syntax. ksh88 supports only the set -A arrName -- args... syntax. ksh93, mksh, and zsh support both. There are subtle differences in both methods between all of these shells if you look closely.

# Bash, ksh93, mksh, zsh
array=(zero one two three four)

# ksh88/93, mksh, zsh
set -A array -- zero one two three four

When initializing in this way, the first index will be 0 unless a different index is specified.

With compound assignment, the space between the parentheses is evaluated in the same way as the arguments to a command, including pathname expansion and WordSplitting. Any type of expansion or substitution may be used. All the usual quoting rules apply within.

# Bash/ksh93
oggs=(*.ogg)

With ksh88-style assignment using set, the arguments are just ordinary arguments to a command.

# Korn
set -A oggs -- *.ogg

# Bash (brace expansion requires 3.0 or higher)
homeDirs=(~{,root}) # brace expansion occurs in a different order in ksh, so this is bash-only.
letters=({a..z})    # Not all shells with sequence-expansion can use letters.

# Korn
set -A args -- "$@"

2.1. Loading lines from a file or stream

In bash 4, the mapfile command (also known as readarray) accomplishes this:

# Bash 4
mapfile -t lines <myfile

# or
mapfile -t lines < <(some command)

See ProcessSubstitution and FAQ #24 for more details on the <(...) syntax.

mapfile handles blank lines by inserting them as empty array elements, and also handles missing final newlines from the input stream. These can be problematic when reading data in other ways (see the next section). mapfile does have one serious drawback: it can only handle newlines as line terminators. Not all options supported by read are handled by mapfile, and vice versa. mapfile can't, for example, handle NUL-delimited files from find -print0. When mapfile isn't available, we have to work very hard to try to duplicate it. There are a great number of ways to almost get it right that fail in subtle ways.

These examples will duplicate most of mapfile's basic functionality:

# Bash, Ksh93, mksh
while IFS= read -r; do
    lines+=("$REPLY")
done <file
[[ $REPLY ]] && lines+=("$REPLY")

The += operator, when used together with parentheses, appends the element to one greater than the current highest numbered index in the array.

# Korn
# Ksh88 doesn't support pre/post increment/decrement. mksh and others do.
i=0
while IFS= read -r; do
    lines[i+=1,$i]=$REPLY
done <file
[[ $REPLY ]] && lines[i]=$REPLY

The square brackets create a math context. The result of the expression is the index used for assignment.
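
The subscript i+=1,$i is terse, so here is a sketch of how it evaluates. It is run in Bash, which also accepts this expression; the array name demo and its values are made up. The shell expands $i before the arithmetic is evaluated, so the comma expression increments i but yields the old value as the index.

```shell
#!/usr/bin/env bash
unset -v demo i
i=0
demo[i+=1,$i]=first    # $i expands to 0 first; "i+=1,0" sets i=1 and yields 0
demo[i+=1,$i]=second   # "i+=1,1" sets i=2 and yields 1

echo "${!demo[@]}"     # prints 0 1
echo "$i"              # prints 2
```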

2.1.1. Handling newlines (or lack thereof) at the end of a file

read returns false when it reaches end-of-file before finding a newline. This presents a problem: if the file ends with a newline, read returns false on the attempt after the final line, having assigned only an empty string; otherwise, it returns false while reading the final line of data, even though that data was assigned. Without a special check for these cases, no matter what logic is used, you will always end up either with an extra blank element in the resulting array, or a missing final element.
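
The two cases can be observed directly from read's exit status. This is only a sketch with made-up input and variable names:

```shell
#!/usr/bin/env bash
# Input ends with a newline: read succeeds for the line.
read -r with_nl < <(printf 'data\n'); status_with_nl=$?

# Input lacks the final newline: read still stores the data, but returns false.
read -r without_nl < <(printf 'data'); status_without_nl=$?

echo "$status_with_nl '$with_nl'"        # prints 0 'data'
echo "$status_without_nl '$without_nl'"  # prints 1 'data'
```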

To be clear - most text files should contain a newline as the last character in the file. Newlines are added to the ends of files by most text editors, and also by Here documents and Here strings. Most of the time, this is only an issue when reading output from pipes or process substitutions, or from "broken" text files created with broken or misconfigured tools. Let's look at some examples.

This approach reads the elements one by one, using a loop.

# Doesn't work correctly!
unset -v arr i
while IFS= read -r 'arr[i++]'; do
    :
done < <(printf '%s\n' {a..d})

Unfortunately, if the file or input stream contains a trailing newline, a blank element is added at the end of the array, because read -r 'arr[i++]' runs one more time after the last line containing text, assigns an empty string to the next element, and only then returns false.
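
Counting the elements afterwards makes the problem visible. A sketch using four made-up lines of newline-terminated input:

```shell
#!/usr/bin/env bash
unset -v arr i
while IFS= read -r 'arr[i++]'; do
    :
done < <(printf '%s\n' a b c d)

echo "${#arr[@]}"        # prints 5, although there were only 4 lines
echo "[${arr[4]}]"       # prints [], the unwanted empty element
```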

# Still doesn't work correctly!
unset -v arr i
while read -r; do
    arr[i++]=$REPLY
done < <(printf %s {a..c}$'\n' d)

The square brackets create a math context. Inside them, i++ works as a C programmer would expect (in all but ksh88).

This approach fails in the reverse case - it correctly handles blank lines and inputs terminated with a newline, but fails to record the last line of input if the file or stream is missing its final newline. So we need to handle that case specially:

# Bash, ksh93, mksh
unset -v arr i
while IFS= read -r; do
    arr[i++]=$REPLY
done <file
[[ $REPLY ]] && arr[i++]=$REPLY # Append unterminated data line, if there was one.

This is very close to the "final solution" we gave earlier -- handling both blank lines inside the file, and an unterminated final line. The null IFS is used to prevent read from stripping possible whitespace from the beginning and end of lines, in the event you wish to preserve them.
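
As a quick check, the same loop can be wrapped in a function and fed both kinds of input. The function name read_lines is hypothetical, and the input is generated with printf instead of coming from a file:

```shell
#!/usr/bin/env bash
# read_lines is a hypothetical helper; it fills the global array "arr" from stdin.
read_lines() {
    local i=0
    arr=()
    while IFS= read -r; do
        arr[i++]=$REPLY
    done
    [[ $REPLY ]] && arr[i++]=$REPLY   # unterminated final line, if any
    return 0
}

read_lines < <(printf 'a\n\nb\n'); terminated=${#arr[@]}
read_lines < <(printf 'a\n\nb');   unterminated=${#arr[@]}

echo "$terminated $unterminated"   # prints 3 3
```

Both inputs yield three elements, including the empty one for the blank line.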

Another workaround is to remove the empty element after the loop:

# Bash
unset -v arr i
while IFS= read -r 'arr[i++]'; do
    :
done <file

# Remove trailing empty element, if any.
[[ ${arr[i-1]} ]] || unset -v 'arr[--i]'

Whether you prefer to read too many and then have to remove one, or read too few and then have to add one, is a personal choice.

NOTE: it is necessary to quote the 'arr[i++]' passed to read, so that the square brackets aren't interpreted as globs. This is also true for other non-keyword builtins that take a subscripted variable name, such as let and unset.
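
The glob hazard is easy to reproduce. The sketch below assumes a scratch directory (created with mktemp) that happens to contain a file named arri, which the pattern arr[i++] matches:

```shell
#!/usr/bin/env bash
# Work in a scratch directory containing a file whose name matches the glob arr[i++].
scratch=$(mktemp -d) || exit
touch "$scratch/arri"

unquoted=$(cd "$scratch" && printf '%s' arr[i++])
quoted=$(cd "$scratch" && printf '%s' 'arr[i++]')
rm -rf "$scratch"

echo "$unquoted"   # prints arri: the shell replaced the word with a filename
echo "$quoted"     # prints arr[i++]: the word reaches the command intact
```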

2.1.2. Other methods

Sometimes stripping blank lines actually is desirable, or you may know that the input will always be newline delimited, such as input generated internally by your script. It is possible in some shells to use the -d flag to set read's line delimiter to null, then abuse the -a or -A flag (depending on the shell), which is normally used for reading the fields of a line into an array, to read lines instead. Effectively, the entire input is treated as a single line, and the fields are newline-delimited.

# Bash 4
IFS=$'\n' read -rd '' -a lines <file

# mksh, zsh
IFS=$'\n' read -rd '' -A lines <file
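
A sketch of the Bash form on made-up input shows the two side effects to expect: blank lines disappear, and read reports failure even though the array was filled.

```shell
#!/usr/bin/env bash
# Bash syntax (-a); the sample input contains one blank line.
IFS=$'\n' read -rd '' -a lines < <(printf 'a\n\nb\n')
status=$?

echo "${#lines[@]}"   # prints 2: the blank line is gone
echo "$status"        # prints 1: end of input was reached before any NUL delimiter
```

Because of that non-zero status, this command needs care in scripts that test its result or run under set -e.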

2.1.3. Don't read lines with for!

NEVER READ LINES USING for..in LOOPS! Relying on IFS WordSplitting causes issues if you have repeated whitespace delimiters, because they will be consolidated. It is not possible to preserve blank lines by having them stored as empty array elements this way. Even worse, special globbing characters will be expanded unless you go to the trouble of disabling globbing and then re-enabling it afterwards. Just never use this approach - it is problematic, the workarounds are all ugly, and not all problems are solvable.
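
A short demonstration of what goes wrong, assuming the default IFS and a scratch directory holding two made-up files, f1 and f2:

```shell
#!/usr/bin/env bash
# Three input lines: one with a space, one blank, one that is a glob character.
input=$'two words\n\n*'

scratch=$(mktemp -d) || exit
touch "$scratch/f1" "$scratch/f2"

# Run in a subshell so the directory change does not leak out.
count=$(
    cd "$scratch" || exit
    n=0
    for line in $input; do   # WRONG: unquoted expansion, default IFS
        n=$((n + 1))
    done
    echo "$n"
)
rm -rf "$scratch"

echo "$count"   # prints 4 (two, words, f1, f2), not 3
```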

Because this is such an incredibly common mistake, the example below illustrates close to the best possible version of this hack, and how much harder it is than just doing it correctly -- and it still can't preserve consecutive newlines! It only gets worse from here. See DontReadLinesWithFor for details.

# Bash
# WARNING: Don't do this!

evilReadLines() {
    [[ -e $2 ]] || return

    # Try hard to preserve the previous glob and trap states.
    # But if the caller sets ERR or DEBUG, we're still in trouble!
    if [[ $- != *f* ]]; then
        set -f
        local oReturn=$(trap -p RETURN)
        trap 'set +f; trap "${oReturn:--}" RETURN' RETURN
    fi

    local line idx IFS=$'\n'
    for line in ${1:+$(<"$2")}; do
        printf -v "${1}[idx++]" %s "$line"
    done

    # This is an equally bad alternative to the above for loop, albeit slightly faster:
    # IFS=$'\n' declare -a ${1:+"$1"'=( $(<"$2") )'} 2>/dev/null
}
declare -ft evilReadLines # Inherit traps from the caller.

# Pass in an array name and file name
evilReadLines myArray myFile

2.2. Reading NUL-delimited streams

If you are trying to deal with records that might have embedded newlines, you will be using an alternative delimiter such as the NUL character ( \0 ) to separate the records. In that case, you'll need to use the -d argument to read as well:

# Bash
unset -v arr i
while IFS= read -rd '' 'arr[i++]'; do
    :
done < <(find . -name '*.ugly' -print0)

# or
while read -rd ''; do
    arr[i++]=$REPLY
done < <(find . -name '*.ugly' -print0)

# or (bash 3.1 and up)
while read -rd ''; do
    arr+=("$REPLY")
done < <(find . -name '*.ugly' -print0)

read -d '' tells Bash to keep reading until a NUL byte instead of until a newline. This isn't certain to work in all shells with a -d feature.
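
For experimenting without find, printf can generate a NUL-delimited stream. In this sketch the record contents and the names records and rec are made up:

```shell
#!/usr/bin/env bash
# printf stands in for find -print0 here; the record contents are made up.
unset -v records
while IFS= read -rd '' rec; do
    records+=("$rec")
done < <(printf '%s\0' $'line one\nline two' 'plain')

echo "${#records[@]}"     # prints 2
```

The first record keeps its embedded newline and still occupies a single array element.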

2.3. Appending to an existing array

As previously mentioned, arrays are sparse - that is, numerically adjacent indexes are not guaranteed to be occupied by a value. This confuses what it means to "append" to an existing array. There are several approaches.

If you've been keeping track of the highest-numbered index with a variable (for example, as a side-effect of populating an array in a loop), and can guarantee it's correct, you can just use it, as long as you continue to keep it in sync.

# Bash/ksh93
arr[++i]="new item"

If you don't want to keep an index variable, but happen to know that your array is not sparse, then you can use the number of elements to calculate the offset (not recommended):

# Bash/ksh
# This will FAIL if the array has holes (is sparse).
arr[${#arr[@]}]="new item"
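
Here is a sketch of how that fails on sparse arrays, using made-up contents. In the first case the new item lands in a hole; in the second it silently overwrites an existing element.

```shell
#!/usr/bin/env bash
arr=([0]=a [5]=b)              # two elements, highest index 5
arr[${#arr[@]}]="new item"     # ${#arr[@]} is 2, so this writes index 2, not 6
echo "${!arr[@]}"              # prints 0 2 5

holes=([1]=a [2]=b)            # two elements, but index 2 is already taken
holes[${#holes[@]}]="new item" # overwrites holes[2]
echo "${holes[@]}"             # prints a new item
```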

If you don't know whether your array is sparse or not, but don't mind re-indexing the entire array (very inefficient), then you can use:

# Bash
arr=("${arr[@]}" "new item")

# Ksh
set -A arr -- "${arr[@]}" "new item"

If you're in bash 3.1 or higher, then you can use the += operator:

# Bash 3.1, ksh93, mksh, zsh
arr+=(item 'another item')

NOTE: the parentheses are required, just as when assigning to an array. Otherwise you will end up appending to ${arr[0]}, because $arr is a synonym for ${arr[0]}. If your shell supports this type of appending, it is the preferred method.
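
The difference is easy to see side by side. A Bash sketch with made-up values:

```shell
#!/usr/bin/env bash
arr=(a b)
arr+=c                # no parentheses: string-appends to arr[0]
echo "${arr[0]}"      # prints ac
echo "${#arr[@]}"     # prints 2

sparse=([3]=x)
sparse+=(y z)         # parentheses: new elements follow the highest index
echo "${!sparse[@]}"  # prints 3 4 5
```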

For examples of using arrays to hold complex shell commands, see FAQ #50 and FAQ #40.

3. Retrieving values from an array

${#arr[@]} or ${#arr[*]} expand to the number of elements in an array:

# Bash
shopt -s nullglob
oggs=(*.ogg)
echo "There are ${#oggs[@]} Ogg files."

Single elements are retrieved by index:

echo "${foo[0]} - ${bar[j+1]}"

The square brackets are a math context. Within an arithmetic context, variables, including arrays, can be referenced by name. For example, in the expansion:

${arr[x[3+arr[2]]]}

arr's index will be the value from the array x whose index is 3 plus the value of arr[2].
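
Worked through with concrete (made-up) values, the evaluation goes like this:

```shell
#!/usr/bin/env bash
arr=(10 20 1 30)
x=([4]=3)

# arr[2] is 1, so the inner index is 3+1 = 4; x[4] is 3; arr[3] is 30.
echo "${arr[x[3+arr[2]]]}"   # prints 30
```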

Using array elements en masse is one of the key features of shell arrays. In exactly the same way that "$@" is expanded for positional parameters, "${arr[@]}" is expanded to a list of words, one array element per word. For example,

# Korn/Bash
for x in "${arr[@]}"; do
  echo "next element is '$x'"
done

This works even if the elements contain whitespace. You always end up with the same number of words as you have array elements.
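
Counting the resulting words shows why the quotes matter. In this sketch count_args is a hypothetical helper, the array contents are made up, and the default IFS is assumed:

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper that reports how many words it received.
count_args() { echo "$#"; }

arr=('one two' 'three four' '')

quoted=$(count_args "${arr[@]}")
unquoted=$(count_args ${arr[@]})

echo "$quoted"     # prints 3: one word per element, including the empty one
echo "$unquoted"   # prints 4: elements were split and the empty one vanished
```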

If one simply wants to dump the full array, one element per line, this is the simplest approach:

# Bash/ksh
printf "%s\n" "${arr[@]}"

For slightly more complex array-dumping, "${arr[*]}" will cause the elements to be concatenated together, with the first character of IFS (or a space if IFS isn't set) between them. As it happens, "$*" is expanded the same way for positional parameters.

# Bash
arr=(x y z)
IFS=/; echo "${arr[*]}"; unset IFS
# prints x/y/z
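
One possible variation, shown here only as a sketch, is to confine the IFS change to a function with local, so that the caller's IFS is left exactly as it was rather than unset. The name join_by is made up for this example.

```shell
#!/usr/bin/env bash
# join_by is a hypothetical helper: the first argument is a one-character separator.
join_by() {
    local IFS=$1
    shift
    printf '%s\n' "$*"
}

arr=(x y z)
join_by / "${arr[@]}"   # prints x/y/z
join_by , "${arr[@]}"   # prints x,y,z
```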

Unfortunately, you can't put multiple characters in between array elements using that syntax. You would have to do something like this instead:

# Bash/ksh
arr=(x y z)
tmp=$(printf "%s<=>" "${arr[@]}")
echo "${tmp%<=>}"    # Remove the extra <=> from the end.
# prints x<=>y<=>z

Or use array slicing, described in the next section:

# Bash/ksh
typeset -a a=([0]=x [5]=y [10]=z)
printf '%s<=>' "${a[@]::${#a[@]}-1}"
printf '%s\n' "${a[@]:(-1)}"

This also shows how sparse arrays can be assigned multiple elements at once. Note that the arr=([key]=value ...) notation differs between shells. In ksh93, this syntax gives you an associative array by default unless you specify otherwise, and using it requires that every value be explicitly given an index. In bash, a value with no index is stored at the index following the previous element's. This example was written in a way that's compatible between the two.
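
The Bash side of that difference can be sketched as follows (Bash only, with made-up values): values without an explicit index continue counting from the element before them.

```shell
#!/usr/bin/env bash
# Bash only: y and w have no explicit index, so they follow x and z.
a=([3]=x y [10]=z w)
echo "${!a[@]}"   # prints 3 4 10 11
```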

BASH 3.0 added the ability to retrieve the list of index values in an array:

# Bash 3.0 or higher
arr=(0 1 2 3) arr[42]='what was the question?'
unset 'arr[2]'
echo "${!arr[@]}"
# prints 0 1 3 42

Retrieving the indices is extremely important for certain kinds of tasks, such as maintaining parallel arrays with the same indices (a cheap way to mimic having an array of structs in a language with no struct):

# Bash 3.0 or higher
unset file title artist i
for f in ./*.mp3; do
  file[i]=$f
  title[i]=$(mp3info -p %t "$f")
  artist[i++]=$(mp3info -p %a "$f")
done

# Later, iterate over every song.
# This works even if the arrays are sparse, just so long as they all have
# the SAME holes.
for i in "${!file[@]}"; do
  echo "${file[i]} is ${title[i]} by ${artist[i]}"
done

3.1. Retrieving with modifications

Bash's Parameter Expansions may be performed on array elements en masse:

# Bash
arr=(abc def ghi jkl)
echo "${arr[@]#?}"          # prints bc ef hi kl
echo "${arr[@]/[aeiou]/}"   # prints bc df gh jkl

Parameter Expansion can also be used to extract elements from an array. Some people call this slicing:

# Bash
echo "${arr[@]:1:3}"        # three elements starting at #1 (second element)
echo "${arr[@]:(-2)}"       # last two elements
echo "${@:(-1)}"            # last positional parameter
echo "${@:(-2):1}"          # second-to-last positional parameter
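
With sparse arrays it helps to remember that, in Bash, the offset is an index while the length counts elements. The following sketch uses a made-up sparse array; a negative offset is taken relative to one more than the highest index, so it does not necessarily select that many elements.

```shell
#!/usr/bin/env bash
arr=([0]=a [5]=b [10]=c [15]=d)

echo "${arr[@]:1:2}"    # prints b c: two elements, starting at the first index >= 1
echo "${arr[@]:10}"     # prints c d: everything from index 10 onward
echo "${arr[@]:(-2)}"   # prints d: offset 16-2 = 14, and only index 15 is >= 14
```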

4. Using @ as a pseudo-array

As we see above, the @ array (the array of positional parameters) can be used almost like a regularly named array. This is the only array available for use in POSIX or Bourne shells. It has certain limitations: you cannot individually set or unset single elements, and it cannot be sparse. Nevertheless, it still makes certain POSIX shell tasks possible that would otherwise require external tools:

# POSIX
set -- *.mp3
if [ -e "$1" ]; then
  echo "there are $# MP3 files"
else
  echo "there are 0 MP3 files"
fi

# POSIX
...
# Add an option to our dynamically generated list of options
set -- "$@" -f "$somefile"
...
foocommand "$@"
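
A self-contained version of the option-building pattern is sketched below, with printf standing in for the real command. The option names and the file name are made up.

```shell
#!/bin/sh
# Hypothetical option list, built up piece by piece.
somefile='my file.txt'
set -- -v
set -- "$@" -f "$somefile"

echo "$#"                 # prints 3
printf '<%s>\n' "$@"
# prints:
# <-v>
# <-f>
# <my file.txt>
```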

(Compare to FAQ #50's dynamically generated commands using named arrays.)

See Also


CategoryShell

BashFAQ/005 (last edited 2024-07-18 13:37:28 by GreyCat)