Differences between revisions 30 and 44 (spanning 14 versions)
Revision 30 as of 2012-11-30 17:36:57
Size: 19111
Editor: geirha
Comment: Anchors and headings
Revision 44 as of 2021-10-23 00:14:01
Size: 25154
Editor: emanuele6
Comment: add -r to a read -a
Line 5: Line 5:
Line 15: Line 16:
    # This does NOT work in the general case
    $ files=$(ls ~/*.jpg); cp $files /backups/
# This does NOT work in the general case
$ files=$(ls ~/*.jpg); cp $files /backups/
Line 21: Line 22:
    # This DOES work in the general case
    $ files=(~/*.jpg); cp "${files[@]}" /backups/
# This DOES work in the general case
$ files=(~/*.jpg); cp "${files[@]}" /backups/
Line 44: Line 45:
    $ names=("Bob" "Peter" "$USER" "Big Bad John") $ names=("Bob" "Peter" "$USER" "Big Bad John")
Line 49: Line 50:
    $ names=([0]="Bob" [1]="Peter" [20]="$USER" [21]="Big Bad John")
    # or...
    $ names[0]="Bob"
$ names=([0]="Bob" [1]="Peter" [20]="$USER" [21]="Big Bad John")
# or...
$ names[0]="Bob"
Line 58: Line 59:
    $ photos=(~/"My Photos"/*.jpg) $ photos=(~/"My Photos"/*.jpg)
Line 65: Line 66:
    $ files=$(ls) # BAD, BAD, BAD!
    $ files=($(ls)) # STILL BAD!
$ files=$(ls) # BAD, BAD, BAD!
$ files=($(ls)) # STILL BAD!
Line 73: Line 74:
    $ files=(*) # Good!
}}}
This statement gives us an array where each filename is a separate element. Perfect!
$ files=(*) # Good!
}}}
This statement gives us an array where each filename is a separate element. The `*` is a glob pattern for `any string`, which pathname-expands into the filenames in the current directory (just like it would in e.g. `rm *`). After the pathname expansion, the command will look like `files=([each file in the current directory that matches *])`, which assigns all of the matching files to the array `files`. Perfect!
Line 84: Line 85:
    $ IFS=. read -a ip_elements <<< "127.0.0.1"
}}}
Here we use `IFS` with the value `.` to cut the given IP address into array elements wherever there's a `.`, resulting in an array with the elements `127`, `0`, `0` and `1`.

(The builtin command `read` and the `<<<` operator will be covered in more depth in the [[BashGuide/InputAndOutput|Input and Output]] chapter.)
$ IFS=. read -ra ip_elements <<< "127.0.0.1"
}}}
Here we use `IFS` with the value `.` to cut the given IP address into array elements wherever there's a `.`, resulting in an array with the elements `127`, `0`, `0` and `1`. (The builtin command `read` and the `<<<` operator will be covered in more depth in the [[BashGuide/InputAndOutput|Input and Output]] chapter.)
Line 101: Line 100:
    files=()
    while read -r -d $'\0'; do
     files+=("$REPLY")
    done < <(find /foo -print0)
files=()
while read -r -d ''; do
    files+=("$REPLY")
done < <(find /foo -print0)
Line 108: Line 107:
The first line `files=()` creates an empty array named `files`.

We're using a [[BashGuide/TestsAndConditionals#Conditional_Loops|while loop]] that runs a `read` command each time. The `read` command uses the `-d $'\0'` option, which means that instead of reading a line at a time (up to a newline), we're reading up to a NUL byte (`\0`). It also uses `-r` to prevent it from treating backslashes specially.

Once `read` has read some data and encountered a NUL byte, the `while` loop's body is executed. We put what we read (which is in the parameter `REPLY`) into our array.

To do this, we use the `+=()` syntax. This syntax adds one or more element(s) to the end of our array.

Finally, the `< <(..)` syntax is a combination of ''File Redirection'' (`<`) and ''Process Substitution'' (`<(..)`). Omitting the technical details for now, we'll simply say that this is how we send the output of the `find` command into our `while` loop.
 * The first line `files=()` creates an empty array named `files`.
 * We're using a [[BashGuide/TestsAndConditionals#Conditional_Loops|while loop]] that runs a `read` command each time. The `read` command uses the `-d ''` option to specify the delimiter. Because Bash arguments cannot contain NUL bytes, `read` interprets the empty string as a NUL byte (`\0`). This means that instead of reading a line at a time (up to a newline), we're reading up to a NUL byte. It also uses `-r` to prevent it from treating backslashes specially.
 * Once `read` has read some data and encountered a NUL byte, the `while` loop's body is executed. We put what we read (which is in the parameter `REPLY`) into our array.
 * To do this, we use the `+=()` syntax. This syntax adds one or more element(s) to the end of our array.
 * Finally, the `< <(..)` syntax is a combination of ''File Redirection'' (`<`) and ''Process Substitution'' (`<(..)`). Omitting the technical details for now, we'll simply say that this is how we send the output of the `find` command into our `while` loop.
Line 119: Line 114:

As an aside, check out [[glob#globstar_.28since_bash_4.0-alpha.29|globstar]] if you are using bash >= 4.0 and just want to recursively walk directories.
Line 133: Line 130:
Walking over array elements is really easy. Because an array is such a safe medium of storage, we can simply use a [[BashGuide/TestsAndConditionals#Conditional_Loops|for loop]] to iterate over its elements:

{{{
    $ for file in "${myfiles[@]}"; do
    > cp "$file" /backups/
    > done
}}}
Notice the syntax used to '''expand''' the array here. We use the '''quoted''' form: `"${myfiles[@]}"`. Bash replaces this syntax with every single element in the array, properly quoted.
Once we have an array, there are several things we can do with it.

=== Expanding Elements ===
First of all, we can print the contents to see what's in it:

{{{
$ declare -p myfiles
declare -a myfiles='([0]="/home/wooledg/.bashrc" [1]="billing codes.xlsx" [2]="hello.c")'
}}}

The `declare -p` command prints the contents of one or more variables. In addition, it shows you what Bash thinks the ''type'' of the variable is, and it does all of this using code that you could copy and paste into a script. In our case, the `-a` means this is an array. There are three elements, with indices 0, 1 and 2, and we can see what each one contains.

If we want something a bit less technical, we can also print the array using `printf`:

{{{
$ printf '%s\n' "${myfiles[@]}"
/home/wooledg/.bashrc
billing codes.xlsx
hello.c
}}}

This prints each array element, in order by index, with a newline after each one. Note that if one of the array elements happens to ''contain'' a newline character, we won't be able to tell where each element starts and ends, or even how many there are. That's why it's important to keep our data safely contained in the array as long as possible. Once we print it out, there's no way to reverse that.

The syntax `"${myfiles[@]}"` is extremely important. It works just like `"$@"` does for the positional parameters: it expands to a list of words, with each array element as ''one'' word, no matter what it contains. Even if there are spaces, tabs, newlines, quotation marks, or any other kind of characters in one of the array elements, it'll still be passed along as one word to whichever command we're running.

The `printf` command implicitly loops over all of its arguments. But what if we wanted to do our own loop? In that case, we can use a [[BashGuide/TestsAndConditionals#Conditional_Loops|for loop]] to iterate over the elements:

{{{
$ for file in "${myfiles[@]}"; do
> cp "$file" /backups/
> done
}}}

We use the '''quoted''' form again here: `"${myfiles[@]}"`. Bash replaces this syntax with each element in the array properly quoted – similar to how positional parameters (arguments that were passed to the current script or function) are expanded.
Line 145: Line 168:
    $ names=("Bob" "Peter" "$USER" "Big Bad John")
    $ for name in "${names[@]}"; do echo "$name"; done
}}}
{{{
    $ for name in "Bob" "Peter" "$USER" "Big Bad John"; do echo "$name"; done
}}}
$ names=("Bob" "Peter" "$USER" "Big Bad John")
$ for name in "${names[@]}"; do echo "$name"; done
}}}

{{{
$ for name in "Bob" "Peter" "$USER" "Big Bad John"; do echo "$name"; done
}}}
Line 158: Line 183:
    myfiles=(db.sql home.tbz2 etc.tbz2)
    cp "${myfiles[@]}" /backups/
}}}
$ myfiles=(db.sql home.tbz2 etc.tbz2)
$ cp "${myfiles[@]}" /backups/
}}}
Line 164: Line 190:
    cp "db.sql" "home.tbz2" "etc.tbz2" /backups/
}}}
$ cp "db.sql" "home.tbz2" "etc.tbz2" /backups/
}}}
Line 168: Line 195:
Of course, a `for` loop offers the ultimate flexibility, but `printf` and its implicit looping over arguments can cover many of the simpler cases. It can even produce NUL-delimited streams, perfect for later retrieval:

{{{
$ printf "%s\0" "${myarray[@]}" > myfile
}}}
Line 171: Line 204:
    $ echo "The first name is: ${names[0]}"
    $ echo "The second name is: ${names[1]}"
}}}
$ echo "The first name is: ${names[0]}"
$ echo "The second name is: ${names[1]}"
}}}
Line 179: Line 213:
    $ names=("Bob" "Peter" "$USER" "Big Bad John")
    $ echo "Today's contestants are: ${names[*]}"
    Today's contestants are: Bob Peter lhunath Big Bad John
}}}
$ names=("Bob" "Peter" "$USER" "Big Bad John")
$ echo "Today's contestants are: ${names[*]}"
Today's contestants are: Bob Peter lhunath Big Bad John
}}}
Line 185: Line 220:
Remember to still keep everything nicely '''quoted'''! If you don't keep `${arrayname[*]}` quoted, once again Bash's ''Wordsplitting'' will cut it into bits. Remember to still keep everything nicely '''quoted'''! If you don't keep `${arrayname[*]}` quoted, once again Bash's WordSplitting will cut it into bits.
Line 190: Line 225:
    $ names=("Bob" "Peter" "$USER" "Big Bad John")
    $ ( IFS=,; echo "Today's contestants are: ${names[*]}" )
    Today's contestants are: Bob,Peter,lhunath,Big Bad John
}}}
$ names=("Bob" "Peter" "$USER" "Big Bad John")
$ ( IFS=,; echo "Today's contestants are: ${names[*]}" )
Today's contestants are: Bob,Peter,lhunath,Big Bad John
}}}
Line 198: Line 234:
The `printf` command deserves special mention here, because it's a supremely elegant way to dump an array:

{{{
    $ names=("Bob" "Peter" "$USER" "Big Bad John")
    $ printf "%s\n" "${names[@]}"
    Bob
    Peter
    lhunath
    Big Bad John
}}}
Of course, a `for` loop offers the ultimate flexibility, but `printf` and its implicit looping over arguments can cover many of the simpler cases. It can even produce NUL-delimited streams, perfect for later retrieval:

{{{
    $ printf "%s\0" "${myarray[@]}" > myfile
}}}
Line 217: Line 237:
    $ array=(a b c)
    $ echo ${#array[@]}
    3
}}}
$ array=(a b c)
$ echo ${#array[@]}
3
}}}

=== Expanding Indices ===

Sometimes a problem requires more than just expanding the values of an array in order. You may need to refer to multiple elements at the same time, or refer to the same index in multiple arrays at the same time. In these cases, it's better to expand the array indices, instead of the array values.

Let's say we have two arrays: `first` and `last`. These will hold the first names, and the last names, of a list of people. Obviously we need to make sure that the first and last names of each person are properly matched up. We do this by keeping careful control of the indices.

{{{
$ first=(Jessica Sue Peter)
$ last=(Jones Storm Parker)
}}}

Now, to print the full name of the person with index 1:

{{{
$ echo "${first[1]} ${last[1]}"
Sue Storm
}}}

If we want to loop over all of the names, we can't just loop over `"${first[@]}"` or `"${last[@]}"`. If we do that, we won't have an index into the ''other'' array, so we won't know how to match them up. Instead, we'll loop over the indices of one of the arrays (arbitrarily chosen), and then use that same index in both arrays together:

{{{
$ for i in "${!first[@]}"; do
> echo "${first[i]} ${last[i]}"
> done
Jessica Jones
Sue Storm
Peter Parker
}}}

So, we have a new piece of syntax: `"${!arrayname[@]}"` expands to a list of the ''indices'' of an array, in sequential order.

Another feature worth mentioning is that the `[...]` around the index of an array actually creates an arithmetic context. You can do math there, without wrapping it in `$((...))`. Let's suppose we want to process an array two elements at a time, and let's also suppose we know this array can't be ''sparse''. Then:

{{{
$ a=(a b c q w x y z)
$ for ((i=0; i<${#a[@]}; i+=2)); do
> echo "${a[i]} and ${a[i+1]}"
> done
}}}

We use the arithmetic expression `i+1` as an array index. Bash will evaluate the `i` parameter first, and keep evaluating the value it receives as long as it is a valid ''Name'', until it gets to an integer. Then it will add 1, and use that as the real index.
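
As a small sketch of that evaluation rule (the variable names `i` and `j` and the array contents are made up for illustration), a name inside the index is followed until it yields an integer:

```bash
#!/usr/bin/env bash
# Hypothetical data for illustration only.
a=(zero one two three)
j=2
i=j        # i holds the *name* j, not a number

# Inside [...] Bash evaluates i -> j -> 2, then applies the arithmetic.
direct=${a[i]}
next=${a[i+1]}

echo "a[i] is $direct, a[i+1] is $next"
```

This only applies to indexed arrays, where the index is an arithmetic context.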

=== Sparse Arrays ===

We've mentioned sparse arrays already, so this will be brief. Most arrays simply have indices 0, 1, 2, 3, etc. But an array can also have "holes" in the sequence. This can be done either by assigning directly to an index that's way out past the current end of the array, or by ''removing'' an element from an existing array.

{{{
$ nums=(zero one two three four)
$ nums[70]="seventy"
$ unset 'nums[3]'
$ declare -p nums
declare -a nums='([0]="zero" [1]="one" [2]="two" [4]="four" [70]="seventy")'
}}}

Note that we quoted `'nums[3]'` in the `unset` command. This is because an ''unquoted'' `nums[3]` could be interpreted by Bash as a filename glob. If it happens to match a file in the current directory, then `nums[3]` becomes `nums3` and we will unset the wrong variable. That would be bad!

If you follow the practices we've outlined so far, sparse arrays shouldn't cause you any special concerns.

 * Don't assume that your indices are sequential.
 * If the index values matter, always iterate over the indices instead of making assumptions about them.
 * If you loop over the values instead, don't assume anything about which index you might be on currently.
 * In particular, don't assume that just because you're currently in the first iteration of your loop, that you must be on index 0!

When you expand the values of a sparse array using `"${arrayname[@]}"` you will get a list with no gaps in it. There is no way to tell what kind of array these values came from. This can be useful if you want to ''re-index'' an array to remove all of the gaps:

{{{
$ array=("${array[@]}") # This re-creates the indices.
}}}
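
The effect of re-indexing can be checked by comparing the indices before and after. This is a sketch that reuses the `nums` array from the sparse array example; the `before` and `after` variables are only there for the demonstration:

```bash
#!/usr/bin/env bash
nums=(zero one two three four)
nums[70]="seventy"
unset 'nums[3]'

before="${!nums[*]}"     # indices of the sparse array, joined by spaces

# Expanding the values and assigning them back closes the gaps.
nums=("${nums[@]}")

after="${!nums[*]}"

echo "before: $before"
echo "after:  $after"
```

The values keep their relative order; only the index numbers change.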
Line 230: Line 318:
Until recently, [[BASH]] could only use numbers (more specifically, non-negative integers) as keys of arrays. This means you could not "map" or "translate" one string to another. This is something a lot of people missed. People began to (ab)use [[BashFAQ/006|variable indirection]] as a means to address the issue.

Since [[BASH]] 4 was released, there is no longer any excuse to use indirection (or '''worse''', `eval`) for this purpose. You can now use full-featured associative arrays.

To create an associative array, you need to declare it as such (using `declare -A`). This is to guarantee backward compatibility with the standard indexed arrays. Here's how you do that:

{{{
    $ declare -A fullNames
    $ fullNames=( ["lhunath"]="Maarten Billemont" ["greycat"]="Greg Wooledge" )
    $ echo "Current user is: $USER. Full name: ${fullNames[$USER]}."
    Current user is: lhunath. Full name: Maarten Billemont.
}}}
With the same syntax as for indexed arrays, you can iterate over the keys of associative arrays:

{{{
    $ for user in "${!fullNames[@]}"
    > do echo "User: $user, full name: ${fullNames[$user]}."; done
    User: lhunath, full name: Maarten Billemont.
    User: greycat, full name: Greg Wooledge.
}}}
Two things to remember, here: First, the order of the keys you get back from an associative array using the `${!array[@]}` syntax is unpredictable; it won't necessarily be the order in which you assigned elements, or any kind of sorted order.

Second, you cannot omit the `$` if you're using a parameter as the key of an associative array. With standard indexed arrays, the `[...]` part is actually an arithmetic context (really, you can do math there without an explicit `$((...))` markup). In an arithmetic context, a ''Name'' can't possibly be a valid number, and so BASH assumes it's a parameter and that you want to use its content. This doesn't work with associative arrays, since a ''Name'' could just as well be a valid associative array key.
Until recently, Bash could only use numbers (more specifically, non-negative integers) as keys of arrays. This means you could not "map" or "translate" one string to another. This is something a lot of people missed. People began to (ab)use [[BashFAQ/006|variable indirection]] as a means to address the issue.

Since Bash 4 was released, there is no longer any excuse to use indirection (or '''worse''', `eval`) for this purpose. You can now use full-featured associative arrays.

To create an associative array, you need to declare it as such (using `declare -A`). This is necessary, because otherwise bash doesn't know what kind of array you're trying to make. Here's how you make an associative array:

{{{
$ declare -A fullNames
$ fullNames=( ["lhunath"]="Maarten Billemont" ["greycat"]="Greg Wooledge" )
$ echo "Current user is: $USER. Full name: ${fullNames[$USER]}."
Current user is: lhunath. Full name: Maarten Billemont.
}}}

We can print the contents of an associative array very much like we did with regular arrays:

{{{
$ declare -A dict
$ dict[astro]="Foo Bar"
$ declare -p dict
declare -A dict='([astro]="Foo Bar")'
}}}

With the same syntax as for indexed arrays, you can iterate over the keys (indices) of associative arrays:

{{{
$ for user in "${!fullNames[@]}"
> do echo "User: $user, full name: ${fullNames[$user]}."; done
User: lhunath, full name: Maarten Billemont.
User: greycat, full name: Greg Wooledge.
}}}

Two things to remember, here: First, the order of the keys you get back from an associative array using the `"${!array[@]}"` syntax is unpredictable; it won't necessarily be the order in which you assigned elements, or any kind of sorted order. Likewise, if you expand the elements using `"${array[@]}"` you will get them in an unpredictable order. Associative arrays are ''not'' well suited to storing lists that need to be processed in a specific order.

Second, you cannot omit the `$` if you're using a parameter as the key of an associative array. With standard indexed arrays, the `[...]` part is an arithmetic context. In an arithmetic context, a ''Name'' can't possibly be a valid number, and so BASH assumes it's a parameter and that you want to use its content. This doesn't work with associative arrays, since a ''Name'' could just as well be a valid associative array key.
Line 257: Line 356:
    $ indexedArray=( "one" "two" )
    $ declare -A associativeArray=( ["foo"]="bar" ["alpha"]="omega" )
    $ index=0 key="foo"
    $ echo "${indexedArray[$index]}"
    one
    
$ echo "${indexedArray[index]}"
    one
    
$ echo "${indexedArray[index + 1]}"
    two
    $ echo "${associativeArray[$key]}"
    bar
    $ echo "${associativeArray[key]}"

    $ echo "${associativeArray[key + 1]}"
}}}
As you can see, both `$index` and `index` work fine with indexed arrays. They both evaluate to `0`. You can even do math on it to increase it to `1` and get the second value. No go with associative arrays, though. Here, we need to use `$key`; the others fail.
<<Anchor(EndOfContent)>>
$ indexedArray=( "one" "two" )
$ declare -A associativeArray=( ["foo"]="bar" ["alpha"]="omega" )
$ index=0 key="foo"
$ echo "${indexedArray[$index]}"
one
$ echo "${indexedArray[index]}"
one
$ echo "${indexedArray[index + 1]}"
two
$ echo "${associativeArray[$key]}"
bar
$ echo "${associativeArray[key]}"

$ echo "${associativeArray[key + 1]}"
}}}

As you can see, both `$index` and `index` work fine with indexed arrays. They both evaluate to `0`. You can even do math on it to increase it to `1` and get the second value. No go with associative arrays, though. Here, we need to use `$key`; the others fail. <<Anchor(EndOfContent)>>



Arrays

As mentioned earlier, Bash provides three types of parameters: Strings, Integers and Arrays.

Strings are without a doubt the most used parameter type. But they are also the most misused parameter type. It is important to remember that a string holds just one element. Capturing the output of a command, for instance, and putting it in a string parameter means that parameter holds just one string of characters, regardless of whether that string represents twenty filenames, twenty numbers or twenty names of people.

And as is always the case when you put multiple items in a single string, these multiple items must be somehow delimited from each other. We, as humans, can usually decipher what the different filenames are when looking at a string. We assume that, perhaps, each line in the string represents a filename, or each word represents a filename. While this assumption is understandable, it is also inherently flawed. Each single filename can contain every character you might want to use to separate the filenames from each other in a string. That means there's technically no telling where the first filename in the string ends, because there's no character that can say: "I denote the end of this filename" because that character itself could be part of the filename.

Often, people make this mistake:

# This does NOT work in the general case
$ files=$(ls ~/*.jpg); cp $files /backups/

When this would probably be a better idea (using array notation, which is explained later, in the next section):

# This DOES work in the general case
$ files=(~/*.jpg); cp "${files[@]}" /backups/

The first attempt at backing up the JPEG files in our home directory is flawed. We put the output of ls in a string called files and then use the unquoted $files parameter expansion to cut that string into arguments (relying on Word Splitting). As mentioned before, word splitting cuts a string into pieces wherever there is whitespace. Relying on it means we assume that none of our filenames will contain any whitespace. If one does, that filename will be cut into two or more pieces. Conclusion: bad.
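
The difference is easy to observe in a scratch directory. The following is a minimal sketch, not part of the original guide; the directory and the two filenames are hypothetical, and `set --` stands in for the argument list that cp would receive:

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory and filenames, created only for this demo.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch "holiday photo.jpg" "cat.jpg"

# String approach: the unquoted expansion is word-split on whitespace.
files=$(ls *.jpg)
set -- $files          # deliberately unquoted, to mimic: cp $files /backups/
string_words=$#

# Array approach: the glob yields exactly one element per file.
files=(*.jpg)
array_elements=${#files[@]}

echo "words from the string: $string_words"
echo "elements in the array: $array_elements"

cd - > /dev/null || exit 1
rm -rf "$demo"
```

With two files on disk, the string approach hands three words to the command, because the space in one filename splits it in two. The array holds two elements.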

The only safe way to represent multiple string elements in Bash is through the use of arrays. An array is a type of variable that maps integers to strings. That basically means that it holds a numbered list of strings. Since each of these strings is a separate entity (element), it can safely contain any character, even whitespace.

For the best results and the least headaches, remember that if you have a list of things, you should always put it in an array.

Unlike some other programming languages, Bash does not offer lists, tuples, etc. Just arrays, and associative arrays (which are new in Bash 4).


  • Array: An array is a numbered list of strings: It maps integers to strings.


Creating Arrays

There are several ways you can create or fill your array with data. There is no one single true way: the method you'll need depends on where your data comes from and what it is.

The easiest way to create a simple array with data is by using the =() syntax:

$ names=("Bob" "Peter" "$USER" "Big Bad John")

This syntax is great for creating arrays with static data or a known set of string parameters, but it gives us very little flexibility for adding lots of array elements. If you need more flexibility, you can also specify explicit indexes:

$ names=([0]="Bob" [1]="Peter" [20]="$USER" [21]="Big Bad John")
# or...
$ names[0]="Bob"

Notice that there is a gap between indices 1 and 20 in this example. An array with holes in it is called a sparse array. Bash allows this, and it can often be quite useful.
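
To see the gap, we can count the elements and list the indices. This sketch uses the `${!names[*]}` expansion, which lists an array's indices; treat it as an illustration rather than as part of the example above:

```bash
#!/usr/bin/env bash
names=([0]="Bob" [1]="Peter" [20]="$USER" [21]="Big Bad John")

count=${#names[@]}       # number of elements actually stored
indices="${!names[*]}"   # the indices in use, joined by spaces

echo "count:   $count"
echo "indices: $indices"
```

The highest index is 21, but only four elements exist.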

If you want to fill an array with filenames, then you'll probably want to use Globs in there:

$ photos=(~/"My Photos"/*.jpg)

Notice here that we quoted the My Photos part because it contains a space. If we hadn't quoted it, Bash would have split it up into photos=('~/My' 'Photos/'*.jpg ) which is obviously not what we want. Also notice that we quoted only the part that contained the space. That's because we cannot quote the ~ or the *; if we do, they'll become literal and Bash won't treat them as special characters anymore.

Unfortunately, it's really easy to mistakenly create arrays with a bunch of filenames in the following way:

$ files=$(ls)    # BAD, BAD, BAD!
$ files=($(ls))  # STILL BAD!

Remember to always avoid parsing the output of ls. The first would create a string with the output of ls. That string cannot possibly be used safely for reasons mentioned in the Arrays introduction. The second is closer, but it still splits up filenames with whitespace, and the resulting words are also subject to pathname expansion.

This is the right way to do it:

$ files=(*)      # Good!

This statement gives us an array where each filename is a separate element. The * is a glob pattern for any string, which pathname-expands into the filenames in the current directory (just like it would in e.g. rm *). By default this does not include hidden files, whose names start with a dot. After the pathname expansion, the command will look like files=([each file in the current directory that matches *]), which assigns all of the matching files to the array files. Perfect!
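
One caveat goes slightly beyond what the text covers: when a glob matches nothing, Bash by default leaves the pattern in place as a literal string, so the array ends up with one bogus element. The sketch below assumes the default shell options and uses a hypothetical empty scratch directory. It shows the `nullglob` shell option, which makes an unmatched glob expand to nothing instead:

```bash
#!/usr/bin/env bash
# Hypothetical empty scratch directory for the demo.
empty=$(mktemp -d)

# Default behaviour: the unmatched pattern itself becomes the only element.
files=("$empty"/*)
default_count=${#files[@]}

# With nullglob set, an unmatched pattern expands to zero words.
shopt -s nullglob
files=("$empty"/*)
nullglob_count=${#files[@]}
shopt -u nullglob

echo "without nullglob: $default_count element(s)"
echo "with nullglob:    $nullglob_count element(s)"

rmdir "$empty"
```

Whether to set `nullglob` depends on the script. Checking that the first element actually exists is another common approach.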

This section that we're about to introduce contains some advanced concepts. If you get lost, you may want to return here after you've read the whole guide. You can skip ahead to Using Arrays if you want to keep things simple.

Now, sometimes we want to build an array from a string or the output of a command. Commands (generally) just output strings: for instance, running a find command will enumerate filenames, and separate these filenames with newlines (putting each filename on a separate line). So to parse that one big string into an array we need to tell Bash where each element ends. (Note, this is a bad example, because filenames can contain a newline, so it is not safe to delimit them with newlines! But see below.)

Breaking up a string is what IFS is used for:

$ IFS=. read -ra ip_elements <<< "127.0.0.1"

Here we use IFS with the value . to cut the given IP address into array elements wherever there's a ., resulting in an array with the elements 127, 0, 0 and 1. (The builtin command read and the <<< operator will be covered in more depth in the Input and Output chapter.)
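
A quick way to confirm the result is to inspect the array afterwards. This is a small sketch, and the `octet:` label is only for display:

```bash
#!/usr/bin/env bash
# IFS=. applies only to this one read command; the shell's IFS is untouched.
IFS=. read -ra ip_elements <<< "127.0.0.1"

echo "number of elements: ${#ip_elements[@]}"
printf 'octet: %s\n' "${ip_elements[@]}"
```

Placing the `IFS=.` assignment in front of the command limits the change to that command, so later word splitting in the script is not affected.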

We could do the same thing with a find command, by setting IFS to a newline. But then our script would fail when someone creates a filename with a newline in it (either accidentally or maliciously).

So, is there any way to get a list of elements from an external program (like find) into a Bash array? In general, the answer is yes, provided there is a reliable way to delimit the elements.

In the specific case of filenames, the answer to this problem is NUL bytes. A NUL byte is a byte which is just all zeros: 00000000. Bash strings can't contain NUL bytes, because of an artifact of the "C" programming language: NUL bytes are used in C to mark the end of a string. Since Bash is written in C and uses C's native strings, it inherits that behavior.

A data stream (like the output of a command, or a file) can contain NUL bytes. Streams are like strings with three big differences: they are read sequentially (you usually can't jump around); they're unidirectional (you can read from them, or write to them, but typically not both); and they can contain NUL bytes.

File names cannot contain NUL bytes (since they're implemented as C strings by Unix), and neither can the vast majority of human-readable things we would want to store in a script (people's names, IP addresses, etc.). That makes NUL a great candidate for separating elements in a stream. Quite often, the command whose output you want to read will have an option that makes it output its data separated by NUL bytes rather than newlines or something else. find (on GNU and BSD, anyway) has the option -print0, which we'll use in this example:

files=()
while read -r -d ''; do
    files+=("$REPLY")
done < <(find /foo -print0)

This is a safe way of parsing a command's output into strings. Understandably, it looks a little confusing and convoluted at first. So let's take it apart:

  • The first line files=() creates an empty array named files.

  • We're using a while loop that runs a read command each time. The read command uses the -d '' option to specify the delimiter. Because Bash arguments cannot contain NUL bytes, read interprets the empty string as a NUL byte (\0). This means that instead of reading a line at a time (up to a newline), we're reading up to a NUL byte. It also uses -r to prevent it from treating backslashes specially.

  • Once read has read some data and encountered a NUL byte, the while loop's body is executed. We put what we read (which is in the parameter REPLY) into our array.

  • To do this, we use the +=() syntax. This syntax adds one or more element(s) to the end of our array.

  • Finally, the < <(..) syntax is a combination of File Redirection (<) and Process Substitution (<(..)). Omitting the technical details for now, we'll simply say that this is how we send the output of the find command into our while loop.

The find command itself uses the -print0 option as mentioned before to tell it to separate the filenames it finds with a NUL byte.
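
To make the difference concrete, here is a sketch that compares newline-delimited and NUL-delimited reading on a filename that contains a newline. The scratch directory and filenames are hypothetical, and the example assumes a find that supports -print0 (GNU or BSD):

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory with one awkward filename.
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/two"$'\n'"lines.txt"

# Newline-delimited: the name containing a newline is read as two lines.
line_count=0
while read -r; do
    line_count=$((line_count + 1))
done < <(find "$demo" -type f)

# NUL-delimited: each filename arrives intact as one element.
files=()
while read -r -d ''; do
    files+=("$REPLY")
done < <(find "$demo" -type f -print0)

echo "lines read: $line_count"
echo "files read: ${#files[@]}"

rm -rf "$demo"
```

Two files exist, but the line-based loop sees three lines. The NUL-based loop sees two elements.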

As an aside, check out globstar if you are using bash >= 4.0 and just want to recursively walk directories.
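
A rough sketch of the globstar approach follows. It assumes Bash 4.0 or newer, and the directory layout is made up for the demo:

```bash
#!/usr/bin/env bash
# Hypothetical directory tree for illustration.
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/top.txt" "$demo/a/mid.txt" "$demo/a/b/deep.txt"

# With globstar enabled, ** matches zero or more directory levels.
shopt -s globstar
txt_files=("$demo"/**/*.txt)
shopt -u globstar

echo "found ${#txt_files[@]} .txt files"
printf '%s\n' "${txt_files[@]}"

rm -rf "$demo"
```

Because the result comes from a glob rather than from parsing command output, no delimiter is needed at all.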


  • Good Practice:
    Arrays are a safe list of strings. They are perfect for storing multiple filenames.
    If you have to parse a stream of data into component elements, there must be a way to tell where each element starts and ends. The NUL byte is very often the best choice for this job.
    If you have a list of things, keep it in list form as long as possible. Don't smash it into a string or a file until you absolutely have to. If you do have to write it out to a file and read it back in later, keep in mind the delimiter problem we mentioned above.




Using Arrays

Once we have an array, there are several things we can do with it.

Expanding Elements

First of all, we can print the contents to see what's in it:

$ declare -p myfiles
declare -a myfiles='([0]="/home/wooledg/.bashrc" [1]="billing codes.xlsx" [2]="hello.c")'

The declare -p command prints the contents of one or more variables. In addition, it shows you what Bash thinks the type of the variable is, and it does all of this using code that you could copy and paste into a script. In our case, the -a means this is an array. There are three elements, with indices 0, 1 and 2, and we can see what each one contains.

If we want something a bit less technical, we can also print the array using printf:

$ printf '%s\n' "${myfiles[@]}"
/home/wooledg/.bashrc
billing codes.xlsx
hello.c

This prints each array element, in order by index, with a newline after each one. Note that if one of the array elements happens to contain a newline character, we won't be able to tell where each element starts and ends, or even how many there are. That's why it's important to keep our data safely contained in the array as long as possible. Once we print it out, there's no way to reverse that.

The syntax "${myfiles[@]}" is extremely important. It works just like "$@" does for the positional parameters: it expands to a list of words, with each array element as one word, no matter what it contains. Even if there are spaces, tabs, newlines, quotation marks, or any other kind of characters in one of the array elements, it'll still be passed along as one word to whichever command we're running.

The printf command implicitly loops over all of its arguments. But what if we wanted to do our own loop? In that case, we can use a for loop to iterate over the elements:

$ for file in "${myfiles[@]}"; do
>     cp "$file" /backups/
> done

We use the quoted form again here: "${myfiles[@]}". Bash replaces this syntax with each element in the array properly quoted – similar to how positional parameters (arguments that were passed to the current script or function) are expanded.

The following two examples have the same effect:

$ names=("Bob" "Peter" "$USER" "Big Bad John")
$ for name in "${names[@]}"; do echo "$name"; done

$ for name in "Bob" "Peter" "$USER" "Big Bad John"; do echo "$name"; done

The first example creates an array named names which is filled up with a few elements. Then the array is expanded into these elements, which are then used by the for loop. In the second example, we skipped the array and just passed the list of elements directly to for.

Remember to quote the ${arrayname[@]} expansion properly. If you don't, you'll lose all benefit of having used an array at all: leaving arguments unquoted means you're telling Bash it's OK to wordsplit them into pieces and break everything again.
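
A quick way to convince yourself is to count the arguments a command actually receives. The sketch below uses a throwaway helper function for that purpose; count_args is a name invented here, not a Bash builtin:

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it reports how many arguments it got.
count_args() { echo "$#"; }

names=("Bob" "Peter" "Big Bad John")

quoted=$(count_args "${names[@]}")    # one word per element
unquoted=$(count_args ${names[@]})    # word splitting breaks "Big Bad John" apart

echo "quoted: $quoted, unquoted: $unquoted"
```

With the default IFS, the quoted form passes three arguments and the unquoted form passes five.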

The above example expanded the array in a for-loop statement. But you can expand the array anywhere you want to put its elements as arguments; for instance in a cp command:

$ myfiles=(db.sql home.tbz2 etc.tbz2)
$ cp "${myfiles[@]}" /backups/

This runs the cp command, replacing the "${myfiles[@]}" part with every filename in the myfiles array, properly quoted. After expansion, Bash will effectively run this:

$ cp "db.sql" "home.tbz2" "etc.tbz2" /backups/

cp will then copy the files to your /backups/ directory.

Of course, a for loop offers the ultimate flexibility, but printf and its implicit looping over arguments can cover many of the simpler cases. It can even produce NUL-delimited streams, perfect for later retrieval:

$ printf "%s\0" "${myarray[@]}" > myfile
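
Reading such a stream back into an array might look like the following sketch. The array contents are made up and the file name comes from mktemp; the loop relies on read -d '', which makes read stop at each NUL byte instead of at each newline:

```shell
#!/usr/bin/env bash
myarray=("plain" "with space" $'with\nnewline')
tmpfile=$(mktemp) || exit 1

# Write each element followed by a NUL byte.
printf '%s\0' "${myarray[@]}" > "$tmpfile"

# Read the elements back, one NUL-terminated chunk at a time.
restored=()
while IFS= read -r -d '' item; do
    restored+=("$item")
done < "$tmpfile"
rm -f "$tmpfile"

echo "restored ${#restored[@]} elements"
```

In Bash 4.4 and later, mapfile -d '' can replace the loop.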

You can also expand single array elements by referencing their element number (called index). Remember that by default, arrays are zero-based, which means that their first element has the index zero:

$ echo "The first name is: ${names[0]}"
$ echo "The second name is: ${names[1]}"

(You could create an array with no element 0. Remember what we said about sparse arrays earlier -- you can have "holes" in the sequence of indices, and this applies to the beginning of the array as well as the middle. It's your responsibility as the programmer to know which of your arrays are potentially sparse, and which ones are not.)

There is also a second form of expanding all array elements, which is "${arrayname[*]}". This form is ONLY useful for converting arrays into a single string with all the elements joined together. The main purpose for this is outputting the array to humans:

$ names=("Bob" "Peter" "$USER" "Big Bad John")
$ echo "Today's contestants are: ${names[*]}"
Today's contestants are: Bob Peter lhunath Big Bad John

Notice that in the resulting string, there's no way to tell where the names begin and end! This is why we keep everything separate as long as possible.

Remember to still keep everything nicely quoted! If you don't keep ${arrayname[*]} quoted, once again Bash's WordSplitting will cut it into bits.

You can combine IFS with "${arrayname[*]}" to indicate the character to use to delimit your array elements as you merge them into a single string. This is handy, for example, when you want to comma delimit names:

$ names=("Bob" "Peter" "$USER" "Big Bad John")
$ ( IFS=,; echo "Today's contestants are: ${names[*]}" )
Today's contestants are: Bob,Peter,lhunath,Big Bad John

Notice how in this example we put the IFS=,; echo ... statement in a Subshell by wrapping ( and ) around it. We do this because we don't want to change the value of IFS in the main shell. The assignment only affects the subshell, so once it exits, IFS in the main shell still has its default value rather than just a comma. This is important because IFS is used for a lot of things, and changing its value to something non-default will result in very odd behavior if you don't expect it!
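
If you need the joined string in a variable rather than printed, a command substitution can be used the same way, because it also runs in a subshell. A minimal sketch, with made-up names:

```shell
#!/usr/bin/env bash
names=("Bob" "Peter" "Big Bad John")

# The IFS assignment happens inside the $(...) subshell only.
joined=$(IFS=,; echo "${names[*]}")
echo "$joined"

# Back in the main shell, IFS is untouched (space, tab, newline by default).
printf 'IFS is still: %q\n' "$IFS"
```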

Alas, the "${array[*]}" expansion only uses the first character of IFS to join the elements together. If we wanted to separate the names in the previous example with a comma and a space, we would have to use some other technique (for example, a for loop).
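
One such technique, sketched here with made-up data, is to let printf -v append the separator after every element and then trim the trailing one with a parameter expansion:

```shell
#!/usr/bin/env bash
names=("Bob" "Peter" "Big Bad John")

# Format every element as "element, " and store the result in $joined.
printf -v joined '%s, ' "${names[@]}"

# Remove the trailing ", " left over after the last element.
joined=${joined%, }

echo "Today's contestants are: $joined"
```

This avoids touching IFS at all, so no subshell is needed.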

One final tip: you can get the number of elements of an array by using ${#array[@]}:

$ array=(a b c)
$ echo ${#array[@]}
3
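
Two details are easy to trip over here, illustrated in this sketch with made-up values: the count is the number of elements that are set, not the highest index plus one, and naming a single index instead of [@] gives the length of that element's string.

```shell
#!/usr/bin/env bash
array=(apple b c)
array[10]="far away"          # creates a hole between index 2 and 10

count=${#array[@]}            # number of elements that are set
first_len=${#array[0]}        # string length of element 0 ("apple")

echo "count: $count, length of first element: $first_len"
```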

Expanding Indices

Sometimes a problem requires more than just expanding the values of an array in order. You may need to refer to multiple elements at the same time, or refer to the same index in multiple arrays at the same time. In these cases, it's better to expand the array indices, instead of the array values.

Let's say we have two arrays: first and last. These will hold the first names, and the last names, of a list of people. Obviously we need to make sure that the first and last names of each person are properly matched up. We do this by keeping careful control of the indices.

$ first=(Jessica Sue Peter)
$ last=(Jones Storm Parker)

Now, to print the full name of the person with index 1:

$ echo "${first[1]} ${last[1]}"
Sue Storm

If we want to loop over all of the names, we can't just loop over "${first[@]}" or "${last[@]}". If we do that, we won't have an index into the other array, so we won't know how to match them up. Instead, we'll loop over the indices of one of the arrays (arbitrarily chosen), and then use that same index in both arrays together:

$ for i in "${!first[@]}"; do
>     echo "${first[i]} ${last[i]}"
> done
Jessica Jones
Sue Storm
Peter Parker

So, we have a new piece of syntax: "${!arrayname[@]}" expands to a list of the indices of an array, in sequential order.
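
The index expansion is most informative on an array with holes in it. In this sketch (the array letters is invented for the demonstration), only the indices that are actually in use come back:

```shell
#!/usr/bin/env bash
letters=([2]="c" [5]="f" [9]="j")

indices=("${!letters[@]}")    # the indices that are actually in use
echo "indices: ${indices[*]}"

for i in "${!letters[@]}"; do
    echo "letters[$i] = ${letters[i]}"
done
```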

Another feature worth mentioning is that the [...] around the index of an array actually creates an arithmetic context. You can do math there, without wrapping it in $((...)). Let's suppose we want to process an array two elements at a time, and let's also suppose we know this array can't be sparse. Then:

$ a=(a b c q w x y z)
$ for ((i=0; i<${#a[@]}; i+=2)); do
>     echo "${a[i]} and ${a[i+1]}"
> done

We use the arithmetic expression i+1 as an array index. Bash will evaluate the i parameter first, and keep evaluating the value it receives as long as it is a valid Name, until it gets to an integer. Then it will add 1, and use that as the real index.
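
The repeated evaluation is easier to see with a variable whose value is itself the name of another variable. A small sketch with made-up names:

```shell
#!/usr/bin/env bash
a=(zero one two three)
i=1
j=i        # j holds the *name* i, not a number

# In the arithmetic context, j evaluates to i, which evaluates to 1.
direct=${a[j]}        # same as ${a[1]}
plus_one=${a[j+1]}    # same as ${a[2]}

echo "$direct $plus_one"
```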

Sparse Arrays

We've mentioned sparse arrays already, so this will be brief. Most arrays simply have indices 0, 1, 2, 3, etc. But an array can also have "holes" in the sequence. This can be done either by assigning directly to an index that's way out past the current end of the array, or by removing an element from an existing array.

$ nums=(zero one two three four)
$ nums[70]="seventy"
$ unset 'nums[3]'
$ declare -p nums
declare -a nums='([0]="zero" [1]="one" [2]="two" [4]="four" [70]="seventy")'

Note that we quoted 'nums[3]' in the unset command. This is because an unquoted nums[3] could be interpreted by Bash as a filename glob. If it happens to match a file in the current directory, then nums[3] becomes nums3 and we will unset the wrong variable. That would be bad!
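
The following sketch reproduces the problem in a scratch directory created with mktemp -d. The file and the variable nums3 exist only for this demonstration:

```shell
#!/usr/bin/env bash
# Work in a scratch directory so the demonstration cannot touch real files.
startdir=$PWD
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1

touch nums3                 # a file whose name matches the glob nums[3]
nums=(zero one two three)
nums3="unrelated variable"

unset nums[3]               # unquoted: the glob expands to nums3
echo "nums[3]=${nums[3]-<unset>}  nums3=${nums3-<unset>}"

unset 'nums[3]'             # quoted: removes the array element as intended
echo "nums[3]=${nums[3]-<unset>}"

cd "$startdir" || exit 1
rm -rf "$scratch"
```

After the unquoted unset, the array element is still there and the unrelated variable is gone. Only the quoted form removes the element.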

If you follow the practices we've outlined so far, sparse arrays shouldn't cause you any special concerns.

  • Don't assume that your indices are sequential.
  • If the index values matter, always iterate over the indices instead of making assumptions about them.
  • If you loop over the values instead, don't assume anything about which index you might be on currently.
  • In particular, don't assume that just because you're currently in the first iteration of your loop, that you must be on index 0!

When you expand the values of a sparse array using "${arrayname[@]}" you will get a list with no gaps in it. There is no way to tell what kind of array these values came from. This can be useful if you want to re-index an array to remove all of the gaps:

$ array=("${array[@]}")      # This re-creates the indices.
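
Comparing the indices before and after shows the effect. The starting values in this sketch are made up:

```shell
#!/usr/bin/env bash
array=([3]="a" [7]="b" [42]="c")
before="${!array[*]}"

array=("${array[@]}")         # re-create the array from its values
after="${!array[*]}"

echo "before: $before"
echo "after:  $after"
```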


  • Good Practice:
    Always quote your array expansions properly, just like you would your normal parameter expansions.
    Use "${myarray[@]}" to expand all your array elements and ONLY use "${myarray[*]}" when you want to merge all your array elements into a single string.


Associative Arrays

Before version 4, Bash could only use numbers (more specifically, non-negative integers) as keys of arrays. This means you could not "map" or "translate" one string to another. This is something a lot of people missed. People began to (ab)use variable indirection as a means to address the issue.

Since Bash 4 was released, there is no longer any excuse to use indirection (or worse, eval) for this purpose. You can now use full-featured associative arrays.

To create an associative array, you need to declare it as such (using declare -A). This is necessary, because otherwise Bash doesn't know what kind of array you're trying to make. Here's how you make an associative array:

$ declare -A fullNames
$ fullNames=( ["lhunath"]="Maarten Billemont" ["greycat"]="Greg Wooledge" )
$ echo "Current user is: $USER.  Full name: ${fullNames[$USER]}."
Current user is: lhunath.  Full name: Maarten Billemont.
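
To see why the declaration matters, consider what happens without it. In this sketch (the array name oops is invented, and the keys are left unquoted), Bash creates an indexed array, evaluates each key in an arithmetic context, and both keys end up as index 0 because neither name is a variable holding a number:

```shell
#!/usr/bin/env bash
unset lhunath greycat   # make sure the keys are not set as variables

# No declare -A: this silently creates an *indexed* array.
oops=( [lhunath]="Maarten Billemont" [greycat]="Greg Wooledge" )

echo "elements: ${#oops[@]}"
echo "index 0:  ${oops[0]}"
```

The second assignment overwrites the first, so only one element survives.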

We can print the contents of an associative array very much like we did with regular arrays:

$ declare -A dict
$ dict[astro]="Foo Bar"
$ declare -p dict
declare -A dict='([astro]="Foo Bar")'

With the same syntax as for indexed arrays, you can iterate over the keys (indices) of associative arrays:

$ for user in "${!fullNames[@]}"
> do echo "User: $user, full name: ${fullNames[$user]}."; done
User: lhunath, full name: Maarten Billemont.
User: greycat, full name: Greg Wooledge.

Two things to remember, here: First, the order of the keys you get back from an associative array using the "${!array[@]}" syntax is unpredictable; it won't necessarily be the order in which you assigned elements, or any kind of sorted order. Likewise, if you expand the elements using "${array[@]}" you will get them in an unpredictable order. Associative arrays are not well suited to storing lists that need to be processed in a specific order.
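
If you do need a predictable order, one option is to sort the keys into an indexed array first and loop over that. This is only a sketch: the array name sortedKeys is made up, and it assumes that no key contains a newline.

```shell
#!/usr/bin/env bash
declare -A fullNames=( ["lhunath"]="Maarten Billemont" ["greycat"]="Greg Wooledge" )

# Collect the keys, sort them, and store the result in an indexed array.
# Assumption: no key contains a newline.
mapfile -t sortedKeys < <(printf '%s\n' "${!fullNames[@]}" | sort)

for user in "${sortedKeys[@]}"; do
    echo "User: $user, full name: ${fullNames[$user]}."
done
```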

Second, you cannot omit the $ if you're using a parameter as the key of an associative array. With standard indexed arrays, the [...] part is an arithmetic context. In an arithmetic context, a Name can't possibly be a valid number, and so Bash assumes it's a parameter and that you want to use its content. This doesn't work with associative arrays, since a Name could just as well be a valid associative array key.

Let's demonstrate with examples:

$ indexedArray=( "one" "two" )
$ declare -A associativeArray=( ["foo"]="bar" ["alpha"]="omega" )
$ index=0 key="foo"
$ echo "${indexedArray[$index]}"
one
$ echo "${indexedArray[index]}"
one
$ echo "${indexedArray[index + 1]}"
two
$ echo "${associativeArray[$key]}"
bar
$ echo "${associativeArray[key]}"

$ echo "${associativeArray[key + 1]}"

As you can see, both $index and index work fine with indexed arrays. They both evaluate to 0. You can even do math on it to increase it to 1 and get the second value. No go with associative arrays, though. Here, we need to use $key; the others are taken as the literal keys key and key + 1, which don't exist in the array, so they expand to nothing.
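
To see what Bash does with the bare key, this sketch adds an element whose key is the literal string key (added purely for illustration). The lookup without $ then finds that element instead of foo:

```shell
#!/usr/bin/env bash
declare -A associativeArray=( ["foo"]="bar" ["key"]="the literal string key" )
key="foo"

with_dollar=${associativeArray[$key]}    # looks up the key "foo"
without_dollar=${associativeArray[key]}  # looks up the key "key"

echo "$with_dollar"
echo "$without_dollar"
```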



BashGuide/Arrays (last edited 2021-10-23 00:18:04 by emanuele6)