<- Avoiding code injection | Data structures | Example 1: Modifying a config file ->

Data structures

Most of the programs that you'll be writing in bash will require only the basic variable types that are built into bash. A program that requires more advanced data structures is usually better suited to a more full-featured language. However, it is possible to create some data structures in bash. This chapter will explore these techniques.

Contents

Data structures

Basic variable types

As a reminder, bash only has native support for three types of variables:

Strings
Sparsely indexed arrays of strings
Associative arrays of strings

There are no integer types (declare -i doesn't count), no floating point support at all beyond rounding values for output, no structures/records, and no classes/objects. The two types of arrays can only hold strings. You can't make an array of arrays, other than by serializing the inner array as a string for storage.

So, what can we build?

Lists

Lists of strings are easily built using indexed arrays. While we won't have the full toolbox of list operators that we might expect in a higher-level language, we can perform basic operations.

# Initialization
list1=(this is a list)    # Constant strings
list2=()                  # An empty list
list3=(*.txt)             # Pathname expansion

# Length
len=${#list1[@]}

# Append new item(s)
list1+=("$new")

# Iterate by value
for item in "${list1[@]}"; do
  ...
done

# Iterate by index
for i in "${!list1[@]}"; do
  ... "${list1[i]}" ...
done

# Pass all list values as arguments
somecmd "${list1[@]}"

If we don't remove any items from the list, the array's indices will also correspond to positional values for the list items (index 0 for the first item, index 1 for the second, and so on). We can make use of that to grab sub-lists:

# Sub-list (if no items have been deleted)
sublist=( "${list1[@]:startindex:length}" )

We can remove an item from the list by unsetting the corresponding array element. Doing so breaks the relationship between the array index and the list position, which makes extracting sub-lists impossible, unless we re-index the array.

# Remove an item by index
unset 'list1[i]'

# Re-index the array (optional, only needed if we want sub-lists later)
list1=("${list1[@]}")

We can also join the elements of a list together into a string. There are two ways to do this: using IFS, or using printf. If we use IFS, we can have one character as the separator between elements (or nothing at all).

# Join list elements into a string with / as separator
IFS=/
string=${list1[*]}
unset IFS

There is no simple syntax to set IFS temporarily for an assignment. However, we can use a function, where IFS can be a local variable, or we can abuse eval.

# Use a function to support localized IFS
join() { local IFS=/; string=${list1[*]}; }; join

# Abuse eval
IFS=/ eval 'string=${list1[*]}'

Using printf allows us to have a multi-character separator. The simplest way to do this is to generate a string with the separator appearing after each item, and then remove the final separator in a second step.

# Use :: as separator
printf -v string %s:: "${list1[@]}"
string=${string%::}

Queues

An indexed array can also be the foundation for a queue (FIFO, or first-in, first-out). Enqueueing is simply appending a new element to the end of the array. Dequeueing is a bit more complex. There are a few basic approaches we may use for dequeueing.

Unset the first element. No additional bookkeeping.
Unset the first element, and keep an index variable which points to the next element.
Don't unset anything; just keep advancing the index variable.
Unset the first element and re-index the array.

Each method has its own disadvantages. In the first method, discovering the index value of the first element is a nontrivial operation. Hence, keeping an index in a separate value (method two) is an obvious compensation, but introduces a new variable for additional complexity. The third method has the same additional complexity, and also doesn't free any memory when dequeueing, which may be significant if the queue is going to live for a long time. The fourth method is the simplest, but the slowest.

For purposes of this document, let's use method two: unset the first element, and keep an index pointing to the next element.

# Initialization
queue=()
queue_i=0

# Enqueue an item
queue+=("$new")

# Number of items queued
count=${#queue[@]}

# Dequeue an item into the 'item' variable
if (( ${#queue[@]} )); then
  item=${queue[queue_i]}
  unset 'queue[queue_i]'
  if (( ${#queue[@]} )); then
    ((queue_i++))
  else
    queue_i=0
  fi
else
  # optional error message here
  item=
fi

The complexity of dequeueing is visible here. If dequeueing causes the queue to become empty, then the index has to be reset to 0, because the next enqueue will place an item at index 0. However, if the dequeueing does not empty the queue, then we keep incrementing the index.

Stacks

If we're going to build a queue, then we ought to do a stack (LIFO, or last-in, first-out) as well. As with a queue, we can use an indexed array as the foundation. A stack is actually simpler, because we can easily determine the index of the last item in the array, and we will never have a sparse array to deal with.

# Initialization
stack=()

# Push (add new item)
stack+=("$new")

# Number of items
count=${#stack[@]}

# Pop (retrieve item) into variable 'item'
i=$(( ${#stack[@]} - 1 ))
if ((i >= 0)); then
  item=${stack[i]}
  unset 'stack[i]'
else
  # optional error message here
  item=
fi

The temporary variable i should be made local if possible, and of course you may use a different variable name if needed.

Sets

A set (or "hash set", as it's known in some languages) may be implemented using an associative array. In this case, we use a convenient constant as the associative array element's value. For this document, we'll be using the value "1", and using a string length test (empty or non-empty) as a set membership test.

# Initialization
declare -A set=()

# Add an element
set[$new]=1

# Number of elements
count=${#set[@]}

# Membership test
if [[ ${set[$item]} ]]; then
  printf '"%s" is a member\n' "$item"
else
  printf '"%s" is not a member\n' "$item"
fi

Set operations like intersection and union may be performed by iterating.

# Intersection of two sets (items in common)
declare -A inter=()
for i in "${!set1[@]}"; do
  if [[ ${set2[$i]} ]]; then
    inter[$i]=1
  fi
done

# Union (items in either set)
declare -A union=()
for i in "${!set1[@]}"; do union[$i]=1; done
for i in "${!set2[@]}"; do union[$i]=1; done

It should be noted that this approach fails if set -u is in effect. I strongly discourage the use of set -u, but if you insist on using it despite this advice, then you'll have to find another implementation strategy. See BashFAQ/112 for more discussion of set -u.

Ordered dictionary

Typically, a dictionary (associative array) does not remember the order in which elements have been added. Iterating over the keys gives you the keys in an order that has no intrinsic meaning. If you'd like your dictionary's keys to retain an ordering as well as their values, then you may use a list (indexed array) alongside the associative array.

# Ordered dictionary
types=(RTNE PORT LONG)
declare -A type=(
  [RTNE]="Routine EEG"
  [PORT]="Portable EEG"
  [LONG]="Long EEG"
)

for t in "${types[@]}"; do
  printf '%-5s: %s\n' "$t" "${type[$t]}"
done

When we need to retrieve the keys in sequence, we iterate over the list (indexed array), rather than using ${!type[@]} to fetch the keys from the associative array.

Multi-dimensional array

There are two ways to build a multi-dimensional array. Which one is better depends on the task at hand.

Pick a sufficiently large integer constant c, and represent (x,y) as x + y*c (or x*c + y) in an indexed array.
Use an associative array, and represent (x,y) as the key string "x,y".

The first method requires indices to be relatively small non-negative integers. The second allows far more flexibility.