Differences between revisions 24 and 67 (spanning 43 versions)
Revision 24 as of 2011-05-02 18:56:44
Size: 13013
Editor: GreyCat
Comment: "the buglouse trick"
Revision 67 as of 2025-03-22 09:32:45
Size: 20021
Editor: ormaaj
Comment: update nameref section to reflect current support status.
Line 3: Line 3:
This is a complex page, because it's a complex topic. It's been divided into roughly three parts: associative arrays, evaluating indirect variables, and assigning indirect variables. There are discussions of programming issues and concepts scattered throughout.

=== Obligatory Note ===
Putting variable names or any other [[BashFAQ/050|bash syntax inside parameters]] is generally a bad idea. It violates the separation between code and data, and as such puts you on a slippery slope toward bugs, security issues, etc. ''Even'' when you know you "got it right", because you "know and ''understand'' exactly what you're doing", bugs happen to all of us and it pays to respect separation practices to minimize the extent of damage they can cause.

Aside from that, it also makes your code non-obvious and non-transparent.

Normally, in bash scripting, you won't need indirect references at all. Generally, people look at this for a solution when they don't understand or know about [[BashGuide/Arrays|Bash Arrays]] or haven't fully considered other Bash features such as functions.
This is a complex page, because it's a complex topic. It's been divided into roughly four parts: associative arrays, name references, evaluating indirect variables, and assigning indirect variables. There are discussions of programming issues and concepts scattered throughout.

<<TableOfContents>>
Line 15: Line 9:
There are certain tasks for which associative arrays are a powerful and completely appropriate tool. There are others for which they are overkill, or simply unsuitable.

To map from one string to another, you need arrays indexed by a string instead of a number. These exist in AWK as "associative arrays", in Perl as "hashes", and in Tcl simply as "arrays". They also exist in [[KornShell|ksh93]], where you'd use them like this:

 {{{
 # ksh93
 typeset -A homedir # Declare ksh93 associative array
 homedir[jim]=/home/jim
 homedir[silvia]=/home/silvia
 homedir[alex]=/home/alex
 
 for user in "${!homedir[@]}" # Enumerate all indices (user names)
 do
     echo "Home directory of user $user is ${homedir[$user]}"
 done
 }}}

BASH supports them from version 4 and up:

 {{{
 # Bash 4 and up
 declare -A homedir
 homedir[jim]=/home/jim
 # or
 homedir=( [jim]=/home/jim
           [silvia]=/home/silvia
           [alex]=/home/alex )
 ...
 }}}

Prior to Bash 4 or if you can't use ksh93, your options are limited. Either move to another interpreter (awk, perl, python, ruby, tcl, ...) or re-evaluate your problem to ''simplify it''.

Suppose we have several subservient hosts with slightly different configuration, and that we want to ssh to each one and run slightly different commands. One way we could set it up would be to hard-code a bunch of ssh commands in per-hostname functions in a single script and just run them in series or in parallel. (Don't reject this out of hand! Simple is good.) Another way would be to store each group of commands as an element of an associative array keyed by the hostname:

 {{{
 source "$conf"
 for host in "${!commands[@]}"; do
     ssh "$host" "${commands[$host]}"
 done

 # Where "$conf" is a file like this:
 declare -A commands
 commands=( [host1]="mvn clean install && cd webapp && mvn jetty:run"
            [host2]="..."
 )
 }}}

This is the kind of approach we'd expect in a high-level language, where we can store hierarchical information in advanced data structures. The difficulty here is that we really want each element of the associative array to be a ''list'' or ''another array'' of command strings. But the shell simply doesn't permit that kind of data structure.
We introduce associative arrays first, because we observe that inexperienced programmers often conjure arcane solutions to problems that would be solved more cleanly with associative arrays.

An [[https://en.wikipedia.org/wiki/Associative_array|associative array]] is an unordered collection of key-value pairs. A value may be retrieved by supplying its corresponding key. Since strings are the only datatype most shells understand, associative arrays map strings to strings, unlike indexed arrays, which map integers to strings. Associative arrays exist in AWK as "associative arrays", in Perl as "hashes", in Tcl as "arrays", in Python and C# as "dictionaries", in Java as a "Map", and in C++11 STL as `std::unordered_map`.

{{{#!highlight bash
# Bash 4 / ksh93

typeset -A homedir # Declare associative array
homedir=( # Compound assignment
    [jim]=/home/jim
    [silvia]=/u/silvia
    [alex]=/net/home/alex
)

homedir[ormaaj]=/home/ormaaj # Ordinary assignment adds another single element

for user in "${!homedir[@]}"; do # Enumerate all indices (user names)
    printf 'Home directory of user %q is: %q\n' "$user" "${homedir[$user]}"
done
}}}

Prior to Bash 4 or if you can't use ksh93, your options are limited. Either move to another interpreter (awk, perl, python, ruby, tcl, ...) or re-evaluate your problem to ''simplify it''. There are certain tasks for which associative arrays are a powerful and completely appropriate tool. There are others for which they are overkill, or simply unsuitable.

Suppose we have several remote hosts with slightly different configuration, and that we want to ssh to each one and run slightly different commands. One way we could set it up would be to hard-code a bunch of ssh commands in per-hostname functions in a single script and just run them in series or in parallel. (Don't reject this out of hand! Simple is good.) Another way would be to store each group of commands as an element of an associative array keyed by the hostname:

{{{#!highlight bash
declare -A commands
commands=(
  [host1]="mvn clean install && cd webapp && mvn jetty:run"
  [host2]="..."
)

for host in "${!commands[@]}"; do
    ssh -- "$host" "${commands[$host]}"
done
}}}

This solution works, because we're encoding a very short shell script in a string, and storing it as an array element. When we call ssh, it passes the string directly to the remote host, where a shell evaluates it and executes it. But what if the scripts were much longer, or more complicated?

If we want to get ''fancy'' and store each sub-command (`cd webapp` for example) as an element of a list, and then have each hostname map to a list of sub-commands, we'd quickly find that we can't do that in a shell. That's the kind of approach we'd expect in a high-level language, where we can store hierarchical information in advanced data structures. We want each element of the associative array to be a ''list'' or ''another array'' of command strings. But the shell simply doesn't offer that kind of data structure.
Line 66: Line 52:
 {{{
 # A series of conf files named for the hosts we need to run our commands on:
 for conf in /etc/myapp/*; do
     host=${conf##*/}
     ssh "$host" bash < "$conf"
 done

 # /etc/myapp/hostname is just a script:
 mvn clean install &&
 cd webapp &&
 mvn jetty:run
 }}}

Now we've removed the need for associative arrays, and also the need to maintain a bunch of extremely horrible quoting issues....
{{{#!highlight bash
# A series of conf files named for the hosts we need to run our commands on:
for conf in /etc/myapp/*; do
    host=${conf##*/}
    ssh -- "$host" bash < "$conf"
done

# /etc/myapp/hostname is just a script:
mvn clean install &&
cd ./webapp &&
mvn jetty:run
}}}

Now we've removed the need for associative arrays, and also the need to maintain a bunch of extremely horrible quoting issues. It is also easy to parallelize using GNU Parallel:

{{{#!highlight bash
parallel ssh -- {/} bash "<" {} ::: /etc/myapp/*
}}}
Line 86: Line 76:
 2. The variable names must match the RegularExpression {{{^[a-zA-Z_][a-zA-Z_0-9]*}}} -- i.e., a variable name cannot contain arbitrary characters but only letters, digits, and underscores. We cannot have a variable's name contain Unix usernames, for instance -- consider a user named {{{hong-hu}}}. A dash '-' cannot be part of a variable name, so the entire attempt to make a variable named `homedir_hong-hu` is doomed from the start.
 3. Quoting is hard to get right. If a content string (not a variable name) can contain whitespace characters and quotes, it's hard to quote it right to preserve it through both shell parsings. And that's just for ''constants'', known at the time you write the program. (Bash's `printf %q` helps, but nothing analogous is available in POSIX shells.)
 4. If the program handles unsanitized user input, it can be [[BashFAQ/048|VERY dangerous]]!
 1. The variable names must be a single line and match the RegularExpression {{{^[a-zA-Z_][a-zA-Z_0-9]*$}}} -- i.e., a variable name cannot contain arbitrary characters but only letters, digits, and underscores. We cannot have a variable's name contain Unix usernames, for instance -- consider a user named {{{hong-hu}}}. A dash '-' cannot be part of a variable name, so the entire attempt to make a variable named `homedir_hong-hu` is doomed from the start.
 1. Quoting is hard to get right. If a content string (not a variable name) can contain whitespace characters and quotes, it's hard to quote it right to preserve it through both shell parsings. And that's just for ''constants'', known at the time you write the program. (Bash's `printf %q` helps, but nothing analogous is available in POSIX shells.)
 1. If the program handles unsanitized user input, it can be [[BashFAQ/048|VERY dangerous]]!
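
As a minimal sketch of the naming rule in the list above, the check below rejects strings that cannot be variable names. The function name `is_valid_name` is hypothetical (it is not part of any shell), it uses only POSIX `case` patterns, and it assumes the C locale so that bracket ranges such as `A-Z` mean ASCII letters.

```bash
# is_valid_name NAME  (hypothetical helper; POSIX sh constructs only)
# Succeeds only if NAME is a non-empty, single-line string made of
# letters, digits, and underscores that does not start with a digit.
is_valid_name() {
    case $1 in
        '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

LC_ALL=C   # assumption: we want ASCII-only ranges in the patterns above

for candidate in homedir_alex homedir_hong-hu 9lives; do
    if is_valid_name "$candidate"; then
        printf '%s: usable as a variable name\n' "$candidate"
    else
        printf '%s: not a valid variable name\n' "$candidate"
    fi
done
```

A newline is not a letter, digit, or underscore, so multi-line strings are rejected by the same pattern that rejects the dash in `homedir_hong-hu`.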
Line 94: Line 84:
=== Evaluating indirect/reference variables ===
[[BASH]] allows you to expand a parameter ''indirectly'' -- that is, one variable may contain the name of another variable:
 {{{
 # Bash
 realvariable=contents
 ref=realvariable
 echo "${!ref}" # prints the contents of the real variable
 }}}

KornShell (ksh93) has a completely different, more powerful syntax -- the `nameref` command (also known as `typeset -n`):
 {{{
 # ksh93
 realvariable=contents
 nameref ref=realvariable
 echo "$ref" # prints the contents of the real variable
 }}}

Unfortunately, for shells other than Bash and ksh93, there is no syntax for ''evaluating'' a referenced variable. You would have to use [[BashFAQ/048|eval]], which means you would have to undergo extreme measures to sanitize your data to avoid catastrophe.
=== Name References ===
ksh93 introduced ''name references'', which are variables that work like symbolic links. The content of a nameref variable is the name of a second variable. Assignments, expansions, and other operations on a nameref variable are redirected to the variable they "point to". `nameref` variables were subsequently ported to mksh, bash 4.3, and zsh versions > 5.9.

Currently, the bash, mksh, and zsh implementations do not support the full power of the ksh93 nameref system.

 * The zsh implementation is currently the most complete in comparison with ksh93, supporting most features including reference parameters to a limited degree, which have been adapted to support its dynamic scoping system. Variables passed to functions by reference in zsh work correctly in most cases. In certain complex cases the implementation is imperfect and several details aren't quite correct yet. These are probably fixable and could be addressed in a future release. It is also the newest implementation, currently unreleased.
 * The bash implementation is less complete. Reference parameters are completely unsupported as of 5.3, so the name a nameref points to can only be resolved using the same dynamic scoping rules as any other variable name. For example, if you have `declare -n ref=$1` and try to use `$ref`, the shell will look for whatever variable of that name is visible from the current function. There is no way to override this. You cannot say "this nameref points to the variable `foo` in the caller's scope" unambiguously.
 * The mksh implementation shares all of bash's limitations and additionally lacks support for the special `for` loop iteration feature over nameref variables. Namerefs are an mksh extension; oksh and other pdksh derivatives have no nameref support.
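
The scoping limitation described for bash can be reproduced with a short sketch. It assumes Bash 4.3 or newer; the function name `set_result` and all variable names are hypothetical. Because the nameref is resolved by name through dynamic scoping, a caller's variable that happens to share its name with one of the function's locals is silently missed.

```bash
#!/usr/bin/env bash
# Assumes Bash >= 4.3 (namerefs). All names here are illustrative only.

set_result() {
    local -n out=$1        # nameref: "out" stands for whatever name was passed
    local value=computed   # a local that may shadow the caller's variable
    out=$value
}

result=original
value=original

set_result result   # works: no local named "result" exists inside the function
set_result value    # surprise: "out" resolves to the function's own local "value"

printf 'result=%s\n' "$result"   # result=computed
printf 'value=%s\n' "$value"     # value=original (the caller's variable was not touched)
```

One common mitigation, offered here only as a convention and not as a fix, is to give the function's locals names that callers are unlikely to use.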

{{{#!highlight bash
# ksh93 / Bash >= 4.3 / zsh > 5.9 / mksh (no printf builtin)
realvariable=contents
typeset -n ref=realvariable
printf '%s=%q\n' "${!ref}" "$ref" # print the name and contents of the real variable
}}}

As long as you avoid namespace collisions, namerefs can be extremely useful. They give the "indirection" that many people are looking for:

{{{#!highlight bash
arr1=(first array)
arr2=(second array)
declare -n ref
if [[ $someoption ]]; then
    ref=arr2
else
    ref=arr1
fi
for i in "${ref[@]}"; do ...; done
}}}

=== Indirection ===
In this section, we discuss various other tricks available in some older shells where namerefs aren't available.

==== Think before using indirection ====
Putting variable names or any other [[BashFAQ/050|bash syntax inside parameters]] is frequently done incorrectly and in inappropriate situations to solve problems that have better solutions. It violates the separation between code and data, and as such puts you on a slippery slope toward bugs and security issues. Indirection can make your code less transparent and harder to follow.

Normally, in bash scripting, you won't need indirect references at all. Generally, people look at this for a solution when they don't understand or know about [[BashGuide/Arrays|Bash Arrays]] (indexed or associative) or haven't fully considered other Bash features such as functions.

==== Evaluating indirect/reference variables ====
[[BASH]] allows for expanding parameters ''indirectly'' -- that is, one variable may contain the name of another variable. Name reference variables are the preferred method for performing variable indirection. Older versions of Bash could also use a `!` prefix operator in parameter expansions for variable indirection. Namerefs should be used unless portability to older bash versions is required. No other shell uses `${!variable}` for indirection and there are problems relating to use of that syntax for this purpose. It is also less flexible.

{{{#!highlight bash
# Bash
realvariable=contents
ref=realvariable
printf '%s\n' "${!ref}" # prints the contents of the real variable
}}}

Zsh allows you to access a parameter indirectly with the parameter expansion flag `P`:
{{{#!highlight bash
# zsh
realvariable=contents
ref=realvariable
echo ${(P)ref} # prints the contents of the real variable
}}}

zsh's ability to nest parameter expansions allows for referencing [[BashFAQ/005|arrays]] too:
{{{#!highlight bash
# zsh
myfunc() {
 local ref=$1
 echo "array $1 has ${#${(@P)ref}} elements"
}
realarray=(...)
myfunc realarray
}}}

Unfortunately, for shells other than Bash, ksh93, and zsh there is no syntax for ''evaluating'' a referenced variable. You would have to use [[BashFAQ/048|eval]], which means you would have to undergo extreme measures to sanitize your data to avoid catastrophe.
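
A minimal sketch of what that `eval` fallback might look like, assuming the name is validated before it ever reaches `eval`. The helper name `deref` is hypothetical, and the bracket ranges assume the C locale.

```bash
# deref NAME  (hypothetical helper; POSIX sh constructs only)
# Prints the value of the variable whose name is NAME, refusing
# anything that is not a plain variable name before it reaches eval.
deref() {
    case $1 in
        '' | [!A-Za-z_]* | *[!A-Za-z0-9_]*)
            echo >&2 "deref: invalid variable name"
            return 2 ;;
    esac
    # After the check, the string handed to eval is: printf '%s\n' "${NAME}"
    eval "printf '%s\n' \"\${$1}\""
}

realvariable=contents
ref=realvariable
deref "$ref"    # prints: contents
```

Only the variable name is spliced into the evaluated string; the value itself is expanded by the shell inside the `eval`, so it is never parsed as code.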
Line 115: Line 155:
ksh93's `nameref` allows us to work with references to [[BashFAQ/005|arrays]], as well as regular scalar variables. For example,
 {{{
 # ksh93
 myfunc() {
   nameref ref=$1
   echo "array $1 has ${#ref[*]} elements"
 }
 realarray=(...)
 myfunc realarray
 }}}

We are not aware of any trick that can duplicate that functionality in POSIX or Bourne shells (short of using [[BashFAQ/048|eval]], which is extremely difficult to do securely). Bash can ''almost'' do it -- some indirect array tricks work, and others do not, and we do not know whether the syntax involved will remain stable in future releases. So, consider this a ''use at your own risk'' hack.

 {{{
 # Bash -- trick #1. Seems to work in bash 2 and up.
 realarray=(...) ref=realarray; index=2
 tmp="$ref[$index]"
 echo "${!tmp}" # gives array element [2]

 # Bash -- trick #2. Seems to work in bash 3 and up.
 # Does NOT work in bash 2.05b.
 tmp="$ref[@]"
 printf "<%s> " "${!tmp}"; echo # Iterate whole array.
 }}}

We do not know of any way to retrieve the array indices, or even the number of elements, through this kind of indirection.

=== Assigning indirect/reference variables ===
Sometimes you'd like to "point" from one variable to another, for purposes of writing information to a dynamically configurable place. Typically this happens when you're trying to write a "reusable" function, and you want it to [[BashFAQ/084|put its output]] in a variable of the caller's choice instead of the function's choice. ([[BashWeaknesses|Reusability of shell functions]] is dubious at best, so this is something that should not happen ''often''.)

Assigning a value "through" a reference (or pointer, or indirect variable, or whatever you want to call it -- I'm going to use "ref" from now on) is more widely possible, but the means of doing so are extremely shell-specific.

In ksh93, we can just use `nameref` again:
 {{{
 # ksh93
 nameref ref=realvariable
 ref="contents"
 # realvariable now contains the string "contents"
 }}}

In Bash, we can use {{{read}}} and Bash's [[HereDocument|here string]] syntax:
 {{{
 # Bash
 ref=realvariable
 IFS= read -r $ref <<< "contents"
 # realvariable now contains the string "contents"
 }}}
However, this only works if there are no newlines in the content. If you need to assign multiline values, keep reading.
We are not aware of any trick that can duplicate that functionality in POSIX or Bourne shells without `eval`, which can be difficult to do securely. Older versions of Bash can ''almost'' do it -- some indirect array tricks work, and others do not, and we do not know whether the syntax involved will remain stable in future releases. So, consider this a ''use at your own risk'' hack.

{{{#!highlight bash
# Bash -- trick #1. Works in bash 2 and up, and ksh93v+ (when invoked as bash)
realarray=(...) ref=realarray; index=2
tmp=${ref}[index]
echo "${!tmp}" # gives array element [2]
}}}

{{{#!highlight bash
# Bash -- trick #2. Seems to work in bash 3 and up.
# Can't be combined with special expansions until 4.3. e.g. "${!tmp##*/}"
# Does NOT work in bash 2.05b -- Expands to one word instead of three in bash 2.
tmp=${ref}[@]
printf '<%s> ' "${!tmp}"; echo # Iterate whole array as one word per element.
}}}

It is not possible to retrieve array indices directly using the Bash ''${!var}'' indirect expansion.
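
Where namerefs are available, that limitation does not apply, because the nameref can be used anywhere the array's own name could. The following is a small sketch assuming Bash 4.3 or newer; the array contents are made up for illustration.

```bash
#!/usr/bin/env bash
# Assumes Bash >= 4.3. Array contents are illustrative only.

realarray=([0]=zero [2]=two [5]=five)   # sparse indexed array
declare -n ref=realarray

count=${#ref[@]}          # number of elements in realarray
indices=("${!ref[@]}")    # the indices of realarray

printf 'count: %s\n' "$count"            # count: 3
printf 'indices: %s\n' "${indices[*]}"   # indices: 0 2 5
```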

==== Assigning indirect/reference variables ====
Sometimes you'd like to "point" from one variable to another, for purposes of writing information to a dynamically configurable place. Typically this happens when you're trying to write a "reusable" function or library, and you want it to [[BashFAQ/084|put its output]] in a variable of the caller's choice instead of the function's choice. (Various traits of Bash make safe [[BashWeaknesses|reusability of Bash functions]] difficult at best, so this is something that should not happen ''often''.)

Assigning a value "through" a reference (I'm going to use "ref" from now on) is more widely possible, but the means of doing so are usually extremely shell-specific. All shells with the sole exception of AT&T ksh93 lack real reference variables or pointers. Indirection can '''only''' be achieved by indirectly evaluating variable names. In other words, you can never have a real unambiguous reference to an object in memory; the best you can do is use the name of a variable to try simulating the effect. Therefore, '''you must control the value of the ref''' and ensure side-effects such as globbing, user-input, and conflicting local parameters can't affect parameter names. Names must either be deterministic or validated in a way that makes certain guarantees. If an end user can populate the ref variable with arbitrary strings, the result can be unexpected code injection. We'll show an example of this at the end.

In ksh93, we can use `nameref` again:

{{{#!highlight bash
# ksh93/mksh/Bash 4.3
typeset -n ref=realvariable
ref=contents
# realvariable now contains the string "contents"
}}}

In zsh, using parameter expansions `::=` and expansion flags `P`:

{{{#!highlight bash
# zsh
ref=realvariable
: ${(P)ref::=contents}
# redefines realvariable unconditionally to the string "contents"
}}}

In Bash, if you only want to assign '''a single line''' to the variable, you can use `read` and Bash's [[HereDocument#Here_Strings|here string]] syntax:

{{{#!highlight bash
# Bash/ksh93/mksh/zsh
ref=realvariable
IFS= read -r -- "$ref" <<<"contents"
# realvariable now contains the string "contents"
}}}

If you need to assign '''multiline values''', you can use a [[HereDocument]]:

{{{#!highlight bash
# Bash
ref=realvariable
IFS= read -r -d '' -- "$ref" <<EOF
The contents
go here.
EOF
}}}
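
To make the single-line versus multiline difference concrete, here is a small Bash sketch with made-up variable names. It feeds the same two-line text to `read` with and without `-d ''`. The `|| true` is there because `read -d ''` reaches end of input before it sees a NUL delimiter and therefore returns a non-zero status.

```bash
#!/usr/bin/env bash
# Bash sketch; variable names and text are illustrative only.
ref=realvariable
text=$'line one\nline two'

# Without -d, read stops at the first newline.
IFS= read -r -- "$ref" <<<"$text"
first_only=$realvariable      # only "line one"

# With -d '', read consumes everything up to end of input.
IFS= read -r -d '' -- "$ref" <<<"$text" || true
whole=$realvariable           # both lines, plus the newline the here string appends

printf '%q\n' "$first_only"
printf '%q\n' "$whole"
```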
Line 165: Line 218:
 {{{
 # Bash
 aref=realarray
 read -r -a $aref <<< "words go into array elements"
 echo "${realarray[1]}" # prints "go"
 }}}
(Again, newlines in the input will break this trick. [[IFS]] is used to delimit words, so you may or may not need to set that.)

Another trick is to use Bash's {{{printf -v}}} (only available in [[BashFAQ/061|recent versions]]):
 {{{
 # Bash 3.1 or higher
 ref=realvariable
 printf -v $ref %s "contents"
 }}}

The {{{printf -v}}} trick is handy if your contents aren't a constant string, but rather, something dynamically generated. You can use all of {{{printf}}}'s formatting capabilities. This trick also permits any string content, including embedded newlines (but not NUL bytes - no force in the universe can put NUL bytes into shell strings usefully). This is the best trick to use if you're in bash 3.1 or higher.

Yet another trick is Korn shell's {{{typeset}}} or Bash's {{{declare}}}. These are roughly equivalent to each other. Both of them cause a variable to become ''locally scoped'' to a function, if used inside a function; but if used outside a function, they can operate on global variables.

 {{{
 # Korn shell (all versions):
 typeset $ref="contents"

 # Bash:
 declare $ref="contents"
 }}}

The advantage of using `typeset` or `declare` over `eval` is that the right hand side of the assignment is ''not'' parsed by the shell. If you used `eval` here, you would have to sanitize/escape the entire right hand side first. This trick also preserves the contents exactly, including newlines, so this is the best trick to use if you're in bash older than 3.1 (or ksh88) and don't need to worry about accidentally changing your variable's scope (i.e., you're not using it inside a function).

''However'', with bash, you must still be careful about what is on the ''left''-hand side of the assignment. Inside square brackets, expansions are still performed; thus declare can be just as dangerous as eval:
 {{{
 # Bash:
 ref='x[$(touch evilfile; echo 0)]'
 ls -l evilfile # No such file or directory
 declare "$ref=value"
 ls -l evilfile # It exists now!
 }}}
This problem also exists with `typeset` in mksh and pdksh, but apparently not ksh93.

{{{#!highlight bash
# Bash
aref=realarray
IFS=' ' read -d '' -ra "$aref" <<<'words go into array elements'

# ksh93/mksh/zsh
aref=realarray
IFS=' ' read -d '' -rA "$aref" <<<'words go into array elements'
}}}

[[IFS]] is used to delimit words, so you may or may not need to set that. Also note that the `read` command will return failure because there is no terminating NUL byte for the `-d ''` to catch. Be prepared to ignore that failure.

Another trick is to use Bash's `printf -v`, available in [[BashFAQ/061|bash 3.1 and newer]]:
{{{#!highlight bash
# Bash 3.1 or higher. Array assignments require 4.2 or higher.
ref=realvariable
printf -v "$ref" %s "contents"
}}}

You can use all of `printf`'s formatting capabilities. This trick also permits any string content, including embedded and trailing newlines.
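
A brief sketch of both points, assuming Bash 3.1 or newer and using made-up variable names: one assignment uses a format specifier, the other keeps a trailing newline that command substitution would strip.

```bash
#!/usr/bin/env bash
# Assumes Bash >= 3.1 for printf -v. Names and values are illustrative only.

ref=padded
printf -v "$ref" '%05d' 42                                # formatted number

ref=report
printf -v "$ref" '%s\n%s\n' 'first line' 'second line'    # trailing newline is kept

# For comparison: command substitution drops trailing newlines.
stripped=$(printf '%s\n%s\n' 'first line' 'second line')

printf '%s\n' "$padded"    # 00042
printf '%q\n' "$report"
printf '%q\n' "$stripped"
```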

Yet another trick is Korn shell's `typeset` or Bash's `declare`. The details of `typeset` vary greatly between shells, but can be used in compatible ways in limited situations. Both of them cause a variable to become ''locally scoped'' to a function, if used inside a function; but if used outside all functions, they can operate on global variables.

{{{#!highlight bash
# Bash/ksh (any)/zsh
typeset -- "${ref}=contents"

# Bash
declare -- "${ref}=contents"
}}}

Bash 4.2 adds `declare -g` which assigns variables to the global scope from any context.
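
The scope difference can be seen in a short sketch, assuming Bash 4.2 or newer. The function names `assign_local` and `assign_global` are hypothetical, and both assume the name passed in is trusted.

```bash
#!/usr/bin/env bash
# Assumes Bash >= 4.2 for declare -g. Function and variable names are illustrative only.
# Both helpers assume $1 is a trusted, valid variable name.

assign_local()  { declare -- "$1=$2"; }      # creates a function-local variable
assign_global() { declare -g -- "$1=$2"; }   # assigns in the global scope

unset -v target

assign_local target value
printf 'after assign_local:  %s\n' "${target-<unset>}"   # <unset>

assign_global target value
printf 'after assign_global: %s\n' "${target-<unset>}"   # value
```

The variable created by plain `declare` disappears when `assign_local` returns, which is why the caller never sees it.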
Line 205: Line 253:
 {{{
 # Bourne
 ref=realvariable
 read $ref <<EOF
 contents
 EOF
 }}}
(Alas, `read` means we're back to only getting at most one line of content. This is the most portable trick, but it's limited to single-line content.)

Remember that, when using a here document, if the sentinel word ({{{EOF}}} in our example) is unquoted, then parameter expansions will be performed inside the body. If the sentinel is quoted, then parameter expansions are not performed. Use whichever is more convenient for your task.

Finally, some people just ''cannot'' resist throwing `eval` into the picture:

 {{{
 # Bourne
 ref=myVar
 eval "$ref=\$value"
 }}}
{{{#!highlight bash
# Bourne
ref=realvariable
IFS= read -r -- "$ref" <<'EOF'
contents
EOF
}}}

Alas, `read` without `-d` means we're back to only getting at most one line of content. This is the most portable trick, but it's limited to single-line content.

Remember that when using a here document, if the sentinel word (`EOF` in our example) is unquoted, then parameter expansions will be performed inside the body. If the sentinel is quoted, then parameter expansions are not performed. Use whichever is more convenient for your task.

===== Security Concerns =====
Some people mistakenly believe that `typeset` or `declare` is "safer" than `eval`. It turns out, they're just as dangerous. Possibly even more so, because people ''think'' they're safe. An `eval` merits immediate scrutiny, but `declare` is often overlooked.

Another drawback of `typeset` or `declare` is that they always affect the scope of the assigned variable. `eval` leaves the scope untouched.

With any indirect assignment, you must be careful about what you're assigning to. Inside square brackets, expansions are still performed; thus, with a tainted ref, `declare` or `printf -v` can be just as dangerous as `eval`:

{{{#!highlight bash
# Bash:
ref='x[$(touch evilfile)0]'
ls -l evilfile # No such file or directory

declare "${ref}=value"
ls -l evilfile # It exists now!

rm evilfile # Now it's gone.

printf -v "$ref" %s "value"
ls -l evilfile # It came back!
}}}

This problem also exists with `typeset` in mksh and pdksh, but apparently not ksh93. This is why the value of `ref` must be under ''your'' control at all times.
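
One way to keep the ref under control is to validate it before any indirect assignment. The sketch below is only an illustration: `assign_checked` is a hypothetical helper, it assumes Bash 3.1 or newer, and it assumes a locale in which the bracket ranges match ASCII letters only.

```bash
#!/usr/bin/env bash
# Assumes Bash >= 3.1 (printf -v and [[ =~ ]]). "assign_checked" is a hypothetical helper.

assign_checked() {
    # Accept plain variable names only: no subscripts, no expansions.
    if [[ ! $1 =~ ^[a-zA-Z_][a-zA-Z_0-9]*$ ]]; then
        echo >&2 "assign_checked: refusing invalid name"
        return 2
    fi
    printf -v "$1" %s "$2"
}

assign_checked 'x[$(touch evilfile)0]' value   # rejected; the payload never runs
assign_checked greeting 'hello world'          # accepted

if [[ -e evilfile ]]; then
    echo "evilfile was created"
else
    echo "no evilfile"
fi
printf '%s\n' "$greeting"    # hello world
```

The check deliberately refuses array subscripts altogether, which is stricter than necessary but avoids the expansion inside square brackets shown above.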

===== eval =====

{{{#!highlight bash
# Bourne
ref=myVar
eval "${ref}=\$value"
}}}
Line 226: Line 298:
 {{{
 myVar=$value
 }}}

The right-hand side is not parsed by the shell, so there is no danger of unwanted side effects. The drawback, here, is that every single shell metacharacter on the right hand side of the `=` must be escaped carefully. In the example shown here, there was only one. In a more complex situation, there could be dozens.

The good news is that if you can sanitize the right hand side correctly, this trick is fully portable, has no variable scope issues, and allows all content including newlines. The bad news is that if you fail to sanitize the right hand side correctly, you have a massive security hole. Use `eval` at your own risk.
{{{#!highlight bash
myVar=$value
}}}

The right-hand side is not parsed by the shell, so there is no danger of unwanted side effects. The drawback, here, is that every single shell metacharacter on the right hand side of the `=` must be quoted/escaped carefully. In the example shown here, there was only one. In a more complex situation, there could be dozens.

This is very often done incorrectly. Permutations like these are seen frequently all over the web even from experienced users that ought to know better:
{{{#!highlight bash
eval ${ref}=\"$value\" # WRONG!
eval "$ref='$value'" # WRONG!
eval "${ref}=\$value" # Correct (curly braced PE used for clarity)
eval "$ref"'=$value' # Correct (equivalent)
}}}
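
To see why the first form is wrong, consider this sketch with a harmless stand-in for hostile input. The variable names are made up, and only POSIX constructs are used.

```bash
# POSIX sh constructs only. Variable names are illustrative; the "payload" is harmless.
value='$(echo injected)'

ref=target_wrong
eval ${ref}=\"$value\"     # WRONG: the value is expanded first, then parsed as code

ref=target_right
eval "${ref}=\$value"      # Correct: only the name is expanded before eval runs

printf 'wrong: %s\n' "$target_wrong"   # wrong: injected
printf 'right: %s\n' "$target_right"   # right: $(echo injected)
```

In the wrong form the command substitution inside the value is executed; in the correct form the value arrives in the variable byte for byte.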

The good news is that if you can sanitize the right hand side correctly, this trick is fully portable, has no variable scope issues, and allows all content including newlines. The bad news is that if you fail to sanitize the right hand side correctly, you have a massive security hole. Use `eval` if you know what you're doing and are very careful.

The following code demonstrates how to correctly pass a scalar variable name into a function by reference for the purpose of "returning" a value:

{{{#!highlight bash
# POSIX

f() {
    # Check that the referenced variable name is not empty, and
    # is a valid variable name.
    if [ "$#" != 1 ] || [ -z "$1" ]; then
        echo >&2 "usage: f varname"
        return 1
    fi

    if printf '%s\n' "$1" | LC_ALL=C grep -v -q '^[A-Za-z_][A-Za-z0-9_]*$' ||
       [ "$(printf '%s\n' "$1" | wc -l)" != 1 ]; then
        echo >&2 "f: invalid varname argument"
        return 2
    fi

    # Code goes here that eventually sets the variable "x".
    # In shells with local variables, x should be local.
    # x contains the value we'd like to return to the caller.
    x=foo

    # Return the value into the caller's variable.
    eval "${1}=\$x"
}
}}}

=== See Also ===
  * [[ https://web.archive.org/web/20230402053523/https://wiki.bash-hackers.org/syntax/arrays#indirection | More advanced indirection on arrays]]
  * [[https://gist.github.com/ormaaj/5682807 | Bash vs Mksh vs ksh93 namerefs]]
  * [[BashFAQ/005]]
  * [[BashGuide/Arrays]]
  * [[BashSheet#Arrays|BashSheet Array reference]]

How can I use variable variables (indirect variables, pointers, references) or associative arrays?

This is a complex page, because it's a complex topic. It's been divided into roughly four parts: associative arrays, name references, evaluating indirect variables, and assigning indirect variables. There are discussions of programming issues and concepts scattered throughout.

Associative Arrays

We introduce associative arrays first, because we observe that inexperienced programmers often conjure arcane solutions to problems that would be solved more cleanly with associative arrays.

An associative array is an unordered collection of key-value pairs. A value may be retrieved by supplying its corresponding key. Since strings are the only datatype most shells understand, associative arrays map strings to strings, unlike indexed arrays, which map integers to strings. Associative arrays exist in AWK as "associative arrays", in Perl as "hashes", in Tcl as "arrays", in Python and C# as "dictionaries", in Java as a "Map", and in C++11 STL as std::unordered_map.

    # Bash 4 / ksh93

    typeset -A homedir    # Declare associative array
    homedir=(             # Compound assignment
        [jim]=/home/jim
        [silvia]=/u/silvia
        [alex]=/net/home/alex
    )

    homedir[ormaaj]=/home/ormaaj # Ordinary assignment adds another single element

    for user in "${!homedir[@]}"; do   # Enumerate all indices (user names)
        printf 'Home directory of user %q is: %q\n' "$user" "${homedir[$user]}"
    done

Prior to Bash 4 or if you can't use ksh93, your options are limited. Either move to another interpreter (awk, perl, python, ruby, tcl, ...) or re-evaluate your problem to simplify it. There are certain tasks for which associative arrays are a powerful and completely appropriate tool. There are others for which they are overkill, or simply unsuitable.

Suppose we have several remote hosts with slightly different configuration, and that we want to ssh to each one and run slightly different commands. One way we could set it up would be to hard-code a bunch of ssh commands in per-hostname functions in a single script and just run them in series or in parallel. (Don't reject this out of hand! Simple is good.) Another way would be to store each group of commands as an element of an associative array keyed by the hostname:

    declare -A commands
    commands=(
      [host1]="mvn clean install && cd webapp && mvn jetty:run"
      [host2]="..."
    )

    for host in "${!commands[@]}"; do
        ssh -- "$host" "${commands[$host]}"
    done

This solution works, because we're encoding a very short shell script in a string, and storing it as an array element. When we call ssh, it passes the string directly to the remote host, where a shell evaluates it and executes it. But what if the scripts were much longer, or more complicated?

If we want to get fancy and store each sub-command (cd webapp for example) as an element of a list, and then have each hostname map to a list of sub-commands, we'd quickly find that we can't do that in a shell. That's the kind of approach we'd expect in a high-level language, where we can store hierarchical information in advanced data structures. We want each element of the associative array to be a list or another array of command strings. But the shell simply doesn't offer that kind of data structure.

So, often it pays to step back and think in terms of shells rather than other programming languages. Aren't we just running a script on a remote host? Then why don't we just store the configuration sets as scripts? Then it's simple:

    # A series of conf files named for the hosts we need to run our commands on:
    for conf in /etc/myapp/*; do
        host=${conf##*/}
        ssh -- "$host" bash < "$conf"
    done

    # /etc/myapp/hostname is just a script:
    mvn clean install &&
    cd ./webapp &&
    mvn jetty:run

Now we've removed the need for associative arrays, and we no longer have to deal with a bunch of extremely horrible quoting issues. It is also easy to parallelize using GNU Parallel:

    parallel ssh -- {/} bash "<" {} ::: /etc/myapp/*

Associative array hacks in older shells

Before you think of using eval to mimic associative arrays in an older shell (probably by creating a set of variable names like homedir_alex), try to think of a simpler or completely different approach that you could use instead. If this hack still seems to be the best thing to do, consider the following disadvantages:

  1. It's really hard to read, to keep track of, and to maintain.
  2. The variable names must be a single line and match the RegularExpression ^[a-zA-Z_][a-zA-Z_0-9]*$ -- i.e., a variable name cannot contain arbitrary characters but only letters, digits, and underscores. We cannot have a variable's name contain Unix usernames, for instance -- consider a user named hong-hu. A dash '-' cannot be part of a variable name, so the entire attempt to make a variable named homedir_hong-hu is doomed from the start.

  3. Quoting is hard to get right. If a content string (not a variable name) can contain whitespace characters and quotes, it's hard to quote it right to preserve it through both shell parsings. And that's just for constants, known at the time you write the program. (Bash's printf %q helps, but nothing analogous is available in POSIX shells.)

  4. If the program handles unsanitized user input, it can be VERY dangerous!
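
The naming restriction in point 2 can be checked directly. This is a small sketch for Bash 4 or newer; the variable names user, varname, and valid are invented for the illustration. It tests the would-be variable name against the pattern quoted above and then stores the same user in an associative array, where the key can be any string.

```shell
# Bash >= 4
user='hong-hu'
varname="homedir_$user"

# The regular expression is the one given in the list above.
if [[ $varname =~ ^[a-zA-Z_][a-zA-Z_0-9]*$ ]]; then
    valid=yes
else
    valid=no       # the dash makes this name unusable as a variable
fi

# An associative array key has no such restriction.
declare -A homedir
homedir[$user]=/home/hong-hu
```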

Read BashGuide/Arrays or BashFAQ/005 for a more in-depth description and examples of how to use arrays in Bash.

If you need an associative array but your shell doesn't support them, please consider using AWK instead.

Name References

ksh93 introduced name references, which are variables that work like symbolic links. The content of a nameref variable is the name of a second variable. Assignments, expansions, and other operations on a nameref variable are redirected to the variable it "points to". Nameref variables were subsequently ported to mksh, bash 4.3, and zsh versions > 5.9.

Currently, the bash, mksh, and zsh implementations do not support the full power of the ksh93 nameref system.

  • The zsh implementation is currently the most complete in comparison with ksh93, supporting most features including reference parameters to a limited degree, which have been adapted to support its dynamic scoping system. Variables passed to functions by reference in zsh work correctly in most cases. In certain complex cases the implementation is imperfect and several details aren't quite correct yet. These are probably fixable and could be addressed in a future release. It is also the newest implementation, currently unreleased.
  • The bash implementation is less complete. Reference parameters are completely unsupported as of 5.3, so the name that a nameref points to can only be resolved using the same dynamic scoping rules as any other variable name. For example, if you have declare -n ref=$1 and try to use $ref, the shell will look for whatever variable of that name is visible from the current function. There is no way to override this. You cannot say "this nameref points to the variable foo in the caller's scope" unambiguously.

  • The mksh implementation shares all of bash's limitations and additionally lacks support for the special for loop iteration feature over nameref variables. Namerefs are an mksh extension; oksh and other pdksh derivatives have no nameref support.
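
The bash limitation described above can be reproduced in a few lines. This sketch assumes Bash 4.3 or newer; the function name set_result and the variable names are hypothetical. The function declares a local variable that happens to have the same name as the caller's variable, so the nameref resolves to the local one and the caller's variable is never touched.

```shell
# Bash >= 4.3
set_result() {
    local -n out=$1        # $1 is "result", so out refers to the NAME "result"
    local result=inner     # a local with the same name now shadows the caller's
    out=changed            # assigns to the local "result", not the caller's
}

result=original
set_result result
# The caller's variable still holds "original" at this point.
```

A common workaround is to give every local in such a function an unlikely prefix, which reduces the chance of a collision but cannot rule it out.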

    # ksh93 / Bash >= 4.3 / zsh > 5.9 / mksh (no printf builtin)
    realvariable=contents
    typeset -n ref=realvariable
    printf '%s=%q\n' "${!ref}" "$ref"      # print the name and contents of the real variable

As long as you avoid namespace collisions, namerefs can be extremely useful. They give the "indirection" that many people are looking for:

    arr1=(first array)
    arr2=(second array)
    declare -n ref
    if [[ $someoption ]]; then
        ref=arr2
    else
        ref=arr1
    fi
    for i in "${ref[@]}"; do ...; done
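
Another frequent use is handing an array to a function by name. The sketch below assumes Bash 4.3 or newer. The function sum_array and the _sum_ prefix on its locals are conventions invented here to lower the risk of the namespace collisions mentioned above; they are not a guarantee.

```shell
# Bash >= 4.3
sum_array() {
    local -n _sum_arr=$1          # refer to the caller's array by name
    local _sum_total=0 _sum_x
    for _sum_x in "${_sum_arr[@]}"; do
        (( _sum_total += _sum_x ))
    done
    printf '%s\n' "$_sum_total"
}

nums=(1 2 3 4)
total=$(sum_array nums)           # pass the NAME of the array, not its contents
```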

Indirection

In this section, we discuss various other tricks available in some older shells where namerefs aren't available.

Think before using indirection

Putting variable names or any other bash syntax inside parameters is frequently done incorrectly and in inappropriate situations to solve problems that have better solutions. It violates the separation between code and data, and as such puts you on a slippery slope toward bugs and security issues. Indirection can make your code less transparent and harder to follow.

Normally, in bash scripting, you won't need indirect references at all. Generally, people look at this for a solution when they don't understand or know about Bash Arrays (indexed or associative) or haven't fully considered other Bash features such as functions.

Evaluating indirect/reference variables

Bash allows for expanding parameters indirectly -- that is, one variable may contain the name of another variable. Name reference variables are the preferred method for performing variable indirection. Older versions of Bash could also use a ! prefix operator in parameter expansions for variable indirection. Namerefs should be used unless portability to older bash versions is required. No other shell uses ${!variable} for indirection, and there are problems relating to use of that syntax for this purpose. It is also less flexible.

    # Bash
    realvariable=contents
    ref=realvariable
    printf '%s\n' "${!ref}"   # prints the contents of the real variable

Zsh allows you to access a parameter indirectly with the parameter expansion flag P:

    # zsh
    realvariable=contents
    ref=realvariable
    echo ${(P)ref}   # prints the contents of the real variable

zsh's ability to nest parameter expansions allows for referencing arrays too:

    # zsh
    myfunc() {
     local ref=$1
     echo "array $1 has ${#${(@P)ref}} elements"
    }
    realarray=(...)
    myfunc realarray

Unfortunately, for shells other than Bash, ksh93, and zsh there is no syntax for evaluating a referenced variable. You would have to use eval, which means you would have to take extreme measures to sanitize your data to avoid catastrophe.

It's difficult to imagine a practical use for this that wouldn't be just as easily performed by using an associative array. But people ask it all the time (it is genuinely a frequently asked question).

We are not aware of any trick that can duplicate that functionality in POSIX or Bourne shells without eval, which can be difficult to do securely. Older versions of Bash can almost do it -- some indirect array tricks work, and others do not, and we do not know whether the syntax involved will remain stable in future releases. So, consider this a use-at-your-own-risk hack.

    # Bash -- trick #1.  Works in bash 2 and up, and ksh93v+ (when invoked as bash)
    realarray=(...) ref=realarray; index=2
    tmp=${ref}[index]
    echo "${!tmp}"            # gives array element [2]

    # Bash -- trick #2.  Seems to work in bash 3 and up.
    # Can't be combined with special expansions until 4.3. e.g. "${!tmp##*/}"
    # Does NOT work in bash 2.05b -- Expands to one word instead of three in bash 2.
    tmp=${ref}[@]
    printf '<%s> ' "${!tmp}"; echo    # Iterate whole array as one word per element.

It is not possible to retrieve array indices directly using the Bash ${!var} indirect expansion.
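
Where namerefs are available, they offer a way around this, since the index-listing expansion is applied to the referenced array. A minimal sketch, assuming Bash 4.3 or newer (the name indices is made up for the example):

```shell
# Bash >= 4.3
realarray=(alpha beta gamma)
unset 'realarray[1]'             # leave a gap so the indices are not simply 0..n-1

declare -n ref=realarray
indices=("${!ref[@]}")           # indices of realarray, obtained through the nameref
```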

Assigning indirect/reference variables

Sometimes you'd like to "point" from one variable to another, for purposes of writing information to a dynamically configurable place. Typically this happens when you're trying to write a "reusable" function or library, and you want it to put its output in a variable of the caller's choice instead of the function's choice. (Various traits of Bash make safe reusability of Bash functions difficult at best, so this is something that should not happen often.)

Assigning a value "through" a reference (called a "ref" from here on) is more widely possible, but the means of doing so are usually extremely shell-specific. All shells with the sole exception of AT&T ksh93 lack real reference variables or pointers. Indirection can only be achieved by indirectly evaluating variable names. In other words, you can never have a real unambiguous reference to an object in memory; the best you can do is use the name of a variable to try simulating the effect. Therefore, you must control the value of the ref and ensure that side effects such as globbing, user input, and conflicting local parameters can't affect parameter names. Names must either be deterministic or validated in a way that makes certain guarantees. If an end user can populate the ref variable with arbitrary strings, the result can be unexpected code injection. We show an example of this under Security Concerns.

In ksh93 and the other shells that have namerefs, we can use a nameref again:

    # ksh93/mksh/Bash 4.3
    typeset -n ref=realvariable
    ref=contents
    # realvariable now contains the string "contents"

In zsh, using the ::= parameter expansion together with the P expansion flag:

    # zsh
    ref=realvariable
    : ${(P)ref::=contents}
    # redefines realvariable unconditionally to the string "contents"

In Bash, if you only want to assign a single line to the variable, you can use read and Bash's here string syntax:

    # Bash/ksh93/mksh/zsh
    ref=realvariable
    IFS= read -r -- "$ref" <<<"contents"
    # realvariable now contains the string "contents"

If you need to assign multiline values, you can use a HereDocument:

    # Bash
    ref=realvariable
    IFS= read -r -d '' -- "$ref" <<EOF
    The contents
    go here.
    EOF

A similar trick works for Bash array variables too:

    # Bash
    aref=realarray
    IFS=' ' read -d '' -ra "$aref" <<<'words go into array elements'

    # ksh93/mksh/zsh
    aref=realarray
    IFS=' ' read -d '' -rA "$aref" <<<'words go into array elements'

IFS is used to delimit words, so you may or may not need to set that. Also note that the read command will return failure because there is no terminating NUL byte for the -d '' to catch. Be prepared to ignore that failure.
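
One way to handle both points in Bash is sketched below. It rests on two assumptions worth checking against your own input: that a newline may safely be treated as a word delimiter, and that a non-zero status from read can be ignored. A here string appends a newline to its content, so adding newline to IFS keeps that newline from ending up in the last element.

```shell
# Bash
aref=realarray

# IFS contains space and newline: words are split on either, and the
# newline that <<< appends is stripped instead of being stored.
# "|| :" discards the expected failure status (EOF before a NUL byte).
IFS=$' \n' read -r -d '' -a "$aref" <<<'words go into array elements' || :

count=${#realarray[@]}
last=${realarray[count-1]}
```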

Another trick is to use Bash's printf -v, available in bash 3.1 and newer:

    # Bash 3.1 or higher. Array assignments require 4.2 or higher.
    ref=realvariable
    printf -v "$ref" %s "contents"

You can use all of printf's formatting capabilities. This trick also permits any string content, including embedded and trailing newlines.
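
As a small illustration of both claims, the sketch below (Bash 3.1 or newer; the variable name greeting is arbitrary) builds a two-line value with a format string and keeps the trailing newline, which a command substitution would have removed.

```shell
# Bash >= 3.1
ref=greeting
printf -v "$ref" 'Hello, %s!\nCount: %05d\n' world 42
# greeting now holds two lines, including the final newline.
```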

Yet another trick is Korn shell's typeset or Bash's declare. The details of typeset vary greatly between shells, but can be used in compatible ways in limited situations. Both of them cause a variable to become locally scoped to a function, if used inside a function; but if used outside all functions, they can operate on global variables.

    # Bash/ksh (any)/zsh
    typeset -- "${ref}=contents"

    # Bash
    declare -- "${ref}=contents"

Bash 4.2 adds declare -g which assigns variables to the global scope from any context.
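
The difference in scope can be seen by calling both forms from inside a function. This is a sketch for Bash 4.2 or newer; set_global and set_local are hypothetical helper names, and the variable names passed to them are fixed by the script rather than taken from user input.

```shell
# Bash >= 4.2
set_global() {
    # -g assigns in the global scope even though we are inside a function.
    declare -g -- "$1=$2"
}

set_local() {
    # Plain declare creates a local variable that disappears on return.
    declare -- "$1=$2"
}

set_global color blue      # color is visible after the call
set_local shade dark       # shade is not
```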

If you aren't using Bash or Korn shell, you can do assignments to referenced variables using HereDocument syntax:

    # Bourne
    ref=realvariable
    IFS= read -r -- "$ref" <<'EOF'
    contents
    EOF

Alas, read without -d means we're back to only getting at most one line of content. This is the most portable trick, but it's limited to single-line content.

Remember that when using a here document, if the sentinel word (EOF in our example) is unquoted, then parameter expansions will be performed inside the body. If the sentinel is quoted, then parameter expansions are not performed. Use whichever is more convenient for your task.
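
The two behaviours side by side, as a short sketch that should run in any POSIX shell (the variable names name, expanded, and literal are chosen only for this illustration):

```shell
# POSIX sh
name=world

# Unquoted sentinel: $name is expanded inside the body.
ref=expanded
IFS= read -r "$ref" <<EOF
hello $name
EOF

# Quoted sentinel: the body is taken literally.
ref=literal
IFS= read -r "$ref" <<'EOF'
hello $name
EOF
```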

Security Concerns

Some people mistakenly believe that typeset or declare is "safer" than eval. It turns out they're just as dangerous. Possibly even more so, because people think they're safe. An eval merits immediate scrutiny, but declare is often overlooked.

Another drawback of typeset or declare is that they always affect the scope of the assigned variable. eval leaves the scope untouched.

With any indirect assignment, you must be careful about what you're assigning to. Inside square brackets, expansions are still performed; thus, with a tainted ref, declare or printf -v can be just as dangerous as eval:

    # Bash:
    ref='x[$(touch evilfile)0]'
    ls -l evilfile   # No such file or directory

    declare "${ref}=value"
    ls -l evilfile   # It exists now!

    rm evilfile # Now it's gone.

    printf -v "$ref" %s "value"
    ls -l evilfile   # It came back!

This problem also exists with typeset in mksh and pdksh, but apparently not ksh93. This is why the value of ref must be under your control at all times.
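
One way to keep control is to refuse anything that is not a plain variable name before assigning through it. The following is a sketch, not a complete defence: safe_assign is a hypothetical helper, it uses printf -v and so needs Bash 3.1 or newer, and the bracket ranges assume a locale in which a-z and A-Z cover only ASCII letters (run under LC_ALL=C if in doubt).

```shell
# Bash >= 3.1
safe_assign() {
    # Reject empty names, names that start with anything other than a letter
    # or underscore, and names containing any other character, which covers
    # "[", "$", whitespace and newlines.
    case $1 in
        '' | [!a-zA-Z_]* | *[!a-zA-Z_0-9]*) return 1 ;;
    esac
    printf -v "$1" %s "$2"
}

# The tainted ref from the example above is refused before printf sees it.
if safe_assign 'x[$(touch evilfile)0]' value; then
    tainted_status=accepted
else
    tainted_status=rejected
fi

# An ordinary name is assigned as usual.
if safe_assign greeting 'hello world'; then
    clean_status=accepted
else
    clean_status=rejected
fi
```

A check like this also rules out legitimate array element targets such as arr[3]; whether that trade-off is acceptable depends on the script.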

eval

    # Bourne
    ref=myVar
    eval "${ref}=\$value"

This expands to the statement that is executed:

    myVar=$value

The right-hand side is not parsed by the shell, so there is no danger of unwanted side effects. The drawback, here, is that every single shell metacharacter on the right hand side of the = must be quoted/escaped carefully. In the example shown here, there was only one. In a more complex situation, there could be dozens.

This is very often done incorrectly. Permutations like these are seen frequently all over the web even from experienced users that ought to know better:

    eval ${ref}=\"$value\" # WRONG!
    eval "$ref='$value'"   # WRONG!
    eval "${ref}=\$value"  # Correct (curly braced PE used for clarity)
    eval "$ref"'=$value'   # Correct (equivalent)
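
To see why the wrong forms matter, consider a value that contains a double quote. The sketch below is a harmless stand-in for a real attack: instead of running a destructive command, the injected text only sets a marker variable. The names injected, wrong_result, and right_result are invented for the demonstration, and the wrong form is shown fully quoted so that word splitting does not obscure the result.

```shell
# POSIX sh
ref=myVar
value='"; injected=yes; : "'

# WRONG: $value is expanded before eval parses the string, so eval sees
#   myVar=""; injected=yes; : ""
injected=no
eval "${ref}=\"$value\""
wrong_result=$injected

# Correct: eval sees  myVar=$value  and the content is never parsed as code.
injected=no
eval "${ref}=\$value"
right_result=$injected
```

After the first eval the marker has been set by the injected code; after the second, myVar simply holds the odd-looking string.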

The good news is that if you can sanitize the right hand side correctly, this trick is fully portable, has no variable scope issues, and allows all content including newlines. The bad news is that if you fail to sanitize the right hand side correctly, you have a massive security hole. Use eval if you know what you're doing and are very careful.

The following code demonstrates how to correctly pass a scalar variable name into a function by reference for the purpose of "returning" a value:

    # POSIX

    f() {
        # Check that exactly one argument was given and that it is not empty.
        if [ "$#" != 1 ] || [ -z "$1" ]; then
            echo >&2 "usage: f varname"
            return 1
        fi

        # Check that the argument is a single line and a valid variable name.
        if printf '%s\n' "$1" | LC_ALL=C grep -v -q '^[A-Za-z_][A-Za-z0-9_]*$' ||
           [ "$(printf '%s\n' "$1" | wc -l)" != 1 ]; then
            echo >&2 "f: invalid varname argument"
            return 2
        fi

        # Code goes here that eventually sets the variable "x".
        # In shells with local variables, x should be local.
        # x contains the value we'd like to return to the caller.
        x=foo

        # Return the value into the caller's variable.
        eval "${1}=\$x"
    }


CategoryShell

BashFAQ/006 (last edited 2025-03-22 09:32:45 by ormaaj)