Before starting to use dynamically created variables, think again of a simpler approach. If it still seems to be the best thing to do, have a look at the following disadvantages: 1. It's hard to read and to maintain. 1. The variable names must match the regular expression {{{^[a-zA-Z_][a-zA-Z_0-9]*}}} -- i.e., a variable name cannot contain arbitrary characters but only letters, digits, and underscores. We cannot have a variable's name contain Unix usernames; consider a user named {{{hong-hu}}}. A dash '-' cannot be a valid part of a variable name. 1. Quoting is hard to get right. If content strings (not variable name) can contain whitespace characters and quotes, it's hard to quote it right to preserve it. 1. If the program handles unsanitized user input, it can be [#faq48 VERY dangerous]! Bash (but not Korn shell, POSIX or Bourne shell) allows you to expand a parameter ''indirectly'' -- that is, one variable may contain the name of another variable: {{{ realvariable=contents ref=realvariable echo "${!ref}" # prints the contents of the real variable}}} This works for evaluating, but not for assigning a value. In order to assign a value "through" a reference (or pointer, or indirect variable, or whatever you want to call it -- I'm going to use "ref" from now on), you have to resort to tricks. One such trick is to use {{{read}}} and Bash's ''here string'' syntax: {{{ ref=realvariable read $ref <<< "contents" # realvariable now contains the string "contents"}}} This works equally well with Bash array variables too: {{{ aref=realarray read -a $aref <<< "words go into array elements" echo "${realarray[1]}" # prints "go"}}} Another is to use Bash's {{{printf -v}}} (only available in [#faq61 recent versions]): {{{ ref=realvariable printf -v $ref "contents"}}} The {{{printf -v}}} trick is handy if your contents aren't a constant string, but rather, something dynamically generated. You can use all of {{{printf}}}'s formatting capabilities. 
Yet another is Korn shell's {{{typeset}}} or Bash's {{{declare}}}. These are roughly equivalent to each other. Both of them cause a variable to become ''locally scoped'' to a function, if used inside a function; but if used outside a function, they can substitute for {{{read}}} in this case: {{{ # Korn shell: typeset $ref="contents" # Bash: declare $ref="contents"}}} If you aren't using Bash or Korn shell, you can still do assignments to referenced variables using ''here document'' syntax: {{{ # Portable code. ref=realvariable read $ref <<EOF contents EOF}}} Remember that, when using a here document, if the sentinel word ({{{EOF}}} in our example) is unquoted, then parameter expansions will be performed inside the body. If the sentinel is quoted, then parameter expansions are not performed. Use whichever is more convenient for your task. Unfortunately, for shells other than Bash, there is no syntax for ''evaluating'' a referenced variable. You would have to use [#faq48 eval], which means you would have to undergo extreme measures to sanitize your data to avoid catastrophe. Sometimes it's convenient to have associative arrays, arrays indexed by a string. Awk has associative arrays. Perl calls them "hashes", while Tcl simply calls them "arrays". KornShell93 supports this kind of array: {{{ # KornShell93 script - does not work with BASH typeset -A homedir # Declare KornShell93 associative array homedir[jim]=/home/jim homedir[silvia]=/home/silvia homedir[alex]=/home/alex for user in ${!homedir[@]} # Enumerate all indices (user names) do echo "Home directory of user $user is ${homedir[$user]}" done}}} BASH (including version 3.x) does not support them, unfortunately. Either use [#faq48 eval] after sanitizing your data, or switch to awk, perl, ksh93, tcl, etc. |
This is a complex page, because it's a complex topic. It's been divided into roughly three parts: associative arrays, evaluating indirect variables, and assigning indirect variables. There are discussions of programming issues and concepts scattered throughout. === TODO / Note === For readers, the '''important takeaway''' is: 99% of the time, indirection is used on function parameters to compensate for POSIX shells having badly designed functions that can't return useful data except through indirection. You '''should not''' use indirection as a substitute for arrays (associative or indexed, if available, see the first section below). You '''should sometimes''' use indirection to pass data in and out of functions when you cannot use the parameters and an I/O stream with a subshell to do so (see second section, but few examples apply to this situation). Most other uses of indirection are incorrect or unnecessary. Most of this page was written prior to Bash 4.3. Namerefs (`typeset/declare/local -n`) may significantly change the considerations for indirection. Bash's implementation is briefly described below (more info in [[BashFAQ/048#The_problem_with_bash.27s_name_references|FAQ 48]]), but it differs from that of the much earlier ksh93 variant in some significant ways. One must now consider whether to prefer backwards compatibility with `${!var}` or portability with `typeset -n`. `declare` and `local` are completely non-portable with this option and were only introduced by Bash in version 4.3.0 (released 2014) and ksh93v. The bash implementation is actually more similar to mksh's implementation, which isn't yet discussed but also predates that of bash. Bash also doesn't define the default `nameref` alias used by countless existing scripts in the wild. Recommending a best practice that applies to everyone is difficult at this time. 
This page doesn't have much discussion on dynamic scope or weigh the differences between explicit passing of variable names vs implicit assignment to a calling function's local variable. It also prioritizes methods like exploiting the fact that `read`, `printf`, and (sometimes) `declare` can evaluate variable names, rather than using the more straightforward predictable `eval` approach (explained at the very end). Future POSIX standards may still throw another wrench into the works as standardizing locals (and their scope behavior, which is tightly coupled with nameref behavior) is still discussed on the lists from time to time. One possibility is to standardize the ksh93 static scope for `function name` functions while defining dynamic scope (as used by ksh88) for POSIX `name()` type functions, but this has only been proposed informally and there hasn't been much discussion on how namerefs would be affected. [[http://thread.gmane.org/gmane.comp.standards.posix.austin.general/8371/focus=8377|discussion]] Overhauling this page will take some time and work. -- 2014/05/20 <<TableOfContents>> === Associative Arrays === We introduce associative arrays first, because in the majority of cases where people are trying to use indirect variable assignments/evaluations, they ought to be using associative arrays instead. For instance, we frequently see people asking how they can have a bunch of related variables like `IPaddr_hostname1`, `IPaddr_hostname2` and so on. A more appropriate way to store this data is in an associative array named `IPaddr` which is indexed by the hostname. An [[https://en.wikipedia.org/wiki/Associative_array|associative array]] stores an unordered collection of objects addressed by keys. An object in the collection can be looked up and retrieved by supplying its corresponding key. 
Since strings are the only real datatype most shells understand, associative arrays map strings to strings, unlike indexed arrays, which map integers to strings and implicitly evaluate the index in a math context (associative arrays do not). Associative arrays exist in AWK as "associative arrays", in Perl as "hashes", in Tcl as "arrays", in Python and C# as "dictionaries", and in Java as a "Map", and in C++11 STL as `std::unordered_map`. {{{#!highlight bash # Bash 4 / ksh93 typeset -A homedir # Declare associative array homedir=( # Compound assignment [jim]=/home/jim [silvia]=/home/silvia [alex]=/home/alex ) homedir[ormaaj]=/home/ormaaj # Ordinary assignment adds another single element for user in "${!homedir[@]}"; do # Enumerate all indices (user names) printf 'Home directory of user %s is: %q\n' "$user" "${homedir[$user]}" done }}} Prior to Bash 4 or if you can't use ksh93, your options are limited. Either move to another interpreter (awk, perl, python, ruby, tcl, ...) or re-evaluate your problem to ''simplify it''. There are certain tasks for which associative arrays are a powerful and completely appropriate tool. There are others for which they are overkill, or simply unsuitable. Suppose we have several subservient hosts with slightly different configuration, and that we want to ssh to each one and run slightly different commands. One way we could set it up would be to hard-code a bunch of ssh commands in per-hostname functions in a single script and just run them in series or in parallel. (Don't reject this out of hand! Simple is good.) Another way would be to store each group of commands as an element of an associative array keyed by the hostname: {{{#!highlight bash declare -A commands commands=( [host1]="mvn clean install && cd webapp && mvn jetty:run" [host2]="..." 
) for host in "${!commands[@]}"; do ssh "$host" "${commands[$host]}" done }}} This is the kind of approach we'd expect in a high-level language, where we can store hierarchical information in advanced data structures. The difficulty here is that we really want each element of the associative array to be a ''list'' or ''another array'' of command strings. But the shell simply doesn't permit that kind of data structure. So, often it pays to step back and ''think in terms of shells'' rather than other programming languages. Aren't we just running a script on a remote host? Then why don't we just store the configuration sets ''as scripts''? Then it's simple: {{{#!highlight bash # A series of conf files named for the hosts we need to run our commands on: for conf in /etc/myapp/*; do host=${conf##*/} ssh "$host" bash < "$conf" done # /etc/myapp/hostname is just a script: mvn clean install && cd webapp && mvn jetty:run }}} Now we've removed the need for associative arrays, and also the need to maintain a bunch of extremely horrible quoting issues. It is also easy to parallelize using GNU Parallel: {{{#!highlight bash parallel ssh {/} bash "<" {} ::: /etc/myapp/* }}} ==== Associative array hacks in older shells ==== Before you think of using `eval` to mimic associative arrays in an older shell (probably by creating a set of variable names like `homedir_alex`), try to think of a simpler or completely different approach that you could use instead. If this hack still seems to be the best thing to do, consider the following disadvantages: 1. It's really hard to read, to keep track of, and to maintain. 2. The variable names must be a single line and match the RegularExpression {{{^[a-zA-Z_][a-zA-Z_0-9]*$}}} -- i.e., a variable name cannot contain arbitrary characters but only letters, digits, and underscores. We cannot have a variable's name contain Unix usernames, for instance -- consider a user named {{{hong-hu}}}. 
A dash '-' cannot be part of a variable name, so the entire attempt to make a variable named `homedir_hong-hu` is doomed from the start. 3. Quoting is hard to get right. If a content string (not a variable name) can contain whitespace characters and quotes, it's hard to quote it right to preserve it through both shell parsings. And that's just for ''constants'', known at the time you write the program. (Bash's `printf %q` helps, but nothing analogous is available in POSIX shells.) 4. If the program handles unsanitized user input, it can be [[BashFAQ/048|VERY dangerous]]! Read [[BashGuide/Arrays]] or [[BashFAQ/005]] for a more in-depth description and examples of how to use arrays in Bash. If you ''need'' an associative array but your shell doesn't support them, please consider using AWK instead. === Indirection === ==== Think before using indirection ==== Putting variable names or any other [[BashFAQ/050|bash syntax inside parameters]] is frequently done incorrectly and in inappropriate situations to solve problems that have better solutions. It violates the separation between code and data, and as such puts you on a slippery slope toward bugs and security issues. Indirection can make your code less transparent and harder to follow. Normally, in bash scripting, you won't need indirect references at all. Generally, people look at this for a solution when they don't understand or know about [[BashGuide/Arrays|Bash Arrays]] or haven't fully considered other Bash features such as functions. 
==== Evaluating indirect/reference variables ==== [[BASH]] allows you to expand a parameter ''indirectly'' -- that is, one variable may contain the name of another variable: {{{#!highlight bash # Bash realvariable=contents ref=realvariable echo "${!ref}" # prints the contents of the real variable }}} KornShell (ksh93) has a completely different, more powerful syntax -- the `nameref` command (also known as `typeset -n`): {{{#!highlight bash # ksh93 / mksh / Bash 4.3 realvariable=contents typeset -n ref=realvariable echo "${!ref} = $ref" # prints the name and contents of the real variable }}} Zsh allows you to access a parameter indirectly with the parameter expansion flag `P`: {{{#!highlight bash # zsh realvariable=contents ref=realvariable echo ${(P)ref} # prints the contents of the real variable }}} Unfortunately, for shells other than Bash, ksh93, and zsh there is no syntax for ''evaluating'' a referenced variable. You would have to use [[BashFAQ/048|eval]], which means you would have to undergo extreme measures to sanitize your data to avoid catastrophe. It's difficult to imagine a practical use for this that wouldn't be just as easily performed by using an associative array. But people ask it all the time (it is genuinely a ''frequently'' asked question). ksh93's `nameref` allows us to work with references to [[BashFAQ/005|arrays]], as well as regular scalar variables. For example, {{{#!highlight bash # ksh93 function myfunc { nameref ref=$1 echo "array $1 has ${#ref[*]} elements." } realarray=(...) myfunc realarray }}} zsh's ability to nest parameter expansions allow for referencing [[BashFAQ/005|arrays]] too: {{{#!highlight bash # zsh myfunc() { local ref=$1 echo "array $1 has ${#${(@P)ref}} elements" } realarray=(...) myfunc realarray }}} We are not aware of any trick that can duplicate that functionality in POSIX or Bourne shells without [[BashFAQ/048|eval]], which can be difficult to do securely. 
Older versions of Bash can ''almost'' do it -- some indirect array tricks work, and others do not, and we do not know whether the syntax involved will remain stable in future releases. So, consider this a ''use at your own risk'' hack. {{{#!highlight bash # Bash -- trick #1. Works in bash 2 and up, and ksh93v+ (when invoked as bash) realarray=(...) ref=realarray; index=2 tmp=${ref}[index] echo "${!tmp}" # gives array element [2] }}} {{{#!highlight bash # Bash -- trick #2. Seems to work in bash 3 and up. # Can't be combined with special expansions until 4.3. e.g. "${!tmp##*/}" # Does NOT work in bash 2.05b -- Expands to one word instead of three in bash 2. tmp=${ref}[@] printf "<%s> " "${!tmp}"; echo # Iterate whole array as one word per element. }}} It is not possible to retrieve array indices directly using the Bash ''${!var}'' indirect expansion. ==== Assigning indirect/reference variables ==== Sometimes you'd like to "point" from one variable to another, for purposes of writing information to a dynamically configurable place. Typically this happens when you're trying to write a "reusable" function or library, and you want it to [[BashFAQ/084|put its output]] in a variable of the caller's choice instead of the function's choice. (Various traits of Bash make safe [[BashWeaknesses|reusability of Bash functions]] difficult at best, so this is something that should not happen ''often''.) Assigning a value "through" a reference (I'm going to use "ref" from now on) is more widely possible, but the means of doing so are usually extremely shell-specific. All shells with the sole exception of AT&T ksh93 lack real reference variables or pointers. Indirection can '''only''' be achieved by indirectly evaluating variable names. IOW, you can never have a real unambiguous reference to an object in memory, the best you can do is use the name of a variable to try simulating the effect. 
Therefore, '''you must control the value of the ref''' and ensure side-effects such as globbing, user-input, and conflicting local parameters can't affect parameter names. Names must either be deterministic or validated in a way that makes certain guarantees. If an end user can populate the ref variable with arbitrary strings, the result can be unexpected code injection. We'll show an example of this at the end. In ksh93, we can use `nameref` again: {{{#!highlight bash # ksh93/mksh/Bash 4.3 typeset -n ref=realvariable ref=contents # realvariable now contains the string "contents" }}} In zsh, using parameter expansions `::=` and expansion flags `P`: {{{#!highlight bash # zsh ref=realvariable : ${(P)ref::=contents} # redefines realvariable unconditionally to the string "contents" }}} In Bash, if you only want to assign '''a single line''' to the variable, you can use `read` and Bash's [[HereDocument|here string]] syntax: {{{#!highlight bash # Bash/ksh93/mksh/zsh ref=realvariable IFS= read -r "$ref" <<<"contents" # realvariable now contains the string "contents" }}} If you need to assign '''multiline values''', keep reading. A similar trick works for Bash array variables too: {{{#!highlight bash # Bash aref=realarray IFS=' ' read -d '' -ra "$aref" <<<'words go into array elements' # ksh93/mksh/zsh aref=realarray IFS=' ' read -d '' -rA "$aref" <<<'words go into array elements' }}} [[IFS]] is used to delimit words, so you may or may not need to set that. Also note that the `read` command will return failure because there is no terminating NUL byte for the `-d ''` to catch. Be prepared to ignore that failure. Another trick is to use Bash's {{{printf -v}}} (only available in [[BashFAQ/061|recent versions]]): {{{#!highlight bash # Bash 3.1 or higher ONLY. Array assignments require 4.2 or higher. ref=realvariable printf -v "$ref" %s "contents" }}} You can use all of {{{printf}}}'s formatting capabilities. 
This trick also permits any string content, including embedded and trailing newlines. Yet another trick is Korn shell's {{{typeset}}} or Bash's {{{declare}}}. The details of `typeset` vary greatly between shells, but can be used in compatible ways in limited situations. Both of them cause a variable to become ''locally scoped'' to a function, if used inside a function; but if used outside all functions, they can operate on global variables. {{{#!highlight bash # Bash/ksh (any)/zsh typeset "${ref}=contents" # Bash declare "${ref}=contents" }}} Bash 4.2 adds `declare -g` which assigns variables to the global scope from any context. There is very little advantage to `typeset` or `declare` over `eval` for scalar assignments, but many drawbacks. `typeset` cannot be made to not affect the scope of the assigned variable. This trick does preserve the exact contents, like `eval`, if correctly escaped. You must still be careful about what is on the ''left''-hand side of the assignment. Inside square brackets, expansions are still performed; thus, with a tainted ref, `declare` can be just as dangerous as `eval`: {{{#!highlight bash # Bash: ref='x[$(touch evilfile)0]' ls -l evilfile # No such file or directory declare "${ref}=value" ls -l evilfile # It exists now! }}} This problem also exists with `typeset` in mksh and pdksh, but apparently not ksh93. This is why the value of `ref` must be under ''your'' control at all times. If you aren't using Bash or Korn shell, you can do assignments to referenced variables using HereDocument syntax: {{{#!highlight bash # Bourne ref=realvariable IFS= read -r "$ref" <<EOF contents EOF }}} (Alas, `read` without `-d` means we're back to only getting at most one line of content. This is the most portable trick, but it's limited to single-line content.) Remember that when using a here document, if the sentinel word ({{{EOF}}} in our example) is unquoted, then parameter expansions will be performed inside the body. 
If the sentinel is quoted, then parameter expansions are not performed. Use whichever is more convenient for your task. ===== eval ===== {{{#!highlight bash # Bourne ref=myVar eval "${ref}=\$value" }}} This expands to the statement that is executed: {{{#!highlight bash myVar=$value }}} The right-hand side is not parsed by the shell, so there is no danger of unwanted side effects. The drawback, here, is that every single shell metacharacter on the right hand side of the `=` must be quoted/escaped carefully. In the example shown here, there was only one. In a more complex situation, there could be dozens. This is very often done incorrectly. Permutations like these are seen frequently all over the web even from experienced users that ought to know better: {{{#!highlight bash eval ${ref}=\"$value\" # WRONG! eval "$ref='$value'" # WRONG! eval "${ref}=\$value" # Correct (curly braced PE used for clarity) eval "$ref"'=$value' # Correct (equivalent) }}} The good news is that if you can sanitize the right hand side correctly, this trick is fully portable, has no variable scope issues, and allows all content including newlines. The bad news is that if you fail to sanitize the right hand side correctly, you have a massive security hole. Use `eval` if you know what you're doing and are very careful. The following code demonstrates how to correctly pass a scalar variable name into a function by reference for the purpose of "returning" a value: {{{#!highlight bash # POSIX f() { # Code goes here that eventually sets the variable "x" ${1:+:} return 1 eval "${1}=\$x" } }}} The following code is one way to pass the name of a ksh-compatible 1-dimensional indexed array into a function by reference. There are many ways to balance portability versus functionality depending on your needs. This is mainly for authors of library code that are comfortable working around the subtleties of arrays and functions in each shell. The required version detection code is omitted. 
More complex solutions are needed if you require robustly dealing with local variable namespace collisions in dynamically-scoped shells including Bash. {{{#!highlight bash # bash/ksh93/mksh/zsh ${ZSH_VERSION+false} || emulate ksh ${BASH_VERSION+shopt -s extglob lastpipe} 2>/dev/null function f { ${1:+:} return 1 ${is_ksh93+eval typeset -n "${1}=\$1"} typeset -a ref # code that generates "ref" goes here. We assume you're returning a non-sparse array. eval "$1"'=("${ref[@]}")' } function main { # The caller MUST always declare the variable. typeset -a arr f arr args... } main "$@" }}} * note: Almost always, you should use "`function`" and "`typeset`" in portable non-POSIX code. Most of this wiki features Bash-only or POSIX-only examples and intentionally avoids these. See the [[https://www.mirbsd.org/htman/i386/man1/mksh.htm|mksh manual]] for `function f` vs `f()` differences. * note2: Don't shift away your reference arg. Even though in ksh93, you can, you can't do that using this method, even if you use Bash-4.3 or mksh namerefs instead of or in addition to `eval`. * note3: Similarly, `eval typeset -n "$1=\$1"` is a ksh93-specific hack to work around dynamic scope breakage. This WILL break bash/mksh namerefs, so the `is_ksh93` in this example must be able to separate AT&T ksh93 from other kshes. ([[https://gist.github.com/ormaaj/5682807|details]]). The following is inefficient, ugly, easy to do incorrectly, and usually unnecessary. This simply documents the single-quote escaping method. It may be useful in cases where there are large numbers of metacharacters that must be put into a string literal and eval'd. You're almost always better off constructing the string in advance using e.g. a heredoc with an escaped sentinel and then just expanding a single variable on the RHS of the assignment as demonstrated in the above example. 
{{{#!highlight bash # POSIX evil="';" value='echo fail' ref=x { eval "${ref}='$(sed -e "s/'/'\\\''/g")'"; } <<EOF ${evil}${value} EOF [ "${evil}${value}" = "$x" ] && echo success }}} {{{#!highlight bash # Bash/ksh93/zsh eval "$(printf %s=%q "$ref" "$value")" }}} === See Also === * [[http://wiki.bash-hackers.org/syntax/arrays#indirection | More advanced indirection on arrays]] * [[https://gist.github.com/ormaaj/5682807 | Bash vs Mksh vs ksh93 namerefs]] * [[BashFAQ/005]] * [[BashGuide/Arrays]] * [[BashSheet#Arrays|BashSheet Array reference]] ---- CategoryShell |
How can I use variable variables (indirect variables, pointers, references) or associative arrays?
This is a complex page, because it's a complex topic. It's been divided into roughly three parts: associative arrays, evaluating indirect variables, and assigning indirect variables. There are discussions of programming issues and concepts scattered throughout.
TODO / Note
For readers, the important takeaway is: 99% of the time, indirection is used on function parameters to compensate for POSIX shells having badly designed functions that can't return useful data except through indirection. You should not use indirection as a substitute for arrays (associative or indexed, if available, see the first section below). You should sometimes use indirection to pass data in and out of functions when you cannot use the parameters and an I/O stream with a subshell to do so (see second section, but few examples apply to this situation). Most other uses of indirection are incorrect or unnecessary.
Most of this page was written prior to Bash 4.3. Namerefs (typeset/declare/local -n) may significantly change the considerations for indirection. Bash's implementation is briefly described below (more info in FAQ 48), but it differs from that of the much earlier ksh93 variant in some significant ways. One must now consider whether to prefer backwards compatibility with ${!var} or portability with typeset -n. declare and local are completely non-portable with this option and were only introduced by Bash in version 4.3.0 (released 2014) and ksh93v. The bash implementation is actually more similar to mksh's implementation, which isn't yet discussed but also predates that of bash. Bash also doesn't define the default nameref alias used by countless existing scripts in the wild. Recommending a best practice that applies to everyone is difficult at this time.
This page says little about dynamic scope, and it does not weigh explicit passing of variable names against implicit assignment to a calling function's local variable. It also prioritizes methods that exploit the fact that read, printf, and (sometimes) declare can evaluate variable names, rather than the more straightforward and predictable eval approach (explained at the very end). Future POSIX standards may still throw another wrench into the works, as standardizing locals (and their scope behavior, which is tightly coupled with nameref behavior) is still discussed on the lists from time to time. One possibility is to standardize the ksh93 static scope for "function name" functions while defining dynamic scope (as used by ksh88) for POSIX "name()" functions, but this has only been proposed informally and there hasn't been much discussion on how namerefs would be affected (see the discussion at http://thread.gmane.org/gmane.comp.standards.posix.austin.general/8371/focus=8377).
Overhauling this page will take some time and work.
-- 2014/05/20
Associative Arrays
We introduce associative arrays first, because in the majority of cases where people are trying to use indirect variable assignments/evaluations, they ought to be using associative arrays instead. For instance, we frequently see people asking how they can have a bunch of related variables like IPaddr_hostname1, IPaddr_hostname2 and so on. A more appropriate way to store this data is in an associative array named IPaddr which is indexed by the hostname.
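As a minimal sketch of that suggestion, assuming Bash 4 or later, the per-host variables collapse into a single array. The host names and addresses below are made up for illustration.

```shell
# Bash 4+ sketch. Host names and addresses are hypothetical examples.
declare -A IPaddr=(
    [hostname1]=192.0.2.10
    [hostname2]=192.0.2.11
)

# Look up one entry by its key; no variable names are built at run time.
host=hostname2
addr=${IPaddr[$host]}
printf '%s has address %s\n' "$host" "$addr"

# Test for a key without creating it (${var+word} also works on array elements).
if [[ -z ${IPaddr[hostname3]+set} ]]; then
    missing=yes
fi
```

The key is ordinary data here, so a host name containing a dash or a dot needs no special handling.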
An associative array stores an unordered collection of objects addressed by keys. An object in the collection can be looked up and retrieved by supplying its corresponding key. Since strings are the only real datatype most shells understand, associative arrays map strings to strings. Indexed arrays, by contrast, map integers to strings and implicitly evaluate the index in a math context (associative arrays do not). Associative arrays exist in AWK as "associative arrays", in Perl as "hashes", in Tcl as "arrays", in Python and C# as "dictionaries", in Java as a "Map", and in the C++11 STL as std::unordered_map.
# Bash 4 / ksh93

typeset -A homedir    # Declare associative array
homedir=(             # Compound assignment
    [jim]=/home/jim
    [silvia]=/home/silvia
    [alex]=/home/alex
)

homedir[ormaaj]=/home/ormaaj # Ordinary assignment adds another single element

for user in "${!homedir[@]}"; do   # Enumerate all indices (user names)
    printf 'Home directory of user %s is: %q\n' "$user" "${homedir[$user]}"
done
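The difference in how the two array types treat a subscript (math context versus literal string) can be seen in a small sketch. It assumes Bash 4 or later, and the array names are only illustrative.

```shell
# Bash 4+ sketch. The array names are hypothetical.
declare -a indexed
declare -A assoc

indexed[1+1]=two     # subscript is evaluated arithmetically: element 2
assoc[1+1]=two       # subscript is taken literally: key "1+1"

indexed_keys=${!indexed[*]}
assoc_keys=${!assoc[*]}
printf 'indexed keys: %s\nassoc keys: %s\n' "$indexed_keys" "$assoc_keys"
```

This is one reason to declare associative arrays explicitly: without the -A declaration, the same assignment silently goes through arithmetic evaluation.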
Prior to Bash 4 or if you can't use ksh93, your options are limited. Either move to another interpreter (awk, perl, python, ruby, tcl, ...) or re-evaluate your problem to simplify it. There are certain tasks for which associative arrays are a powerful and completely appropriate tool. There are others for which they are overkill, or simply unsuitable.
Suppose we have several subservient hosts with slightly different configuration, and that we want to ssh to each one and run slightly different commands. One way we could set it up would be to hard-code a bunch of ssh commands in per-hostname functions in a single script and just run them in series or in parallel. (Don't reject this out of hand! Simple is good.) Another way would be to store each group of commands as an element of an associative array keyed by the hostname:
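A dry-run sketch of the array-based layout, assuming Bash 4 or later. The host names are placeholders, the command for host2 is an invented stand-in, and the loop only records the ssh invocations it would make rather than running them.

```shell
# Bash 4+ dry-run sketch. Host names and the host2 command are placeholders.
declare -A commands=(
    [host1]="mvn clean install && cd webapp && mvn jetty:run"
    [host2]="uptime"
)

plan=()
for host in "${!commands[@]}"; do
    # A real script would run: ssh "$host" "${commands[$host]}"
    plan+=("ssh $host ${commands[$host]}")
done
printf '%s\n' "${plan[@]}"
```

Note that each element is one flat string, and that the iteration order of an associative array is unspecified.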
This is the kind of approach we'd expect in a high-level language, where we can store hierarchical information in advanced data structures. The difficulty here is that we really want each element of the associative array to be a list or another array of command strings. But the shell simply doesn't permit that kind of data structure.
So, often it pays to step back and think in terms of shells rather than other programming languages. Aren't we just running a script on a remote host? Then why don't we just store the configuration sets as scripts? Then it's simple:
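The scripts-as-configuration idea can be tried locally with a sketch like the following. It is an assumption-laden stand-in: a temporary directory replaces /etc/myapp, the per-host scripts are invented, and a local bash replaces the ssh call so that nothing is contacted over the network.

```shell
# Sketch with a local stand-in for ssh. All paths and host names are hypothetical.
confdir=$(mktemp -d)
printf 'echo "setup for host1"\n' > "$confdir/host1"
printf 'echo "setup for host2"\n' > "$confdir/host2"

output=
for conf in "$confdir"/*; do
    host=${conf##*/}              # the file name doubles as the host name
    # Real use: ssh "$host" bash < "$conf"
    output+="$host: $(bash < "$conf")"$'\n'
done
printf '%s' "$output"
rm -rf "$confdir"
```

Each configuration file is an ordinary script fed to a shell on standard input, so no command string ever has to be quoted for a second round of parsing.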
Now we've removed the need for associative arrays, and also the need to maintain a bunch of extremely horrible quoting issues. It is also easy to parallelize using GNU Parallel:
parallel ssh {/} bash "<" {} ::: /etc/myapp/*
Associative array hacks in older shells
Before you think of using eval to mimic associative arrays in an older shell (probably by creating a set of variable names like homedir_alex), try to think of a simpler or completely different approach that you could use instead. If this hack still seems to be the best thing to do, consider the following disadvantages:
1. It's really hard to read, to keep track of, and to maintain.
2. Variable names must be a single line and match the regular expression ^[a-zA-Z_][a-zA-Z_0-9]*$ -- i.e., a variable name cannot contain arbitrary characters, only letters, digits, and underscores. We cannot have a variable's name contain Unix usernames, for instance -- consider a user named hong-hu. A dash '-' cannot be part of a variable name, so the entire attempt to make a variable named homedir_hong-hu is doomed from the start.
3. Quoting is hard to get right. If a content string (not a variable name) can contain whitespace characters and quotes, it's hard to quote it right to preserve it through both shell parsings. And that's just for constants, known at the time you write the program. (Bash's printf %q helps, but nothing analogous is available in POSIX shells.)
4. If the program handles unsanitized user input, it can be VERY dangerous!
Read BashGuide/Arrays or BashFAQ/005 for a more in-depth description and examples of how to use arrays in Bash.
If you need an associative array but your shell doesn't support them, please consider using AWK instead.
Indirection
Think before using indirection
Putting variable names or any other bash syntax inside parameters is frequently done incorrectly and in inappropriate situations to solve problems that have better solutions. It violates the separation between code and data, and as such puts you on a slippery slope toward bugs and security issues. Indirection can make your code less transparent and harder to follow.
Normally, in bash scripting, you won't need indirect references at all. Generally, people look at this for a solution when they don't understand or know about Bash Arrays or haven't fully considered other Bash features such as functions.
Evaluating indirect/reference variables
Bash allows you to expand a parameter indirectly -- that is, one variable may contain the name of another variable:
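For instance, with illustrative variable names:

```shell
#!/usr/bin/env bash
realvariable=contents
ref=realvariable
printf '%s\n' "${!ref}"    # prints the contents of the real variable
```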
KornShell (ksh93) has a completely different, more powerful syntax -- the nameref command (also known as typeset -n):
Zsh allows you to access a parameter indirectly with the parameter expansion flag P:
Unfortunately, for shells other than Bash, ksh93, and zsh there is no syntax for evaluating a referenced variable. You would have to use eval, which means you would have to take extreme measures to sanitize your data to avoid catastrophe.
It's difficult to imagine a practical use for this that wouldn't be just as easily performed by using an associative array. But people ask it all the time (it is genuinely a frequently asked question).
ksh93's nameref allows us to work with references to arrays, as well as regular scalar variables. For example,
zsh's ability to nest parameter expansions allows for referencing arrays too:
We are not aware of any trick that can duplicate that functionality in POSIX or Bourne shells without eval, which can be difficult to do securely. Older versions of Bash can almost do it -- some indirect array tricks work, and others do not, and we do not know whether the syntax involved will remain stable in future releases. So, consider this a use at your own risk hack.
# Bash -- trick #2. Seems to work in bash 3 and up.
# Can't be combined with special expansions until 4.3. e.g. "${!tmp##*/}"
# Does NOT work in bash 2.05b -- Expands to one word instead of three in bash 2.
tmp=${ref}[@]
printf "<%s> " "${!tmp}"; echo # Iterate whole array as one word per element.
It is not possible to retrieve array indices directly using the Bash ${!var} indirect expansion.
Assigning indirect/reference variables
Sometimes you'd like to "point" from one variable to another, for purposes of writing information to a dynamically configurable place. Typically this happens when you're trying to write a "reusable" function or library, and you want it to put its output in a variable of the caller's choice instead of the function's choice. (Various traits of Bash make safe reusability of Bash functions difficult at best, so this is something that should not happen often.)
Assigning a value "through" a reference (I'm going to use "ref" from now on) is more widely possible, but the means of doing so are usually extremely shell-specific. All shells, with the sole exception of AT&T ksh93, lack real reference variables or pointers. Indirection can only be achieved by indirectly evaluating variable names. In other words, you can never have a real, unambiguous reference to an object in memory; the best you can do is use the name of a variable to simulate the effect. Therefore, you must control the value of the ref and ensure that side effects such as globbing, user input, and conflicting local parameters can't affect parameter names. Names must either be deterministic or validated in a way that makes certain guarantees. If an end user can populate the ref variable with arbitrary strings, the result can be unexpected code injection. We'll show an example of this at the end.
In ksh93, we can use nameref again:
In zsh, using parameter expansions ::= and expansion flags P:
In Bash, if you only want to assign a single line to the variable, you can use read and Bash's here string syntax:
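A sketch of the here-string trick. IFS= and -r are included here so that surrounding whitespace and backslashes in the content survive, and the ref is quoted:

```shell
#!/usr/bin/env bash
ref=realvariable
IFS= read -r -- "$ref" <<< "contents"
# realvariable now contains the string "contents"
```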
If you need to assign multiline values, keep reading.
A similar trick works for Bash array variables too:
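One possible form, assuming whitespace-separated words. The IFS value shown is a choice made for this sketch, and the exit status of read is deliberately ignored:

```shell
#!/usr/bin/env bash
aref=realarray
# -d '' makes read consume everything up to EOF; -a stores the words in the
# array named by $aref. read reports failure here, hence "|| true".
IFS=$' \t\n' read -r -d '' -a "$aref" <<< 'words go into array elements' || true
printf '%s\n' "${realarray[1]}"    # prints "go"
```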
IFS is used to delimit words, so you may or may not need to set that. Also note that the read command will return failure because there is no terminating NUL byte for the -d '' to catch. Be prepared to ignore that failure.
Another trick is to use Bash's printf -v (available in Bash 3.1 and later):
You can use all of printf's formatting capabilities. This trick also permits any string content, including embedded and trailing newlines.
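A short sketch of both points, using made-up variable names and content:

```shell
#!/usr/bin/env bash
ref=realvariable
printf -v "$ref" '%s' "contents"

# Formatting directives and embedded/trailing newlines are stored as-is.
ref=report
printf -v "$ref" '%s-%03d\nsecond line\n' "item" 7
```

Always pass the content as an argument to a format such as %s rather than using it as the format string itself, so that percent signs and backslashes in the content are not interpreted.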
Yet another trick is Korn shell's typeset or Bash's declare. The details of typeset vary greatly between shells, but can be used in compatible ways in limited situations. Both of them cause a variable to become locally scoped to a function, if used inside a function; but if used outside all functions, they can operate on global variables.
Bash 4.2 adds declare -g which assigns variables to the global scope from any context.
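The scoping behavior can be sketched as follows, assuming Bash 4.2 or later for declare -g. The function names set_local and set_global are hypothetical:

```shell
#!/usr/bin/env bash
ref=realvariable

# Outside any function, declare assigns to the global variable.
declare -- "$ref=contents"

# Inside a function, plain declare creates a local that vanishes on return...
set_local() { declare -- "$1=$2"; }

# ...whereas declare -g assigns in the global scope.
set_global() { declare -g -- "$1=$2"; }

set_local  scratch "lost on return"
set_global kept    "still here"
```

After this runs, realvariable and kept are set in the global scope, while scratch is unset.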
There is very little advantage to typeset or declare over eval for scalar assignments, but there are many drawbacks. There is no way to stop typeset from affecting the scope of the assigned variable. Like eval, this trick does preserve the exact contents, if correctly escaped.
You must still be careful about what is on the left-hand side of the assignment. Inside square brackets, expansions are still performed; thus, with a tainted ref, declare can be just as dangerous as eval:
This problem also exists with typeset in mksh and pdksh, but apparently not ksh93. This is why the value of ref must be under your control at all times.
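One way to keep the ref under control is to validate it before use. The sketch below assumes Bash 3.1 or later. The helper name assign_ref is hypothetical, and the check does not address collisions with a caller's local variables:

```shell
#!/usr/bin/env bash
# Hypothetical helper: assign $2 to the variable named by $1, but only if $1
# is a plain variable name (no subscripts, no expansions, no dashes).
assign_ref() {
    [[ $1 =~ ^[a-zA-Z_][a-zA-Z_0-9]*$ ]] || return 1
    printf -v "$1" '%s' "$2"
}

assign_ref greeting 'hello world'    # accepted
# A tainted ref is refused before any assignment is attempted:
assign_ref 'x[$(echo injected >&2; echo 0)]' value || echo "rejected tainted ref"
```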
If you aren't using Bash or Korn shell, you can do assignments to referenced variables using HereDocument syntax:
(Alas, read without -d means we're back to only getting at most one line of content. This is the most portable trick, but it's limited to single-line content.)
Remember that when using a here document, if the sentinel word (EOF in our example) is unquoted, then parameter expansions will be performed inside the body. If the sentinel is quoted, then parameter expansions are not performed. Use whichever is more convenient for your task.
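A sketch of the here-document trick with both kinds of sentinel. The variable names are illustrative, and IFS= and -r are added so the line is stored unmodified:

```shell
# POSIX sh
other=expanded

# Unquoted sentinel: $other is expanded inside the body.
ref=realvariable
IFS= read -r "$ref" <<EOF
contents with $other text
EOF

# Quoted sentinel: the body is taken literally.
ref=literalvariable
IFS= read -r "$ref" <<'EOF'
contents with $other text
EOF
```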
eval
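The basic form looks roughly like the following sketch. myVar and value are the names used in the discussion below; the sample content is made up:

```shell
# POSIX sh
value='any $content; with `metacharacters` & "quotes"'
ref=myVar
eval "${ref}=\$value"
```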
This expands to the statement that is executed:
myVar=$value
The right-hand side is not parsed by the shell, so there is no danger of unwanted side effects. The drawback, here, is that every single shell metacharacter on the right hand side of the = must be quoted/escaped carefully. In the example shown here, there was only one. In a more complex situation, there could be dozens.
This is very often done incorrectly. Permutations like these are seen frequently all over the web even from experienced users that ought to know better:
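The difference can be seen with a harmless payload. This is an illustrative sketch; the variables wrong and right exist only to capture the results:

```shell
# POSIX sh
ref=myVar
value='$(echo INJECTED)'

# WRONG: $value is expanded before eval runs, so its content is parsed as code.
eval "${ref}=\"$value\""
wrong=$myVar    # the command substitution was executed

# Also WRONG: eval "$ref='$value'"  (breaks out as soon as value contains a single quote)

# Correct: the $ is escaped, so eval sees the literal text myVar=$value
eval "${ref}=\$value"
right=$myVar    # the original string, untouched

# Also correct (equivalent): eval "$ref"'=$value'
```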
The good news is that if you can sanitize the right hand side correctly, this trick is fully portable, has no variable scope issues, and allows all content including newlines. The bad news is that if you fail to sanitize the right hand side correctly, you have a massive security hole. Use eval if you know what you're doing and are very careful.
The following code demonstrates how to correctly pass a scalar variable name into a function by reference for the purpose of "returning" a value:
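A minimal sketch of the pattern. The function name get_greeting and the prefixed temporary _gg_result are hypothetical, and the caller is assumed to pass a valid, trusted variable name:

```shell
# POSIX sh
get_greeting() {
    # Refuse an empty name; validating the name itself is left to the caller.
    [ -n "$1" ] || return 1

    # Code that computes the result goes here. A prefixed name lowers the
    # chance of colliding with the caller's variable.
    _gg_result="hello, it's \$5"

    # Only the variable NAME is expanded before eval; the value is not.
    eval "${1}=\$_gg_result"
}

get_greeting answer
```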
The following code is one way to pass the name of a ksh-compatible 1-dimensional indexed array into a function by reference. There are many ways to balance portability versus functionality depending on your needs. This is mainly for authors of library code who are comfortable working around the subtleties of arrays and functions in each shell. The required version detection code is omitted. More complex solutions are needed if you must deal robustly with local variable namespace collisions in dynamically-scoped shells, including Bash.
# bash/ksh93/mksh/zsh

${ZSH_VERSION+false} || emulate ksh
${BASH_VERSION+shopt -s extglob lastpipe} 2>/dev/null

function f {
    ${1:+:} return 1
    ${is_ksh93+eval typeset -n "${1}=\$1"}
    typeset -a ref

    # code that generates "ref" goes here. We assume you're returning a non-sparse array.

    eval "$1"'=("${ref[@]}")'
}

function main {
    # The caller MUST always declare the variable.
    typeset -a arr
    f arr args...
}

main "$@"
note: Almost always, you should use "function" and "typeset" in portable non-POSIX code. Most of this wiki features Bash-only or POSIX-only examples and intentionally avoids these. See the mksh manual for function f vs f() differences.
note2: Don't shift away your reference arg. ksh93 itself allows that, but this method does not, even if you use Bash-4.3 or mksh namerefs instead of or in addition to eval.
note3: Similarly, eval typeset -n "$1=\$1" is a ksh93-specific hack to work around dynamic scope breakage. This WILL break bash/mksh namerefs, so the is_ksh93 in this example must be able to separate AT&T ksh93 from other kshes.
The following is inefficient, ugly, easy to do incorrectly, and usually unnecessary. This simply documents the single-quote escaping method. It may be useful in cases where there are large numbers of metacharacters that must be put into a string literal and eval'd. You're almost always better off constructing the string in advance using e.g. a heredoc with an escaped sentinel and then just expanding a single variable on the RHS of the assignment as demonstrated in the above example.