How can I use variable variables (indirect variables, pointers, references) or associative arrays?
There are two halves to this: evaluating variables, and assigning values. We'll take each half separately.
Obligatory Note
Putting variable names or any other bash syntax inside parameters is generally a bad idea. It violates the separation between code and data, and as such it puts you on a slippery slope toward bugs, security issues, etc. Even when you know you "got it right", because you "know and understand exactly what you're doing", bugs happen to all of us, and it pays to respect separation practices to minimize the damage they can do.
Aside from that, it also makes your code non-obvious and non-transparent.
Normally, in bash scripting, you won't need indirect references at all. People who come here looking for a solution generally don't know about Bash Arrays, or haven't fully considered other Bash features such as functions.
If you're trying to emulate associative arrays, skip the first two sections and look below for the section about associative arrays in particular.
Evaluating indirect/reference variables
BASH allows you to expand a parameter indirectly -- that is, one variable may contain the name of another variable:
```bash
# Bash
realvariable=contents
ref=realvariable
echo "${!ref}"   # prints the contents of the real variable
```
KornShell (ksh93) has a completely different, more powerful syntax -- the nameref command (also known as typeset -n):
```ksh
# ksh93
realvariable=contents
nameref ref=realvariable
echo "$ref"   # prints the contents of the real variable
```
ksh93's nameref allows us to work with references to arrays, as well as regular scalar variables. For example,
```ksh
# ksh93
myfunc() {
    nameref ref=$1
    echo "array $1 has ${#ref[*]} elements"
}
realarray=(...)
myfunc realarray
```
We are not aware of any trick that can duplicate that functionality in POSIX or Bourne shells, or in Bash before version 4.3 (short of using eval, which is extremely difficult to do securely). Bash 4.3 and later provide namerefs of their own, via declare -n.
Unfortunately, for shells other than Bash and ksh93, there is no syntax for evaluating a referenced variable. You would have to use eval, which means you would have to undergo extreme measures to sanitize your data to avoid catastrophe.
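As a minimal sketch of what that sanitizing can look like in a POSIX shell (the helper name get_ref is hypothetical, not a standard command), the name is checked against the characters allowed in a variable name before it ever reaches eval, so the only code eval can run is a single parameter expansion. Treat it as a starting point, not a complete defense.

```bash
# POSIX sh sketch (also runs in Bash); get_ref is a hypothetical helper.
# Usage: get_ref NAME  -- prints the value of the variable called NAME
get_ref() {
  case $1 in
    ''|[0-9]*|*[!a-zA-Z0-9_]*)
      # empty, starts with a digit, or contains a non-identifier character
      printf 'get_ref: invalid variable name: %s\n' "$1" >&2
      return 1 ;;
  esac
  # $1 is now a plain identifier, so eval only ever runs:
  #   printf '%s\n' "$NAME"
  eval "printf '%s\n' \"\$$1\""
}

realvariable=contents
ref=realvariable
get_ref "$ref"                                  # prints: contents
get_ref 'x;touch evilfile' || echo "rejected"   # name refused, nothing runs
```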
Assigning indirect/reference variables
Assigning a value "through" a reference (or pointer, or indirect variable, or whatever you want to call it -- I'm going to use "ref" from now on) is more widely possible, but the means of doing so are extremely shell-specific.
In ksh93, we can just use nameref again:
```ksh
# ksh93
nameref ref=realvariable
ref="contents"
# realvariable now contains the string "contents"
```
In Bash, we can use read and Bash's here string syntax:
```bash
# Bash
ref=realvariable
IFS= read -r "$ref" <<< "contents"
# realvariable now contains the string "contents"
```
However, this only works if there are no newlines in the content. If you need to assign multiline values, keep reading.
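To see the limitation concretely, here is a small Bash sketch (variable names are illustrative): read stops at the first newline, so only the first line of a multi-line value is stored.

```bash
# Bash sketch: read stops at the first newline, so only the first line is stored.
ref=realvariable
IFS= read -r "$ref" <<< $'first line\nsecond line'
printf '%s\n' "$realvariable"   # prints: first line
```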
A similar trick works for Bash array variables too:
```bash
# Bash
aref=realarray
read -r -a "$aref" <<< "words go into array elements"
echo "${realarray[1]}"   # prints "go"
```
(Again, newlines in the input will break this trick. IFS is used to delimit words, so you may or may not need to set that.)
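For example, to split a comma-separated line into an array named by a reference, you can set IFS for just that one read. This is a sketch assuming Bash; the names aref and fields are illustrative.

```bash
# Bash sketch: split comma-separated fields into the array whose name is in $aref.
aref=fields
# IFS=, applies only to this read command; -r keeps backslashes literal
IFS=, read -r -a "$aref" <<< "alpha,beta gamma,delta"
printf '%s\n' "${fields[@]}"
# prints:
# alpha
# beta gamma
# delta
```

Because IFS contains only a comma here, the space inside "beta gamma" does not split that field.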
Another trick is to use Bash's printf -v (available since Bash 3.1):
```bash
# Bash 3.1 or higher
ref=realvariable
printf -v "$ref" %s "contents"
```
The printf -v trick is handy if your contents aren't a constant string but something dynamically generated: you can use all of printf's formatting capabilities. It also permits any string content, including embedded newlines (but not NUL bytes; no force in the universe can put NUL bytes into shell strings usefully). This is the best trick to use if you're using Bash 3.1 or higher.
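A short Bash 3.1+ sketch of building a formatted, multi-line value and storing it through the reference in a single step (the names report, user and count are illustrative):

```bash
# Bash 3.1+ sketch: format the value with printf and assign it through the ref.
ref=report
user=alice
count=7
printf -v "$ref" 'user: %s\nfiles: %03d' "$user" "$count"
printf '%s\n' "$report"
# prints:
# user: alice
# files: 007
```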
Yet another trick is Korn shell's typeset or Bash's declare. These are roughly equivalent to each other. Both of them cause a variable to become locally scoped to a function, if used inside a function; but if used outside a function, they can operate on global variables.
```sh
# Korn shell (all versions):
typeset "$ref"="contents"

# Bash:
declare "$ref"="contents"
```
The advantage of using typeset or declare over eval is that the right hand side of the assignment is not parsed by the shell. If you used eval here, you would have to sanitize/escape the entire right hand side first. This trick also preserves the contents exactly, including newlines, so this is the best trick to use if you're in bash older than 3.1 (or ksh88) and don't need to worry about accidentally changing your variable's scope (i.e., you're not using it inside a function).
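The scope caveat is easy to trip over. This Bash sketch (set_with_declare is a hypothetical name) shows an assignment made with declare inside a function silently disappearing when the function returns:

```bash
# Bash sketch of the scope pitfall; set_with_declare is a hypothetical helper.
set_with_declare() {
  local ref=$1
  declare "$ref=$2"      # inside a function this creates a LOCAL variable
}

realvariable=old
set_with_declare realvariable new
echo "$realvariable"     # prints "old": the assignment vanished with the function
```

In Bash 4.2 and later, declare -g makes declare operate on the global variable instead, which avoids this particular problem.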
However, with bash, you must still be careful about what is on the left-hand side of the assignment. Inside square brackets, expansions are still performed; thus declare can be just as dangerous as eval:
```bash
# Bash:
ref='x[$(touch evilfile; echo 0)]'
ls -l evilfile   # No such file or directory
declare "$ref=value"
ls -l evilfile   # It exists now!
```
This problem also exists with typeset in mksh and pdksh, but apparently not ksh93.
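One defensive measure, sketched here under the assumption that refs should only ever name plain scalar variables, is to reject anything that is not a valid identifier (the same pattern given in the list of disadvantages further down) before assigning through it. The helper name set_ref is hypothetical; this is a sketch for Bash 3.1 or later, not a complete defense.

```bash
# Bash 3.1+ sketch: refuse refs that aren't plain identifiers, so no subscript
# (and therefore no command substitution inside [...]) can sneak through.
set_ref() {
  if [[ ! $1 =~ ^[a-zA-Z_][a-zA-Z_0-9]*$ ]]; then
    printf 'set_ref: refusing unsafe name: %s\n' "$1" >&2
    return 1
  fi
  printf -v "$1" %s "$2"   # assigns without re-parsing $2
}

set_ref realvariable 'any $(text) is stored literally'
echo "$realvariable"       # prints: any $(text) is stored literally
set_ref 'x[$(touch evilfile; echo 0)]' value || echo "rejected"
```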
If you aren't using Bash or Korn shell, you can do assignments to referenced variables using here document syntax:
```sh
# Bourne
ref=realvariable
read $ref <<EOF
contents
EOF
```
(Alas, because this uses read, we're back to getting at most one line of content. This is the most portable trick, but it only works for single-line values.)
Remember that, when using a here document, if the sentinel word (EOF in our example) is unquoted, then parameter expansions will be performed inside the body. If the sentinel is quoted, then parameter expansions are not performed. Use whichever is more convenient for your task.
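A small sketch contrasting the two sentinel styles (POSIX sh, also runs in Bash; the names greeting and name are illustrative):

```bash
# POSIX sh sketch: the same read-through-a-ref trick with both sentinel styles.
ref=greeting
name=world

# Unquoted sentinel: $name is expanded inside the body
read -r "$ref" <<EOF
hello, $name
EOF
echo "$greeting"     # prints: hello, world

# Quoted sentinel: the body is taken literally
read -r "$ref" <<'EOF'
hello, $name
EOF
echo "$greeting"     # prints: hello, $name
```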
Finally, some people just cannot resist throwing eval into the picture:
```sh
# Bourne
ref=myVar
eval "$ref=\$value"
```
This expands to the statement that is executed:
```sh
myVar=$value
```
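Because the string handed to eval contains the literal text $value rather than its contents, the value is expanded only once, during the assignment itself, and is never re-parsed as code, so there is no danger of unwanted side effects from whatever value contains. The drawback is that every shell metacharacter in the right-hand side, as written inside the eval string, must be escaped carefully. In the example shown here, there was only one (the $). In a more complex situation, there could be dozens.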
The good news is that if you sanitize the right-hand side correctly (and make sure the name on the left really is a plain variable name), this trick is fully portable, has no variable scope issues, and allows all content including newlines. The bad news is that if you fail to sanitize it correctly, you have a massive security hole. Use eval at your own risk.
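A minimal POSIX sh sketch of that pattern wrapped in a function (set_var is a hypothetical name): the name is validated before it is spliced into the code, and the value is only ever referenced as \$2, so eval sees NAME=$2 and never re-parses the value itself.

```bash
# POSIX sh sketch (also runs in Bash); set_var is a hypothetical helper.
# Usage: set_var NAME VALUE
set_var() {
  case $1 in
    ''|[0-9]*|*[!a-zA-Z0-9_]*)
      printf 'set_var: invalid variable name: %s\n' "$1" >&2
      return 1 ;;
  esac
  # eval runs:  NAME=$2   (the value is expanded once, not parsed as code)
  eval "$1=\$2"
}

set_var myVar 'two
lines; $(not run)'
printf '%s\n' "$myVar"   # prints both lines unchanged
```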
Associative Arrays
Sometimes it's convenient to have associative arrays, arrays indexed by a string. Awk has associative arrays. Perl calls them "hashes", while Tcl simply calls them "arrays". ksh93 supports this kind of array:
```ksh
# ksh93
typeset -A homedir             # Declare ksh93 associative array
homedir[jim]=/home/jim
homedir[silvia]=/home/silvia
homedir[alex]=/home/alex

for user in "${!homedir[@]}"   # Enumerate all indices (user names)
do
    echo "Home directory of user $user is ${homedir[$user]}"
done
```
BASH supports them from version 4 and up:
```bash
# Bash 4 and up
declare -A homedir
homedir[jim]=/home/jim
# or
homedir=( [jim]=/home/jim [silvia]=/home/silvia [alex]=/home/alex )
...
```
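A self-contained Bash 4 sketch along the same lines. Note that keys are arbitrary strings, so a user name such as hong-hu, which cannot appear in a variable name, works fine as a key (the entries are illustrative):

```bash
# Bash 4+ sketch: associative array keys may contain characters such as '-'.
declare -A homedir=(
  [jim]=/home/jim
  [hong-hu]=/home/hong-hu
)
homedir[silvia]=/home/silvia

for user in "${!homedir[@]}"; do   # key order is unspecified
  echo "Home directory of user $user is ${homedir[$user]}"
done

# ${homedir[alex]+set} expands to "set" only if the key alex exists
if [[ ${homedir[alex]+set} ]]; then
  echo "alex has an entry"
else
  echo "no entry for alex"
fi
```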
Prior to Bash 4 or if you can't use ksh93, your options are limited. Either move to another interpreter (awk, perl, python, ruby, tcl, ...) or re-evaluate your problem to simplify it. The following example removes the need for associative arrays when you're trying to configure named instances by putting the configuration in a file structure rather than a keyed array structure:
```bash
# A series of conf files describes the hosts on which we need to run commands.
for conf in ~/.myapp/*.conf; do
    unset -v host commands   # don't let values leak over from the previous file
    source "$conf"
    for command in "${commands[@]}"; do
        ssh "$host" "$command"
    done
done

# myhost.conf is plain bash syntax, and could look like this:
host=buildserver
commands=( "mvn clean install" "cd webapp && mvn jetty:run" )
```
The bonus of this approach is that you can express even more complex structures than Bash 4's associative arrays allow. Here you are effectively mapping a name (myhost) to a host and an array of commands, which would require multi-dimensional or nested arrays, something Bash 4 doesn't have.
Before you think of using eval to mimic this behavior in an older shell (probably by creating a set of variable names like homedir_alex), try to think of a simpler or completely different approach that you could use instead. If this hack still seems to be the best thing to do, consider the following disadvantages:
- It's really hard to read, keep track of, and maintain.
- The variable names must match the RegularExpression ^[a-zA-Z_][a-zA-Z_0-9]*$ -- i.e., a variable name cannot contain arbitrary characters but only letters, digits, and underscores. We cannot have a variable's name contain Unix usernames, for instance -- consider a user named hong-hu. A dash '-' cannot be part of a variable name, so the entire attempt to make a variable named homedir_hong-hu is doomed from the start.
- Quoting is hard to get right. If content strings (not variable names) can contain whitespace characters and quotes, it's hard to quote them correctly so that they survive both shell parsings. And that's just for constants, known at the time you write the program. (Bash's printf %q helps, but nothing analogous is available in POSIX shells.)
- If the program handles unsanitized user input, it can be VERY dangerous!
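To make the quoting point concrete, here is a Bash-only sketch of the homedir_ emulation warned against above (set_homedir is a hypothetical helper). The key is restricted to identifier characters, and printf %q escapes the value so it survives eval's second parse. Even so, a key like hong-hu is simply refused.

```bash
# Bash sketch of the eval-based emulation; set_homedir is hypothetical.
# Stores VALUE in a variable named homedir_KEY.
set_homedir() {
  case $1 in
    ''|*[!a-zA-Z0-9_]*)          # only identifier characters can form a name
      printf 'unusable key: %s\n' "$1" >&2
      return 1 ;;
  esac
  local quoted
  printf -v quoted %q "$2"      # shell-quote the value for eval's second parse
  eval "homedir_$1=$quoted"
}

set_homedir alex '/home/alex with spaces/$HOME'
echo "$homedir_alex"            # prints: /home/alex with spaces/$HOME
set_homedir hong-hu /home/hong-hu || echo "hong-hu cannot be stored"
```

Writing the eval string as "homedir_$1=\$2", as in the Bourne eval example earlier, would avoid the %q step entirely; %q matters when the quoted value has to be embedded in generated code.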
Read BashGuide/Arrays or BashFAQ/005 for a more in-depth description and examples of how to use arrays in Bash.