BASH Frequently Asked Questions
Note: This version is the same as the BashFAQ page but with the full text of the FAQs. It is much slower to load, but it is easier to use if you want to download the whole FAQ; notably, the printable version is handy if you need a local copy.
These are answers to frequently asked questions on channel #bash on the freenode IRC network. These answers are contributed by the regular members of the channel (originally heiner, and then others including greycat and r00t), and by users like you. If you find something inaccurate or simply misspelled, please feel free to correct it!
All the information here is presented without any warranty or guarantee of accuracy. Use it at your own risk. When in doubt, please consult the man pages or the GNU info pages as the authoritative references.
BASH is a BourneShell compatible shell, which adds many new features to its ancestor. Most of them are available in the KornShell, too. The answers given in this FAQ may be slanted toward Bash, or they may be slanted toward the lowest common denominator Bourne shell, depending on who wrote the answer. In most cases, an effort is made to provide both a portable (Bourne) and an efficient (Bash, where appropriate) answer. If a question is not strictly shell specific, but rather related to Unix, it may be in the UnixFaq.
This FAQ assumes a certain level of familiarity with basic shell script syntax. If you're completely new to Bash or to the Bourne family of shells, you may wish to start with the (incomplete) BashGuide.
If you can't find the answer you're looking for here, try BashPitfalls. If you want to help, you can add new questions with answers here, or try to answer one of the BashOpenQuestions.
Chet Ramey's official Bash FAQ contains many technical questions not covered here.
Contents
- How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
- How can I store the return value and/or output of a command in a variable?
- How can I sort or compare files based on some metadata attribute (newest / oldest modification time, size, etc)?
- How can I check whether a directory is empty or not? How do I check for any *.mpg files, or count how many there are?
- How can I use array variables?
- See Also
- How can I use variable variables (indirect variables, pointers, references) or associative arrays?
- Is there a function to return the length of a string?
- How can I recursively search all files for a string?
- What is buffering? Or, why does my command line produce no output: tail -f logfile | grep 'foo bar' | awk ...
- How can I recreate a directory hierarchy structure, without the files?
- How can I print the n'th line of a file?
- How do I invoke a shell command from a non-shell application?
- How can I concatenate two variables? How do I append a string to a variable?
- How can I redirect the output of multiple commands at once?
- How can I run a command on all files with the extension .gz?
- How can I use a logical AND/OR/NOT in a shell pattern (glob)?
- How can I group expressions in an if statement, e.g. if (A AND B) OR C?
- How can I use numbers with leading zeros in a loop, e.g. 01, 02?
- How can I split a file into line ranges, e.g. lines 1-10, 11-20, 21-30?
- How can I find and safely handle file names containing newlines, spaces or both?
- How can I replace a string with another string in a variable, a stream, a file, or in all the files in a directory?
- How can I calculate with floating point numbers instead of just integers?
- I want to launch an interactive shell that has special aliases and functions, not the ones in the user's ~/.bashrc.
- I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?
- How can I access positional parameters after $9?
- How can I randomize (shuffle) the order of lines in a file? Or select a random line from a file, or select a random file from a directory?
- How can two unrelated processes communicate?
- How do I determine the location of my script? I want to read some config files from the same place.
- How can I display the target of a symbolic link?
- How can I rename all my *.foo files to *.bar, or convert spaces to underscores, or convert upper-case file names to lower case?
- What is the difference between test, [ and [[ ?
- How can I redirect the output of 'time' to a variable or file?
- How can I find a process ID for a process given its name?
- Can I do a spinner in Bash?
- How can I handle command-line options and arguments in my script easily?
- How can I get all lines that are: in both of two files (set intersection) or in only one of two files (set subtraction).
- How can I print text in various colors?
- How do Unix file permissions work?
- What are all the dot-files that bash reads?
- How do I use dialog to get input from the user?
- How do I determine whether a variable contains a substring?
- How can I find out if a process is still running?
- Why does my crontab job fail? 0 0 * * * some command > /var/log/mylog.`date +%Y%m%d`
- How do I create a progress bar? How do I see a progress indicator when copying/moving files?
- How can I ensure that only one instance of a script is running at a time (mutual exclusion, locking)?
- I want to check to see whether a word is in a list (or an element is a member of a set).
- How can I redirect stderr to a pipe?
- Eval command and security issues
- How can I view periodic updates/appends to a file? (ex: growing log file)
- I'm trying to put a command in a variable, but the complex cases always fail!
- Things that do not work
- I'm trying to save a command so I can run it later without having to repeat it each time
- I only want to pass options if the runtime data needs them
- I want to generalize a task, in case the low-level tool changes later
- I'm constructing a command based on information that is only known at run time
- I want a log of my script's actions
- I want history-search just like in tcsh. How can I bind it to the up and down keys?
- How do I convert a file from DOS format to UNIX format (remove CRs from CR-LF line terminators)?
- I have a fancy prompt with colors, and now bash doesn't seem to know how wide my terminal is. Lines wrap around incorrectly.
- How can I tell whether a variable contains a valid number?
- Tell me all about 2>&1 -- what's the difference between 2>&1 >foo and >foo 2>&1, and when do I use which?
- How can I untar (or unzip) multiple tarballs at once?
- How can I group entries (in a file by common prefixes)?
- Can bash handle binary data?
- I saw this command somewhere: :(){ :|:& } (fork bomb). How does it work?
- I'm trying to write a script that will change directory (or set a variable), but after the script finishes, I'm back where I started (or my variable isn't set)!
- Is there a list of which features were added to specific releases (versions) of Bash?
- How do I create a temporary file in a secure manner?
- My ssh client hangs when I try to logout after running a remote background job!
- Why is it so hard to get an answer to the question that I asked in #bash?
- Is there a "PAUSE" command in bash like there is in MSDOS batch scripts? To prompt the user to press any key to continue?
- I want to check if [[ $var == foo || $var == bar || $var == more ]] without repeating $var n times.
- How can I trim leading/trailing white space from one of my variables?
- How do I run a command, and have it abort (timeout) after N seconds?
- I want to automate an ssh (or scp, or sftp) connection, but I don't know how to send the password....
- How do I convert Unix (epoch) times to human-readable values?
- How do I convert an ASCII character to its decimal (or hexadecimal) value and back? How do I do URL encoding or URL decoding?
- How can I ensure my environment is configured for cron, batch, and at jobs?
- How can I use parameter expansion? How can I get substrings? How can I get a file without its extension, or get just a file's extension? What are some good ways to do basename and dirname?
- How do I get the effects of those nifty Bash Parameter Expansions in older shells?
- How do I use 'find'? I can't understand the man page at all!
- How do I get the sum of all the numbers in a column?
- How do I log history or "secure" bash against history removal?
- I want to set a user's password using the Unix passwd command, but how do I script that? It doesn't read standard input!
- How can I grep for lines containing foo AND bar, foo OR bar? Or for files containing foo AND bar, possibly on separate lines? Or files containing foo but NOT bar?
- How can I make an alias that takes an argument?
- How can I determine whether a command exists anywhere in my PATH?
- Why is $(...) preferred over `...` (backticks)?
- How do I determine whether a variable is already defined? Or a function?
- How do I return a string (or large number, or negative number) from a function? "return" only lets me give a number from 0 to 255.
- How to write several times to a fifo without having to reopen it?
- How to ignore aliases, functions, or builtins when running a command?
- How can I get a file's permissions (or other metadata) without parsing ls -l output?
- How can I avoid losing any history lines?
- I'm reading a file line by line and running ssh or ffmpeg, only the first line gets processed!
- How do I prepend a text to a file (the opposite of >>)?
- I'm trying to get the number of columns or lines of my terminal but the variables COLUMNS / LINES are always empty.
- How do I write a CGI script that accepts parameters?
- How can I set the contents of my terminal's title bar?
- I want to get an alert when my disk is full (parsing df output).
- I'm getting "Argument list too long". How can I process a large list in chunks?
- ssh eats my word boundaries! I can't do ssh remotehost make CFLAGS="-g -O"!
- How do I determine whether a symlink is dangling (broken)?
- How to add localization support to your bash scripts
- How can I get the newest (or oldest) file from a directory?
- How do I do string manipulations in bash?
- Common utility functions (warn, die)
- How to get the difference between two dates
- How do I check whether my file was modified in a certain month or date range?
- Why doesn't foo=bar echo "$foo" print bar?
- Why doesn't set -e (or set -o errexit, or trap ERR) do what I expected?
- Logging! I want to send all of my script's output to a log file. But I want to do it from inside the script. And I want to see it on the terminal too!
- How do I add a timestamp to every line of a stream?
- How do I wait for several spawned processes?
- How can I tell whether my script was sourced (dotted in) or executed?
- How do I copy a file to a remote system, and specify a remote name which may contain spaces?
- What is the Shellshock vulnerability in Bash?
- What are the advantages and disadvantages of using set -u (or set -o nounset)?
- How do I extract data from an HTML or XML file?
- How do I operate on IP addresses and netmasks?
- How do I make a menu?
- I have two files. The first one contains bad IP addresses (plus other fields). I want to remove all of these bad addresses from a second file.
- I have a pipeline where a long-running command feeds into a filter. If the filter finds "foo", I want the long-running command to die.
- How do I print the contents of an array in reverse order, or reverse an array?
- What's the difference between "cmd < file" and "cat file | cmd"? What is a UUOC?
- How can I find out where this strange variable in my interactive shell came from?
- What does value too great for base mean? (Octal values in arithmetic.)
1. How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?
Don't try to use "for". Use a while loop and the read command. Here is the basic template; there are many variations to discuss:
line is a variable name, chosen by you. You can use any valid shell variable name(s) there; see field splitting below.
< "$file" redirects the loop's input from a file whose name is stored in a variable; see source selection below.
If you want to read lines from a file into an array, see FAQ 5.
Contents
1.1. Field splitting, whitespace trimming, and other input processing
The -r option to read prevents backslash interpretation (usually used as a backslash newline pair, to continue over multiple lines or to escape the delimiters). Without this option, any unescaped backslashes in the input will be discarded. You should almost always use the -r option with read.
The most common exception to this rule is when -e is used, which uses Readline to obtain the line from an interactive shell. In that case, tab completion will add backslashes to escape spaces and such, and you do not want them to be literally included in the variable. This would never be used when reading anything line-by-line, though, and -r should always be used when doing so.
By default, read modifies each line read, by removing all leading and trailing whitespace characters (spaces and tabs, if present in IFS). If that is not desired, the IFS variable may be cleared, as in the example above. If you want the trimming, leave IFS alone:
If you want to operate on individual fields within each line, you may supply additional variables to read:
If the field delimiters are not whitespace, you can set IFS (internal field separator):
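For instance, a sketch that reads the colon-delimited fields of /etc/passwd (the field names are just illustrative):

while IFS=: read -r user pass uid gid gecos home shell; do
    printf '%s logs in with %s\n' "$user" "$shell"
done < /etc/passwd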
For tab-delimited files, use IFS=$'\t' though beware that multiple tab characters in the input will be considered as one delimiter (and the Ksh93/Zsh IFS=$'\t\t' workaround won't work in Bash).
You do not necessarily need to know how many fields each line of input contains. If you supply more variables than there are fields, the extra variables will be empty. If you supply fewer, the last variable gets "all the rest" of the fields after the preceding ones are satisfied. For example,
Some people use the throwaway variable _ as a "junk variable" to ignore fields. It (or indeed any variable) can also be used more than once in a single read command, if we don't care what goes into it:
Note that this usage of _ is only guaranteed to work in Bash. Many other shells use _ for other purposes that will at best cause this to not have the desired effect, and can break the script entirely. It is better to choose a unique variable that isn't used elsewhere in the script, even though _ is a common Bash convention.
If avoiding comments starting with # is desired, you can simply skip them inside the loop:
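Something along these lines (a sketch):

# Bash
while read -r line; do
    [[ $line = \#* ]] && continue    # skip lines whose first character is #
    printf '%s\n' "$line"
done < "$file"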
1.2. Input source selection
The redirection < "$file" tells the while loop to read from the file whose name is in the variable file. If you would prefer to use a literal pathname instead of a variable, you may do that as well. If your input source is the script's standard input, then you don't need any redirection at all.
If your input source is the contents of a variable/parameter, bash can iterate over its lines using a here string:
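A sketch of the here string form (var holds the text to iterate over):

# Bash
while IFS= read -r line; do
    printf '%s\n' "$line"
done <<< "$var"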
The same can be done in any Bourne-type shell by using a "here document" (although read -r is POSIX, not Bourne):
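A sketch of the here document form:

# POSIX
while IFS= read -r line; do
    printf '%s\n' "$line"
done <<EOF
$var
EOF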
One may also read from a command instead of a regular file:
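For example (some_command is a stand-in for whatever produces the lines):

some_command | while IFS= read -r line; do
    printf '%s\n' "$line"
done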
This method is especially useful for processing the output of find with a block of commands:
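A sketch of the kind of loop meant here (it renames only the basename, so parent directories are left alone; -print0 needs GNU or BSD find):

find . -depth -name '* *' -print0 | while IFS= read -r -d '' file; do
    dir=${file%/*} base=${file##*/}
    mv -- "$file" "$dir/${base// /_}"
done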
This reads one filename at a time from the find command and renames the file, replacing spaces with underscores.
Note the usage of -print0 in the find command, which uses NUL bytes as filename delimiters; and -d '' in the read command to instruct it to read all text into the file variable until it finds a NUL byte. By default, find and read delimit their input with newlines; however, since filenames can potentially contain newlines themselves, this default behaviour will split up those filenames at the newlines and cause the loop body to fail. Additionally it is necessary to set IFS to an empty string, because otherwise read would still strip leading and trailing whitespace. See FAQ #20 for more details.
Using a pipe to send find's output into a while loop places the loop in a SubShell, which means any state changes you make (changing variables, cd, opening and closing files, etc.) will be lost when the loop finishes. To avoid that, you may use a ProcessSubstitution:
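A sketch using process substitution (note that the counter survives the loop because no subshell is involved):

# Bash
count=0
while IFS= read -r -d '' file; do
    count=$((count + 1))
done < <(find . -type f -print0)
echo "Files found: $count"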
See FAQ 24 for more discussion.
1.3. My text files are broken! They lack their final newlines!
If there are some characters after the last line in the file (or to put it differently, if the last line is not terminated by a newline character), then read will read it but return false, leaving the broken partial line in the read variable(s). You can process this after the loop:
Or:
# This does not work:
printf 'line 1\ntruncated line 2' | while read -r line; do
    echo $line
done

# This does not work either:
printf 'line 1\ntruncated line 2' | while read -r line; do
    echo "$line"
done
[[ $line ]] && echo -n "$line"

# This works:
printf 'line 1\ntruncated line 2' | {
    while read -r line; do
        echo "$line"
    done
    [[ $line ]] && echo "$line"
}
The first example, beyond missing the after-loop test, is also missing quotes. See Quotes or Arguments for an explanation why. The Arguments page is an especially important read.
For a discussion of why the second example above does not work as expected, see FAQ #24.
Alternatively, you can simply add a logical OR to the while test:
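That is, something like this sketch:

# Bash
while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\n' "$line"
done < "$file"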
1.4. How to keep other commands from "eating" the input
Some commands greedily eat up all available data on standard input. The examples above do not take precautions against such programs. For example,
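# a sketch of such a loop; file is a variable holding the input pathname
while IFS= read -r line; do
    cat > ignoredfile
    printf '%s\n' "$line"
done < "$file"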
will only print the contents of the first line, with the remaining contents going to "ignoredfile", as cat slurps up all available input.
One workaround is to use a numeric FileDescriptor rather than standard input:
# Bash
while IFS= read -r -u 9 line; do
    cat > ignoredfile
    printf '%s\n' "$line"
done 9< "$file"

# Note that read -u is not portable to every shell.
# Use a redirect to ensure it works in any POSIX compliant shell:
while IFS= read -r line <&9; do
    cat > ignoredfile
    printf '%s\n' "$line"
done 9< "$file"
Or:
This example will wait for the user to type something into the file ignoredfile at each iteration instead of eating up the loop input.
You might need this, for example, with mencoder which will accept user input if there is any, but will continue silently if there isn't. Other commands that act this way include ssh and ffmpeg. Additional workarounds for this are discussed in FAQ #89.
2. How can I store the return value and/or output of a command in a variable?
Well, that depends on whether you want to store the command's output (either stdout, or stdout + stderr) or its exit status (0 to 255, with 0 typically meaning "success").
If you want to capture the output, you use command substitution:
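In its simplest form (command is a placeholder for whatever you want to run):

output=$(command)          # captures stdout only
output=$(command 2>&1)     # captures stdout and stderr together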
If you want the exit status, you use the special parameter $? after running the command:
If you want both:
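For instance:

output=$(command)
status=$?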
The assignment to output has no effect on command's exit status, which is still in $?.
If you don't actually want to store the exit status, but simply want to take an action upon success or failure, just use if:
Or if you want to capture stdout as well as taking action on success/failure, without explicitly storing or checking $?:
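A sketch:

if output=$(command); then
    echo "command succeeded; its output was: $output"
else
    echo "command failed"
fi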
If you don't understand the difference between standard output and standard error. here is a brief demonstration. A sane command writes the output that you request to standard output (stdout) and only writes errors to standard error (stderr). Like so:
$ dig +short A mywiki.wooledge.org
199.231.184.176
$ ip=$(dig +short A mywiki.wooledge.org)
$ echo "{$ip}"
{199.231.184.176}
$ ls no-such-file
ls: cannot access 'no-such-file': No such file or directory
$ output=$(ls no-such-file)
ls: cannot access 'no-such-file': No such file or directory
$ echo "{$output}"
{}
In the example above, dig wrote output to stdout, which was captured in the ip variable. ls encountered an error, so it did not write anything to stdout. It wrote to stderr, which was not captured (because we didn't use 2>&1). The error message appeared directly on the terminal instead.
Some commands are not well-written, however, and may write information to the wrong place. You must keep an eye out for such commands, and work around them when necessary. For example:
$ vers=$(python --version)
Python 2.7.13
$ echo "{$vers}"
{}
Even though we specifically asked for the version number, python wrote it to stderr. Thus, it appeared on the terminal, and was not captured in the vers variable. You'd need to use 2>&1 here.
What if you want the exit status of one command from a pipeline? If you want the last command's status, no problem -- it's in $? just like before. If you want some other command's status, use the PIPESTATUS array (BASH only. In the case of Zsh, it's lower-cased pipestatus). Say you want the exit status of grep in the following:
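A sketch (the pipeline itself is made up):

# Bash
grep foo somelogfile | head -n 5
status=${PIPESTATUS[0]}    # grep's exit status; head's would be ${PIPESTATUS[1]}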
Bash 3.0 added a pipefail option as well, which can be used if you simply want to take action upon failure of the grep:
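A sketch (remember that grep exits non-zero when it finds no matches):

# Bash 3.0+
set -o pipefail
if ! grep foo somelogfile | head -n 5; then
    echo "grep (or something else in the pipeline) failed" >&2
fi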
Now, some trickier stuff. Let's say you want only the stderr, but not stdout. Well, then first you have to decide where you do want stdout to go:
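Some sketches (command is a placeholder; the last form is the one analysed in the table below):

output=$(command 2>&1 >/dev/null)    # stdout discarded, stderr captured
output=$(command 2>&1 >/dev/tty)     # stdout to the terminal, stderr captured
output=$(command 3>&2 2>&1 1>&3-)    # stdout to the script's stderr, stderr captured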
Since the last example may seem a bit confusing, here is the explanation. First, keep in mind that 1>&3- is equivalent to 1>&3 3>&-. So it will be easier to analyse the following sequence: $(... 3>&2 2>&1 1>&3 3>&-)
| Redirection | fd 0 (stdin) | fd 1 (stdout) | fd 2 (stderr) | fd 3 | Description |
|---|---|---|---|---|---|
| initial | /dev/tty | /dev/tty | /dev/tty | | Let's assume this is run in a terminal, so stdin, stdout and stderr are all initially connected to the terminal (tty). |
| $(...) | /dev/tty | pipe | /dev/tty | | First, the command substitution is set up. Command's stdout (FileDescriptor 1) gets captured (by using a pipe internally). Command's stderr (FD 2) still points to its regular place (the script's stderr). |
| 3>&2 | /dev/tty | pipe | /dev/tty | /dev/tty | Next, FD 3 should point to what FD 2 points to at this very moment, meaning FD 3 will point to the script's stderr ("save stderr in FD 3"). |
| 2>&1 | /dev/tty | pipe | pipe | /dev/tty | Next, FD 2 should point to what FD 1 currently points to, meaning FD 2 will point to stdout. Right now, both FD 2 and FD 1 would be captured. |
| 1>&3 | /dev/tty | /dev/tty | pipe | /dev/tty | Next, FD 1 should point to what FD 3 currently points to, meaning FD 1 will point to the script's stderr. FD 1 is no longer captured. We have "swapped" FD 1 and FD 2. |
| 3>&- | /dev/tty | /dev/tty | pipe | | Finally, we close FD 3 as it is no longer necessary. |
A little note: operation n>&m- is sometimes called moving FD m to FD n.
This way what the script writes to FD 2 (normally stderr) will be written to stdout because of the second redirection. What the script writes to FD 1 (normally stdout) will be written to stderr because of the first and third redirections. Stdout and stderr got replaced. Done.
It's possible, although considerably harder, to let stdout "fall through" to wherever it would've gone if there hadn't been any redirection. This involves "saving" the current value of stdout, so that it can be used inside the command substitution:
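A sketch of the idea:

exec 3>&1                        # save the place stdout currently points to
output=$(command 2>&1 1>&3)      # command's stderr is captured; its stdout falls through
exec 3>&-                        # close the saved FD again

# The same thing, scoped to a single compound command:
{ output=$(command 2>&1 1>&3-); } 3>&1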
In the last example above, note that 1>&3- duplicates FD 3 and stores a copy in FD 1, and then closes FD 3. It could also be written 1>&3 3>&-.
What you cannot do is capture stdout in one variable, and stderr in another, using only FD redirections. You must use a temporary file (or a named pipe) to achieve that one.
Well, you can use a horrible hack like:
Obviously, this is not robust, because either the standard output or the standard error of the command could contain whatever separator string you employ.
And if you also want the exit code of your cmd (here modified to handle the case where the cmd writes nothing to stdout):
Note: the original question read, "How can I store the return value of a command in a variable?" This was, verbatim, an actual question asked in #bash, ambiguity and all.
3. How can I sort or compare files based on some metadata attribute (newest / oldest modification time, size, etc)?
The tempting solution is to use ls to output sorted filenames and operate on the results using e.g. awk. As usual, the ls approach cannot be made robust and should never be used in scripts due in part to the possibility of arbitrary characters (including newlines) present in filenames. Therefore, we need some other way to compare file metadata.
The most common requirements are to get the most or least recently modified, or largest or smallest files in a directory. Bash and all ksh variants can compare modification times (mtime) using the -nt and -ot operators of the conditional expression compound command:
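For example, a sketch that finds the newest file in the current directory:

# Bash/ksh
unset -v newest
for f in ./*; do
    [[ -z $newest || $f -nt $newest ]] && newest=$f
done
printf 'Newest file: %s\n' "$newest"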
Or to find the oldest:
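Simply swap the operator:

# Bash/ksh
unset -v oldest
for f in ./*; do
    [[ -z $oldest || $f -ot $oldest ]] && oldest=$f
done
printf 'Oldest file: %s\n' "$oldest"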
Keep in mind that mtime on directories is that of the most recently added, removed, or renamed file in that directory. Also note that -nt and -ot are not specified by POSIX test, but many shells such as dash include them anyway. No Bourne-like shell has analogous operators for comparing by atime or ctime, so one would need external utilities for that; however, it's nearly impossible to either produce output which can be safely parsed, or handle said output in a shell without using nonstandard features on both ends.
If the sorting criteria are different from "oldest or newest file by mtime", then GNU find and GNU sort may be used together to produce a sorted list of filenames + timestamps, delimited by NUL characters. This will operate recursively by default. GNU find's -maxdepth operator can limit the search depth to 1 directory if needed. Here are a few possibilities, which can be modified as necessary to use atime or ctime, or to sort in reverse order:
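For instance, a sketch using GNU find's -printf and GNU sort's -z (timestamps first, oldest to newest):

find . -type f -printf '%T@ %p\0' | sort -zn

# e.g. pick out just the newest file (reverse the sort, read the first record):
IFS= read -r -d '' newest < <(find . -type f -printf '%T@ %p\0' | sort -znr)
newest=${newest#* }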
One disadvantage to these approaches is that the entire list is sorted, whereas simply iterating through the list to find the minimum or maximum timestamp (assuming we want just one file) would be faster. However, depending on the size of the job, the algorithmic disadvantage of sorting may be negligible in comparison to the overhead of using a shell.
Similar usage patterns work well on many kinds of filesystem meta-data. This example gets the largest file in each subdirectory recursively. This is a common pattern for performing a calculation on a collection of files in each directory.
Readers who are asking this question in order to rotate their log files may wish to look into logrotate(1) instead, if their operating system provides it.
4. How can I check whether a directory is empty or not? How do I check for any *.mpg files, or count how many there are?
In Bash, you can count files safely and easily with the nullglob and dotglob options (which change the behaviour of globbing), and an array:
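A sketch:

# Bash
shopt -s nullglob dotglob
files=(./*)
if (( ${#files[@]} == 0 )); then
    echo "The directory is empty."
else
    echo "It contains ${#files[@]} files (and/or directories)."
fi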
See ArithmeticExpression for explanations of arithmetic commands.
Of course, you can use any glob you like instead of *. E.g. *.mpg or /my/music/*.mpg works fine.
Bear in mind that you need read permission on the directory, or it will always appear empty.
Some people dislike nullglob because having unmatched globs vanish altogether confuses programs like ls. Mistyping ls *.zip as ls *.zpi may cause every file to be displayed (for such cases consider setting failglob). Setting nullglob in a SubShell avoids accidentally changing its setting in the rest of the shell, at the price of an extra fork(). If you'd like to avoid having to set and unset shell options, you can pour it all into a SubShell:
The other disadvantage of this approach (besides the extra fork()) is that the array is lost when the subshell exits. If you planned to use those filenames later, then they have to be retrieved all over again.
Both of these examples expand a glob and store the resulting filenames into an array, and then check whether the number of elements in the array is 0. If you actually want to see how many files there are, just print the array's size instead of checking whether it's 0:
You can also avoid the nullglob if you're OK with putting a non-existing filename in the array should no files match (instead of an empty array):
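A sketch (note that without dotglob, hidden files are not counted here):

# Bash
files=(./*)
if [[ -e ${files[0]} || -L ${files[0]} ]]; then
    echo "The directory is not empty."
else
    echo "The directory is empty."
fi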
Without nullglob, if there are no files in the directory, the glob will be added as the only element in the array. Since * is a valid filename, we can't simply check whether the array contains a literal *. So instead, we check whether the thing in the array exists as a file. The -L test is required because -e fails if the first file is a dangling symlink.
If you don't care how many matching files there are and don't want to store the results in an array, you can use bash's compgen command. Unfortunately, due to a bug, you need to use a hack to make it recognize dotglob:
Or you can use an extended glob:
You may also use failglob:
But, if you use failglob, note that the subshell is required; the following code does not work because failglob will raise a shell error that will cause bash to stop running the current command (including the if command, any outer compound command, and the entire function that ran this code if it is part of a function), so this will only work in the true case, the else branch will never run:
If you really want to avoid using the subshell and want to set failglob globally, you can either "catch" the shell error using command eval, or you can write a function that expands the glob indirectly:
shopt -s dotglob failglob
if command eval ': ./*' 2> /dev/null; then
    echo "The current directory is not empty."
else
    echo "The current directory is empty."
fi

# or
shopt -s dotglob failglob
any_match () { local IFS=; { : $@ ;} 2> /dev/null ;}
if any_match './*'; then
    echo "The current directory is not empty."
else
    echo "The current directory is empty."
fi
If your script needs to run with various non-Bash shell implementations, you can try using an external program like python, perl, or find; or you can try one of these. Note the "magic 3 globs"1 as POSIX does not have the dotglob option.
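One possible POSIX sketch (it clobbers the positional parameters, so save them first if you still need them):

set -- ./* ./.[!.]* ./..?*          # the "magic 3 globs"
for f do                            # weed out the patterns that matched nothing
    shift
    [ -e "$f" ] || [ -L "$f" ] && set -- "$@" "$f"
done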
At this stage, the positional parameters have been loaded with the contents of the directory, and can be used for processing.
If you just want to count files:
In the Bourne shell, it's even worse, because there is no test -e or test -L:
Of course, that fails if * exists as something other than a plain file (such as a directory or FIFO). The absence of a -e test really hurts.
Here is another solution using find:
If you want it not to recurse, then you need to tell find not to recurse into directories. This gets really tricky and ugly. GNU find has a -maxdepth option to do it. With standard POSIX find, you're stuck with -prune. This is left as an exercise for the reader.
Never try to parse ls output. Even ls -A solutions can break (e.g. on HP-UX, if you are root, ls -A does the exact opposite of what it does if you're not root -- and no, I can't make up something that incredibly stupid).
In fact, one may wish to avoid the direct question altogether. Usually people want to know whether a directory is empty because they want to do something involving the files therein, etc. Look to the larger question. For example, one of these find-based examples may be an appropriate solution:
Most commonly, all that's really needed is something like this:
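That is, a sketch like this (mympgviewer stands in for whatever you actually want to run):

for file in ./*.mpg; do
    test -f "$file" || continue     # skip the unexpanded pattern if nothing matched
    mympgviewer "$file"
done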
In other words, the person asking the question may have thought an explicit empty-directory test was needed to avoid an error message like mympgviewer: ./*.mpg: No such file or directory when in fact no such test is required.
Support for a nullglob-like feature is inconsistent. In ksh93 it can be done on a per-pattern basis by prefixing with ~(N)2:
5. How can I use array variables?
This answer assumes you have a basic understanding of what arrays are. If you're new to this kind of programming, you may wish to start with the guide's explanation. This page is more thorough. See links at the bottom for more resources.
Contents
5.1. Intro
One-dimensional integer-indexed arrays are implemented by Bash, Zsh, and most KornShell varieties including AT&T ksh88 or later, mksh, and pdksh. Arrays are not specified by POSIX and not available in legacy or minimalist shells such as BourneShell and Dash. The POSIX-compatible shells that do feature arrays mostly agree on their basic principles, but there are some significant differences in the details. Advanced users of multiple shells should be sure to research the specifics. Ksh93, Zsh, and Bash 4.0 additionally have Associative Arrays (see also FAQ 6). This article focuses on indexed arrays as they are the most common type.
Basic syntax summary (for bash, math indexed arrays):
| Syntax | Description |
|---|---|
| a=(word1 word2 "$word3" ...) | Initialize an array from a word list, indexed starting with 0 unless otherwise specified. |
| a=(*.png *.jpg) | Initialize an array with filenames. |
| a[i]=word | Set one element to word, evaluating the value of i in a math context to determine the index. |
| a[i+1]=word | Set one element, demonstrating that the index is also a math context. |
| a[i]+=suffix | Append suffix to the previous value of a[i] (bash 3.1). |
| a+=(word ...) # append | Modify an existing array without unsetting it, indexed starting at one greater than the highest indexed element unless otherwise specified (bash 3.1). |
| a+=([3]=word3 word4 [i]+=word_i_suffix) | |
| unset 'a[i]' | Unset one element. Note the mandatory quotes (a[i] is a valid glob). |
| "${a[i]}" | Reference one element. |
| "$(( a[i] + 5 ))" | Reference one element, in a math context. |
| "${a[@]}" | Expand all elements as a list of words. |
| "${!a[@]}" | Expand all indices as a list of words (bash 3.0). |
| "${a[*]}" | Expand all elements as a single word, with the first char of IFS as separator. |
| "${#a[@]}" | Number of elements (size, length). |
| "${a[@]:start:len}" | Expand a range of elements as a list of words, cf. string range. |
| "${a[@]#trimstart}" "${a[@]%trimend}" | Expand all elements as a list of words, with modifications applied to each element separately. |
| declare -p a | Show/dump the array, in a bash-reusable form. |
| mapfile -t a < stream | Initialize an array from a stream (bash 4.0). |
| readarray -t a < stream | Same as mapfile. |
| "$a" | Same as "${a[0]}". Does NOT expand to the entire array. This usage is confusing at best, and is usually a bug. |
Here is a typical usage pattern featuring an array named host:
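A sketch (the host names are made up):

# Bash
host=(alpha.example.com beta.example.com gamma.example.com)
for idx in "${!host[@]}"; do
    printf 'Host %d is %s\n' "$idx" "${host[idx]}"
done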
"${!host[@]}" expands to the indices of of the host array, each as a separate word.
Indexed arrays are sparse, and elements may be inserted and deleted out of sequence.
# Bash/ksh

# Simple assignment syntax.
arr[0]=0
arr[2]=2
arr[1]=1
arr[42]='what was the question?'

# Unset the second element of "arr"
unset -v 'arr[2]'

# Concatenate the values, to a single argument separated by spaces, and echo the result.
echo "${arr[*]}"
# outputs: "0 1 what was the question?"
It is good practice to write your code in such a way that it can handle sparse arrays, even if you think you can guarantee that there will never be any "holes". Only treat arrays as "lists" if you're certain, and the savings in complexity is significant enough for it to be justified.
5.2. Loading values into an array
Assigning one element at a time is simple, and portable:
It's possible to assign multiple values to an array at once, but the syntax differs across shells. Bash supports only the arrName=(args...) syntax. ksh88 supports only the set -A arrName -- args... syntax. ksh93, mksh, and zsh support both. There are subtle differences in both methods between all of these shells if you look closely.
When initializing in this way, the first index will be 0 unless a different index is specified.
With compound assignment, the space between the parentheses is evaluated in the same way as the arguments to a command, including pathname expansion and WordSplitting. Any type of expansion or substitution may be used. All the usual quoting rules apply within.
With ksh88-style assignment using set, the arguments are just ordinary arguments to a command.
5.2.1. Loading lines from a file or stream
In bash 4, the mapfile command (also known as readarray) accomplishes this:
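A sketch (myfile and some_command are placeholders):

# Bash 4
mapfile -t lines < myfile
# or, reading from a command:
mapfile -t lines < <(some_command)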
See ProcessSubstitution and FAQ #24 for more details on the <(...) syntax.
mapfile handles blank lines by inserting them as empty array elements, and (with -t) also silently appends a missing final newline if the input stream lacks one. These can be problematic when reading data in other ways (see the next section). mapfile in bash 4.0 through 4.3 does have one serious drawback: it can only handle newlines as line terminators. Bash 4.4 adds the -d option to supply a different line delimiter.
When mapfile isn't available, we have to work very hard to try to duplicate it. There are a great number of ways to almost get it right, but many of them fail in subtle ways.
The following examples will duplicate most of mapfile's basic functionality in older shells. You can skip all of these alternative examples if you have bash 4.
The += operator, when used together with parentheses, appends the element to one greater than the current highest numbered index in the array.
The square brackets create a math context. The result of the expression is the index used for assignment.
5.2.1.1. Handling newlines (or lack thereof) at the end of a file
read returns false when it reads the last line of a file. This presents a problem: if the file contains a trailing newline, then read will be false when reading/assigning that final line, otherwise, it will be false when reading/assigning the last line of data. Without a special check for these cases, no matter what logic is used, you will always end up either with an extra blank element in the resulting array, or a missing final element.
To be clear - text files should contain a newline as the last character in the file. Newlines are added to the ends of files by most text editors, and also by Here documents and Here strings. Most of the time, this is only an issue when reading output from pipes or process substitutions, or from "broken" text files created with broken or misconfigured tools. Let's look at some examples.
This approach reads the elements one by one, using a loop.
Unfortunately, if the file or input stream contains a trailing newline, a blank element is added at the end of the array, because the read -r arr[i++] is executed one extra time after the last line containing text before returning false.
The square brackets create a math context. Inside them, i++ works as a C programmer would expect (in all but ksh88).
This approach fails in the reverse case - it correctly handles blank lines and inputs terminated with a newline, but fails to record the last line of input if the file or stream is missing its final newline. So we need to handle that case specially:
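A sketch of that combination:

# Bash
unset -v arr i
while IFS= read -r line; do
    arr[i++]=$line
done < "$file"
[[ -n $line ]] && arr[i++]=$line    # catch an unterminated final line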
This is very close to the "final solution" we gave earlier -- handling both blank lines inside the file, and an unterminated final line. The null IFS is used to prevent read from stripping possible whitespace from the beginning and end of lines, in the event you wish to preserve them.
Another workaround is to remove the empty element after the loop:
Whether you prefer to read too many and then have to remove one, or read too few and then have to add one, is a personal choice.
NOTE: it is necessary to quote the 'arr[i++]' passed to read, so that the square brackets aren't interpreted as globs. This is also true for other non-keyword builtins that take a subscripted variable name, such as let and unset.
5.2.1.2. Other methods
Sometimes stripping blank lines actually is desirable, or you may know that the input will always be newline delimited, such as input generated internally by your script. It is possible in some shells to use the -d flag to set read's line delimiter to null, then abuse the -a or -A (depending on the shell) flag normally used for reading the fields of a line into an array for reading lines. Effectively, the entire input is treated as a single line, and the fields are newline-delimited.
5.2.1.3. Don't read lines with for!
Never read lines using for..in loops! Relying on IFS WordSplitting causes issues if you have repeated whitespace delimiters, because they will be consolidated. It is not possible to preserve blank lines by having them stored as empty array elements this way. Even worse, any glob characters in the input will be expanded, unless you go to the trouble of disabling globbing and then re-enabling it afterwards. Just never use this approach - it is problematic, the workarounds are all ugly, and not all of the problems are solvable.
5.2.2. Reading NUL-delimited streams
If you are trying to deal with records that might have embedded newlines, you will be using an alternative delimiter such as the NUL character ( \0 ) to separate the records. In bash 4.4, you can simply use mapfile -t -d '':
Otherwise, you'll need to use the -d argument to read inside a loop:
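A sketch, reading NUL-delimited filenames from find:

# Bash
records=()
while IFS= read -r -d '' record; do
    records+=("$record")
done < <(find . -type f -print0)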
read -d '' tells Bash to keep reading until a NUL byte instead of until a newline. This isn't certain to work in all shells with a -d feature.
If you choose to give a variable name to read instead of using REPLY then also be sure to set IFS= for the read command, to avoid trimming leading/trailing IFS whitespace.
5.2.3. Appending to an existing array
As previously mentioned, arrays are sparse - that is, numerically adjacent indexes are not guaranteed to be occupied by a value. This confuses what it means to "append" to an existing array. There are several approaches.
If you've been keeping track of the highest-numbered index with a variable (for example, as a side-effect of populating an array in a loop), and can guarantee it's correct, you can just use it and continue to ensure it remains in-sync.
If you don't want to keep an index variable, but happen to know that your array is not sparse, then you can use the number of elements to calculate the offset (not recommended):
If you don't know whether your array is sparse or not, but don't mind re-indexing the entire array (very inefficient), then you can use:
If you're in bash 3.1 or higher, then you can use the += operator:
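For example:

# Bash 3.1+
arr+=("new element" "another new element")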
NOTE: the parentheses are required, just as when assigning to an array. Otherwise you will end up appending to ${arr[0]} which $arr is a synonym for. If your shell supports this type of appending, it is the preferred method.
For examples of using arrays to hold complex shell commands, see FAQ #50 and FAQ #40.
5.3. Retrieving values from an array
${#arr[@]} or ${#arr[*]} expand to the number of elements in an array:
Single elements are retrieved by index:
1 echo "${foo[0]} - ${bar[j+1]}"
The square brackets are a math context. Within an arithmetic context, variables, including arrays, can be referenced by name. For example, in the expansion:
${arr[x[3+arr[2]]]}
arr's index will be the value from the array x whose index is 3 plus the value of arr[2].
Using array elements en masse is one of the key features of shell arrays. In exactly the same way that "$@" is expanded for positional parameters, "${arr[@]}" is expanded to a list of words, one array element per word. For example,
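a sketch that prints every element on its own line:

# Bash
for element in "${arr[@]}"; do
    printf 'Element: <%s>\n' "$element"
done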
This works even if the elements contain whitespace. You always end up with the same number of words as you have array elements.
If one simply wants to dump the full array, one element per line, this is the simplest approach:
For slightly more complex array-dumping, "${arr[*]}" will cause the elements to be concatenated together, with the first character of IFS (or a space if IFS isn't set) between them. As it happens, "$*" is expanded the same way for positional parameters.
Unfortunately, you can't put multiple characters in between array elements using that syntax. You would have to do something like this instead:
Or using array slicing, described in the next section.
This also shows how sparse arrays can be assigned multiple elements at once. Note using the arr=([key]=value ...) notation differs between shells. In ksh93, this syntax gives you an associative array by default unless you specify otherwise, and using it requires that every value be explicitly given an index, unlike bash, where omitted indexes begin at the previous index. This example was written in a way that's compatible between the two.
BASH 3.0 added the ability to retrieve the list of index values in an array:
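For example:

# Bash 3.0+
arr=([0]=zero [5]=five [10]=ten)
echo "${!arr[@]}"    # prints: 0 5 10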
Retrieving the indices is extremely important for certain kinds of tasks, such as maintaining parallel arrays with the same indices (a cheap way to mimic having an array of structs in a language with no struct):
# Bash 3.0 or higher
unset -v file title artist i
for f in ./*.mp3; do
    file[i]=$f
    title[i]=$(mp3info -p %t "$f")
    artist[i++]=$(mp3info -p %a "$f")
done

# Later, iterate over every song.
# This works even if the arrays are sparse, just so long as they all have
# the SAME holes.
for i in "${!file[@]}"; do
    echo "${file[i]} is ${title[i]} by ${artist[i]}"
done
5.3.1. Retrieving with modifications
Bash's Parameter Expansions may be performed on array elements en masse:
Parameter Expansion can also be used to extract sub-lists of elements from an array. Some people call this slicing:
The same goes for positional parameters
5.4. Using @ as a pseudo-array
As we see above, the @ array (the array of positional parameters) can be used almost like a regularly named array. This is the only array available for use in POSIX or Bourne shells. It has certain limitations: you cannot individually set or unset single elements, and it cannot be sparse. Nevertheless, it still makes certain POSIX shell tasks possible that would otherwise require external tools:
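For instance, a sketch that builds up a command's argument list at run time (myprogram, myapp.conf and the options are made up):

# POSIX
verbose=yes                              # pretend this came from option parsing
set --
[ -r ./myapp.conf ] && set -- "$@" -c ./myapp.conf
[ "$verbose" = yes ] && set -- "$@" -v
myprogram "$@"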
(Compare to FAQ #50's dynamically generated commands using named arrays.)
6. See Also
7. How can I use variable variables (indirect variables, pointers, references) or associative arrays?
This is a complex page, because it's a complex topic. It's been divided into roughly three parts: associative arrays, evaluating indirect variables, and assigning indirect variables. There are discussions of programming issues and concepts scattered throughout.
Contents
7.1. Associative Arrays
We introduce associative arrays first, because we observe that inexperienced programmers often conjure solutions to problems that would most typically utilize associative arrays by attempting to dynamically generate variables in a Hungarian notation scheme (in order to coerce the symbol table hashing function into resolving user-defined associative mappings).
An associative array is an unordered collection of key-value pairs. A value may be retrieved by supplying its corresponding key. Since strings are the only datatype most shells understand, associative arrays map strings to strings, unlike indexed arrays, which map integers to strings. Associative arrays exist in AWK as "associative arrays", in Perl as "hashes", in Tcl as "arrays", in Python and C# as "dictionaries", in Java as a "Map", and in C++11 STL as std::unordered_map.
# Bash 4 / ksh93

typeset -A homedir              # Declare associative array
homedir=(                       # Compound assignment
    [jim]=/home/jim
    [silvia]=/home/silvia
    [alex]=/home/alex
)

homedir[ormaaj]=/home/ormaaj    # Ordinary assignment adds another single element

for user in "${!homedir[@]}"; do    # Enumerate all indices (user names)
    printf 'Home directory of user %q is: %q\n' "$user" "${homedir[$user]}"
done
Prior to Bash 4 or if you can't use ksh93, your options are limited. Either move to another interpreter (awk, perl, python, ruby, tcl, ...) or re-evaluate your problem to simplify it. There are certain tasks for which associative arrays are a powerful and completely appropriate tool. There are others for which they are overkill, or simply unsuitable.
Suppose we have several subservient hosts with slightly different configuration, and that we want to ssh to each one and run slightly different commands. One way we could set it up would be to hard-code a bunch of ssh commands in per-hostname functions in a single script and just run them in series or in parallel. (Don't reject this out of hand! Simple is good.) Another way would be to store each group of commands as an element of an associative array keyed by the hostname:
This is the kind of approach we'd expect in a high-level language, where we can store hierarchical information in advanced data structures. The difficulty here is that we really want each element of the associative array to be a list or another array of command strings. But the shell simply doesn't permit that kind of data structure.
So, often it pays to step back and think in terms of shells rather than other programming languages. Aren't we just running a script on a remote host? Then why don't we just store the configuration sets as scripts? Then it's simple:
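A sketch of the idea, with one script per host under a made-up /etc/myapp directory (this loop is the sequential counterpart of the GNU Parallel one-liner that follows):

for conf in /etc/myapp/*; do
    ssh -- "${conf##*/}" bash < "$conf"
done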
Now we've removed the need for associative arrays, and also the need to maintain a bunch of extremely horrible quoting issues. It is also easy to parallelize using GNU Parallel:
parallel ssh -- {/} bash "<" {} ::: /etc/myapp/*
7.1.1. Associative array hacks in older shells
Before you think of using eval to mimic associative arrays in an older shell (probably by creating a set of variable names like homedir_alex), try to think of a simpler or completely different approach that you could use instead. If this hack still seems to be the best thing to do, consider the following disadvantages:
- It's really hard to read, to keep track of, and to maintain.
- The variable names must be a single line and match the RegularExpression ^[a-zA-Z_][a-zA-Z_0-9]*$ -- i.e., a variable name cannot contain arbitrary characters but only letters, digits, and underscores. We cannot have a variable's name contain Unix usernames, for instance -- consider a user named hong-hu. A dash '-' cannot be part of a variable name, so the entire attempt to make a variable named homedir_hong-hu is doomed from the start.
- Quoting is hard to get right. If a content string (not a variable name) can contain whitespace characters and quotes, it's hard to quote it right to preserve it through both shell parsings. And that's just for constants, known at the time you write the program. (Bash's printf %q helps, but nothing analogous is available in POSIX shells.)
- If the program handles unsanitized user input, it can be VERY dangerous!
Read BashGuide/Arrays or BashFAQ/005 for a more in-depth description and examples of how to use arrays in Bash.
If you need an associative array but your shell doesn't support them, please consider using AWK instead.
7.2. Indirection
7.2.1. Think before using indirection
Putting variable names or any other bash syntax inside parameters is frequently done incorrectly and in inappropriate situations to solve problems that have better solutions. It violates the separation between code and data, and as such puts you on a slippery slope toward bugs and security issues. Indirection can make your code less transparent and harder to follow.
Normally, in bash scripting, you won't need indirect references at all. Generally, people look at this for a solution when they don't understand or know about Bash Arrays or haven't fully considered other Bash features such as functions.
7.2.2. Evaluating indirect/reference variables
BASH allows for expanding parameters indirectly -- that is, one variable may contain the name of another variable. Name reference variables are the preferred method for performing variable indirection. Older versions of Bash could also use a ! prefix operator in parameter expansions for variable indirection. Namerefs should be used unless portability to older bash versions is required. No other shell uses ${!variable} for indirection and there are problems relating to use of that syntax for this purpose. It is also less flexible.
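A sketch of both forms in Bash (realvariable and the pointer names are arbitrary):

realvariable="some content"

# Bash 4.3 and newer: a name reference
declare -n ref=realvariable
echo "$ref"          # some content

# Older Bash: the ${!var} indirection syntax
ptr=realvariable
echo "${!ptr}"       # some content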
KornShell (ksh93) has a completely different, more powerful syntax -- the nameref command (also known as typeset -n):
Zsh allows you to access a parameter indirectly with the parameter expansion flag P:
Unfortunately, for shells other than Bash, ksh93, and zsh there is no syntax for evaluating a referenced variable. You would have to use eval, which means you would have to undergo extreme measures to sanitize your data to avoid catastrophe.
It's difficult to imagine a practical use for this that wouldn't be just as easily performed by using an associative array. But people ask it all the time (it is genuinely a frequently asked question).
ksh93's nameref allows us to work with references to arrays, as well as regular scalar variables. For example,
zsh's ability to nest parameter expansions allow for referencing arrays too:
We are not aware of any trick that can duplicate that functionality in POSIX or Bourne shells without eval, which can be difficult to do securely. Older versions of Bash can almost do it -- some indirect array tricks work, and others do not, and we do not know whether the syntax involved will remain stable in future releases. So, consider this a use at your own risk hack.
# Bash -- trick #2. Seems to work in bash 3 and up.
# Can't be combined with special expansions until 4.3. e.g. "${!tmp##*/}"
# Does NOT work in bash 2.05b -- Expands to one word instead of three in bash 2.
tmp=${ref}[@]
printf '<%s> ' "${!tmp}"; echo # Iterate whole array as one word per element.
It is not possible to retrieve array indices directly using the Bash ${!var} indirect expansion.
7.2.3. Assigning indirect/reference variables
Sometimes you'd like to "point" from one variable to another, for purposes of writing information to a dynamically configurable place. Typically this happens when you're trying to write a "reusable" function or library, and you want it to put its output in a variable of the caller's choice instead of the function's choice. (Various traits of Bash make safe reusability of Bash functions difficult at best, so this is something that should not happen often.)
Assigning a value "through" a reference (I'm going to use "ref" from now on) is more widely possible, but the means of doing so are usually extremely shell-specific. All shells with the sole exception of AT&T ksh93 lack real reference variables or pointers. Indirection can only be achieved by indirectly evaluating variable names. IOW, you can never have a real unambiguous reference to an object in memory, the best you can do is use the name of a variable to try simulating the effect. Therefore, you must control the value of the ref and ensure side-effects such as globbing, user-input, and conflicting local parameters can't affect parameter names. Names must either be deterministic or validated in a way that makes certain guarantees. If an end user can populate the ref variable with arbitrary strings, the result can be unexpected code injection. We'll show an example of this at the end.
In ksh93, we can use nameref again:
In zsh, using parameter expansions ::= and expansion flags P:
In Bash, if you only want to assign a single line to the variable, you can use read and Bash's here string syntax:
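A sketch (ref must hold a trusted variable name; value is the content to store):

# Bash
IFS= read -r "$ref" <<< "$value"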
If you need to assign multiline values, keep reading.
A similar trick works for Bash array variables too:
IFS is used to delimit words, so you may or may not need to set that. Also note that the read command will return failure because there is no terminating NUL byte for the -d '' to catch. Be prepared to ignore that failure.
Another trick is to use Bash's printf -v (only available in recent versions):
You can use all of printf's formatting capabilities. This trick also permits any string content, including embedded and trailing newlines.
Yet another trick is Korn shell's typeset or Bash's declare. The details of typeset vary greatly between shells, but can be used in compatible ways in limited situations. Both of them cause a variable to become locally scoped to a function, if used inside a function; but if used outside all functions, they can operate on global variables.
Bash 4.2 adds declare -g which assigns variables to the global scope from any context.
There is very little advantage to typeset or declare over eval for scalar assignments, but many drawbacks. typeset cannot be made to not affect the scope of the assigned variable. This trick does preserve the exact contents, like eval, if correctly escaped.
You must still be careful about what is on the left-hand side of the assignment. Inside square brackets, expansions are still performed; thus, with a tainted ref, declare or printf -v can be just as dangerous as eval:
This problem also exists with typeset in mksh and pdksh, but apparently not ksh93. This is why the value of ref must be under your control at all times.
If you aren't using Bash or Korn shell, you can do assignments to referenced variables using HereDocument syntax:
(Alas, read without -d means we're back to only getting at most one line of content. This is the most portable trick, but it's limited to single-line content.)
Remember that when using a here document, if the sentinel word (EOF in our example) is unquoted, then parameter expansions will be performed inside the body. If the sentinel is quoted, then parameter expansions are not performed. Use whichever is more convenient for your task.
7.2.3.1. eval
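A sketch of the eval technique (ref must hold a validated variable name that you control):

ref=myVar
eval "$ref=\$value"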
This expands to the statement that is executed:
myVar=$value
The right-hand side is not parsed by the shell, so there is no danger of unwanted side effects. The drawback, here, is that every single shell metacharacter on the right hand side of the = must be quoted/escaped carefully. In the example shown here, there was only one. In a more complex situation, there could be dozens.
This is very often done incorrectly. Permutations like these are seen frequently all over the web, even from experienced users who ought to know better:
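# Sketches of common MISTAKES -- do NOT use any of these:
eval $ref=$value            # unquoted: $value is split and globbed, then re-parsed as code
eval "$ref=$value"          # the contents of $value are still parsed as shell code
eval "$ref='$value'"        # breaks (or injects) as soon as $value contains a single quote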
The good news is that if you can sanitize the right hand side correctly, this trick is fully portable, has no variable scope issues, and allows all content including newlines. The bad news is that if you fail to sanitize the right hand side correctly, you have a massive security hole. Use eval if you know what you're doing and are very careful.
The following code demonstrates how to correctly pass a scalar variable name into a function by reference for the purpose of "returning" a value:
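# A sketch; the function and variable names here are illustrative.
# usage: foo VARNAME -- stores its result in the variable named by VARNAME
foo() {
    # $1 must be a valid variable name under the caller's control!
    _result="some computed value"
    eval "$1=\$_result"
}

foo answer
printf '%s\n' "$answer"     # prints: some computed value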
7.3. See Also
8. Is there a function to return the length of a string?
The fastest way, not requiring external programs (but not usable in Bourne shells):
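# Bash/ksh/zsh
printf '%s\n' "${#varname}"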
(Note that with bash 3 and above, that's the number of characters, not bytes, which is a significant difference in multi-byte locales. Behaviour of other shells in that regard varies.)
or for Bourne shells:
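# Bourne -- a sketch; the leading x is compensated for by the trailing - 1
expr "x$varname" : '.*' - 1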
(expr prints the number of characters or bytes matching the pattern .*, which is the length of the string (in bytes for GNU expr). The leading x avoids problems with $varname values that are expr operators, and the - 1 removes that extra character from the count.)
or:
(BSD/GNU expr only)
This second version is not specified in POSIX, so is not portable across all platforms.
One may also use awk:
(Whether the length is expressed in bytes or characters depends on the implementation: for instance, it's characters for GNU awk, but bytes for mawk.)
Similar needs:
${#arrayname[@]} expands to the number of elements in an array.
${#arrayname[i]} expands to the length of the array's element i.
9. How can I recursively search all files for a string?
If you are on a typical GNU or BSD system, all you need is one of these:
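# GNU/BSD grep -- sketches
grep -r -- "$search" .          # show the matching lines
grep -r -l -- "$search" .       # just list the files that contain a match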
If your grep lacks a -r option, you can use find to do the recursion:
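find . -type f -exec grep -l -- "$search" {} \;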
This command is slower than it needs to be, because find will call grep with only one file name, resulting in many grep invocations (one per file). Since grep accepts multiple file names on the command line, find can be instructed to call it with several file names at once:
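find . -type f -exec grep -l -- "$search" {} +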
The trailing '+' character instructs find to call grep with as many file names as possible, saving processes and resulting in faster execution. This example works for POSIX-2008 find, which most current operating systems have, but which may not be available on legacy systems.
Traditional Unix has a helper program called xargs for the same purpose:
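find . -type f | xargs grep -l -- "$search"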
However, if your filenames contain spaces, quotes or other metacharacters, this will fail catastrophically. BSD/GNU find has a -print0 option, and xargs has a matching -0 option:
1 find . -type f -print0 | xargs -0 grep -l -- "$search"
The -print0 / -0 options ensure that any file name can be processed, even one containing blanks, TAB characters, or newlines.
10. What is buffering? Or, why does my command line produce no output: tail -f logfile | grep 'foo bar' | awk ...
Most standard Unix commands buffer their output when used non-interactively. This means that they don't write each character (or even each line) immediately, but instead collect a larger number of characters (often 4 kilobytes) before printing anything at all. In the case above, the grep command buffers its output, and therefore awk only gets its input in large chunks.
Buffering greatly increases the efficiency of I/O operations, and it's usually done in a way that doesn't visibly affect the user. A simple tail -f from an interactive terminal session works just fine, but when a command is part of a complicated pipeline, the command might not recognize that the final output is needed in (near) real time. Fortunately, there are several techniques available for controlling I/O buffering behavior.
The most important thing to understand about buffering is that it's the writer who's doing it, not the reader.
10.0.1. Eliminate unnecessary commands
In the question, we have the pipeline tail -f logfile | grep 'foo bar' | awk ... (with the actual AWK command being unspecified). There is no problem if we simply run tail -f logfile, because tail -f never buffers its output. Nor is there a problem if we run tail -f logfile | grep 'foo bar' interactively, because grep does not buffer its output if its standard output is a terminal. However, if the output of grep is being piped into something else (such as an AWK command), it starts buffering to improve efficiency.
In this particular example, the grep is actually redundant. We can remove it, and have AWK perform the filtering in addition to whatever else it's doing:
1 tail -f logfile | awk '/foo bar/ ...'
In other cases, this sort of consolidation may not be possible. But you should always look for the simplest solution first.
10.0.2. Your command may already support unbuffered output
Some programs provide special command line options specifically for this sort of problem:
- awk (GNU awk, nawk, busybox awk, mawk): use the fflush() function (it will be defined in POSIX Issue 8)
- awk (mawk): -W interactive
- find (GNU): use -printf with the \c escape
- grep (e.g. GNU version 2.5.1): --line-buffered
- jq: --unbuffered
- python: -u
- sed (e.g. GNU version 4.0.6): -u, --unbuffered
- tcpdump, tethereal: -l
Each command that writes to a pipe would have to be told to disable buffering, in order for the entire pipeline to run in (near) real time. The last command in the pipeline, if it's writing to a terminal, will not typically need any special consideration.
10.0.3. Disabling buffering in a C application
If the buffering application is written in C, and is either your own or one whose source you can modify, you can disable the buffering with:
1 setvbuf(stdout, NULL, _IONBF, 0);
Often, you can simply add this at the top of the main() function, but if the application closes and reopens stdout, or explicitly calls setvbuf() later, you may need to exercise more discretion.
10.0.4. unbuffer
The expect package has an unbuffer program which effectively tricks other programs into always behaving as if they were being used interactively (which may often disable buffering). Here's a simple example:
1 tail -f logfile | unbuffer grep 'foo bar' | awk ...
expect and unbuffer are not standard POSIX tools, but they may already be installed on your system.
10.0.5. stdbuf
Recent versions of GNU coreutils (from 7.5 onwards) come with a nice utility called stdbuf, which can be used among other things to "unbuffer" the standard output of a command. Here's the basic usage for our example:
1 tail -f logfile | stdbuf -oL grep 'foo bar' | awk ...
In the above code, -oL makes stdout line buffered; you can even use -o0 to entirely disable buffering. The man and info pages have all the details.
stdbuf is not a standard POSIX tool, but it may already be installed on your system (if you're using a recent GNU/Linux distribution, it will probably be present).
10.0.6. less
If you simply wanted to highlight the search term, rather than filter out non-matching lines, you can use the less program instead of a filtered tail -f:
1 less program.log
Inside less, start a search with the '/' command (similar to searching in vi). Or start less with the -p pattern option.
- This should highlight any instances of the search term.
Now put less into "follow" mode, which by default is bound to shift+f.
- You should get an unfiltered tail of the specified file, with the search term highlighted.
"Follow" mode is stopped with an interrupt, which is probably control+c on your system. The '/' command accepts regular expressions, so you could do things like highlight the entire line on which a term appears. For details, consult man less.
10.0.7. coproc
If you're using ksh or Bash 4.0+, whatever you're really trying to do with tail -f might benefit from using coproc and fflush() to create a coprocess. Note well that coproc does not itself address buffering issues (in fact it's prone to buffering problems -- hence the reference to fflush). coproc is only mentioned here because whenever someone is trying to continuously monitor and react to a still-growing file (or pipe), they might be trying to do something which would benefit from coprocesses.
10.0.8. Further reading
11. How can I recreate a directory hierarchy structure, without the files?
With the cpio program:
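# a sketch -- copy only the directory structure of the current tree into "$dest"
find . -type d -print | cpio -pdumv "$dest"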
or with the pax program:
or with GNU tar, and more verbose syntax:
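# GNU tar -- a sketch; "$dest" must already exist
find . -type d -print |
  tar -c --no-recursion --files-from=- -f - |
  tar -x -C "$dest" -f -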
This creates a list of directory names with find, non-recursively adds just the directories to an archive, and pipes it to a second tar instance to extract it at the target location. As you can see, tar is the least suited to this task, but people just adore it, so it has to be included here to appease the tar fanboy crowd. (Note: you can't even do this at all with a typical Unix tar. Also note: there is no such thing as "standard tar", as both tar and cpio were intentionally omitted from POSIX in favor of pax.)
All the solutions above will fail if directory names contain newline characters. On many modern BSD/GNU systems, at least, they can be trivially modified to cope with that, by using find -print0 and one of pax -0 or cpio -0 or tar --null (check your system documentation to see which of these commands you have, and which extensions are available). If you really don't have access to those options, you can probably, at least, use ! -path $'*\n*' -type d -print, or better -name $'*\n*' -prune -o -type d -print (instead of -type d -print) to ignore directories that contain newline characters in their path, making sure find is run in the C/POSIX locale to also exclude file paths containing newline characters as well as sequence of bytes not forming valid characters in the user's locale.
with find
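# POSIX -- a sketch; recreate each directory under "$dest" with mkdir -p
find . -type d -exec sh -c 'dest=$1; shift
    for d do mkdir -p -- "$dest/$d"; done' sh "$dest" {} +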
or with bash 4's globstar
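# Bash 4 -- a sketch
shopt -s globstar
for d in **/; do mkdir -p -- "$dest/$d"; done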
(though beware that will also copy symlinks to directories as directories; older versions of bash would also traverse symlinks when crawling the directory tree).
or with zsh's recursive globbing and glob qualifiers:
If you want to create stub files instead of full-sized files, the following is likely to be the simplest solution. The find command recreates the regular files using "dummy" files (empty files with the same timestamps):
If your find can't handle -exec + then you can use \; instead of + at the end of the command. See UsingFind for explanations.
12. How can I print the n'th line of a file?
One dirty (but not quick) way is:
1 sed -n "${n}p" "$file"
But this reads the entire file even if only the third line is desired, which can be avoided by using the q command to quit on line $n, and deleting all other lines with the d command:
1 sed "${n}q;d" "$file"
Another method is to grab lines starting at n, then get the first line of that.
1 tail -n "+$n" "$file" | head -n 1
Another approach, using AWK:
1 awk -v n="$n" 'NR==n{print;exit}' "$file"
If more than one line is needed, it's easy to adapt any of the previous methods:
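# Sketches adapting the approaches above; x and y are the first and last line numbers
sed -n "$x,${y}p;${y}q" "$file"
tail -n "+$x" "$file" | head -n "$((y - x + 1))"
awk -v x="$x" -v y="$y" 'NR>=x {print} NR==y {exit}' "$file"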
Or a counter with a simple read loop:
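# POSIX -- a sketch
m=0
while IFS= read -r line; do
    m=$((m + 1))
    if [ "$m" -eq "$n" ]; then
        printf '%s\n' "$line"
        break
    fi
done < "$file"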
To read into a variable, it is preferable to use read or mapfile rather than an external utility. More than one line can be read into the given array variable or the default array MAPFILE by adjusting the argument to mapfile's -n option:
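# Bash 4 -- a sketch: skip the first n-1 lines, then read one line into line[0]
mapfile -t -s "$((n - 1))" -n 1 line < "$file"
printf '%s\n' "${line[0]}"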
12.1. See Also
13. How do I invoke a shell command from a non-shell application?
You can use the shell's -c option to run the shell with the sole purpose of executing a short bit of script:
1 sh -c 'echo "Hi! This is a short script."'
This is usually pretty useless without a means of passing data to it. The best way to pass bits of data to your shell is to pass them as positional arguments:
1 sh -c 'echo "Hi! This short script was run with the arguments: $@"' -- "foo" "bar"
Notice the -- before the actual positional parameters. The first argument you pass to the shell process (that isn't the argument to the -c option) will be placed in $0. Positional parameters start at $1, so we put a little placeholder in $0. This can be anything you like; in the example, we use the generic --.
This technique is used often in shell scripting, when trying to have a non-shell CLI utility execute some bash code, such as with find(1):
1 find /foo -name '*.bar' -exec bash -c 'mv "$1" "${1%.bar}.jpg"' -- {} \;
Here, we ask find to run the bash command for every *.bar file it finds, passing it to the bash process as the first positional parameter. The bash process runs the mv command after doing some Parameter Expansion on the first positional parameter in order to rename our file's extension from bar to jpg.
Alternatively, if your non-shell application allows you to set environment variables, you can do that, and then read them using normal variables of the same name.
Similarly, suppose a program (e.g. a file manager) lets you define an external command that an argument will be appended to, but you need that argument somewhere in the middle. In that case:
#!/bin/sh
sh -c 'command foo "$1" bar' -- "$@"
13.1. Calling shell functions
Only a shell can call a shell function. So constructs like this won't work:
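# DOES NOT WORK -- find knows nothing about shell functions
find /some/path -type f -exec my_bash_function {} +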
If your shell function is defined in a file, you can invoke a shell which sources that file, and then calls the function:
1 find . -type f -exec bash -c 'source /my/bash/function; my_bash_function "$@"' _ {} +
(See UsingFind for explanations.)
Bash also permits function definitions to be exported through the environment. So, if your function is defined within your current shell, you can export it to make it available to the new shell which find invokes:
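# Bash -- a sketch
export -f my_bash_function
find . -type f -exec bash -c 'my_bash_function "$@"' _ {} +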
This works ONLY in bash, and isn't without problems. The function's code cannot exceed the maximum size of an environment variable, which is platform-specific. Functions from the environment can be a security risk as well, because bash simply scans environment variables for values that fit the form of a shell function, and has no way of knowing who exported the function, or whether a value that happens to look like a function really is one. It is generally a better idea to retrieve the function definition and put it directly into the code. This technique is also more portable.
Note that ksh93 uses a completely different approach for sourcing functions. See the manpage for the FPATH variable. Bash doesn't use FPATH, but the source tree includes 3 different examples of how to emulate it: http://git.savannah.gnu.org/cgit/bash.git/tree/examples/functions/autoload.v3
This technique is also necessary when calling the shell function on a remote system (e.g. over ssh). Sourcing a file containing the function definition on the remote system will work, if such a file is available. If no such file is available, the only viable approach is to ask the current shell to spit out the function definition, and feed that to the remote shell over the ssh channel:
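# Bash -- a sketch; sends the function's definition plus a call to the remote shell
ssh remotehost "$(declare -f my_bash_function); my_bash_function some args"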
Care must be taken when writing a script to send through ssh. Ssh works like eval, with the same concerns; see that FAQ for details.
14. How can I concatenate two variables? How do I append a string to a variable?
There is no (explicit) concatenation operator for strings (either literal or variable dereferences) in the shell; you just write them adjacent to each other:
1 var=$var1$var2
If the right-hand side contains whitespace characters, it needs to be quoted:
1 var="$var1 - $var2"
If you're appending a string that doesn't "look like" part of a variable name, you just smoosh it all together:
1 var=$var1/.-
Otherwise, braces or quotes may be used to disambiguate the right-hand side:
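var=${var1}xyzzy      # braces
var="$var1"xyzzy      # quotes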
CommandSubstitution can be used as well. The following line creates a log file name logname containing the current date, resulting in names like e.g. log.2004-07-26:
1 logname="log.$(date +%Y-%m-%d)"
There's no difference when the variable name is reused, either. A variable's value (the string it holds) may be reassigned at will:
1 string="$string more data here"
Concatenating arrays is also possible:
1 var=( "${arr1[@]}" "${arr2[@]}" )
Bash 3.1 has a new += operator that you may see from time to time:
1 string+=" more data here" # EXTREMELY non-portable!
It's generally best to use the portable syntax.
15. How can I redirect the output of multiple commands at once?
Redirecting the standard output of a single command is as easy as:
1 date > file
To redirect standard error:
1 date 2> file
To redirect both:
1 date > file 2>&1
or, a fancier way:
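# Bash (and zsh) only; equivalent to: date > file 2>&1
date &> file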
Redirecting an entire loop:
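for i in 1 2 3; do
    printf 'line %s\n' "$i"
done > file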
However, this can become tedious if the output of many programs should be redirected. If all output of a script should go into a file (e.g. a log file), the exec command can be used:
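# a sketch: from this point on, all stdout and stderr go to my.log
exec > my.log 2>&1
date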
(See FAQ 106 for more complex script logging techniques.)
Otherwise, command grouping helps:
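{
    date
    other_command    # placeholder for whatever else you run
} > messages.log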
In this example, the output of all commands within the curly braces is redirected to the file messages.log.
In-depth: Illustrated Tutorial
16. How can I run a command on all files with the extension .gz?
Often a command already accepts several files as arguments, e.g.
1 zcat -- *.gz
On some systems, you would use gzcat instead of zcat. If neither is available, or if you don't care to play guessing games, just use gzip -dc instead.
The -- prevents a filename beginning with a hyphen from causing unexpected results.
If an explicit loop is desired, or if your command does not accept multiple filename arguments in one invocation, the for loop can be used:
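# a sketch; "mycommand" is a placeholder for whatever you actually want to run
for file in ./*.gz; do
    mycommand "$file"
done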
To do it recursively, use find:
If you need to process the files inside your shell for some reason, then read the find results in a loop:
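# Bash -- a sketch; "mycommand" is a placeholder
while IFS= read -r file; do
    mycommand "$file"
done < <(find . -name '*.gz')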
This example uses ProcessSubstitution (see also FAQ #24), although a pipe may also be suitable in many cases. However, it does not correctly handle filenames that contain newlines. To handle arbitrary filenames, see FAQ #20.
17. How can I use a logical AND/OR/NOT in a shell pattern (glob)?
"Globs" are simple patterns that can be used to match filenames or strings. They're generally not very powerful. If you need more power, there are a few options available.
If you want to operate on all the files that match glob A or glob B, just put them both on the same command line:
1 rm -- *.bak *.old
If you want to use a logical OR in just part of a glob (larger than a single character -- for which square-bracketed character classes suffice), in Bash, you can use BraceExpansion:
1 rm -- *.{bak,old}
If you need something still more general/powerful, in KornShell or BASH you can use extended globs. In Bash, you'll need the extglob option to be set. It can be checked with:
1 shopt extglob
and set with:
1 shopt -s extglob
To warm up, we'll move all files starting with foo AND not ending with .d to directory foo_thursday.d:
1 mv foo!(*.d) foo_thursday.d
A more complex example -- delete all files containing Pink_Floyd AND not containing The_Final_Cut:
1 rm !(!(*Pink_Floyd*)|*The_Final_Cut*)
By the way: these kinds of patterns can be used with the KornShell, too. They don't have to be enabled there; they are the default patterns.
18. How can I group expressions in an if statement, e.g. if (A AND B) OR C?
The portable (POSIX or Bourne) way is to use multiple test (or [) commands:
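# Bourne -- a sketch of "if (A AND B) OR C"
if [ "$a" = yes ] && [ "$b" = yes ] || [ "$c" = yes ]
then
    echo "condition met"
fi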
When they are shell operators between commands (as opposed to the [[...]] operators), && and || have equal precedence, so processing is left to right.
If we need explicit grouping, then we can use curly braces:
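if { [ "$a" = yes ] && [ "$b" = yes ]; } || [ "$c" = yes ]
then
    echo "condition met"
fi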
What we should not do is try to use the -a or -o operators of the test command, because the results are undefined.
BASH, zsh and the KornShell have different, more powerful comparison commands with slightly different (easier) quoting:
ArithmeticExpression for arithmetic expressions, and
NewTestCommand for string (and file) expressions.
Examples:
or
Note that contrary to the && and || shell operators, the && operator in ((...)) and [[...]] has precedence over the || operator (same goes for ['s -a over -o), so for instance:
1 [ a = a ] || [ b = c ] && [ c = d ]
is false because it's like:
1 { [ a = a ] || [ b = c ]; } && [ c = d ]
(left to right association, no precedence), while
1 [[ a = a || b = c && c = d ]]
is true because it's like:
1 [[ a = a || ( b = c && c = d ) ]]
(&& has precedence over ||).
Note that the distinction between numeric and string comparisons is strict. Consider the following example:
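# Bash -- a sketch
a=3
b=10
if [[ $a > $b ]]; then
    echo "ERROR: $a is bigger than $b"
else
    echo "$a is smaller than $b"
fi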
The output will be "ERROR: ....", because in a string comparison "3" is bigger than "10", because "3" already comes after "1", and the next character "0" is not considered. Changing the square brackets to double parentheses (( )) makes the example work as expected.
19. How can I use numbers with leading zeros in a loop, e.g. 01, 02?
As always, there are many different ways to solve the problem, each with its own advantages and disadvantages. The most important considerations are which shell you're using, whether the start/end numbers are constants, and how many times the loop is going to iterate.
19.1. Brace expansion
If you're in bash/zsh/ksh, and if the start and end numbers are constants, and if there aren't too many of them, you can use BraceExpansion. Bash version 4 allows zero-padding and ranges in its brace expansion:
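# Bash 4
for i in {01..10}; do
    echo "$i"        # 01 02 ... 10
done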
In Bash 3, you can use ranges inside brace expansion (but not zero-padding). Thus, the same thing can be accomplished more concisely like this:
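# Bash 3
for i in 0{1..9} 10; do
    echo "$i"
done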
Another bash 3 example, for output of 0000 to 0034:
In ksh and in older bash versions, where the leading zeroes are not supported directly by brace expansion, you might still be able to approximate it:
19.2. Formatting with printf
The most important drawback with BraceExpansion is that the whole list of numbers is generated and held in memory all at once. If there are only a few thousand numbers, that may not be so bad, but if you're looping millions of times, you would need a lot of memory to hold the fully expanded list of numbers.
The printf command (which is a Bash builtin, and is also POSIX standard), can be used to format a number, including zero-padding. The bash builtin can also assign the formatted result to a shell variable (in recent versions), without forking a SubShell.
If all you want to do is print the sequence of numbers, and you're in bash/ksh/zsh, and the sequence is fairly small, you can use the implicit looping feature of printf together with a brace expansion:
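# Bash/ksh93/zsh -- a sketch
printf '%02d\n' {1..10}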
If you're in bash 3.1 or higher, you can use a C-style for loop together with printf -v to format the numbers into a variable:
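# Bash 3.1+ -- a sketch; "padded" is an illustrative name
for ((i = 1; i <= 10; i++)); do
    printf -v padded '%02d' "$i"
    echo "$padded"
done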
Brace expansion requires constant starting and ending values. If you don't know in advance what the start and end values are, you can cheat:
The eval is required in Bash because brace expansions occur before parameter expansions.
The traditional Csh implementation, which all other applicable shells follow, inserts the brace expansion pass sometime between the processing of other expansions and pathname expansion; thus, parameter expansion has already been performed by the time words are scanned for brace expansion. There are various pros and cons to Bash's implementation, this being probably the most frequently cited drawback. Given how messy that eval solution is, please give serious thought to using a for or while loop with shell arithmetic instead.
19.3. Ksh formatted brace expansion
The ksh93 method for specifying field width for sequence expansion is to add a (limited) printf format string to the syntax, which is used to format each expanded word. This is somewhat more powerful, but unfortunately incompatible with bash, and ksh does not understand Bash's field padding scheme:
ksh93 also has a variable attribute that specifies a field width to pad with leading zeros whenever the variable is referenced. The concept is similar to other attributes supported by Bash such as case modification. Note that ksh never interprets octal literals.
19.4. External programs
If the command seq(1) is available (it's part of GNU sh-utils/coreutils), you can use it as follows:
1 seq -w 1 10
or, for arbitrary numbers of leading zeros (here: 3):
1 seq -f "%03g" 1 10
Combining printf with seq(1), you can do things like this:
(That may be helpful if you are not using Bash, but you have seq(1), and your version of seq(1) lacks printf-style format specifiers. That's a pretty odd set of restrictions, but I suppose it's theoretically possible. Since seq is a nonstandard external tool, it's good to keep your options open.)
Be warned however that using seq might be considered bad style; it's even mentioned in Don't Ever Do These.
Some BSD-derived systems have jot(1) instead of seq(1). In accordance with the glorious tradition of Unix, it has a completely incompatible syntax:
Finally, the following example works with any BourneShell derived shell (which also has expr and sed) to zero-pad each line to three bytes:
In this example, the number of '.' inside the parentheses in the sed command determines how many total bytes from the echo command (at the end of each line) will be kept and printed.
But if you're going to rely on an external Unix command, you might as well just do the whole thing in awk in the first place:
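awk 'BEGIN { for (i = 1; i <= 10; i++) printf "%02d\n", i }'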
Now, since the number one reason this question is asked is for downloading images in bulk, you can use the examples above with xargs(1) and wget(1) to fetch files:
1 almost any example above | xargs -i% wget $LOCATION/%
The xargs -i% will read a line of input at a time, and replace the % at the end of the command with the input.
Or, a simpler example using a for loop:
Or, avoiding the subshells (requires bash 3.1):
20. How can I split a file into line ranges, e.g. lines 1-10, 11-20, 21-30?
POSIX specifies the split utility, which can be used for this purpose:
1 split -l 10 input.txt
For more flexibility you can use sed. The sed command can print e.g. the line number range 1-10:
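# a sketch: delete (d) every line outside the range, so only lines 1-10 are printed
sed '1,10!d' "$file"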
The d command stops sed from printing each line. This could alternatively have been done by passing sed the -n option and printing lines with the p command rather than deleting them with d. It makes no difference.
We can now use this to print an arbitrary range of a file (specified by line number):
1 # POSIX shell
2 file=/etc/passwd
3 range=10
4 cur=1
5 last=$(awk 'END { print NR }' < "$file") # count number of lines
6 chunk=1
7 while [ "$cur" -lt "$last" ]
8 do
9 endofchunk=$((cur + range - 1))
10 sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > "chunk.$(printf %04d "$chunk")"
11 chunk=$((chunk + 1))
12 cur=$((cur + range))
13 done
The previous example uses POSIX arithmetic, which older Bourne shells do not have. In that case the following example should be used instead:
1 # legacy Bourne shell; assume no printf either
2 file=/etc/passwd
3 range=10
4 cur=1
5 last=`awk 'END { print NR }' < "$file"` # count number of lines
6 chunk=1
7 while test "$cur" -lt "$last"
8 do
9 endofchunk=`expr $cur + $range - 1`
10 sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > "chunk.$chunk"
11 chunk=`expr "$chunk" + 1`
12 cur=`expr "$cur" + "$range"`
13 done
Awk can also be used to produce a more or less equivalent result:
1 awk -v range=10 '{print > FILENAME "." (int((NR -1)/ range)+1)}' file
21. How can I find and safely handle file names containing newlines, spaces or both?
First and foremost, to understand why you're having trouble, read Arguments to get a grasp on how the shell understands the statements you give it. It is vital that you grasp this matter well if you're going to be doing anything with the shell.
The preferred method to deal with arbitrary filenames is still to use find(1):
find ... -exec command {} \;
or, if you need to handle filenames en masse:
find ... -exec command {} +
xargs is rarely ever more useful than the above, but if you really insist, remember to use -0 (-0 is not in the POSIX standard, but is implemented by GNU and BSD systems):
# Requires GNU/BSD find and xargs
find ... -print0 | xargs -r0 command
# Never use xargs without -0 or similar extensions!
Use one of these unless you really can't.
Another way to deal with files with spaces in their names is to use the shell's filename expansion (globbing). This has the disadvantage of not working recursively (except with zsh's extensions or bash 4's globstar), and it normally does not include hidden files (filenames beginning with "."). But if you just need to process all the files in a single directory, and omitting hidden files is okay, it works fantastically well.
For example, this code renames all the *.mp3 files in the current directory to use underscores in their names instead of spaces (this uses the bash/ksh extension allowing "/" in parameter expansion):
# Bash/ksh
for file in ./*\ *.mp3; do
    if [ -e "$file" ]; then    # Make sure it isn't an empty match
        mv "$file" "${file// /_}"
    fi
done
You can omit the "if..." and "fi" lines if you're certain that at least one path will match the glob. The problem is that if the glob doesn't match, instead of looping 0 times (as you might expect), the loop will execute once with the unexpanded pattern (which is usually not what you want). You can also use the bash extension "shopt -s nullglob" to make empty globs expand to nothing, and then again you can omit the if and fi.
For more examples of renaming files, see FAQ #30.
Remember, you need to quote all your Parameter Expansions using double quotes. If you don't, the expansion will undergo WordSplitting (see also argument splitting and BashPitfalls). Also, always prefix globs with "/" or "./"; otherwise, if there's a file with "-" as the first character, the expansions might be misinterpreted as options.
Another way to handle filenames recursively involves using the -print0 option of find (a GNU/BSD extension), together with bash's -d extended option for read:
# Bash
unset a i
while IFS= read -r -d $'\0' file; do
    a[i++]="$file"        # or however you want to process each file
done < <(find /tmp -type f -print0)
The preceding example reads all the files under /tmp (recursively) into an array, even if they have newlines or other whitespace in their names, by forcing read to use the NUL byte (\0) as its line delimiter. Since NUL is not a valid byte in Unix filenames, this is the safest approach besides using find -exec. IFS= is required to avoid trimming leading/trailing whitespace, and -r is needed to avoid backslash processing. In fact, $'\0' is actually the empty string (bash doesn't support passing NUL bytes to commands even built-in ones) so we could also write it like this:
# Bash
unset a i
while IFS= read -r -d '' file; do
    a[i++]="$file"
done < <(find /tmp -type f -print0)
So, why doesn't this work?
# DOES NOT WORK
unset a i
find /tmp -type f -print0 | while IFS= read -r -d '' file; do
    a[i++]="$file"
done
Because of the pipeline, the entire while loop is executed in a SubShell and therefore the array assignments will be lost after the loop terminates. (For more details about this, see FAQ #24.)
For a longer discussion about handling filenames in shell, see Filenames and Pathnames in Shell: How to do it Correctly.
22. How can I replace a string with another string in a variable, a stream, a file, or in all the files in a directory?
There are a number of techniques for this. Which one to use depends on many factors, the biggest of which is what we're editing. This page also contains contradictory advice from multiple authors. This is a deeply ugly topic, and there are no universally right answers (but plenty of universally wrong ones).
Contents
22.1. Files
Before you start, be warned that editing files in place is a really bad idea. The preferred way to modify a file is to create a new file within the same file system, write the modified content into it, and then mv it to the original name. This is the only way to prevent data loss in the event of a crash while writing. However, using a temp file and mv means that you break hardlinks to the file (unavoidably), that you would convert a symlink into a regular file, and that you may need to take extra steps to transfer the ownership and permissions (and possibly other metadata) of the original file to the new file. Some people prefer to roll the dice and accept the tiny possibility of data loss versus the greater possibility of hardlink loss and the inconvenience of chown/chmod (and potentially setfattr, setfacl, chattr...).
The other major problem you're going to face is that all of the standard Unix tools for editing files expect some kind of regular expression as the search pattern. If you're passing input you did not create as the search pattern, it may contain syntax that breaks the program's parser, which can lead to failures, or CodeInjection exploits.
22.1.1. Just Tell Me What To Do
If your search string or your replacement string comes from an external source (environment variable, argument, file, user input) and is therefore not under your control, then this is your best choice:
in="$search" out="$replace" perl -pi -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' ./*
That will operate on all of the files in the current directory. If you want to operate on a full hierarchy (recursively), then:
in="$search" out="$replace" find . -type f -exec \
    perl -pi -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' -- {} +
You may of course supply additional options to find to restrict which files are replaced; see UsingFind for more information.
The critical reader may note that these commands use perl which is not a standard tool. That's because none of the standard tools can do this task safely.
If you're stuck using standard tools due to a restricted execution environment, then you'll have to weigh the options below and choose the one that will do the least amount of damage to your files.
22.1.2. Using a file editor
The only standard tools that actually edit a file are ed and ex (vi is the visual mode for ex).
ed is the standard UNIX command-based editor. ex is another standard command-line editor. Here are some commonly-used syntaxes for replacing the string olddomain.com by the string newdomain.com in a file named file. All four commands do the same thing, with varying degrees of portability and efficiency:
## Ex
ex -sc '%s/olddomain\.com/newdomain.com/g|x' file

## Ed
# Bash
ed -s file <<< $'g/olddomain\\.com/s//newdomain.com/g\nw\nq'

# Bourne (with printf)
printf '%s\n' 'g/olddomain\.com/s//newdomain.com/g' w q | ed -s file

printf 'g/olddomain\\.com/s//newdomain.com/g\nw\nq' | ed -s file

# Bourne (without printf)
ed -s file <<!
g/olddomain\\.com/s//newdomain.com/g
w
q
!
To replace a string in all files of the current directory, just wrap one of the above in a loop:
for file in ./*; do
    [[ -f $file ]] && ed -s "$file" <<< $'g/old/s//new/g\nw\nq'
done
To do this recursively, the easy way would be to enable globstar in bash 4 (shopt -s globstar, a good idea to put this in your ~/.bashrc) and use:
# Bash 4+ (shopt -s globstar)
for file in ./**; do
    [[ -f $file ]] && ed -s "$file" <<< $'g/old/s//new/g\nw\nq'
done
If you don't have bash 4, you can use find. Unfortunately, it's a bit tedious to feed ed stdin for each file hit:
find . -type f -exec sh -c 'for f do
    ed -s "$f" <<!
g/old/s//new/g
w
q
!
done' sh {} +
Since ex takes its commands from the command-line, it's less painful to invoke from find:
find . -type f -exec ex -sc '%s/old/new/g|x' {} \;
Beware though, if your ex is provided by vim, it may get stuck on files that don't contain an occurrence of old. In that case, you'd add the e option to ignore those files. When vim is your ex, you can also use argdo and find's {} + to minimize the number of ex processes to run:
# Bash 4+ (shopt -s globstar)
ex -sc 'argdo %s/old/new/ge|x' ./**

# Bourne
find . -type f -exec ex -sc 'argdo %s/old/new/ge|x' {} +
You can also ask for confirmation for every replacement from A to B. You will need to type y or n every time. Please note that the A is used twice in the command. This approach is good when wrong replacements may happen (working with a natural language, for example) and the data set is small enough.
find . -type f -name '*.txt' -exec grep -q 'A' {} \; -exec vim -c '%s/A/B/gc' -c 'wq' {} \;
22.1.3. Using a temporary file
If shell variables are used as the search and/or replace strings, ed is not suitable. Nor is sed, or any tool that uses regular expressions. Consider using the awk code at the bottom of this FAQ with redirections, and mv.
gsub_literal "$search" "$rep" < "$file" > tmp && mv -- tmp "$file"
# Using GNU tools to preserve ownership/group/permissions
gsub_literal "$search" "$rep" < "$file" > tmp &&
  chown --reference="$file" tmp &&
  chmod --reference="$file" tmp &&
  mv -- tmp "$file"
22.1.4. Using nonstandard tools
sed is a Stream EDitor, not a file editor. Nevertheless, people everywhere tend to abuse it for trying to edit files. It doesn't edit files. GNU sed (and some BSD seds) have a -i option that makes a copy and replaces the original file with the copy. An expensive operation, but if you enjoy unportable code, I/O overhead and bad side effects (such as destroying symlinks), and CodeInjection exploits, this would be an option:
sed -i    's/old/new/g' ./*   # GNU, OpenBSD
sed -i '' 's/old/new/g' ./*   # FreeBSD
Those of you who have perl 5 can accomplish the same thing using this code:
perl -pi -e 's/old/new/g' ./*
Recursively using find:
find . -type f -exec perl -pi -e 's/old/new/g' -- {} \;   # if your find doesn't have + yet
find . -type f -exec perl -pi -e 's/old/new/g' -- {} +    # if it does
If you want to delete lines instead of making substitutions:
# Deletes any line containing the perl regex foo
perl -ni -e 'print unless /foo/' ./*
To replace for example all "unsigned" with "unsigned long", if it is not "unsigned int" or "unsigned long" ...:
find . -type f -exec perl -i.bak -pne \
    's/\bunsigned\b(?!\s+(int|short|long|char))/unsigned long/g' -- {} \;
All of the examples above use regular expressions, which means they have the same issue as the sed code earlier; trying to embed shell variables in them is a terrible idea, and treating an arbitrary value as a literal string is painful at best.
If the inputs are not under your direct control, you can pass them as variables into both search and replace strings with no unquoting or potential for conflict with sigil characters:
in="$search" out="$replace" perl -pi -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' ./*
Or, wrapped in a useful shell function:
# Bash
# usage: replace FROM TO [file ...]
replace() {
    in=$1 out=$2 perl -p ${3+'-i'} -e 's/\Q$ENV{"in"}/$ENV{"out"}/g' -- "${@:3}"
}
This wrapper passes perl's -i option if there are any filenames, so that they are "edited in-place" (or at least as far as perl does such a thing -- see the perl documentation for details).
22.2. Variables
If you want to replace content within a variable, this can (and should) be done very simply with Bash's parameter expansion:
# Bash
var='some string'
var=${var//some/another}
However, if the replacement string is in a variable, one must be cautious. There are inconsistent behaviors across different versions of bash.
# Bash
var='some string' search=some; rep=another

# Assignments work consistently. Note the quotes.
var=${var//"$search"/"$rep"}

# Expansions outside of assignments are not consistent.
echo "${var//"$search"/"$rep"}"   # Works in bash 4.3 and later.
echo "${var//"$search"/$rep}"     # Works in bash 5.1 and earlier.
The quotes around "$search" prevent the contents of the variable from being treated as a shell pattern (also called a glob). Of course, if pattern matching is intended, do not include the quotes.
In Bash 4.2 and earlier, if you quote $rep in ${var//"$search"/"$rep"} the quotes will be inserted literally.
In Bash 5.2, you must either quote $rep in ${var//"$search"/"$rep"} or disable patsub_replacement (shopt -u patsub_replacement) because otherwise & characters in $rep will be substituted by $search.
The only way to get a consistent and correct result across all versions of bash is to use a temporary variable:
# Bash
tmp=${var//"$search"/"$rep"}
echo "$tmp"
For compatibility with Bash 4.2 and earlier, make sure you do not put quotes around the assignment's right hand side.
# In bash 4.2, this fails. You get literal quotes in the result.
tmp="${var//"$search"/"$rep"}"
Replacements within a variable are even harder in POSIX sh:
# POSIX function
# usage: string_rep SEARCH REPL STRING
# replaces all instances of SEARCH with REPL in STRING
string_rep() {
  # initialize vars
  in=$3
  unset -v out

  # SEARCH must not be empty
  case $1 in '') return; esac

  while
    # break loop if SEARCH is no longer in "$in"
    case "$in" in
      *"$1"*) ;;
      *) break;;
    esac
  do
    # append everything in "$in", up to the first instance of SEARCH, and REP, to "$out"
    out=$out${in%%"$1"*}$2
    # remove everything up to and including the first instance of SEARCH from "$in"
    in=${in#*"$1"}
  done

  # append whatever is left in "$in" after the last instance of SEARCH to out, and print
  printf '%s%s\n' "$out" "$in"
}

var=$(string_rep "$search" "$rep" "$var")

# Note: POSIX does not have a way to localize variables. Most shells (even dash and
# busybox), however, do. Feel free to localize the variables if your shell supports
# it. Even if it does not, if you call the function with var=$(string_rep ...), the
# function will be run in a subshell and any assignments it makes will not persist.
22.3. Streams
If you wish to modify a stream, and if your search and replace strings are known in advance, then use the stream editor:
some_command | sed 's/foo/bar/g'
sed uses regular expressions. In our example, foo and bar are literal strings. If they were variables (e.g. user input), they would have to be rigorously escaped in order to prevent errors. This is very impractical, and attempting to do so will make your code extremely prone to bugs. Embedding shell variables in sed commands is never a good idea -- it is a prime source of CodeInjection bugs.
You could also do it in Bash itself, by combining a parameter expansion with Faq #1:
search=foo rep=bar

while IFS= read -r line; do
    printf '%s\n' "${line//"$search"/"$rep"}"
done < <(some_command)

# or

some_command | while IFS= read -r line; do
    printf '%s\n' "${line//"$search"/"$rep"}"
done
If you want to do more processing than just a simple search/replace, this may be the best option. Note that the last example runs the loop in a SubShell. See Faq #24 for more information on that.
You may notice, however, that the bash loop above is very slow for large data sets. So how do we find something faster, that can replace literal strings? Well, you could use awk. The following function replaces all instances of STR with REP, reading from stdin and writing to stdout.
# usage: gsub_literal STR REP
# replaces all instances of STR with REP. reads from stdin and writes to stdout.
gsub_literal() {
  # STR cannot be empty
  [[ $1 ]] || return

  str=$1 rep=$2 awk '
    # get the length of the search string
    BEGIN {
      str = ENVIRON["str"]
      rep = ENVIRON["rep"]
      len = length(str);
    }

    {
      # empty the output string
      out = "";

      # continue looping while the search string is in the line
      while (i = index($0, str)) {
        # append everything up to the search string, and the replacement string
        out = out substr($0, 1, i-1) rep;

        # remove everything up to and including the first instance of the
        # search string from the line
        $0 = substr($0, i + len);
      }

      # append whatever is left
      out = out $0;

      print out;
    }
  '
}

some_command | gsub_literal "$search" "$rep"

# condensed as a one-liner:
some_command | s=$search r=$rep awk 'BEGIN {s=ENVIRON["s"]; r=ENVIRON["r"]; l=length(s)} {o=""; while (i=index($0, s)) {o=o substr($0,1,i-1) r; $0=substr($0,i+l)} print o $0}'
23. How can I calculate with floating point numbers instead of just integers?
BASH's builtin arithmetic uses integers only:
$ printf '%s\n' "$((10 / 3))"
3
For most operations involving non-integer numbers, an external program must be used, e.g. bc, AWK or dc:
$ printf 'scale=3; 10/3\n' | bc
3.333
The "scale=3" command notifies bc that three digits of precision after the decimal point are required.
Same example with dc (reverse polish calculator, lighter than bc):
$ printf '3 k 10 3 / p\n' | dc
3.333
k sets the precision to 3, and p prints the value of the top of the stack with a newline. The stack is not altered, though.
If you are trying to compare non-integer numbers (less-than or greater-than), and you have GNU bc, you can do this:
# Bash and GNU bc
if (( $(bc <<<'1.4 < 2.5') )); then
    printf '1.4 is less than 2.5.\n'
fi
However, x < y is not supported by all versions of bc:
# HP-UX 10.20.
imadev:~$ bc <<<'1 < 2'
syntax error on line 1,
If you want to be portable, you need something more subtle:
# POSIX
case $(printf '%s - %s\n' 1.4 2.5 | bc) in
    -*) printf '1.4 is less than 2.5\n' ;;
esac
This example subtracts 2.5 from 1.4, and checks the sign of the result. If it is negative, the first number is less than the second. We aren't actually treating bc's output as a number; we're treating it as a string, and only looking at the first character.
Legacy (Bourne) version:
# Bourne
case "`echo "1.4 - 2.5" | bc`" in
    -*) echo "1.4 is less than 2.5";;
esac
AWK can be used for calculations, too:
$ awk 'BEGIN {printf "%.3f\n", 10 / 3}'
3.333
There is a subtle but important difference between the bc and the awk solution here: bc reads commands and expressions from standard input. awk on the other hand evaluates the expression as part of the program. Expressions on standard input are not evaluated, i.e. echo 10/3 | awk '{print $0}' will print 10/3 instead of the evaluated result of the expression.
ksh93, zsh and yash have support for non-integers in shell arithmetic. zsh (in the zsh/mathfunc module) and ksh93 additionally have support for some C99 math.h functions sin() or cos() as well as user-defined math functions callable using C syntax. So many of these calculations can be done natively in ksh or zsh:
# ksh93/zsh/yash
$ LC_NUMERIC=C; printf '%s\n' "$((3.00000000000/7))"
0.428571428571428571
(ksh93 and yash are sensitive to locale. In ksh93, a dotted decimal used in locales where the decimal separator character is not dot will fail, like in German, Spanish, French locales... In yash, the locale's decimal radix is only honoured in the result of arithmetic expansions.).
Comparing two non-integer numbers for equality is potentially an unwise thing to do. Similar calculations that are mathematically equivalent and which you would expect to give the same result on paper may give ever-so-slightly-different non-integer numeric results due to rounding/truncation and other issues. If you wish to determine whether two non-integer numbers are "the same", you may either:
- Round them both to a desired level of precision, and then compare the rounded results for equality; or
- Subtract one from the other and compare the absolute value of the difference against an epsilon value of your choice.
Be sure to output adequate precision to fully express the actual value. Ideally, use hex float literals, which are supported by Bash.
$ ksh93 -c 'LC_NUMERIC=C printf "%-20s %f %.20f %a\n" "error accumulation:" .1+.1+.1+.1+.1+.1+.1+.1+.1+.1{,,} constant: 1.0{,,}'
error accumulation:  1.000000 1.00000000000000000011 0x1.0000000000000002000000000000p+0
constant:            1.000000 1.00000000000000000000 0x1.0000000000000000000000000000p+0
One of the very few things that Bash actually can do with non-integer numbers is round them, using printf:
# Bash 3.1
# See if a and b are close to each other.
# Round each one to two decimal places and compare results as strings.
a=3.002 b=2.998
printf -v a1 %.2f "$a"
printf -v b1 %.2f "$b"
if [[ $a1 = "$b1" ]]; then
    printf 'a and b are roughly the same\n'
fi
Many problems that look like non-integer arithmetic can in fact be solved using integers only, and thus do not require these tools -- e.g., problems dealing with rational numbers. For example, to check whether two numbers x and y are in a ratio of 4:3 or 16:9 you may use something along these lines:
# Bash
# Variables x and y are integers
if (( (x * 9 - y * 16) == 0 )); then
    printf '16:9.\n'
elif (( (x * 3 - y * 4) == 0 )); then
    printf '4:3.\n'
else
    printf 'Neither 16:9 nor 4:3.\n'
fi
A more elaborate test could tell if the ratio is closest to 4:3 or 16:9 without using non-integer arithmetic. Note that this very simple example that apparently involves non-integer numbers and division is solved with integers and no division. If possible, it's usually more efficient to convert your problem to integer arithmetic than to use non-integer arithmetic.
24. I want to launch an interactive shell that has special aliases and functions, not the ones in the user's ~/.bashrc.
When starting bash in non-POSIX mode, specify a different start-up file with --rcfile:
bash --rcfile /my/custom/bashrc
Or:
bash --rcfile <(printf %s 'my; commands; here')
Or:
~ $ bash --rcfile /dev/fd/9 -i 9<<<'cowsay moo'
 _____
< moo >
 -----
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
+bash-4.3$ exit
exit
For POSIX-compatible shells, use the ENV environment variable:
~ $ ( { ENV=/dev/fd/9 exec -a sh bash -i; } 9<<<'echo yo' )
yo
+sh-4.3$ exit
exit
Unfortunately, ENV only works in bash and zsh when executed in their respective POSIX modes. Confusingly, Bash also has BASH_ENV, which only works in non-posix mode, and only in non-interactive shells.
24.1. Variant question: "I have a script that sets up an environment, and I want to give the user control at the end of it."
Put exec bash at the end of it to launch an interactive shell. This shell will inherit the environment variables and open FDs but none of the shell's internal state such as functions or aliases, since the shell process is being replaced by a new instance. Of course, you must also make sure that your script runs in a terminal -- otherwise, you must create one, for example, by using exec xterm -e bash.
25. I set variables in a loop that's in a pipeline. Why do they disappear after the loop terminates? Or, why can't I pipe data to read?
In most shells, each command of a pipeline is executed in a separate SubShell. Non-working example:
# Works only in ksh88/ksh93, or zsh or bash 4.2 with lastpipe enabled
# In other shells, this will print 0
linecount=0

printf '%s\n' foo bar | while IFS= read -r line
do
    linecount=$((linecount + 1))
done

echo "total number of lines: $linecount"
The reason for this potentially surprising behaviour, as described above, is that each SubShell introduces a new variable context and environment. The while loop above is executed in a new subshell with its own copy of the variable linecount created with the initial value of '0' taken from the parent shell. This copy then is used for counting. When the while loop is finished, the subshell copy is discarded, and the original variable linecount of the parent (whose value hasn't changed) is used in the echo command.
Different shells exhibit different behaviors in this situation:
BourneShell creates a subshell when the input or output of anything (loops, case etc..) but a simple command is redirected, either by using a pipeline or by a redirection operator ('<', '>').
BASH, Yash and PDKsh-derived shells create a new process only if the loop is part of a pipeline.
KornShell and Zsh creates it only if the loop is part of a pipeline, but not if the loop is the last part of it. The read example above actually works in ksh88, ksh93, zsh! (but not MKsh or other PDKsh-derived shells)
POSIX specifies the bash behaviour, but as an extension allows any or all of the parts of the pipeline to run without a subshell (thus permitting the KornShell behaviour, as well).
More broken stuff:
# Bash 4
# The problem also occurs without a loop
printf '%s\n' foo bar | mapfile -t line
printf 'total number of lines: %s\n' "${#line[@]}"   # prints 0
f() {
    if [[ -t 0 ]]; then
        echo "$1"
    else
        read -r var
    fi
}

f 'hello' | f
echo "$var"   # prints nothing
Again, in both cases the pipeline causes read or some containing command to run in a subshell, so its effect is never witnessed in the parent process.
It should be stressed that this issue isn't specific to loops. It's a general property of all pipes, though the while/read loop might be considered the canonical example that crops up over and over when people read the help or manpage description of the read builtin and notice that it accepts data on stdin. They might recall that data redirected into a compound command is available throughout that command, but not understand why all the fancy process substitutions and redirects they run across in places like FAQ #1 are necessary. Naturally they proceed to put their funstuff directly into a pipeline, and confusion ensues.
25.1. Workarounds
- If the input is a file, a simple redirect will suffice:
# POSIX
while IFS= read -r line; do
    linecount=$((linecount + 1))
done < file
echo "$linecount"
Unfortunately, this doesn't work with a Bourne shell; see sh(1) from the Heirloom Bourne Shell for a workaround.
- Use command grouping and do everything in the subshell:
# POSIX
linecount=0

cat /etc/passwd | {
    while IFS= read -r line
    do
        linecount=$((linecount + 1))
    done

    echo "total number of lines: $linecount"
}
This doesn't really change the subshell situation, but if nothing from the subshell is needed in the rest of your code then destroying the local environment after you're through with it could be just what you want anyway.
- Use ProcessSubstitution (Bash/Zsh/Ksh93 only):
# Bash/Ksh93/Zsh
while IFS= read -r line
do
    ((linecount++))
done < <(grep PATH /etc/profile)

echo "total number of lines: $linecount"
This is essentially identical to the first workaround above. We still redirect a file, only this time the file happens to be a named pipe temporarily created by our process substitution to transport the output of grep.
- Use a named pipe:
# POSIX
mkfifo mypipe
grep PATH /etc/profile > mypipe &
while IFS= read -r line
do
    linecount=$((linecount + 1))
done < mypipe
echo "total number of lines: $linecount"
- Use a coprocess (ksh, even pdksh, oksh, mksh..):
# ksh
grep PATH /etc/profile |&
while IFS= read -r -p line
do
    linecount=$((linecount + 1))
done
echo "total number of lines: $linecount"
# bash>4
coproc grep PATH /etc/profile
while IFS= read -r line
do
    linecount=$((linecount + 1))
done <&"${COPROC[0]}"
echo "total number of lines: $linecount"
- Use a HereString (Bash/Zsh/Ksh93 only, though the example uses the Bash-specific read -a (Ksh93 and Zsh using read -A instead)):
# Options:
# -r Backslash does not act as an escape character for the word separators or line delimiter.
# -a The words are assigned to sequential indices of the array "words"
read -ra words <<< 'hi ho hum'
printf 'total number of words: %d\n' "${#words[@]}"
The <<< operator is available in Bash (2.05b and later), Zsh (where it was first introduced inspired from a similar operator in the Unix port of the rc shell), Ksh93 and Yash.
- With a POSIX shell, or for longer multi-line data, you can use a here document instead:
# POSIX
linecount=0
while IFS= read -r line; do
    linecount=$((linecount+1))
done <<EOF
hi
ho
hum
EOF
printf 'total number of lines: %d\n' "$linecount"
- Use lastpipe (Bash 4.2)
# Bash 4.2
# +m: Disable monitor mode (job control). Background processes display their
# exit status upon completion when in monitor mode (we don't want that).
set +m
shopt -s lastpipe

x=0
printf '%s\n' hi{,,,,,} | while IFS= read -r "lines[x++]"; do :; done
printf 'total number of lines: %d\n' "${#lines[@]}"
Bash 4.2 introduces the aforementioned ksh-like behavior to Bash. The one caveat is that job control must not be enabled, thereby limiting its usefulness in an interactive shell.
For more related examples of how to read input and break it into words, see FAQ #1.
26. How can I access positional parameters after $9?
Use ${10} instead of $10. This works for BASH and KornShell, but not for older BourneShell implementations. Another way to access arbitrary positional parameters after $9 is to use for, e.g. to get the last parameter:
# Bourne
for last
do
    : # nothing
done

echo "last argument is: $last"
To get an argument by number, we can use a counter:
# Bourne
n=12        # This is the number of the argument we are interested in
i=1
for arg
do
    if test $i -eq $n
    then
        argn=$arg
        break
    fi
    i=`expr $i + 1`
done
echo "argument number $n is: $argn"
This has the advantage of not "consuming" the arguments. If this is no problem, the shift command discards the first positional arguments:
shift 11
echo "the 12th argument is: $1"
and can be put into a helpful function:
# Bourne
getarg() { # $1 is argno
    shift $1 && echo "$1"
}
arg12="`getarg 12 "$@"`"
In addition, bash and ksh93 treat the set of positional parameters as an array, and you may use parameter expansion syntax to address those elements in a variety of ways:
# Bash, ksh93
for x in "${@:(-2)}"    # iterate over the last 2 parameters
for y in "${@:2}"       # iterate over all parameters starting at $2
                        # which may be useful if we don't want to shift
Although direct access to any positional argument is possible this way, it's seldom needed. The common alternative is to use getopts to process options (e.g. "-l", or "-o filename"), and then use either for or while to process all the remaining arguments in turn. An explanation of how to process command line arguments is available in FAQ #35, and another is found at http://www.shelldorado.com/goodcoding/cmdargs.html
27. How can I randomize (shuffle) the order of lines in a file? Or select a random line from a file, or select a random file from a directory?
To randomize the lines of a file, here is one approach. This one involves generating a random number, which is prefixed to each line; then sorting the resulting lines, and removing the numbers.
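For example, in Bash or Ksh, a sketch of that approach using the shell's RANDOM variable (the original listing is not reproduced here; $file is assumed to hold the file name):

# Bash/Ksh: prefix each line with a random number, sort, then strip the numbers
while IFS= read -r line; do
    printf '%d\t%s\n' "$RANDOM" "$line"
done < "$file" | sort -n | cut -f2-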
RANDOM is supported by BASH and KornShell, but is not defined by POSIX.
Here's the same idea (printing random numbers in front of a line, and sorting the lines on that column) using other programs:
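For instance, a sketch using awk (assumes an awk with rand() and srand(); $file is again an assumption):

awk 'BEGIN { srand() } { print rand() "\t" $0 }' "$file" | sort -n | cut -f2-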
This is (possibly) faster than the previous solution, but will not work for very old AWK implementations (try nawk, or gawk, or /usr/xpg4/bin/awk if available). (Note that AWK uses the epoch time as a seed for srand(), which may or may not be random enough for you.)
Other non-portable utilities that can shuffle/randomize a file:
GNU shuf (in recent enough GNU coreutils)
GNU sort -R (coreutils 6.9)
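For example, typical invocations look like this (a sketch; note that sort -R sorts by a random hash of the key, so duplicate lines end up adjacent):

shuf "$file"       # GNU shuf: print the lines of the file in random order
sort -R "$file"    # GNU sort -R: random-hash sort; duplicate lines stay together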
For more details, please see their manuals.
27.1. Shuffling an array
A generalized version of this question might be, How can I shuffle the elements of an array? If we don't want to use the rather clumsy approach of sorting lines, this is actually more complex than it appears. A naive approach would give us badly biased results. A more complex (and correct) algorithm looks like this:
# Uses a global array variable. Must be compact (not a sparse array).
# Bash syntax.
shuffle() {
    local i tmp size max rand

    size=${#array[@]}
    for ((i=size-1; i>0; i--)); do
        # RANDOM % (i+1) is biased because of the limited range of $RANDOM
        # Compensate by using a range which is a multiple of the rand modulus.

        max=$(( 32768 / (i+1) * (i+1) ))
        while (( (rand=RANDOM) >= max )); do :; done
        rand=$(( rand % (i+1) ))
        tmp=${array[i]} array[i]=${array[rand]} array[rand]=$tmp
    done
}
This function shuffles the elements of an array in-place using the Knuth-Fisher-Yates shuffle algorithm.
If we just want the unbiased random number picking function, we can split that out separately:
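A minimal sketch of such a function, extracted from the shuffle above (by assumption here, the result is returned in the global variable r):

# Bash
# rand N: set the global variable r to an unbiased random integer, 0 <= r < N.
# Assumes 0 < N <= 32768 (the range of $RANDOM).
rand() {
    local max=$((32768 / $1 * $1))
    while (( (r=RANDOM) >= max )); do :; done
    r=$(( r % $1 ))
}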
This rand function is better than using $((RANDOM % n)). For simplicity, many of the remaining examples on this page may use the modulus approach. In all such cases, switching to the use of the rand function will give better results; this improvement is left as an exercise for the reader.
27.2. Selecting a random line/file
Another question we frequently see is, How can I print a random line from a file?
There are two main approaches to this:
- Count the number of lines n, select a random integer r from 1 to n, and print line r.
- Read line by line, selecting lines with a varying probability as we go along.
27.2.1. With counting lines first
The simpler approach is to count lines first.
# Bash
n=$(wc -l <"$file")        # Count number of lines.
r=$((RANDOM % n + 1))      # Random number from 1..n (see warnings above!)
sed -n "$r{p;q;}" "$file"  # Print the r'th line.

# POSIX with (new) AWK
awk -v n="$(wc -l <"$file")" \
  'BEGIN{srand();l=int((rand()*n)+1)} NR==l{print;exit}' "$file"
(See FAQ 11 for more info about printing the r'th line.)
The next example sucks the entire file into memory. This approach saves time rereading the file, but obviously uses more memory. (Arguably: on systems with sufficient memory and an effective disk cache, you've read the file into memory by the earlier methods, unless there's insufficient memory to do so, in which case you shouldn't, QED.)
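A sketch of that in-memory approach using Bash 4's mapfile (the variable names are illustrative):

# Bash 4
mapfile -t lines < "$file"        # slurp the whole file into an array
r=$((RANDOM % ${#lines[@]}))      # 0-based index (bias warnings apply)
printf '%s\n' "${lines[r]}"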
Note that we don't add 1 to the random number in this example, because the array of lines is indexed counting from 0.
Also, some people want to choose a random file from a directory (for a signature on an e-mail, or to choose a random song to play, or a random image to display, etc.). A similar technique can be used:
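For example (a sketch; the glob and the directory are placeholders for whatever collection of files you have):

# Bash
files=(/path/to/music/*.ogg)                    # build an array of candidate files
n=${#files[@]}
printf 'Chosen: %s\n' "${files[RANDOM % n]}"    # pick one (bias warnings apply)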
27.2.2. Without counting lines first
If you happen to have GNU shuf you can use that, but it is not portable.
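For example, printing a single random line with GNU shuf is a one-liner:

shuf -n 1 "$file"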
Without shuf, we have to write some code ourselves. If we want n random lines we need to:
- accept the first n lines
- accept each further line with probability n/nl where nl is the number of lines read so far
- if we accepted the line in step 2, replace a random one of the n lines we already have
# WARNING: srand() without an argument seeds using the current time accurate to the second.
# If run more than once in a single second on the clock you will get the same output.
# Find a better way to seed this.

n=$1
shift

awk -v n="$n" '
  BEGIN { srand() }
  NR <= n { lines[NR - 1] = $0; next }
  rand() < n / NR { lines[int(rand() * n)] = $0 }
  END { for (k in lines) print lines[k] }
' "$@"
Bash and POSIX sh solutions forthcoming.
27.3. Known bugs
http://lists.gnu.org/archive/html/bug-bash/2010-01/msg00042.html points out a surprising pitfall concerning the use of RANDOM without a leading $ in certain mathematical contexts. (Upshot: you should prefer n=$((...math...)); ((array[n]++)) over ((array[...math...]++)) in almost every case.)
- Behavior described appears reversed in current versions of mksh, ksh93, Bash, and Zsh. Still something to keep in mind for legacy. -ormaaj
27.4. Using external random data sources
Some people feel the shell's builtin RANDOM parameter is not sufficiently random for their applications. Typically this will be an interface to the C library's rand(3) function, although the Bash manual does not specify the implementation details. Some people feel their application requires cryptographically stronger random data, which would have to be supplied by some external source.
Before we explore this, we should point out that often people want to do this as a first step in writing some sort of random password generator. If that is your goal, you should at least consider using a password generator that has already been written, such as pwgen.
Now, if we're considering the use of external random data sources in a Bash script, we face several issues:
- The data source will probably not be portable. Thus, the script will only be usable in special environments.
- If we simply grab a byte (or a single group of bytes large enough to span the desired range) from the data source and do a modulus on it, we will run into the bias issue described earlier on this page. There is absolutely no point in using an expensive external data source if we're just going to bias the results with sloppy code! To work around that, we may need to grab bytes (or groups of bytes) repeatedly, until we get one that can be used without bias.
- Bash can't handle raw bytes very well, so each time we grab a byte (or group) we need to do something to it to turn it into a number that Bash can read. This may be an expensive operation. So, it may be more efficient to grab several bytes (or groups), and do the conversion to readable numbers, all at once.
- Depending on the data source, these random bytes may be precious, so grabbing a lot of them all at once and discarding ones we don't use might be more expensive (by whatever metric we're using to measure such costs) than grabbing one byte at a time, even counting the conversion step. This is something you'll have to decide for yourself, taking into account the needs of your application, and the nature of your data source.
At this point you should be seriously rethinking your decision to do this in Bash. Other languages already have features that take care of all these issues for you, and you may be much better off writing your application in one of those languages instead.
You're still here? OK. Let's suppose we're going to use the /dev/urandom device (found on most Linux and BSD systems) as an external random data source in Bash. This is a character device which produces raw bytes of "pretty random" data. First, we'll note that the script will only work on systems where this is present. In fact, you should add an explicit check for this device somewhere early in the script, and abort if it's not found.
Now, how can we turn these bytes of data into numbers for Bash? If we attempt to read a byte into a variable, a NUL byte would give us a variable which appears to be empty. However, since no other input gives us that result, this may be acceptable -- an empty variable means we tried to read a NUL. We can work with this. The good news is we won't have to fork an od(1) or any other external program to read bytes. Then, since we're reading one byte at a time, this also means we don't have to write any prefetching or buffering code to save forks.
One other gotcha, however: reading bytes only works in the C locale. If we try this in en_US.utf8 we get an empty variable for every byte from 128 to 255, which is clearly no good.
So, let's put this all together and see what we've got:
#!/usr/bin/env bash
# Requires Bash 3.1 or higher, and an OS with /dev/urandom (Linux, BSD, etc.)

export LANG=C
if [[ ! -e /dev/urandom ]]; then
    echo "No /dev/urandom on this system" >&2
    exit 1
fi

# Return an unbiased random number from 0 to ($1 - 1) in variable 'r'.
rand() {
    if (($1 > 256)); then
        echo "Argument larger than 256 currently unsupported" >&2
        r=-1
        return 1
    fi

    local max=$((256 / $1 * $1))
    while IFS= read -r -n1 -d '' r < /dev/urandom
          printf -v r %d "'$r"
          ((r >= max))
    do
        :
    done
    r=$((r % $1))
}
This uses a trick from FAQ 71 for converting bytes to numbers. When the variable populated by read is empty (because of a NUL byte), we get 0, which is just what we want.
Extending this to handle ranges larger than 0..255 is left as an exercise for the reader.
27.4.1. Awk as a source of seeded pseudorandom numbers
Sometimes we don't actually want truly random numbers. In some applications, we want a reproducible stream of pseudorandom numbers. We achieve this by using a pseudorandom number generator (PRNG) and "seeding" it with a known value. The PRNG then produces the same stream of output numbers each time, for that seed.
Bash's RANDOM works this way (assigning to it seeds the PRNG that bash uses internally), but for this example, we're going to use awk instead. Awk's rand() function returns a floating point value, so we don't run into the biasing issue that we have with bash's RANDOM. Also, awk is much faster than bash, so really it's just the better choice.
For this example, we'll set up the awk as a background process using ProcessSubstitution. We will read from it by maintaining an open FileDescriptor connected to awk's standard output.
# Bash
list=(ox zebra cat buffalo giraffe salamander)
n=${#list[@]}
exec 3< <( awk -v seed=31337 -v n="$n" \
    'BEGIN {srand(seed); while (1) {print int(n*rand())}}' )

# Print a "random" list element every second, using the background awk stream
# as a seeded PRNG.
while true; do
    read -r r <&3
    printf %s\\n "${list[r]}"
    sleep 1
done
Each time the program is run, the same results will be printed. Changing the seed value will change the results.
If you don't want the same results every time, you can change srand(seed); to srand(); in the awk program. Awk will then seed its PRNG using the current epoch timestamp.
28. How can two unrelated processes communicate?
Two unrelated processes cannot use the arguments, the environment or stdin/stdout to communicate; some form of inter-process communication (IPC) is required.
28.1. A file
Process A writes to a file, and Process B reads it. This method is not synchronized, and is therefore not safe if B can read the file while A is writing to it. A lockdir or a signal can probably help.
28.2. A directory as a lock
mkdir can be used to test for the existence of a dir and create it in one atomic operation; it thus can be used as a lock, although not a very efficient one.
Script A:
until mkdir /tmp/dir; do     # wait until we can create the dir
    sleep 1
done
echo foo > file              # write to the file; this section is critical
rmdir /tmp/dir               # remove the lock
Script B:
until mkdir /tmp/dir; do     # wait until we can create the dir
    sleep 1
done
read var < file              # read the file; this section is critical:
echo "$var"                  # Script A cannot write to the file
rmdir /tmp/dir               # remove the lock
See Faq #45 and mutex for more examples with a lock directory.
28.3. Signals
Signals are probably the simplest form of IPC:
ScriptA:
trap 'flag=go' USR1               # set up the signal handler for the USR1 signal
# echo $$ > /tmp/ScriptA.pid      # if we want to save the pid in a file

flag=""
while [[ $flag != go ]]; do       # wait for the green light from Script B
    sleep 1
done
echo we received the signal
You must find or know the pid of the other script to send it a signal using kill:
# kill all the ScriptA processes
pkill -USR1 -f ScriptA

# if ScriptA saved its pid in a file
kill -USR1 $(</var/run/ScriptA.pid)

# if ScriptA is a child:
ScriptA & pid=$!
kill -USR1 $pid
The first two methods are not bulletproof and will cause trouble if you run more than one instance of ScriptA.
28.4. Named Pipes
Named pipes are a much richer form of IPC. They are described on their own page: NamedPipes.
29. How do I determine the location of my script? I want to read some config files from the same place.
There are two prime reasons why this issue comes up: either you want to externalize data or configuration of your script and need a way to find these external resources, or your script is intended to act upon a bundle of some sort (eg. a build script), and needs to find the resources to act upon.
It is important to realize that in the general case, this problem has no solution. Any approach you might have heard of, and any approach that will be detailed below, has flaws and will only work in specific cases. First and foremost, try to avoid the problem entirely by not depending on the location of your script!
Before we dive into solutions, let's clear up some misunderstandings. It is important to understand that:
Your script does not actually have a location! Wherever the bytes end up coming from, there is no "one canonical path" for it. Never.
$0 is NOT the answer to your problem. If you think it is, you can either stop reading and write more bugs, or you can accept this and read on.
29.1. I need to access my data/config files
Very often, people want to make their scripts configurable. The separation principle teaches us that it's a good idea to keep configuration and code separate. The problem then ends up being: how does my script know where to find the user's configuration file for it?
Too often, people believe the configuration of a script should reside in the same directory where they put their script. This is the root of the problem.
A UNIX paradigm exists to solve this problem for you: configuration artifacts of your scripts should exist in either the user's HOME directory or /etc. That gives your script an absolute path to look for the file, solving your problem instantly: you no longer depend on the "location" of your script:
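A minimal sketch of that convention (the configuration file names used here are just examples):

# Bash; ~/.myscript.conf and /etc/myscript.conf are hypothetical names
if [[ -r ~/.myscript.conf ]]; then
    . ~/.myscript.conf
elif [[ -r /etc/myscript.conf ]]; then
    . /etc/myscript.conf
fi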
The same holds true for other types of data files. Logs should be written to /var/log or the user's home directory. Support files should be installed to an absolute path in the file system or be made available alongside the configuration in /etc or the user's home directory.
29.2. I need to access files bundled with my script
Sometimes scripts are part of a "bundle" and perform certain actions within or upon it. This is often true for applications unpacked or contained within a bundle directory. The user may unpack or install the bundle anywhere; ideally, the bundle's scripts should work whether that's somewhere in a home dir, or /var/tmp, or /usr/local. The files are transient, and have no fixed or predictable location.
When a script needs to act upon other files it's bundled with, independently of its absolute location, we have two options: either we rely on PWD or we rely on BASH_SOURCE. Both approaches have certain issues; here's what you need to know.
29.2.1. Using BASH_SOURCE
The BASH_SOURCE internal bash variable is actually an array of pathnames. If you expand it as a simple string, e.g. "$BASH_SOURCE", you'll get the first element, which is the pathname of the currently executing function or script. Using the BASH_SOURCE method, you access files within your bundle like this:
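For instance, a sketch of that method (mydata.txt is a hypothetical bundled file):

# Bash
cd "$(dirname -- "$BASH_SOURCE")" || exit 1   # cd into the directory holding this script
cat ./mydata.txt                              # then use paths relative to the bundle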
Please note that when using BASH_SOURCE, the following caveats apply:
$BASH_SOURCE expands empty when bash does not know where the executing code comes from. Usually, this means the code is coming from standard input (e.g. ssh host 'somecode', or from an interactive session).
$BASH_SOURCE does not follow symlinks (when you run z from /x/y, you get /x/y/z, even if that is a symlink to /p/q/r). Often, this is the desired effect. Sometimes, though, it's not. Imagine your package links its start-up script into /usr/local/bin. Now that script's BASH_SOURCE will lead you into /usr/local and not into the package.
If you're not writing a bash script, the BASH_SOURCE variable is unavailable to you. There is a common convention, however, for passing the location of the script as the process name when it is started. Most shells do this, but not all shells do so reliably, and not all of them attempt to resolve a relative path to an absolute path. Relying on this behaviour is dangerous and fragile, but can be done by looking at $0 (see below). Again, consider all your options before doing this: you are likely creating more problems than you are solving.
29.2.2. Using PWD
Another option is to rely on PWD, the current working directory. In this case, you can assume the user has first cd'ed into your bundle and make all your pathnames relative. Using the PWD method, you access files within your bundle like this:
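For instance, a sketch of that method (it assumes the user has already cd'ed into the bundle; mydata.txt is a hypothetical bundled file):

# Bash
cat ./mydata.txt     # relative to $PWD, i.e. wherever the user cd'ed to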
To reduce fragility, you could even test whether, for example, the relative path to the script name is correct, to make sure the user has indeed cd'ed into the bundle:
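A sketch of such a check (bin/myscript is a hypothetical in-bundle path of the script itself):

# Bash
if [[ ! -e bin/myscript ]]; then
    echo "Please cd into the bundle before running this script." >&2
    exit 1
fi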
You can also try some heuristics, just in case the user is sitting one directory above the bundle:
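For example (a sketch; mybundle and bin/myscript are hypothetical names):

# Bash
if [[ ! -e bin/myscript && -e mybundle/bin/myscript ]]; then
    cd mybundle || exit 1
fi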
If you ever do need an absolute path, you can always get one by prefixing the relative path with $PWD: echo "Saved to: $PWD/result.csv"
The only difficulty here is that you're forcing your user to change into your bundle's directory before your script can function. Regardless, this may well be your best option!
29.2.3. Using a configuration/wrapper
If neither the BASH_SOURCE nor the PWD option sounds appealing, you might want to consider going the route of configuration files instead (see the previous section). In this case, you require that the user set the location of your bundle in a configuration file, and have them put that configuration file in a location you can easily find. For example:
[[ -e ~/.myscript.conf ]] || {
    echo >&2 "First configure the product in ~/.myscript.conf"
    exit 1
}

# ~/.myscript.conf defines something like bundleDir=/x/y
source ~/.myscript.conf

[[ $bundleDir ]] || {
    echo >&2 "Please define bundleDir='/some/path' in ~/.myscript.conf"
    exit 1
}

cd "$bundleDir" || {
    printf >&2 'Could not cd to <%s>\n' "$bundleDir"
    exit 1
}

# Now you can use the PWD method: use relative paths.
A variant of this option is to use a wrapper that configures your bundle's location. Instead of calling your bundled script, you install a wrapper for your script in the standard system PATH, which changes directory into the bundle and calls the real script from there, which can then safely use the PWD method from above:
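A sketch of such a wrapper, installed somewhere in the system PATH (the paths are illustrative assumptions):

#!/usr/bin/env bash
# Hypothetical wrapper, e.g. installed as /usr/local/bin/myscript.
cd /opt/mybundle || exit 1     # assumed install location of the bundle
exec ./bin/myscript "$@"       # the real script can now use relative paths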
29.3. Why $0 is NOT an option
Common ways of finding a script's location depend on the name of the script, as seen in the predefined variable $0. Unfortunately, providing the script name via $0 is only a (common) convention, not a requirement. In fact, $0 is not at all the location of your script, it's the name of your process as determined by your parent. It can be anything.
A commonly suggested (but suspect) answer is "in some shells, $0 is always an absolute path, even if you invoke the script using a relative path, or no path at all". But this isn't reliable across shells; some of them (including BASH) report the actual command typed in by the user instead of the fully qualified path. And this is just the tip of the iceberg!
Consider that your script may not actually be on a locally accessible disk at all. Consider this:
ssh remotehost bash < ./myscript
The shell running on remotehost is getting its commands from a pipe. There's no script anywhere on any disk that bash can see.
Moreover, even if your script is stored on a local disk and executed, it could move. Someone could mv the script to another location in between the time you type the command and the time your script checks $0. Or someone could have unlinked the script during that same time window, so that it doesn't actually have a link within a file system any more.
(That may sound fanciful, but it's actually very common. Consider a script installed in /opt/foobar/bin, which is running at the time someone upgrades foobar to a new version. They may delete the entire /opt/foobar/ hierarchy, or they may move the /opt/foobar/bin/foobar script to a temporary name before putting a new version in place. For these reasons, even approaches like "use lsof to find the file which the shell is using as standard input" will still fail.)
Even in the cases where the script is in a fixed location on a local disk, the $0 approach still has some major drawbacks. The most important is that the script name (as seen in $0) may not be relative to the current working directory, but relative to a directory from the program search path $PATH (this is often seen with KornShell). Or (and this is the most likely problem by far) there might be multiple links to the script from multiple locations, one of them being a simple symlink from a common PATH directory like /usr/local/bin, which is how it's being invoked. Your script might be in /opt/foobar/bin/script but the naive approach of reading $0 won't tell you that -- it may say /usr/local/bin/script instead.
Some people will try to work around the symlink issue with readlink -f "$0". Again, this may work in some cases, but it's not bulletproof. Nothing that reads $0 will ever be bulletproof, because $0 itself is unreliable. Furthermore, readlink is nonstandard, and won't be available on all platforms.
For a more general discussion of the Unix file system and how symbolic links affect your ability to know where you are at any given moment, see this Plan 9 paper.
30. How can I display the target of a symbolic link?
The nonstandard external command readlink(1) can be used to display the target of a symbolic link:
$ readlink /bin/sh
bash
If you don't have readlink, you can use Perl:
perl -le 'print readlink "/bin/sh"'
You can also use GNU find's -printf %l directive, which is especially useful if you need to resolve links in batches:
$ find /bin/ -type l -printf '%p points to %l\n'
/bin/sh points to bash
/bin/bunzip2 points to bzip2
...
If your system lacks both readlink and Perl, you can use a function like this one:
# Bash
readlink() {
    local path=$1 ll

    if [ -L "$path" ]; then
        ll=$(LC_ALL=C ls -ld -- "$path" 2>/dev/null) &&
            printf '%s\n' "${ll#* -> }"
    else
        return 1
    fi
}
However, this can fail if a symbolic link contains " -> " in its name.
31. How can I rename all my *.foo files to *.bar, or convert spaces to underscores, or convert upper-case file names to lower case?
There are a bunch of different ways to do this, depending on which nonstandard tools you have available. Even with just standard POSIX tools, you can still perform most of the simple cases. We'll show the portable tool examples first.
You can do most non-recursive mass renames with a loop and some Parameter Expansions, like this:
# POSIX
# Rename all *.foo to *.bar
for f in *.foo; do mv -- "$f" "${f%.foo}.bar"; done
To check what the command would do without actually doing it, you can add an echo before the mv. This applies to almost(?) every example on this page, so we won't mention it again.
# POSIX
# This removes the extension .zip from all the files.
for file in ./*.zip; do mv "$file" "${file%.zip}"; done
The "--" and "./*" are to protect from problematic filenames that begin with "-". You only need one or the other, not both, so pick your favorite.
Here are some similar examples, using Bash-specific parameter expansions:
# Bash
# Replace all spaces with underscores
for f in *\ *; do mv -- "$f" "${f// /_}"; done
For more techniques on dealing with files with inconvenient characters in their names, see FAQ #20.
# Bash # Replace "foo" with "bar", even if it's not the extension for file in ./*foo*; do mv "$file" "${file//foo/bar}"; done
All the above examples invoke the external command mv(1) once for each file, so they may not be as efficient as some of the nonstandard implementations.
31.1. Recursively
If you want to rename files recursively, then it becomes much more challenging. This example renames *.foo to *.bar:
# Bash
# Also requires GNU or BSD find(1)
# Recursively change all *.foo files to *.bar
find . -type f -name '*.foo' -print0 | while IFS= read -r -d '' f; do
    mv -- "$f" "${f%.foo}.bar"
done
This example uses Bash 4's globstar instead of GNU find:
# Bash 4
# Replace "foo" with "bar" in all files recursively.
# "foo" must NOT appear in a directory name!
shopt -s globstar
for file in /path/to/**/*foo*; do
    mv -- "$file" "${file//foo/bar}"
done
The trickiest part of recursive renames is ensuring that you do not change the directory component of a pathname, because something like this is doomed to failure:
mv "./FOO/BAR/FILE.TXT" "./foo/bar/file.txt"
Therefore, any recursive renaming command should only change the filename component of each pathname, like this:
mv "./FOO/BAR/FILE.TXT" "./FOO/BAR/file.txt"
If you need to rename the directories as well, those should be done separately. Furthermore, recursive directory renaming should either be done depth-first (changing only the last component of the directory name in each instance), or in several passes. Depth-first works better in the general case.
Here's an example script that uses depth-first recursion (changes spaces in names to underscores, but you just need to change the ren() function to do anything you want) to rename both files and directories. Again, it's easy to modify to make it act only on files or only on directories, or to act only on files with a certain extension, to avoid or force overwriting files, etc.:
# Bash
ren() {
    local newname
    newname=${1// /_}
    [[ $1 != "$newname" ]] && mv -- "$1" "$newname"
}

traverse() {
    local file
    cd -- "$1" || exit

    for file in *; do
        [[ -d $file ]] && traverse "$file"
        ren "$file"
    done

    cd .. || exit
}

# main program
shopt -s nullglob dotglob
traverse /path/to/startdir
Here is another way to recursively rename all directories and files with spaces in their names, UsingFind:
find . -depth -name "* *" -exec bash -c 'dir=${1%/*} base=${1##*/}; mv "$1" "$dir/${base// /_}"' _ {} \;
or, if your version of find accepts it, this is more efficient as it runs one bash for many files instead of one bash per file:
find . -depth -name "* *" -exec bash -c 'for f; do dir=${f%/*} base=${f##*/}; mv "$f" "$dir/${base// /_}"; done' _ {} +
31.2. Upper- and lower-case
To convert filenames to lower-case with only standard tools, you need something that can take a mixed-case filename as input and give back the lowercase version as output. In Bash 4 and higher, there is a parameter expansion that can do it:
# Bash 4 for f in *[[:upper:]]*; do mv -- "$f" "${f,,}"; done
Otherwise, tr(1) may be helpful:
# tolower - convert file names to lower case
# POSIX
for file
do
    [ -f "$file" ] || continue            # ignore non-existing names
    newname=$(printf %s "$file" | tr '[:upper:]' '[:lower:]')   # lower case
    [ "$file" = "$newname" ] && continue  # nothing to do
    [ -f "$newname" ] && continue         # don't overwrite existing files
    mv -- "$file" "$newname"
done
This example will not handle filenames that end with newlines, because the CommandSubstitution will eat them. The workaround for that is to append a character in the command substitution, and remove it afterward. Thus:
newname=$(printf %sx "$file" | tr '[:upper:]' '[:lower:]')
newname=${newname%x}
We use the fancy range notation, because tr can behave very strangely when using the A-Z range on some locales:
imadev:~$ echo Hello | tr A-Z a-z
hÉMMÓ
To make sure you aren't caught by surprise when using tr with ranges, either use the fancy range notations, or set your locale to C.
imadev:~$ echo Hello | LC_ALL=C tr A-Z a-z
hello
imadev:~$ echo Hello | tr '[:upper:]' '[:lower:]'
hello
# Either way is fine here.
Note that GNU tr doesn't support multi-byte characters (like non-ASCII UTF-8 ones). So on GNU systems, you may prefer:
# GNU
sed 's/.*/\L&/g'

# POSIX
awk '{print tolower($0)}'
This technique can also be used to replace all unwanted characters in a file name, e.g. with '_' (underscore). The script is the same as above, with only the "newname=..." line changed.
# renamefiles - rename files whose names contain unusual characters
# POSIX
for file
do
    [ -f "$file" ] || continue            # ignore non-regular files, etc.
    newname=$(printf '%s\n' "$file" | sed 's/[^[:alnum:]_.]/_/g' | paste -sd _ -)
    [ "$file" = "$newname" ] && continue  # nothing to do
    [ -f "$newname" ] && continue         # do not overwrite existing files
    mv -- "$file" "$newname"
done
The character class in [] contains all the characters we want to keep (after the ^); modify it as needed. The [:alnum:] range stands for all the letters and digits of the current locale. Note however that it will not replace bytes that don't form valid characters (like characters encoded in the wrong character set).
Here's an example that does the same thing, but this time using Parameter Expansion instead of sed:
# renamefiles (more efficient, less portable version)
# Bash/Ksh/Zsh
for file
do
    [[ -f $file ]] || continue
    newname=${file//[![:alnum:]_.]/_}
    [[ $file = "$newname" ]] && continue
    [[ -e $newname ]] && continue
    [[ -L $newname ]] && continue
    mv -- "$file" "$newname"
done
It should be noted that all these examples contain a race condition -- an existing file could be overwritten if it is created in between the [ -e "$newname" ] ... test and the mv "$file" ... command. Solving this issue is beyond the scope of this page; however, adding the -i and (GNU-specific) -T options to mv can reduce its impact.
One final note about changing the case of filenames: when using GNU mv, on many file systems, attempting to rename a file to its lowercase or uppercase equivalent will fail. (This applies to Cygwin on DOS/Windows systems using FAT or NTFS file systems; to GNU mv on Mac OS X systems using HFS+ in case-insensitive mode; as well as to Linux systems which have mounted Windows/Mac file systems, and possibly many other setups.) GNU mv checks the destination name before attempting the rename, and because of the file system's case-insensitive mapping it thinks the destination "already exists":
mv README Readme # fails with GNU mv on FAT file systems, etc.
The workaround for this is to rename the file twice: first to a temporary name which is completely different from the original name, then to the desired name.
mv README tempfilename && mv tempfilename Readme
31.3. Nonstandard tools
To convert filenames to lower case, if you have the utility mmv(1) on your machine, you could simply do:
# convert all filenames to lowercase
mmv "*" "#l1"
Some GNU/Linux distributions have a rename(1) command; however, the syntax differs from one distribution to the next. Debian uses the perl rename script (formerly included with Perl; now it is not), which it installs as prename(1) and rename(1). Red Hat uses a totally different rename(1) command.
The prename script is extremely flexible. For example, it can be used to change files to lower-case:
# convert all filenames to lowercase
prename '$_=lc($_)' ./*
Alternatively, you can also use:
# convert all filenames to lowercase
prename 'y/A-Z/a-z/' ./*
For prename to use Unicode instead of ASCII for files encoded in UTF-8:
# convert all filenames to lowercase using Unicode rules
PERL_UNICODE=SA rename '$_=lc' ./*
To assume the current locale charset for filenames:
rename 'BEGIN{use Encode::Locale qw(decode_argv);decode_argv} $_=lc'
(note that it still doesn't use the locale's rules for case conversion. For instance, in a Turkish locale, I would be converted to i, not ı).
Or recursively:
# convert all filenames to lowercase, recursively (assumes a find
# implementation with support for the non-standard -execdir predicate)
#
# Note: this will not change directory names. That's because -execdir
# cd's to the parent directory before running the command. That means
# however that (despite the +), one prename command is executed for
# each file to rename.
find . -type f -name '*[[:upper:]]*' -execdir prename '$_=lc($_)' {} +
A more efficient and portable approach:
find . -type f -name '*[[:upper:]]*' -exec prename 's{[^/]*$}{lc($&)}e' {} +
Or to replace all underscores with spaces:
prename 's/_/ /g' ./*_*
To rename files interactively using $EDITOR (from moreutils):
vidir
Or recursively:
find . -type f | vidir -
(Note: vidir cannot handle filenames that contain newline characters.)
32. What is the difference between test, [ and [[ ?
The open square bracket [ command (aka the test command) and the [[ ... ]] test construct are used to evaluate expressions. [ and test are POSIX utilities (generally builtin). The [[ ... ]] construct works only in the Korn shell (where it originated), Bash, Zsh, and recent versions of Yash and busybox sh (if enabled at compile time, and still very limited there, especially in the hush-based variant), and is more powerful. POSIX doesn't specify the [[ ... ]] construct, whose syntax varies significantly between implementations, though it does allow shells to treat [[ as a keyword. Here are some examples: