Differences between revisions 112 and 190 (spanning 78 versions)
Revision 112 as of 2009-01-20 03:46:34
Size: 24661
Editor: GreyCat
Comment: spam
Revision 190 as of 2019-05-17 14:24:09
Size: 37274
Editor: c-68-49-79-197
Comment: Mention/link -exec vs -execdir security implications.
Line 7: Line 7:
The very first thing you should do before you proceed any further is actually ''read'' your system's man page for the {{{find}}} command. You don't have to memorize it, or understand ''every'' part, but you should at least have looked at all the different parts of it once, so you have a general idea what's going on. Then, you might want to look at the [[http://www.openbsd.org/cgi-bin/man.cgi?query=find&apropos=0&sektion=1&manpath=OpenBSD+Current&arch=i386&format=html|OpenBSD man page]] for comparison. Sometimes, you'll understand one man page more than another. Just realize that not all of the implementations are the same; the OpenBSD one may have features that yours lacks, and ''vice versa''. But many people find the BSD man pages easy to read, so it might help you with the concepts. The very first thing you should do before you proceed any further is actually ''read'' your system's man page for the {{{find}}} command. You don't have to memorize it, or understand ''every'' part, but you should at least have looked at all the different parts of it once, so you have a general idea what's going on. Then, you might want to look at the [[http://man.openbsd.org/find.1|OpenBSD man page]] for comparison. Sometimes, you'll understand one man page more than another. Just realize that not all of the implementations are the same; the OpenBSD one may have features that yours lacks, and ''vice versa''. But many people find the BSD man pages easy to read, so it might help you with the concepts.
Line 17: Line 17:
Line 25: Line 26:
./apple.txt}}} ./apple.txt
}}}
Line 44: Line 46:
./bar/foo.jpg}}}
In this case, only one file matched our criteria, so we only got one line of output. If we ''also'' wanted to find all the files ending with {{{.mp3}}}:
./bar/foo.jpg
}}}
In this case, only one file matched our criteria, so we only got one line of output. Note that `find` uses [[glob|globs]] to express filename-matching patterns. Note also that we had to [[Quotes|quote]] the glob in order to prevent the shell from expanding it. We want `find` to get it without expansion, so that `find` can apply it against each filename it discovers.

If we ''also'' wanted to find all the files ending with {{{.mp3}}}:
Line 50: Line 55:
./foo.mp3}}} ./foo.mp3
}}}
Line 58: Line 64:
We enclosed the whole "or" expression in parentheses, because we want it to be treated as a single unit, especially when we start chaining it together with other expressions (as we'll do shortly). The parentheses themselves have to be passed to {{{find}}} intact, so we had to protect them with backslashes, because the parentheses ''also'' have a special meaning to the shell. We could have used quotes (around the parentheses) instead of backslashes. We enclosed the whole "or" expression in parentheses, because we want it to be treated as a single unit, especially when we start chaining it together with other expressions (as we'll do shortly). The parentheses themselves have to be passed to {{{find}}} intact, so we had to protect them with backslashes, because the parentheses ''also'' have a special meaning to the shell. We could also have used [[Quotes|quotes]] around the parentheses instead of backslashes.
Line 74: Line 80:
./foo.mp3}}} ./foo.mp3
}}}
Line 77: Line 84:
Notice that {{{-name}}} matches only the file's name inside of the deepest directory -- in our example, {{{foo.jpg}}} and not {{{bar/foo.jpg}}} or {{{./bar/foo.jpg}}}. That's why the {{{-name 'foo*'}}} filter is able to match it. If we want to look at the directory names, we must use the {{{-path}}} filter instead: Notice that {{{-name}}} matches only the file's name inside of the deepest directory -- in our example, {{{foo.jpg}}} and not {{{foo/bar.jpg}}} or {{{./foo/bar.jpg}}}. That's why the {{{-name 'foo*'}}} filter is able to match it. If we want to look at the directory names, we must use the {{{-path}}} filter instead:
Line 87: Line 94:
./bar/foo.jpg}}}
{{{-path}}} looks at the ''entire'' pathname (what will be a line of {{{find}}}'s output if {{{-print}}} is used) in order to match things. This includes the leading {{{./}}} in our case.
./bar/foo.jpg
}}}
{{{-path}}} looks at the ''entire'' pathname, which includes the filename (in other words, what you see in {{{find}}}'s output of {{{-print}}}) in order to match things. This also includes the leading {{{./}}} in our case.
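The difference between the two filters can be sketched in a scratch directory. This is only an illustration: the directory comes from `mktemp -d`, and the file names simply mirror the ones used in the examples above.

```shell
# Scratch tree mirroring the example above (names are illustrative only).
tmp=$(mktemp -d) || exit 1
mkdir "$tmp/bar"
touch "$tmp/bar/foo.jpg" "$tmp/apple.txt"

# -name compares the pattern against the last pathname component only.
by_name=$(cd "$tmp" && find . -name 'b*')

# -path compares the pattern against the whole pathname, leading ./ included.
by_path=$(cd "$tmp" && find . -path './b*')

printf 'name: [%s]\npath: [%s]\n' "$by_name" "$by_path"
rm -rf "$tmp"
```

With `-name`, only the directory itself matches, because `foo.jpg` does not start with `b`. With `-path`, the file inside the directory matches as well, because the pattern is applied to the full `./bar/foo.jpg`.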
Line 98: Line 106:
./apple.txt}}} ./apple.txt
}}}
Line 102: Line 111:
One of the most common uses of {{{find}}} in system maintenance scripts is to find all the files that are older than ''something'', either a relative instant of time such as "30 days ago", or some other file. If we want to find all the files older than 30 days (for example, to clean up a temporary directory), we use: One of the most common uses of {{{find}}} in system maintenance scripts is to find all the files that are older than ''something'' -- for example, a relative moment of time such as "30 days ago", or some other file. If we want to find all the files older than 30 days (for example, to clean up a temporary directory), we use:
Line 109: Line 118:
Let me say that again for those who aren't paying attention: '''''It is totally impossible to know when a file was created.''''' That information is not stored anywhere.

In our example, we used {{{-mtime}}}, which looks at a file's modification time. This is the timestamp we see when we run {{{ls -l}}}, and is the most commonly used. It's updated any time the contents of a file are changed by writing to it. We could also have used {{{-atime}}} to look at a file's access time -- this can be see with {{{ls -lu}}}, and is updated whenever the contents of the file are read (unless the system administrator has explicitly turned off atime updates on this file system). It's quite rare to use ctime (change time), which is updated any time a file's metadata (permissions, ownership, etc.) are changed (e.g., by {{{chmod}}}).
Let me say that again for those who aren't paying attention: '''''It is totally impossible to know when a file was created.''''' That information is not stored anywhere. ~-(Except on nonstandard file systems, which makes it possible ''on those systems only''.)-~

 . [[WikiPedia:Comparison_of_file_systems|ext4 and others]] have a file creation timestamp, but I'm not sure whether this is supported by any {{{find}}} or {{{stat}}}. (NOTE: it is.)
  . (NOTE: Time has progressed, and any general purpose modern filesystem will have creation time - ext4, xfs, zfs, etc. Just watch out because on some filesystems it's ctime, on others crtime, on btrfs it is or was otime, etc).
   . Not all systems that are in current use are "modern". Many applications run on older systems. Creation time is not a portable feature. If your script is only running on systems that have it, then feel free to make use of it. Just don't expect it to be present if your software is ported to a different platform.

In our example, we used {{{-mtime}}}, which looks at a file's modification time. This is the timestamp we see when we run {{{ls -l}}}, and is the most commonly used. It's updated any time the contents of a file are changed by writing to it. We could also have used {{{-atime}}} to look at a file's access time -- this can be seen with {{{ls -lu}}}, and is updated whenever the contents of the file are read (unless the system administrator has explicitly turned off atime updates on this file system). It's quite rare to use ctime (change time), which is updated any time a file's metadata (permissions, ownership, etc.) are changed (e.g., by {{{chmod}}}).
Line 125: Line 138:
Some versions of `find` have `-mmin`, `-amin` and `-cmin` flags which allow time to be measured in minutes instead of days. This lets you find, for instance, all files modified in the last 30 minutes. These flags are not part of the POSIX standard, but they are present on many GNU and BSD systems.
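As a minimal sketch, assuming a `find` that supports the non-POSIX `-mmin` flag, the "modified in the last 30 minutes" case might look like this. The scratch directory and file names are made up for the illustration.

```shell
# Sketch only: -mmin is a GNU/BSD extension, not POSIX.
tmp=$(mktemp -d) || exit 1
touch "$tmp/fresh.log"                  # modified just now
touch -t 200001010000 "$tmp/stale.log"  # modification time set to 2000-01-01

# Select regular files whose contents changed less than 30 minutes ago.
recent=$(find "$tmp" -type f -mmin -30)
printf '%s\n' "$recent"
rm -rf "$tmp"
```

Just as with `-mtime`, a leading `-` means "less than" and a leading `+` means "more than".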
Line 129: Line 144:
touch /var/log/backup.timestamp}}}
This is the basic structure for doing an incremental system backup: find all the files that have changed since the last backup, and then update our timestamp file for the next run. (Clearly we'd have to something more than just {{{-print}}} the files, in order to have a meaningful backup, but we'll look at actions in the next section.)
touch /var/log/backup.timestamp
}}}
This is the basic structure for doing an incremental system backup: find all the files that have changed since the last backup, and then update our timestamp file for the next run. (Clearly we'd have to do something more than just {{{-print}}} the files, in order to have a meaningful backup, but we'll look at actions in an upcoming section.)
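The timestamp-file pattern can be tried out safely in a scratch directory. In this sketch the timestamp file is a stand-in for `/var/log/backup.timestamp`, and the file names and dates are invented so that the outcome does not depend on when it is run.

```shell
tmp=$(mktemp -d) || exit 1
stamp=$tmp/backup.timestamp            # stand-in for /var/log/backup.timestamp
touch -t 202001010000 "$stamp"         # pretend the last backup ran at this time
touch -t 201901010000 "$tmp/old.txt"   # last modified before that backup
touch -t 202101010000 "$tmp/new.txt"   # modified after that backup

# Only files modified more recently than the timestamp file are selected.
changed=$(find "$tmp" -type f -newer "$stamp")
printf '%s\n' "$changed"

touch "$stamp"                         # record this run for the next one
rm -rf "$tmp"
```

The timestamp file is never reported, because a file is not newer than itself.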
Line 148: Line 164:
find . -type f -mtime +30 -exec rm -f {} \; find /tmp -type f -mtime +30 -exec rm -f {} \;
Line 157: Line 173:
The {{{-exec}}} is an action flag, which says "we want to execute a command"; the command follows it, and is terminated by a semicolon ({{{;}}}). Now, the semicolon is also a special character for the shell, so we have to protect it with a backslash, just as we did for the parentheses earlier. And once again, we could have used quotes around it instead; the backslash is one character less, so it's the traditional preference.

The curly brace pair in our command is a special argument to {{{find}}}. Each time a command is executed by {{{-exec}}} for a file that's been matched, the {} is replaced with the file's name. So, supposing we matched three files under {{{/tmp}}}, {{{find}}} might execute the following commands:
The {{{-exec}}} is an action flag, which says "we want to execute a command"; the command follows it, and is terminated by a semicolon ({{{;}}}). Now, the semicolon is also a special character for the shell, so we have to [[Quotes|protect it with a backslash]], just as we did for the parentheses earlier. And once again, we could have used quotes around it instead; the backslash is one character less, so it's the traditional preference.

The curly brace pair in our command is a special argument to {{{find}}}. Each time a command is executed by {{{-exec}}} for a file that's been matched, the `{}` is replaced with the file's name. So, supposing we matched three files under {{{/tmp}}}, {{{find}}} might execute the following commands:
Line 164: Line 180:
rm -f /tmp/haha naughty file; "rm -rf ."}}}
Even if some user attempts to subvert the system by putting a Trojan horse in the filename as in this example, the {} replacement is perfectly safe. The commands that {{{find}}} executes are not passed to a shell. There is no word splitting, and no parsing of the filename. The filename is merely passed as an argument to the command (in our case, {{{rm}}}) directly. Nothing in the filename will matter at all.

Sometimes you might want to execute complex commands for each file you process. For example, you might want to execute a block of shell code that isolates the file's name (stripping off all leading directories), converts it to all upper-case, and then does something unspecified (perhaps rename the file, after some additional checking, not shown here). A block of bash code to do that to a single file might look something like this:
rm -f /tmp/haha naughty file; "rm -rf ."
}}}
Even if some user attempts to subvert the system by putting a malicious injection string in the filename as in this example, the {} replacement is perfectly safe. The commands that {{{find}}} executes are not passed to a shell, unless ''you'' specify one. There is no word splitting, and no parsing of the filename. The filename is merely passed as an argument to the command (in our case, {{{rm}}}) directly. Nothing in the filename will matter at all.
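To see that the filename really is passed as a single argument, here is a small sketch in a scratch directory. It uses a harmless `printf` instead of `rm`, and the hostile-looking file name is invented for the demonstration.

```shell
tmp=$(mktemp -d) || exit 1
# A hostile-looking name: spaces, a semicolon, and double quotes.
touch "$tmp/haha naughty file; \"echo INJECTED\""

# printf receives the whole pathname as one argument; nothing is parsed as code.
seen=$(find "$tmp" -type f -exec printf '[%s]\n' {} \;)
printf '%s\n' "$seen"
rm -rf "$tmp"
```

The brackets in the output enclose the entire pathname, which shows that no word splitting happened and that the `echo` inside the name was never run.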

The GNU implementation of {{{find}}} provides {{{-execdir}}} to mitigate [[https://www.gnu.org/software/findutils/manual/html_node/find_html/Security-Considerations-for-find.html#Security-Considerations-for-find|security concerns]].

== Complex actions ==
Sometimes you might want to execute complex commands for each file you process. For example, you might want to execute a block of shell code that isolates the file's name (stripping off all leading directories), converts it to all upper-case, and then does something else (perhaps rename the file, after some additional checking, not shown here). A block of shell code to do that to a single file might look something like this:
Line 182: Line 202:
     echo "$name -> $upper"' _ {} \;}}}      echo "$name -> $upper"' _ {} \;
}}}
 Or using Bash version 4 [[BashFAQ/073|Parameter Expansion]] (PE) to achieve the same:
 {{{
 find . -type f -name '*.ext' -exec bash -c 'name=${1##*/}; echo "$name -> ${name^^}"' _ {} \;
}}}
Line 185: Line 211:
 * First note that the whole thing is single-quoted, so everything in it is literal. If you needed to embed your own single quotes in the mini-script, you'd have to use {{{'\''}}} to represent them.  * First note that the whole thing is [[Quotes|single-quoted]], so everything in it is literal. If you needed to embed your own single quotes in the mini-script, you'd have to use {{{'\''}}} to represent them.
Line 189: Line 215:
find ... -exec sh -c '..."$1"...' _ {} \;}}} find ... -exec sh -c '..."$1"...' _ {} \;
}}}
Line 192: Line 219:
Line 194: Line 222:
 * Don't ever put `{}` directly inside a mini-script being executed by {{{bash -c}}} or {{{sh -c}}}.  * '''Don't ever put `{}` directly inside a mini-script being executed by {{{bash -c}}} or {{{sh -c}}}!'''
Line 197: Line 225:
 find ... -exec bash -c 'echo {}' \;}}}
 When this is executed, there are two possible results. Some versions of {{{find}}} will simply not perform the `{}` replacement at all, and you'll see one line of `{}` written to stdout for every file. Other versions of {{{find}}} will replace the `{}` with the filename and then pass the result as code for the shell to execute. If the filename contains commands that the shell understands, such as {{{; rm -rf $HOME}}}, then the shell may execute those commands, with disastrous results.
In the absence of the more complex, safe example, some people might have found the disastrous one on their own, without realizing the danger. The next section of this document shows some more dangerous things....
 find ... -exec bash -c 'echo {}' \;
}}}
When this is executed, there are two possible results. Some versions of {{{find}}} will simply not perform the `{}` replacement at all, and you'll see one line of `{}` written to stdout for every file. Other versions of {{{find}}} will replace the `{}` with the filename and then pass the result as code for the shell to interpret and execute. If the filename contains commands that the shell understands, such as {{{; rm -rf $HOME}}}, then the shell may execute those commands, with disastrous results.
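For contrast, here is a sketch of the safe form applied to a file whose name contains a command substitution. The file name and the `pwned` marker are invented for this test, and everything happens inside a scratch directory.

```shell
tmp=$(mktemp -d) || exit 1
# The name contains a command substitution that must never be run.
touch "$tmp/\$(touch pwned)"

# Safe form: {} is a separate argument and reaches the script as "$1".
names=$(cd "$tmp" && find . -type f -exec sh -c 'printf "%s\n" "${1##*/}"' _ {} \;)
printf '%s\n' "$names"

# Had the name been interpreted as code, a file called pwned would now exist.
if [ -e "$tmp/pwned" ]; then verdict=executed; else verdict=literal; fi
echo "$verdict"
rm -rf "$tmp"
```

Because the filename arrives as data in `"$1"` rather than as part of the script text, the shell prints it and never evaluates it.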
Line 213: Line 241:
However, it has a serious problem: a file with spaces in its name will be read as multiple words, and then {{{rm}}} will attempt to remove each individual word, rather than the single multiple-word file that we actually wanted. Also, it parses quotation marks (possibly both kinds), and will treat quoted segments of data as a single word. This ''also'' breaks things if your filenames contain apostrophes. However, it has a serious problem: a file with spaces in its name will be read as multiple words, and then {{{rm}}} will attempt to remove each individual word, rather than the single multiple-word file that we actually wanted. Also, it parses quotation marks (possibly both kinds), and will treat quoted segments of data as a single word. This ''also'' breaks things if your filenames contain apostrophes. Wikipedia has an example of this going wrong: http://en.wikipedia.org/wiki/Xargs#The_separator_problem
Line 220: Line 248:
With the {{{-print0}}} action, instead of putting a newline after each filename, {{{find}}} uses a NUL character (ASCII 00) instead. Since NUL is the only character which is ''not'' valid in a Unix pathname ({{{/}}} is also invalid in a filename, but we're discussing PATH NAMES here, not file names, so please stop putting incorrect information in this document!), NUL is a valid delimiter between '''pathnames''' in a data stream. The corresponding {{{-0}}} (that's "dash zero", not "dash oh") flag to {{{xargs}}} tells {{{xargs}}}, instead of reading whitespace-separated words, to read NUL-delimited words instead.

This feature is typically found on GNU and BSD systems.

The second workaround is more commonly found on POSIX systems (and some ''extremely'' recent GNU systems), and involves abandoning xargs altogether:
With the {{{-print0}}} action, instead of putting a newline after each filename, {{{find}}} uses a NUL character (ASCII 00) instead. Since NUL is the only character which is ''not'' valid in a Unix pathname ({{{/}}} is also invalid in a filename, but we're discussing PATH NAMES here, not file names, so please stop putting incorrect information in this document!), NUL is a valid delimiter between '''pathnames''' in a data stream. The corresponding {{{-0}}} (that's "dash zero", not "dash oh") flag to {{{xargs}}} tells {{{xargs}}}, instead of reading whitespace-separated words, to read NUL-delimited words instead.  It also turns off `xargs`'s special parsing of quotation marks and apostrophes.
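A quick way to convince yourself is to try it on awkward names in a scratch directory. This sketch assumes a `find` with `-print0` and an `xargs` with `-0`; the file names are invented.

```shell
tmp=$(mktemp -d) || exit 1
touch "$tmp/plain.temp" "$tmp/two words.temp" "$tmp/it's.temp" "$tmp/keep.txt"

# NUL-delimited names pass through xargs intact, spaces and apostrophes included.
find "$tmp" -name '*.temp' -print0 | xargs -0 rm -f

left=$(ls "$tmp")
printf '%s\n' "$left"
rm -rf "$tmp"
```

Only `keep.txt` should remain afterwards. With plain `-print` and no `-0`, the names containing a space or an apostrophe would not have been handled correctly.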

The `-print0` feature is typically found on GNU and BSD systems. For `find` implementations lacking it, it can be emulated with:

{{{
find . -name '*.temp' -exec printf '%s\0' {} \; | xargs -0 rm
}}}
However, this has three issues:

 * It still forks a `printf` process for every file, so it's probably even slower than if we simply used `-exec rm` directly.
 * It still uses -exec, which we're presumably trying to avoid. It would be simpler to put the command you want directly into the -exec than to pipe the filenames pointlessly to another command.
 * It requires that your `xargs` implementation have the `-0` flag. It's uncommon that, on the same system, `xargs` would have `-0` but `find` would lack `-print0`.

So this is not likely to be something that's ever useful in practice.

The second workaround is more commonly found on POSIX (SUSv3 and later) and SVR4-like systems, and in GNU findutils since mid-2006. It involves abandoning xargs altogether:
Line 240: Line 279:
 find /mnt/sdb7/recipes -iname "*$i*" -type f -exec cp -fiuRt /mnt/sdb7/tmp2/"$i" '{}' +}}}  find /mnt/sdb7/recipes -iname "*$i*" -type f -exec cp -fiuRt /mnt/sdb7/tmp2/"$i" '{}' +
}}}
Line 243: Line 283:
  find . -type f -exec sh -c 'cp -- "$@" /target' _ {} +}}}   find . -type f -exec sh -c 'cp -- "$@" /target' _ {} +
}}}
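The batching behaviour of `-exec ... {} +` can be observed by letting the child shell report how many arguments it received. This is a sketch in a scratch directory with invented file names. With only a handful of short names everything fits into a single invocation, but on a large tree `find` may run the command several times.

```shell
tmp=$(mktemp -d) || exit 1
touch "$tmp/a" "$tmp/b c" "$tmp/d" "$tmp/e" "$tmp/f"

# With +, find appends as many pathnames as fit; $# counts them per invocation.
batches=$(find "$tmp" -type f -exec sh -c 'echo "invoked with $# file(s)"' _ {} +)
printf '%s\n' "$batches"
rm -rf "$tmp"
```

The `_` fills `$0` of the child shell, so the pathnames start at `$1` and `"$@"` expands to exactly the files in that batch.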

 . You should reconsider your find examples that use -exec rm, since GNU find has a dedicated switch for that, -delete, which is far more convenient:
 {{{
 find . -name '*.temp' -delete
}}}
 This also applies to the earlier examples. I don't know whether POSIX find has a -delete switch. Note: -delete also deletes directories when they are empty; however, it will issue a warning if the directory contains files.
  . It is GNU only, not POSIX.
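For anyone who wants to try `-delete` without risk, here is a sketch in a scratch directory. It assumes a `find` that supports this non-POSIX flag, and the file names are invented.

```shell
tmp=$(mktemp -d) || exit 1
touch "$tmp/a.temp" "$tmp/b c.temp" "$tmp/keep.txt"

# -delete is not POSIX; it comes last so the tests before it are applied first.
find "$tmp" -type f -name '*.temp' -delete

remaining=$(ls "$tmp")
printf '%s\n' "$remaining"
rm -rf "$tmp"
```

The position of `-delete` matters: the expression is evaluated left to right, so putting it before the tests would delete everything `find` visits.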

(I have deleted some examples that attempted to use [[http://www.gnu.org/software/parallel/|GNU parallel]] as a replacement for `xargs`. GNU parallel can be very useful, but the examples given here were totally wrong. The ProcessManagement page has some useful examples of `parallel` on it.)

=== Nasty OS X bug ===
OS X users: be aware of a [[http://hintsforums.macworld.com/archive/index.php/t-91863.html|longstanding bug]] in Darwin's implementation of -exec … +, namely, that find quits when the utility invoked in -exec returns a nonzero exit status. This is tricky, because find will invoke the specified utility on as many files as it can fit into max_args, and then quit unless the command succeeds, in which case the same is repeated for the next batch of found files.

In other words, you may still find results, but not all of them.

 . As of OS X 10.8.1, it looks like the bug is solved. You can test for yourself:
 {{{
dir="/tmp/lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat"

mkdir -p "$dir" && (cd "$dir" && touch {0001..9999})
exec_count=$(find "$dir" -type f -exec bash -c 'echo; false' _ {} + | wc -l)

if (( exec_count > 1 ))
then echo 'The bug is fixed in this version of find!'
else echo "BUG FOUND. Please don't trust -exec ... + in this version of find."
fi
}}}
 If your current OS X version is higher than 10.6.8 and lower than 10.8.1, feedback will be appreciated.

== Actions in bulk: GNU Parallel ==
The nonstandard tool [[http://www.gnu.org/software/parallel/|GNU parallel]] can be useful in some cases:

{{{
find . -name '*.temp' -print0 | parallel -X -0 rm
}}}
Once again, '''I have deleted examples that were broken and unsafe'''. There are some better examples of GNU parallel on ProcessManagement.

For correctness and security purposes, GNU parallel should be treated exactly like `xargs`. Do not feed it filenames in a stream unless you use `-print0` and `-0`.
Line 249: Line 329:
There are no standard Unix tools that can give you the ["Permissions"] of a file sanely. However, the real question here isn't ''What are the permissions?'' Instead, it's ''Are the permissions correct?'' If we want to know whether a file has 0644 permissions (for example), we can use {{{find}}}: There are no standard Unix tools that can give you the [[Permissions]] of a file sanely. However, the real question here isn't ''What are the permissions?'' Instead, it's ''Are the permissions correct?'' If we want to know whether a file has 0644 permissions (for example), we can use {{{find}}}:
Line 253: Line 333:
# If this produces any output, then the file has 0644 perms.}}} # If this produces any output, then the file has 0644 perms.
}}}
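A check of this kind can be sketched as follows. The file is a hypothetical one created in a scratch directory, and `-prune` is there only so that `find` would not descend if the target happened to be a directory.

```shell
tmp=$(mktemp -d) || exit 1
f=$tmp/sample.conf                     # hypothetical file to check
touch "$f"

chmod 644 "$f"
# Exact match: output appears only if the mode is exactly 0644.
match=$(find "$f" -prune -perm 0644)

chmod 600 "$f"
nomatch=$(find "$f" -prune -perm 0644)

[ -n "$match" ] && echo "first check: permissions were 0644"
[ -z "$nomatch" ] && echo "second check: permissions were not 0644"
rm -rf "$tmp"
```

Testing whether the output is empty is all a script needs here; there is no need to parse the permissions themselves.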
Line 255: Line 336:
 . (NOTE: I don't know who wrote that silliness, but Unix has had a stat command since its inception; here is the man page from 1971: http://man.cat-v.org/unix-1st/1/stat)
  . Well, it's not in POSIX.
Line 261: Line 344:
Line 263: Line 347:
Finally, GNU {{{find}}} has a third form: if the argument following {{{-perm}}} begins with a {{{+}}} then {{{find}}} will match files with ''any'' of the specified permissions. For example, to find files that are either setuid ''or'' setgid (or both):

{{{
find /usr -type f -perm +6000 -print
Finally, GNU {{{find}}} has a third form: if the argument following {{{-perm}}} begins with a {{{/}}} then {{{find}}} will match files with ''any'' of the specified permissions. For example, to find files that are either setuid ''or'' setgid (or both):

{{{
find /usr -type f -perm /6000 -print
Line 269: Line 353:
Setuid is 4000, and setgid is 2000. If we want to match either of these, we add them together (technically, we bitwise-OR them together, but in this case it's the same thing...) and then use the {{{-perm +}}} filter. We added {{{-type f}}} as well, because many directories are setgid, and we don't care about those.

If we can't use the {{{-perm +}}} filter, then we would have to explicitly check for each permission bit:

Setuid is 4000, and setgid is 2000. If we want to match either of these, we add them together (technically, we bitwise-OR them together, but in this case it's the same thing...) and then use the {{{-perm /}}} filter. We added {{{-type f}}} as well, because many directories are setgid, and we don't care about those.

If we can't use the {{{-perm /}}} filter, then we would have to explicitly check for each permission bit:
Line 275: Line 360:
# Portable version of -perm +6000 # Portable version of -perm /6000
}}}

Some people find the ''symbolic mode'' to be a more intuitive way of dealing with permissions. This example would find in /usr the files that are __r__eadable by __o__ther users.

{{{
find /usr -type f -perm -o=r
Line 281: Line 372:
find /home \( -name '.qmail*' -o -name authorized_keys \) -perm +0022 -print}}}
Group-write permission is 0020, and world-write is 0002. Searching for either of these therefore uses {{{-perm +0022}}}.
find /home \( -name '.qmail*' -o -name authorized_keys \) -perm /0022 -print
}}}
Group-write permission is 0020, and world-write is 0002. Searching for either of these therefore uses {{{-perm /0022}}}.
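Where the "any of these bits" form is unavailable, the same search can be written with two `-perm -MODE` tests joined by `-o`. This is a sketch in a scratch directory with invented file names, not a drop-in replacement for the command above.

```shell
tmp=$(mktemp -d) || exit 1
touch "$tmp/safe" "$tmp/groupw" "$tmp/worldw"
chmod 644 "$tmp/safe"
chmod 664 "$tmp/groupw"    # group-writable (0020)
chmod 646 "$tmp/worldw"    # world-writable (0002)

# -perm -MODE means "all of these bits are set", so OR the two tests together.
loose=$(find "$tmp" -type f \( -perm -0020 -o -perm -0002 \) | sort)
printf '%s\n' "$loose"
rm -rf "$tmp"
```

The parentheses matter: without them, the implicit "and" would attach `-type f` to the first `-perm` test only.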
Line 286: Line 378:
== Find's expression logic, and the "default -print" action ==
According to the POSIX definition of {{{find}}}, the ''expression'' following the initial options and list of paths essentially consists of only one type of ''operator'', which takes zero or more ''operands'', and always evaluates to a truth value.

If no expression is given, {{{find}}} behaves implicitly as:

{{{
find [-H|-L] path1 [path2 ...] -print
}}}
Or, if none of the three special operators {{{-exec}}}, {{{-ok}}}, or {{{-print}}} is present anywhere in the expression, then {{{find}}} implicitly rewrites the expression as:

{{{
find [-H|-L] path1 [path2 ...] \( expression \) -print
}}}
In either case, {{{-print}}} is effectively tacked on to the end as a default.

The OpenBSD implementation adds {{{-ls}}} and {{{-print0}}} to the above list of exceptions.

GNU {{{find}}} goes further to subdivide expression operators into '''options''', '''tests''', and '''actions'''.

 * '''Options''' alter some aspect of {{{find}}}'s behavior. They aren't very important to the logic of the expression and should usually just be lumped together at the beginning. GNU {{{find}}} will generate a warning if they are placed elsewhere. See the manpage for details.
 * '''Tests''' are non-side-effectful operators that do nothing other than evaluate ''true'' or ''false'', usually depending upon some state of the current file or directory being processed.
 * '''Actions''' are mostly operators which produce some side effect (printing output, performing a file-system operation, etc.) in addition to having a truth value. Just as importantly, they are the operators that GNU find considers to belong in the above list of special operators which disable the default {{{-print}}}, '''with the exception of {{{-prune}}}''' (described below). This is because {{{-prune}}} is the only GNU find "action" that is also specified by POSIX but is not among the operators for which POSIX suppresses the default {{{-print}}}, so GNU find complies. The GNU {{{find}}} ''actions'' are: {{{-delete}}}, {{{-exec}}}, {{{-execdir}}}, {{{-fls}}}, {{{-fprint}}}, {{{-fprint0}}}, {{{-fprintf}}}, {{{-ls}}}, {{{-ok}}}, {{{-okdir}}}, {{{-print}}}, {{{-print0}}}, {{{-printf}}}, {{{-prune}}}, and {{{-quit}}}.

The important thing to know about the boolean operators is that they short-circuit, as in most languages, and that ''actions'' and their side effects are applied only if the action operator is "reached" during evaluation. Thus, the {{{-a}}} and {{{-o}}} operators are {{{find}}}'s primary means of "control flow". However, the precedence can be difficult to grasp. Unlike shell lists, but as in many programming languages and as is common in propositional and first-order logic notation, {{{-a}}} takes precedence over {{{-o}}}. We can easily experiment with this using GNU find's {{{-true}}} and {{{-false}}} "tests" (note that in older versions of BSD's find, {{{-false}}} is an equivalent to {{{-not}}}!), along with the {{{-printf}}} action, which always evaluates as ''true''. Even though {{{-a}}} can always be left implicit, I'll write it explicitly in the remaining examples in this section. Assume we're in an empty directory, so that the expression is evaluated exactly once.

{{{
 $ find . -false -o -false -a -printf 'nope\n' -o -printf 'yep\n' -o -printf 'nope\n'
yep
}}}
It might be tempting to think of this logic in terms of Bash ''lists''.

{{{
 $ false || false && printf 'nope\n' || printf 'yep\n' || printf 'nope\n'
yep
}}}
But this leads to a common mistake. In Bash, {{{&&}}} and {{{||}}} have equal precedence. Consider

{{{
 $ find . -true -o -false -a -printf 'yep\n'
<no output>
}}}
Because {{{-a}}} binds more tightly than {{{-o}}}, this expression groups as {{{-true -o \( -false -a -printf 'yep\n' \)}}}. Since the first operand of the disjunction is true, the second (the entire conjunction) isn't evaluated.

It is always important to keep track of which operators are being used so that the "default {{{-print}}}" rewriting can be taken into account. Consider a more realistic example:

{{{
find somedir -name '*.autosave.*' -o -name '*.bak' -a -print
}}}
Again, {{{-a}}} binds first, so the {{{-print}}} belongs only to the second {{{-name}}}. If the first operand is false, {{{find}}} tests whether the second is true, because {{{-o}}} short-circuits only if its first operand is true. Because {{{-a}}} causes its second operand to evaluate (only) if its first is true, the above expression prints only files matching {{{'*.bak'}}}, unlike what one might expect. However, what if the {{{-print}}} were omitted?

{{{
find somedir -name '*.autosave.*' -o -name '*.bak'
}}}
Per the above rules, this expression is implicitly rewritten as

{{{
find somedir \( -name '*.autosave.*' -o -name '*.bak' \) -a -print
}}}
which, counter-intuitively, has the more intuitive behavior of printing files matching either of these {{{-name}}} criteria. This problem is explored further in the next section.
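The three variants can be compared side by side in a scratch directory. This is only a sketch with invented file names; `sort` is there so that the comparison does not depend on directory order.

```shell
tmp=$(mktemp -d) || exit 1
touch "$tmp/notes.autosave.txt" "$tmp/notes.bak" "$tmp/notes.txt"

# Explicit -print attaches to the second -name only, because -a beats -o.
explicit=$(find "$tmp" -name '*.autosave.*' -o -name '*.bak' -print | sort)

# No action at all: find wraps the whole expression and appends -print.
implicit=$(find "$tmp" -name '*.autosave.*' -o -name '*.bak' | sort)

# Grouping by hand gives the same result as the implicit rewrite.
grouped=$(find "$tmp" \( -name '*.autosave.*' -o -name '*.bak' \) -print | sort)

printf 'explicit:\n%s\nimplicit:\n%s\n' "$explicit" "$implicit"
rm -rf "$tmp"
```

Only the `.bak` file shows up in the first case, while the second and third cases both list the autosave file and the `.bak` file.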
Line 287: Line 439:
The {{{-prune}}} action is very powerful, but also very confusing. It allows you to skip subdirectories. This is useful either to find files not in those directories, or to skip searching specific directories for performance reasons.

The most confusing property of -prune is that it is an ACTION, and thus no further filters are processed after it.

To use it, you have to combine it with {{{-o}}} to actually process the non-skipped files, like so:

Find all files in source/, except those in subdirectories named CVS:

{{{
$ find source \( -name CVS -prune \) -o \( -type f -print \)
}}}
Note the explicit {{{-print}}} action. The reason for this is described in the next section.

== Explicit actions vs. the default -print action ==
It can be fun to track down stuff like this:
{{{
 find somedir -name '*.autosave.*' -o -name '*.bak'
}}}
returning results nicely, but this:
{{{
 find somedir -name '*.autosave.*' -o -name '*.bak' -print
}}}
returning only the matches for .bak files. This happens because there is an implicit "and" (`-a`) between the options that has precedence over the "or" (`-o`). Because of the `-print`, there is no default action any more.
It is equivalent to:
{{{
find somedir -name '*.autosave.*' -o \( -name '*.bak' -print \)
}}}
If the file matches autosave, then the first filter is true, and the second part of the "or" is not executed. As there is no default action, nothing is printed.

To fix this you need to group the `-name` options like this:
{{{
 find somedir \( -name '*.autosave.*' -o -name '*.bak' \) -print
}}}
which works.
If the current file is a directory, and the {{{-prune}}} operator is evaluated, then find won't descend into that directory. Thus, {{{-prune}}} allows skipping subdirectories and their contents.

As mentioned in the previous section, {{{-prune}}} doesn't disable the implicit {{{-print}}} (despite being classified as an ''action'' by GNU, because of POSIX). This can lead to yet another counter-intuitive behavior.

{{{
 $ tree
.
├── bork
│ ├── bar
│ └── foo
│ └── bork
└── foo
    ├── bar
    ├── baz
    └── blarg
        └── bork
}}}
{{{
 $ find . -name bork -prune
./bork
./foo/blarg/bork
}}}
Not what was expected? Let's rewrite it with a {{{-print}}} since that's what {{{find}}} does internally in this case.

{{{
find . \( -name bork -prune \) -print
}}}
If the {{{-name}}} matches "bork", then evaluate {{{-prune}}}, which is always true. Since "true and true" is true, and {{{-a}}} causes the next subexpression to evaluate in that case, evaluate {{{-print}}}. Thus, only files or directories matching "bork" are printed, and additionally, a directory matching "bork" is not descended, so any subfiles or directories under that directory are not printed.

Here's what was most likely intended, using {{{-o}}} and an explicit {{{-print}}}:

{{{
 $ find . -type d -name bork -prune -o -print
.
./foo
./foo/bar
./foo/blarg
./foo/blarg/bork
./foo/baz
}}}
Now, if the current file is a directory whose {{{-name}}} matches "bork", then: apply {{{-prune}}}; don't descend into its subdirectories; and don't apply {{{-print}}}. Otherwise, the entire and-list evaluates ''false'', {{{-prune}}} isn't evaluated, and {{{-print}}} is evaluated.
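The two behaviours can be compared side by side in a scratch directory. This is only a minimal sketch: it assumes `mktemp -d` is available, it treats the leaf entries of the tree above (including `foo/blarg/bork`) as regular files, and the variable names `and_form` and `or_form` are made up for illustration.

```shell
# Build a throwaway copy of the tree shown above (leaf entries assumed to be files).
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/bork/foo" "$tmp/foo/blarg"
touch "$tmp/bork/bar" "$tmp/bork/foo/bork" \
      "$tmp/foo/bar" "$tmp/foo/baz" "$tmp/foo/blarg/bork"

# Implicit form: -prune is true, so the default -print fires for each "bork".
and_form=$(cd "$tmp" && find . -name bork -prune | LC_ALL=C sort)

# Intended form: prune directories named bork, print everything else.
or_form=$(cd "$tmp" && find . -type d -name bork -prune -o -print | LC_ALL=C sort)

printf 'and-form:\n%s\n\nor-form:\n%s\n' "$and_form" "$or_form"
```

The output is piped through `sort` only to make it stable; find itself makes no promise about ordering.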

== Other Examples ==
Some readers have contributed miscellaneous examples:

Search for a pattern in multiple files. For example: Find the pattern 'int main' in all C source files.

{{{
customFind() {
  local ext=$1 pattern=$2
  find . -name "$ext" -type f -exec grep "$pattern" /dev/null {} +
}

customFind '*.c' 'int main'
}}}
Change file permissions on all regular files within a directory:

{{{
find /path/to/directory -type f -exec chmod 644 {} +
}}}
Change permissions of files older than 30 days:

{{{
find /path/to/directory -type f -mtime +30 -exec chmod 644 {} +
}}}
Copy files, removing digits from the filename (e.g., file123 to file, or te12345st to test). Do not overwrite any existing file.

{{{
find . -type f -execdir bash -c '
  dest=${1//[0-9]/}
  [[ -f $dest ]] || cp -- "$1" "$dest"
' _ {} \;
}}}
Search for files ''recursively'' in a directory and perform '''multiple''' operations on each one (a safe replacement for the fragile "for i in $(find...)" loops that users regularly attempt, including in forums):

{{{
# Requires Bash and GNU/BSD find
while IFS= read -rd '' f; do

  echo "current filename: ${f##*/}"
  (...lots of operations...)

done < <(find ~/audio_archive/wip -type f -name '*.mp3' -print0)
}}}
In comparison, the `-exec` option will only perform '''one''' operation, unless combined with a `sh -c` child shell. Here's that version:

{{{
# POSIX
find ~/audio_archive/wip -type f -name '*.mp3' -exec sh -c '
  for f; do
    echo "current filename: ${f##*/}"
    (...lots of operations...)
  done
' _ {} +
}}}
The only real disadvantage of the POSIX version is that the work takes place in a child shell. If you need to populate variables for use after the loop, you can't do it that way.

== Additional Reading ==
[[http://www.in-ulm.de/~mascheck/various/find/|Some Notes About Find]]

----
CategoryShell CategoryUnix

Using Find

find(1) is an extremely useful command for shell scripts, but it's very poorly understood. This is due in part to a complex syntax (perhaps the most complex of all the standard Unix commands that aren't actually programming languages like awk); and in part to poorly written man pages. (The GNU version's man page didn't even have examples until late 2006!)

The very first thing you should do before you proceed any further is actually read your system's man page for the find command. You don't have to memorize it, or understand every part, but you should at least have looked at all the different parts of it once, so you have a general idea what's going on. Then, you might want to look at the OpenBSD man page for comparison. Sometimes, you'll understand one man page more than another. Just realize that not all of the implementations are the same; the OpenBSD one may have features that yours lacks, and vice versa. But many people find the BSD man pages easy to read, so it might help you with the concepts.

Now, let's talk a bit about what find does, and when and how you should consider using it.

1. Overview

Here's the basic idea: find descends through a hierarchy of files, matches files that meet specified criteria, and then does stuff with them. This is really important, so I'm going to write it out again:

  • find descends through a hierarchy of files,

  • matches files that meet specified criteria, and then
  • does stuff with them.

Let's give a few quick examples, to illustrate its basic operations. First up:

find .
.
./bar
./bar/foo.jpg
./foo.mp3
./apple.txt

If you don't specify the . for current directory, some versions of find will assume that's what you want; others may generate an error. You should always specify which directory you want find to descend into.

If you don't specify any criteria for which files should be matched, find will match every file (including directories, symlinks, block and character devices, FIFOs, sockets -- you name it). We didn't add any such flags on our first example, so we got every file and directory.

Finally, if you don't specify an action to be performed on each matched file, most modern versions of find will assume you want to print their names, to standard output, with a newline after each one. (Some extremely old versions of find may do nothing -- it's really best to specify the action.) If we want to be explicit, we would write it thus:

find . -print

This would produce the same output we saw earlier; I won't repeat it.

You might also have observed that the output of find is not necessarily sorted into alphabetical order like that of ls(1). It simply gives you the files in the order in which they appear within a directory.

2. Searching based on names

Now, let's apply a filtering option. Suppose we only want to find files ending with .jpg:

find . -name '*.jpg' -print
./bar/foo.jpg

In this case, only one file matched our criteria, so we only got one line of output. Note that find uses globs to express filename-matching patterns. Note also that we had to quote the glob in order to prevent the shell from expanding it. We want find to get it without expansion, so that find can apply it against each filename it discovers.

If we also wanted to find all the files ending with .mp3:

find . \( -name '*.mp3' -o -name '*.jpg' \) -print
./bar/foo.jpg
./foo.mp3

That's a little more complex than we've shown so far, so let's go over it in detail. The central part is this:

-name '*.mp3' -o -name '*.jpg'

This simply says "we want files that meet this or that". The -o is a logical "or" operator, and is the most portable way to write it. (Some other versions of find support -or as well, but why use a non-portable flag that's more typing, when there's a portable one already?)

We enclosed the whole "or" expression in parentheses, because we want it to be treated as a single unit, especially when we start chaining it together with other expressions (as we'll do shortly). The parentheses themselves have to be passed to find intact, so we had to protect them with backslashes, because the parentheses also have a special meaning to the shell. We could also have used quotes around the parentheses instead of backslashes.

Finally, we applied the explicit -print action. Any files that meet our criteria will have their names printed.

If you want a file to meet multiple criteria, you can specify multiple flags. There's an implicit logical "and" any time you put two filters together without the -o in between them:

find . -name '*.mp3' -name '*.jpg' -print

This one produces no output, because we asked for all the files that end with .mp3 and end with .jpg. Clearly, no file can satisfy both criteria, so we got nothing.

Now, let's chain together an "or" and an implicit "and", as we promised earlier:

find . \( -name '*.mp3' -o -name '*.jpg' \) -name 'foo*' -print
./bar/foo.jpg
./foo.mp3

Here, we have our same "or" expression as before: the file must have a name ending with .mp3 or .jpg. In addition, it must have a name beginning with foo. The results are shown.
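To see the difference between the plain "or" and the "or" chained with an extra filter, the sample tree can be recreated in a scratch directory. This is a rough sketch that assumes `mktemp -d` is available; the extra file `other.mp3` is not part of the original example and is added only so that the two results differ.

```shell
# Recreate the sample tree from this page, plus one hypothetical extra file.
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/bar"
touch "$tmp/bar/foo.jpg" "$tmp/foo.mp3" "$tmp/apple.txt" "$tmp/other.mp3"

# "or" alone: every .mp3 or .jpg file.
either=$(cd "$tmp" && find . \( -name '*.mp3' -o -name '*.jpg' \) -print | LC_ALL=C sort)

# "or" chained with an implicit "and": the name must also start with foo.
both=$(cd "$tmp" && find . \( -name '*.mp3' -o -name '*.jpg' \) -name 'foo*' -print | LC_ALL=C sort)

printf 'either:\n%s\n\nboth:\n%s\n' "$either" "$both"
```

Only the second command drops `other.mp3`, because the parenthesized group and the `-name 'foo*'` test must both be true.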

Notice that -name matches only the file's name inside of the deepest directory -- in our example, foo.jpg and not bar/foo.jpg or ./bar/foo.jpg. That's why the -name 'foo*' filter is able to match it. If we want to look at the directory names, we must use the -path filter instead:

find . -path 'bar*' -print

That produces no output. Why? Look here:

find . -path './bar*' -print
./bar
./bar/foo.jpg

-path looks at the entire pathname, which includes the filename (in other words, what you see in find's output of -print) in order to match things. This also includes the leading ./ in our case.

(At this point, I must point out that -path is not available on every version of find. In particular, Solaris lacks it. But it's pretty common on everything else.)

We can also negate an expression:

griffon:/tmp/greg$ find . ! -path '*bar*' -print
.
./foo.mp3
./apple.txt

The ! negates the expression which follows it, so we got every file that does not have bar somewhere in its full pathname.
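The -path and ! examples above can be replayed in a scratch directory. The following is a small sketch, assuming `mktemp -d` and a find that supports -path; the variable names are illustrative only.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/bar"
touch "$tmp/bar/foo.jpg" "$tmp/foo.mp3" "$tmp/apple.txt"

# -path is matched against the whole pathname as find would print it,
# so a pattern without the leading ./ matches nothing here.
no_prefix=$(cd "$tmp" && find . -path 'bar*' -print | LC_ALL=C sort)
with_prefix=$(cd "$tmp" && find . -path './bar*' -print | LC_ALL=C sort)

# ! inverts the test that follows it.
negated=$(cd "$tmp" && find . ! -path '*bar*' -print | LC_ALL=C sort)

printf 'no prefix: [%s]\nwith prefix:\n%s\nnegated:\n%s\n' \
  "$no_prefix" "$with_prefix" "$negated"
```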

3. Searching based on times

One of the most common uses of find in system maintenance scripts is to find all the files that are older than something -- for example, a relative moment of time such as "30 days ago", or some other file. If we want to find all the files older than 30 days (for example, to clean up a temporary directory), we use:

find /tmp -mtime +30 -print

The -mtime flag is one of the three filters used to inspect a file's timestamps. A file on a Unix file system has three timestamps: modification time (mtime), access time (atime), and change time (ctime). There is nothing that stores a file's creation time.

Let me say that again for those who aren't paying attention: It is totally impossible to know when a file was created. That information is not stored anywhere. (Except on nonstandard file systems, which makes it possible on those systems only.)

  • ext4 and others have a file creation timestamp, but I'm not sure whether this is supported by any find or stat. (NOTE: it is.)

    • (NOTE: Time has progressed, and any general purpose modern filesystem will have creation time - ext4, xfs, zfs, etc. Just watch out because on some filesystems it's ctime, on others crtime, on btrfs it is or was otime, etc).
      • Not all systems that are in current use are "modern". Many applications run on older systems. Creation time is not a portable feature. If your script is only running on systems that have it, then feel free to make use of it. Just don't expect it to be present if your software is ported to a different platform.

In our example, we used -mtime, which looks at a file's modification time. This is the timestamp we see when we run ls -l, and is the most commonly used. It's updated any time the contents of a file are changed by writing to it. We could also have used -atime to look at a file's access time -- this can be seen with ls -lu, and is updated whenever the contents of the file are read (unless the system administrator has explicitly turned off atime updates on this file system). It's quite rare to use ctime (change time), which is updated any time a file's metadata (permissions, ownership, etc.) are changed (e.g., by chmod).

The +30 means we want all the files that are more than 30 days old. If we had used this:

find . -mtime 30 -print

We would've gotten all the files that are exactly 30 days old (the age is counted in whole 24-hour periods, with any fractional part discarded -- see the manual for the exact definitions). That's not often useful. However, if we wanted all the files that have been modified within the last 30 days, we would use:

find . -mtime -30 -print

This gives us all the files whose modification time is less than 30 days ago.
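A quick way to try the + and - forms without waiting a month is to backdate a file with touch. This is a sketch under a few assumptions: `mktemp -d` is available, and the file names `old.log` and `new.log` are invented for the demonstration.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# POSIX touch -t takes [[CC]YY]MMDDhhmm; this stamps one file in the year 2000.
touch -t 200001010000 "$tmp/old.log"
touch "$tmp/new.log"

# +30: modified more than 30 days ago.  -30: modified less than 30 days ago.
older=$(find "$tmp" -type f -mtime +30 -print)
newer=$(find "$tmp" -type f -mtime -30 -print)

printf 'older than 30 days: %s\nnewer than 30 days: %s\n' "$older" "$newer"
```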

Some versions of find have -mmin, -amin and -cmin flags which allow time to be measured in minutes instead of days. This lets you find, for instance, all files modified in the last 30 minutes. These flags are not part of the POSIX standard, but they are present on many GNU and BSD systems.

The other common use for matching files based on their times is when we want to find all the files that have been modified since the last time we checked. That logic looks like this:

find /etc -newer /var/log/backup.timestamp -print
touch /var/log/backup.timestamp

This is the basic structure for doing an incremental system backup: find all the files that have changed since the last backup, and then update our timestamp file for the next run. (Clearly we'd have to do something more than just -print the files, in order to have a meaningful backup, but we'll look at actions in an upcoming section.)
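The timestamp-file pattern can be exercised safely in a scratch directory instead of /etc. This is a minimal sketch assuming `mktemp -d`; the file names and the dates passed to `touch -t` are made up for illustration.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/etc"

# Pretend the last backup ran on 2001-01-01.
touch -t 200101010000 "$tmp/backup.timestamp"
touch -t 200001010000 "$tmp/etc/unchanged.conf"   # older than the last backup
touch -t 200201010000 "$tmp/etc/changed.conf"     # modified after the last backup

# First run: only the file modified after the timestamp is reported.
changed=$(find "$tmp/etc" -type f -newer "$tmp/backup.timestamp" -print)

# Record this run, so the next one only reports later changes.
touch "$tmp/backup.timestamp"
after=$(find "$tmp/etc" -type f -newer "$tmp/backup.timestamp" -print)

printf 'first run: %s\nsecond run: [%s]\n' "$changed" "$after"
```

One caveat with this structure: a file modified between the find and the touch is missed by the next run, so a real backup script may prefer to create the new timestamp file before running find.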

4. Searching based on sizes

Another common use for find is looking for files that are larger than some threshold value. For example, this looks for all files larger than 10 megabytes (10485760 bytes) within /home:

find /home -type f -size +10485760c -print

As with file modification times (shown in the previous section), a leading + or - before the argument that follows -size is significant. A leading + means the size must be larger than the specified value; - means smaller than; and no leading sign means exactly the specified value.

Without the trailing c, the argument would be interpreted as a number of blocks, which are generally 512 bytes, although this may be system-dependent.

Some versions of find have additional unit specifiers, such as k for kilobytes (1024 bytes). These are not portable, and you should refer to your system's manual to learn about them if you wish to use them.
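The +, - and unsigned forms of -size can be checked with two files of known size. This sketch assumes `mktemp -d` and `dd` are available; the file names and the 10240-byte threshold are chosen only for the demonstration.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# Create a 20 KiB file and a 5-byte file.
dd if=/dev/zero of="$tmp/big.bin" bs=1024 count=20 2>/dev/null
printf 'tiny\n' > "$tmp/small.txt"

# The trailing c means bytes; + is "larger than", - is "smaller than".
large=$(find "$tmp" -type f -size +10240c -print)
small=$(find "$tmp" -type f -size -10240c -print)
exact=$(find "$tmp" -type f -size 5c -print)

printf 'large: %s\nsmall: %s\nexactly 5 bytes: %s\n' "$large" "$small" "$exact"
```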

5. Actions

A more complete example for cleaning up /tmp might look something like this:

find /tmp -type f -mtime +30 -exec rm -f {} \;

Here, we've got a few new things. First of all, we used -type f to specify that we only want to match ordinary files -- not directories, and not sockets, and so on. Then, we have our -mtime +30 filter. As we've already seen, two filters adjacent to each other like this have an implicit "and" between them, so the file must match both criteria to be considered.

Now, our action is no longer -print, because we don't want to see them. Instead, we've got this:

-exec rm -f {} \;

The -exec is an action flag, which says "we want to execute a command"; the command follows it, and is terminated by a semicolon (;). Now, the semicolon is also a special character for the shell, so we have to protect it with a backslash, just as we did for the parentheses earlier. And once again, we could have used quotes around it instead; the backslash is one character less, so it's the traditional preference.

The curly brace pair in our command is a special argument to find. Each time a command is executed by -exec for a file that's been matched, the {} is replaced with the file's name. So, supposing we matched three files under /tmp, find might execute the following commands:

rm -f /tmp/foo285712
rm -f /tmp/core
rm -f /tmp/haha naughty file; "rm -rf ."

Even if some user attempts to subvert the system by putting a malicious injection string in the filename as in this example, the {} replacement is perfectly safe. The commands that find executes are not passed to a shell, unless you specify one. There is no word splitting, and no parsing of the filename. The filename is merely passed as an argument to the command (in our case, rm) directly. Nothing in the filename will matter at all.
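This claim is easy to test without risking real data. The sketch below assumes `mktemp -d`; the hostile file name and the `canary` file are invented for the experiment. If the name were ever parsed by a shell, the canary would be deleted.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/scratch"

# A canary file that would vanish if the hostile name below were run as code.
touch "$tmp/canary"
# A file name with spaces, a semicolon and something that looks like a command.
touch "$tmp/scratch/naughty file; rm -f canary"

# find hands the whole name to rm as a single argument; no shell parses it.
(cd "$tmp" && find scratch -type f -exec rm -f {} \;)

remaining=$(find "$tmp/scratch" -type f | wc -l)
[ -e "$tmp/canary" ] && canary=intact || canary=gone
echo "files left in scratch: $((remaining)); canary: $canary"
```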

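GNU find (along with some BSD implementations) also provides -execdir, which runs the command from the directory containing the matched file rather than from the directory where find was started. This mitigates race conditions in which part of the path is changed between the time find examines a file and the time the command runs.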

6. Complex actions

Sometimes you might want to execute complex commands for each file you process. For example, you might want to execute a block of shell code that isolates the file's name (stripping off all leading directories), converts it to all upper-case, and then does something else (perhaps rename the file, after some additional checking, not shown here). A block of shell code to do that to a single file might look something like this:

# $1 contains the original filename
name=${1##*/}
upper=$(echo "$name" | tr "[:lower:]" "[:upper:]")
echo "$name -> $upper"

If we want to have find run that for every file it matches, then we have two possibilities:

  1. Create a script with this code in it, and then -exec that script.

  2. Write a complex find command like this one:

    find ... -exec bash -c \
         'name=${1##*/}; upper=$(echo "$name" | tr "[:lower:]" "[:upper:]");
         echo "$name -> $upper"' _ {} \;

    Or using Bash version 4 Parameter Expansion (PE) to achieve the same:

     find . -type f -name '*.ext' -exec bash -c 'name=${1##*/}; echo "$name -> ${name^^}"' _ {} \;

The separate script is much easier to read and understand, and should be given serious consideration as a solution. For those of you who want to understand what the second alternative does, here's an explanation:

  • First note that the whole thing is single-quoted, so everything in it is literal. If you needed to embed your own single quotes in the mini-script, you'd have to use '\'' to represent them.

  • The mini-script is followed by _ and {}. When bash -c executes a command, the next argument after the command is used as $0 (the script's "name" in the process listing), and subsequent arguments become the positional parameters ($1, $2, etc.). This means that the filename passed by find (in place of the {}) becomes the first parameter of the script -- and is referenced by $1 inside the mini-script. (Don't omit the _ and try to use $0 inside the mini-script -- not only would that be more confusing, but it is also prone to failure if the filename provided by find has special meaning as an argument to the shell.)

    • If your system does not have bash, you can use another shell, as long as the shell supports the features you're using in your mini-script. For example,
      find ... -exec sh -c '..."$1"...' _ {} \;

      In the case of sh specifically, it is important that the placeholder argument before the {} be anything other than a hyphen (-). Some versions of sh ignore a plain hyphen in that position, instead of using it as $0.

  • See Bash FAQ #30 and Bash FAQ #73 for more explanation of what's going on inside the mini-script.
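The mini-script technique described above can be tried with plain sh. This is only a sketch: it assumes `mktemp -d`, the file names are invented, and printf is used in place of echo inside the mini-script as a matter of taste.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/sub dir"
touch "$tmp/sub dir/report one.txt" "$tmp/notes.txt"

# The mini-script receives each pathname as $1; the _ fills the $0 slot.
listing=$(find "$tmp" -type f -exec sh -c '
  name=${1##*/}
  upper=$(printf "%s\n" "$name" | tr "[:lower:]" "[:upper:]")
  printf "%s -> %s\n" "$name" "$upper"
' _ {} \; | LC_ALL=C sort)

printf '%s\n' "$listing"
```

Because the mini-script is single-quoted, only double quotes are used inside it, which avoids the '\'' escaping mentioned above.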

We've included an example of this complexity here largely because it's frequently requested in the #bash channel on IRC. Also, there is a slightly simpler solution that looks like it should work -- and it may indeed work in many cases -- but which is very dangerous to use in general:

  • Don't ever put {} directly inside a mini-script being executed by bash -c or sh -c!

    # THIS IS BAD!  DANGER!  DO NOT USE!
     find ... -exec bash -c 'echo {}' \;

    When this is executed, there are two possible results. Some versions of find will simply not perform the {} replacement at all, and you'll see one line of {} written to stdout for every file. Other versions of find will replace the {} with the filename and then pass the result as code for the shell to interpret and execute. If the filename contains commands that the shell understands, such as ; rm -rf $HOME, then the shell may execute those commands, with disastrous results.
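For contrast, here is a sketch of the safe form applied to a deliberately hostile file name. It assumes `mktemp -d`; the file name and the `pwned` marker are invented for the test. The dangerous form is intentionally not run.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/files"

# A file name that would create "pwned" if a shell ever evaluated it as code.
touch "$tmp/files/\$(touch pwned)"

# Safe form: {} is a separate argument, so the name arrives in $1 as plain data.
printed=$(cd "$tmp" && find files -type f -exec sh -c 'printf "%s\n" "$1"' _ {} \;)

[ -e "$tmp/pwned" ] && status=executed || status='not executed'
printf 'name seen by the mini-script: %s\ninjected command: %s\n' "$printed" "$status"
```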

7. Actions in bulk: xargs, -print0 and -exec +

xargs by default is extremely dangerous, and should never be used without the nonstandard -0 extension.

xargs is a command that reads words from standard input, and then uses each word as the argument to an arbitrary command. It processes the words in big chunks (although you can specify how many words to process at a time), so that the command is called relatively few times. This can speed up find commands that handle lots of files, by processing files in bulk instead of one by one.

For example,

find . -name '*.temp' -print | xargs rm      # Don't do this!

In this example (which by the way is very bad), the find command generates a list of filenames matching our filter, and writes them to standard output, with a newline after each one. xargs reads these names, one word at a time, and then constructs one or more big rm commands, to delete the files. Rather than calling rm once per file, this will only call it a few times at most; this can be a dramatic savings in time on systems where fork() is really slow.

However, it has a serious problem: a file with spaces in its name will be read as multiple words, and then rm will attempt to remove each individual word, rather than the single multiple-word file that we actually wanted. Also, it parses quotation marks (possibly both kinds), and will treat quoted segments of data as a single word. This also breaks things if your filenames contain apostrophes. Wikipedia has an example of this going wrong: http://en.wikipedia.org/wiki/Xargs#The_separator_problem

Because of these serious issues (but because people still wanted the efficiency of only forking rm once), a few different workarounds have been created. The first involves a change to both find and xargs:

find . -name '*.temp' -print0 | xargs -0 rm

With the -print0 action, instead of putting a newline after each filename, find uses a NUL character (ASCII 00) instead. Since NUL is the only character which is not valid in a Unix pathname (/ is also invalid in a filename, but we are discussing pathnames here, not filenames), NUL is a valid delimiter between pathnames in a data stream. The corresponding -0 (that's "dash zero", not "dash oh") flag to xargs tells xargs, instead of reading whitespace-separated words, to read NUL-delimited words instead. It also turns off xargs's special parsing of quotation marks and apostrophes.
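The NUL-delimited pipeline can be tested against file names that would break the plain xargs version. This sketch assumes `mktemp -d` plus a find and xargs that support the nonstandard -print0 and -0 (as on GNU and BSD systems); the file names are invented.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
touch "$tmp/plain.temp" "$tmp/two words.temp" "$tmp/it's quoted.temp" "$tmp/keep.txt"

# NUL-delimited names survive spaces and apostrophes intact.
find "$tmp" -name '*.temp' -print0 | xargs -0 rm

left=$(find "$tmp" -type f | LC_ALL=C sort)
printf 'remaining: %s\n' "$left"
```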

The -print0 feature is typically found on GNU and BSD systems. For find implementations lacking it, it can be emulated by

find . -name '*.temp' -exec printf '%s\0' {} \; | xargs -0 rm

However, this has three issues:

  • It still forks a printf process for every file, so it's probably even slower than if we simply used -exec rm directly.

  • It still uses -exec, which we're presumably trying to avoid. It would be simpler to put the command you actually want to run directly into the -exec than to pipe the names to another command.
  • It requires that your xargs implementation have the -0 flag. It's uncommon that, on the same system, xargs would have -0 but find would lack -print0.

So this is not likely to be something that's ever useful in practice.

The second workaround is more commonly found on POSIX (SUSv3 and later) and SVR4-like systems, and in GNU findutils since mid-2006, and involves abandoning xargs altogether:

find . -name '*.temp' -exec rm {} +

The + (instead of ;) at the end of the -exec action tells find to use an internal xargs-like feature which causes the rm command to be invoked only once for every chunk of files, instead of once per file.
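The batching can be made visible by replacing rm with a tiny script that reports how many arguments it received. This is a sketch assuming `mktemp -d`; the three file names are invented, and with so few files a single batch is expected.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
touch "$tmp/a.temp" "$tmp/b.temp" "$tmp/c.temp"

# With \; the command runs once per file, so three lines are printed.
per_file=$(find "$tmp" -name '*.temp' -exec sh -c 'echo "$# argument(s)"' _ {} \; | wc -l)

# With + the names are batched; for a handful of files that is a single call.
batched=$(find "$tmp" -name '*.temp' -exec sh -c 'echo "$# argument(s)"' _ {} +)

echo "invocations with ';': $((per_file)); output with '+': $batched"
```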

Unfortunately, with the POSIX-style + feature, the {} must appear at the end of the command. This does not work:

find . -type f -exec cp {} ../bar +          # Generates an error.

Would've been nice, wouldn't it? Oh well.

  • Is this the same thing -I'm a newbie - because it works perfectly?
     find /mnt/sdb7/recipes -iname "*$i*" -type f  -exec cp -fiuRt   /mnt/sdb7/tmp2/"$i" '{}' +
    • You're relying on GNU cp's -t switch, which allows you to write the arguments backwards. Yes, that will work, as long as you're on a GNU system. I'd probably leave out all those other arguments (fiuR) though, unless you really do want their behavior. A portable alternative to that might be something like this:

        find . -type f -exec sh -c 'cp -- "$@" /target' _ {} +
  • You should reconsider your find-examples, using -exec rm, since gnu-find has an extra switch for that: -delete, which is far more comfortable.
     find . -name '*.temp' -delete
    which applies too for the more early examples. I don't know for posix-find, whether there is a -delete switch. Note: -delete deletes directories too, when empty; however, it will issue a warning if the directory contains files.
    • It is GNU only, not POSIX.
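Returning to the cp case discussed in this list, the portable sh -c wrapper can be tried as follows. This sketch assumes `mktemp -d`; the directory and file names are invented, and passing the destination as an extra argument before {} (instead of hard-coding it in the mini-script) is a variation introduced here for the demonstration.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/src" "$tmp/target"
touch "$tmp/src/one.txt" "$tmp/src/two words.txt"

# The first argument after _ is the destination; find appends the batch of
# names in place of {}.  Inside the mini-script the destination is shifted
# off, and "$@" then holds only the files to copy.
find "$tmp/src" -type f -exec sh -c '
  target=$1
  shift
  cp -- "$@" "$target"
' _ "$tmp/target" {} +

copied=$(cd "$tmp/target" && find . -type f | LC_ALL=C sort)
printf '%s\n' "$copied"
```

The {} still sits immediately before the +, which is what the POSIX-style syntax requires; only the command inside the mini-script rearranges the arguments.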

(I have deleted some examples that attempted to use GNU parallel as a replacement for xargs. GNU parallel can be very useful, but the examples given here were totally wrong. The ProcessManagement page has some useful examples of parallel on it.)

7.1. Nasty OS X bug

OS X users: be aware of a longstanding bug in Darwin's implementation of -exec … +, namely, that find quits when the utility invoked in -exec returns a nonzero exit status. This is tricky, because find will invoke the specified utility on as many files as it can fit into max_args, and then quit, unless the command succeeds, in which case the same will be repeated for the next amount of found files.

In other words, you may still find results, but not all of them.

  • As of OS X 10.8.1, it looks like the bug is solved. You can test for yourself:
    dir="/tmp/lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat"
    
    mkdir -p "$dir" && (cd "$dir" && touch {0001..9999})
    exec_count=$(find "$dir" -type f -exec bash -c 'echo; false' _ {} + | wc -l)
    
    if (( exec_count > 1 ))
    then echo 'The bug is fixed in this version of find!'
    else echo "BUG FOUND.  Please don't trust -exec ... + in this version of find."
    fi
    If your current OS X version is higher than 10.6.8 and lower than 10.8.1, feedback will be appreciated.

8. Actions in bulk: GNU Parallel

The nonstandard tool GNU parallel can be useful in some cases:

find . -name '*.temp' -print0 | parallel -X -0 rm

Once again, I have deleted examples that were broken and unsafe. There are some better examples of GNU parallel on ProcessManagement.

For correctness and security purposes, GNU parallel should be treated exactly like xargs. Do not feed it filenames in a stream unless you use -print0 and -0.

9. Checking file permissions

Many people ask questions such as, How can I get the permissions of a file? I want to check them to see if they're correct.

There are no standard Unix tools that can give you the permissions of a file sanely. However, the real question here isn't "What are the permissions?" Instead, it's "Are the permissions correct?" If we want to know whether a file has 0644 permissions (for example), we can use find:

find myfile -perm 0644 -print
# If this produces any output, then the file has 0644 perms.

It's a rather ugly hack, to work around the fact that Unix has no stat(1) command, but at least it's portable.

  • (NOTE: I don't know who wrote that silliness, but Unix has had a stat command since its inception, here is the man page from 1971: http://man.cat-v.org/unix-1st/1/stat)

    • Well, it's not in POSIX.

Very often you don't have a specific, known permission mask in mind -- you just want to know if there are any files that have certain permission bits set. For example, some security tools search for any files that have the setuid bit set:

find /usr -perm -4000 -print

There are subtle differences in the meaning of the -perm argument, depending on what follows it. If the argument following -perm begins with a digit, then only files with exactly the same permissions will be matched. If the argument following -perm begins with a - then find will match any files that have all of the specified permissions, and possibly others. (We used this above to find files that have the setuid bit (4000) set, regardless of any other bits they may also have set.)

Finally, GNU find has a third form: if the argument following -perm begins with a / then find will match files with any of the specified permissions. For example, to find files that are either setuid or setgid (or both):

find /usr -type f -perm /6000 -print
# GNU find only, unfortunately.  Possibly FreeBSD and Darwin.

Setuid is 4000, and setgid is 2000. If we want to match either of these, we add them together (technically, we bitwise-OR them together, but in this case it's the same thing...) and then use the -perm / filter. We added -type f as well, because many directories are setgid, and we don't care about those.

If we can't use the -perm / filter, then we would have to explicitly check for each permission bit:

find /usr -type f \( -perm -2000 -o -perm -4000 \) -print
# Portable version of -perm /6000
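The exact form, the - form and the portable "any bit" idiom can be compared on a few files with known modes. This sketch assumes `mktemp -d`; it uses the group- and world-write bits rather than setuid and setgid so that it runs without special privileges, and the file names are invented.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT
touch "$tmp/exact" "$tmp/private" "$tmp/group_w" "$tmp/world_w"
chmod 0644 "$tmp/exact"
chmod 0600 "$tmp/private"
chmod 0664 "$tmp/group_w"
chmod 0646 "$tmp/world_w"

# A bare mode matches only files whose permission bits are exactly 0644.
exactly=$(cd "$tmp" && find . -type f -perm 0644 -print | LC_ALL=C sort)

# A leading - requires all listed bits to be set; other bits are ignored.
owner_rw=$(cd "$tmp" && find . -type f -perm -0600 -print | LC_ALL=C sort)

# Portable "any of these bits": test each bit separately and join with -o.
writable=$(cd "$tmp" && find . -type f \( -perm -0020 -o -perm -0002 \) -print | LC_ALL=C sort)

printf 'exactly 0644:\n%s\n\nat least 0600:\n%s\n\ngroup- or world-writable:\n%s\n' \
  "$exactly" "$owner_rw" "$writable"
```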

Some people find the symbolic mode to be a more intuitive way of dealing with permissions. This example would find in /usr the files that are readable by other users.

find /usr -type f -perm -o=r

Another example of this would be checking the permissions on some of a user's files to ensure that they are not going to cause problems. sshd, for instance, will refuse to read public keys that are group- or world-writable, and qmail-local will refuse to honor group- or world-writable .qmail files.

find /home \( -name '.qmail*' -o -name authorized_keys \) -perm /0022 -print

Group-write permission is 0020, and world-write is 0002. Searching for either of these therefore uses -perm /0022.

This isn't actually a complete check for improper permissions on these files; ssh and qmail will also reject them if any directory in the entire path leading up to them is group- or world-writable (such as /home itself). But that's a bit beyond the scope of this page. For the ssh case, see SshKeys.

10. Find's expression logic, and the "default -print" action

According to the POSIX definition of find, the expression following the initial options and list of paths essentially consists of only one type of operator, which takes zero or more operands, and always evaluates to a truth value.

If no expression is given, find behaves implicitly as:

find [-H|-L] path1 [path2 ...] -print

Or, if the three special operators -exec, -ok, or -print are not present anywhere in the expression, then find implicitly rewrites the expression as:

find [-H|-L] path1 [path2 ...] \( expression \) -print

In either case, -print is effectively tacked on to the end as a default.

The OpenBSD implementation adds -ls and -print0 to the above list of exceptions.

GNU find goes further to subdivide expression operators into options, tests, and actions.

  • Options alter some aspect of find's behavior. They aren't very important to the logic of the expression and should usually just be lumped together at the beginning. GNU find will generate a warning if they are placed elsewhere. See the manpage for details.

  • Tests are non-side-effectful operators that do nothing other than evaluate true or false, usually depending upon some state of the current file or directory being processed.

  • Actions are mostly operators which produce some side-effect (printing output, performing a file-system operation, etc) in addition to having a truth-value. Also importantly, they are those operators considered by GNU find to belong in the above list of special operators which disable the default -print, with the exception of -prune (described below). This is because -prune is the only GNU find "action" that's also specified by POSIX where POSIX doesn't specify a default -print, so GNU find complies. The GNU find actions are: -delete, -exec, -execdir, -fls, -fprint, -fprint0, -fprintf, -ls, -ok, -okdir, -print, -print0, -printf, -prune, and -quit.

The important thing to know about the boolean operators is that they short-circuit, like in most languages, and that actions and their side-effects are applied if the action operator is "reached" in order to be evaluated. Thus, the -a and -o operators are find's primary means of "control flow". However, the precedence can be difficult to grasp. Unlike && and || in shell lists, but as in most programming languages and in propositional and first-order logic notation, -a takes precedence over -o. We can easily experiment with this using GNU find's -true and -false "tests" (note that in older versions of BSD's find, -false is an equivalent to -not!), along with the -printf action, which always evaluates as true. Even though -a can always be left implicit, I'll be explicit in the remaining examples in this section. Assume we're in an empty directory, so that the expression is evaluated exactly once.

 $ find . -false -o -false -a -printf 'nope\n' -o -printf 'yep\n' -o -printf 'nope\n'
yep

It might be tempting to think of this logic in terms of Bash lists.

 $ false || false && printf 'nope\n' || printf 'yep\n' || printf 'nope\n'
yep

But this leads to a common mistake. In Bash, && and || have equal precedence. Consider

 $ find . -true -o -false -a -printf 'yep\n'
<no output>

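This expression does not group from left to right the way the Bash list does. Because -a binds more tightly than -o, it is parsed as -true -o \( -false -a -printf 'yep\n' \). Since the first operand of the disjunction is true, the second operand (the entire conjunction) isn't evaluated.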
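For systems without GNU find's -true, -false and -printf, the same experiment can be approximated with POSIX primaries. This is a sketch, assuming `mktemp -d`: in an empty directory the only file visited is "." itself, so -type d stands in for "true", -type f for "false", and -exec echo replaces -printf.

```shell
tmp=$(mktemp -d) || exit 1
trap 'rm -rf "$tmp"' EXIT

# false -o (false -a nope) -o yep -o nope: the first true operand is "yep".
first=$(cd "$tmp" && find . -type f -o -type f -a -exec echo nope \; -o -exec echo yep \; -o -exec echo nope \;)

# Parsed as: -type d -o ( -type f -a -exec ... ), so the left side wins
# and the -exec is never reached.
second=$(cd "$tmp" && find . -type d -o -type f -a -exec echo yep \;)

# The shell, by contrast, gives && and || equal precedence, left to right.
shell_list=$(true || false && echo yep)

printf 'find, first: [%s]\nfind, second: [%s]\nshell list: [%s]\n' \
  "$first" "$second" "$shell_list"
```

The second find command and the shell list have the same shape, yet only the shell list prints anything, which is the mistake this section warns about.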

It is always important to keep track of which operators are being used so that the "default -print" rewriting can be taken into account. Consider a more realistic example:

find somedir -name '*.autosave.*' -o -name '*.bak' -a -print

Again, -a binds more tightly than -o, so this parses as -name '*.autosave.*' -o \( -name '*.bak' -a -print \). If the first operand is false, find evaluates the second, because -o short-circuits only when its first operand is true. Because -a evaluates its second operand only if its first is true, the above expression prints only files matching '*.bak', contrary to what one might expect. However, what if the -print were omitted?

find somedir -name '*.autosave.*' -o -name '*.bak'

Per the above rules, this expression is implicitly rewritten as

find somedir \( -name '*.autosave.*' -o -name '*.bak' \) -a -print

Counter-intuitively, this has the more intuitive behavior of printing files that match either of the -name criteria. This problem is explored further in the next section.
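The three forms can be compared side by side in a scratch directory. This is a minimal sketch, assuming only POSIX find; the file names and variable names are hypothetical and exist only for this illustration.

```shell
# Build a small test directory with one file per case.
tmp=$(mktemp -d)
dir=$tmp/somedir
mkdir "$dir"
touch "$dir/notes.autosave.txt" "$dir/old.bak" "$dir/keep.txt"

# Explicit -print attaches only to the second -name.
only_bak=$(find "$dir" -name '*.autosave.*' -o -name '*.bak' -a -print | LC_ALL=C sort)

# Parentheses make -print apply to the whole disjunction.
grouped=$(find "$dir" \( -name '*.autosave.*' -o -name '*.bak' \) -a -print | LC_ALL=C sort)

# No action at all: find wraps the expression and appends -print itself.
default=$(find "$dir" -name '*.autosave.*' -o -name '*.bak' | LC_ALL=C sort)

printf 'only_bak:\n%s\ngrouped:\n%s\ndefault:\n%s\n' "$only_bak" "$grouped" "$default"

rm -rf "$tmp"
```

The grouped and default results should be identical, while the first form lists the .bak file alone.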

11. -prune

If the current file is a directory, and the -prune operator is evaluated, then find won't descend into that directory. Thus, -prune allows skipping subdirectories and their contents.

As mentioned in the previous section, -prune doesn't disable the implicit -print: GNU classifies it as an action, but follows POSIX on this point. This can lead to yet another counter-intuitive behavior.

 $ tree
.
├── bork
│   ├── bar
│   └── foo
│       └── bork
└── foo
    ├── bar
    ├── baz
    └── blarg
        └── bork

 $ find . -name bork -prune
./bork
./foo/blarg/bork

Not what was expected? Let's rewrite it with a -print since that's what find does internally in this case.

find . \( -name bork -prune \) -print

If the -name matches "bork", then evaluate -prune, which is always true. Since "true and true" is true, and -a causes the next subexpression to evaluate in that case, evaluate -print. Thus, only files or directories matching "bork" are printed. Additionally, find does not descend into a directory matching "bork", so nothing beneath it is visited or printed (which is why ./bork/foo/bork is missing from the output).

Here's what was most likely intended, using -o and an explicit -print:

 $ find . -type d -name bork -prune -o -print
.
./foo
./foo/bar
./foo/blarg
./foo/blarg/bork
./foo/baz

Now, if the current file is a directory whose -name matches "bork", then: apply -prune; don't descend into it; and don't apply -print, because the left side of -o was true. Otherwise, the and-list on the left evaluates false before -prune is reached, so -prune isn't evaluated, and -print is evaluated.
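Both commands can be replayed against a throwaway copy of the tree shown earlier. This is a sketch under the assumption that ./bork and ./bork/foo are directories and every other entry is a regular file; the output is sorted only to make it independent of directory order.

```shell
# Recreate the example tree in a scratch directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/bork/foo" "$tmp/foo/blarg"
touch "$tmp/bork/bar" "$tmp/bork/foo/bork" \
      "$tmp/foo/bar" "$tmp/foo/baz" "$tmp/foo/blarg/bork"

# Implicit -print: lists only what matched -name bork, and never
# descends into the ./bork directory.
pruned_only=$(cd "$tmp" && find . -name bork -prune | LC_ALL=C sort)

# Explicit -o -print: skips the bork directory and prints everything else.
skipped=$(cd "$tmp" && find . -type d -name bork -prune -o -print | LC_ALL=C sort)

printf 'find . -name bork -prune:\n%s\n\n' "$pruned_only"
printf 'find . -type d -name bork -prune -o -print:\n%s\n' "$skipped"

rm -rf "$tmp"
```

Note that ./foo/blarg/bork still appears in the second listing because it is a regular file, so -type d fails for it.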

12. Other Examples

Some readers have contributed miscellaneous examples:

Search for a pattern in multiple files. For example: find the pattern 'int main' in all C source files.

customFind() {
  local ext=$1 pattern=$2
  find . -name "$ext" -type f -exec grep "$pattern" /dev/null {} +
}

customFind '*.c' 'int main'

Change file permissions on all regular files within a directory:

find /path/to/directory -type f -exec chmod 644 {} +

Change permissions of files older than 30 days:

find /path/to/directory -type f -mtime +30 -exec chmod 644 {} +

Copy files, removing digits from the filename (e.g., file123 to file, or te12345st to test). Do not overwrite any existing file.

find . -type f -execdir bash -c '
  dest=${1//[0-9]/}
  [[ -f $dest ]] || cp -- "$1" "$dest"
' _ {} \;

Search for files recursively in a certain directory and perform multiple operations on each of them (as a replacement for the broken "for i in $(find...)" loops that users regularly attempt and post in forums):

# Requires Bash and GNU/BSD find
while IFS= read -rd '' f; do

  echo "current filename: ${f##*/}"
  (...lots of operations...)

done < <(find ~/audio_archive/wip -type f -name '*.mp3' -print0)

In comparison, the -exec action runs only a single command, unless it is combined with a sh -c child shell. Here's that version:

# POSIX
find ~/audio_archive/wip -type f -name '*.mp3' -exec sh -c '
  for f; do
    echo "current filename: ${f##*/}"
    (...lots of operations...)
  done
' _ {} +

The only real disadvantage of the POSIX version is that the work takes place in a child shell. If you need to populate variables for use after the loop, you can't do it that way.
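The limitation, and one possible way around it, can be seen in a small experiment. This is only a sketch: the scratch directory, the file names, and the idea of passing the result back through standard output are assumptions made for illustration, not part of the examples above.

```shell
# Scratch directory with two matching files and one non-matching file.
tmp=$(mktemp -d)
touch "$tmp/a.mp3" "$tmp/b.mp3" "$tmp/c.txt"

count=0
# The child shell increments its own "count"; the parent's copy is untouched.
find "$tmp" -type f -name '*.mp3' -exec sh -c '
  count=0
  for f; do count=$((count + 1)); done
' _ {} +
after_child=$count

# Possible workaround: have the child print one line per file and capture
# the output. This also stays correct if find runs the child more than once.
captured=$(find "$tmp" -type f -name '*.mp3' -exec sh -c '
  for f; do echo x; done
' _ {} + | wc -l | tr -d ' ')

printf 'after_child=%s captured=%s\n' "$after_child" "$captured"

rm -rf "$tmp"
```

Capturing output works for simple results such as counts; for anything that must end up in several parent-shell variables, the Bash while-read loop shown earlier is usually the easier fit.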

13. Additional Reading

Some Notes About Find


CategoryShell CategoryUnix

UsingFind (last edited 2019-05-17 14:26:29 by c-68-49-79-197)