Differences between revisions 21 and 104 (spanning 83 versions)
Revision 21 as of 2008-02-21 21:01:56
Size: 23203
Editor: MrIgli
Comment: minor: quote around brackets
Revision 104 as of 2008-12-18 19:39:56
Size: 182
Editor: 194
#pragma section-numbers 2
= Using Find =
[[TableOfContents]]

{{{find(1)}}} is an extremely useful command for shell scripts, but it's very poorly understood. This is due in part to a complex syntax (perhaps the most complex of all the standard Unix commands that aren't actually programming languages like {{{awk}}}); and in part to poorly written man pages. (The GNU version's man page didn't even have examples until late 2006!)

The very first thing you should do before you proceed any further is actually ''read'' your system's man page for the {{{find}}} command. You don't have to memorize it, or understand ''every'' part, but you should at least have looked at all the different parts of it once, so you have a general idea what's going on. Then, you might want to look at the [http://www.openbsd.org/cgi-bin/man.cgi?query=find&apropos=0&sektion=1&manpath=OpenBSD+Current&arch=i386&format=html OpenBSD man page] for comparison. Sometimes, you'll understand one man page more than another. Just realize that not all of the implementations are the same; the OpenBSD one may have features that yours lacks, and ''vice versa''. But many people find the BSD man pages easy to read, so it might help you with the concepts.

Now, let's talk a bit about what {{{find}}} does, and when and how you should consider using it.

== Overview ==
Here's the basic idea: {{{find}}} descends through a hierarchy of files, matches files that meet specified criteria, and then does stuff with them. This is really important, so I'm going to write it out again:

 * {{{find}}} descends through a hierarchy of files,
 * matches files that meet specified criteria, and then
 * does stuff with them.
Let's give a few quick examples, to illustrate its basic operations. First up:

{{{
find .
.
./bar
./bar/foo.jpg
./foo.mp3
./apple.txt}}}
If you don't specify the {{{.}}} for current directory, some versions of find will assume that's what you want; others may generate an error. You should always specify which directory you want {{{find}}} to descend into.

If you don't specify any criteria for which files should be matched, {{{find}}} will match ''every'' file (including directories, symlinks, block and character devices, FIFOs, sockets -- you name it). We didn't add any such flags on our first example, so we got every file and directory.

Finally, if you don't specify an action to be performed on each matched file, most modern versions of {{{find}}} will assume you want to print their names, to standard output, with a newline after each one. (Some extremely old versions of {{{find}}} may do nothing -- it's really best to specify the action.) If we want to be explicit, we would write it thus:

{{{
find . -print
}}}
This would produce the same output we saw earlier; I won't repeat it.

You might also have observed that the output of {{{find}}} is not necessarily sorted into alphabetical order like that of {{{ls(1)}}}. It simply gives you the files in the order in which they appear within a directory.
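To try these examples yourself, a scratch copy of the tree shown above can be built as follows. This is only a sketch: the {{{sandbox}}} variable and the use of {{{mktemp -d}}} are assumptions made for illustration, not part of the original example. Piping through {{{sort}}} gives a repeatable order, since {{{find}}} itself promises none.

```shell
# Build a throwaway copy of the example tree (the sandbox location is an assumption).
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir bar
touch bar/foo.jpg foo.mp3 apple.txt

# find reports names in directory order; sort makes the listing repeatable.
listing=$(find . -print | LC_ALL=C sort)
printf '%s\n' "$listing"

cd / && rm -rf "$sandbox"
```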

== Searching based on names ==
Now, let's apply a filtering option. Suppose we only want to find files ending with {{{.jpg}}}:

{{{
find . -name '*.jpg' -print
./bar/foo.jpg}}}
In this case, only one file matched our criteria, so we only got one line of output. If we ''also'' wanted to find all the files ending with {{{.mp3}}}:

{{{
find . \( -name '*.mp3' -o -name '*.jpg' \) -print
./bar/foo.jpg
./foo.mp3}}}
That's a little more complex than we've shown so far, so let's go over it in detail. The central part is this:

{{{
-name '*.mp3' -o -name '*.jpg'
}}}
This simply says "we want files that meet this ''or'' that". The {{{-o}}} is a logical "or" operator, and is the most portable way to write it. (Some other versions of {{{find}}} support {{{-or}}} as well, but why use a non-portable flag that's more typing, when there's a portable one already?)

We enclosed the whole "or" expression in parentheses, because we want it to be treated as a single unit, especially when we start chaining it together with other expressions (as we'll do shortly). The parentheses themselves have to be passed to {{{find}}} intact, so we had to protect them with backslashes, because the parentheses ''also'' have a special meaning to the shell. We could have used quotes (around the parentheses) instead of backslashes.

Finally, we applied the explicit {{{-print}}} action. Any files that meet our criteria will have their names printed.

If you want a file to meet multiple criteria, you can specify multiple flags. There's an implicit logical "and" any time you put two filters together without the {{{-o}}} in between them:

{{{
find . -name '*.mp3' -name '*.jpg' -print
}}}
This one produces no output, because we asked for all the files that end with {{{.mp3}}} ''and'' end with {{{.jpg}}}. Clearly, no file can satisfy both criteria, so we got nothing.

Now, let's chain together an "or" and an implicit "and", as we promised earlier:

{{{
find . \( -name '*.mp3' -o -name '*.jpg' \) -name 'foo*' -print
./bar/foo.jpg
./foo.mp3}}}
Here, we have our same "or" expression as before: the file must have a name ending with {{{.mp3}}} ''or'' {{{.jpg}}}. In addition, it must have a name beginning with {{{foo}}}. The results are shown.
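Why the parentheses matter can be seen by leaving them out. The implicit "and" binds more tightly than {{{-o}}}, so without parentheses the {{{-print}}} attaches only to the last {{{-name}}} test. A small sketch on a scratch tree (the {{{sandbox}}} directory is an assumption for illustration):

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir bar
touch bar/foo.jpg foo.mp3 apple.txt

# Without parentheses this parses as:  -name '*.mp3'  -o  ( -name '*.jpg' -print )
without=$(find . -name '*.mp3' -o -name '*.jpg' -print | LC_ALL=C sort)

# With parentheses, -print applies to the whole "or" expression.
with=$(find . \( -name '*.mp3' -o -name '*.jpg' \) -print | LC_ALL=C sort)

printf 'without parentheses:\n%s\n' "$without"
printf 'with parentheses:\n%s\n' "$with"

cd / && rm -rf "$sandbox"
```

The first command silently drops {{{./foo.mp3}}}: the file matches, but no action is attached to that branch.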

Notice that {{{-name}}} matches only the file's name inside of the deepest directory -- in our example, {{{foo.jpg}}} and not {{{bar/foo.jpg}}} or {{{./bar/foo.jpg}}}. That's why the {{{-name 'foo*'}}} filter is able to match it. If we want to look at the directory names, we must use the {{{-path}}} filter instead:

{{{
find . -path 'bar*' -print
}}}
That produces no output. Why? Look here:

{{{
find . -path './bar*' -print
./bar
./bar/foo.jpg}}}
{{{-path}}} looks at the ''entire'' pathname (what will be a line of {{{find}}}'s output if {{{-print}}} is used) in order to match things. This includes the leading {{{./}}} in our case.
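The difference between the two filters can be checked side by side. This sketch assumes a {{{find}}} that supports {{{-path}}}, and uses a scratch directory ({{{sandbox}}}, an illustrative name):

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir bar
touch bar/foo.jpg foo.mp3 apple.txt

# -name sees only the last component; -path sees the whole pathname, "./" included.
by_name=$(find . -name 'bar*' -print)
by_path=$(find . -path './bar*' -print | LC_ALL=C sort)
no_match=$(find . -path 'bar*' -print)

printf 'by name: %s\n' "$by_name"
printf 'by path:\n%s\n' "$by_path"

cd / && rm -rf "$sandbox"
```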

(At this point, I must point out that {{{-path}}} is not available on every version of {{{find}}}. In particular, Solaris lacks it. But it's pretty common on everything else.)

We can also negate an expression:

{{{
find . ! -path '*bar*' -print
.
./foo.mp3
./apple.txt}}}
The {{{!}}} negates the expression which follows it, so we got every file that does ''not'' have {{{bar}}} somewhere in its full pathname.

== Searching based on times ==
One of the most common uses of {{{find}}} in system maintenance scripts is to find all the files that are older than ''something'', either a relative instant of time such as "30 days ago", or some other file. If we want to find all the files older than 30 days (for example, to clean up a temporary directory), we use:

{{{
find /tmp -mtime +30 -print
}}}
The {{{-mtime}}} flag is one of the three filters used to inspect a file's timestamps. A file on a traditional Unix file system has three timestamps: ''modification time'' (mtime), ''access time'' (atime), and ''change time'' (ctime). None of these is the file's creation time, and none of the standard {{{find}}} filters can test for it.

Let me say that again for those who aren't paying attention: '''''There is no portable way to know when a file was created.''''' Some newer file systems do record a "birth time", but POSIX {{{find}}} has no filter for it, so scripts cannot rely on it.

In our example, we used {{{-mtime}}}, which looks at a file's modification time. This is the timestamp we see when we run {{{ls -l}}}, and is the most commonly used. It's updated any time the contents of a file are changed by writing to it. We could also have used {{{-atime}}} to look at a file's access time -- this can be seen with {{{ls -lu}}}, and is updated whenever the contents of the file are read (unless the system administrator has explicitly turned off atime updates on this file system). It's quite rare to use ctime (change time), which is updated any time a file's metadata (permissions, ownership, etc.) are changed (e.g., by {{{chmod}}}), as well as when its contents are written.

The {{{+30}}} means we want all the files that are ''more than'' 30 days old. If we had used this:

{{{
find . -mtime 30 -print
}}}
We would've gotten all the files that are exactly 30 days old (the age is counted in whole 24-hour periods, with any fraction discarded -- see the manual for the exact definitions). That's not often useful. However, if we wanted all the files that have been modified ''within'' the last 30 days, we would use:

{{{
find . -mtime -30 -print
}}}
This gives us all the files whose modification time is ''less than'' 30 days ago.
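The effect of the sign can be checked with two files whose modification times are known. In this sketch, {{{touch -t}}} backdates one file; the file names and the {{{sandbox}}} directory are made up for illustration:

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch -t 200001010000 old.log   # modification time set to 1 January 2000
touch new.log                   # modification time is "now"

older=$(find . -type f -mtime +30 -print)   # more than 30 days old
recent=$(find . -type f -mtime -30 -print)  # modified within the last 30 days

printf 'older:  %s\nrecent: %s\n' "$older" "$recent"

cd / && rm -rf "$sandbox"
```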

The other common use for matching files based on their times is when we want to find all the files that have been modified since the last time we checked. That logic looks like this:

{{{
find /etc -newer /var/log/backup.timestamp -print
touch /var/log/backup.timestamp}}}
This is the basic structure for doing an incremental system backup: find all the files that have changed since the last backup, and then update our timestamp file for the next run. (Clearly we'd have to do something more than just {{{-print}}} the files, in order to have a meaningful backup, but we'll look at actions in the next section.)
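The same pattern can be tried on a small scale. The sketch below fakes three points in time with {{{touch -t}}}; the {{{.conf}}} file names and the {{{sandbox}}} directory are hypothetical stand-ins for {{{/etc}}} and {{{/var/log}}}:

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch -t 199901010000 unchanged.conf    # older than the stamp
touch -t 200001010000 backup.timestamp  # time of the "previous backup"
touch -t 200101010000 changed.conf      # modified after the stamp

changed=$(find . -type f -newer backup.timestamp -print)
touch backup.timestamp                  # reset the stamp for the next run
after=$(find . -type f -newer backup.timestamp -print)

printf 'changed since last run: %s\n' "$changed"

cd / && rm -rf "$sandbox"
```

After the stamp is reset, the second {{{find}}} reports nothing until some file is modified again.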

== Searching based on sizes ==
Another common use for {{{find}}} is looking for files that are larger than some threshold value. For example, this looks for all files larger than 10 megabytes (10485760 bytes) within {{{/home}}}:

{{{
find /home -type f -size +10485760c -print
}}}
As with file modification times (shown in the previous section), a leading {{{+}}} or {{{-}}} before the argument that follows {{{-size}}} is significant. A leading {{{+}}} means the size must be ''larger than'' the specified value; {{{-}}} means ''smaller than''; and no leading sign means ''exactly'' the specified value.

Without the trailing {{{c}}}, the argument would be interpreted as a number of ''blocks'', which are generally 512 bytes, although this may be system-dependent.

Some versions of {{{find}}} have additional unit specifiers, such as {{{k}}} for kilobytes (1024 bytes). These are not portable, and you should refer to your system's manual to learn about them if you wish to use them.
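A quick way to see the three forms of {{{-size}}} is to create files of known length. This is a sketch only; the file names, the byte counts and the {{{sandbox}}} directory are chosen for illustration:

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
dd if=/dev/zero of=big.bin bs=1024 count=2 2>/dev/null   # 2048 bytes
printf 'tiny' > small.txt                                 # 4 bytes

large=$(find . -type f -size +1024c -print)   # strictly more than 1024 bytes
exact=$(find . -type f -size 4c -print)       # exactly 4 bytes
little=$(find . -type f -size -1024c -print)  # fewer than 1024 bytes

printf 'large: %s\nexact: %s\nlittle: %s\n' "$large" "$exact" "$little"

cd / && rm -rf "$sandbox"
```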

== Actions ==
A more complete example for cleaning up /tmp might look something like this:

{{{
find /tmp -type f -mtime +30 -exec rm -f {} \;
}}}
Here, we've got a few new things. First of all, we used {{{-type f}}} to specify that we only want to match ordinary files -- not directories, and not sockets, and so on. Then, we have our {{{-mtime +30}}} filter. As we've already seen, two filters adjacent to each other like this have an implicit "and" between them, so the file must match both criteria to be considered.

Now, our action is no longer {{{-print}}}, because we don't want to see them. Instead, we've got this:

{{{
-exec rm -f {} \;
}}}
The {{{-exec}}} is an action flag, which says "we want to execute a command"; the command follows it, and is terminated by a semicolon ({{{;}}}). Now, the semicolon is also a special character for the shell, so we have to protect it with a backslash, just as we did for the parentheses earlier. And once again, we could have used quotes around it instead; the backslash is one character less, so it's the traditional preference.

The curly brace pair in our command is a special argument to {{{find}}}. Each time a command is executed by {{{-exec}}} for a file that's been matched, the {} is replaced with the file's name. So, supposing we matched three files under {{{/tmp}}}, {{{find}}} might execute the following commands:

{{{
rm -f /tmp/foo285712
rm -f /tmp/core
rm -f /tmp/haha naughty file; "rm -rf ."}}}
Even if some user attempts to subvert the system by putting a Trojan horse in the filename as in this example, the {} replacement is perfectly safe. The commands that {{{find}}} executes are not passed to a shell. There is no word splitting, and no parsing of the filename. The filename is merely passed as an argument to the command (in our case, {{{rm}}}) directly. Nothing in the filename will matter at all.
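This can be verified harmlessly in a scratch directory. The sketch below creates a hostile-looking name on purpose and checks that only that one file disappears; {{{keep.txt}}} and the {{{sandbox}}} directory are illustrative names:

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch keep.txt
touch 'haha naughty file; rm -rf .'   # a hostile-looking name, created on purpose

# rm receives the name as one argument; nothing in it is interpreted.
find . -type f -name 'haha*' -exec rm -f {} \;

remaining=$(find . -type f -print)
printf 'still here: %s\n' "$remaining"

cd / && rm -rf "$sandbox"
```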

Sometimes you might want to execute complex commands for each file you process. For example, you might want to execute a block of shell code that isolates the file's name (stripping off all leading directories), converts it to all upper-case, and then does something unspecified (perhaps rename the file, after some additional checking, not shown here). A block of bash code to do that to a single file might look something like this:

{{{
# $1 contains the original filename
name=${1##*/}
upper=$(echo "$name" | tr "[:lower:]" "[:upper:]")
echo "$name -> $upper"
}}}
If we want to have {{{find}}} run that for every file it matches, then we have two possibilities:

 1. Create a script with this code in it, and then {{{-exec}}} that script.
 1. Write a complex {{{find}}} command like this one:
 {{{
find ... -exec bash -c \
     'name=${1##*/}; upper=$(echo "$name" | tr "[:lower:]" "[:upper:]");
     echo "$name -> $upper"' - {} \;}}}
The separate script is much easier to read and understand, and should be given serious consideration as a solution. For those of you who want to understand what the second alternative does, here's an explanation:

 * First note that the whole thing is single-quoted, so everything in it is literal. If you needed to embed your own single quotes in the mini-script, you'd have to use {{{'\''}}} to represent them.
 * The mini-script is followed by {{{-}}} and `{}`. When {{{bash -c}}} executes a command, the next argument ''after'' the command is used as {{{$0}}} (the script's "name" in the process listing), and subsequent arguments become the positional parameters ({{{$1}}}, {{{$2}}}, etc.). This means that the filename passed by {{{find}}} (in place of the `{}`) becomes the first parameter of the script -- and is referenced by {{{$1}}} inside the mini-script. (We could have omitted the {{{-}}} and then used {{{$0}}} inside the mini-script, but that would be even more confusing.)
 * See [:BashFAQ#30:Bash FAQ #30] and [:BashFAQ#73:Bash FAQ #73] for more explanation of what's going on inside the mini-script.
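Here is a runnable variation of the mini-script, as a sketch. It departs from the version above in three ways, all of them choices made for this illustration: it calls {{{sh}}} rather than {{{bash}}} (only POSIX features are used), it puts the word {{{sh}}} in the {{{$0}}} slot instead of {{{-}}}, and it uses {{{printf}}} instead of {{{echo}}}.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir bar
touch bar/foo.jpg

# The filename arrives as $1; the word "sh" fills the $0 slot.
renames=$(find . -type f -exec sh -c '
    name=${1##*/}
    upper=$(printf "%s\n" "$name" | tr "[:lower:]" "[:upper:]")
    printf "%s -> %s\n" "$name" "$upper"
' sh {} \;)

printf '%s\n' "$renames"

cd / && rm -rf "$sandbox"
```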
We've included an example of this complexity here largely because it's frequently requested in the #bash channel on IRC. Also, there is a slightly simpler solution that looks like it should work -- and it may indeed work in many cases -- but which is '''very dangerous''' to use in general:

 * Don't ever put `{}` directly inside a mini-script being executed by {{{bash -c}}} or {{{sh -c}}}.
 {{{
# THIS IS BAD! DANGER! DO NOT USE!
 find ... -exec bash -c 'echo {}' \;}}}
 When this is executed, there are two possible results. Some versions of {{{find}}} will simply not perform the `{}` replacement at all, and you'll see one line of `{}` written to stdout for every file. Other versions of {{{find}}} will replace the `{}` with the filename and then pass the result as code for the shell to execute. If the filename contains commands that the shell understands, such as {{{; rm -rf $HOME}}}, then the shell may execute those commands, with disastrous results.
In the absence of the more complex, safe example, some people might have found the disastrous one on their own, without realizing the danger. The next section of this document shows some more dangerous things....
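The safe form can be tested against a filename that ''would'' do something if it were ever treated as code. In this sketch the name {{{x; touch PWNED}}} and the {{{sandbox}}} directory are invented for the test; the dangerous form is deliberately not run.

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch 'x; touch PWNED'   # if this name were run as code, PWNED would appear

# Safe form: the name is data in "$1", never part of the script text.
seen=$(find . -type f -exec sh -c 'printf "%s\n" "$1"' sh {} \;)

if [ -e PWNED ]; then injected=yes; else injected=no; fi
printf 'saw: %s (injected: %s)\n' "$seen" "$injected"

cd / && rm -rf "$sandbox"
```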

== Actions in bulk: xargs, -print0 and -exec + ==
''{{{xargs}}} by default is '''extremely''' dangerous, and should never be used without the nonstandard {{{-0}}} extension.''

{{{xargs}}} is a command that reads words from standard input, and then uses each word as the argument to an arbitrary command. It processes the words in big chunks (although you can specify how many words to process at a time), so that the command is called relatively few times. This can speed up {{{find}}} commands that handle lots of files, by processing files in bulk instead of one by one.

For example,

{{{
find . -name '*.temp' -print | xargs rm # Don't do this!
}}}
In this example (which by the way is '''very''' bad), the {{{find}}} command generates a list of filenames matching our filter, and writes them to standard output, with a newline after each one. {{{xargs}}} reads these names, one word at a time, and then constructs one or more big {{{rm}}} commands, to delete the files. Rather than calling {{{rm}}} once per file, this will only call it a few times at most; this can be a dramatic savings in time on systems where {{{fork()}}} is really slow.

However, it has a serious problem: a file with spaces in its name will be read as multiple words, and then {{{rm}}} will attempt to remove each individual word, rather than the single multiple-word file that we actually wanted. Also, it parses quotation marks (possibly both kinds), and will treat quoted segments of data as a single word. This ''also'' breaks things if your filenames contain apostrophes.
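The word splitting is easy to observe without deleting anything. This sketch swaps {{{rm}}} for a harmless {{{printf}}} and uses {{{-n 1}}} so that each word {{{xargs}}} sees is printed on its own line; the file name and the {{{sandbox}}} directory are illustrative:

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch 'my file.temp'

# xargs splits on whitespace, so one filename becomes two words.
words=$(find . -name '*.temp' -print | xargs -n 1 printf '%s\n')
printf '%s\n' "$words"

cd / && rm -rf "$sandbox"
```

Had the command been {{{rm}}}, it would have been asked to remove {{{./my}}} and {{{file.temp}}}, neither of which is the file we meant.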

Because of these serious issues, and because people still wanted the efficiency of forking {{{rm}}} only a few times, a few different workarounds have been created. The first involves a change to both {{{find}}} and {{{xargs}}}:

{{{
find . -name '*.temp' -print0 | xargs -0 rm
}}}
With the {{{-print0}}} action, instead of putting a newline after each filename, {{{find}}} uses a NUL character (ASCII 00). NUL is the only character which is ''not'' valid in a Unix pathname ({{{/}}} is invalid in a file name, but is perfectly valid in a '''pathname''', which is what we're dealing with here), so NUL is a safe delimiter between pathnames in a data stream. The corresponding {{{-0}}} (that's "dash zero", not "dash oh") flag tells {{{xargs}}} to read NUL-delimited words instead of whitespace-separated words.

This feature is typically found on GNU and BSD systems.
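A small check of the NUL-delimited pipeline, assuming your {{{find}}} and {{{xargs}}} support {{{-print0}}} and {{{-0}}}. The file names and the {{{sandbox}}} directory are invented for the test:

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch 'my file.temp' "it's.temp" keep.txt

# NUL-delimited names survive spaces and apostrophes intact.
find . -name '*.temp' -print0 | xargs -0 rm

left=$(find . -type f -print)
printf 'left behind: %s\n' "$left"

cd / && rm -rf "$sandbox"
```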

The second workaround is more commonly found on POSIX systems (and some ''extremely'' recent GNU systems), and involves abandoning xargs altogether:

{{{
find . -name '*.temp' -exec rm {} +
}}}
The {{{+}}} (instead of {{{;}}}) at the end of the {{{-exec}}} action tells {{{find}}} to use an internal {{{xargs}}}-like feature which causes the {{{rm}}} command to be invoked only once for every chunk of files, instead of once per file.
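The difference between {{{;}}} and {{{+}}} can be made visible by having the executed command report how many filenames it was given. This is a sketch; the mini-script, the file names and the {{{sandbox}}} directory are illustrative only:

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch a.temp b.temp c.temp

# Each run of the mini-script prints how many filenames it was handed.
per_file=$(find . -name '*.temp' -exec sh -c 'echo "$#"' sh {} \;)
batched=$(find . -name '*.temp' -exec sh -c 'echo "$#"' sh {} +)

printf 'with ";":\n%s\n' "$per_file"
printf 'with "+":\n%s\n' "$batched"

cd / && rm -rf "$sandbox"
```

With only three files, the {{{+}}} form fits them all into a single invocation.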

Unfortunately, with the POSIX-style {{{+}}} feature, the {} must appear at the ''end'' of the command. This does not work:

{{{
find . -type f -exec cp {} ../bar + # Generates an error.
}}}
Would've been nice, wouldn't it? Oh well.

 . Is this the same thing -I'm a newbie - because it works perfectly?
 {{{
 find /mnt/sdb7/recipes -iname "*$i*" -type f -exec cp -fiuRt /mnt/sdb7/tmp2/"$i" '{}' +}}}
  . You're relying on GNU {{{cp}}}'s {{{-t}}} switch, which allows you to write the arguments backwards. Yes, that will work, as long as you're on a GNU system. I'd probably leave out all those other arguments ({{{fiuR}}}) though, unless you really do want their behavior. A portable alternative to that might be something like this:
  {{{
  find . -type f -exec sh -c 'cp -- "$@" /target' - {} +}}}
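The portable {{{sh -c}}} form can be tried on a scratch tree. In this sketch the directory names {{{src}}} and {{{target}}}, the file names and the {{{sandbox}}} directory are all hypothetical:

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir src target
touch src/one.txt 'src/two words.txt'

# "$@" expands to the batch of filenames; the destination comes last.
find src -type f -exec sh -c 'cp -- "$@" target/' sh {} +

copied=$(ls target | LC_ALL=C sort)
printf '%s\n' "$copied"

cd / && rm -rf "$sandbox"
```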
[[Anchor(permissions)]]

== Checking file permissions ==
Many people ask questions such as, ''How can I get the permissions of a file? I want to check them to see if they're correct.''

There are no standard Unix tools that can give you the ["Permissions"] of a file sanely. However, the real question here isn't ''What are the permissions?'' Instead, it's ''Are the permissions correct?'' If we want to know whether a file has 0644 permissions (for example), we can use {{{find}}}:

{{{
find myfile -perm 0644 -print
# If this produces any output, then the file has 0644 perms.}}}
It's a rather ugly hack, to work around the fact that POSIX specifies no {{{stat(1)}}} command (many systems ship one, but the options differ between them), but at least it's portable.

Very often you don't have a specific, known permission mask in mind -- you just want to know if there are any files that ''have'' certain permission bits set. For example, some security tools search for any files that have the setuid bit set:

{{{
find /usr -perm -4000 -print
}}}
There are subtle differences in the meaning of the {{{-perm}}} argument, depending on what follows it. If the argument following {{{-perm}}} begins with a digit, then only files with ''exactly'' the same permissions will be matched. If the argument following {{{-perm}}} begins with a {{{-}}} then {{{find}}} will match any files that have ''all'' of the specified permissions, and possibly others. (We used this above to find files that have the setuid bit (4000) set, regardless of any other bits they may also have set.)
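The two portable forms can be compared on files with known modes. A sketch, with made-up file names and a scratch {{{sandbox}}} directory:

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch public.txt private.txt
chmod 0644 public.txt
chmod 0600 private.txt

exact=$(find . -type f -perm 0644 -print)                      # exactly rw-r--r--
owner_rw=$(find . -type f -perm -0600 -print | LC_ALL=C sort)  # at least rw for owner

printf 'exactly 0644: %s\n' "$exact"
printf 'at least 0600:\n%s\n' "$owner_rw"

cd / && rm -rf "$sandbox"
```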

Finally, GNU {{{find}}} has a third form: if the argument following {{{-perm}}} begins with a {{{/}}} then {{{find}}} will match files with ''any'' of the specified permissions. (Older GNU versions spelled this with a leading {{{+}}}; that spelling was deprecated and has since been removed from GNU {{{find}}}, although some BSD versions may still use it. Check your man page.) For example, to find files that are either setuid ''or'' setgid (or both):

{{{
find /usr -type f -perm /6000 -print
# GNU find only, unfortunately. Some BSD versions (possibly FreeBSD and Darwin) spell it -perm +6000.
}}}
Setuid is 4000, and setgid is 2000. If we want to match either of these, we add them together (technically, we bitwise-OR them together, but in this case it's the same thing...) and then use the {{{-perm /}}} filter. We added {{{-type f}}} as well, because many directories are setgid, and we don't care about those.

If we can't use the {{{-perm /}}} filter, then we would have to explicitly check for each permission bit:

{{{
find /usr -type f \( -perm -2000 -o -perm -4000 \) -print
# Portable version of -perm /6000
}}}

Another example of this would be checking the permissions on some of a user's files to ensure that they are not going to cause problems. {{{sshd}}}, for instance, will refuse to read public keys that are group- or world-writable, and {{{qmail-local}}} will refuse to honor group- or world-writable {{{.qmail}}} files.

{{{
find /home \( -name '.qmail*' -o -name authorized_keys \) -perm /0022 -print}}}
Group-write permission is 0020, and world-write is 0002. Searching for either of these therefore uses {{{-perm /0022}}} (again, GNU syntax).
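The group- or world-writable check can also be written portably, by testing each bit with the {{{-perm -}}} form and joining the tests with {{{-o}}}. A sketch on three files with known modes (the file names and the {{{sandbox}}} directory are invented):

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
touch safe group_w world_w
chmod 0644 safe
chmod 0664 group_w
chmod 0646 world_w

# Match files with the group-write bit OR the world-write bit set.
writable=$(find . -type f \( -perm -0020 -o -perm -0002 \) -print | LC_ALL=C sort)
printf '%s\n' "$writable"

cd / && rm -rf "$sandbox"
```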

This isn't actually a ''complete'' check for improper permissions on these files; ssh and qmail will also reject them if any directory in the entire path leading up to them is group- or world-writable (such as {{{/home}}} itself). But that's a bit beyond the scope of this page. For the ssh case, see SshKeys.

== -prune ==
The {{{-prune}}} action is very powerful, but also very confusing. It allows you to skip subdirectories. This is useful either to find files not in those directories, or to skip searching specific directories for performance reasons.

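The most confusing property of {{{-prune}}} is that it is an ACTION, not a filter. It always evaluates to true, and its only effect is to stop {{{find}}} from descending into the directory currently being examined.

To use it, you have to combine it with {{{-o}}} to actually process the non-skipped files. For example, to find all files in {{{source/}}}, except those in subdirectories named {{{CVS}}}:

{{{
find source \( -name CVS -prune \) -o \( -type f -print \)
}}}
Note the explicit {{{-print}}} action. It's needed because {{{find}}} only supplies its default {{{-print}}} when the expression contains no other action apart from {{{-prune}}}, and that default applies to the ''whole'' expression. Without the explicit {{{-print}}}, the pruned {{{CVS}}} directories themselves would be printed too, since the left-hand side of the {{{-o}}} is true for them.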
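The effect of the explicit {{{-print}}} can be checked on a small tree. In this sketch the layout under {{{source/}}} and the {{{sandbox}}} directory are made up for illustration:

```shell
sandbox=$(mktemp -d)
cd "$sandbox" || exit 1
mkdir -p source/CVS source/sub/CVS
touch source/a.c source/CVS/Entries source/sub/b.c source/sub/CVS/Root

# Explicit -print: only regular files outside CVS directories are listed.
explicit=$(find source -name CVS -prune -o -type f -print | LC_ALL=C sort)

# No explicit action: the default -print covers the whole expression,
# so the pruned CVS directories are listed as well.
implicit=$(find source -name CVS -prune -o -type f | LC_ALL=C sort)

printf 'explicit:\n%s\n' "$explicit"
printf 'implicit:\n%s\n' "$implicit"

cd / && rm -rf "$sandbox"
```

In both cases nothing ''inside'' a {{{CVS}}} directory is visited; the only difference is whether the {{{CVS}}} directories themselves show up in the output.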

UsingFind (last edited 2019-05-17 14:26:29 by c-68-49-79-197)