Working with files
On the previous page, we looked at some input file formats, and considered the choice of various tools that can read them. But many scripts deal with the files themselves, rather than what's inside them.
Filenames
On Unix systems, filenames may contain whitespace characters. This includes the space character, obviously. It also includes tabs, carriage returns, newlines, and more. Unix filenames may contain every character except / and NUL, and / is obviously allowed in pathnames (which are filenames preceded by zero or more directory components, either relative like ./myscript or absolute like /etc/aliases).
In fact, it's even worse: Unix filenames don't consist of characters at all; they consist of bytes. A filename may not even be a valid character sequence in your locale's character encoding. (Some languages call these "byte arrays"; bash lacks that particular terminology, but if you're familiar with it from another language, then that's what they are.)
Since whitespace characters may be included in a filename, it is a tragic mistake to write software that assumes filenames may be separated by spaces, or even newlines. Poorly written bash scripts are especially likely to be vulnerable to malicious or accidentally created unusual filenames. It's your job as the programmer to write scripts that don't fall over and die (or worse) when the user has a weird filename.
Iteration over filenames should never be done by ParsingLs. Instead, let bash expand a glob. If you need to iterate recursively, you can use the globstar option and a glob containing **, or you can use find. I won't duplicate the UsingFind page here; you are expected to have read it. Later, we'll explore the glob-vs.-find choice in depth.
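As a minimal sketch of the safe pattern, here's glob-based iteration over a directory; each filename becomes exactly one loop value, whatever bytes it contains:

{{{
# Safe iteration: the glob expands to one word per file.
for f in ./*; do
    printf 'found: %q\n' "$f"    # %q displays unusual bytes safely
done

# Never do this: for f in $(ls) -- word splitting mangles any
# filename containing whitespace or glob characters.
}}}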
A single filename may be safely stored in a bash string variable. If you need to store multiple filenames for some reason, use an array variable. Never attempt to store multiple filenames in a string variable with whitespace between them. In most cases, you shouldn't need to store multiple filenames anyway. Usually you can just iterate over the files once, and don't need to store more than one filename at a time. Of course, this depends on what the script is doing.
Sometimes your script will need to read, or write, a file which contains a list of filenames, one per line. If this is an external demand imposed on you, then there's not much you can do about it. You'll have to deal with the fact that a filename containing a newline is going to break your script (or the thing reading your output file). If you're writing the file, you could choose to omit the unusual filename altogether (with or without an error message).
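If you choose the omission route when writing such a file, the filtering might look like this sketch ($listfile is an illustrative name):

{{{
for f in ./*; do
    case $f in
        *$'\n'*) printf 'skipping unstorable filename: %q\n' "$f" >&2 ;;
        *)       printf '%s\n' "$f" >>"$listfile" ;;
    esac
done
}}}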
If you're using a file as an internal storage dump, you may safely store the list of filenames in a file if they are delimited by NUL characters instead of newlines. If they're in an array, this is trivial:
{{{
print0() { [ "$#" -eq 0 ] || printf '%s\0' "$@"; }
print0 "${files[@]}" > "$outputfile"
}}}
(The empty array case needs to be handled specially: printf '%s\0' without arguments would print one empty record instead of nothing at all.)
To read such a file into an array, in bash 4.4:
{{{
readarray -t -d '' files < "$inputfile"
}}}
Or in older bashes:
{{{
files=()
while IFS= LC_ALL=C read -r -d '' file; do
    files+=("$file")
done < "$inputfile"
}}}
The IFS= suppresses the trimming of leading/trailing whitespace characters that you'd get with the default value of $IFS. LC_ALL=C works around a bug in some versions of bash. readarray does not appear to have the same bugs that read does.
This serialization works for any array, not just filenames. Bash arrays hold C-like strings, and those can't contain NUL bytes.
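As an example, here's a sketch of a full round trip with an arbitrary array ($dumpfile is an illustrative name):

{{{
words=('hello world' $'a tab\tand a newline\n' '' 'trailing space ')

print0() { [ "$#" -eq 0 ] || printf '%s\0' "$@"; }
print0 "${words[@]}" > "$dumpfile"

copy=()
while IFS= LC_ALL=C read -r -d '' w; do
    copy+=("$w")
done < "$dumpfile"
# copy now holds exactly the same elements as words.
}}}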
Opening and closing
(Introductory material: Redirection, FileDescriptor.)
Simple bash scripts will read from stdin and write to stdout/stderr, and never need to worry about opening and closing files. The caller will take care of that, usually by doing its own redirections.
Slightly more complex scripts may open the occasional file by name, usually a single output file for logging results. This may be done on a per-command basis:
{{{
myfunc >"$log" 2>&1
}}}
or by redirecting stdout/stderr once, at the top of the script:
{{{
exec >"$log" 2>&1
myfunc
anotherfunc
}}}
In the latter case, all commands executed by the script after exec inherit the redirected stdout/stderr, just as if the caller had launched the script with that redirection in the first place.
The exec command doubles as both "open" and "close" in shell scripts. When you open a file, you decide on a file descriptor number to use. This FD number will be what you use to read from/write to the file, and to close it. (Bash 4.1 lets you open files without hard-coding a FD number, instead using a variable to let bash tell you what FD number it assigned. We won't cover this here.)
Scripts may safely assume that they inherit FD 0, 1 and 2 from the caller. FD 3 and higher are therefore typically available for you to use. (If your caller is doing something special with open file descriptors, you'll need to learn about that and deal with it. For now, we'll assume no such special arrangements.)
Bash and sh can open files in 4 different modes:
* Read: exec 3<"$file"
* Write: exec 3>"$file"
* Append: exec 3>>"$file"
* Read+write (without truncation): exec 3<>"$file"
Opening a file for write will clobber (truncate, destroy the contents of) any existing file by that name, even if you don't actually write anything to that FD. You can set the noclobber option (set -C) if this is a concern. I've never actually seen that used in a real script. (It may be more common in interactive shells.)
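A quick sketch of noclobber in action (the filename is illustrative):

{{{
set -C                    # enable noclobber
echo hi > "$file"         # fails if $file already exists
echo hi >| "$file"        # >| explicitly overrides noclobber
}}}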
Opening a file for append means every write to the file is preceded (atomically, magically) by a seek-to-end-of-file. This means two or more processes may open the file for append simultaneously, and each one's writes will appear at the end of the file as expected. (Do not attempt this with two processes opening a file for write. The semantics are entirely different.)
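Here's a sketch of that property, assuming an illustrative $logfile name; two background subshells append concurrently, and every line survives intact:

{{{
for i in 1 2; do
    (
        for n in {1..100}; do
            echo "writer $i line $n"
        done >>"$logfile"
    ) &
done
wait
# "$logfile" contains all 200 lines. With > instead of >>, the two
# writers would track their own file offsets and overwrite each other.
}}}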
The read+write mode is more commonly used for bidirectional streams such as network sockets. It can be useful for regular files, not so much because it's read+write but because contrary to >, it skips truncation allowing you to overwrite a part of the file's contents.
Closing a file descriptor is simple: exec 3>&- or exec 3<&- (either one should work regardless of how the file was opened).
Reading and writing with file descriptors
To read from an FD, you take a command that would normally read stdin, and you perform a redirection:
{{{
IFS= LC_ALL=C read -r -s -p 'Password: ' pwd <&3
}}}
There, read still reads on its stdin (fd 0), but after it has been temporarily redirected to the same resource as on fd 3. Though read specifically can also be told to read from fd 3 directly with -u (a non-standard extension from ksh):
{{{
IFS= LC_ALL=C read -r -u 3 -s -p 'Password: ' pwd
}}}
To write to an FD, you do the same thing using stdout:
{{{
printf '%s\n' "$message" >&3
}}}
Here's a realistic example:
{{{
while IFS= read -r host <&3; do
    ssh "$host" ...
done 3<"$hostlist"
}}}
ssh without -n slurps stdin, which would interfere with the reading of our hostlist file. So we do that reading on a separate FD, and voilà. Note that you don't do this: while IFS= read -r host <"$hostlist"; do .... That would reopen the file every time we hit the top of the loop, and keep reading the same host over and over.
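An alternative, since the real problem is ssh consuming stdin: keep reading on stdin, and pass -n to ssh, which redirects ssh's own stdin from /dev/null:

{{{
while IFS= read -r host; do
    ssh -n "$host" ...
done < "$hostlist"
}}}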
The placement of the "open" at the bottom of the loop may seem a bit weird if you're not used to bash programming. In fact, this syntax is really just a shortcut. If you prefer, you could write it out the long way:
{{{
exec 3<"$hostlist"
while IFS= read -r host <&3; do
    ssh "$host" ...
done
exec 3<&-
}}}
(Not exactly the same, though: this way you end up with fd 3 closed, whereas when you redirect the loop, fd 3 is restored to whatever it was before once the loop finishes.)
And here's an example using an output FD:
{{{
exec 3>>"$log"

log() {
    local IFS=' '
    printf '%s\n' "$*" >&3
}
}}}
Each time the log function is called, a message will be written to the open FD. This is more efficient than putting >>"$log" at the end of each printf command, and easier to type.
Operating on files in bulk
As we discussed earlier, there are two fundamental ways you can operate on multiple files: expanding a glob, or UsingFind. When using find, there are actually two approaches you can take: you can use -exec to have find perform some action, or you can read the names in your script.
Which tool and which approach you use depends on what your script needs to do. Ultimately you as the programmer must make all such decisions. I can only present some common guidelines:
* If your script needs to store information about files, then find -exec is probably not the approach you want. find performs its actions as a child process of your script, so you don't actually know anything about what it's doing. If you need to store information, then you will want to process the filenames yourself, which means you either read find's output, or you go with a glob.
* If you need to select files based on any metadata other than their names (owner, permissions, etc.) then you definitely want find.
* If you don't want to recurse, then you probably want to use a glob. find always recurses, though the -prune predicate can tell it to skip recursion. GNU find has a nonstandard extension (since copied by many other implementations) that lets you control the minimum and maximum recursion depth to make it easier, but in a portable script, that won't be an option.
* Bash's globs can recurse (as of bash 4.0 and the globstar option; though it was buggy before 5.0), but if you need to target systems with older versions of bash, recursion is going to mean find. A globstar sketch follows this list.
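Here's the globstar sketch mentioned above:

{{{
# bash 4.0+ (reliable as of 5.0): recursive iteration without find.
shopt -s globstar nullglob
for f in ./**/*.mp3; do
    ...
done
}}}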
Using a glob is simple. A glob expands to a list of filenames, which is a thing that exists only ephemerally, for the duration of the command that contains the expansion. Normally this is exactly what you want.
{{{
shopt -s nullglob; shopt -u failglob

for f in *.mp3; do
    ...
done
}}}
The list that results from expanding *.mp3 lives somewhere in bash's dynamic memory regions. It's not accessible to you, and you don't need it to be, because your loop is just handling one file at a time.
If for some reason you want to store this list, you can use an array.
{{{
shopt -s nullglob; shopt -u failglob

files=(*.mp3)
}}}
This is typically only required if you want to do something like counting the number of files, or iterating over the list multiple times, or determining the first or last file in the expansion. (Glob expansions are sorted according to the rules of your locale, specifically the LC_COLLATE variable. If you wanted to get the first or last file when sorted by some other criteria, such as modification time, that is an entirely separate problem, enormously more difficult.)
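For example, a sketch of counting the files and picking the first and last entries of the expansion:

{{{
shopt -s nullglob; shopt -u failglob
files=(*.mp3)

printf 'count: %s\n' "${#files[@]}"
if (( ${#files[@]} > 0 )); then
    printf 'first: %s\n' "${files[0]}"
    printf 'last:  %s\n' "${files[-1]}"    # negative index needs bash 4.3+
fi
}}}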
Remember, you can also use extended globs if those will help you. For example, !(*~) would expand to all of the files that don't end with ~. Recall that if you intend to enable extglob in a script, you must do it early in the script, not inside of a function or other compound command that attempts to use extended globs.
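A sketch using that pattern, with extglob enabled at the top of the script as just described:

{{{
#!/bin/bash
shopt -s extglob nullglob    # must be set before the pattern is parsed

for f in !(*~); do
    printf '%s\n' "$f"
done
}}}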
When using a glob expansion in a loop or storing it in an array, you generally also want to enable the nullglob option; without it, if there's no match, you loop once over (or store) the literal text of the glob pattern. As nullglob unfortunately doesn't take precedence over failglob, you may need to disable failglob as well, in case it was enabled earlier.
When using find, as mentioned earlier, you have two basic choices: let find act on the files via -exec, or retrieve the names within your script. Each approach has its merits, so it's useful for you to understand both of them.
Conceptually, retrieving the names is simpler, because it shares the same basic structure as the for loop using a glob. However, find does not produce a list; it produces a data stream, which we have to parse. Therefore we don't use for. We use while read instead.
{{{
while IFS= LC_ALL=C read -r -d '' f; do
    ...
done < <(find . -type f -print0)
}}}
Remember, pathnames may contain newlines, so the only delimiter that can safely separate pathnames in a stream is the NUL byte. Most find implementations now have the -print0 predicate to delimit the stream this way (it was standardized in the 2024 edition of POSIX), though you may still find older systems where it's not available. If you need to target systems that have only the older find, this workaround is more portable:
{{{
while IFS= LC_ALL=C read -r -d '' f; do
    ...
done < <(find . -type f -exec printf '%s\0' {} +)
}}}
This is less efficient than -print0 of course.
In both cases, the read command does our parsing for us. We tell it to expect a NUL delimiter between files with the -d '' option. The -r option suppresses backslash mangling, setting IFS= suppresses leading/trailing whitespace trimming, and setting LC_ALL=C works around a bug in bash 5.0 and newer. This basic template for reading a NUL delimited stream is extremely important, and you should be absolutely sure you understand it.
If you want to store find results in an array, you can use this same template, and simply put an array element assignment inside the loop. In bash 4.4, readarray (or its misnamed mapfile alias) was also given the -d option, which you may use if you're targeting such systems:
{{{
readarray -t -d '' files < <(find ... -print0)
}}}
Storing an entire hierarchy of filenames in an array shouldn't be a common choice, but it's there if you need it.
That leaves the more difficult approach: using -exec to delegate actions to a grandchild process. If the delegation is simple, then this may not actually be so difficult, but if we want to do anything subtle or complicated, then this becomes an interesting tool.
The fundamental point you must remember is that {} has to appear directly before + with no intervening arguments. So, for example, you can do this:
{{{
find ... -exec dos2unix {} +
}}}
But you cannot do this:
{{{
find ... -exec mv {} /destination +    # Does not work.
}}}
Any time you want to run something that has the {} in the middle, or which would have multiple instances of {}, or which needs to manipulate the filename, you can -exec a shell and let the shell process each filename as an argument. In effect, you are writing a script within a script. The basic templates for this look like:
{{{
find ... -exec bash -c '... "$@" ...' bash {} +
}}}
or
{{{
find ... -exec bash -c 'for f do ...; done' bash {} +
}}}
You can use sh -c if your mini-script doesn't rely on bash features. Remember, the argument that immediately follows bash -c script becomes argument 0 ($0) of the script, so you need to put a placeholder argument there. I'm repeating the shell interpreter in this document. While it can be literally any string you like, it's important to pick something that identifies the command that is being used such as bash/sh here as the value will also be used in error messages by the shell. Values such as _ or x would result in confusing error messages. find puts a sub-list of filenames where the {} is, and those become the script's positional parameters ("$@"). find may choose to do this multiple times, if there are lots of files, so you will end up with one grandchild shell process for each such chunk of files.
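To see why the placeholder's value matters, compare what happens when the mini-script hits an error (exact message wording varies between shells and versions, but the leading name comes from $0):

{{{
bash -c 'nosuchcommand' x       # error message begins with: x: ...
bash -c 'nosuchcommand' bash    # error message begins with: bash: ...
}}}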
Some examples:
{{{
find ... -exec sh -c 'mv -- "$@" /destination' sh {} +
}}}
{{{
find ... -exec sh -c '
    for f do
        dir=${f%/*} file=${f##*/}
        mkdir -p -- "/destination/$dir" &&
            convert ... "$f" ... "/destination/$dir/${file%.*}.png"
    done
' sh {} +
}}}
Remember that you've got an outer layer of single quotes around your mini-script, so you can't use single quotes inside it, unless you write them as '\''. It's best to avoid writing anything that needs such a level of complexity. If you reach that level, you can put the mini-script in an actual script (a separate file), and -exec that directly. Or, you can write it as a function, and export it, and let the grandchild bash -c process import it automatically.
{{{
export -f myfunc
find ... -exec bash -c 'for f do myfunc "$f"; done' bash {} +
}}}
Finally, I leave you with this example which synthesizes many of the techniques we've already discussed:
{{{
rlart() {
    # Recursive version of "ls -lart". Show files sorted by mtime, recursively.
    # Requires GNU find and GNU sort.
    printf '%s\0' "${@-.}" |
        find -files0-from - -type f -printf '%T@@%TFT%TT@%Tz@%p\0' |
        sort -zn |
        while LC_ALL=C IFS=@ read -rd '' _ mtime tz file; do
            printf '%s\n' "${mtime%.*}$tz $file"
        done
}
}}}
Sorting is something bash can't do internally; scripts are expected to call sort(1) instead. So we need to provide a stream that sort can sort how we want. I use GNU find's -printf option to format the fields of the data stream for each pathname, using an explicit NUL delimiter. GNU sort has a -z option to accept an input stream with NUL delimiters, so everything works together.
How did I come up with this? Simply break the problem down into steps:
1. We're going to need to use find because we're recursing. GNU find can produce output in any format. We want to see the modification date & time and full pathname, so read the man page and figure out what syntax to use for those.
2. To be able to deal with arbitrary pathname arguments, we can't use find "$@" (not even find -- "$@"), which wouldn't work for pathnames that start with - (or some other values such as ! or ( which are the names of some of find's predicates), so we pass the list (defaulting to .) NUL-delimited on find's stdin, which since version 4.9 GNU find can read with -files0-from -.
3. GNU sort can sort the input on whatever field we like. What field should we provide to make sorting as easy as possible? Obviously the Unix last modification timestamp (as seconds since epoch) is the easiest, so we'll add that too. (Sorting on the human-readable mtime field wouldn't work properly in timezones that implement daylight saving.)
4. We don't actually want to see the Unix timestamp in the final output, so we'll need to remove it after the sort.
5. And hey, it turns out -printf doesn't have a way to print seconds without annoying fractions out to 10 decimal places. We want to remove those too. So we might as well combine these two removals into a single clean-up step.
The -printf format that I chose produces output like:
{{{
1491425037.8232634170@2017-04-05T16:43:57.8232634170@-0500@.bashrc
}}}
Note we separate the fields with @. Using a whitespace character (such as the ones found in the default value of $IFS) wouldn't work properly for pathnames that start with such characters because of the special way IFS-splitting handles those.
We want to remove the entire first field, and make a modification to the second field and concatenate the third (a timestamp without timezone offset is ambiguous). We want to leave everything after the third field untouched, no matter what crazy characters or non-characters it has. This matches up very nicely with the shell's read command (not so nicely with awk), so I chose a while read loop to do the final clean-up.
On the sample above that will give us:
{{{
2017-04-05T16:43:57-0500 .bashrc
}}}
That's a standard (ISO 8601), unambiguous timestamp format.
You may have noticed that we're producing a newline-delimited output stream, which is a problem if one of the filenames contains newlines. This command is only intended to be used by a human. The output is not meant to be parsed by anything less sophisticated than a human brain. This means that it serves the same purpose as ls, as I noted in the comments. It shares the same newline limitation. We live in an imperfect world, so sometimes we need imperfect tools.