Diff for "BashFAQ/020"

Differences between revisions 4 and 15 (spanning 11 versions)

How can I find and deal with file names containing newlines, spaces or both?

The preferred method is still to use find(1):

    find ... -exec command {} \;

or, if you need to handle filenames en masse, with GNU and recent BSD tools:

    find ... -print0 | xargs -0 command

or with POSIX find:

    find ... -exec command {} +

Use one of these forms unless you really can't.
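As a rough illustration of why these forms are safe, the sketch below builds a scratch directory containing one name with a space and one with a newline, then compares the number of arguments that `find -exec ... {} +` passes along with a naive line count. The `tmp`, `count` and `naive` variables and the file names are made up for this example.

```shell
# Works in sh and bash; directory and file names are illustrative only
tmp=$(mktemp -d)
touch "$tmp/plain.txt" "$tmp/with space.txt" "$tmp/with
newline.txt"

# Each file name arrives as exactly one argument, so "$#" is the file count.
count=$(find "$tmp" -type f -exec sh -c 'echo "$#"' sh {} +)
echo "arguments seen by -exec: $count"

# Counting output lines instead is fooled by the embedded newline.
naive=$(find "$tmp" -type f | wc -l | tr -d ' ')
echo "lines printed by find:   $naive"

rm -rf "$tmp"
```

With three files, the first count is 3 and the line-based count is 4, because the newline in the third name is counted as a record separator.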

Another way to deal with files with spaces in their names is to use the shell's filename expansion (globbing). This has the disadvantage of not working recursively (except with zsh's extensions), but if you just need to process all the files in a single directory, it works fantastically well.

This example changes all the *.mp3 files in the current directory to use underscores in their names instead of spaces. It uses Parameter Expansions that will not work in the original BourneShell or POSIX shell, but should be good in KornShell and BASH.

    for file in ./*.mp3; do
        mv "$file" "${file// /_}"
    done

Remember, you need to quote all your Parameter Expansions using double quotes. If you don't, the expansion will undergo WordSplitting (see also BashGuide/TheBasics/ArgumentSplitting and BashPitfalls). Also, always prefix globs with "./"; otherwise, if there's a file with "-" as the first character, the expansions might be misinterpreted as options.
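The effect of the missing quotes is easy to see by counting arguments. This is only a small sketch: `count_args` is a hypothetical helper written for the demonstration, and the file name is invented.

```shell
# Works in sh and bash; count_args is a hypothetical helper for this demo
count_args() { echo "$#"; }

file="./my favourite song.mp3"

unquoted=$(count_args $file)      # word splitting breaks the name at each space
quoted=$(count_args "$file")      # the quoted expansion stays a single word

echo "unquoted: $unquoted, quoted: $quoted"
```

The unquoted call receives three arguments and the quoted call receives one, which is the difference between `mv` renaming one file and `mv` complaining about three files that do not exist.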

You could do the same thing for all files with spaces in their names (regardless of extension) by using

    for file in ./*\ *; do

instead of *.mp3.
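Put together, a complete loop might look like the following sketch. It runs in a scratch directory so it can be tried safely. The `[ -e "$file" ]` guard is an addition that is not in the text above: by default bash leaves a glob that matches nothing as the literal pattern, and the guard skips that case. The names `tmp`, `name` and `ok` are illustrative.

```shell
#!/usr/bin/env bash
# Bash; the scratch directory and file names are made up for the demo
tmp=$(mktemp -d)
touch "$tmp/one two.txt" "$tmp/three  four.mp3" "$tmp/nospace.txt"

for file in "$tmp"/*\ *; do
    [ -e "$file" ] || continue           # glob matched nothing: skip the literal pattern
    name=${file##*/}                     # base name without the directory part
    mv "$file" "$tmp/${name// /_}"       # replace spaces in the base name only
done

# Report whether both renamed files now exist.
ok=no
if [ -e "$tmp/one_two.txt" ] && [ -e "$tmp/three__four.mp3" ]; then
    ok=yes
fi
echo "renamed: $ok"
rm -rf "$tmp"
```

The substitution is applied to the base name rather than the whole path so that a space in a parent directory is left alone.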

Another way to handle filenames recursively involves using the -print0 option of find (a GNU/BSD extension), together with bash's -d option for read:

    # Bash
    unset a i
    while IFS= read -r -d $'\0' file; do
      a[i++]="$file"        # or however you want to process each file
    done < <(find /tmp -type f -print0)

The preceding example reads all the files under /tmp (recursively) into an array, even if they have newlines or other whitespace in their names, by forcing read to use the NUL byte (\0) as its line delimiter. Since NUL is not a valid byte in Unix filenames, this is the safest approach besides using find -exec. IFS= is required to avoid trimming leading/trailing whitespace, and -r is needed to avoid backslash processing. Because bash strings cannot contain a NUL byte, $'\0' actually expands to the empty string '', so we could also write it like this:

    # Bash
    unset a i
    while IFS= read -r -d '' file; do
      a[i++]="$file"
    done < <(find /tmp -type f -print0)
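To see what `IFS=` and `-r` protect against, the sketch below feeds the same NUL-terminated string to `read` with and without them. It uses `printf` instead of `find` so the input is fully controlled; the variable names `name`, `safe` and `sloppy` are invented for the example.

```shell
#!/usr/bin/env bash
# Bash; a made-up name with leading/trailing spaces and a backslash
name=' lead\ing and trailing '

# With IFS= and -r the string is read back byte for byte.
IFS= read -r -d '' safe < <(printf '%s\0' "$name")

# Without them, read trims the outer whitespace and consumes the backslash.
read -d '' sloppy < <(printf '%s\0' "$name")

printf 'safe:   [%s]\n' "$safe"
printf 'sloppy: [%s]\n' "$sloppy"
```

The first variable matches the original string exactly, while the second comes back as `leading and trailing`, which would no longer name the same file.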

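The same loop can also be fed from a pipeline, which puts find first (matching the order in which things run) and avoids the '< <(' syntax. Be aware that bash runs each part of a pipeline in a subshell by default, so the array a filled in by this loop is lost once the loop ends; use this form only when all of the processing happens inside the loop: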

    # Bash
    unset a i
    find /tmp -type f -print0 | while IFS= read -r -d '' file; do
      a[i++]="$file"
    done
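One practical difference between the pipeline form and the redirection form is where the array ends up. The sketch below assumes bash with its default settings (in particular, the `lastpipe` option is off); the scratch directory and the array names `piped` and `redirected` are made up for the comparison.

```shell
#!/usr/bin/env bash
# Bash, default settings (lastpipe off); names are illustrative
tmp=$(mktemp -d)
touch "$tmp/a b" "$tmp/c"

# Pipeline form: the while loop runs in a subshell, so its array is discarded.
piped=() i=0
find "$tmp" -type f -print0 | while IFS= read -r -d '' file; do
    piped[i++]=$file
done

# Redirection form: the loop runs in the current shell, so the array survives.
redirected=() i=0
while IFS= read -r -d '' file; do
    redirected[i++]=$file
done < <(find "$tmp" -type f -print0)

echo "piped: ${#piped[@]}, redirected: ${#redirected[@]}"
rm -rf "$tmp"
```

Under those assumptions the pipeline version reports an empty array and the redirection version reports two elements, so the pipeline form fits best when each file is processed inside the loop body.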

-  Revision 4 as of 2008-05-31 06:37:22
  Size: 2166
  Editor: pgas
  Comment: add a note about quoting, still not sure about how to explain concisely and simply word splitting...
+  Revision 15 as of 2009-07-03 09:59:29
  Size: 2878
  Editor: localhost
  Comment: Helping the user to avoid remembering the '< <(' syntax and putting first "find" as it's executed first.
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-[[Anchor(faq20)]]
+<<Anchor(faq20)>>
-Line 3:
+Line 4:
-The preferred method is still to use [:UsingFind:find(1)]:
+The preferred method is still to use [[UsingFind|find(1)]]:
-Line 8:
+Line 9:
 Line 14:
-Line 20:
+Line 19:
-Line 23:
+Line 21:
-Another way to deal with files with spaces in their names is to use the shell's filename expansion (["globbing"]).  This has the disadvantage of not working recursively (except with zsh's extensions), but if you just need to process all the files in a single directory, it works fantastically well, but you need to '''quote all your [:BashFAQ/073:Parameter Expansions] using double 
quotes''' "$file" "${file%.mp3}" ...If you don't use the quotes the expansion will be split into several words, which means
that the command will think you gave it several arguments instead of just one.
+Another way to deal with files with spaces in their names is to use the shell's filename expansion ([[globbing]]).  This has the disadvantage of not working recursively (except with zsh's extensions), but if you just need to process all the files in a single directory, it works fantastically well.
-Line 27:
+Line 23:
-This example changes all the *.mp3 files in the current directory to use underscores in their names instead of spaces.  It uses [:BashFAQ/073:Parameter Expansions] that will not work in the original BourneShell, but should be good in Korn and Bash.
+This example changes all the *.mp3 files in the current directory to use underscores in their names instead of spaces.  It uses [[BashFAQ/073|Parameter Expansions]] that will not work in the original BourneShell or POSIX shell, but should be good in KornShell and [[BASH]].
-Line 30:
+Line 26:
-for file in *.mp3; do
+for file in ./*.mp3; do
-Line 34:
+Line 30:
+Remember, you need to '''quote all your [[BashFAQ/073|Parameter Expansions]] using double  quotes'''.  If you don't, the expansion will undergo WordSplitting (see also BashGuide/TheBasics/ArgumentSplitting and BashPitfalls).  Also, always prefix globs with "./"; otherwise, if there's a file with "-" as the first character, the expansions might be misinterpreted as options.
-Line 35:
+Line 32:
-You could do the same thing for all files (regardless of extension) by using
+You could do the same thing for all files with spaces in their names (regardless of extension) by using
-Line 38:
+Line 35:
-for file in *\ *; do
+for file in ./*\ *; do
-Line 40:
+Line 37:
-Line 43:
+Line 39:
-Another way to handle filenames recursively involes using the {{{-print0}}} option of {{{find}}} (a GNU/BSD extension), together with bash's {{{-d}}} option for read:
+Another way to handle filenames recursively involves using the {{{-print0}}} option of {{{find}}} (a GNU/BSD extension), together with bash's {{{-d}}} option for read:
-Line 46:
+Line 42:
+# Bash
-Line 47:
+Line 44:
-while read -d $'\0' file; do
+while IFS= read -r -d $'\0' file; do
-Line 51:
+Line 48:
+The preceding example reads all the files under `/tmp` (recursively) into an array, even if they have newlines or other whitespace in their names, by forcing {{{read}}} to use the NUL byte (\0) as its line delimiter.  Since NUL is not a valid byte in Unix filenames, this is the safest approach besides using {{{find -exec}}}.  `IFS=` is required to avoid trimming leading/trailing whitespace, and `-r` is needed to avoid backslash processing.  In fact, `$'\0'` is equivalent to `''` so we could also write it like this:
-Line 52:
+Line 50:
-The preceding example reads all the files under /tmp (recursively) into an array, even if they have newlines or other whitespace in their names, by forcing {{{read}}} to use the NUL byte (\0) as its word delimiter.  Since NUL is not a valid byte in Unix filenames, this is the safest approach besides using {{{find -exec}}}.
+{{{
# Bash
unset a i
while IFS= read -r -d '' file; do
  a[i++]="$file"
done < <(find /tmp -type f -print0)
}}}
We can also write it in another form, without having to remember the '< <(' syntax  and seeing first what is executed first:

{{{
# Bash
unset a i
find /tmp -type f -print0 | while IFS= read -r -d '' file; do
  a[i++]="$file"
done
}}}