Differences between revisions 3 and 4
Revision 3 as of 2012-07-05 12:31:11
Size: 3128
Editor: ormaaj
Comment: This article is too specific
Revision 4 as of 2013-12-28 15:50:51
Size: 3143
Editor: cpe-66-68-20-30
Comment: expanded an acronym that the general reader wouldn't be expected to recognize

Why you don't read lines with "for"

Many people think that they should use a for loop to read the lines of a text file. This is clumsy and inefficient at best, and fails in many cases. You should use a while loop instead. Here is why.

First, the right way:

$ cat afile
ef gh

*
$ while IFS= read -r aline; do echo "$aline" ; done < afile
ef gh

*

Now, trying to use for:

$ for i in $(<afile); do echo "$i"; done
ef
gh
afile
anotherfile
stillanotherfile
the_embarrassing_file_you_forgot_about

As you can see, this attempt to duplicate cat failed in several ways. First, the line with two words was split into two lines of output. Second, the blank line was omitted entirely. Third, the line containing a glob (*) was expanded into the names of all the files in the current directory (one per line).

We can try to work around the splitting of the first line by setting IFS to a newline, and we can prevent the glob expansion by disabling pathname expansion with set -f:

$ IFS=$'\n'; set -f; for i in $(<afile); do echo "$i"; done; set +f; unset IFS
ef gh
*

Notice that the syntax is now longer than that of the while loop -- and we still lost the blank line! As discussed in IFS and in FAQ #5, the use of IFS=$'\n' (or any other "whitespace" in IFS) causes the shell to consolidate all consecutive instances of the whitespace delimiter into one. In other words, it skips over blank lines.

There is no workaround for this. You cannot possibly preserve blank lines if you are relying on IFS to split on newlines.
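The loss of blank lines can be checked by counting iterations. The following is only a minimal sketch: the temporary file and the variable names (tmpfile, for_count, while_count) are made up for illustration, and the file holds the same three lines as afile above.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox: a temporary file with the three example lines
# ("ef gh", an empty line, and "*").
tmpfile=$(mktemp)
printf 'ef gh\n\n*\n' > "$tmpfile"

# for + IFS=newline + set -f: consecutive newlines are treated as a
# single delimiter, so the empty line never becomes a word.
for_count=0
IFS=$'\n'; set -f
for i in $(<"$tmpfile"); do
    for_count=$((for_count + 1))
done
set +f; unset IFS

# while read: every line, including the empty one, is one iteration.
while_count=0
while IFS= read -r aline; do
    while_count=$((while_count + 1))
done < "$tmpfile"

rm -f "$tmpfile"
echo "for saw $for_count lines; while read saw $while_count lines"
```

With this input, the for loop should see 2 lines and the while loop 3.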

Another issue with setting IFS in this way is that it will retain its setting during the body of the loop. This may be undesirable if your loop body is more complex than echo "$i", as you will have a nonstandard IFS in effect, possibly leading to unpleasant surprises.
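To illustrate the difference in scope, here is a small sketch. The variable words and the two counters are invented for this example, and the unquoted expansion in the loop body is deliberate, since that is where a changed IFS shows up.

```shell
#!/usr/bin/env bash
# Hypothetical data that the loop body splits into fields on purpose.
words='one two'

# for idiom: IFS=$'\n' is still in effect inside the body, so the
# space in $words no longer separates fields.
IFS=$'\n'; set -f
for i in $(printf 'x\n'); do
    fields=($words)
    for_body_fields=${#fields[@]}
done
set +f; unset IFS

# while read: "IFS=" applies only to the read command itself, so the
# body runs with the normal IFS.
while IFS= read -r aline; do
    fields=($words)
    while_body_fields=${#fields[@]}
done <<< 'x'

echo "for body: $for_body_fields field(s); while body: $while_body_fields field(s)"
```

The body of the for loop should get one field, while the body of the while loop gets two, because the assignment in "IFS= read" lasts only for the duration of that one command.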

The final issue with reading lines with for is inefficiency. A while read loop reads one line at a time from an input stream; $(<afile) slurps the entire file into memory all at once. For small files, this is not a problem, but if you're reading large files, the memory requirement can be enormous. (Bash will have to allocate one string to hold the file, and another set of strings to hold the word-split results... essentially, the memory allocated will be roughly twice the size of the input file.)
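The streaming behaviour of while read can be seen with a producer that never ends. This is just a sketch, assuming the yes utility is available; the names count and last are made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical producer: "yes" prints the same line forever. A command
# substitution such as $(yes ...) would have to collect all of that
# output before a for loop could even start.
count=0
while IFS= read -r aline; do
    count=$((count + 1))
    last=$aline
    [ "$count" -ge 3 ] && break    # stop early; the rest is never read
done < <(yes 'some line')

echo "read $count lines, last one: $last"
```

Because the loop consumes its input one line at a time, it can stop after three lines without ever holding more than one line in a variable.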

Please see Bash FAQ 1 for more examples of how to read a file properly.

Oh, and by the way...

Everything mentioned above applies equally to patterns like:

arr=($(cmd))

or

IFS=$'\n'; arr=($(cat file))

or

printf ... $(printf ...)

The same goes for numerous other common issues such as ParsingLs, commands in strings, and many quoting problems. DontReadLinesWithFor is just one manifestation of the general problem of trying to translate an output stream into a collection via wordsplitting and pathname expansion.
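For the array case, the same while read loop can fill the array one line at a time. This is a sketch rather than the only way to do it: the function cmd below is a hypothetical stand-in for whatever command produces the lines, and arr+=( ) assumes a reasonably recent Bash.

```shell
#!/usr/bin/env bash
# Hypothetical command standing in for "cmd": prints the three example
# lines ("ef gh", an empty line, and "*").
cmd() { printf 'ef gh\n\n*\n'; }

arr=()
while IFS= read -r aline; do
    arr+=("$aline")    # quoted, so no word splitting or glob expansion
done < <(cmd)

echo "${#arr[@]} elements"
printf '<%s>\n' "${arr[@]}"
```

Here the array should end up with three elements, including the empty one, and the * stays a literal asterisk.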

TODO: Write a "big picture" article which explains the theory and bundles all these pitfalls. Until then, read Arguments and IFS.


CategoryShell

DontReadLinesWithFor (last edited 2016-12-07 22:21:19 by StephaneChazelas)