go back: [[BashFAQwTopics]] <> <> == How can I read a file line-by-line? == {{{ while read line do echo "$line" done < "$file" }}} The {{{read}}} command still modifies each line read, e.g. it removes all leading whitespace characters (blanks, tab characters). If that is not desired, the IFS (internal field separator) variable has to be cleared: {{{ OIFS=$IFS; IFS= while read line do echo "$line" done < "$file" IFS=$OIFS }}} As a feature, the {{{read}}} command concatenates lines that end with a backslash '\' character to one single line. To disable this feature, KornShell and [[BASH]] have {{{read -r}}}: {{{ OIFS=$IFS; IFS= while read -r line do echo "$line" done < "$file" IFS=$OIFS }}} Note that reading a file line by line this way is ''very slow'' for large files. Consider using e.g. [[AWK]] instead if you get performance problems. Sometimes it's useful to read a file into an array, one array element per line. You can do that with the following example: {{{ O=$IFS IFS=$'\n' arr=($(< myfile)) IFS=$O }}} This temporarily changes the Input Field Separator to a newline, so that each line will be considered one field by read. Then it populates the array {{{arr}}} with the fields. Then it sets the {{{IFS}}} back to what it was before. <> == How can I recursively search all files for a string? == On most recent systems (GNU/Linux/BSD), you would use {{{grep -r pattern .}}} to search all files from the current directory (.) downward. You can use {{{find}}} if your {{{grep}}} lacks -r: {{{ find . -type f -exec grep -l "$search" '{}' \; }}} The {} characters will be replaced with the current file name. This command is slower than it needs to be, because {{{find}}} will call {{{grep}}} with only one file name, resulting in many {{{grep}}} invocations (one per file). Since {{{grep}}} accepts multiple file names on the command line, {{{find}}} can be instrumented to call it with several file names at once: {{{ find . -type f -exec grep -l "$search" '{}' \+ }}} The trailing '+' character instructs {{{find}}} to call {{{grep}}} with as many file names as possible, saving processes and resulting in faster execution. This example works for POSIX {{{find}}}, e.g. with Solaris. GNU find uses a helper program called {{{xargs}}} for the same purpose: {{{ find . -type f -print0 | xargs -0 grep -l "$search" }}} The {{{-print0}}} / {{{-0}}} options ensure that any file name can be processed, even ones containing blanks, TAB characters, or new-lines. 90% of the time, all you need is: Have grep recurse and print the lines (GNU grep): {{{ grep -r "$search" . }}} Have grep recurse and print only the names (GNU grep): {{{ grep -r -l "$search" . }}} The {{{find}}} command can be used to run arbitrary commands on every file in a directory (including sub-directories). Replace {{{grep}}} with the command of your choice. The curly braces {} will be replaced with the current file name in the case above. (Note that they must be escaped in some shells, but not in [[BASH]].) <> == How can I split a file into line ranges, e.g. lines 1-10, 11-20, 21-30? == Some Unix systems provide the {{{split}}} utility for this purpose: {{{ split --lines 10 --numeric-suffixes input.txt output- }}} For more flexibility you can use {{{sed}}}. The {{{sed}}} command can print e.g. the line number range 1-10: {{{ sed -n '1,10p' }}} This stops {{{sed}}} from printing each line ({{{-n}}}). Instead it only processes the lines in the range 1-10 ("1,10"), and prints them ("p"). {{{sed}}} still reads the input until the end, although we are only interested in lines 1 though 10. We can speed this up by making {{{sed}}} terminate immediately after printing line 10: {{{ sed -n -e '1,10p' -e '10q' }}} Now the command will quit after reading line 10 ("10q"). The {{{-e}}} arguments indicate a script (instead of a file name). The same can be written a little shorter: {{{ sed -n '1,10p;10q' }}} We can now use this to print an arbitrary range of a file (specified by line number): {{{ file=/etc/passwd range=10 firstline=1 maxlines=$(wc -l < "$file") # count number of lines while (($firstline < $maxlines)) do ((lastline=$firstline+$range+1)) sed -n -e "$firstline,${lastline}p" -e "${lastline}q" "$file" ((firstline=$firstline+$range+1)) done }}} This example uses [[BASH]] and KornShell ArithmeticExpressions, which older [[BourneShell|Bourne shells]] do not have. In that case the following example should be used instead: {{{ file=/etc/passwd range=10 firstline=1 maxlines=`wc -l < "$file"` # count line numbers while [ $firstline -le $maxlines ] do lastline=`expr $firstline + $range + 1` sed -n -e "$firstline,${lastline}p" -e "${lastline}q" "$file" firstline=`expr $lastline + 1` done }}} <> == How can I replace a string with another string in all files? == {{{sed}}} is a good command to replace strings, e.g. {{{ sed 's/olddomain\.com/newdomain\.com/g' input > output }}} To replace a string in all files of the current directory: {{{ for i in *; do sed 's/old/new/g' "$i" > atempfile && mv atempfile "$i" done }}} GNU sed 4.x (but no other version of sed) has a special {{{-i}}} flag which makes the temp file unnecessary: {{{ for i in *; do sed -i 's/old/new/g' "$i" done }}} Those of you who have perl 5 can accomplish the same thing using this code: {{{ perl -pi -e 's/old/new/g' * }}} Recursively: {{{ find . -type f -print0 | xargs -0 perl -pi -e 's/old/new/g' }}} Finally, here's a script that some people may find useful: {{{ : # chtext - change text in several files # neither string may contain '|' unquoted old='olddomain\.com' new='newdomain\.com' # if no files were specified on the command line, use all files: [ $# -lt 1 ] && set -- * for file do [ -f "$file" ] || continue # do not process e.g. directories [ -r "$file" ] || continue # cannot read file - ignore it # Replace string, write output to temporary file. Terminate script in case of errors sed "s|$old|$new|g" "$file" > "$file"-new || exit # If the file has changed, overwrite original file. Otherwise remove copy if cmp "$file" "$file"-new >/dev/null 2>&1 then rm "$file"-new # file nas not changed else mv "$file"-new "$file" # file has changed: overwrite original file fi done }}} If the code above is put into a script file (e.g. {{{chtext}}}), the resulting script can be used to change a text e.g. in all HTML files of the current and all subdirectories: {{{ find . -type f -name '*.html' -exec chtext {} \; }}} Many optimizations are possible: * use another {{{sed}}} separator character than '|', e.g. ^A (ASCII 1) * some implementations of {{{sed}}} (e.g. GNU sed) have an "-i" option that can change a file in-place; no temporary file is necessary in that case * the {{{find}}} command above could use either {{{xargs}}} or the built-in {{{xargs}}} of POSIX find Note: {{{set -- *}}} in the code above is safe with respect to files whose names contain spaces. The expansion of * by {{{set}}} is the same as the expansion done by {{{for}}}, and filenames will be preserved properly as individual parameters, and not broken into words on whitespace. A more sophisticated example of {{{chtext}}} is here: http://www.shelldorado.com/scripts/cmds/chtext <> == How can I randomize (shuffle) the order of lines in a file? == {{{ randomize(){ while read l ; do echo "0$RANDOM $l" ; done | sort -n | cut -d" " -f2- } }}} Note: the leading 0 is to make sure it doesnt break if the shell doesnt support $RANDOM, which is supported by [[BASH]], KornShell, KornShell93 and [[POSIX]] shell, but not BourneShell. The same idea (printing random numbers in front of a line, and sorting the lines on that column) using other programs: {{{ awk ' BEGIN { srand() } { print rand() "\t" $0 } ' | sort -n | # Sort numerically on first (random number) column cut -f2- # Remove sorting column }}} This is faster then the previous solution, but will not work for very old AWK implementations (try "nawk", or "gawk", if available). go back: [[BashFAQwTopics]]