Diff for "BashFAQ/019"

Differences between revisions 2 and 160 (spanning 158 versions)

How can I split a file into line ranges, e.g. lines 1-10, 11-20, 21-30?

POSIX specifies the split utility, which can be used for this purpose:

    split -l 10 input.txt
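
By default split names its output files xaa, xab, and so on. As a minimal sketch, a name prefix operand and the POSIX -a option (suffix length) give more descriptive names. The function name split_by_lines and the prefix chunk. are illustrative assumptions, not part of the original answer.

```shell
# Hypothetical helper: split FILE into pieces of LINES lines each.
# Pieces are named PREFIXaaa, PREFIXaab, ... (three-letter suffixes via -a 3).
split_by_lines() {
    # $1 = input file, $2 = lines per piece, $3 = output name prefix
    split -l "$2" -a 3 "$1" "$3"
}
```

For example, split_by_lines input.txt 10 chunk. would write chunk.aaa, chunk.aab, and so on into the current directory.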

For more flexibility you can use sed. The sed command can print e.g. the line number range 1-10:

    sed 10q         # Print lines 1-10 and then quit.
    sed '1,5d; 10q' # Print just lines 6-10 by filtering the first 5 then quitting after 10.

The d command deletes lines 1-5, so sed never prints them; every other line is printed by default until q ends processing at line 10. Alternatively, pass sed the -n option to suppress the default printing and print the wanted lines explicitly with the p command, as in sed -n '6,10p; 10q'. The output is the same.
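
The -n and p form generalizes easily to any range. The following is a minimal sketch of that idea; the function name print_range and its argument order are assumptions made for illustration.

```shell
# Hypothetical helper: print lines FIRST through LAST of FILE,
# quitting at LAST so the rest of the file is not read.
print_range() {
    # $1 = first line, $2 = last line, $3 = file
    sed -n "${1},${2}p; ${2}q" "$3"
}
```

For example, print_range 6 10 /etc/passwd prints the same lines as the second sed command above.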

We can now use this to split a file into chunks, each covering a fixed range of line numbers:

    # POSIX shell
    file=/etc/passwd
    range=10                 # lines per chunk
    cur=1                    # first line of the current chunk
    last=$(wc -l < "$file")  # count number of lines
    chunk=1
    # -le rather than -lt, so a final chunk that starts on the last line is not skipped
    while [ "$cur" -le "$last" ]
    do
        endofchunk=$((cur + range - 1))
        sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > "chunk.$(printf %04d "$chunk")"
        chunk=$((chunk + 1))
        cur=$((cur + range))
    done

The previous example uses POSIX arithmetic, which older Bourne shells do not have. In that case the following example should be used instead:

    # legacy Bourne shell; assume no printf either
    file=/etc/passwd
    range=10
    cur=1
    last=`wc -l < "$file"` # count number of lines
    chunk=1
    # -le rather than -lt, so a final chunk that starts on the last line is not skipped
    while test "$cur" -le "$last"
    do
        endofchunk=`expr $cur + $range - 1`
        sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > "chunk.$chunk"
        chunk=`expr $chunk + 1`
        cur=`expr $cur + $range`
    done

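Awk can also produce a more or less equivalent result, writing chunks named after the input file (file.1, file.2, and so on). The file name expression is parenthesized because awk implementations differ in how they parse an unparenthesized concatenation after >:

    awk -v range=10 '{print > (FILENAME "." (int((NR - 1) / range) + 1))}' file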
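
With a large input, an awk one-liner like this keeps every output file open until awk exits, and some awk implementations may then run into the limit on open files. The following sketch closes each chunk before starting the next; the function name split_awk and the variable out are assumptions made for illustration.

```shell
# Hypothetical helper: split FILE into FILE.1, FILE.2, ... of RANGE lines each,
# closing each output file before opening the next one.
split_awk() {
    # $1 = input file, $2 = lines per chunk
    awk -v range="$2" '
        (NR - 1) % range == 0 {
            if (out != "") close(out)                       # finish the previous chunk
            out = FILENAME "." (int((NR - 1) / range) + 1)  # name of the next chunk
        }
        { print > out }
    ' "$1"
}
```

For example, split_awk file 10 writes the same file.1, file.2, and so on as the one-liner, with at most one output file open at a time.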

CategoryShell

- Revision 2 as of 2008-02-28 21:36:40
  Size: 1952
  Editor: GreyCat
  Comment: POSIX arithmetic instead of bash; also, both examples were WRONG.  Rewrite.
+ Revision 160 as of 2022-04-19 05:38:41
  Size: 1955
  Editor: emanuele6
  Comment: quote expansions where necessary and don't use `$' in arithmetic contexts
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-[[Anchor(faq19)]]
+<<Anchor(faq19)>>
 Line 3:
-Some Unix systems provide the {{{split}}} utility for this purpose:
+POSIX specifies the {{{split}}} utility, which can be used for this purpose:
 Line 5:
-{{{
    split --lines 10 --numeric-suffixes input.txt output-
+{{{#!highlight bash
split -l 10 input.txt
 Line 10:
-{{{
    sed -n '1,10p'
+{{{#!highlight bash
sed 10q         # Print lines 1-10 and then quit.
sed '1,5d; 10q' # Print just lines 6-10 by filtering the first 5 then quitting after 10.
-Line 14:
+Line 15:
-This stops {{{sed}}} from printing each line ({{{-n}}}). Instead it only processes the lines in the range 1-10 ("1,10"), and prints them ("p"). {{{sed}}} still reads the input until the end, although we are only interested in lines 1 through 10. We can speed this up by making {{{sed}}} terminate immediately after printing line 10:

{{{
    sed -n -e '1,10p' -e '10q'
}}}

Now the command will quit after reading line 10 ("10q"). The {{{-e}}} arguments indicate a script (instead of a file name). The same can be written a little shorter:

{{{
    sed -n '1,10p;10q'
}}}
+The `d` command stops {{{sed}}} from printing each line. This could alternatively have been done by passing sed the `-n` option and printing lines with the `p` command rather than deleting them with `d`. It makes no difference.
-Line 28:
+Line 19:
-{{{
+{{{#!highlight bash
-Line 35:
+Line 26:
-while [ $cur -lt $last ]
+while [ "$cur" -lt "$last" ]
-Line 37:
+Line 28:
-    endofchunk=$(($cur + $range - 1))
+    endofchunk=$((cur + range - 1))
-Line 39:
+Line 30:
-    chunk=$(($chunk + 1))
    cur=$(($cur + $range))
+    chunk=$((chunk + 1))
    cur=$((cur + range))
-Line 44:
+Line 35:
-The previous example uses POSIX [:ArithmeticExpression:arithmetic], which older [:BourneShell:Bourne shells] do not have. In that case the following example should be used instead:
+The previous example uses POSIX [[ArithmeticExpression|arithmetic]], which older [[BourneShell|Bourne shells]] do not have. In that case the following example should be used instead:
-Line 46:
+Line 37:
-{{{
+{{{#!highlight bash
-Line 53:
+Line 44:
-while test $cur -lt $last
+while test "$cur" -lt "$last"
-Line 56:
+Line 47:
-    sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > chunk.$chunk
+    sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > "chunk.$chunk"
-Line 61:
+Line 52:
+Awk can also be used to produce a more or less equivalent result:

{{{#!highlight bash
awk -v range=10 '{print > FILENAME "." (int((NR -1)/ range)+1)}' file
}}}

----
CategoryShell