Diff for "BashFAQ/019"

Differences between revisions 1 and 2

How can I split a file into line ranges, e.g. lines 1-10, 11-20, 21-30?

Some Unix systems provide the split utility for this purpose:

    split --lines 10 --numeric-suffixes input.txt output-

For more flexibility you can use sed. The sed command can print e.g. the line number range 1-10:

    sed -n '1,10p'

This stops sed from printing each line (-n). Instead it only processes the lines in the range 1-10 ("1,10"), and prints them ("p"). sed still reads the input until the end, although we are only interested in lines 1 though 10. We can speed this up by making sed terminate immediately after printing line 10:

    sed -n -e '1,10p' -e '10q'

Now the command will quit after reading line 10 ("10q"). The -e arguments indicate a script (instead of a file name). The same can be written a little shorter:

    sed -n '1,10p;10q'

We can now use this to print an arbitrary range of a file (specified by line number):

# POSIX shell
file=/etc/passwd
range=10
cur=1
last=$(wc -l < "$file") # count number of lines
chunk=1
while [ $cur -lt $last ]
do
    endofchunk=$(($cur + $range - 1))
    sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > chunk.$(printf %04d $chunk)
    chunk=$(($chunk + 1))
    cur=$(($cur + $range))
done

The previous example uses POSIX [:ArithmeticExpression:arithmetic], which older [:BourneShell:Bourne shells] do not have. In that case the following example should be used instead:

# legacy Bourne shell; assume no printf either
file=/etc/passwd
range=10
cur=1
last=`wc -l < "$file"` # count number of lines
chunk=1
while test $cur -lt $last
do
    endofchunk=`expr $cur + $range - 1`
    sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > chunk.$chunk
    chunk=`expr $chunk + 1`
    cur=`expr $cur + $range`
done

-  ⇤ ← Revision 1 as of 2007-05-02 23:05:11 → 
  Size: 1833
  Editor: redondos
  Comment:
+   ← Revision 2 as of 2008-02-28 21:36:40 → ⇥
  Size: 1952
  Editor: GreyCat
  Comment: POSIX arithmetic instead of bash; also, both examples were WRONG.  Rewrite.
-Deletions are marked like this.
+Additions are marked like this.
 Line 29:
+# POSIX shell
-Line 31:
+Line 32:
-firstline=1
maxlines=$(wc -l < "$file") # count number of lines
while (($firstline < $maxlines))
+cur=1
last=$(wc -l < "$file") # count number of lines
chunk=1
while [ $cur -lt $last ]
-Line 35:
+Line 37:
-    ((lastline=$firstline+$range+1))
    sed -n -e "$firstline,${lastline}p" -e "${lastline}q" "$file"
    ((firstline=$firstline+$range+1))
+    endofchunk=$(($cur + $range - 1))
    sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > chunk.$(printf %04d $chunk)
    chunk=$(($chunk + 1))
    cur=$(($cur + $range))
-Line 41:
+Line 44:
-This example uses ["BASH"] and KornShell ArithmeticExpressions, which older [wiki:Self:BourneShell Bourne shells] do not have. In that case the following example should be used instead:
+The previous example uses POSIX [:ArithmeticExpression:arithmetic], which older [:BourneShell:Bourne shells] do not have. In that case the following example should be used instead:
-Line 44:
+Line 47:
+# legacy Bourne shell; assume no printf either
-Line 46:
+Line 50:
-firstline=1
maxlines=`wc -l < "$file"` # count line numbers
while [ $firstline -le $maxlines ]
+cur=1
last=`wc -l < "$file"` # count number of lines
chunk=1
while test $cur -lt $last
-Line 50:
+Line 55:
-    lastline=`expr $firstline + $range + 1`
    sed -n -e "$firstline,${lastline}p" -e "${lastline}q" "$file"
    firstline=`expr $lastline + 1`
+    endofchunk=`expr $cur + $range - 1`
    sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > chunk.$chunk
    chunk=`expr $chunk + 1`
    cur=`expr $cur + $range`