Diff for "BashFAQ/019"

Differences between revisions 4 and 158 (spanning 154 versions)

How can I split a file into line ranges, e.g. lines 1-10, 11-20, 21-30?

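Most Unix systems provide the split utility for this purpose. The long options below are GNU extensions; the portable POSIX spelling of --lines 10 is -l 10, and POSIX split has no numeric-suffix option:

    split --lines 10 --numeric-suffixes input.txt output-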
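
As a rough illustration of the portable form, the sketch below splits a 25-line sample file using only POSIX options. The scratch directory, the sample file contents and the variable names are assumptions made for the demonstration, and mktemp itself is widely available but not specified by POSIX.

```shell
# Sketch: split a 25-line sample file into 10-line pieces with POSIX options only.
workdir=$(mktemp -d) || exit 1     # scratch directory (assumed; mktemp is not POSIX)
cd "$workdir" || exit 1

# Build a sample input of 25 numbered lines.
i=1
while [ "$i" -le 25 ]; do
    echo "line $i"
    i=$((i + 1))
done > input.txt

# -l 10: ten lines per piece; -a 2: two-letter suffixes (aa, ab, ac, ...).
split -l 10 -a 2 input.txt output-

set -- output-*                                  # collect the pieces that were created
split_pieces=$#                                  # how many pieces exist
last_piece_lines=$(( $(wc -l < output-ac) ))     # the third piece holds the remainder
echo "$split_pieces pieces, last one has $last_piece_lines lines"
```

With alphabetic suffixes the pieces still sort in order, so cat output-* reassembles the original file.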

For more flexibility you can use sed. The sed command can print e.g. the line number range 1-10:

    sed 10q         # Print lines 1-10 and then quit.
    sed '1,5d; 10q' # Print just lines 6-10 by filtering the first 5 then quitting after 10.

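The d command deletes lines 1-5 before sed gets a chance to print them; every later line is printed by default until q quits after line 10. The same result can be had by passing sed the -n option to suppress the default printing and selecting lines with the p command instead, as in sed -n '6,10p; 10q'. The output is identical either way.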
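
The equivalence of the two styles can be checked with a small sketch like the following. The 12-line sample and the variable names are made up for the illustration.

```shell
# Twelve numbered lines as sample input (assumed data).
sample=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12)

# Style 1: delete lines 1-5, let the rest print by default, quit after line 10.
with_d=$(printf '%s\n' "$sample" | sed '1,5d; 10q')

# Style 2: suppress default printing, print only lines 6-10, quit after line 10.
with_p=$(printf '%s\n' "$sample" | sed -n '6,10p; 10q')

if [ "$with_d" = "$with_p" ]; then
    echo "both commands selected the same lines"
fi
```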

We can now use this to write each successive range of a file (specified by line number) to its own chunk file:

    # POSIX shell
    file=/etc/passwd
    range=10
    cur=1
    last=$(wc -l < "$file") # count number of lines
    chunk=1
    # -le rather than -lt, so a final chunk that starts on the last line is not skipped
    while [ $cur -le $last ]
    do
        endofchunk=$(($cur + $range - 1))
        sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > chunk.$(printf %04d $chunk)
        chunk=$(($chunk + 1))
        cur=$(($cur + $range))
    done
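
The boundary case is worth checking by hand: with 11 lines and a range of 10, the second chunk starts on the last line, so the loop condition has to be "less than or equal". The sketch below wraps the loop in a function and runs it on such a file. The function name split_by_range, its argument order and the scratch directory are hypothetical choices for this example, not part of the FAQ.

```shell
# split_by_range FILE RANGE PREFIX   (hypothetical helper name)
# Writes PREFIX.0001, PREFIX.0002, ... each holding RANGE lines of FILE.
# Note: plain POSIX sh has no local variables, so these names are global.
split_by_range() {
    sbr_file=$1 sbr_range=$2 sbr_prefix=$3
    sbr_last=$(( $(wc -l < "$sbr_file") ))   # line count, whitespace stripped
    sbr_cur=1
    sbr_chunk=1
    while [ "$sbr_cur" -le "$sbr_last" ]; do
        sbr_end=$((sbr_cur + sbr_range - 1))
        # Print only the current range, then quit so the rest is not read.
        sed -n -e "${sbr_cur},${sbr_end}p" -e "${sbr_end}q" "$sbr_file" \
            > "$sbr_prefix.$(printf %04d "$sbr_chunk")"
        sbr_chunk=$((sbr_chunk + 1))
        sbr_cur=$((sbr_cur + sbr_range))
    done
}

# Demonstration on an 11-line file in a scratch directory (assumed location).
demo_dir=$(mktemp -d) || exit 1
awk 'BEGIN { for (i = 1; i <= 11; i++) print "line", i }' > "$demo_dir/input.txt"
split_by_range "$demo_dir/input.txt" 10 "$demo_dir/chunk"
ls "$demo_dir"
```

Note that the loop rereads the file from the top for every chunk, which is fine for small inputs but slow for very large ones.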

The previous example uses POSIX arithmetic, which older Bourne shells do not have. In that case the following example should be used instead:

    # legacy Bourne shell; assume no printf either
    file=/etc/passwd
    range=10
    cur=1
    last=`wc -l < "$file"` # count number of lines
    chunk=1
    # -le rather than -lt, so a final chunk that starts on the last line is not skipped
    while test $cur -le $last
    do
        endofchunk=`expr $cur + $range - 1`
        sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > chunk.$chunk
        chunk=`expr $chunk + 1`
        cur=`expr $cur + $range`
    done

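Awk can produce a similar result in a single pass over the file. Here the pieces are named after the input file (file.1, file.2, and so on) rather than chunk.0001, and the output expression is parenthesized because an unparenthesized concatenation after > is ambiguous in some awk implementations:

    awk -v range=10 '{print > (FILENAME "." (int((NR - 1) / range) + 1))}' file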
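
Some awk implementations limit the number of files that may be open at once, which matters when a large input produces many pieces. One possible variation, sketched below, closes each piece before starting the next and zero-pads the names like the shell loop does. The prefix variable, the out variable, the sample input and the scratch directory are assumptions for the illustration.

```shell
# Scratch directory and a 25-line sample input (both assumed for the demo).
awk_dir=$(mktemp -d) || exit 1
cd "$awk_dir" || exit 1
awk 'BEGIN { for (i = 1; i <= 25; i++) print "line", i }' > input.txt

awk -v range=10 -v prefix=chunk '
    (NR - 1) % range == 0 {                # first line of a new piece
        if (out != "") close(out)          # release the previous output file
        out = sprintf("%s.%04d", prefix, int((NR - 1) / range) + 1)
    }
    { print > out }                        # append the line to the current piece
' input.txt

ls chunk.*
```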

CategoryShell

- Revision 4 as of 2008-10-28 22:20:04
  Size: 395
  Editor: cpc1-barn10-0-0-cust401
  Comment: -9
+ Revision 158 as of 2016-02-08 11:35:04
  Size: 1973
  Editor: ormaaj
  Comment: maybe d instead of D?
 Line 1:
+<<Anchor(faq19)>>
== How can I split a file into line ranges, e.g. lines 1-10, 11-20, 21-30? ==
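Most Unix systems provide the {{{split}}} utility for this purpose. The long options below are GNU extensions; the portable POSIX spelling of {{{--lines 10}}} is {{{-l 10}}}, and POSIX {{{split}}} has no numeric-suffix option: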

{{{#!highlight bash
split --lines 10 --numeric-suffixes input.txt output-
}}}

For more flexibility you can use {{{sed}}}.  The {{{sed}}} command can print e.g. the line number range 1-10:
{{{#!highlight bash
sed 10q         # Print lines 1-10 and then quit.
sed '1,5d; 10q' # Print just lines 6-10 by filtering the first 5 then quitting after 10.
}}}

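The `d` command deletes lines 1-5 before {{{sed}}} gets a chance to print them; every later line is printed by default until `q` quits after line 10. The same result can be had by passing {{{sed}}} the `-n` option to suppress the default printing and selecting lines with the `p` command instead, as in `sed -n '6,10p; 10q'`. The output is identical either way.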

We can now use this to write each successive range of a file (specified by line number) to its own chunk file:

{{{#!highlight bash
# POSIX shell
file=/etc/passwd
range=10
cur=1
last=$(wc -l < "$file") # count number of lines
chunk=1
# -le rather than -lt, so a final chunk that starts on the last line is not skipped
while [ $cur -le $last ]
do
    endofchunk=$(($cur + $range - 1))
    sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > chunk.$(printf %04d $chunk)
    chunk=$(($chunk + 1))
    cur=$(($cur + $range))
done
}}}

The previous example uses POSIX [[ArithmeticExpression|arithmetic]], which older [[BourneShell|Bourne shells]] do not have. In that case the following example should be used instead:

{{{#!highlight bash
# legacy Bourne shell; assume no printf either
file=/etc/passwd
range=10
cur=1
last=`wc -l < "$file"` # count number of lines
chunk=1
# -le rather than -lt, so a final chunk that starts on the last line is not skipped
while test $cur -le $last
do
    endofchunk=`expr $cur + $range - 1`
    sed -n -e "$cur,${endofchunk}p" -e "${endofchunk}q" "$file" > chunk.$chunk
    chunk=`expr $chunk + 1`
    cur=`expr $cur + $range`
done
}}}

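Awk can produce a similar result in a single pass over the file. Here the pieces are named after the input file ({{{file.1}}}, {{{file.2}}}, and so on) rather than {{{chunk.0001}}}, and the output expression is parenthesized because an unparenthesized concatenation after `>` is ambiguous in some awk implementations:

{{{#!highlight bash
awk -v range=10 '{print > (FILENAME "." (int((NR - 1) / range) + 1))}' file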
}}}
-Line 3:
+Line 60:
-CategoryHomepage
+CategoryShell