Differences between revisions 6 and 35 (spanning 29 versions)
Revision 6 as of 2008-02-13 18:18:54
Size: 2591
Editor: MrIgli
Comment: minor: added another awk example, to discourage ``grep foo | cut..''
Revision 35 as of 2023-01-26 22:53:31
Size: 4193
Editor: emanuele6
Comment: fix formatting
How can I grep for lines containing foo AND bar, foo OR bar? Or for files containing foo AND bar, possibly on separate lines? Or files containing foo but NOT bar?

This is really four different questions, so we'll break this answer into parts.

foo AND bar on the same line

The easiest way to match lines that contain both foo AND bar is to use two grep commands:

    grep foo | grep bar
    grep foo -- "$myfile" | grep bar   # for those who need the hand-holding

It can also be done with one grep, although (as you can probably guess) this doesn't really scale well to more than two patterns:

    grep -E 'foo.*bar|bar.*foo'
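To see why, here is the same single-grep idea extended to a third word, baz (an arbitrary example word). Every possible ordering of the words needs its own alternative, so n patterns need n! branches:

```shell
# Lines containing foo, bar AND baz, in any order, with one grep -E.
# Three patterns already need 3! = 6 alternatives; four would need 24.
printf '%s\n' 'baz then foo then bar' 'foo and bar only' |
  grep -E 'foo.*bar.*baz|foo.*baz.*bar|bar.*foo.*baz|bar.*baz.*foo|baz.*foo.*bar|baz.*bar.*foo'
```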

If you prefer, you can achieve this in one sed or awk statement:

    sed '/foo/!d; /bar/!d'
    awk '/foo/ && /bar/'
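As a quick check that the sed and awk commands select the same lines, regardless of which word comes first, you could feed both some made-up sample input:

```shell
# Made-up sample input: lines 1 and 3 contain both words, in either order.
sample='foo and bar
only foo here
bar before foo'
printf '%s\n' "$sample" | sed '/foo/!d; /bar/!d'   # deletes lines lacking either word
printf '%s\n' "$sample" | awk '/foo/ && /bar/'     # prints lines having both
```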

If you need to scale the awk solution to an arbitrary number of patterns, you can write a function like this:

    # POSIX
    multimatch() { # usage: multimatch pattern...
      awk '
        BEGIN {
          for ( i = 1; i < ARGC; i++ )
            a[i] = ARGV[i]
          ARGC = 1
        }
        {
          for (i in a)
            if ($0 !~ a[i])
              next
          print
        }' "$@"
    }
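A short usage sketch with made-up input. The function is repeated so the snippet runs on its own; each argument is treated as an awk extended regular expression, and a line is printed only if it matches every one of them:

```shell
# POSIX
multimatch() { # usage: multimatch pattern...
  awk '
    BEGIN {
      for ( i = 1; i < ARGC; i++ )
        a[i] = ARGV[i]
      ARGC = 1          # patterns are not input files; read stdin instead
    }
    {
      for (i in a)
        if ($0 !~ a[i])
          next          # this line misses one pattern: skip it
      print
    }' "$@"
}

# Lines containing both foo and bar: "foo bar baz" and "bar foo".
printf '%s\n' 'foo bar baz' 'foo baz' 'bar foo' | multimatch foo bar
# Patterns can be full regexes: foo AND a line ending in bar or baz.
printf '%s\n' 'foo bar baz' 'foo baz' 'bar foo' | multimatch foo 'ba[rz]$'
```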

foo OR bar on the same line

There are lots of ways to match lines containing foo OR bar. grep can be given multiple patterns with -e:

    grep -e 'foo' -e 'bar'

Or you can separate the patterns with newlines:

    grep 'foo
    bar'

Or you can construct one pattern with grep -E:

    grep -E 'foo|bar'

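(You can't use the | union operator with plain grep: POSIX Basic Regular Expressions have no alternation, and | is only available in Extended Regular Expressions. GNU grep accepts \| in basic regular expressions, but that is a nonstandard extension.)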
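The difference is easy to demonstrate with made-up input. Without -E, the | in the pattern is an ordinary character:

```shell
# Basic Regular Expression: | is literal, so only the line "foo|bar" matches.
printf '%s\n' foo bar 'foo|bar' | grep 'foo|bar'
# Extended Regular Expression: | is alternation, so all three lines match.
printf '%s\n' foo bar 'foo|bar' | grep -E 'foo|bar'
```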

It can also be done with sed, awk, etc.

    sed -n -e '/foo/{ p; d; }' -e '/bar/{ p; d; }'
    awk '/foo|bar/'
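The d in the sed command matters. Without it, a line containing both words would be printed once for each expression it matches, as this sketch with made-up input shows:

```shell
sample='foo only
foo and bar
nothing here'
# Naive version: "foo and bar" matches both expressions, so it prints twice.
printf '%s\n' "$sample" | sed -n -e '/foo/p' -e '/bar/p'
# With { p; d; }, d ends the cycle right after the first p, so each line
# is printed at most once.
printf '%s\n' "$sample" | sed -n -e '/foo/{ p; d; }' -e '/bar/{ p; d; }'
```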

The awk approach has the advantage of letting you use awk's other features on the matched lines, such as extracting only certain fields.
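For example, to print only the second field of each matching line (the two-column input here is an assumption for illustration):

```shell
# Made-up "name value" input; print the value from lines matching foo or bar.
# Output: 1 and 3, one per line.
printf '%s\n' 'foo 1' 'baz 2' 'bar 3' | awk '/foo|bar/ { print $2 }'
```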

To match lines that do not contain "foo" AND do not contain "bar":

    grep -E -v 'foo|bar'

Or using awk or sed:

    awk '!/foo|bar/'
    sed -e '/foo/d' -e '/bar/d'
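Another form of the same test uses only plain grep: -v inverts the whole match, so with several -e patterns a line is selected only if it matches none of them.

```shell
# Equivalent to grep -E -v 'foo|bar'; with this made-up input only "baz" prints.
printf '%s\n' foo bar baz 'foo bar' | grep -v -e foo -e bar
```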

foo AND bar in the same file, not necessarily on the same line

If you want to match files (rather than lines) that contain both "foo" and "bar", there are several possible approaches. The simplest (although not necessarily the most efficient) is to read the file twice:

    if grep -q foo "$myfile" && grep -q bar "$myfile"; then
      printf 'Found both\n'
    fi
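To apply the same test to many files, you can wrap it in a function and loop over them. This is a sketch; the has_both name and the *.txt glob are hypothetical examples, not part of the FAQ:

```shell
# Hypothetical helper: succeed if the file named by $1 contains both foo and bar.
has_both() {
  grep -q foo "$1" && grep -q bar "$1"
}

# Print the name of every *.txt file in the current directory that contains
# both words. The ./ prefix keeps names starting with - from looking like options.
for f in ./*.txt; do
  [ -f "$f" ] || continue   # the glob stays literal if no file matched
  if has_both "$f"; then
    printf '%s\n' "$f"
  fi
done
```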

The double grep -q solution has the advantage of stopping each read as soon as it finds a match, so if you have a huge file but both words occur near the top, it will only read the first part of the file. Unfortunately, if the matches are near the bottom (worst case: the very last line of the file), you may read the whole file twice.

Another approach is to read the file once, keeping track of what you've seen as you go along. In awk:

    if awk '/foo/{a=1} /bar/{b=1} a&&b{exit} END{if(a&&b){exit 0};exit 1}' "$myfile"; then
      printf 'Found both\n'
    fi

It reads the file only once, and stops as soon as both patterns have been matched. Either way, the END block is then executed (an exit in a main rule still runs END), and it sets the exit status accordingly.
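This behaviour of exit is easy to check with a throwaway example (the input and the exit code 3 are arbitrary): exit in a main rule stops reading input, the END block still runs, and an exit inside END sets the final status.

```shell
status=0
printf '%s\n' one two three |
  awk 'NR == 1 { print "read:", $0; exit } END { print "END ran after", NR, "record(s)"; exit 3 }' ||
  status=$?
printf 'awk exit status: %s\n' "$status"
# Prints:
#   read: one
#   END ran after 1 record(s)
#   awk exit status: 3
```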

If you want to do additional checking of the file's contents, this awk solution can be adapted quite easily.
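As one example of such an adaptation, here is a sketch that extends the one-pass idea to any number of patterns. The file_has_all name and its calling convention (file first, then patterns) are assumptions made for this illustration:

```shell
# Hypothetical helper, not part of the FAQ's answer: succeed if FILE contains
# every PATTERN (awk extended regular expressions), possibly on different lines.
# It reads FILE at most once and stops as soon as all patterns have been seen.
# usage: file_has_all file pattern...
file_has_all() {
  awk '
    BEGIN {
      n = ARGC - 2             # ARGV[1] is the file; ARGV[2..] are patterns
      for (i = 2; i < ARGC; i++)
        pat[i - 1] = ARGV[i]
      ARGC = 2                 # read only ARGV[1] as input
    }
    {
      for (i = 1; i <= n; i++)
        if (!(i in seen) && $0 ~ pat[i]) {
          seen[i] = 1
          found++
        }
      if (found == n) exit     # all seen: stop reading; END still runs
    }
    END { exit (found < n) }   # 0 if every pattern was seen, else 1
  ' "$@"
}

# Example:
#   if file_has_all "$myfile" foo bar baz; then printf 'Found all\n'; fi
```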

A perl one-liner that scales to any number of patterns, while also reading each input file only once:

    perl -e '@pat=("foo","bar"); local $/; L: for $f (@ARGV){open(FH, "<", $f) or next; $a=<FH>; for(@pat){next L unless $a =~ $_} print "$f\n"}'

foo but NOT bar in the same file, possibly on different lines

This is a variant of the previous case. The advantage here is that if we find "bar", we can stop reading. Here's an awk solution:

    awk '/foo/{good=1} /bar/{good=0;exit} END{exit !good}'
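For comparison, here is a grep-only variant alongside the awk version, wrapped in functions so they can be tried on any file. The function names are hypothetical:

```shell
# Hypothetical helper names; each succeeds if file $1 contains foo but not bar.

# The awk version from above: stops reading at the first line containing bar.
foo_not_bar_awk() {
  awk '/foo/{good=1} /bar/{good=0;exit} END{exit !good}' "$1"
}

# A grep-only variant: simpler, but when bar is absent (the "success" case)
# the second grep has to read the whole file, after the first grep has
# already read up to the first foo.
foo_not_bar_grep() {
  grep -q foo "$1" && ! grep -q bar "$1"
}
```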

BashFAQ/079 (last edited 2023-01-26 22:54:33 by emanuele6)