Diff for "BashFAQ/079"

Differences between revisions 1 and 26 (spanning 25 versions)

How can I grep for lines containing foo AND bar, foo OR bar? Or for files containing foo AND bar, possibly on separate lines?

This is really three different questions, so we'll break this answer into three parts.

foo AND bar on the same line

The easiest way to match lines that contain both foo AND bar is to use two grep commands:

grep foo | grep bar
grep foo "$myfile" | grep bar   # for those who need the hand-holding

It can also be done with one grep, although this does not scale well beyond two patterns, because every ordering of the patterns needs its own alternative (three patterns would already need six):

grep -E 'foo.*bar|bar.*foo'

If you prefer, you can achieve this in one sed or awk statement:

sed -n '/foo/{/bar/p;}'
awk '/foo/ && /bar/'
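As a quick sanity check, the four approaches above can be run against the same input. This is only a sketch: the sample lines and the variable names are invented for illustration, and sed is written with a `;` before the closing brace, which POSIX sed requires.

```shell
# Hypothetical sample input: two lines contain both words, two do not.
sample=$(printf '%s\n' 'foo then bar' 'bar then foo' 'only foo' 'only bar')

# Each variant should select exactly the lines containing foo AND bar.
by_pipe=$(printf '%s\n' "$sample" | grep foo | grep bar)
by_ere=$(printf '%s\n' "$sample" | grep -E 'foo.*bar|bar.*foo')
by_sed=$(printf '%s\n' "$sample" | sed -n '/foo/{/bar/p;}')
by_awk=$(printf '%s\n' "$sample" | awk '/foo/ && /bar/')

printf '%s\n' "$by_pipe"
```

All four variables should end up holding the same two lines.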

If you need to scale the awk solution to an arbitrary number of patterns, you can write a function like this:

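# POSIX
multimatch() { # usage: multimatch pattern...
  awk '
    BEGIN {
      # Copy the patterns out of the argument list...
      for ( i = 1; i < ARGC; i++ )
        a[i] = ARGV[i]
      # ...then tell awk there are no file operands, so it reads stdin.
      ARGC = 1
    }
    {
      # Skip the line as soon as any pattern fails to match.
      for (i in a)
        if ($0 !~ a[i])
          next
      print
    }' "$@"
}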
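A rough usage sketch follows. The sample input and the `result` variable are invented for illustration; the function is repeated so the example is self-contained. It reads standard input and prints only the lines that match every pattern, each of which is treated as an awk regular expression.

```shell
# Same function as above, repeated so the example runs on its own.
multimatch() { # usage: multimatch pattern...
  awk '
    BEGIN {
      for ( i = 1; i < ARGC; i++ )
        a[i] = ARGV[i]
      ARGC = 1          # no file operands left, so awk reads stdin
    }
    {
      for (i in a)
        if ($0 !~ a[i])
          next
      print
    }' "$@"
}

# Hypothetical sample input; only the second line contains all three words.
result=$(printf '%s\n' 'foo bar' 'baz bar foo' 'foo baz' | multimatch foo bar baz)
printf '%s\n' "$result"
```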

foo OR bar on the same line

There are lots of ways to match lines containing foo OR bar. grep can be given multiple patterns with -e:

grep -e 'foo' -e 'bar'

Or you can construct one pattern with grep -E:

grep -E 'foo|bar'

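(You can't use | for alternation with plain grep: in Basic Regular Expressions it is an ordinary character. Alternation with | is only available in Extended Regular Expressions; GNU grep also accepts \| in basic mode, as an extension.)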
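The difference between the two regular expression flavours can be seen with a small experiment. This is a sketch with invented sample input and variable names.

```shell
# Hypothetical three-line sample input.
sample=$(printf '%s\n' 'foo' 'bar' 'a literal foo|bar')

# Basic regular expressions: | is an ordinary character, so only the
# line containing the literal text "foo|bar" matches.
bre=$(printf '%s\n' "$sample" | grep 'foo|bar')

# Extended regular expressions: | means alternation, so all three lines match.
ere=$(printf '%s\n' "$sample" | grep -E 'foo|bar')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre" "$ere"
```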

It can also be done with sed, awk, etc.

awk '/foo|bar/'

The awk approach has the advantage of letting you use awk's other features on the matched lines, such as extracting only certain fields.

To match lines that do not contain "foo" AND do not contain "bar":

grep -E -v 'foo|bar'
# equivalent, without extended regular expressions: grep -v -e 'foo' -e 'bar'
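"Neither foo nor bar" is the negation of "foo OR bar", which is why a single inverted match works. Chaining two inverted greps should give the same result. This is a sketch; the sample input and variable names are invented for illustration.

```shell
# Hypothetical sample input covering all four combinations.
sample=$(printf '%s\n' 'only foo' 'only bar' 'foo and bar' 'neither')

# One grep: drop every line that matches foo OR bar.
one=$(printf '%s\n' "$sample" | grep -E -v 'foo|bar')

# Two greps: drop lines containing foo, then drop lines containing bar.
two=$(printf '%s\n' "$sample" | grep -v foo | grep -v bar)

printf '%s\n' "$one"
```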

foo AND bar in the same file, not necessarily on the same line

If you want to match files (rather than lines) that contain both "foo" and "bar", there are several possible approaches. The simplest (although not necessarily the most efficient) is to read the file twice:

grep -q foo "$myfile" && grep -q bar "$myfile" && echo "Found both"

The double grep -q solution has the advantage of stopping each read whenever it finds a match; so if you have a huge file, but the matched words are both near the top, it will only read the first part of the file. Unfortunately, if the matches are near the bottom (worst case: very last line of the file), you may read the whole file two times.
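To apply the double grep -q test to several files, it can be wrapped in a small function. This is a sketch: `has_both` is a hypothetical helper name, the sample files are throwaway, and `mktemp -d` is assumed to be available (it is widespread but not specified by POSIX).

```shell
# has_both (hypothetical name) succeeds only if the file contains both
# patterns; it reads the file up to two times.
has_both() { # usage: has_both file
  grep -q foo -- "$1" && grep -q bar -- "$1"
}

# Throwaway sample files for the demonstration.
dir=$(mktemp -d) || exit
printf 'foo\nbar\n' > "$dir/both.txt"
printf 'foo\n'      > "$dir/onlyfoo.txt"

# Collect the base names of the files that contain both words.
found=
for f in "$dir"/*.txt; do
  if has_both "$f"; then
    found="$found${f##*/} "
  fi
done
printf '%s\n' "$found"
rm -rf -- "$dir"
```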

Another approach is to read the file once, keeping track of what you've seen as you go along. In awk:

awk '/foo/{a=1} /bar/{b=1} a&&b{print "both found";exit} END{if (a&&b){ exit 0} else{exit 1}}'

It reads the file one time, stopping when both patterns have been matched. No matter what happens, the END block is then executed, and the exit status is set accordingly.

If you want to do additional checking of the file's contents, this awk solution can be adapted quite easily.
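One possible adaptation, given as a sketch only, records the line number of the first match of each pattern. The function name `first_lines`, the output format and the sample file are assumptions made for illustration.

```shell
# Hypothetical adaptation: remember the line number of the first match of
# each pattern, stop reading once both are known, and report them.
first_lines() { # usage: first_lines file
  awk '
    !a && /foo/ { a = NR }
    !b && /bar/ { b = NR }
    a && b { exit }
    END {
      if (a && b) { print "foo:" a, "bar:" b; exit 0 }
      exit 1
    }' "$1"
}

# Throwaway sample file: bar first appears on line 2, foo on line 4.
tmp=$(mktemp) || exit
printf '%s\n' 'x' 'bar here' 'y' 'foo here' > "$tmp"
out=$(first_lines "$tmp"); status=$?
printf '%s (exit status %s)\n' "$out" "$status"
rm -f -- "$tmp"
```

As in the original one-liner, the exit status tells the caller whether both patterns were seen, so the function can be used directly in an `if`.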

A perl one-liner that scales to any number of patterns, while also reading each input file only once:

perl -e '@pat=("foo","bar"); local $/; L: for $f (@ARGV){open(FH,"<",$f) or next; $a=<FH>; for(@pat){next L unless $a =~ $_} print "$f\n"}'

- Revision 1 as of 2007-05-03 00:16:03
  Size: 478
  Editor: redondos
  Comment:
+ Revision 26 as of 2014-11-27 12:49:03
  Size: 3450
  Editor: geirha
  Comment: typo
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-[[Anchor(faq79)]]
== How can I grep for lines containing foo AND bar, foo OR bar? ==
+<<Anchor(faq79)>>
== How can I grep for lines containing foo AND bar, foo OR bar?  Or for files containing foo AND bar, possibly on separate lines? ==
This is really three different questions, so we'll break this answer into three parts.
-Line 4:
+Line 5:
-Well, for lines containing foo AND bar, two grep statements are needed.
+=== foo AND bar on the same line ===

The easiest way to match lines that contain both foo AND bar is to use two {{{grep}}} commands:
-Line 7:
+Line 10:
-grep foo| grep bar
+grep foo | grep bar
grep foo "$myfile" | grep bar   # for those who need the hand-holding
-Line 10:
+Line 14:
-If you prefer, you can achieve this in one sed, or awk statement.
+It can also be done with one {{{grep}}}, although (as you can probably guess) this doesn't really scale well to more than two patterns:

{{{
grep -E 'foo.*bar|bar.*foo'
}}}

If you prefer, you can achieve this in one {{{sed}}} or {{{awk}}} statement:
-Line 17:
+Line 27:
-And for lines containing foo OR bar, grep can do it "nicely", but it can also be done with sed, awk, etc.
+If you need to scale the awk solution to an arbitrary number of patterns, you can write a function like this:
-Line 20:
+Line 30:
-egrep 'foo|bar'
+# POSIX
multimatch() { # usage: multimatch pattern... 
  awk '
    BEGIN {
      for ( i = 1; i < ARGC; i++ )
        a[i] = ARGV[i]
      ARGC = 1
    }
    {
      for (i in a)
        if ($0 !~ a[i])
          next
      print
    }' "$@"
}
}}}

=== foo OR bar on the same line ===

There are lots of ways to match lines containing foo OR bar.  `grep` can be given multiple patterns with `-e`:

{{{
grep -e 'foo' -e 'bar'
}}}

Or you can construct one pattern with {{{grep -E}}}:

{{{
-Line 23:
+Line 60:
+(You can't use the `|` union operator with plain `grep`.  `|` is only available in [[RegularExpression|Extended Regular Expressions]].)

It can also be done with {{{sed}}}, {{{awk}}}, etc.

{{{
awk '/foo|bar/'
}}}

The `awk` approach has the advantage of letting you use `awk`'s other features on the matched lines, such as extracting only certain fields.

To match lines that do not contain "foo" AND do not contain "bar":

{{{
grep -E -v 'foo|bar'
# equivalent, without extended regular expressions: grep -v -e 'foo' -e 'bar'
}}}

=== foo AND bar in the same file, not necessarily on the same line ===

If you want to match ''files'' (rather than ''lines'') that contain both "foo" and "bar", there are several possible approaches.  The simplest (although not necessarily the most efficient) is to read the file twice:

{{{
grep -q foo "$myfile" && grep -q bar "$myfile" && echo "Found both"
}}}
The double {{{grep -q}}} solution has the advantage of stopping each read whenever it finds a match; so if you have a huge file, but the matched words are both near the top, it will only read the first part of the file.  Unfortunately, if the matches are near the bottom (worst case: very last line of the file), you may read the whole file two times.

Another approach is to read the file once, keeping track of what you've seen as you go along.  In awk:

{{{
awk '/foo/{a=1} /bar/{b=1} a&&b{print "both found";exit} END{if (a&&b){ exit 0} else{exit 1}}'
}}}
It reads the file one time, stopping when both patterns have been matched.  No matter what happens, the END block is then executed,  and the exit status is set accordingly.

If you want to do additional checking of the file's contents, this awk solution can be adapted quite easily.

A perl one-liner that scales to any number of patterns, while also reading each input file only once:

{{{
perl -e '@pat=("foo","bar"); local $/; L: for $f (@ARGV){open(FH,"<",$f) or next; $a=<FH>; for(@pat){next L unless $a =~ $_} print "$f\n"}'
}}}