Diff for "BashFAQ/079"

Differences between revisions 9 and 14 (spanning 5 versions)

How can I grep for lines containing foo AND bar, foo OR bar? Or for files containing foo AND bar, possibly on separate lines?

The easiest way to match lines that contain both foo AND bar is to use two grep commands:

grep foo | grep bar
grep foo "$myfile" | grep bar   # the same thing, reading from a file instead of stdin

It can also be done with a single egrep, although this does not scale well beyond two patterns, because every possible ordering needs its own alternative (three patterns already need six):

egrep 'foo.*bar|bar.*foo'

If you prefer, you can achieve this in one sed or awk statement:

sed -n '/foo/{/bar/p}'
awk '/foo/ && /bar/'
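As a quick sanity check, the sketch below runs the AND forms shown so far over the same sample input. The function names and the sample text are made up for illustration, `grep -E` stands in for `egrep`, and the sed script carries an extra `;` before `}` on the assumption that some sed implementations are stricter than GNU sed about that.

```shell
# Hypothetical helpers: each reads stdin and prints the lines
# that contain both foo and bar, in either order.
and_grep() { grep foo | grep bar; }
and_ere()  { grep -E 'foo.*bar|bar.*foo'; }
and_sed()  { sed -n '/foo/{/bar/p;}'; }
and_awk()  { awk '/foo/ && /bar/'; }

# Made-up sample input: only the first two lines contain both words.
sample() {
  printf '%s\n' 'foo then bar' 'bar then foo' 'only foo' 'only bar' 'neither'
}

# Run every variant over the same input so the results can be compared.
for f in and_grep and_ere and_sed and_awk; do
  echo "== $f"
  sample | "$f"
done
```

Each variant should print the same two lines, `foo then bar` and `bar then foo`.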

If you need to scale the awk solution to an arbitrary number of patterns, you can construct the awk command on the fly:

# bash, ksh93
# Constructs awk "/$1/&&/$2/&&...."
# Data to be matched should be on stdin.
# Writes matching lines to stdout.
multimatch() {
  (($# < 2)) && { echo "usage: multimatch pat1 pat2 [...]" >&2; return 1; }
  awk "/$1/$(printf "&&/%s/" "${@:2}")"
}
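
Because multimatch pastes each pattern into the awk program text, a pattern that contains a `/` would end the regex literal early. One possible workaround, sketched here under the assumption that plain POSIX awk behaviour is available, is to hand the patterns to awk as arguments and match them dynamically. The name `multimatch_args` is hypothetical and not part of the original FAQ.

```shell
# Hypothetical variant of multimatch: the patterns travel as awk
# arguments instead of being pasted into the program text.
# Data to be matched should be on stdin; matching lines go to stdout.
multimatch_args() {
  [ "$#" -lt 2 ] && { echo "usage: multimatch_args pat1 pat2 [...]" >&2; return 1; }
  awk '
    BEGIN {
      # Copy the patterns out of ARGV, then hide them so awk reads stdin.
      n = ARGC - 1
      for (i = 1; i <= n; i++) pat[i] = ARGV[i]
      ARGC = 1
    }
    {
      # Skip the line as soon as one pattern fails to match.
      for (i = 1; i <= n; i++) if ($0 !~ pat[i]) next
      print
    }
  ' "$@"
}

# Made-up input: only the first line matches both patterns.
printf '%s\n' 'usr/bin and lib' 'usr/bin only' 'lib only' | multimatch_args 'usr/bin' 'lib'
```

The patterns are still treated as extended regular expressions, so regex metacharacters keep their special meaning.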

To match lines containing foo OR bar, egrep is the natural choice, but it can also be done with sed, awk, etc.

egrep 'foo|bar'
# some people prefer grep -E 'foo|bar'

# Another option that some people prefer:
grep -e 'foo' -e 'bar'

# awk equivalent (e.g. if you want to extract fields)
awk '/foo|bar/'
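
The awk form pays off once you want something other than the whole line. A minimal sketch, assuming whitespace-separated input in which the second field is the interesting one; both the data layout and the function name are invented for this example.

```shell
# Hypothetical: print the second field of every line containing foo OR bar.
second_field_of_matches() {
  awk '/foo|bar/ { print $2 }'
}

# Made-up input: the baz line is skipped, the other two contribute a field.
printf '%s\n' 'foo 10' 'baz 20' 'bar 30' | second_field_of_matches
```

With this input the function prints `10` and `30`, one per line.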

egrep is the oldest and most portable form of the grep command using Extended Regular Expressions (EREs). grep -E is required by POSIX.

To match lines that do not contain "foo" AND do not contain "bar":

grep -E -v 'foo|bar'
# some people prefer egrep -v 'foo|bar'
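
Negating an OR yields "neither", which is why a single `-v` with an alternation is enough here. The sketch below compares it with two equivalents that are not in the original text: chained `grep -v` commands and an awk one-liner. Function names and sample input are made up.

```shell
# Hypothetical helpers: each prints the lines containing neither foo nor bar.
neither_ere()   { grep -E -v 'foo|bar'; }
neither_chain() { grep -v foo | grep -v bar; }
neither_awk()   { awk '!/foo/ && !/bar/'; }

# Made-up input: only the last line contains neither word.
printf '%s\n' 'foo' 'bar' 'foo bar' 'baz' | neither_ere   # prints: baz
```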

If you want to match files (rather than lines) that contain both "foo" and "bar", there are several possible approaches. The simplest (although not necessarily the most efficient) is to read the file twice:

grep -q foo "$myfile" && grep -q bar "$myfile" && echo "Found both"

The double grep -q solution has the advantage that each grep stops reading as soon as it finds a match, so if you have a huge file but both words are near the top, only the first part of the file is read. Unfortunately, if the matches are near the bottom (worst case: the very last line of the file), the whole file may be read twice.
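
The same test extends to several files with a loop. A sketch, assuming the file names are passed as arguments; `files_with_both` is a hypothetical name, and the loop variable `f` is left global for the sake of portability.

```shell
# Hypothetical helper: print the name of each file that contains
# both foo and bar, possibly on separate lines.
files_with_both() {
  for f in "$@"; do
    # "--" ends option parsing in case a file name starts with "-".
    if grep -q foo -- "$f" && grep -q bar -- "$f"; then
      printf '%s\n' "$f"
    fi
  done
}

# Possible usage (file names are placeholders):
#   files_with_both ./*.txt
```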

Another approach is to read the file once, keeping track of what you've seen as you go along. In awk:

awk '/foo/{a=1} /bar/{b=1} a&&b{print "both found";exit} END{if (a&&b){ exit 0} else{exit 1}}'

This reads the file once and stops as soon as both patterns have been matched. Whether it stops early or reaches the end of the file, the END block is then executed and sets the exit status: 0 if both patterns were found, 1 otherwise.

If you want to do additional checking of the file's contents, this awk solution can be adapted quite easily.
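
As one illustration of such an adaptation, the sketch below records the line number at which each word first appears and still reports the result through the exit status. The function name `first_seen` and the output format are assumptions made for this example.

```shell
# Hypothetical adaptation: report where foo and bar first occur.
# Reads the named file(s), or stdin when no file is given.
# Exit status: 0 if both were found, 1 otherwise.
first_seen() {
  awk '
    /foo/ && !a { a = NR }   # line number of the first foo
    /bar/ && !b { b = NR }   # line number of the first bar
    a && b { exit }          # stop reading; the END block still runs
    END {
      if (a && b) {
        printf "foo at line %d, bar at line %d\n", a, b
        exit 0
      }
      exit 1
    }
  ' "$@"
}

# Made-up input: bar appears on line 2, foo on line 3.
if printf '%s\n' 'x' 'bar' 'foo' | first_seen; then
  echo "both present"
else
  echo "at least one missing"
fi
```

With this input the function prints `foo at line 3, bar at line 2`, and the `if` then prints `both present`.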

- Revision 9 as of 2008-06-04 06:29:41
  Size: 2458
  Editor: pgas
  Comment: remove the useless awk example, use simpler good exemple, all this imho
+ Revision 14 as of 2009-03-24 14:34:09
  Size: 2885
  Editor: GreyCat
  Comment: remove extraneous blank line that broke the bot's parsing
 Line 1:
-[[Anchor(faq79)]]
+<<Anchor(faq79)>>
 Line 3:
-Line 10:
+Line 9:
-Line 16:
+Line 14:
-If you prefer, you can achieve this in one {{{sed}}} or {{{awk}}} statement.  (The {{{awk}}} example is probably the most scalable.)
+If you prefer, you can achieve this in one {{{sed}}} or {{{awk}}} statement:
-Line 23:
+Line 20:
+If you need to scale the awk solution to an arbitrary number of patterns, you can construct the awk command on the fly:
-Line 24:
+Line 22:
+{{{
+# bash, ksh93
+# Constructs awk "/$1/&&/$2/&&...."
+# Data to be matched should be on stdin.
+# Writes matching lines to stdout.
+multimatch() {
+  (($# < 2)) && { echo "usage: multimatch pat1 pat2 [...]" >&2; return 1; }
+  awk "/$1/$(printf "&&/%s/" "${@:2}")"
+}
+}}}
-Line 36:
+Line 44:
+{{{egrep}}} is the oldest and most portable form of the {{{grep}}} command using [[RegularExpression|Extended Regular Expressions (EREs)]].  {{{grep -E}}} is required by POSIX.
-Line 37:
+Line 46:
-{{{egrep}}} is the oldest and most portable form of the {{{grep}}} command using [:RegularExpression:Extended Regular Expressions (EREs)].  {{{grep -E}}} is required by POSIX.
+To match lines that do not contain "foo" AND do not contain "bar":
-Line 39:
+Line 48:
+{{{
+grep -E -v 'foo|bar'
+# some people prefer egrep -v 'foo|bar'
+}}}
-Line 44:
+Line 57:
-Line 47:
+Line 59:
-Another approach is to read the file once, keeping track of what you've seen as you go along. There are several ways to do this, for instance in awk:
+Another approach is to read the file once, keeping track of what you've seen as you go along.  In awk:
-Line 50:
+Line 62:
- awk '/foo/{a=1} /bar/{b=1} a&&b{print "both found";exit} END{if (a&&b){ exit 0} else{exit 1}}'
+awk '/foo/{a=1} /bar/{b=1} a&&b{print "both found";exit} END{if (a&&b){ exit 0} else{exit 1}}'
-Line 52:
+Line 64:
+It reads the file one time, stopping when both patterns have been matched.  No matter what happens, the END block is then executed,  and the exit status is set accordingly.
-Line 53:
+Line 66:
-It reads the file one time, stopping when both pattern have been matched, not matter what happens the END block is then executed
-and the exit status is set accordingly.
-
-If you want to do additional checking of the file contents, the awk solution can be adapted far more readily.
+If you want to do additional checking of the file's contents, this awk solution can be adapted quite easily.