Differences between revisions 6 and 7
Revision 6 as of 2008-02-13 18:18:54
Size: 2591
Editor: MrIgli
Comment: minor: added another awk example, to discourage ``grep foo | cut..''
Revision 7 as of 2008-05-22 19:23:18
Size: 2719
Editor: GreyCat
Comment: clean up


How can I grep for lines containing foo AND bar, foo OR bar? Or for files containing foo AND bar, possibly on separate lines?

The easiest way to match lines that contain both foo AND bar is to use two grep commands:

grep foo | grep bar
grep foo "$myfile" | grep bar   # for those who need the hand-holding

It can also be done with one egrep, although (as you can probably guess) this doesn't really scale well to more than two patterns:

egrep 'foo.*bar|bar.*foo'

If you prefer, you can achieve this in one sed or awk statement. (The awk example is probably the most scalable.)

sed -n '/foo/{/bar/p;}'
awk '/foo/ && /bar/'
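To illustrate why the awk form scales, here is a small sketch that adds a third pattern. The word baz and the sample lines are made up for the demonstration; each extra pattern is just one more `&& /pattern/` term.

```shell
# Sample input; the three words and the lines are illustrative only.
sample='foo bar baz
baz foo
bar baz foo
foo bar'

# Print only the lines that contain all three words, in any order.
matches=$(printf '%s\n' "$sample" | awk '/foo/ && /bar/ && /baz/')
printf '%s\n' "$matches"
```

The equivalent single egrep pattern would need one alternative per ordering of the words (six of them for three patterns), which is why it does not scale.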

To match lines containing foo OR bar, egrep is the natural choice, but it can also be done with sed, awk, etc.

egrep 'foo|bar'
# some people prefer grep -E 'foo|bar'

# Another option that some people prefer:
grep -e 'foo' -e 'bar'

# awk equivalent (e.g. if you want to extract fields)
awk '/foo|bar/'
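As a rough sketch of the field-extraction case mentioned for awk, the following prints only the first field of each line that matches foo OR bar. The sample data and the choice of the first field are assumptions made for the example.

```shell
# Sample input; the contents are illustrative only.
sample='alpha foo 1
beta qux 2
gamma bar 3'

# For lines matching foo or bar, print just the first whitespace-separated field.
first_fields=$(printf '%s\n' "$sample" | awk '/foo|bar/ { print $1 }')
printf '%s\n' "$first_fields"
```

With grep you would need a second command (such as cut) to get the same result, whereas awk does the match and the extraction in one step.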

egrep is the oldest form of the grep command using Extended Regular Expressions (EREs), and was long the most portable one. grep -E is the form required by POSIX, and recent GNU grep releases print an obsolescence warning when invoked as egrep.

If you want to match files (rather than lines) that contain both "foo" and "bar", there are several possible approaches. The simplest (although not necessarily the most efficient) is to read the file twice:

grep -q foo "$myfile" && grep -q bar "$myfile" && echo "Found both"

The double grep -q solution has the advantage of stopping each read whenever it finds a match; so if you have a huge file, but the matched words are both near the top, it will only read the first part of the file. Unfortunately, if the matches are near the bottom (or worse -- not found!), you may read the whole file two times.
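If you need more than two patterns, the same idea can be wrapped in a loop. This is only a sketch: the function name contains_all is hypothetical, and it re-reads the file once per pattern, so it has the same worst case as the double grep -q.

```shell
# contains_all FILE PATTERN...   (hypothetical helper, not a standard command)
# Succeeds only if every PATTERN matches somewhere in FILE.
contains_all() {
    _ca_file=$1
    shift
    for _ca_pattern in "$@"; do
        # -q stops reading at the first match; -e protects patterns that start with "-"
        grep -q -e "$_ca_pattern" -- "$_ca_file" || return 1
    done
    return 0
}

# Demonstration on a throwaway file.
demo_file=$(mktemp)
printf '%s\n' 'first line has foo' 'second line has bar' > "$demo_file"
if contains_all "$demo_file" foo bar; then
    echo "Found both"
fi
rm -f "$demo_file"
```

The loop returns as soon as one pattern is missing, so a file that lacks the first pattern is read only once.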

Another approach is to read the file once, keeping track of what you've seen as you go along. There are several ways to do this in awk. The first example reads the whole file and then checks whether both patterns were found:

awk '/foo/ { foo=1 } /bar/ { bar=1 } END { if (foo && bar) print "found both" }'

A more efficient way avoids reading the whole file by checking if the other string was already matched, and, if so, exiting:

awk 'function found() { print "Found both!"; exit } /foo/ { a=1; if (b) found() } /bar/ { b=1; if (a) found() }'

The first awk solution reads the whole file once, while the second stops reading at the second match. If you want to do additional checking of the file contents, either awk solution can be adapted far more readily than the double grep.
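One possible adaptation, sketched here under the assumption that you want a result you can test in a shell `if` rather than a printed message, is to have awk report through its exit status. The function name has_both and the variable names are made up for the example.

```shell
# has_both FILE   (hypothetical helper)
# Exit status 0 if FILE contains both foo and bar, possibly on separate lines.
has_both() {
    awk '
        /foo/ { seen_foo = 1 }
        /bar/ { seen_bar = 1 }
        seen_foo && seen_bar { found = 1; exit }   # stop reading; control passes to END
        END { exit !found }                        # status 0 only when both were seen
    ' "$1"
}

# Demonstration on a throwaway file.
demo_file=$(mktemp)
printf '%s\n' 'a line with foo' 'nothing here' 'a line with bar' > "$demo_file"
if has_both "$demo_file"; then
    echo "Found both"
fi
rm -f "$demo_file"
```

Like the second awk example, this stops reading at the second match, and further checks can be added as extra pattern-action rules.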

BashFAQ/079 (last edited 2023-01-26 22:54:33 by emanuele6)