How can I group entries (in a file) by common prefixes?

For example, to convert:

    foo: entry1
    bar: entry2
    foo: entry3
    baz: entry4

to

    foo: entry1 entry3
    bar: entry2
    baz: entry4
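To try the commands on this page, the sample input above can be saved to a file named file, which is the name the examples read from. This is only a convenience step for experimenting:

```shell
# Write the four example lines to "file", one per line.
printf '%s\n' 'foo: entry1' 'bar: entry2' 'foo: entry3' 'baz: entry4' > file
```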

There are two simple general methods for this:

  1. Sort the file, then iterate over it, collecting entries until the prefix changes; at that point, print the collected entries with the previous prefix.
  2. Iterate over the file, collecting the entries for each prefix in an array indexed by the prefix.

A basic implementation of the first method (sort, then scan) in bash:

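    # Method 1: sort so that lines sharing a prefix are adjacent, then collect
    # entries until the prefix changes. read splits "foo: entry1" into
    # prefix="foo:" (colon included) and line="entry1".
    sort file | {
        old= stuff=
        while read -r prefix line; do
            if [[ $prefix = "$old" ]]; then
                stuff+=" $line"
            else
                # prefix changed: print the previous group (if any), start a new one
                [[ -n $old ]] && printf '%s%s\n' "$old" "$stuff"
                old=$prefix stuff=" $line"
            fi
        done
        # print the last group, which no later prefix change will flush
        [[ -n $old ]] && printf '%s%s\n' "$old" "$stuff"
    }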
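The second method can also be sketched in bash itself. This sketch assumes bash 4.0 or later (for associative arrays); the function name group_by_prefix and the order array are illustrative choices, not part of the original page. Unlike the sorted approach, it prints the groups in the order their prefixes first appear, which matches the example output above:

```shell
# Method 2 in pure bash (assumes bash 4.0+ for local -A).
# groups[prefix] accumulates " entry" strings; order remembers first-seen prefixes.
group_by_prefix() {
    local -A groups=()
    local -a order=()
    local prefix line
    while read -r prefix line; do
        [[ -n $prefix ]] || continue               # skip blank lines
        # the first time a prefix is seen, record its position
        [[ ${groups[$prefix]+set} ]] || order+=("$prefix")
        groups[$prefix]+=" $line"
    done
    for prefix in "${order[@]}"; do
        printf '%s%s\n' "$prefix" "${groups[$prefix]}"
    done
}
```

Usage: group_by_prefix < file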

And a basic implementation of the second method in awk, using awk's multi-dimensional subscripts (a[i,j] is emulated by joining i and j with SUBSEP into a single string key):

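    {
      # b[prefix] counts the entries seen so far for each prefix;
      # a[prefix,n] stores the nth entry for that prefix
      a[$1,++b[$1]] = $2;
    }

    END {
      # "for (i in b)" visits the prefixes in an unspecified order
      for (i in b) {
        printf("%s", i);
        for (j=1; j<=b[i]; j++) {
          printf(" %s", a[i,j]);
        }
        print "";
      }
    }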

Written out as a shell command:

    awk '{a[$1,++b[$1]]=$2} END {for (i in b) {printf("%s", i); for (j=1; j<=b[i]; j++) printf(" %s", a[i,j]); print ""}}' file
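Because for (i in b) has no defined order, the groups may come out in any order. If the output should follow first appearance, as in the example at the top, a variant can record each new prefix in an extra array. This is a sketch: order and n are names introduced here, and the program is wrapped in a hypothetical shell function group_in_order so it can read either named files or standard input:

```shell
# Hypothetical wrapper: passes any file arguments through to awk,
# which reads standard input when none are given.
group_in_order() {
    awk '
        !NF { next }                       # skip blank lines
        !($1 in b) { order[++n] = $1 }     # remember each prefix on first sight
        { a[$1, ++b[$1]] = $2 }            # same collection step as above
        END {
            for (k = 1; k <= n; k++) {
                i = order[k]
                printf("%s", i)
                for (j = 1; j <= b[i]; j++) printf(" %s", a[i, j])
                print ""
            }
        }' "$@"
}
```

Usage: group_in_order file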

BashFAQ/057 (last edited 2012-03-29 20:36:56 by ormaaj)