How can I group entries (in a file by common prefixes)?

As in, one wants to convert:

    foo: entry1
    bar: entry2
    foo: entry3
    baz: entry4

to

    foo: entry1 entry3
    bar: entry2
    baz: entry4

There are two simple general methods for this:

  1. sort the file, and then iterate over it, collecting entries until the prefix changes, and then print the collected entries with the previous prefix
  2. iterate over the file, collect entries for each prefix in an array indexed by the prefix

A basic implementation of a in bash:

old=xxx ; stuff=
(sort file ; echo xxx) | while read prefix line ; do 
        if [[ $prefix = $old ]] ; then
                stuff="$stuff $line"
        else
                echo "$old: $stuff"
                old="$prefix"
                stuff=
        fi
done 

And a basic implementation of b in awk, using a true multi-dimensional array:

    {
      a[$1,++b[$1]] = $2;
    }

    END {
      for (i in b) {
        printf("%s", i);
        for (j=1; j<=b[i]; j++) {
          printf(" %s", a[i,j]);
        }
        print "";
      }
    }

Written out as a shell command:

    awk '{a[$1,++b[$1]]=$2} END {for (i in b) {printf("%s", i); for (j=1; j<=b[i]; j++) printf(" %s", a[i,j]); print ""}}' file

BashFAQ/057 (last edited 2012-03-29 20:36:56 by ormaaj)