BashFAQ/057

How can I group entries (in a file) by common prefixes?

For example, one might want to convert:

    foo: entry1
    bar: entry2
    foo: entry3
    baz: entry4

into:

    foo: entry1 entry3
    bar: entry2
    baz: entry4

There are two simple general methods for this:

a. Sort the file, then iterate over it, collecting entries until the prefix changes; at that point, print the collected entries with the previous prefix and start a new group.
b. Iterate over the file, collecting the entries for each prefix in an array indexed by the prefix, and print the groups at the end.
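Method b does not strictly need awk. Below is a sketch in bash itself, assuming bash 4.0 or later for associative arrays (`local -A`). The function name `group_by_prefix` is hypothetical; it reads the data on standard input. A second, indexed array remembers the order in which prefixes first appear, because bash does not guarantee any particular order when iterating over an associative array's keys.

```bash
#!/usr/bin/env bash
# Sketch of method b in pure bash; assumes bash >= 4.0 (associative arrays).
# Reads "prefix rest" lines on stdin and prints groups in first-seen order.
group_by_prefix() {
    local -A entries=()    # prefix -> accumulated " entry1 entry3 ..."
    local -a order=()      # distinct prefixes, in the order first seen
    local prefix line
    while read -r prefix line; do
        [[ -z $prefix ]] && continue              # skip blank lines
        if [[ -z ${entries[$prefix]+set} ]]; then
            order+=("$prefix")                     # first time we see this prefix
        fi
        entries[$prefix]+=" $line"                 # append this entry to its group
    done
    for prefix in "${order[@]}"; do
        # prefix keeps its trailing colon (e.g. "foo:"), so no separator is added
        printf '%s%s\n' "$prefix" "${entries[$prefix]}"
    done
}

# Usage with the sample data from above:
printf '%s\n' 'foo: entry1' 'bar: entry2' 'foo: entry3' 'baz: entry4' | group_by_prefix
# foo: entry1 entry3
# bar: entry2
# baz: entry4
```

With a real file, the call would be `group_by_prefix < file`.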

A basic implementation of the first method (a) in bash:

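    # "file" holds the input; sort brings equal prefixes together, and the
    # trailing empty line acts as a sentinel that flushes the last group.
    # prefix keeps its trailing colon (e.g. "foo:"), so it is printed as-is.
    old= stuff=
    { sort file; echo; } | while read -r prefix line; do
        if [[ $prefix = "$old" ]]; then
            stuff="$stuff $line"
        else
            # prefix changed: print the finished group (none before the first line)
            [[ -n $old ]] && echo "$old$stuff"
            old=$prefix
            stuff=" $line"    # start the new group with the current entry
        fi
    done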
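The same sort-then-scan approach (method a) also fits naturally in awk, where the END block flushes the last group, so no sentinel line is needed. A sketch, wrapped in a hypothetical function `group_sorted` that reads the data on standard input:

```bash
#!/usr/bin/env bash
# Sketch: method a as a streaming awk filter over sorted input.
# Only the current group is held in memory; END flushes the last one.
group_sorted() {
    sort | awk '
        NF == 0 { next }                    # skip blank lines
        $1 != prev || !seen {               # prefix changed (or first line)
            if (seen) print prev stuff      # flush the previous group
            prev = $1; stuff = ""; seen = 1
        }
        { stuff = stuff " " $2 }            # collect this entry
        END { if (seen) print prev stuff }  # flush the last group
    '
}

# Usage with the sample data from above:
printf '%s\n' 'foo: entry1' 'bar: entry2' 'foo: entry3' 'baz: entry4' | group_sorted
# bar: entry2
# baz: entry4
# foo: entry1 entry3
```

Like the bash loop, this prints the groups in sorted order rather than in the order the prefixes first appear.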

And a basic implementation of the second method (b) in awk, using awk's multi-dimensional subscripts (a[i,j], which awk stores as a single key joined with SUBSEP):

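    {
      # b[prefix] counts the entries seen for that prefix;
      # a[prefix, n] stores the n-th entry
      a[$1,++b[$1]] = $2;
    }

    END {
      # "for (i in b)" visits the prefixes in an unspecified order
      for (i in b) {
        printf("%s", i);
        for (j=1; j<=b[i]; j++) {
          printf(" %s", a[i,j]);
        }
        print "";
      }
    }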

Written out as a shell command:

    awk '{a[$1,++b[$1]]=$2} END {for (i in b) {printf("%s", i); for (j=1; j<=b[i]; j++) printf(" %s", a[i,j]); print ""}}' file
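Because `for (i in b)` visits the keys in an unspecified order, the groups may not come out in the order the prefixes first appear in the file. If that order matters, one option is to record each new prefix in a separate, integer-indexed array. A sketch, using a hypothetical function name `group_ordered` that reads the named files, or standard input when none are given:

```bash
#!/usr/bin/env bash
# Sketch: method b in awk, printing groups in first-seen order.
# order[k] holds the k-th distinct prefix; n[p] counts entries for prefix p.
group_ordered() {
    awk '
        NF == 0 { next }                      # skip blank lines
        !($1 in n) { order[++k] = $1 }        # first time this prefix appears
        { a[$1, ++n[$1]] = $2 }               # store the n-th entry for this prefix
        END {
            for (i = 1; i <= k; i++) {
                p = order[i]
                printf("%s", p)
                for (j = 1; j <= n[p]; j++) printf(" %s", a[p, j])
                print ""
            }
        }
    ' "$@"
}

# Usage with the sample data from above:
printf '%s\n' 'foo: entry1' 'bar: entry2' 'foo: entry3' 'baz: entry4' | group_ordered
# foo: entry1 entry3
# bar: entry2
# baz: entry4
```

The `($1 in n)` test checks membership without creating the element, which is why it is used instead of comparing `n[$1]` to zero.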

Revision 4 as of 2012-03-29 20:31:18
  Size: 1349
  Editor: e36freak
  Comment: the awk now uses a true multi-dimensional array
Revision 5 as of 2012-03-29 20:32:44
  Size: 1387
  Editor: e36freak