Diff for "BashFAQ/094"

Differences between revisions 5 and 6

I want to get an alert when my disk is full (parsing df output).

Sadly, parsing the output of df really is the most reliable way to determine how full a disk is, on most operating systems. However, please note that this is a "least bad" answer, not a "best" answer. Parsing any command-line reporting tool's output in a program is never pretty. The purpose of this FAQ is to try to describe all the problems this approach is known to encounter, and work around them.

The first, biggest problem with df is that it doesn't work the same way on all operating systems. Unix is divided largely into two families -- System V and BSD. On BSD-like systems (including Linux, in this case), df gives a human-readable report:

 ~$ df
 Filesystem           1K-blocks      Used Available Use% Mounted on
 /dev/sda2              8230432   3894324   3918020  50% /
 tmpfs                   253952         8    253944   1% /lib/init/rw
 udev                     10240        44     10196   1% /dev
 tmpfs                   253952         0    253952   0% /dev/shm

However, on System-V-like systems, the output is completely different:

 $ df
 /net/appl/clin       (svr1:/dsk/2/clin/pa1.1-hpux10HP-UXB.10.20):  1301728 blocks            -1 i-nodes
 /net/appl/tool-share (svr2:/dsk/4/dsk3/tool/share): 51100992 blocks       4340921 i-nodes
 /net/appl/netscape   (svr2:/dsk/4/dsk3/netscape/pa1.1-hpux10HP-UXB.10.20): 51100992 blocks       4340921 i-nodes
 /net/appl/gcc-3.3    (svr2:/dsk/4/dsk3/gcc-3.3/pa1.1-hpux10HP-UXB.10.20): 51100992 blocks       4340921 i-nodes
 /net/appl/gcc-3.2    (svr2:/dsk/4/dsk3/gcc-3.2/pa1.1-hpux10HP-UXB.10.20): 51100992 blocks       4340921 i-nodes
 /net/appl/tool       (svr2:/dsk/4/dsk3/tool/pa1.1-hpux10HP-UXB.10.20): 51100992 blocks       4340921 i-nodes
 /net/home/wooledg    (/home/wooledg       ):   658340 blocks     87407 i-nodes
 /net/home            (auto.home           ):        0 blocks         0 i-nodes
 /net/hosts           (-hosts              ):        0 blocks         0 i-nodes
 /net/appl            (auto.appl           ):        0 blocks         0 i-nodes
 /net/vol             (auto.vol            ):        0 blocks         0 i-nodes
 /nfs                 (-hosts              ):        0 blocks         0 i-nodes
 /home                (/dev/vg00/lvol5     ):   658340 blocks     87407 i-nodes
 /opt                 (/dev/vg00/lvol6     ):   623196 blocks     83075 i-nodes
 /tmp                 (/dev/vg00/lvol4     ):    86636 blocks     11404 i-nodes
 /usr/local           (/dev/vg00/lvol9     ):   328290 blocks     41392 i-nodes
 /usr                 (/dev/vg00/lvol7     ):   601750 blocks     80228 i-nodes
 /var                 (/dev/vg00/lvol8     ):   110696 blocks     14447 i-nodes
 /stand               (/dev/vg00/lvol1     ):   110554 blocks     13420 i-nodes
 /                    (/dev/vg00/lvol3     ):   190990 blocks     25456 i-nodes

So, your first obstacle will be recognizing that you may need to use a different command depending on which OS you're on (e.g. bdf on HP-UX); and that there may be some OSes where it's simply not possible to do this with a shell script at all.
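One way to approach this first obstacle is to choose the command from the operating system name. The following is only a sketch: the function name `pick_df_command` is made up for illustration, `bdf` on HP-UX comes from the paragraph above, and treating every other system as having a usable `df -P -k` is an assumption that will not hold everywhere.

```shell
# Hypothetical helper: choose a disk-usage command from an OS name.
# Pass the output of `uname -s`; if no argument is given, uname is asked.
pick_df_command() {
  case ${1:-$(uname -s)} in
    HP-UX) echo "bdf" ;;        # HP-UX: bdf gives the BSD-style report
    *)     echo "df -P -k" ;;   # assumption: a POSIX-style df is available
  esac
}
```

A script could then call `pick_df_command` once at startup and bail out early if the chosen command is not actually installed.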

For the rest of this article, we'll assume that you've got a system with a BSD-like df command.

The next problem is that the output format of df is not consistent across platforms. Some platforms use 6 columns of output. Some use 7. Some platforms (like Linux) use 1-kilobyte blocks by default when reporting the actual space used or available; others, like OpenBSD or IRIX, use 512-byte blocks by default, and need a -k switch to use kilobytes.
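Because the block size differs, a script has to know which unit the numbers are in before it compares them to anything. Here is a minimal sketch, assuming the second header word is one of the forms that appear in this article (`1K-blocks`, `1024-blocks`, `512-blocks`). The function names `header_block_bytes` and `blocks_to_kb` are invented for this example, and other platforms may print a header this sketch does not recognize.

```shell
# Hypothetical helper: map df's second header word to a block size in bytes.
header_block_bytes() {
  case $1 in
    1K-blocks|1024-blocks) echo 1024 ;;
    512-blocks)            echo 512 ;;
    *)                     return 1 ;;   # unknown header: refuse to guess
  esac
}

# Hypothetical helper: convert a block count to kilobytes.
# $1 = header word, $2 = block count taken from the same report.
blocks_to_kb() {
  bytes=$(header_block_bytes "$1") || return 1
  echo $(( $2 * $bytes / 1024 ))
}
```

Where it is supported, passing -k avoids the conversion entirely; the check is still useful as a sanity test on the header.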

Worse, often a line of output will be split into multiple lines on the screen. For example (Linux):

 Filesystem           1K-blocks      Used Available Use% Mounted on
 ...
 svr2:/dsk/4/dsk3/tool/i686Linux2.4.27-4-686
                       35194552   7856256  25550496  24% /net/appl/tool

If the device name is sufficiently long (very common with network-mounted file systems), df may split the output onto two lines in an attempt to preserve the columns for human readability. Or it may not... see, for example, OpenBSD 4.3:

 ~$ df
 Filesystem  512-blocks      Used     Avail Capacity  Mounted on
 /dev/wd0a       253278    166702     73914    69%    /
 /dev/wd0d      8121774   6904178    811508    89%    /usr
 /dev/wd0e      8121774   6077068   1638618    79%    /var
 /dev/wd0f       507230        12    481858     0%    /tmp
 /dev/wd0g      8121774   5653600   2062086    73%    /home
 /dev/wd0h    125253320 116469168   2521486    98%    /export
 ~$ sudo mount 192.168.2.5:/var/cache/apt/archives /mnt
 ~$ df
 Filesystem  512-blocks      Used     Avail Capacity  Mounted on
 /dev/wd0a       253278    166702     73914    69%    /
 /dev/wd0d      8121774   6904178    811508    89%    /usr
 /dev/wd0e      8121774   6077806   1637880    79%    /var
 /dev/wd0f       507230        12    481858     0%    /tmp
 /dev/wd0g      8121774   5653600   2062086    73%    /home
 /dev/wd0h    125253320 116469168   2521486    98%    /export
 192.168.2.5:/var/cache/apt/archives    1960616   1638464    222560    88%    /mnt

Most versions of df give you a -P switch which is intended to standardize the output... sort of. Older versions of OpenBSD still split lines of output even when -P is supplied, but Linux will generally force the output for each file system onto a single line.

Therefore, if you want to write something robust, you can't assume the output for a given file system will be on a single line. We'll get back to that later.

You can't assume the columns line up vertically, either:

 ~$ df -P
 Filesystem           1024-blocks      Used Available Capacity Mounted on
 /dev/hda1               180639     93143     77859      55% /
 tmpfs                   318572         4    318568       1% /dev/shm
 /dev/hda5                90297      4131     81349       5% /tmp
 /dev/hda2              5763648    699476   4771388      13% /usr
 /dev/hda3              1829190    334184   1397412      20% /var
 /dev/sdc1            2147341696 349228656 1798113040      17% /data3
 /dev/sde1            2147341696 2147312400     29296     100% /data4
 /dev/sdf1            1264642176 1264614164     28012     100% /data5
 /dev/sdd1            1267823104 1009684668 258138436      80% /hfo
 /dev/sda1            2147341696 2147311888     29808     100% /data1
 /dev/sdg1            1953520032 624438272 1329081760      32% /mnt
 /dev/sdb1            1267823104 657866300 609956804      52% /data2
 imadev:/home/wooledg   3686400   3336736    329184      92% /net/home/wooledg
 svr2:/dsk/4/dsk3/tool/i686Linux2.4.27-4-686  35194552   7856256  25550496      24% /net/appl/tool
 svr2:/dsk/4/dsk3/tool/share  35194552   7856256  25550496      24% /net/appl/tool-share

So, what can you actually do?

Use the -P switch. Even if it doesn't make everything 100% consistent, it generally doesn't hurt.
Set your locale to C. You don't need non-English column headers complicating the picture.
Explicitly select a file system. Don't use df -P | grep /dev/hda2 if you want the results for a specific file system. Give df a directory name or a device name as an argument so you only get that file system's output in the first place.
```
  ~$ df -P /
  Filesystem           1024-blocks      Used Available Capacity Mounted on
  /dev/sda2              8230432   3894360   3917984      50% /
```
Count words of output without respecting newlines. This is the workaround for lines being split unpredictably. For example, using a Bash array:
```
  ~$ read -d '' -ra df < <(LC_ALL=C df -P /); echo "${df[11]}"
  50%
```
As you can see, we simply slurped the entire output into a single array and then took the 12th word (array indices count from 0). We don't care whether the output got split or not, because that doesn't change the number of words.
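The same word-counting idea can be wrapped in a function that reads the report from standard input, which makes it easy to try against saved output. This is a sketch under stated assumptions: it requires Bash, the name `df_capacity` is invented here, exactly one file system was requested, and neither the device name nor the header contains extra whitespace.

```shell
# Requires Bash (arrays and read -d '').
# Hypothetical helper: print the Capacity field from `df -P` output on stdin.
df_capacity() {
  local -a words
  # read returns non-zero at end of input, so that is not treated as failure.
  read -d '' -ra words || true
  # 7 header words + device, blocks, used, available, capacity = 12 words.
  (( ${#words[@]} >= 12 )) || return 1
  printf '%s\n' "${words[11]}"
}

# Usage: LC_ALL=C df -P / | df_capacity
```

Feeding it a report in which the device name sits on a line of its own gives the same answer as an unsplit report, since the split does not change the word count.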

Removing the % sign, comparing the number to a specified threshold, scheduling an automatic way to run the script, etc. are left as exercises for you.
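For readers who want a starting point for the first two exercises, here is one possible sketch. The function name `is_over_threshold`, the sample value `92%` and the threshold `90` are all made up for illustration; a real script would take the capacity from df and decide for itself how to send the alert.

```shell
# Hypothetical helper: succeed if a capacity such as "92%" is at or above
# the threshold given as the second argument (a plain integer).
is_over_threshold() {
  pct=${1%\%}                    # strip the trailing % sign
  case $pct in
    ''|*[!0-9]*) return 2 ;;     # not a plain number: report an error
  esac
  [ "$pct" -ge "$2" ]
}

if is_over_threshold "92%" 90; then
  echo "disk usage is at or above the threshold"
fi
```

The return status 2 for malformed input keeps a parsing failure distinguishable from "below threshold", which matters when the script runs unattended.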

- Revision 5 as of 2010-08-16 19:35:18
  Size: 8439
  Editor: GreyCat
  Comment:
+ Revision 6 as of 2010-08-17 06:31:27
  Size: 8438
  Editor: Lhunath
  Comment: 'read' is saner than unquoted expansion with set -f hackery
 Line 114:
-  ~$ set -f; df=($(LC_ALL=C df -P /)); echo "${df[11]}"; set +f
+  ~$ read -d '' -ra df < <(LC_ALL=C df -P /); echo "${df[11]}"