Disk Usage Macros
[posted 2001/11/29]
Moving on to another example: from time to time, PIKT alerts us to a nearly full disk, as in:
CRITICAL:
    DiskCapCritical
        Detect critical filesystem full or near-full situations

        Filesystem /pub/phd_disk_39 on /dev/dsk/c1t2d0s0 is 96% full, 188384 Kb left
        1359599 /pub/phd_disk_39/fowley
         625328 /pub/phd_disk_39/skinner
         402160 /pub/phd_disk_39/spender
        ...
An alert such as this is our signal to deploy the move_homedir.pl program to move users to less crowded disks.
Invariably, the next step is to poll all the relevant systems (in this case, the phd systems) to find suitable candidate disks to move the disk hogs to. Since the disks are scattered across several systems, polling for candidate disks is not as trivial as it might first seem.
In the past, this has always been a manual step. But if we're doing this all the time anyway (in response to a near-disk-full alert), why not have PIKT do it for us automatically? We save ourselves precious time and keystrokes that way. Don't just let PIKT alert you to problems! Wherever possible, have PIKT also diagnose them (and include the diagnosis in the alert message) and, if possible, fix them automatically!
Here are a couple of disk usage macros I created in response to this need:
///////////////////////////////////////////////////////////////////////////////

dfpub(C, G, N)  // output to communications channel C all local disks as
                // well as any /pub disks for user group G, from disk_1 up
                // to disk_N
                // (C) is the communications channel (e.g., mail); don't
                // enclose with quotes
                // (G) is the user group (e.g., fac); don't enclose with quotes
                // (N) is the maximum number of user disks expected for the
                // user group (e.g., 40, for up to /pub/fac_disk_40)
        =outputproc((C), "(=dfl; d=1; while [ \$d -le (N) ]; do =ls /pub/(G)_disk_\$d >/dev/null 2>&1; =dfk /pub/(G)_disk_\$d 2>/dev/null | =tail +2 | =awk '/\\/$/ {next}; {print}'; d=`=expr \$d + 1`; done) | =sort +4n | =uniq")

///////////////////////////////////////////////////////////////////////////////

dfall(C)        // output to communications channel C all disks relevant to
                // the appropriate user group
                // (C) is the communications channel (e.g., mail); don't
                // enclose with quotes
#if fac
        =dfpub((C), fac, 40)
#elsif mba
        =dfpub((C), mba, 20)
#elsif phd
        =dfpub((C), phd, 50)
#else
        =dfpub((C), nil, 0)
#endif

///////////////////////////////////////////////////////////////////////////////
In =dfpub(), the =outputproc() gobbledygook:
- df's the local disks
- ls's each /pub/(G)_disk_1 through /pub/(G)_disk_(N) in order to automount them on the current system, so that those disks show up in the subsequent df display; we don't care to see the =ls output, so we throw it away to /dev/null, and we add '2>&1' because we also don't want to see error messages for disks in the 1 ... (N) sequence that may not exist
- df's each /pub/(G)_disk_*, one by one, using 'tail +2' to throw away each df header (forget about the awk filter, which skips any line ending in '/', such as an entry for the root filesystem; it may not strictly be necessary)
- sorts the combined output numerically on the capacity field (the fifth column), putting the least-full disks at the top, and finally uniq's out any duplicate entries (because we may df the same disk twice)
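To make the pipeline's post-processing concrete, here is a minimal Python sketch of the same steps. It assumes the df outputs have already been captured as strings; the function names are hypothetical and not part of PIKT. It keeps the local df output whole, drops each per-disk df header, skips lines ending in '/', sorts on the numeric part of the capacity field (breaking ties on the whole line, as sort does by default), and removes adjacent duplicates the way uniq does.

```python
import re

def capacity(line: str) -> int:
    """Numeric value of field 5 (capacity) as `sort +4n` sees it:
    "64%" -> 64; a non-numeric field such as the header's "capacity"
    counts as 0."""
    fields = line.split()
    match = re.match(r"\d+", fields[4]) if len(fields) >= 5 else None
    return int(match.group()) if match else 0

def dfpub_merge(local_df: str, pub_dfs: list[str]) -> list[str]:
    """Combine local df output with per-disk df outputs roughly the way
    the =outputproc() pipeline does. Inputs are captured df -k texts."""
    lines = local_df.splitlines()          # =dfl output, header kept
    for out in pub_dfs:                    # one =dfk run per /pub disk
        for line in out.splitlines()[1:]:  # tail +2: drop the df header
            if not line.endswith("/"):     # awk '/\/$/ {next}; {print}'
                lines.append(line)
    # sort +4n: numeric on the capacity field; equal keys fall back to
    # comparing whole lines, which keeps identical lines adjacent.
    lines.sort(key=lambda line: (capacity(line), line))
    merged: list[str] = []
    for line in lines:                     # uniq: drop adjacent repeats
        if not merged or merged[-1] != line:
            merged.append(line)
    return merged
```

Because identical lines share the same sort key and then compare equal as whole lines, they always end up next to each other, which is why a plain uniq is enough to remove a disk that was df'd twice.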
We don't call =dfpub() directly. Rather, within the actual alarm script we call the =dfall() macro, which automatically supplies the right arguments for each user group. Here is an excerpt from DiskCapEmergency:
// report
output mail "Filesystem $mount on $fsname is $text(100*#cap,0)% full, $text(#avail) Kb left"
=dutop(10, $mount)
=dfall(mail)
...
The =dfall() reference is the new addition. It automatically appends a listing like this one to the alert message:
Filesystem              kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t3d0s5       388638      13  349762       1%  /export/home
phd4:/pub/phd_disk_17  2073641       9 1969950       1%  /pub/phd_disk_17
phd8:/pub/phd_disk_42  2808734    2626 2749934       1%  /pub/phd_disk_42
/dev/dsk/c0t0d0s4       100303    9647   85641      11%  /opt
phd5:/pub/phd_disk_36  2950083 1072482 1818600      38%  /pub/phd_disk_36
/dev/dsk/c0t0d0s0        70676   26473   40670      40%  /
/dev/dsk/c0t0d0s6       559225  231722  299542      44%  /usr
/dev/dsk/c0t2d0s3      1016665  427685  527981      45%  /var
phd4:/pub/phd_disk_16  2073641 1242437  727522      64%  /pub/phd_disk_16
phd4:/pub/phd_disk_6   2029842 1292668  635682      68%  /pub/phd_disk_6
phd5:/pub/phd_disk_13  4184490 2925944 1174857      72%  /pub/phd_disk_13
...
Great! We no longer have to poll the disks manually; these PIKT disk usage macros do it for us. And because of the sort, we can instantly identify suitable candidate targets (e.g., /pub/phd_disk_17 or /pub/phd_disk_42, both of which are virtually empty).
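Picking targets from the sorted listing is also easy to automate. The following Python sketch is a hypothetical helper, not part of PIKT or move_homedir.pl. It assumes the listing has been split into lines in df -k column order (filesystem, kbytes, used, avail, capacity, mount point) and that mount points contain no spaces.

```python
def candidate_targets(listing: list[str], needed_kb: int) -> list[str]:
    """Return the /pub mount points whose avail column (field 4) is at
    least needed_kb, keeping the listing's least-full-first order.
    Hypothetical helper for illustration only."""
    targets = []
    for line in listing:
        fields = line.split()
        if len(fields) < 6 or not fields[3].isdigit():
            continue  # skip the header and any malformed lines
        avail_kb, mount = int(fields[3]), fields[5]
        if mount.startswith("/pub/") and avail_kb >= needed_kb:
            targets.append(mount)
    return targets
```

Applied to the lines shown in the listing above, with fowley's 1359599 Kb from the alert as needed_kb, this would return /pub/phd_disk_17, /pub/phd_disk_42, and /pub/phd_disk_36, in that order; /pub/phd_disk_36 is 38% full but still has 1818600 Kb free.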
Note that in the default =dfall() case,
=dfpub((C), nil, 0)
the condition for the while loop (d=1; while [ \$d -le (N) ]; do ...; done) is never satisfied (1 is not less than or equal to 0), so on systems outside the fac, mba, and phd groups only the local disks are listed, and no /pub disks.
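The dispatch in =dfall(), including this default case, can be sketched in Python as a lookup over the groups in the same order as the #if/#elsif chain. This assumes a host's group memberships are available as a set of names; the function and table names are hypothetical and not part of PIKT.

```python
# Hypothetical mirror of the #if/#elsif chain in =dfall(): groups are
# checked in the same order, and the first one the host belongs to wins.
GROUP_MAX_DISKS = [("fac", 40), ("mba", 20), ("phd", 50)]

def dfall_args(host_groups: set[str]) -> tuple[str, int]:
    """Return the (G, N) arguments =dfall() would pass to =dfpub()."""
    for group, max_disks in GROUP_MAX_DISKS:
        if group in host_groups:
            return group, max_disks
    return "nil", 0  # the #else branch

def pub_paths(host_groups: set[str]) -> list[str]:
    """The /pub disks the while loop in =dfpub() would ls and df.
    Like `d=1; while [ $d -le N ]`, range(1, N + 1) is empty when N is 0."""
    group, n = dfall_args(host_groups)
    return [f"/pub/{group}_disk_{d}" for d in range(1, n + 1)]
```

For a host in none of the listed groups, dfall_args() returns ("nil", 0) and pub_paths() returns an empty list, matching a while loop whose body never runs.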
For more examples, see Developer's Notes.