Full Disks
In this example, we report when file systems are full.
The DiskCap script might send an alert message like the following:
PIKT ALERT
Wed Oct 3 04:30:06 2002
athens4

EMERGENCY:
    DiskCap
        Report emergency full disk situations

        Filesystem /ckp on /dev/md/dsk/d10 is 100% full, 0 Kb left

        17370930  /ckp/ingres
        8         /ckp/lost+found
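The percentage and the "Kb left" figure in that message can be illustrated with a short Python sketch. It assumes DiskCap's #cap mirrors the df capacity column, which is conventionally used / (used + available) rounded up to a whole percent. It also uses Python's shutil.disk_usage to obtain live figures. The function names (capacity_pct, report_line, usage_kb) are hypothetical and not part of PIKT.

```python
import shutil
from typing import Tuple


def capacity_pct(used_kb: int, avail_kb: int) -> int:
    """df-style capacity (assumed): used / (used + avail), rounded up to a whole percent."""
    total = used_kb + avail_kb
    if total == 0:
        return 0  # avoid dividing by zero on an empty or pseudo file system
    return (100 * used_kb + total - 1) // total  # integer ceiling division


def report_line(mount: str, fsname: str, used_kb: int, avail_kb: int) -> str:
    """Build a message shaped like DiskCap's 'output mail' line (hypothetical helper)."""
    pct = capacity_pct(used_kb, avail_kb)
    return f"Filesystem {mount} on {fsname} is {pct}% full, {avail_kb} Kb left"


def usage_kb(mount: str) -> Tuple[int, int]:
    """Return (used_kb, avail_kb) for a mounted path.

    On POSIX, shutil.disk_usage reports 'free' as the space available to
    unprivileged users, which corresponds to df's 'avail' column.
    """
    du = shutil.disk_usage(mount)
    return du.used // 1024, du.free // 1024
```

For example, `report_line("/ckp", "/dev/md/dsk/d10", *usage_kb("/ckp"))` would build a line like the one in the alert above on a host where /ckp is mounted. Rounding up means a file system with any space in use never reports a smaller percentage than df would.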
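The DiskCap script reads its per-file-system capacity limits (and optional action commands) from the DiskCaps.obj file, falling back to the DEFAULT entry for any file system not listed there. The script follows.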
DiskCap
    // report, possibly take action, and possibly page if
    // capacity threshold exceeded (typically 100%--i.e.,
    // the file system is full)
    init
        status =piktstatus
        level =piktlevel
        task "Report emergency full disk situations"
        =dflinput
        =dffilter
        =dfdata
        keys $fsname
    begin
#ifdef page
        set $pagemsg = "$hostname() $mount is full"
#endifdef
        =readvals(=objdir/DiskCaps.obj, diskcapval, ":", 6)
    rule    // every $mount has its own #hr value
        set #hr = #hour()
    rule    // the 1st val (2nd field) in DiskCaps.obj is the emergency field
        if $diskcapval["$mount 1"] ne $err()
            set #caplim = #val($diskcapval["$mount 1"])
            set $execcmd = $diskcapval["$mount 6"]
        else
            set #caplim = #val($diskcapval["DEFAULT 1"])
            set $execcmd = $diskcapval["DEFAULT 6"]
        endif
#ifdef debug
    rule
        output "$mount $text(#hr) $text(#cap*100)% $text(#caplim*100)% $execcmd"
#endifdef
    rule    // report, possibly take action, and possibly page if
            // capacity threshold exceeded
        if #cap >= #caplim && =increased(cap, #caplim, 0%)
            // report
            output mail "Filesystem $mount on $fsname is $text(100*#cap,0)% full, $text(#avail) Kb left"
            =dutop(10, $mount)
            // possibly take action
            if $execcmd ne ""
                =exec $execcmd
            endif
            // possibly page
#ifdef page
#  if missioncritical
            =hourly(=page($pagemsg, =pagesysadmins, =allhours(#now())), )
#    if db
            =hourly(=page($pagemsg, =pagedbadmins, =allhours(#now())), )
#    endif
#  else
            =every_four_hours(=page($pagemsg, =pagesysadmins, ! =offhours(#now())), )
#  endif
#endifdef
        endif
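The script's threshold lookup and reporting decision can be sketched in Python. This is an illustration under stated assumptions, not PIKT's implementation. Each DiskCaps.obj line is assumed to hold a key (a mount point or DEFAULT) followed by six colon-separated values. The 1st value is assumed to be the emergency limit as a fraction, and the 6th an optional command. The =increased macro is modelled, hypothetically, as "newly crossed the limit, or grew since the previous check". All function names and sample entries are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def read_vals(lines: List[str], sep: str = ":", nvals: int = 6) -> Dict[str, List[str]]:
    """Map each key (first field) to its next nvals fields, loosely like =readvals."""
    table: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        vals = fields[1:nvals + 1]
        vals += [""] * (nvals - len(vals))  # pad missing trailing fields
        table[fields[0]] = vals
    return table


def cap_limit(table: Dict[str, List[str]], mount: str) -> Tuple[float, str]:
    """Return (caplim, execcmd) for mount, falling back to the DEFAULT entry.

    Value 1 (index 0) is the emergency limit; value 6 (index 5) is the command.
    """
    vals = table.get(mount, table["DEFAULT"])
    return float(vals[0]), vals[5]


def increased(cap: float, prev_cap: Optional[float], caplim: float,
              min_increase: float = 0.0) -> bool:
    """Hypothetical model of =increased(cap, #caplim, 0%)."""
    if prev_cap is None or prev_cap < caplim:
        return True  # first reading, or the limit was just crossed
    return cap - prev_cap > min_increase  # still over the limit and still growing


def should_report(cap: float, prev_cap: Optional[float], caplim: float) -> bool:
    """Report only when at or over the limit and the situation has worsened."""
    return cap >= caplim and increased(cap, prev_cap, caplim)


# Hypothetical sample entries: a 100% default and a tighter limit with a cleanup command.
DISKCAPS = [
    "DEFAULT:1.00:::::",
    "/tmp:0.95:::::/usr/local/bin/cleantmp",
]
```

With these entries, `cap_limit(read_vals(DISKCAPS), "/ckp")` falls back to the DEFAULT limit of 1.0 with no command. A file system that stays at 100% triggers one report rather than one on every run, which appears to be the purpose of the =increased condition in the script.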
Here is an alternative way to check for full disks using the script macro technique.
This is just one example program. You could add rules, or write new scripts, to (for example) report hardware failures, report cross-mounted network disks going off-line, report problems with the RAID setup, clear /tmp files, log I/O stats, and so on.
For more examples, see Samples.