Full Disks
In this example, we report when file systems are full.
The DiskCap script might send an alert message like the following:
                              PIKT ALERT
                       Wed Oct 3 04:30:06 2002
                               athens4

EMERGENCY:
    DiskCap
        Report emergency full disk situations

        Filesystem /ckp on /dev/md/dsk/d10 is 100% full, 0 Kb left

          17370930 /ckp/ingres
                 8 /ckp/lost+found
The DiskCap script reads its capacity limits, and an optional command to run, from the DiskCaps.obj file. The script follows.
DiskCap     // report, possibly take action, and possibly page if
            // capacity threshold exceeded (typically 100%--i.e.,
            // the file system is full)

    init
        status =piktstatus
        level =piktlevel
        task "Report emergency full disk situations"
        =dflinput
        =dffilter
        =dfdata
        keys $fsname

    begin
#ifdef page
        set $pagemsg = "$hostname() $mount is full"
#endifdef
        =readvals(=objdir/DiskCaps.obj, diskcapval, ":", 6)

    rule    // every $mount has its own #hr value
        set #hr = #hour()

    rule
        // the 1st val (2nd field) in DiskCaps.obj is the emergency field
        if $diskcapval["$mount 1"] ne $err()
            set #caplim = #val($diskcapval["$mount 1"])
            set $execcmd = $diskcapval["$mount 6"]
        else
            set #caplim = #val($diskcapval["DEFAULT 1"])
            set $execcmd = $diskcapval["DEFAULT 6"]
        endif

#ifdef debug
    rule
        output "$mount $text(#hr) $text(#cap*100)%
                $text(#caplim*100)% $execcmd"
#endifdef

    rule    // report, possibly take action, and possibly page if
            // capacity threshold exceeded
        if #cap >= #caplim
        && =increased(cap, #caplim, 0%)
            // report
            output mail "Filesystem $mount on $fsname is
                         $text(100*#cap,0)% full,
                         $text(#avail) Kb left"
            =dutop(10, $mount)
            // possibly take action
            if $execcmd ne ""
                =exec $execcmd
            endif
            // possibly page
#ifdef page
# if missioncritical
            =hourly(=page($pagemsg, =pagesysadmins, =allhours(#now())), )
# if db
            =hourly(=page($pagemsg, =pagedbadmins, =allhours(#now())), )
# endif
# else
            =every_four_hours(=page($pagemsg, =pagesysadmins,
                                    ! =offhours(#now())), )
# endif
#endifdef
        endif
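
The threshold lookup in the second rule can be easier to follow outside PIKT syntax. The following Python sketch is an illustration only, not part of PIKT. It assumes that each DiskCaps.obj line holds a key (a mount point or DEFAULT) followed by six colon-separated values, as the =readvals call and the script comment suggest, with the first value being the emergency capacity limit and the sixth an optional command. The sample rows, the percent notation, the cleanup command path, and all function names are hypothetical.

```python
# Hypothetical DiskCaps.obj contents: key, then six ":"-separated values.
# Value 1 is the emergency limit; value 6 is an optional command to run.
SAMPLE = """\
DEFAULT:100%:98%:95%:90%:85%:
/ckp:99%:97%:95%:90%:85%:/usr/local/bin/cleanup_ckp
"""


def parse_diskcaps(text, sep=":", nvals=6):
    """Map each key (mount point or DEFAULT) to a list of nvals strings."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at most nvals times so the last value may itself contain sep.
        fields = line.split(sep, nvals)
        vals = fields[1:]
        vals += [""] * (nvals - len(vals))  # pad missing trailing values
        table[fields[0]] = vals
    return table


def to_fraction(value):
    """Convert '98%' to 0.98; a plain number is returned as a float."""
    value = value.strip()
    if value.endswith("%"):
        return float(value[:-1]) / 100.0
    return float(value)


def emergency_limit(table, mount):
    """Return (capacity limit, command) for mount, else the DEFAULT row."""
    vals = table.get(mount, table["DEFAULT"])
    return to_fraction(vals[0]), vals[5]


def is_over_limit(cap, caplim):
    """Mirror the script's '#cap >= #caplim' test on fractional capacities."""
    return cap >= caplim


if __name__ == "__main__":
    table = parse_diskcaps(SAMPLE)
    for mount, cap in [("/ckp", 1.0), ("/home", 0.97)]:
        caplim, execcmd = emergency_limit(table, mount)
        print(mount, caplim, repr(execcmd), is_over_limit(cap, caplim))
```

A mount point with no row of its own falls back to the DEFAULT row, which matches the if/else branch in the rule. The sketch does not model the =increased check or the paging macros.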
Full disks can alternatively be checked using the script macro technique.
This is just one example program. You could add rules, or write new scripts, to report hardware failures, report network cross-mounted disks going offline, report problems with the RAID setup, clear /tmp files, log I/O statistics, and so on.
For more examples, see Samples.