High CPU Usage
In this example, we report high CPU usage, that is, unusually low CPU idle time.
The CPUUsage script might send an alert message like the following:
PIKT ALERT
Tue Apr 24 15:17:12 2007
montreal
URGENT:
CPUUsage
Report unusually high CPU usage
Cpu(s): 48.2% us, 1.3% sy, 0.0% ni, 48.6% id, 1.6% wa, 0.1% hi, 0.2% si
top - 15:17:10 up 18 min, 2 users, load average: 1.50, 1.24, 0.81
Tasks: 97 total, 1 running, 95 sleeping, 0 stopped, 1 zombie
Cpu(s): 48.2% us, 1.3% sy, 0.0% ni, 48.6% id, 1.6% wa, 0.1% hi, 0.2% si
Mem: 1034932k total, 533352k used, 501580k free, 95180k buffers
Swap: 4016168k total, 0k used, 4016168k free, 179540k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7452 boyce 16 0 220m 24m 11m S 100 2.4 13:42.90 java_vm
7415 boyce 15 0 258m 132m 28m S 22 13.1 1:53.86 firefox-bin
6005 root 15 0 177m 45m 7844 S 14 4.5 0:33.87 X
1 root 15 0 1464 512 456 S 0 0.0 0:01.03 init
...
The script follows.
CPUUsage

    // if we ever need to check this on a per-machine (or per-
    // hostgroup) basis, we really should set up a new objects file,
    // CPUUsage.obj, with fields like so:
    //
    // //host //#cpuidlim
    //
    // then read the data in using =readvals() and process in the usual
    // manner

    init
        status =piktstatus
        level =piktlevel
        task "Report unusually high CPU usage"
#if gentoo | suse | database3
        input proc "=cat =hstdir/log/top.CPUUsage 2>/dev/null |
                    =head -n 3 | =tail -n 1"
        dat $ky 1 // invariant key, "Cpu(s):"
        dat #cpuid 8
#elsif redhat
        input proc "=cat =hstdir/log/top.CPUUsage 2>/dev/null |
                    =head -n 5 | =tail -n 1"
        dat $ky 1 // invariant key, "CPU0"
        dat #cpuid $-1
#endif
        keys $ky

    begin
        doexec wait "=top -b -n1 -d1 2>/dev/null > =hstdir/log/top.CPUUsage"
        doexec wait "=psall > =hstdir/log/ps.CPUUsage"
        set #dotopps = #false()
        set #cpuidloglim = 100% // i.e., log always
        if $alert() =~ "EMERGENCY"
            set #cpuidlim = 10%
        elsif $alert() =~ "Urgent"
            set #cpuidlim = 50%
        else // if $alert() =~ "CPUUsage"
            set #cpuidlim = 100%
        endif

#ifdef debug
    rule
        output $inlin
        output "\#cpuid is $text(100*#cpuid,1)%"
        quit
#endifdef

    rule // log high usage
        if $alert() =~ "Urgent"
            if #cpuid <= #cpuidloglim
                =output_alarm_log($inlin)
            endif
        endif

    rule // report unusually high cpu usage
        if #cpuid <= #cpuidlim
#if missioncritical
            =hourly(output mail $inlin set #dotopps = #true(), )
#else
            =every_four_hours(output mail $inlin set #dotopps = #true(), )
#endif
        endif

    end
        if #dotopps
        && $alert() ne "CPUUsage"
            output mail =newline
            =outputfile(mail, "=hstdir/log/top.CPUUsage")
            output mail =newline
            =outputfile(mail, "=hstdir/log/ps.CPUUsage")
        fi
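
The threshold logic in the script above can be traced by hand with a short Python sketch. This is an illustration only: it is not PIKT code, and it is not part of the script. The function names idle_fraction, idle_limit, and should_report are hypothetical. The sketch assumes the gentoo/suse top layout, where the idle percentage is the eighth whitespace-separated field of the "Cpu(s):" line, and it approximates the script's =~ pattern match with a plain substring test. The =hourly and =every_four_hours rate limiting is left out.

```python
def idle_fraction(cpu_line: str) -> float:
    """Return the idle share (0.0 to 1.0) from a top "Cpu(s):" summary line.

    Assumes the layout shown in the sample alert, where the idle
    percentage is the eighth whitespace-separated field.
    """
    fields = cpu_line.split()
    if len(fields) < 8 or fields[0] != "Cpu(s):":
        raise ValueError("not a recognized Cpu(s) summary line")
    return float(fields[7].rstrip("%")) / 100.0


def idle_limit(alert: str) -> float:
    """Idle limit per alert name, mirroring the script's begin section."""
    if "EMERGENCY" in alert:   # substring test stands in for =~
        return 0.10
    if "Urgent" in alert:
        return 0.50
    return 1.00                # any other alert: the limit is always met


def should_report(idle: float, alert: str) -> bool:
    """True when idle time is at or below the limit for this alert."""
    return idle <= idle_limit(alert)


line = "Cpu(s): 48.2% us, 1.3% sy, 0.0% ni, 48.6% id, 1.6% wa, 0.1% hi, 0.2% si"
idle = idle_fraction(line)
print(should_report(idle, "Urgent"))     # True: 48.6% idle is at or below 50%
print(should_report(idle, "EMERGENCY"))  # False: 48.6% idle is above 10%
```

With the sample reading of 48.6% idle, the Urgent alert would report while the EMERGENCY alert would stay silent, which matches the URGENT alert message shown earlier.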
For more examples, see Samples.