High Zombie Counts
In this example, we report an unusually high number of zombie processes.
The HighZombieCount script might send an alert message like the following:
PIKT ALERT
Tue Oct 14 15:09:43 2003
madrid

EMERGENCY:
    HighZombieCount
        Report extremely high number of zombie processes

        Unusually high zombie count (43): Tasks: 153 total, 1 running, 109 sleeping, 0 stopped, 43 zombie

        USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
        root         1  0.0  0.0   620   72 ?        S    Sep30   0:04 init [5]
        root         2  0.0  0.0     0    0 ?        SW   Sep30   0:00 [keventd]
        root         3  0.0  0.0     0    0 ?        SW   Sep30   0:00 [kapmd]
        root         4  0.0  0.0     0    0 ?        SWN  Sep30   0:00 [ksoftirqd_CPU0]
        root         5  0.0  0.0     0    0 ?        SW   Sep30   0:11 [kswapd]
        ...
        root     29800  0.0  0.0     0    0 ?        Z    14:26   0:00 [imapd] <defunct>
        root     29813  0.0  0.0     0    0 ?        Z    14:27   0:00 [imapd] <defunct>
        root     29829  0.0  0.0     0    0 ?        Z    14:27   0:00 [imapd] <defunct>
        ...
The script follows.
HighZombieCount

        init
                status =piktstatus
                level =piktlevel
                task "Report extremely high number of zombie processes"
                input proc "=top -b -d 1 -n 1 | egrep -i '^tasks:'"
                dat "([[:digit:]]+)[[:space:]]zombie"

        rule    // set the zombie count
                set #zombct = #val($1)

#ifdef debug
        rule
                output "\$inlin is $inlin"
                output "\#zombct is $text(#zombct)"
#endifdef

        rule    // for diagnostic purposes
                =output_alarm_log($inlin)

        // if we ever need to add more per-machine (or per-hostgroup) cases
        // than the two below, we really should set up a new objects file,
        // ZombieCounts.obj, with fields like so:
        //
        //   //host  //zombiecount  //zcincr
        //
        // then read the data in using =readvals() and process in the usual
        // manner

        rule    // report unusually high zombie count
                if #zombct >=
# if madrid
                        20 && =increased(zombct, 20, 5)
# else
                        10 && =increased(zombct, 10, 5)
# endif
                        output mail "Unusually high zombie count ($text(#zombct)): $inlin"
                        output mail =newline
                        =outputproc(mail, "=psall")
                fi
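To make the core of the check concrete outside PIKT, here is a minimal Python sketch under stated assumptions. It applies the same pattern as the script's dat line to a top "Tasks:" summary line like the one in the alert above, then compares the count against a per-host threshold (20 on madrid, 10 elsewhere, mirroring the script's host conditional). The names zombie_count, over_threshold, THRESHOLDS and DEFAULT_THRESHOLD are hypothetical, chosen for illustration only. The sketch deliberately does not model PIKT's =increased() macro, whose exact semantics are not described here.

```python
import re

# Same idea as the script's dat pattern: digits, one whitespace char, "zombie".
ZOMBIE_RE = re.compile(r"(\d+)\szombie")

# Hypothetical per-host thresholds mirroring the script's host conditional.
# A larger setup might load these from a file such as the ZombieCounts.obj
# suggested in the script's comments (assumption, not shown in the source).
THRESHOLDS = {"madrid": 20}
DEFAULT_THRESHOLD = 10


def zombie_count(tasks_line):
    """Return the zombie count from a top 'Tasks:' line, or None if absent."""
    match = ZOMBIE_RE.search(tasks_line)
    return int(match.group(1)) if match else None


def over_threshold(host, tasks_line):
    """Return True if the zombie count meets or exceeds the host's threshold.

    Simplification: the script's =increased() condition is NOT modeled here.
    """
    count = zombie_count(tasks_line)
    if count is None:
        return False
    return count >= THRESHOLDS.get(host, DEFAULT_THRESHOLD)


if __name__ == "__main__":
    line = "Tasks: 153 total,   1 running, 109 sleeping,   0 stopped,  43 zombie"
    print(zombie_count(line))              # 43
    print(over_threshold("madrid", line))  # True
```

Keeping the thresholds in a table rather than in conditionals follows the same reasoning as the script's own comment: once there are more than a couple of per-host cases, data is easier to maintain than branches.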
For more examples, see Samples.