Process Count Limits
This example reports dangerously high per-user process counts and, when a count passes its configured kill limit, kills the offending processes.
The PerUserProcessCounts script might send an alert message like the following:
PIKT ALERT
Fri Oct 26 14:26:01 2002
moscow
EMERGENCY:
PerUserProcessCounts
Report unusually high counts of per-user processes.
683 root /usr/lib/sendmail
killed all root /usr/lib/sendmail processes
317 nobody /opt/local/bin/python
killed all nobody /opt/local/bin/python processes
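The PerUserProcessCounts script reads its process count limits from the PerUserProcessCounts.obj file. Each limit is keyed by a process pattern and an action level, which the script's four rules test in turn: 1 to log, 2 to alert, 3 to page, and 4 to kill. The script follows.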
#if solaris

PerUserProcessCounts

    init
        status =piktstatus
        level =piktlevel
        task "Report unusually high counts of per-user processes."
        // note: a defunct process might show an empty comm field
        // below, so we pipe the ps output through the awk filter, too
        input proc "=ps -eo user,comm | =behead(1) | =awk 'NF==2' |
                    =sort | =uniq -c"
        dat #count 1
        dat $user 2
        dat $proc 3

    begin   // field separator below is "<space><tab>"
        =readvals(=objdir/PerUserProcessCounts.obj, proccounts, " ", 4)

#ifdef debug
    rule
        =output_alarm_log($inlin)
#endifdef

    // it would be more efficient to put everything following in a single
    // foreach loop, but by breaking things out into separate foreach
    // loops, we can group things into four separate rules--to log, alert,
    // page, and kill

    rule    // log, for gathering diagnostic stats
        foreach #keys($pf, $proccounts)
            do #split($pf)
            if $2 eq "1"
                if $proc =~~ "$1"   // '=~~', not 'eq',
                                    // so that '\\*' works
                                    // as a default
                    if    #val($proccounts[$pf])
                       && #count >= #val($proccounts[$pf])
                        =output_alarm_log($inlin)
                    fi
                    break   // move on to next rule
                fi
            fi
        endforeach

    rule    // alert
        foreach #keys($pf, $proccounts)
            do #split($pf)
            if $2 eq "2"
                if $proc =~~ "$1"
                    if    #val($proccounts[$pf])
                       && #count >= #val($proccounts[$pf])
                        output mail $inlin
                    fi
                    break   // move on to next rule
                fi
            fi
        endforeach

#ifdef page
    rule    // page
        foreach #keys($pf, $proccounts)
            do #split($pf)
            if $2 eq "3"
                if $proc =~~ "$1"
                    if    #val($proccounts[$pf])
                       && #count >= #val($proccounts[$pf])
                        =page(=pikthostname: $inlin,
                              =pagesysadmins,
                              ! =offhours(#now()))
                        pause 5
                    fi
                    break   // move on to next rule
                fi
            fi
        endforeach
#endifdef // page

    rule    // kill
        foreach #keys($pf, $proccounts)
            do #split($pf)
            if $2 eq "4"
                if $proc =~~ "$1"
                    if    #val($proccounts[$pf])
                       && #count >= #val($proccounts[$pf])
                        =kill_user_proc($proc, $user,
                                        #true())
                    fi
                    break   // move on to next rule
                fi
            fi
        endforeach

#endif // solaris
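For readers who do not know the PIKT script language, the counting and limit-checking logic above can be approximated in Python. This is only a rough sketch, not part of PIKT. The function names, the LIMITS table, and every threshold in it are hypothetical. It also assumes that limit entries are checked in the order listed and that the pattern match behaves like an unanchored regular expression search, neither of which the script itself confirms.

```python
import re
from collections import Counter


def count_user_procs(ps_output):
    """Count (user, command) pairs in the text output of `ps -eo user,comm`.

    Roughly mirrors the script's input pipeline: drop the header line,
    keep only two-field lines, then count identical lines.
    """
    lines = ps_output.splitlines()[1:]        # drop the ps header line
    fields = (line.split() for line in lines)
    # A defunct process may show an empty comm field, so skip those lines.
    return Counter(tuple(f) for f in fields if len(f) == 2)


# HYPOTHETICAL limits, for illustration only: (pattern, level, threshold).
# Levels follow the script's rules: 1 = log, 2 = alert, 3 = page, 4 = kill.
# A threshold of 0 means "take no action", like the #val() test in the script.
LIMITS = [
    (r"/usr/lib/sendmail", 2, 100),
    (r"/usr/lib/sendmail", 4, 500),
    (r".*", 2, 200),   # default alert limit
    (r".*", 3, 0),     # paging disabled by default
    (r".*", 4, 300),   # default kill limit
]


def actions_for(proc, count, limits=LIMITS):
    """Return the action levels triggered for one process name and count."""
    triggered = []
    for level in (1, 2, 3, 4):                # one pass per rule, as in the script
        for pattern, entry_level, threshold in limits:
            if entry_level != level:
                continue
            if re.search(pattern, proc):
                if threshold and count >= threshold:
                    triggered.append(level)
                break                         # first matching pattern decides
    return triggered
```

With these made-up limits, actions_for("/usr/lib/sendmail", 683) triggers the alert and kill levels, which corresponds to the kind of report shown in the sample alert message. A count of 150 triggers only the alert.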
This is just one example. You could add rules, or write new scripts, to do things such as: report and possibly kill runaway processes; report and possibly kill forbidden processes; report extremely high numbers of zombie and defunct processes; or log special process accounting data.
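As a starting point for one of these ideas, reporting high zombie counts, the per-user tally might look like the following Python sketch. It is not PIKT code. It assumes ps output with the user in the first column and a one-letter process state in the second, as a command like `ps -eo user,s` would typically produce, with Z marking a zombie. Check your system's ps manual page before relying on that. The function name is hypothetical.

```python
from collections import Counter


def count_zombies(ps_output):
    """Count zombie processes per user from `ps -eo user,s` style text."""
    counts = Counter()
    for line in ps_output.splitlines()[1:]:   # skip the header line
        fields = line.split()
        # Assumption: the second column is the process state, "Z" for zombie.
        if len(fields) >= 2 and fields[1].startswith("Z"):
            counts[fields[0]] += 1
    return counts
```

A script could then compare each user's tally against a limit in the same way PerUserProcessCounts compares its counts.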
For more examples, see Samples.