Process Count Limits
In this example, we report perilously high per-user process counts.
The PerUserProcessCounts script might send an alert message like the following:
PIKT ALERT
Fri Oct 26 14:26:01 2002
moscow

EMERGENCY:
    PerUserProcessCounts
        Report unusually high counts of per-user processes.

        683 root /usr/lib/sendmail
        killed all root /usr/lib/sendmail processes
        317 nobody /opt/local/bin/python
        killed all nobody /opt/local/bin/python processes
The PerUserProcessCounts script reads its process count limits from the PerUserProcessCounts.obj file. Each limit is keyed by a process-name pattern and an action level (1 = log, 2 = alert, 3 = page, 4 = kill). The script follows.
#if solaris

PerUserProcessCounts

    init
        status =piktstatus
        level =piktlevel
        task "Report unusually high counts of per-user processes."
        // note: a defunct process might show an empty comm field
        // below, so we pipe the ps output through the awk filter, too
        input proc "=ps -eo user,comm | =behead(1) | =awk 'NF==2' | =sort | =uniq -c"
    dat #count 1
    dat $user 2
    dat $proc 3
    begin
        // field separator below is "<space><tab>"
        =readvals(=objdir/PerUserProcessCounts.obj, proccounts, " ", 4)
#ifdef debug
    rule
        =output_alarm_log($inlin)
#endifdef
    // it would be more efficient to put everything following in a single
    // foreach loop, but by breaking things out into separate foreach
    // loops, we can group things into four separate rules--to log, alert,
    // page, and kill
    rule    // log, for gathering diagnostic stats
        foreach #keys($pf, $proccounts)
            do #split($pf)
            if $2 eq "1"
                if $proc =~~ "$1"    // '=~~', not 'eq',
                                     // so that '\\*' works
                                     // as a default
                    if #val($proccounts[$pf]) && #count >= #val($proccounts[$pf])
                        =output_alarm_log($inlin)
                    fi
                    break    // move on to next rule
                fi
            fi
        endforeach
    rule    // alert
        foreach #keys($pf, $proccounts)
            do #split($pf)
            if $2 eq "2"
                if $proc =~~ "$1"
                    if #val($proccounts[$pf]) && #count >= #val($proccounts[$pf])
                        output mail $inlin
                    fi
                    break    // move on to next rule
                fi
            fi
        endforeach
#ifdef page
    rule    // page
        foreach #keys($pf, $proccounts)
            do #split($pf)
            if $2 eq "3"
                if $proc =~~ "$1"
                    if #val($proccounts[$pf]) && #count >= #val($proccounts[$pf])
                        =page(=pikthostname: $inlin, =pagesysadmins, ! =offhours(#now()))
                        pause 5
                    fi
                    break    // move on to next rule
                fi
            fi
        endforeach
#endifdef // page
    rule    // kill
        foreach #keys($pf, $proccounts)
            do #split($pf)
            if $2 eq "4"
                if $proc =~~ "$1"
                    if #val($proccounts[$pf]) && #count >= #val($proccounts[$pf])
                        =kill_user_proc($proc, $user, #true())
                    fi
                    break    // move on to next rule
                fi
            fi
        endforeach

#endif // solaris
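For readers less familiar with PIKT, the core logic of PerUserProcessCounts can be approximated in Python. This is a hypothetical port, not part of PIKT: the `LIMITS` table layout and its sample thresholds, and the names `count_user_procs` and `triggered_actions`, are illustrative assumptions (the real PerUserProcessCounts.obj format is not shown here). PIKT's `=~~` operator is approximated by a case-insensitive `re.search`, which may not match its exact semantics. The sketch parses `ps -eo user,comm` text rather than running `ps` itself.

```python
import re
from collections import Counter

# Hypothetical limits table, checked in order: (process_pattern, level, threshold).
# Levels mirror the script's four rules: 1=log, 2=alert, 3=page, 4=kill.
# ASSUMPTION: this layout and these numbers are illustrative only.
LIMITS = [
    (r"sendmail", 1, 100),
    (r"sendmail", 2, 300),
    (r"sendmail", 4, 600),
    (r".*", 1, 200),  # catch-all default for level 1
    (r".*", 2, 500),  # catch-all default for level 2
]

LEVEL_NAMES = {1: "log", 2: "alert", 3: "page", 4: "kill"}


def count_user_procs(ps_output: str) -> Counter:
    """Count processes per (user, command), like the script's input pipeline:
    ps -eo user,comm | behead(1) | awk 'NF==2' | sort | uniq -c
    """
    counts: Counter = Counter()
    for line in ps_output.splitlines()[1:]:  # behead: drop the ps header line
        fields = line.split()
        if len(fields) != 2:  # e.g. a defunct process with an empty comm field
            continue
        counts[(fields[0], fields[1])] += 1
    return counts


def triggered_actions(proc: str, count: int, limits=LIMITS) -> list:
    """Return the action names whose threshold `count` meets or exceeds.

    For each level, only the first pattern matching `proc` is consulted,
    mimicking the 'break' inside each foreach loop of the script. A zero
    (falsy) threshold disables that entry, as '#val(...) &&' does.
    """
    actions = []
    for level in sorted(LEVEL_NAMES):
        for pattern, lvl, limit in limits:
            if lvl != level or not re.search(pattern, proc, re.IGNORECASE):
                continue
            if limit and count >= limit:
                actions.append(LEVEL_NAMES[level])
            break  # first matching pattern wins for this level
    return actions


if __name__ == "__main__":
    sample = (
        "USER COMMAND\n"
        "root /usr/lib/sendmail\n"
        "root /usr/lib/sendmail\n"
        "nobody /opt/local/bin/python\n"
    )
    for (user, proc), n in sorted(count_user_procs(sample).items()):
        print(n, user, proc, triggered_actions(proc, n))
```

As in the script, processes are counted per user and command, but thresholds are matched on the command alone. Keeping the limits in an ordered list makes the first-match-per-level behavior explicit; a catch-all `.*` entry placed after the specific patterns plays a role similar to the script's `'\\*'` default.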
This is just one example. You could add rules, or write new scripts, to: report and possibly kill runaway processes; report and possibly kill forbidden processes; report extremely high numbers of zombie and defunct processes; log special process accounting data; and so on.
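As one illustration of the zombie-count idea, here is a hypothetical Python sketch (not PIKT code). It assumes process-state text such as that produced by `ps -eo s,comm` (a header line, then a state letter and a command per line), treats a process as a zombie if its state begins with `Z` or its command reads `<defunct>`, and uses an arbitrary example threshold; `count_zombies`, `zombie_report`, and `ZOMBIE_LIMIT` are illustrative names only.

```python
from typing import Optional

# ASSUMPTION: arbitrary example threshold; tune per host.
ZOMBIE_LIMIT = 50


def count_zombies(ps_output: str) -> int:
    """Count zombie processes in 'ps -eo s,comm'-style text (header first)."""
    zombies = 0
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if not fields:
            continue
        state = fields[0]
        comm = fields[1] if len(fields) > 1 else ""
        # A zombie shows state 'Z'; many systems also print '<defunct>' as its command.
        if state.startswith("Z") or comm == "<defunct>":
            zombies += 1
    return zombies


def zombie_report(ps_output: str, limit: int = ZOMBIE_LIMIT) -> Optional[str]:
    """Return an alert line if the zombie count reaches `limit`, else None."""
    n = count_zombies(ps_output)
    if n >= limit:
        return f"unusually high count of zombie processes: {n}"
    return None
```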
For more examples, see Samples.