Dead System Processes
[posted 2000/03/09]
Just another quick example emphasizing the usefulness of the =remind() macro:
On one of our systems, we have a couple of newly dead system processes that the SysProcChkUrgent alarm dutifully alerts us to every hour:
    PIKT ALERT
    Thu Mar  9 10:24:39 2000
    moscow

    URGENT:
        SysProcChkUrgent
            Report or restart 'dead' crucial system processes

            The process 'sac' is not running

            The process 'ttymon' is not running
For reasons I won't go into here, we don't want to restart those dead system processes on that machine, at least until next week's regularly scheduled reboot. (We reboot our central machines weekly. Don't ask.)
At the same time, we don't want to be bothered with hourly messages about sac and ttymon not running!
I have made a temporary modification to our SysProcChkUrgent as follows:
    SysProcChkUrgent

            init
                    status active
                    level urgent
                    task "Report or restart 'dead' crucial system processes"
                    input file "=sysprocs_obj"
                    seps ":"
                    dat $proc [1]
                    dat $cmd [2]

    #if moscow
            begin
                    =remind(2000, 3, 15, "In SysProcChkUrgent, remove the temporary sac/ttymon filter.")

            rule
                    if $proc =~ "sac|ttymon"
                            next
                    endif
    #endif

            rule
                    if #pid($proc) == #nil()
                            if $cmd eq "."
                                    output mail "The process '$proc' is not running"
                            else
                                    =execwait $cmd
                            endif
                    endif
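The change is the addition of the '#if moscow ... #endif' section. On moscow only, the begin section calls =remind() with the date March 15, 2000, and the added rule skips any input line whose process name matches sac or ttymon, so the main rule never sees those two. On every other machine, the alarm is unchanged.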
The result? No more bothersome alerts about those two processes not running. And, next Wednesday, we receive reminder mail to remove the temporary filter.
We find ourselves registering little exceptions like this one often. Before, we had no handy way of scheduling the lifting of those exceptions, and some exceptions were forgotten, remaining in the configuration long after they should have been retired. =remind() solves this problem for us.
On a somewhat related topic, one of PIKT's under-appreciated features is #include files. You may use #include files *anywhere* within PIKT's configuration. And you may write scripts, either Pikt scripts or scripts written in another scripting language, to dynamically rewrite those #include files. (A natural and simple example of this is to write a script to automatically update <systems/downsys_systems.cfg> on a regular basis.)
I haven't thought about it too much, but in theory it should be possible to have dynamic #include files remove exceptions automatically, without our having to hand-edit the configuration ourselves. In other words, when registering an exception, you add a little extra code, and the exception just auto-magically dies (and removes itself from the configuration) at the appropriate time.
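To make the idea concrete, here is a minimal sketch in Python (one of the "other scripting languages") of a script that regenerates an #include file holding the sac/ttymon filter. It is only an illustration, not something we run. The file name sysprocs_exceptions.cfg and the function names are made up. The sketch also assumes that PIKT is happy with an empty #include file, which I have not checked. While the expiry date lies in the future, the script writes the filter rule; from that date on, it writes an empty file.

```python
import os
import tempfile
from datetime import date
from pathlib import Path

# The temporary filter rule, copied from the alarm shown earlier.
FILTER_STANZA = '''\
        rule
                if $proc =~ "sac|ttymon"
                        next
                endif
'''


def render_exception(stanza: str, expires: date, today: date) -> str:
    """Return the stanza while the exception is live, else an empty string."""
    return stanza if today < expires else ""


def write_include(path: Path, text: str) -> bool:
    """Rewrite the include file atomically; return True only if it changed."""
    if path.exists() and path.read_text() == text:
        return False
    # Write to a temp file in the same directory, then rename over the target,
    # so a reader never sees a half-written include file.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent))
    with os.fdopen(fd, "w") as fh:
        fh.write(text)
    os.replace(tmp, path)
    return True


def refresh(path: Path, today: date) -> bool:
    """Regenerate the (hypothetical) exceptions include file for 'today'."""
    text = render_exception(FILTER_STANZA, date(2000, 3, 15), today)
    return write_include(path, text)
```

Run daily (from cron, say) as refresh(Path("sysprocs_exceptions.cfg"), date.today()), this would leave the filter in place through March 14 and empty the file from March 15 on. The alarm would #include that file in place of the hand-written rule. The return value tells you whether the file changed, so that the configuration can be reinstalled only when it needs to be.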
Depending on how far you wanted to extend this, you could script PIKT to dynamically rewrite much of its own configuration. Think of the possibilities!
For more examples, see Developer's Notes.