Debug Alert Timing
[posted 1999/12/08]
A few weeks ago, I think, I advised running the Debug alert via cron as well. The reason is that if PIKT on your piktmaster is down for whatever reason (in my case, because I brought it down deliberately for testing and then forgot to start it back up), your piktd-invoked Debug alert will fail to run. It's vital to keep this PIKT Debug monitoring script up and running.
But what if your Debug alert hangs, so that both your piktd-invoked and cron-invoked Debug alerts stop running (because the Debug.alt.lock file blocks all subsequent runs)?
Here is my new setup in alerts.cfg:
Critical        /// things that should be dealt with before too long,
                /// preferably by day's end; (things reported here
                /// may not be especially "critical" but are so
                /// designated to conform with syslog's log levels)
#ifndef generic
#  if misscritsys
        timing          30 0-22/2 * * *
#  else
        timing          30 8-22/2 * * *
#  endif
#elsedef
        timing          30 8-22/2 * * *
#endifdef
        mailcmd         "=mailx -s 'PIKT Alert on =pikthostname: Critical' =piktcritical"
        alarms
#if piktmaster
                AlertChkCritical
#endif
                ...

Debug           /// for PIKT self-monitoring; these deserve
                /// fairly close attention
        timing          55 6-22/4 * * *
        nicecmd         "=nice -10"
        mailcmd         "=mailx -s 'PIKT Alert on =pikthostname: Debug' =piktdebug"
        alarms
#if piktmaster
                PiktHeartbeatDebug
                // it's important to run AlertChkCritical
                // independently in two separate alerts, Critical
                // and (our choice) Debug; if you just run it in one
                // alert, and that alert hangs, then you miss this
                // vital alarm; we recommend, too, that you run the
                // Debug alert via cron; in addition to the above
                // schedule, where 'pikt +A Debug' is invoked by
                // piktd, we also have cron invoke Debug (from our
                // root crontab):
                //
                //   55 8,12,16,20 * * * /usr/bin/nice -10
                //                       /pikt/bin/pikt +M
                //                       "/usr/bin/mailx
                //                       -s 'PIKT Alert on vienna:
                //                       Debug' pikt-debug"
                //                       +A Debug
                //
                // so, we run AlertChkCritical independently under
                // three different schedules:
                //
                //   30 8-22/2 * * *      [in the Critical alert,
                //                         invoked by piktd]
                //   55 6-22/4 * * *      [in the Debug alert,
                //                         invoked by piktd]
                //   55 8,12,16,20 * * *  [in the Debug alert,
                //                         invoked by cron]
                AlertChkCritical
#endif
                ...
(Note that we can't use "55 8-20/4" in our crontab, because our system cron doesn't support the x-y/z notation.)
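If your cron has the same limitation, the explicit hour list can be generated rather than worked out by hand. The following is only a minimal sketch in Python, not part of PIKT; the function name expand_step_range is made up for illustration, and it handles only a single "x-y/z", "x-y", or "x" field (no "*" and no comma lists).

```python
def expand_step_range(field: str) -> str:
    """Expand one cron field such as '8-20/4' into an explicit comma list.

    Hypothetical helper: supports only 'x', 'x-y', and 'x-y/z'.
    """
    span, _, step = field.partition("/")      # '8-20', '/', '4'
    start, _, end = span.partition("-")       # '8', '-', '20'
    low = int(start)
    high = int(end) if end else low           # a bare 'x' is a one-value range
    stride = int(step) if step else 1         # no '/z' means every value
    return ",".join(str(v) for v in range(low, high + 1, stride))


# The hour field we would have liked to write in the root crontab.
print(expand_step_range("8-20/4"))  # 8,12,16,20
```

The result is the hour list used in the root crontab entry quoted in the Debug comments above.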
I had to adjust my Debug alert timing, because at three different times of the day there was the possibility of a Critical AlertChkCritical impinging on a Debug AlertChkCritical (i.e., the two running just a few minutes apart). With the above timings, I have pretty good spacing between AlertChkCritical runs: within a day, no two of them start less than 25 minutes apart.
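The spacing can be checked mechanically. Here is a rough sketch in Python (again, not part of PIKT) that lists the start times of the three AlertChkCritical schedules for an ordinary, non-misscritsys host and reports the smallest gap between consecutive runs. The names run_minutes, SCHEDULES, and min_gap are invented for this example, and the overnight gap from the last run to the next morning's first run is ignored.

```python
from itertools import chain


def run_minutes(minute, hours):
    """Return start times as minutes since midnight."""
    return [hour * 60 + minute for hour in hours]


# The three schedules from alerts.cfg and the root crontab (illustrative names).
SCHEDULES = {
    "Critical (piktd)": run_minutes(30, range(8, 23, 2)),   # 30 8-22/2 * * *
    "Debug (piktd)": run_minutes(55, range(6, 23, 4)),      # 55 6-22/4 * * *
    "Debug (cron)": run_minutes(55, [8, 12, 16, 20]),       # 55 8,12,16,20 * * *
}


def min_gap(schedules):
    """Smallest gap in minutes between consecutive runs within one day."""
    times = sorted(chain.from_iterable(schedules.values()))
    return min(later - earlier for earlier, later in zip(times, times[1:]))


print(min_gap(SCHEDULES))  # 25
```

The 25-minute minimum is the gap from each Critical run at :30 to the Debug run at :55 of the same hour; from a Debug run to the next Critical run the gap is 95 minutes.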
For more examples, see Developer's Notes.