Debug Alert Timing
[posted 1999/12/08]
A few weeks ago I advised running the Debug alert via cron. The reason is that if PIKT on your piktmaster is down for whatever reason (in my case, because I brought it down deliberately for testing and then forgot to start it back up), your piktd-invoked Debug alert will fail to run. It's vital to keep this PIKT Debug monitoring script up and running.
But what if your Debug alert hangs, so that both your piktd-invoked and cron-invoked Debug alerts stop running (because the Debug.alt.lock file blocks all subsequent runs)?
Here is my new setup in alerts.cfg:
Critical        /// things that should be dealt with before too long,
                /// preferably by day's end; (things reported here
                /// may not be especially "critical" but are so
                /// designated to conform with syslog's log levels)
#ifndef generic
# if misscritsys
        timing          30 0-22/2 * * *
# else
        timing          30 8-22/2 * * *
# endif
#elsedef
        timing          30 8-22/2 * * *
#endifdef
        mailcmd         "=mailx -s 'PIKT Alert on =pikthostname: Critical'
                                =piktcritical"
        alarms
#if piktmaster
                AlertChkCritical
#endif
...
Debug           /// for PIKT self-monitoring; these deserve
                /// fairly close attention
        timing          55 6-22/4 * * *
        nicecmd         "=nice -10"
        mailcmd         "=mailx -s 'PIKT Alert on =pikthostname: Debug'
                                =piktdebug"
        alarms
#if piktmaster
                PiktHeartbeatDebug
                // it's important to run AlertChkCritical
                // independently in two separate alerts, Critical
                // and (our choice) Debug; if you just run it in one
                // alert, and that alert hangs, then you miss this
                // vital alarm; we recommend, too, that you run the
                // Debug alert via cron; in addition to the above
                // schedule, where 'pikt +A Debug' is invoked by
                // piktd, we also have cron invoke Debug (from our
                // root crontab):
                //
                //   55 8,12,16,20 * * * /usr/bin/nice -10
                //                       /pikt/bin/pikt +M
                //                       "/usr/bin/mailx
                //                        -s 'PIKT Alert on vienna:
                //                        Debug' pikt-debug"
                //                       +A Debug
                //
                // so, we run AlertChkCritical independently under
                // three different schedules:
                //
                //   30 8-22/2 * * *      [in the Critical alert,
                //                         invoked by piktd]
                //   55 6-22/4 * * *      [in the Debug alert,
                //                         invoked by piktd]
                //   55 8,12,16,20 * * *  [in the Debug alert,
                //                         invoked by cron]
                AlertChkCritical
#endif
...
(Note that we can't use "55 8-20/4" in our crontab, because our system cron doesn't support the x-y/z notation.)
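If your cron has the same limitation, the explicit hour list can be worked out mechanically. The following is only a rough sketch: expand_field is a hypothetical helper of my own naming, not part of PIKT or cron, and it assumes purely numeric fields (no "*" and no month or day names).

```python
def expand_field(field: str) -> str:
    """Expand one numeric cron field, e.g. '8-20/4', into an explicit list.

    Hypothetical helper for illustration only; it does not handle '*'
    or symbolic names, just numbers, ranges (x-y) and steps (x-y/z).
    """
    values = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if "-" in rng:
            lo_text, hi_text = rng.split("-")
            lo, hi = int(lo_text), int(hi_text)
        else:
            lo = hi = int(rng)
        # Cron ranges are inclusive at both ends.
        values.update(range(lo, hi + 1, step))
    return ",".join(str(v) for v in sorted(values))


if __name__ == "__main__":
    # The hour field we could not use in our crontab:
    print(expand_field("8-20/4"))
```

For the hour field "8-20/4" this gives the "8,12,16,20" list used in the crontab entry quoted above.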
I had to adjust my Debug alert timing, because at three different times
of the day there was the possibility of a Critical AlertChkCritical
impinging on a Debug AlertChkCritical (i.e., running just a few minutes
apart). With the above timings, I have pretty good spacing between
AlertChkCritical runs.
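To check the spacing, one can list every AlertChkCritical run in a day and measure the gaps between neighbors. This is just a back-of-the-envelope sketch, assuming the non-misscritsys Critical timing (30 8-22/2) and that each alert starts exactly on schedule; the names schedules, run_times and gaps are mine, chosen for illustration.

```python
# Each schedule is a list of (hour, minute) start times taken from the
# three timings shown in the alerts.cfg excerpt and root crontab.
schedules = {
    "Critical (piktd)": [(h, 30) for h in range(8, 23, 2)],   # 30 8-22/2
    "Debug (piktd)":    [(h, 55) for h in range(6, 23, 4)],   # 55 6-22/4
    "Debug (cron)":     [(h, 55) for h in (8, 12, 16, 20)],   # 55 8,12,16,20
}


def run_times(scheds):
    """Return all runs as sorted (minutes-since-midnight, alert name) pairs."""
    return sorted(
        (hour * 60 + minute, name)
        for name, times in scheds.items()
        for hour, minute in times
    )


def gaps(runs):
    """Return the gap in minutes between each pair of consecutive runs."""
    return [later[0] - earlier[0] for earlier, later in zip(runs, runs[1:])]


runs = run_times(schedules)
smallest = min(gaps(runs))
largest = max(gaps(runs))

if __name__ == "__main__":
    for minutes, name in runs:
        print(f"{minutes // 60:02d}:{minutes % 60:02d}  {name}")
    print(f"smallest gap: {smallest} min, largest gap: {largest} min")
```

Under those assumptions, no two runs land on the same minute, and the closest pair is a Critical run at :30 followed by a Debug run at :55 of the same hour.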
For more examples, see Developer's Notes.