Multiple Alert Timings
[posted 2002/02/27]
In pikt-1.16.0pre2, we have implemented a very useful new multiple alert timings option. This lets you set, say, different alert schedules for normal in-week business hours, in-week night-time hours, and weekend hours. For example:
Critical

        timing          15,45 8-16 * * 1-5 5    // run every 1/2 hour
                                                // (with 5-minute drift)
                                                // from 8 AM to before 5 PM
                                                // on Monday thru Friday
                        15 0-7,17-23 * * 1-5 5  // run once hourly
                                                // (with 5-minute drift)
                                                // before 8 AM and after 5 PM
                                                // on Monday thru Friday
                        15 0-18/6 * * 0,6 5     // run only every six hours
                                                // (with 5-minute drift)
                                                // on Saturday and Sunday
With these settings, something like the following lines would appear in piktd.conf:
15,45 8-16 * * 1-5 5 /pikt/bin/pikt +M "/usr/bin/mailx -s 'PIKT Alert on vienna: Critical' brahms\" +A Critical
15 0-7,17-23 * * 1-5 5 /pikt/bin/pikt +M "/usr/bin/mailx -s 'PIKT Alert on vienna: Critical' brahms\" +A Critical
15 0-18/6 * * 0,6 5 /pikt/bin/pikt +M "/usr/bin/mailx -s 'PIKT Alert on vienna: Critical' brahms\" +A Critical
The idea here is to cut down on the volume of mail when nobody is usually around to respond to it. (Of course, this might cut down on the frequency of non-mail actions--such as auto-fixing what's broken--during evenings and weekends. So one must use this new feature carefully.)
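To get a rough feel for how much the example Critical settings thin out the runs, here is a small Python sketch (not part of PIKT) that counts scheduled runs per day. It assumes the first five fields follow the usual cron list/range/step conventions, ignores the drift field, and ignores the dom and moy fields (all '*' in the example). The function names are made up for illustration.

```python
def expand_field(field, lo, hi):
    """Expand a cron-style field ('15,45', '0-18/6', '*') into sorted values."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return sorted(values)


def runs_on_weekday(specs, dow):
    """Count scheduled runs on one day of the week (0 = Sunday).

    Assumption: dom and moy are '*', and the specs do not overlap
    (overlapping specs would be counted twice).
    """
    total = 0
    for spec in specs:
        mins, hrs, _dom, _moy, dows, _drift = spec.split()
        if dow in expand_field(dows, 0, 6):
            total += len(expand_field(mins, 0, 59)) * len(expand_field(hrs, 0, 23))
    return total


# The three Critical timing specs from the example above.
CRITICAL = [
    "15,45 8-16 * * 1-5 5",
    "15 0-7,17-23 * * 1-5 5",
    "15 0-18/6 * * 0,6 5",
]

print(runs_on_weekday(CRITICAL, 3))  # Wednesday: 33 runs
print(runs_on_weekday(CRITICAL, 6))  # Saturday: 4 runs
```

Under those assumptions, a weekday gets 18 half-hourly daytime runs plus 15 hourly off-hours runs, while a weekend day gets only 4.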
Recently, we have grown increasingly aggravated with the high volume of PIKT mail overnight and on weekends. We have long seen the need for multiple timing specs. With pikt-1.16.0pre2, we have finally implemented it.
With the appropriate #if and #ifdef directives, you can customize these multiple timings on a per-machine and per-define basis to your heart's content. (For example, go into less-frequent-alert mode on weekends on your non-mission-critical systems only. The mission-critical systems alert as usual, with no diminishment of frequency, overnight and on weekends.)
All of your old timing settings should work. If you specify multiple timings, they must follow this format:
        timing          <mins> <hrs> <dom> <moy> <dow> <drift>
                        <mins> <hrs> <dom> <moy> <dow> <drift>
                        <mins> <hrs> <dom> <moy> <dow> <drift>
                        ...
up to a maximum of sixteen timing specs. If you use multiple timing specs, all six fields are required; omitting <drift> (or using a separate line 'drift <drift>', permitted with single timing specs) is not an option.
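The format rules above can be sketched as a quick sanity check you might run over a timing setting before installing it. This is only an illustration in Python, not how PIKT itself parses alerts.cfg; the function name and error messages are hypothetical, and the only rules it encodes are the ones stated here (at most sixteen specs, six fields each when there is more than one spec, drift optional for a single spec).

```python
MAX_SPECS = 16  # limit stated in the text


def split_timing_specs(lines):
    """Split the lines of one 'timing' setting into lists of fields.

    Trailing '//' comments are dropped and the leading 'timing' keyword
    is removed.  Raises ValueError if the rules described above are broken.
    """
    specs = []
    for line in lines:
        fields = line.split("//", 1)[0].split()
        if fields and fields[0] == "timing":
            fields = fields[1:]
        if fields:
            specs.append(fields)

    if not specs:
        raise ValueError("no timing spec found")
    if len(specs) > MAX_SPECS:
        raise ValueError(f"{len(specs)} timing specs, maximum is {MAX_SPECS}")
    if len(specs) == 1:
        # A single spec may omit <drift> (five fields) or include it (six).
        if len(specs[0]) not in (5, 6):
            raise ValueError(f"single spec has {len(specs[0])} fields")
    else:
        # With multiple specs, all six fields are required in each one.
        for number, fields in enumerate(specs, start=1):
            if len(fields) != 6:
                raise ValueError(
                    f"spec {number} has {len(fields)} fields, expected 6")
    return specs


URGENT = [
    "timing 20,50 6-18 * * 1-5 5    // mon-fri, day hrs",
    "       20 6-18/2 * * 0,6 5     // sun,sat, day hrs",
    "       20 0,2,4,20,22 * * * 5  // each day, nite hrs",
]

print(len(split_timing_specs(URGENT)))                 # three specs
print(split_timing_specs(["timing 10 8-16/2 * * *"]))  # single spec, no drift
```

A multiple-spec setting with a missing drift field raises ValueError, which is the case most likely to bite when converting old single-timing settings.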
Multiple alert timings is another one of those really good ideas just begging to be implemented. Why would you want to receive the same volume of PIKT alert e-mail during the evening and on weekends when presumably you are either away enjoying yourself or maybe asleep?
We have redone our PIKT alert timings as follows:
///////////////////////////////////////////////////////////////////////////////

EMERGENCY       // things that require immediate attention

#ifndef test
# if hamburg | misscritsys  // & ! moscow
        timing          10,25,40,55 6-18 * * 1-5 5      // mon-fri, day hrs
                        10 6-18 * * 0,6 5               // sun,sat, day hrs
                        10 0-5,19-23 * * * 5            // each day, nite hrs
# else
        timing          10,40 6-18 * * 1-5 5            // mon-fri, day hrs
                        10 6-18/2 * * 0,6 5             // sun,sat, day hrs
                        10 0,2,4,20,22 * * * 5          // each day, nite hrs
# endif
#elsedef
        timing          10 8-16/2 * * *
#endifdef  // test

        ...

///////////////////////////////////////////////////////////////////////////////

Urgent          // things that deserve nearly immediate attention

#ifndef test
# if misscritsys
#   if athens2 | athens4 | moscow | murmanks
        timing          5,20,35,50 6-18 * * 1-5 5       // mon-fri, day hrs
                        20 6-18 * * 0,6 5               // sun,sat, day hrs
                        20 0-5,19-23 * * * 5            // each day, nite hrs
#   else
        timing          20,50 6-18 * * 1-5 5            // mon-fri, day hrs
                        20 6-18/2 * * 0,6 5             // sun,sat, day hrs
                        20 0,2,4,20,22 * * * 5          // each day, nite hrs
#   endif
# else
        timing          20 6-18 * * 1-5 5               // mon-fri, day hrs
                        20 6-18/4 * * 0,6 5             // sun,sat, day hrs
                        20 2,22 * * * 5                 // each day, nite hrs
# endif
#elsedef
        timing          20 8-16/2 * * *
#endifdef  // test

        ...

///////////////////////////////////////////////////////////////////////////////

Critical        // things that should be dealt with before too long,
                // preferably by day's end; (things reported here
                // may not be especially "critical" but are so
                // designated to conform with syslog's log levels)

#ifndef test
# if misscritsys
        timing          30 6-22/2 * * 1-5 5     // mon-fri, day&nite hrs
                        30 6-22/4 * * 0,6 5     // sun,sat, day&nite hrs
# else
        timing          30 6-18/2 * * 1-5 5     // mon-fri, day&nite hrs
                                                // sun,sat, no hrs (no run)
# endif
#elsedef
        timing          30 8-16/2 * * *
#endifdef  // test

        ...

///////////////////////////////////////////////////////////////////////////////

Warning         // things that need attention, if not today, then
                // eventually; after looking at warning alerts, we
                // often just delete them at the end of the day,
                // clearing the deck for the next day's warnings

#ifndef test
# if cssys
#   if moscow
        timing          40 3 * * * 10           // each day
                                                // give extra time for BakMail
                                                // to finish
#   else
        timing          40 2 * * 1-5 10         // mon-fri
                                                // sun,sat, no run
#   endif
# else
        timing          40 2 * * 1,4 10         // mon,thu
                                                // sun,tue-wed,fri-sat, no run
# endif
#elsedef
        timing          40 8-16/2 * * *
#endifdef  // test

        ...

///////////////////////////////////////////////////////////////////////////////

Debug           // for PIKT self-monitoring; these deserve
                // fairly close attention, especially on the
                // piktmaster, where we not only run more often,
                // we also cron it

#ifndef test
# if piktmaster
        // crond runs Debug at alternating intervals like so:
        //   55 1,3,5,7,9,11,13,15,17,19,21,23 * * *
        //     /usr/bin/nice -10
        //     /pikt/bin/pikt +M "/usr/bin/mailx -s
        //     'PIKT Alert on vienna: Debug'
        //     berto\" +A Debug
        timing          55 0-22/2 * * * 5       // each day, day&nite hrs
# else
        timing          55 2-22/4 * * 1-5 5     // mon-fri, day&nite hrs
                        55 0-18/6 * * 0,6 5     // sun,sat, day&nite hrs
# endif
#elsedef
        timing          55 8-16/2 * * *
#endifdef  // test

        ...

///////////////////////////////////////////////////////////////////////////////
Because, during evenings and weekends, we are now running some alerts much less frequently, we were careful to make adjustments in logs_pikt_objects.cfg:
///////////////////////////////////////////////////////////////////////////////
//
// logs_pikt_objects.cfg
//
///////////////////////////////////////////////////////////////////////////////

LogsPikt

// these fields must be tab-delimited!  from this point onward, the only
// spaces in this file should be in preprocessor directive lines (e.g.,
// '#if piktmaster') or comment lines, or in preproc or postproc fields

// all numeric fields must be actual numbers, not e.g., 2*60*60 or 2*\=KB,
// but rather 7200 or 2048

// when both days and secs are 0, this is the signal for PIKT to skip the
// fileage (and existence) test

// for the binary field, binary files are 1, text files are 0

// maxsize is in KB; a negative number for the maxsize says not to truncate,
// to report only

// preproc is the command(s), if any, to do pre-truncation (e.g., to stop a
// daemon)

// postproc is the command(s), if any, to do post-truncation (e.g., to
// restart a daemon)

// macroname	filepath	days	secs	binary	maxsize	preproc	postproc

...

#if hamburg | misscritsys  // & ! moscow
	emergency_log	\=logdir/EMERGENCY.log	0	7200	0	10000	.	.
#else
	emergency_log	\=logdir/EMERGENCY.log	0	14400	0	10000	.	.
#endif

#if misscritsys
	urgent_log	\=logdir/Urgent.log	0	7200	0	10000	.	.
#else
	urgent_log	\=logdir/Urgent.log	0	39600	0	10000	.	.
#endif

#if misscritsys
	critical_log	\=logdir/Critical.log	0	39600	0	10000	.	.
#else
	critical_log	\=logdir/Critical.log	2	0	0	10000	.	.
#endif

...

///////////////////////////////////////////////////////////////////////////////
We upped the secs field figures, and in some cases zeroed secs and set the days instead (as with critical_log for non-mission-critical systems).
Without these adjustments, we would receive a flurry of messages about various PIKT log files being out-of-date.
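One way to pick a safe secs figure is to work out the longest stretch between scheduled runs over a whole week and make sure the out-of-date threshold comfortably exceeds it. The Python sketch below does that arithmetic for the EMERGENCY timings used on the non-mission-critical systems. It is not part of PIKT: it assumes cron-style list/range/step fields, assumes dom and moy are '*', and ignores drift, so treat the result as a lower bound on the real gap. The function names are hypothetical.

```python
from itertools import product


def expand_field(field, lo, hi):
    """Expand a cron-style field ('10,40', '6-18/2', '*') into sorted values."""
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return sorted(values)


def weekly_run_minutes(specs):
    """Minute-of-week offsets (0 = Sunday 00:00) at which any spec fires."""
    fires = set()
    for spec in specs:
        mins, hrs, _dom, _moy, dows = spec.split()[:5]
        for d, h, m in product(expand_field(dows, 0, 6),
                               expand_field(hrs, 0, 23),
                               expand_field(mins, 0, 59)):
            fires.add((d * 24 + h) * 60 + m)
    return sorted(fires)


def max_gap_seconds(specs):
    """Longest interval between consecutive runs, wrapping around the week."""
    runs = weekly_run_minutes(specs)
    week = 7 * 24 * 60
    gaps = [later - earlier for earlier, later in zip(runs, runs[1:])]
    gaps.append(runs[0] + week - runs[-1])  # Saturday night into Sunday
    return max(gaps) * 60


# EMERGENCY timings from the '# else' (non-mission-critical) branch.
EMERGENCY_OTHER = [
    "10,40 6-18 * * 1-5 5",
    "10 6-18/2 * * 0,6 5",
    "10 0,2,4,20,22 * * * 5",
]

gap = max_gap_seconds(EMERGENCY_OTHER)
print(gap)          # 7200 seconds between runs at worst
print(gap < 14400)  # True: the 14400-second setting leaves headroom
```

With these timings the longest scheduled gap is two hours, so the 14400-second figure for emergency_log on those systems allows a full missed run before the log is reported as stale.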
I suppose that one could fancy this up to have days and secs specified on an in-week, weekend, and overnight basis.
After adjusting the timing specs in alerts.cfg, and the out-of-date specs in logs_pikt_objects.cfg, we applied the new timings, and installed the revised Fileages.obj, with:
# piktc -ierv +A all +O all -H downsys
The '-i' in the piktc command above installs all .obj files, including Fileages.obj. One could instead just specify '+O Fileages'. The '+A all' is necessary, not to reinstall all of the .alt files but rather to go with the '-e' to (e)nable the new alert timings (rewrite piktd.conf).
(By the way, you may be wondering about the test timings above, as in
#ifndef test
        ...
#elsedef
        timing          10 8-16/2 * * *
#endifdef  // test
When we install PIKT on a new system, we install the new configuration in test--verbose, non-doexec, non-page--mode. Every two hours thereafter, we receive chatty PIKT e-mail from that system informing us what PIKT finds fault with and what it proposes to do, without actually doing it. After several hours of this, we shut down PIKT on the new system. Then, based on the accumulated PIKT alert e-mail, we revise the configuration until it's just right. We then remove the system from the newsys group in systems.cfg, and reinstall and restart everything with 'piktc -ierv +A all ... +H <newsys>'. Because we do this last piktc action in non-test mode (removing the system from the newsys group in systems.cfg automatically turns on doexec in defines.cfg--take a look at 1.15.0's configs_samples), the newly revised PIKT configuration then goes into production mode with the normal, non-test alert timings. Follow?)
I am pretty excited about this new feature. It solves a long-standing problem. Now if only there were an acceptable way to reduce the hundreds of other, non-PIKT e-mails I routinely receive every day!
For more examples, see Developer's Notes.