alerts.cfg
(NOTE: Some of the techniques shown or described on this page--marked in purple--require new features in the latest official PIKT 1.19.0 release (pikt-current.tar.gz) that are unavailable in any previous version.)
The alerts.cfg file groups together alarm scripts. Use alerts.cfg to specify which alarm scripts to run, where and when to run them, at what priority ("nice" level), and where to send their output (typically e-mail).
For example, here is a specification for a so-called 'Urgent' alert:
    ///////////////////////////////////////////////////////////////////////////////
    //
    // PIKT alerts.cfg -- grouping and scheduling alarm and program scripts
    //
    ///////////////////////////////////////////////////////////////////////////////

    ...

    Urgent  // things that deserve nearly immediate attention

            timing          30 * * * * 1

            nicecmd         "=nice -n 19"

            mailcmd         "=mailx -s 'PIKT Alert on =pikthostname: Urgent' =pikt-urgent"

            lpcmd           "=lp =piktprinter"

            status          active

            level           urgent

            alarms
    #if piktmaster
                    SysDown
    #endif
                    DmesgScan
                    LoadAverage
                    ProcessCounts
                    ZombieCounts
                    CPUUsage

    ...

In alerts.cfg (or any of its #include files), the general stanza format is:
    ALERT-NAME
            timing          MIN HOUR DAY-OF-MONTH MONTH DAY-OF-WEEK [DRIFT]
            drift           DRIFT                           [optional]
            priority|nice   NICE-LEVEL                      [optional]
    [or:]   nicecmd         "NICE-COMMAND"                  [optional]
            mailcmd         "MAIL-COMMAND"                  [optional]
            lpcmd           "PRINT-COMMAND"                 [optional]
            execcmd         "COMMAND"                       [optional]
            status          active|inactive|suspended|
                            testing|debugging               [optional]
            level           emergency|urgent|critical|
                            error|warning|notice|
                            info|debug                      [optional]
            alarms|scripts                                  [optional]
                    ...

(Uppercase words stand for values you supply.) The alert name usually indicates its timing or purpose.
The timing parameters follow the usual cron conventions (see the crontab man page) and then some.
One not so usual timing spec is x-y/z, which says to range from x through y at intervals of z.
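As a rough illustration of the x-y/z spec, the Python sketch below expands such a field into the values it would match. The function name expand_range is hypothetical and is not part of PIKT; the sketch only assumes the "from x through y at intervals of z" reading given above.

```python
def expand_range(spec: str) -> list[int]:
    """Expand an 'x-y/z' (or plain 'x-y') field into the values it matches."""
    range_part, _, step_part = spec.partition("/")
    start_text, _, end_text = range_part.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else start
    step = int(step_part) if step_part else 1
    if step < 1 or end < start:
        raise ValueError(f"invalid range spec: {spec!r}")
    # The range is inclusive of y, hence end + 1.
    return list(range(start, end + 1, step))


print(expand_range("0-18/6"))  # [0, 6, 12, 18]
```

So an hour field of 0-18/6 would match hours 0, 6, 12, and 18.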
Another novel timing spec is random timings. You may specify timing values like so:
    timing          20% * * * *

This says to run the alert randomly, on average every five minutes. (That is, there is a 20% chance of running the alert in any given minute.)
Here is another example:
    timing          30 25% * * *

This says to run the alert randomly, on average every four hours (at half past the hour). (That is, there is a 25% chance of running the alert in any given hour.)
To be more precise, in the first example, the '20%' is like a '*', that is, match every single minute, then apply a 20% probability modifier to the entire timing. Probability modifiers are cumulative. So, in this example,
    timing          20% 25% * * *

in effect, every minute there is a 20% times 25% chance, or just a 5% chance overall, of running the alert (on average three times every hour, because there is a one-in-twenty chance of running each minute, and there are sixty minutes in an hour).

To run an alert at 3:30 in the morning on average once every four days, you could do this

    timing          30 3 25% * *

or this

    timing          30 3 * 25% *

or this

    timing          30 3 * * 25%

or even this

    timing          30 3 50% * 50%

The last four examples would all have the same effect.
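The cumulative-probability arithmetic above can be checked with a short Python sketch. It is only an illustration: run_probability is a hypothetical helper, not a PIKT function, and it assumes that every field ending in '%' contributes one multiplicative modifier, as described above.

```python
from math import prod

MINUTES_PER_HOUR = 60


def run_probability(timing: str) -> float:
    """Multiply together all percentage modifiers found in a timing spec."""
    fields = timing.split()
    return prod(
        (float(f.rstrip("%")) / 100 for f in fields if f.endswith("%")),
        start=1.0,
    )


p = run_probability("20% 25% * * *")
print(round(p, 4))                     # 0.05 chance per minute
print(round(p * MINUTES_PER_HOUR, 2))  # 3.0 expected runs per hour
print(run_probability("30 3 50% * 50%"))  # 0.25, i.e. once every four days on average
```

A spec with no percentage fields yields 1.0, meaning the alert runs every time the remaining fields match.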
The random timing spec is especially useful in security situations, where you want some unpredictability in your monitoring schedules.
Still another novel timing spec is "drift". "drift" is how many minutes an alert launch may randomly occur before or after a specified time.
For example,
    timing          0,30 * * * * 5

or equivalently,

    timing          0,30 * * * *
    drift           5

says to run an alert twice an hour, at the top and at the bottom of the hour, give or take a random 5 minutes. In other words, the alert might run randomly anywhere from, say, 14:55 to 15:05, and from 15:25 to 15:35 (and likewise for every other hour during the day).
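To make the drift window concrete, here is a minimal Python sketch. The names drift_window and pick_launch are hypothetical, the date is arbitrary, and the uniform whole-minute offset is an assumption made for illustration; the text above does not say how PIKT actually chooses the offset.

```python
import random
from datetime import datetime, timedelta


def drift_window(nominal: datetime, drift_minutes: int) -> tuple[datetime, datetime]:
    """Return the earliest and latest launch times around a nominal time."""
    delta = timedelta(minutes=drift_minutes)
    return nominal - delta, nominal + delta


def pick_launch(nominal: datetime, drift_minutes: int, rng: random.Random) -> datetime:
    """Pick one launch time inside the window.

    Assumption: a uniform whole-minute offset; PIKT's own choice may differ.
    """
    offset = rng.randint(-drift_minutes, drift_minutes)
    return nominal + timedelta(minutes=offset)


nominal = datetime(2024, 1, 1, 15, 0)  # arbitrary date, nominal time 15:00
earliest, latest = drift_window(nominal, 5)
print(earliest.strftime("%H:%M"), latest.strftime("%H:%M"))  # 14:55 15:05
```

With a nominal time of 15:30 and the same drift, the window would be 15:25 to 15:35, matching the example above.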
Timing drift may be as large as you want, so long as there is no possibility of alert launches overlapping. So, it is even possible to schedule alerts with drift extending to days or even weeks.
Timing drift is especially useful when you want alerts to run around a certain time (e.g., midnight), but you don't want them all to go off at precisely that time on all systems (to avoid "bunching up").
The multiple alert timing option allows you to set, say, different alert schedules for normal in-week business hours, in-week night-time hours, and weekend hours. For example:

    Critical

            timing          15,45 8-16 * * 1-5 5    // run every 1/2 hour
                                                    // (with 5-minute drift)
                                                    // from 8 AM to before 5 PM
                                                    // on Monday thru Friday
                            15 0-7,17-23 * * 1-5 5  // run once hourly
                                                    // (with 5-minute drift)
                                                    // before 8 AM and
                                                    // after 5 PM
                                                    // on Monday thru Friday
                            15 0-18/6 * * 0,6 5     // run only every six hours
                                                    // (with 5-minute drift)
                                                    // on Saturday and Sunday

With these settings, the following lines would appear in piktd.conf (with line wrap shown here for display purposes):
    15,45 8-16 * * 1-5 5 /pikt/bin/pikt +M "/usr/bin/mailx -s 'PIKT Alert on
        vienna: Critical' brahms\" +A Critical
    15 0-7,17-23 * * 1-5 5 /pikt/bin/pikt +M "/usr/bin/mailx -s 'PIKT Alert on
        vienna: Critical' brahms\" +A Critical
    15 0-18/6 * * 0,6 5 /pikt/bin/pikt +M "/usr/bin/mailx -s 'PIKT Alert on
        vienna: Critical' brahms\" +A Critical

The idea here is to cut down on the volume of mail when nobody is usually around to respond to it. (Of course, this might cut down on the frequency of non-mail actions--such as auto-fixing what's broken--during evenings and weekends. So you must use this feature carefully.)
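To see how much the three Critical timing specs above thin out the schedule, the following Python sketch counts nominal launches per day. It is an illustration under stated assumptions, not PIKT code: field_values and launches_per_day are hypothetical helpers, drift and random percentages are ignored, the day-of-month and month fields are assumed to be '*', and day-of-week 0 is taken to be Sunday, as in cron.

```python
def field_values(field: str, low: int, high: int) -> set[int]:
    """Expand a cron-style field ('*', lists, ranges, steps) into a set."""
    values: set[int] = set()
    for part in field.split(","):
        range_part, _, step_text = part.partition("/")
        if range_part == "*":
            start, end = low, high
        else:
            start_text, _, end_text = range_part.partition("-")
            start = int(start_text)
            end = int(end_text) if end_text else start
        values.update(range(start, end + 1, int(step_text) if step_text else 1))
    return values


def launches_per_day(specs: list[str], weekday: int) -> int:
    """Count nominal launches on one day of the week (0 = Sunday)."""
    total = 0
    for spec in specs:
        minute, hour, _mday, _month, wday, _drift = spec.split()
        if weekday in field_values(wday, 0, 6):
            total += len(field_values(minute, 0, 59)) * len(field_values(hour, 0, 23))
    return total


CRITICAL = [
    "15,45 8-16 * * 1-5 5",
    "15 0-7,17-23 * * 1-5 5",
    "15 0-18/6 * * 0,6 5",
]

print(launches_per_day(CRITICAL, 1))  # Monday: 33
print(launches_per_day(CRITICAL, 6))  # Saturday: 4
```

Under these assumptions a weekday sees 33 launches (18 during business hours, 15 outside them), while a weekend day sees only 4.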
If you specify multiple timings, they must follow this format:
    timing          MIN HOUR DAY-OF-MONTH MONTH DAY-OF-WEEK DRIFT
                    MIN HOUR DAY-OF-MONTH MONTH DAY-OF-WEEK DRIFT
                    ...

up to a maximum of sixteen timing specs. If you use multiple timing specs, all six fields are required; omitting...
For scripts meant to be executed via 'piktc -x' and not run via piktd, you should specify the timing using the built-in macro =piktnever.
You can specify the alert's "nice" level using either the "priority" or "nice" keyword. The priority level must range from -20 (highest priority) to 19 (lowest priority). If no priority is specified, the nice level defaults to 0.
Alternatively, you can use the "nicecmd" keyword, and specify the full nice command following (e.g., "/usr/bin/nice -10"; you also have the option of inserting here something other than the default Unix nice command).
As alarm scripts are run, their output is queued. At the end of the alert run (an alert is a set of alarms), the queued output may be sent as a single e-mail message to one or more systems administrators, or printed out. The mailcmd/lpcmd lines indicate which mail/print commands to use and, in the case of mailcmd, to whom the e-mail is sent. (It's possible to dispense with any reporting, hence both mailcmd and lpcmd are optional.)
Typically you would set alarm status and level individually in each Pikt script, but you can also set alert-wide defaults in alerts.cfg, then reference those defaults within Pikt scripts using the =piktstatus and =piktlevel built-in macros.
For example, if you specify this in your alerts.cfg:
    Urgent

            ...

            status          suspended

            level           urgent

            ...

you could then do this in alarms.cfg:

    ScanSyslog

            init
                    status =piktstatus
                    level =piktlevel
                    ...

When the ScanSyslog alarm is installed on the PIKT slave system, the actual result would be:

    ScanSyslog

            init
                    status suspended
                    level urgent
                    ...

Setting alert-wide default alarm status and level in this way is optional. Note that if you have these alert-wide defaults, it is still possible to override status and/or level on a per-alarm basis (for example, in any alarm specification in alarms.cfg, directly specifying 'status suspended' or 'status testing' and so on instead of 'status =piktstatus').
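The substitution of alert-wide defaults can be pictured with a small Python sketch. This is only a model of the effect described above, not PIKT's actual macro processor; expand_alert_defaults is a hypothetical name, and plain string replacement is an assumption.

```python
def expand_alert_defaults(alarm_text: str, status: str, level: str) -> str:
    """Replace =piktstatus and =piktlevel with the alert-wide defaults."""
    return alarm_text.replace("=piktstatus", status).replace("=piktlevel", level)


alarm = "status =piktstatus\nlevel =piktlevel"
print(expand_alert_defaults(alarm, "suspended", "urgent"))
# status suspended
# level urgent
```

An alarm that names its status or level directly contains no macro, so it passes through unchanged; that mirrors the per-alarm override described above.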
What usually follows is a list of alarms or Pikt scripts. Every item in the alarms or scripts list must have a corresponding alarm or script definition in alarms.cfg.
Alarms are usually grouped together by function, by level of criticality, or by timing.
With execcmd, you may instead register simple one-liner, crontab-like commands directly in piktd.conf, for example:
    #if mailserver

    BakMail         // do nightly backup of /var/mail

            timing          40 23 * * *

            priority        10

            execcmd         "=prgdir/bakmail.pl -R 1"

    #endif

The command

    # piktc -ierv +A BakMail +H mailserver

installs the following alarm script on all mailserver machines

    BakMailScript

            exec "/pikt/lib/programs/bakmail.pl -R 1"

(note: registration of this script in alarms.cfg is unnecessary; it is implicit when using execcmd) and adds the following line to those systems' piktd.conf (and restarts the piktd):

    40 23 * * * 0 /usr/bin/nice -10 /pikt/bin/pikt +A BakMail

Note that you may not use execcmd together with an alarms or scripts listing. You either specify one execcmd, or you specify an alarms or scripts list.
Refer to the sample alerts.cfg for more examples.