alerts.cfg

(NOTE: Some of the techniques shown or described on this page--marked in purple--require new features in the latest official PIKT 1.19.0 release (pikt-current.tar.gz) that are unavailable in any previous version.)

The alerts.cfg file groups together alarm scripts. Use alerts.cfg to specify which alarm scripts to run, where and when to run them, at what priority ("nice" level), and where to send their output (typically e-mail).

For example, here is a specification for a so-called 'Urgent' alert:

///////////////////////////////////////////////////////////////////////////////
// 
// PIKT alerts.cfg -- grouping and scheduling alarm and program scripts
// 
///////////////////////////////////////////////////////////////////////////////

...

Urgent                  // things that deserve nearly immediate attention

        timing          30 * * * * 1

        nicecmd         "=nice -n 19"
        mailcmd         "=mailx -s 'PIKT Alert on =pikthostname: Urgent' =pikt-urgent"
        lpcmd           "=lp =piktprinter"

        status          active
        level           urgent

        alarms
#if piktmaster
                        SysDown
#endif
                        DmesgScan
                        LoadAverage
                        ProcessCounts
                        ZombieCounts
                        CPUUsage

...

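In alerts.cfg (or any of its #include files), the general stanza format is:

<alert_name>
        timing            <min> <hour> <mday> <month> <wday> [<drift>]
    [<additional timing specs>]
        drift             <minutes>                    [optional]

        priority|nice     <level>                      [optional]
  [or:]
        nicecmd           "<nice command>"             [optional]

        mailcmd           "<mail command>"             [optional]
        lpcmd             "<print command>"            [optional]
        execcmd           "<command>"                  [optional]
        status            active|inactive|suspended|
                          testing|debugging            [optional]
        level             emergency|urgent|critical|
                          error|warning|notice|
                          info|debug                   [optional]
        alarms|scripts                                 [optional]
                          <alarm or script name>
                          <alarm or script name>
                            ...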

The alert name usually indicates its timing or purpose.

The timing parameters follow the usual cron conventions (see the crontab man page) and then some.

One less common timing spec is x-y/z, which selects the values from x through y at intervals of z.
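As a rough illustration of the x-y/z form, here is a minimal Python sketch. It is not PIKT's own parser; the function name expand_range is hypothetical, and the sketch assumes the usual cron reading in which counting starts at x.

```python
def expand_range(spec):
    """Expand 'x-y/z' (or plain 'x-y') into the list of matching values.

    Assumption: stepping starts at x, as in the usual cron convention.
    """
    span, _, step = spec.partition("/")
    low, _, high = span.partition("-")
    step = int(step) if step else 1
    return list(range(int(low), int(high) + 1, step))


print(expand_range("0-18/6"))   # [0, 6, 12, 18]
print(expand_range("8-16"))     # [8, 9, 10, 11, 12, 13, 14, 15, 16]
```

Used in the hours field, 0-18/6 would therefore match hours 0, 6, 12, and 18.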

Another novel timing spec is random timings. You may specify timing values like so:

        timing          20% * * * *

This says to run the alert randomly on average every five minutes. (That is, there is a 20% chance of running the alert any given minute.)

Here is another example:

        timing          30 25% * * *

This says to run the alert randomly on average every four hours (at half past the hour). (That is, there is a 25% chance of running the alert any given hour.)

To be more precise, in the first example, the '20%' is like a '*', that is, match every single minute, then apply a 20% probability modifier to the entire timing. Probability modifiers are cumulative. So, in this example,

        timing          20% 25% * * *

in effect, every minute there is a 20% times 25% chance, or just a 5% chance overall, of running the alert (on average three times every hour, because there is a one-in-twenty chance of running each minute, and there are sixty minutes in an hour).

To run an alert at 3:30 in the morning on average once every four days, you could do this

        timing          30 3 25% * *

or this

        timing          30 3 * 25% *

or this

        timing          30 3 * * 25%

or even this

        timing          30 3 50% * 50%

The last four examples would all have the same effect.
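The arithmetic behind the cumulative probability modifiers can be checked with a short Python sketch. This only models the multiplication described above; it is not how PIKT itself evaluates timings, the function name run_probability is hypothetical, and whole-number percentages are assumed.

```python
from fractions import Fraction


def run_probability(timing):
    """Multiply together every 'N%' field of a five-field timing spec.

    Assumption: percentages are whole numbers, e.g. '25%'.
    """
    prob = Fraction(1)
    for field in timing.split()[:5]:
        if field.endswith("%"):
            prob *= Fraction(int(field[:-1]), 100)
    return prob


# 20% of minutes times 25% of hours: 1/20 per minute, 3 runs per hour.
p = run_probability("20% 25% * * *")
print(p, float(p * 60))   # 1/20 3.0

# Each 3:30 AM variant gives a 1-in-4 chance on any given day.
for spec in ("30 3 25% * *", "30 3 * 25% *",
             "30 3 * * 25%", "30 3 50% * 50%"):
    print(spec, run_probability(spec))   # 1/4 in every case
```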

The random timing spec is especially useful in security situations, where you want some unpredictability in your monitoring schedules.

Still another novel timing spec is "drift": the number of minutes by which an alert launch may randomly occur before or after a specified time.

For example,

        timing  0,30 * * * * 5

or equivalently,

        timing  0,30 * * * *
        drift   5

says to run an alert twice an hour, at the top and the bottom of the hour, give or take a random 5 minutes. In other words, the alert might run anywhere from, say, 14:55 to 15:05, and from 15:25 to 15:35 (and likewise for every other hour of the day).
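To make the launch windows concrete, the following Python sketch computes the earliest and latest launch time for each nominal minute. It illustrates the arithmetic only; the function name drift_windows is hypothetical, and how piktd actually picks a moment inside the window is not modeled here.

```python
from datetime import datetime, timedelta


def drift_windows(hour, minutes, drift):
    """Return (earliest, latest) launch times as 'HH:MM' strings."""
    day = datetime(2000, 1, 1)  # arbitrary date; only the clock time matters
    delta = timedelta(minutes=drift)
    windows = []
    for minute in minutes:
        nominal = day.replace(hour=hour, minute=minute)
        windows.append(((nominal - delta).strftime("%H:%M"),
                        (nominal + delta).strftime("%H:%M")))
    return windows


# timing 0,30 * * * * 5, looked at for the 15:00 hour
print(drift_windows(15, [0, 30], 5))
# [('14:55', '15:05'), ('15:25', '15:35')]
```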

Timing drift may be as large as you want, so long as there is no possibility of alert launches overlapping. So, it is even possible to schedule alerts with drift extending to days or even weeks.

Timing drift is especially useful when you want alerts to run around a certain time (e.g., midnight), but you don't want them all to go off at precisely that time on all systems (to avoid "bunching up").

The multiple alert timing option allows you to set different alert schedules for, say, weekday business hours, weekday night-time hours, and weekends. For example:

Critical

       timing     15,45 8-16      * * 1-5 5  // run every 1/2 hour
                                             // (with 5-minute drift)
                                             // from 8 AM to before 5 PM
                                             // on Monday thru Friday
                  15    0-7,17-23 * * 1-5 5  // run once hourly
                                             // (with 5-minute drift)
                                             // before 8 AM and
                                             // after 5 PM
                                             // on Monday thru Friday
                  15    0-18/6    * * 0,6 5  // run only every six hours
                                             // (with 5-minute drift)
                                             // on Saturday and Sunday

With these settings, the following lines would appear in piktd.conf (with line wrap shown here for display purposes):

15,45 8-16 * * 1-5 5 /pikt/bin/pikt +M "/usr/bin/mailx -s
  'PIKT Alert on vienna: Critical' brahms\" +A Critical
15 0-7,17-23 * * 1-5 5 /pikt/bin/pikt +M "/usr/bin/mailx -s
  'PIKT Alert on vienna: Critical' brahms\" +A Critical
15 0-18/6 * * 0,6 5 /pikt/bin/pikt +M "/usr/bin/mailx -s
  'PIKT Alert on vienna: Critical' brahms\" +A Critical

The idea here is to cut down on the volume of mail when nobody is usually around to respond to it. (Of course, this might cut down on the frequency of non-mail actions--such as auto-fixing what's broken--during evenings and weekends. So you must use this feature carefully.)
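To get a feel for how much the three-part Critical schedule reduces alert volume, here is a Python sketch that counts nominal launches per week for each timing spec. It is a simplified cron-field expander written for this illustration, not PIKT code; the names expand and launches_per_week are hypothetical, and it assumes the day-of-month and month fields are '*'.

```python
def expand(field, low, high):
    """Expand a cron-style field: '*', 'a', 'a-b', 'a-b/s', or a comma list."""
    values = set()
    for part in field.split(","):
        span, _, step = part.partition("/")
        if span == "*":
            start, end = low, high
        elif "-" in span:
            start, end = (int(n) for n in span.split("-"))
        else:
            start = end = int(span)
        values.update(range(start, end + 1, int(step) if step else 1))
    return values


def launches_per_week(timing):
    """Count nominal launches per week (assumes mday and month are '*')."""
    minute, hour, _mday, _month, wday = timing.split()[:5]
    return (len(expand(minute, 0, 59)) * len(expand(hour, 0, 23))
            * len(expand(wday, 0, 6)))


print(launches_per_week("15,45 8-16 * * 1-5 5"))     # 90
print(launches_per_week("15 0-7,17-23 * * 1-5 5"))   # 75
print(launches_per_week("15 0-18/6 * * 0,6 5"))      # 8
```

That is 173 launches a week in total. By comparison, a flat half-hourly schedule around the clock would launch 2 x 24 x 7 = 336 times a week.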

If you specify multiple timings, they must follow this format:

        timing  <min> <hour> <mday> <month> <wday> <drift>
                <min> <hour> <mday> <month> <wday> <drift>
                ...

up to a maximum of sixteen timing specs. If you use multiple timing specs, all six fields are required; omitting the drift field (or giving it on a separate 'drift' line, as permitted with a single timing spec) is not an option. With the appropriate #if and #ifdef directives, you can customize these multiple timings on a per-machine and per-define basis to your heart's content. (For example, go into less-frequent-alert mode on weekends on your non-mission-critical systems only. The mission-critical systems alert as usual, with no diminishment of frequency, overnight and on weekends.)

For scripts meant to be executed via 'piktc -x' and not run via piktd, you should specify the timing using the built-in macro =piktnever.

You can specify the alert's "nice" level using either the "priority" or "nice" keyword. The priority level must range from -20 (highest priority) to 19 (lowest priority). If no priority is specified, the nice level defaults to 0.

Alternatively, you can use the "nicecmd" keyword, and specify the full nice command following (e.g., "/usr/bin/nice -10"; you also have the option of inserting here something other than the default Unix nice command).

As alarm scripts are run, their output is queued. At the end of the alert run (an alert is a set of alarms), the queued output may be sent as a single e-mail message, to one or more systems administrators, or printed out. The mailcmd/lpcmd lines indicate what mail/print commands to use, and in the case of mailcmd who the e-mail gets sent to. (It's possible to dispense with any reporting, hence both mailcmd and lpcmd are optional.)

Typically you would set alarm status and level individually in each Pikt script, but you can also set alert-wide defaults in alerts.cfg, then reference those defaults within Pikt scripts using the =piktstatus and =piktlevel built-in macros.

For example, if you specify this in your alerts.cfg:

Urgent
        ...
        status          suspended
        level           urgent
        ...

you could then do this in alarms.cfg:

ScanSyslog
        init
                status =piktstatus
                level =piktlevel
                ...

When the ScanSyslog alarm is installed on the PIKT slave system, the actual result would be:

ScanSyslog
        init
                status suspended
                level urgent
                ...

Setting alert-wide default alarm status and level in this way is optional. Note that if you have these alert-wide defaults, it is still possible to override status and/or level on a per-alarm basis (for example, in any alarm specification in alarms.cfg, directly specifying 'status suspended' or 'status testing' and so on instead of 'status =piktstatus').
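The substitution of alert-wide defaults can be pictured with a small Python sketch. This is only a model of the outcome shown above, not PIKT's actual macro preprocessor; the function name expand_macros and the dictionary layout are hypothetical.

```python
import re


def expand_macros(text, defaults):
    """Replace =piktstatus and =piktlevel with the alert-wide defaults."""
    return re.sub(r"=(piktstatus|piktlevel)\b",
                  lambda m: defaults[m.group(1)], text)


# Defaults as set in the Urgent alert of the example above.
urgent_defaults = {"piktstatus": "suspended", "piktlevel": "urgent"}

alarm = (
    "ScanSyslog\n"
    "        init\n"
    "                status =piktstatus\n"
    "                level =piktlevel\n"
)
print(expand_macros(alarm, urgent_defaults))

# A per-alarm override contains no macro, so it passes through unchanged.
print(expand_macros("                status testing\n", urgent_defaults))
```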

What usually follows is a list of alarms or Pikt scripts. Every item in the alarms or scripts list must have a corresponding alarm or script definition in alarms.cfg.

Alarms are usually grouped together by function, by level of criticality, or by timing.

With execcmd, you may instead register simple one-liner, crontab-like commands directly in piktd.conf, for example:

#if mailserver

BakMail         // do nightly backup of /var/mail

        timing          40 23 * * *
        priority        10
        execcmd         "=prgdir/bakmail.pl -R 1"

#endif

The command

# piktc -ierv +A BakMail +H mailserver

installs the following alarm script on all mailserver machines

BakMailScript
        exec "/pikt/lib/programs/bakmail.pl -R 1"

(note: registration of this script in alarms.cfg is unnecessary; it is implicit when using execcmd) and adds the following line to those systems' piktd.conf (and restarts piktd):

40 23 * * * 0 /usr/bin/nice -10 /pikt/bin/pikt +A BakMail

Note that you may not use execcmd together with an alarms or scripts listing. You either specify one execcmd, or you specify an alarms or scripts list.

Refer to the sample alerts.cfg for more examples.
