Doing Something Substantial
Now it's time to do something a little more substantial, and somewhat more complicated.
(Note: For the rest of this Tutorial, we use Solaris as our example operating system. Make adjustments to your own situation as necessary.)
First, edit the alerts.cfg file, removing a couple of lines and uncommenting others. After the edits, your alerts.cfg file should look like this:
    ///////////////////////////////////////////////////////////////////////////
    ///////////////////////////////////////////////////////////////////////////

    Critical        // things that should be dealt with before too long,
                    // preferably by day's end; (things reported here
                    // may not be especially "critical" but are so
                    // designated to conform with syslog's log levels)

            timing          20 * * * *

            mailcmd         "=mailx -s 'PIKT Alert on =pikthostname: Critical' =piktadmin"

            alarms
                    LoadAverageCritical
                    ProcCountTotalCritical
                    RootCoreFileExistCritical
                    PasswdFileProblemsCritical
                    MailQueueLengthyCritical
                    MessagesScanCritical
                    // DiskFullCritical

    ///////////////////////////////////////////////////////////////////////////

    Debug           // for PIKT self-monitoring; these deserve
                    // fairly close attention, especially on the
                    // piktmaster

            timing          55 * * * *

            mailcmd         "=mailx -s 'PIKT Alert on =pikthostname: Debug' =piktadmin"

            alarms
                    PiktCriticalLogScanDebug

    ///////////////////////////////////////////////////////////////////////////

    #ifdef test
    //# if piktmaster

    Test            // use for testing newly developed alarm scripts;
                    // install with 'piktc -iv +D test +A Test +H ...'
                    // or maybe 'piktc -iv +D test debug verbose -D page doexec +A Test +H ...'
                    // after testing, remove all traces of
                    // the Test alert with 'piktc -tv +A Test +H ...'

            timing          =piktnever

            mailcmd         "=mailx -s 'PIKT Alert on =pikthostname: Test' =piktadmin"

            alarms
                    DiskFullCritical

    //# endif  // piktmaster
    #endifdef  // test

    ///////////////////////////////////////////////////////////////////////////

The alarm scripts

    LoadAverageCritical
    ProcCountTotalCritical
    RootCoreFileExistCritical
    PasswdFileProblemsCritical
    MailQueueLengthyCritical
    MessagesScanCritical

are a simple set of generic scripts that should work as is on just about any system. If any are inappropriate for your system or cause you any difficulty, simply comment them out in the same way that the DiskFullCritical script is commented out:

    // DiskFullCritical

The one Debug alarm script

    PiktCriticalLogScanDebug

scans the Critical.log file for any signs of trouble. It, too, is generic and should work as is on any system.
If you did your edits correctly, everything should check okay:
    /pikt/bin/piktc -cv +H mysystem
    checking mysystem...

Restart the piktc_svc (if you had killed it at the end of the first chapter of this Tutorial, Getting Started):

    /pikt/bin/piktc_svc

Next, install all alerts with:

    /pikt/bin/piktc -iv +A all +H mysystem
    processing mysystem...
    installing file(s)...
    Debug.alt installed
    Critical.alt installed

This will install the alerts in the /pikt/lib/alerts directory:

    ls /pikt/lib/alerts
    Critical.alt  Debug.alt

If you inspect either of these .alt files, the scripts therein will look much like the versions in alarms.cfg with the following differences:
- all // and /* */ comments are stripped
- all macros are expanded (e.g., =ps -> /usr/bin/ps)
- there are perhaps some minor layout differences
Next, run the Critical alarm scripts by hand:

    /pikt/bin/pikt +A Critical
    User nobody4 has nonexistent shell
    the size of /etc/passwd has changed by >= 10%, was -2 lines, is now 12
    Sep 22 09:36:57 hissystem sshd[8955]: [ID 363151 daemon.notice] log: ROOT LOGIN as 'root' from hersystem.uppity.edu
    ...

You may or may not get any output, depending on your setup (e.g., the contents of your passwd and messages files). If pikt complains about not finding the messages (or any other) file, edit the macros.cfg file and substitute the appropriate path.
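If you are not sure where the messages file lives on your system, a quick shell check can tell you which path to put in macros.cfg. The sketch below is only an illustration: the function name first_existing is made up here and is not part of PIKT, and the candidate paths are assumptions (/var/adm/messages is the usual Solaris location, /var/log/messages is common on Linux).

```shell
#!/bin/sh
# first_existing: print the first argument that names a readable regular file.
# Hypothetical helper for illustration; not part of PIKT.
first_existing() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# The candidate locations are assumptions; adjust the list for your system.
first_existing /var/adm/messages /var/log/messages /var/log/syslog ||
    echo "no messages file found in the usual places" >&2
```

Whatever path this prints is the one to substitute in macros.cfg.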
Now try running the Debug scripts:
    /pikt/bin/pikt +A Debug
    Sep 22 14:26:59 vienna pikt[1528]: [ID 1 INFO] [WARNING] in dorules(), MailQueueLengthyCritical, no input data
    sh: /usr/bin/uptime: not found
    Sep 22 14:29:44 vienna pikt[1528]: [ID 1 INFO] [WARNING] in dorules(), LoadAverageCritical, no input data
    Sep 22 14:29:47 vienna pikt[1528]: [ID 1 INFO] [WARNING] in dorules(), MailQueueLengthyCritical, no input data
    Sep 22 14:29:48 vienna pikt[1528]: [ID 1 INFO] [WARNING] in dorules(), MessagesScanCritical, no input data

Don't worry about the "no input data" warnings. They may be perfectly normal and expected for your situation. Look in the configs_samples to see how to filter out such alert messages.

If you see one or more error messages like

    sh: /usr/bin/uptime: not found

this is a sign that one or more of the command paths defined in macros.cfg are in error. If so, please fix them now.
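One way to catch bad command paths before pikt trips over them is to test each path from the shell. This is a minimal sketch, not a PIKT feature: missing_commands is a hypothetical helper, and the three paths passed to it are examples only, so substitute the paths that actually appear in your macros.cfg.

```shell
#!/bin/sh
# missing_commands: print each argument that is not an executable regular file.
# Hypothetical helper for illustration; not part of PIKT.
missing_commands() {
    for cmd_path in "$@"; do
        { [ -f "$cmd_path" ] && [ -x "$cmd_path" ]; } || printf '%s\n' "$cmd_path"
    done
}

# Example paths only (assumptions); use the ones from your own macros.cfg.
missing_commands /usr/bin/uptime /usr/bin/ps /usr/bin/mailx
```

Any path that is printed needs correcting in macros.cfg. No output means every listed path exists and is executable.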
If you are working through this Tutorial guide from the beginning, you still have the file /pikt/etc/piktd.conf. If that file exists, view its contents now.
    cat /pikt/etc/piktd.conf
    * * * * * 0 /pikt/bin/pikt +M "/usr/bin/mailx -s 'PIKT Alert on mysystem: Critical' root" +A Critical

Looks much like a normal Unix crontab file, doesn't it?
This file was installed by your earlier '/pikt/bin/piktc -erv +A Critical +H mysystem' command.
Now, instead do
    /pikt/bin/piktc -ev +A all +H mysystem
    processing mysystem...
    enabling alert(s)...
    Debug enabled
    Critical enabled

Inspect anew the piktd.conf file:

    cat /pikt/etc/piktd.conf
    55 * * * * 0 /pikt/bin/pikt +M "/usr/bin/mailx -s 'PIKT Alert on mysystem: Debug' root" +A Debug
    20 * * * * 0 /pikt/bin/pikt +M "/usr/bin/mailx -s 'PIKT Alert on mysystem: Critical' root" +A Critical

You have overwritten the previous piktd.conf with the latest specifications in your alerts.cfg file. There are now two lines, one for every alert listed in alerts.cfg, because you specified 'piktc -ev +A all', whereas earlier you specified '+A Critical' only. (Why doesn't the Test alert appear? It's because that alert is wrapped within an '#ifdef test ... #endifdef'. Since test is set to FALSE in defines.cfg, the preprocessor bypasses that alert. More on this later.)
So, 'piktc -e' enables or adds updated piktd.conf entries. ('piktc -d' disables or removes alert entries entirely.) Whenever you want to change your alert schedules, you would edit alerts.cfg then re-enable the affected alerts using 'piktc -e'.
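After re-enabling alerts, it can be handy to see at a glance which alert runs on which schedule. The sketch below pulls the alert name and the five crontab-style timing fields out of piktd.conf-style lines. It assumes the line layout shown in the listings above (timing fields first, '+A <alert>' last); list_schedules is a hypothetical helper, not a PIKT command.

```shell
#!/bin/sh
# list_schedules: read piktd.conf-style lines on stdin and print
# "<alert>: <five timing fields>" for each line ending in "+A <alert>".
# Hypothetical helper for illustration; not part of PIKT.
list_schedules() {
    awk 'NF >= 7 && $(NF-1) == "+A" { print $NF ": " $1, $2, $3, $4, $5 }'
}

# Sample input copied from the piktd.conf listing in this chapter.
list_schedules <<'EOF'
55 * * * * 0 /pikt/bin/pikt +M "/usr/bin/mailx -s 'PIKT Alert on mysystem: Debug' root" +A Debug
20 * * * * 0 /pikt/bin/pikt +M "/usr/bin/mailx -s 'PIKT Alert on mysystem: Critical' root" +A Critical
EOF
```

For the sample input this prints 'Debug: 55 * * * *' and 'Critical: 20 * * * *'. Against a live system you would run 'list_schedules < /pikt/etc/piktd.conf' instead.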
Verify that no piktd is running:
    ps -ef | grep pikt | grep -v grep
    root  9771     1  0   Sep 22 ?        0:00 /pikt/bin/piktc_svc

Then (re)start the piktd with:

    /pikt/bin/piktc -rv +H mysystem
    processing mysystem...
    (re)starting daemon (piktd)...
    daemon (re)started

Verify that piktd is now running:

    ps -ef | grep pikt | grep -v grep
    root  9771     1  0   Sep 22 ?        0:00 /pikt/bin/piktc_svc
    root 10521     1  0 18:46:35 ?        0:00 /pikt/bin/piktd

Your half dozen or so Critical alarm scripts are now on duty, on the lookout for the indicated signs of trouble. You can check their status with

    /pikt/bin/piktc -sv +A all +H mysystem
    processing mysystem...
    showing alert stata...
    Critical active
    Debug active

If the process count were to spike, you (actually, the root account, or whatever you have defined as =piktadmin in macros.cfg) would receive email like

    PIKT ALERT
    Wed Sep 26 17:20:36 2001
    mysystem

    CRITICAL:
        ProcCountTotalCritical
            Report perilously high overall system process count

            The process count is 66

(Actually, so that it runs as is on any system, ProcCountTotalCritical simply pipes the output of a plain 'ps' to 'wc -l'. A plain 'ps' will not count every process on a system. You would normally use something like 'ps -aux' or 'ps -ef' instead. That's an exercise left to you, the reader.)
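To see how much difference the choice of 'ps' options makes, you can count processes from the shell before touching the alarm script. This is a rough sketch in plain shell, not the ProcCountTotalCritical script itself. count_processes is a hypothetical helper, and it assumes the 'ps' output starts with exactly one header line.

```shell
#!/bin/sh
# count_processes: count the lines of "ps" output on stdin, skipping the
# single header line. Hypothetical helper for illustration; not part of PIKT.
count_processes() {
    awk 'NR > 1 { n++ } END { print n + 0 }'
}

# "ps -ef" lists every process on Solaris and other System V style systems;
# BSD-style systems would use "ps -aux" or "ps aux" instead.
if command -v ps >/dev/null 2>&1; then
    ps -ef | count_processes
fi
```

Compare the result with 'ps | count_processes' to see how many processes a plain 'ps' leaves out.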
After a time, your log files should show signs of life:
    ls -l /pikt/var/log
    total 62
    -rw-------   1 root     other       7308 Sep 26 17:59 Critical.log
    -rw-------   1 root     other       1289 Sep 26 17:55 Debug.log
    -rw-------   1 root     other        924 Sep 26 16:20 MessagesScan.log
    -rw-------   1 root     other        516 Sep 26 17:46 piktc.log
    -rw-------   1 root     other       1684 Sep 26 17:46 piktc_svc.log
    -rw-------   1 root     other        902 Sep 26 17:55 piktd.log

Inspect these if you wish. With PIKT, many different things are logged. Get in the habit of referring to the log files if--no, when!--something goes wrong.
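When a log grows long, a per-alarm tally of warnings is often a quicker first look than reading it top to bottom. The sketch below assumes log lines follow the '[WARNING] in dorules(), <AlarmName>, <message>' layout seen in the Debug output earlier in this chapter. Your logs may differ, so treat the pattern as an assumption. warnings_by_alarm is a hypothetical helper, not part of PIKT.

```shell
#!/bin/sh
# warnings_by_alarm: read log lines on stdin and print, for each alarm name,
# how many lines tagged [WARNING] mention it. Assumes the comma-separated
# "..., <AlarmName>, <message>" layout. Hypothetical helper; not part of PIKT.
warnings_by_alarm() {
    awk -F', ' '/\[WARNING\]/ && NF >= 3 { count[$2]++ }
        END { for (name in count) print count[name], name }' | sort -k2
}

# Sample input copied from the Debug output shown earlier in this chapter.
warnings_by_alarm <<'EOF'
Sep 22 14:26:59 vienna pikt[1528]: [ID 1 INFO] [WARNING] in dorules(), MailQueueLengthyCritical, no input data
Sep 22 14:29:44 vienna pikt[1528]: [ID 1 INFO] [WARNING] in dorules(), LoadAverageCritical, no input data
Sep 22 14:29:47 vienna pikt[1528]: [ID 1 INFO] [WARNING] in dorules(), MailQueueLengthyCritical, no input data
EOF
```

To try it on a real log, feed it one of the files listed above, for example 'warnings_by_alarm < /pikt/var/log/Critical.log'.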