Scanning a Log File
(NOTE: Some of the techniques shown or described on this page--marked in purple--require new features in the latest official PIKT 1.19.0 release (pikt-current.tar.gz) that are unavailable in any previous version.)
PIKT is commonly used as a log file analyzer. You can employ PIKT to routinely scan and analyze log files and report items of special concern or interest. Under extraordinary circumstances, you can even program PIKT to take action, such as shut down a network service or lock a user account.
For example, DmesgScan is a Pikt script to scan system dmesg (kernel ring buffer) output. Let's put it in a separate #include file, log_system_alarms.cfg. In the top-level macros.cfg config file, we would then reference the #include file this way:
/////////////////////////////////////////////////////////////////////////////// // // alarms.cfg // /////////////////////////////////////////////////////////////////////////////// ... #includeHere is the DmesgScan script:... ///////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////// // // logs_system_alarms.cfg // /////////////////////////////////////////////////////////////////////////////// DmesgScan init status =piktstatus level =piktlevel task "Scan the dmesg output for troublesome or noteworthy entries" input proc "if [ -e =hstdir/log/dmesg=piktalert=_.bak ]; then =diff =hstdir/log/dmesg=piktalert=_.bak =hstdir/log/dmesg=piktalert | =egrep '>' | =cut -b 3- 2>/dev/null; else =cat =hstdir/log/dmesg=piktalert 2>/dev/null; fi" begin if $level() !~~ "emerg|alert" =checkpoint(=lalim) fi doexec wait "=dmesg | =uniq > =hstdir/log/dmesg=piktalert" rule if $level() =~~ "emerg|alert" // alert == urgent && $inlin =~~ "=redflags" leave elsif $level() =~~ "critical" && $inlin =~~ "=yellowflags" leave elsif $alarm() eq "ScanDmesg" && $inlin =~~ "=redflags|=yellowflags" leave else next fi rule // bypass this junk if $inlin =~~ "^[[:space:]]*[[:digit:]]+[[:space:]]*$" || $inlin =~~ "^[[:space:]]*not found!" #if rotterdam || $inlin =~~ "dev fd0" #endif next fi rule // log anything not bypassed =output_alarm_log($inlin) rule // ignore scheduled reboots if =reboot_period(#daynumber(), #hour()) next fi rule // bypasses if $inlin =~~ "pcie_portdrv_probe->.+has invalid irq| exception support|exception polling| obsolete setsockopt|reset.+speed usb| failed to allocate mem resource| scsi0: aen: warning|set_dentry_child_flags| write protect|handling phase mismatch| too many iterations.+nv_nic_irq| analog subsections not ready" next fi rule // report the good stuff output mail $inlin end // make dmesg*.bak for diff'ing against next time around exec wait "=mv =hstdir/log/dmesg=piktalert =hstdir/log/dmesg=piktalert=_.bak" ...We actually invoke the DmesgScan script in three different contexts.
First, within the Urgent alert group. Here is the Urgent stanza from alerts.cfg:
/////////////////////////////////////////////////////////////////////////////// Urgent // things that deserve nearly immediate attention timing 15,45 * * * * 1 nicecmd "=nice -n 19" mailcmd "=mailx -a 'From: piktadmin' -s 'PIKT Alert on =pikthostname: Urgent' =sysadmins-urgent" status active level urgent alarms ... DmesgScan ... ///////////////////////////////////////////////////////////////////////////////Second, within the Critical alert group in alerts.cfg:
/////////////////////////////////////////////////////////////////////////////// Critical // things that should be dealt with before too long, // preferably by day's end timing 0,30 * * * * 1 nicecmd "=nice -n 19" mailcmd "=mailx -a 'From: piktadmin' -s 'PIKT Alert on =pikthostname: Critical' =sysadmins-critical" status active level critical alarms ... DmesgScan ... ///////////////////////////////////////////////////////////////////////////////Returning to the DmesgScan script, and skipping ahead in the script for a moment, in the begin section, we first check--but for low-priority scripts only--the system load average
if $level() !~~ "emerg|alert" =checkpoint(=lalim) fiusing the =checkpoint() & =lalim macros:
lalim // load average limit #if missioncritical 0.50 #else # ifdef debug 0.10 # elsedef 1.00 # endifdef #endif checkpoint(LA) #ifdef debug output $chop($command("=uptime | =awk '{print \$10}'")) #endifdef set #la = #val($chop($command("=uptime | =awk '{print \$10}'"))) if #la >= (LA) output log "=logdir/Checkpoint.log" "quit $alarm(), load average $text(#la, 2)" quit fiIn other words, if the load average is unusually high (where "high" means > 0.5 on missioncritical systems--defined in systems.cfg--else > 1.0 everywhere else), we abort the script.
(You might consider adding the =checkpoint() macro to the begin section of many of your Pikt scripts, especially the CPU and system resources intensive ones. There's no sense in PIKT making a bad situation worse! Having said that, we also can say that with well crafted scripts properly scheduled, PIKT adds very little overhead to the system under normal operations.)
If the load average is not too high, hence we can proceed beyond the =checkpoint(), we capture dmesg output (squelching any duplicate entries by means of the =uniq filter) to a log file:
doexec wait "=dmesg | =uniq > =hstdir/log/dmesg=piktalert"Only after processing the begin section do we open the script input. In the 'input proc' statement
input proc "if [ -e =hstdir/log/dmesg=piktalert=_.bak ]; then =diff =hstdir/log/dmesg=piktalert=_.bak =hstdir/log/dmesg=piktalert | =egrep '>' | =cut -b 3- 2>/dev/null; else =cat =hstdir/log/dmesg=piktalert 2>/dev/null; fi"we check for the existence of a dmesgUrgent.bak (or dmesgCritical.bak or ...) backup file. If it exists, we diff it against the current dmesgUrgent (or ...) file created in the begin section. We pipe the diff output to an egrep & cut filter to show just the new stuff (minus the line-beginning '> '). Otherwise, if there is no backup file, we simply cat the current dmesgUrgent (or ...) file. (We could use 'dmesg -c' to clear the ring buffer each time we run the script, allowing us to greatly simplify getting at the new stuff. But we prefer to retain the dmesg contents for other purposes. Hence all of these complexities.)
So, when all is said and done, the script input is just any new entries in the current dmesg ring buffer.
Entering the script input loop, we first check for entries of interest:
rule if $level() =~~ "emerg|alert" // alert == urgent && $inlin =~~ "=redflags" leave elsif $level() =~~ "critical" && $inlin =~~ "=yellowflags" leave elsif $alarm() eq "ScanDmesg" && $inlin =~~ "=redflags|=yellowflags" leave else next fiThe =redflags and =yellowflags macros, used here and widely elsewhere within our PIKT setup, are defined as
redflags fail|error| err |die|down|seg.*fault|terminate|attack|abort yellowflags warn|no such|broken|invalid|attempt|time(d )*out|reset|corrupt| not responding| bad |going down|unauthorized|exception|halt| authentication|obsolete|not ready|not found|abnormal|correctable| unexpected|please enable|can.t find|unable to|no device found| too many|is bad|can.t read|corrected|does not setup|inconsistent| fixed|runaway|mismatch| is off|overdue|could not be|quirk detected| cannot allocate|spuriousIn other words, the first rule says:
- If this is a high priority alert, with a level of "emerg" (emergency) or "alert" (aka "urgent", in PIKT-speak), then only check for red-flagged, especially troublesome entries. (So if the conditions are true, we 'leave' this rule and proceed to subsequent rules. 'next' would have us go on to the next input line, without proceeding to subsequent rules.)
- Else if this is a not-so-high priority alert, with a level of "critical", only check for less-troubling, yellow-flagged entries.
- Else if the alarm script's name is "ScanDmesg", a script to be run interactively from the command-line, check for both red-flagged and yellow-flagged entries.
- Else for low-priority alerts, or for input lines that don't pattern match =redflag or =yellowflag, disregard the current dmesg entry, and move on to the next (in the script input stream).
Of course, you can still use preprocessor #if and #ifdef directives and other controls to customize your scripts before run-time, at the time of installation. All things considered, these are very powerful features that allow us to write fewer, adaptable, all-purpose scripts rather than many, many inflexible single-use scripts which are basically all alike except for a few details. (Although sometimes simplicity in script design is better. It all depends.)
In the DmesgScan script's second rule, we bypass some uninteresting junk entries. For the rotterdam system, we also bypass any entry reporting that system's flakey floppy disk drive.
In the third rule, we log anything not bypassed so far in a special DmesgScan.log log file using the macro
output_alarm_log(M) output log "=logdir/" . $alarm() . ".log" (M)So, later on even if we don't e-mail certain entries because we've filtered them out, we still have them logged, just in case.
In the fourth rule
rule // ignore scheduled reboots if =reboot_period(#daynumber(), #hour()) next fion certain systems, which are automatically rebooted early in the morning each day, we ignore their reboot entries (much the same day after day) using the macro
reboot_period(D, H) // (D) is currently unused #if tabletserver ((H) < 2) // (after midnight &) before 2 am #else ((H) < 0) // i.e., never #endifIn the fifth rule, we bypass some oft-appearing nuisance entries. (But remember: we logged them in the previous rule.)
In the sixth and final rule, here is where we send the alert e-mail. A typical message might look something like
PIKT ALERT Sun Mar 18 21:15:05 2007 milan CRITICAL: DmesgScan Scan the dmesg output for troublesome or noteworthy entries usb 2-1.2: reset low speed USB device using ohci_hcd and address 18 usb 2-1.2: reset low speed USB device using ohci_hcd and address 18 hub 2-1:1.0: cannot reset port 3 (err = -19) hub 2-1:1.0: cannot reset port 3 (err = -19) hub 2-1:1.0: cannot reset port 3 (err = -19) usb 2-1.2: reset low speed USB device using ohci_hcd and address 21 usb 2-1.2: reset low speed USB device using ohci_hcd and address 23 ...Finally, in the script end section, we mv the current dmesgUrgent (or ...) file to its backup version for diffing the next time around.
DmesgScan is another rather elaborate example script showcasing many of PIKT's advanced features. Don't be intimidated by all the potential complexity! Make your Pikt scripts as simple or as complex as you wish. Use #include files, macros, #ifdef and #if directives, self-adapting scripts, and other sophisticated PIKT techniques depending on your needs and according to your style and taste. Complexity and cleverness are cool, but there's also wisdom in the KISS Principle ("Keep It Simple, Stupid.").
Refer to the PIKT Samples for many, many other script examples, large and small, long and short, simple and complex.
prev page | 1st page | next page |