SNMP Monitoring
(NOTE: Some of the techniques shown or described on this page, marked in purple, require new features in the latest official PIKT 1.19.0 release (pikt-current.tar.gz) that are unavailable in any previous version.)
SNMP (Simple Network Management Protocol) is a widely used protocol for monitoring networked devices such as computers, routers, printers, and even UPSes. What follows is a discussion of how to integrate PIKT with SNMP, using as our example PIKT monitoring the health of machine room air conditioning and power units.
In this example, we use Net-SNMP, an Open Source SNMP toolkit. Refer to the Net-SNMP Home Page for more information.
We also make reference to several products from the Liebert Corporation, a division of . In particular, we will use PIKT to monitor Liebert air conditioners and UPSes.
First, we write a PIKT objects.cfg file specifying the devices, their associated MIBs (Management Information Bases), OIDs (Object Identifiers) of special interest, device states, and threshold values (maximums and minimums). (Note that, for display purposes, we are forced to line-wrap lines such as '// device ... pval' and 'liebert-air-1.acme.com ... 75 80'. In the actual snmp_liebert_objects.cfg file, those would be very wide unbroken lines.)
///////////////////////////////////////////////////////////////////////////////
//
// snmp_liebert_objects.cfg
//
///////////////////////////////////////////////////////////////////////////////

///////////////////////////////////////////////////////////////////////////////

#indent

///////////////////////////////////////////////////////////////////////////////
// device                    mib
//     oid                                                               typ  mval  pval
///////////////////////////////////////////////////////////////////////////////

// lgpEnvTemperatureMeasurementDegF OBJECT-TYPE
//     SYNTAX      Integer32
//     UNITS       "degrees Fahrenheit"
//     MAX-ACCESS  read-only
//     STATUS      current
//     DESCRIPTION
//         "The measured temperature value."
//     ::= { lgpEnvTemperatureEntryDegF 3 }
//
// /usr/local/bin/snmpget -M /usr/local/share/snmp/mibs -m ALL
//     liebert-air-1.acme.com 1.3.6.1.4.1.476.1.42.3.4.1.2.3.1.3.1

liebert-air-1.acme.com       LIEBERT_GP_ENV.MIB
    LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1     +    75    80
liebert-air-2.acme.com       LIEBERT_GP_ENV.MIB
    LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1     +    90   100

///////////////////////////////////////////////////////////////////////////////
// lgpEnvHumidityMeasurementRel OBJECT-TYPE
//     SYNTAX      Integer32
//     UNITS       "percent Relative Humidity"
//     MAX-ACCESS  read-only
//     STATUS      current
//     DESCRIPTION
//         "The measured humidity value."
//     ::= { lgpEnvHumidityEntryRel 3 }

liebert-air-1.acme.com       LIEBERT_GP_ENV.MIB
    LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1         +    60    80
liebert-air-2.acme.com       LIEBERT_GP_ENV.MIB
    LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1         +    80   100

///////////////////////////////////////////////////////////////////////////////

...

///////////////////////////////////////////////////////////////////////////////
// lgpPwrOutputToLoadOnInverter OBJECT-TYPE
//     SYNTAX      INTEGER { yes(1), no(2) }
//     MAX-ACCESS  read-only
//     STATUS      current
//     DESCRIPTION
//         "The present source of output power is the Inverter."
//     ::= { lgpPwrStatus 7 }
//
// /usr/local/bin/snmpget -M /usr/local/share/snmp/mibs -m ALL
//     liebert-power-1.acme.com 1.3.6.1.4.1.476.1.42.3.5.3.7.0

liebert-power-1.acme.com     LIEBERT_GP_POWER.MIB
    LIEBERT-GP-POWER-MIB::lgpPwrOutputToLoadOnInverter.0                 ?    no    no

///////////////////////////////////////////////////////////////////////////////
// lgpPwrBatteryCapacity OBJECT-TYPE
//     SYNTAX      Integer32
//     UNITS       "percent"
//     MAX-ACCESS  read-only
//     STATUS      current
//     DESCRIPTION
//         "The present percentage of battery capacity."
//     ::= { lgpPwrBattery 19 }
//
// /usr/local/bin/snmpget -M /usr/local/share/snmp/mibs -m ALL
//     liebert-power-1.acme.com 1.3.6.1.4.1.476.1.42.3.5.1.19.0

liebert-power-1.acme.com     LIEBERT_GP_POWER.MIB
    LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0                        -    60    40

///////////////////////////////////////////////////////////////////////////////

#unindent

///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////

Most of snmp_liebert_objects.cfg is comments. The only lines of real consequence are the uncommented lines beginning with device names, such as 'liebert-power-1.acme.com'.
snmp_liebert_objects.cfg is an #include file, referenced in objects.cfg as:
#if snmpmaster

SNMPLiebert
        #include

#endif // snmpmaster

where snmpmaster is defined in systems.cfg as the piktmaster system.
We would install the SNMPLiebert.obj file on the snmpmaster using the piktc command:

# piktc -iv +O SNMPLiebert +H snmpmaster

The resulting SNMPLiebert.obj file would look like this:
liebert-air-1.acme.com       LIEBERT_GP_ENV.MIB
    LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1     +    75    80
liebert-air-2.acme.com       LIEBERT_GP_ENV.MIB
    LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1     +    90   100
...
liebert-power-1.acme.com     LIEBERT_GP_POWER.MIB
    LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0                        -    60    40

That is, just the 'liebert...' lines and no comment lines. (Again, the displayed lines would in reality be unbroken.)
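piktc does this filtering for us when it builds SNMPLiebert.obj. Purely as a rough illustration (this is not piktc's actual code, and it ignores preprocessor logic such as #if and macro expansion), the following Python sketch keeps only the data lines of a snmp_liebert_objects.cfg-style file. The function name object_lines is hypothetical.

```python
def object_lines(cfg_text):
    """Return the data lines of an objects.cfg-style file.

    Illustrative assumptions: comment lines start with '//', directives
    such as '#indent' start with '#', and every other non-blank line is
    one unbroken data record.
    """
    kept = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//") or line.startswith("#"):
            continue  # blank line, comment, or directive: not object data
        kept.append(line)
    return kept


sample = """\
///////////////////////////////////////////////////////////////
#indent
// lgpPwrBatteryCapacity OBJECT-TYPE
//     UNITS "percent"
liebert-power-1.acme.com LIEBERT_GP_POWER.MIB LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0 - 60 40
#unindent
"""

for line in object_lines(sample):
    print(line)
```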
Next, we show the monitoring script, SNMPLiebert:
///////////////////////////////////////////////////////////////////////////////
//
// liebert_alarms.cfg
//
///////////////////////////////////////////////////////////////////////////////

SNMPLiebert

        init
                status          =piktstatus
                level           =piktlevel
                task            "Report worrisome Liebert SNMP conditions"
                input file      "=objdir/SNMPLiebert.obj"
                dat $dev        1
                dat $mib        2
                dat $oid        3
                dat $typ        4
                dat $mval       5
                dat $pval       6
                keys $dev $oid

        rule
                switch $typ
                        case "+"
#ifndef debug
                                set #mailval = #val($mval)
                                set #pageval = #val($pval)
#elsedef
                                set #mailval = 0
                                set #pageval = 0
#endifdef
                                breaksw
                        case "-"
#ifndef debug
                                set #mailval = #val($mval)
                                set #pageval = #val($pval)
#elsedef
                                set #mailval = 1000000
                                set #pageval = 1000000
#endifdef
                                breaksw
                        case "?"
#ifndef debug
                                set $mailtxt = $mval
                                set $pagetxt = $pval
#elsedef
                                set $mailtxt = "."
                                set $pagetxt = "."
#endifdef
                                breaksw
                endsw

        rule
                set $snmpcmd = "=snmpget -m '=snmpmibsdir/$mib' -Ovq $dev $oid 2>/dev/null"
#ifdef debug
                output $snmpcmd
#endifdef

        rule
                set $oidtxt = $command("$snmpcmd | =awk '{print \$1}'")
#ifdef debug
                output $oidtxt
#endifdef

        rule    // log the reading
                =output_alarm_log("$dev: $oid $oidtxt")

        rule
                if $typ ne "?"
                        set #oidval = #val($oidtxt)
                fi

        rule    // for exceeding max thresholds
                if $typ eq "+"
                        set $msg = "$dev: $oid $text(#oidval) >= mailval $text(#mailval)"
                        set $pagemsg = "$dev: $oid $text(#oidval) >= pageval $text(#pageval)"
                        // email if greater than or equal to #mailval,
                        // and thereafter only if it is rising
                        if #oidval >= #mailval && (! #defined(%oidval) || #oidval > %oidval)
                                output mail $msg
                                =shout($msg, =always)
                        fi
                        // page if greater than or equal to #pageval,
                        // but only once every hour
                        if #oidval >= #pageval
                                =hourly(=page($pagemsg, =allpager, =always), )
                        fi
                fi

        rule    // for falling short of min thresholds
                if $typ eq "-"
                        set $msg = "$dev: $oid $text(#oidval) <= mailval $text(#mailval)"
                        set $pagemsg = "$dev: $oid $text(#oidval) <= pageval $text(#pageval)"
                        // email if less than or equal to #mailval,
                        // and thereafter only if it is falling
                        if #oidval <= #mailval && (! #defined(%oidval) || #oidval < %oidval)
                                output mail $msg
                                =shout($msg, =always)
                        fi
                        // page if less than or equal to #pageval,
                        // but only once every hour
                        if #oidval <= #pageval
                                =hourly(=page($pagemsg, =allpager, =always), )
                        fi
                fi

        rule    // for error conditions
                if $typ eq "?"
                        set $msg = "$dev: $oid $oidtxt"
                        // email if the reading matches $mailtxt
                        if $oidtxt =~~ $mailtxt
                                output mail $msg
                                =shout($msg, =always)
                        fi
                        // page if the reading matches $pagetxt,
                        // but only once every hour
                        if $oidtxt =~~ $pagetxt
                                =hourly(=page($msg, =allpager, =always), )
                        fi
                fi

///////////////////////////////////////////////////////////////////////////////

In the init section, we use dat statements to assign the six fields in each SNMPLiebert.obj line to the six variables ($dev, ..., $pval) shown. Because we also reference history values (for example, %oidval) later in the script, we specify our unique lookup keys with the statement 'keys $dev $oid'. Both fields are needed: each device reports several OIDs, and the same OID appears for more than one device.
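To make the role of the dat fields and the 'keys $dev $oid' statement concrete, here is a small Python sketch. The ObjRecord type, parse_record(), and the history dictionary are hypothetical stand-ins for what PIKT does internally; the point is only that a previous reading must be looked up by the (device, OID) pair, since liebert-air-1 and liebert-air-2 share the same temperature OID.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class ObjRecord(NamedTuple):
    dev: str   # field 1, like 'dat $dev 1'
    mib: str   # field 2, like 'dat $mib 2'
    oid: str   # field 3, like 'dat $oid 3'
    typ: str   # field 4: "+", "-", or "?"
    mval: str  # field 5: mail threshold or pattern
    pval: str  # field 6: page threshold or pattern


def parse_record(line: str) -> ObjRecord:
    """Split one SNMPLiebert.obj line into its six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return ObjRecord(*fields)


# Hypothetical stand-in for PIKT history: previous readings keyed by
# (dev, oid), the same pair named in 'keys $dev $oid'.
history: Dict[Tuple[str, str], int] = {}


def previous_value(rec: ObjRecord) -> Optional[int]:
    """Return the last stored reading, or None (like '! #defined(%oidval)')."""
    return history.get((rec.dev, rec.oid))


rec = parse_record("liebert-air-1.acme.com LIEBERT_GP_ENV.MIB "
                   "LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 + 75 80")
print(previous_value(rec))        # None: no earlier reading yet
history[(rec.dev, rec.oid)] = 69  # store this run's reading for next time
print(previous_value(rec))        # 69
```

Keying on $dev alone would mix a device's temperature and humidity readings; keying on $oid alone would mix the two air units.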
In the first script rule, we assign the thresholds, that is, the mail and page triggers. The "+" $typ refers to values we are concerned might go too high (such as temperatures). The "-" $typ refers to values we are concerned might go too low (such as battery capacity). The "?" $typ refers to fault or failure states, which are matched as text rather than compared numerically.
For testing purposes, by means of the '#ifndef debug ... #elsedef ... #endifdef' directives, we set up some alternate thresholds and device states, for example a temperature #mailval of 0. When debugging, we are thus assured that any recorded temperature meets or exceeds the artificially low threshold of 0.
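The first rule's choice of triggers, including the debug overrides, amounts to a small lookup. Here is a Python sketch of that logic; the function name thresholds() is hypothetical, and int() is used as an approximation of PIKT's #val().

```python
def thresholds(typ, mval, pval, debug=False):
    """Return the (mail, page) triggers for one record, as the first rule does.

    "+" and "-" records get numeric limits, "?" records get text patterns.
    With debug=True the limits are chosen so that every reading trips them.
    """
    if typ == "+":
        return (0, 0) if debug else (int(mval), int(pval))
    if typ == "-":
        return (1000000, 1000000) if debug else (int(mval), int(pval))
    if typ == "?":
        return (".", ".") if debug else (mval, pval)
    raise ValueError(f"unknown typ {typ!r}")


print(thresholds("+", "75", "80"))              # normal temperature limits
print(thresholds("+", "75", "80", debug=True))  # debug: any non-negative reading alerts
print(thresholds("?", "no", "no"))              # inverter state patterns
```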
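In the second rule, we set an SNMP command string. The -Ovq output options make snmpget print just the value, without the OID name or type, and 2>/dev/null discards any error messages. "=snmpget" and "=snmpmibsdir" are macros defined in macros.cfg as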
snmpget         /usr/local/bin/snmpget
...
snmpmibsdir     /usr/local/share/snmp/mibs
In the third rule, using the $command() function, we issue the snmpget command, i.e., poll the indicated device, and assign the first field of its output (extracted with awk) to the variable $oidtxt.
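For readers more at home outside PIKT, the command built in the second rule and the awk step in the third can be sketched in Python as below. The helper names snmpget_command() and first_field() are hypothetical; the paths are the macro values shown above. Like the original, the command passes no -v or -c options, so the SNMP version and community would have to come from Net-SNMP's own configuration (for example snmp.conf).

```python
import shlex


def snmpget_command(dev, mib, oid,
                    snmpget="/usr/local/bin/snmpget",
                    mibsdir="/usr/local/share/snmp/mibs"):
    """Build the argument list that $snmpcmd expands to (minus 2>/dev/null)."""
    # -m loads the named MIB file; -Ovq prints only the bare value
    return [snmpget, "-m", f"{mibsdir}/{mib}", "-Ovq", dev, oid]


def first_field(output):
    """Return the first whitespace-separated token of snmpget's output.

    For one-line output this matches piping through awk '{print $1}'.
    """
    fields = output.split()
    return fields[0] if fields else ""


cmd = snmpget_command("liebert-power-1.acme.com", "LIEBERT_GP_POWER.MIB",
                      "LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0")
print(shlex.join(cmd))
# To poll for real (needs Net-SNMP and a reachable device), one could run
# subprocess.run(cmd, capture_output=True, text=True) and pass its
# .stdout to first_field().
```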
In the fourth rule, we log the reading using the =output_alarm_log macro. Resulting log output, in the file =logdir/SNMPLiebert.log, might look something like:
...
Jun 14 05:00:02 hamburg pikt[31621]: [ID 1 INFO] liebert-air-1.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 69
Jun 14 05:00:03 hamburg pikt[31621]: [ID 1 INFO] liebert-air-2.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 78
Jun 14 05:00:04 hamburg pikt[31621]: [ID 1 INFO] liebert-air-1.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1 35
Jun 14 05:00:05 hamburg pikt[31621]: [ID 1 INFO] liebert-air-2.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1 48
...
Jun 14 05:00:06 hamburg pikt[31621]: [ID 1 INFO] liebert-power-1.acme.com: LIEBERT-GP-POWER-MIB::lgpPwrOutputToLoadOnInverter.0 yes
Jun 14 05:00:10 hamburg pikt[31621]: [ID 1 INFO] liebert-power-1.acme.com: LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0 84
...

We could scope out the temperature history of the liebert-air-1 device using the command:
# egrep liebert-air-1 /pikt/var/log/SNMPLiebert.log | egrep -i temp

In the script's fifth rule, for the "+" and "-" value types, we convert the $oidtxt string to the numerical value #oidval.
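The egrep pipeline, followed by the kind of string-to-number conversion the fifth rule performs, can be mimicked in a few lines of Python. This is only a sketch: temperature_history() is a hypothetical helper, it assumes the reading is the last token on each log line (as in the sample log above), and the three input lines are copied from that sample.

```python
import re


def temperature_history(log_lines, device):
    """Yield one device's logged temperature readings as integers.

    Mimics "egrep <device> SNMPLiebert.log | egrep -i temp", then converts
    the last token of each matching line to a number, much as the fifth
    rule converts $oidtxt to #oidval.
    """
    for line in log_lines:
        if re.search(device, line) and re.search("temp", line, re.IGNORECASE):
            yield int(line.split()[-1])


log = [
    "Jun 14 05:00:02 hamburg pikt[31621]: [ID 1 INFO] liebert-air-1.acme.com: "
    "LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 69",
    "Jun 14 05:00:03 hamburg pikt[31621]: [ID 1 INFO] liebert-air-2.acme.com: "
    "LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 78",
    "Jun 14 05:00:04 hamburg pikt[31621]: [ID 1 INFO] liebert-air-1.acme.com: "
    "LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1 35",
]
print(list(temperature_history(log, "liebert-air-1")))  # [69]
```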
In the sixth rule, we test against the max thresholds. If the measured value meets or exceeds the mail threshold, we send an e-mail message reporting the fault condition; thereafter, we send mail again only while the measured value keeps rising.
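The "mail once, then only while rising" condition is easy to misread, so here it is restated as a Python sketch. The function names are hypothetical, previous=None stands in for an undefined %oidval, and the sequence of readings is made up for illustration.

```python
from typing import Optional


def should_mail_high(value: int, mailval: int, previous: Optional[int]) -> bool:
    """Sixth-rule mail test for a "+" record.

    Mail when the reading is at or above the mail threshold, and either
    there is no earlier reading (previous is None, like
    '! #defined(%oidval)') or the reading has risen since last time.
    """
    return value >= mailval and (previous is None or value > previous)


def should_page_high(value: int, pageval: int) -> bool:
    """Sixth-rule page test; the once-an-hour limit is applied separately."""
    return value >= pageval


# A made-up temperature that climbs to 83, holds, then climbs again (mailval 75).
readings = [70, 83, 83, 84]
previous = None
for value in readings:
    print(value, should_mail_high(value, 75, previous))
    previous = value
```

A steady 83 therefore produces one mail, not one per polling cycle, while a further rise to 84 produces another.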
In case we are away from our mail reader (doing other work in other virtual screens or terminal windows), we also wall the piktmaster system using the =shout() macro:
shout(M, H)
#if piktmaster
        if (H)
                doexec wait "=echo '(M)' | =wall"
        fi
#else
        =piktnullchar
#endif

(We could easily change the #if ... #endif in =shout() to wall systems in addition to, or other than, the piktmaster.)
If the measured value meets or exceeds the page threshold, we also send a page, but at most once every hour, using the macros
hourly(A1, A2)
        set #tv60 = #now()
        if ! #defined(%tv60) || (#tv60 - %tv60 >= 60*60 - =driftfactor)
                (A1)
        else
                (A2)
                set #tv60 = %tv60
        fi

...

page(M, R, H)   // send a page message (M) to recipients (pager phone alias)
                // (R) but only during hours (H), and only if =etcdir/nopage
                // and /tmp/nopage don't exist (i.e., 'touch /pikt/etc/nopage'
                // or 'touch /tmp/nopage' will block all paging)
                // sample use: =page($host is sick/down, =pagesysadmins, =allhours(#now()))
        if ! -e "=etcdir/nopage" && ! -e "/tmp/nopage"
                if (H)
                        doexec wait "=echo '(M)' | =mailx -a 'From: piktadmin' -s '(M)' (R)"
#if piktmaster
                        =shout((M), =always)
#endif
                fi
        fi

...

#ifdef debug
allpager        =piktadmin      // brahms\
#elsedef
allpager        emergency\
#endifdef

...

always          #true()

Note how we also =shout() within the =page() macro, but only to the piktmaster system. (Here, too, we could easily change the #if ... #endif around the =shout() call to wall additional or other systems.)
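The =hourly() macro is the subtlest logic on this page: when it does not fire, it restores the old timestamp, so the hour is always measured from the last page actually sent. A Python sketch of that behavior follows. Here last plays the role of %tv60 and drift the role of =driftfactor; the drift value of 60 seconds used below is a hypothetical choice, since the real =driftfactor setting is not shown. Presumably the allowance exists so that a scheduled run landing a few seconds short of the hour still pages.

```python
from typing import Callable, Optional


def hourly(now: float, last: Optional[float], drift: float,
           action: Callable[[], None],
           otherwise: Callable[[], None] = lambda: None) -> float:
    """Run action at most about once an hour; return the timestamp to keep.

    Firing returns 'now' (the new #tv60); not firing returns 'last',
    mirroring 'set #tv60 = %tv60'.
    """
    if last is None or now - last >= 60 * 60 - drift:
        action()
        return now
    otherwise()
    return last


pages = []


def send_page():
    pages.append("page")  # stand-in for =page(...)


tv60 = hourly(0, None, 60, send_page)     # first alarm: page sent
tv60 = hourly(1800, tv60, 60, send_page)  # 30 minutes later: suppressed
tv60 = hourly(3550, tv60, 60, send_page)  # within the drift allowance of an hour: sent
print(pages, tv60)
```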
In the seventh rule, we likewise test against some min thresholds.
In the eighth and final rule, we pattern match the reading against the specified error strings, for example, to detect when the present source of output power is not the inverter (lgpPwrOutputToLoadOnInverter.0 returns "no"). (When in debug mode, we pattern match against the "." wildcard, which matches any non-empty reading.)
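A Python sketch of the eighth rule's test is shown below. It rests on an assumption that should be checked against the PIKT documentation: that =~~ behaves like a case-insensitive regular-expression search. The function name pattern_alarm() is hypothetical.

```python
import re


def pattern_alarm(oidtxt: str, pattern: str) -> bool:
    """Eighth-rule test for a "?" record.

    Assumption: '=~~' is treated here as a case-insensitive regular
    expression search over the reading.
    """
    return re.search(pattern, oidtxt, re.IGNORECASE) is not None


print(pattern_alarm("yes", "no"))  # inverter is the source: no alarm
print(pattern_alarm("no", "no"))   # output not on inverter: alarm
print(pattern_alarm("yes", "."))   # debug pattern: any non-empty reading alarms
```

Because such a search is unanchored, the pattern "no" would also match a reading like "unknown"; under the same assumption, an anchored pattern such as ^no$ would be stricter.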
Here is a sample mail alert from SNMPLiebert:
                        PIKT ALERT
                 Thu Jun 14 02:30:08 2007
                         hamburg

EMERGENCY:
    SNMPLiebert
        Report worrisome Liebert SNMP conditions

        liebert-air-1.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 83 >= mailval 75

And here is a sample page message:
liebert-air-1.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 83 >= pageval 80

Since this script employs paging, we might want to use all the safeguards and precautions described in Script Development and Testing to ensure we don't accidentally page co-workers during the script development and testing process.
There's much more we could do with PIKT and SNMP, including writing scripts to control devices, not just report error and fault conditions. Whatever the task, PIKT gives us a powerful whip for taming the SNMP beast.