SNMP Monitoring
(NOTE: Some of the techniques shown or described on this page--marked in purple--require new features in the latest official PIKT 1.19.0 release (pikt-current.tar.gz) that are unavailable in any previous version.)
SNMP (Simple Network Management Protocol) is a widely used protocol for monitoring networked devices, such as computers, routers, printers, and even things such as UPSes. Following is a discussion of how to integrate PIKT with SNMP, for example using PIKT to monitor the health of machine room air conditioning and power units.
In this example, we will be using Net-SNMP, an Open Source SNMP toolkit. Refer to the Net-SNMP Home Page for more information.
We also make reference to several products from the Liebert Corporation, a division of Emerson Network Power. In particular, we will use PIKT to monitor Liebert air conditioners and UPSes.
First, we write a PIKT objects.cfg file, to specify the devices, their associated MIBs (Management Information Bases), OIDs (Object Identifiers) of special interest, device states, and threshold values (maximums and minimums). (Note that, for display purposes, we are forced to linewrap lines such as '// device ... pval' and 'liebert-air-1.acme.com ... 75 80'. In reality, in the actual snmp_liebert_objects.cfg file, those would be very wide unbroken lines.)
///////////////////////////////////////////////////////////////////////////////
//
// snmp_liebert_objects.cfg
//
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
#indent
///////////////////////////////////////////////////////////////////////////////
// device mib
oid
typ mval pval
///////////////////////////////////////////////////////////////////////////////
// lgpEnvTemperatureMeasurementDegF OBJECT-TYPE
// SYNTAX Integer32
// UNITS "degrees Fahrenheit"
// MAX-ACCESS read-only
// STATUS current
// DESCRIPTION
// "The measured temperature value."
// ::= { lgpEnvTemperatureEntryDegF 3 }
//
// /usr/local/bin/snmpget -M /usr/local/share/snmp/mibs -m ALL
// liebert-air-1.acme.com 1.3.6.1.4.1.476.1.42.3.4.1.2.3.1.3.1
liebert-air-1.acme.com LIEBERT_GP_ENV.MIB
LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1
+ 75 80
liebert-air-2.acme.com LIEBERT_GP_ENV.MIB
LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1
+ 90 100
///////////////////////////////////////////////////////////////////////////////
// lgpEnvHumidityMeasurementRel OBJECT-TYPE
// SYNTAX Integer32
// UNITS "percent Relative Humidity"
// MAX-ACCESS read-only
// STATUS current
// DESCRIPTION
// "The measured humidity value."
// ::= { lgpEnvHumidityEntryRel 3 }
liebert-air-1.acme.com LIEBERT_GP_ENV.MIB
LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1
+ 60 80
liebert-air-2.acme.com LIEBERT_GP_ENV.MIB
LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1
+ 80 100
///////////////////////////////////////////////////////////////////////////////
...
///////////////////////////////////////////////////////////////////////////////
// lgpPwrOutputToLoadOnInverter OBJECT-TYPE
// SYNTAX INTEGER { yes(1), no(2)}
// MAX-ACCESS read-only
// STATUS current
// DESCRIPTION
// "The present source of output power is the Inverter."
// ::= { lgpPwrStatus 7 }
//
// /usr/local/bin/snmpget -M /usr/local/share/snmp/mibs -m ALL
// liebert-power-1.acme.com 1.3.6.1.4.1.476.1.42.3.5.3.7.0
liebert-power-1.acme.com LIEBERT_GP_POWER.MIB
LIEBERT-GP-POWER-MIB::lgpPwrOutputToLoadOnInverter.0
? no no
///////////////////////////////////////////////////////////////////////////////
// lgpPwrBatteryCapacity OBJECT-TYPE
// SYNTAX Integer32
// UNITS "percent"
// MAX-ACCESS read-only
// STATUS current
// DESCRIPTION
// "The present percentage of battery capacity."
// ::= { lgpPwrBattery 19 }
//
// /usr/local/bin/snmpget -M /usr/local/share/snmp/mibs -m ALL
// liebert-power-1.acme.com 1.3.6.1.4.1.476.1.42.3.5.1.19.0
liebert-power-1.acme.com LIEBERT_GP_POWER.MIB
LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0
- 60 40
///////////////////////////////////////////////////////////////////////////////
#unindent
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
Most of snmp_liebert_objects.cfg is comments. The only lines of real consequence are the uncommented lines beginning with device names, such as 'liebert-power-1.acme.com'.
snmp_liebert_objects.cfg is an #include file, referenced in objects.cfg as:
#if snmpmaster
SNMPLiebert
#include <objects/snmp_liebert_objects.cfg>
#endif // snmpmaster
where snmpmaster is defined in systems.cfg as the piktmaster system.
We would install the SNMPLiebert.obj file on the snmpmaster using the piktc command
# piktc -iv +O SNMPLiebert +H snmpmaster
The resulting SNMPLiebert.obj file would look like
liebert-air-1.acme.com LIEBERT_GP_ENV.MIB
LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1
+ 75 80
liebert-air-2.acme.com LIEBERT_GP_ENV.MIB
LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1
+ 90 100
...
liebert-power-1.acme.com LIEBERT_GP_POWER.MIB
LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0
- 60 40
that is, just the 'liebert...' lines and no comment lines. (Again, the displayed lines in reality would be unbroken.)
Next, we show the monitoring script, SNMPLiebert:
///////////////////////////////////////////////////////////////////////////////
//
// liebert_alarms.cfg
//
///////////////////////////////////////////////////////////////////////////////
SNMPLiebert
init
status =piktstatus
level =piktlevel
task "Report worrisome Liebert SNMP conditions"
input file "=objdir/SNMPLiebert.obj"
dat $dev 1
dat $mib 2
dat $oid 3
dat $typ 4
dat $mval 5
dat $pval 6
keys $dev $oid
rule
switch $typ
case "+"
#ifndef debug
set #mailval = #val($mval)
set #pageval = #val($pval)
#elsedef
set #mailval = 0
set #pageval = 0
#endifdef
breaksw
case "-"
#ifndef debug
set #mailval = #val($mval)
set #pageval = #val($pval)
#elsedef
set #mailval = 1000000
set #pageval = 1000000
#endifdef
breaksw
case "?"
#ifndef debug
set $mailtxt = $mval
set $pagetxt = $pval
#elsedef
set $mailtxt = "."
set $pagetxt = "."
#endifdef
breaksw
endsw
rule
set $snmpcmd = "=snmpget -m '=snmpmibsdir/$mib' -Ovq $dev $oid 2>/dev/null"
#ifdef debug
output $snmpcmd
#endifdef
rule
set $oidtxt = $command("$snmpcmd | =awk '{print \$1}'")
#ifdef debug
output $oidtxt
#endifdef
rule // log the reading
=output_alarm_log("$dev: $oid $oidtxt")
rule
if $typ ne "?"
set #oidval = #val($oidtxt)
fi
rule // for exceeding max thresholds
if $typ eq "+"
set $msg = "$dev: $oid $text(#oidval) >= mailval $text(#mailval)"
// email if greater than or equal to #mailval,
// and thereafter only if is rising
if #oidval >= #mailval
&& (! #defined(%oidval) || #oidval > %oidval )
output mail $msg
=shout($msg, =always)
fi
// page if greater than or equal to #pageval,
// but only once every hour
if #oidval >= #pageval
=hourly(=page($msg, =allpager, =always), )
fi
fi
rule // for falling short of min thresholds
if $typ eq "-"
set $msg = "$dev: $oid $text(#oidval) <= mailval $text(#mailval)"
// email if lesser than or equal to #mailval,
// and thereafter only if is falling
if #oidval <= #mailval
&& (! #defined(%oidval) || #oidval < %oidval )
output mail $msg
=shout($msg, =always)
fi
// page if lesser than or equal to #pageval,
// but only once every hour
if #oidval <= #pageval
=hourly(=page($msg, =allpager, =always), )
fi
fi
rule // for error conditions
if $typ eq "?"
set $msg = "$dev: $oid $oidtxt"
// email if $mailtxt
if $oidtxt =~~ $mailtxt
output mail $msg
=shout($msg, =always)
fi
// page if $pagetxt,
// but only once every hour
if $oidtxt =~~ $pagetxt
=hourly(=page($msg, =allpager, =always), )
fi
fi
///////////////////////////////////////////////////////////////////////////////
In the init section, we use dat statements to assign the six fields in each SNMPLiebert.obj line to the six variables ($dev, ..., $pval) shown. Because we also reference history values (for example, %oidval) later on in the script, we specify our unique lookup keys with the statement 'keys $dev $oid'.
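As a rough illustration of what the dat and keys statements accomplish, the following Python sketch splits one unbroken SNMPLiebert.obj line into the same six fields and builds a (device, OID) lookup key. The names ObjEntry and parse_obj_line are hypothetical and are not part of PIKT; the sketch also assumes the fields are separated by whitespace.

```python
from typing import NamedTuple, Tuple


class ObjEntry(NamedTuple):
    """One SNMPLiebert.obj line, mirroring the six dat fields (hypothetical type)."""
    dev: str
    mib: str
    oid: str
    typ: str
    mval: str
    pval: str

    def key(self) -> Tuple[str, str]:
        # Corresponds to 'keys $dev $oid': the pair that identifies a reading
        # when looking up its value from the previous run.
        return (self.dev, self.oid)


def parse_obj_line(line: str) -> ObjEntry:
    """Split a whitespace-separated object line into its six fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return ObjEntry(*fields)


# In the real file this is one very wide unbroken line.
line = ("liebert-power-1.acme.com LIEBERT_GP_POWER.MIB "
        "LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0 - 60 40")
entry = parse_obj_line(line)
print(entry.typ, entry.mval, entry.pval)
print(entry.key())
```

Keeping the thresholds as strings at this stage matches the script, which only converts them to numbers for the "+" and "-" types.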
In the first script rule, we assign some thresholds, or report and page triggers. The "+" $typ refers to values we are concerned might go too high (such as temperatures). The "-" $typ refers to values we are concerned might go too low (such as battery capacity). The "?" $typ refers to fault or failure states.
For testing purposes, by means of the '#ifndef debug ... #elsedef ... #endifdef' directives, we set up some alternate thresholds and device states, for example a temperature #mailval of 0. When debugging, any recorded temperature is then certain to exceed the artificially low threshold of 0.
In the second rule, we set an SNMP command string. "=snmpget" and "=snmpmibsdir" are macros defined in macros.cfg as
snmpget /usr/local/bin/snmpget
...
snmpmibsdir /usr/local/share/snmp/mibs
In the third rule, using the $command() function, we issue the snmpget command, i.e., poll the indicated device, and assign its return value to the variable $oidtxt.
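For readers more at home in a general-purpose language, the second and third rules can be approximated in Python as below. This is only a sketch: the function names are hypothetical, the two paths are the macro values quoted above, and the snmpget options (-m, -Ovq) are simply those used in the script. The poll() function is defined but not called here, since calling it requires Net-SNMP and a reachable device.

```python
import subprocess
from typing import List

SNMPGET = "/usr/local/bin/snmpget"          # value of the =snmpget macro
SNMPMIBSDIR = "/usr/local/share/snmp/mibs"  # value of the =snmpmibsdir macro


def build_snmpget_argv(dev: str, mib: str, oid: str) -> List[str]:
    """Assemble the same command the script stores in $snmpcmd."""
    return [SNMPGET, "-m", f"{SNMPMIBSDIR}/{mib}", "-Ovq", dev, oid]


def first_word(output: str) -> str:
    """Equivalent of piping through awk '{print $1}' for single-line output."""
    words = output.split()
    return words[0] if words else ""


def poll(dev: str, mib: str, oid: str) -> str:
    """Run snmpget, discard stderr (like 2>/dev/null), return the first word."""
    result = subprocess.run(
        build_snmpget_argv(dev, mib, oid),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    return first_word(result.stdout)


argv = build_snmpget_argv(
    "liebert-air-1.acme.com",
    "LIEBERT_GP_ENV.MIB",
    "LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1",
)
print(" ".join(argv))
print(first_word("69 degrees\n"))
```

An empty string from first_word() corresponds to a poll that produced no output, for example when the device is unreachable.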
In the fourth rule, we log the reading using the =output_alarm_log macro. Resulting log output, in the file =logdir/SNMPLiebert.log, might look something like:
...
Jun 14 05:00:02 hamburg pikt[31621]: [ID 1 INFO] liebert-air-1.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 69
Jun 14 05:00:03 hamburg pikt[31621]: [ID 1 INFO] liebert-air-2.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 78
Jun 14 05:00:04 hamburg pikt[31621]: [ID 1 INFO] liebert-air-1.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1 35
Jun 14 05:00:05 hamburg pikt[31621]: [ID 1 INFO] liebert-air-2.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1 48
...
Jun 14 05:00:06 hamburg pikt[31621]: [ID 1 INFO] liebert-power-1.acme.com: LIEBERT-GP-POWER-MIB::lgpPwrOutputToLoadOnInverter.0 yes
Jun 14 05:00:10 hamburg pikt[31621]: [ID 1 INFO] liebert-power-1.acme.com: LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0 84
...
We could scope out the temperature history of the liebert-air-1 device using the command,
# egrep liebert-air-1 /pikt/var/log/SNMPLiebert.log | egrep -i temp
In the script's fifth rule, for the "+" and "-" value types, we convert the $oidtxt string to a numerical #oidval value.
In the sixth rule, we test against the maximum thresholds. If the measured value reaches or exceeds the mail threshold, we send an e-mail message reporting the fault condition; on subsequent runs we send another message only if the measured value is still rising.
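The "mail at the threshold, then only while rising" test can be restated outside PIKT as a small Python sketch. The function name should_mail_max is hypothetical, and prev stands in for the history value %oidval (None when no earlier reading exists for that device and OID).

```python
from typing import List, Optional


def should_mail_max(oidval: float, mailval: float, prev: Optional[float]) -> bool:
    """Mirror of: #oidval >= #mailval && (! #defined(%oidval) || #oidval > %oidval)."""
    return oidval >= mailval and (prev is None or oidval > prev)


def mail_decisions(readings: List[float], mailval: float) -> List[bool]:
    """Apply the test to successive readings, carrying the previous value along."""
    decisions = []
    prev: Optional[float] = None
    for value in readings:
        decisions.append(should_mail_max(value, mailval, prev))
        prev = value
    return decisions


# Temperatures over five runs against a mail threshold of 75.
print(mail_decisions([74, 76, 76, 78, 77], 75))
# [False, True, False, True, False]
```

A reading that stays flat or falls back while still above the threshold produces no further mail, which keeps a sustained fault from flooding the inbox. The minimum-threshold rule is the mirror image, with <= and < in place of >= and >.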
Just in case we are distracted away from our mail reader (doing other work in other virtual screens or other terminal windows), we also wall the piktmaster system using the =shout() macro:
shout(M, H)
#if piktmaster
if (H)
doexec wait "=echo '(M)' | =wall"
fi
#else
=piktnullchar
#endif
(We could easily change the #if ... #endif in =shout() so that it walls systems in addition to, or other than, the piktmaster.)
If the page threshold is reached or exceeded, we also send a page, but only once every hour, using the macros
hourly(A1, A2)
set #tv60 = #now()
if ! #defined(%tv60) || (#tv60 - %tv60 >= 60*60 - =driftfactor)
(A1)
else
(A2)
set #tv60 = %tv60
fi
...
page(M, R, H) // send a page message (M) to recipients (pager phone alias)
// (R) but only during hours (H), and only if =etcdir/nopage
// and tmp/nopage don't exist (i.e., 'touch /pikt/etc/nopage'
// or 'touch /tmp/nopage' will block all paging)
// sample use: =page($host is sick/down, =pagesysadmins, =allhours(#now()))
if ! -e "=etcdir/nopage" && ! -e "/tmp/nopage"
if (H)
doexec wait "=echo '(M)' | =mailx -a 'From: piktadmin' -s '(M)' (R)"
#if piktmaster
=shout((M), =always)
#endif
fi
fi
...
#ifdef debug
allpager =piktadmin // brahms\@acme.com
#elsedef
allpager emergency\@acme.com
#endifdef
...
always #true()
Note how we also =shout() within the =page() macro, but only to the piktmaster system. (Here too, we could easily change the #if ... #endif so that it walls additional or other systems.)
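The throttling done by the =hourly() macro can be pictured with the following Python sketch. The class HourlyGate is hypothetical, and the drift allowance of 60 seconds is an assumed value for illustration only; the actual =driftfactor setting is not shown on this page.

```python
from typing import Optional


class HourlyGate:
    """Allow an action at most once per hour, minus a small drift allowance."""

    def __init__(self, drift: int = 60) -> None:
        self.drift = drift                      # assumed stand-in for =driftfactor
        self.last_fired: Optional[int] = None   # plays the role of %tv60

    def ready(self, now: int) -> bool:
        """Return True (and remember 'now') if an hour has effectively passed."""
        if self.last_fired is None or now - self.last_fired >= 60 * 60 - self.drift:
            self.last_fired = now
            return True
        # Otherwise keep the old timestamp, as 'set #tv60 = %tv60' does.
        return False


gate = HourlyGate(drift=60)
for now in (0, 1800, 3540, 3600):
    print(now, gate.ready(now))
# 0 True
# 1800 False
# 3540 True
# 3600 False
```

Carrying the old timestamp forward when the action is suppressed is the important detail: without it, every run would reset the clock and the action would never fire again. The drift allowance lets a run that starts a few seconds early still count as "an hour later".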
In the seventh rule, we likewise test against some min thresholds.
In the eighth and final rule, we pattern match against specified error strings, for example, a "no" reply to whether the present source of output power is the inverter. (When in debug mode, we pattern match against the "." wildcard, which always matches.)
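The effect of the pattern test can be tried out in Python, assuming that the =~~ operator behaves like an unanchored regular-expression search (an assumption; consult the PIKT reference for its exact semantics). The helper name matches is hypothetical.

```python
import re


def matches(oidtxt: str, pattern: str) -> bool:
    """Unanchored regular-expression search of the polled text."""
    return re.search(pattern, oidtxt) is not None


print(matches("no", "no"))          # True: the fault state in the objects file
print(matches("yes", "no"))         # False: normal state, no alert
print(matches("yes", "."))          # True: debug wildcard matches any non-empty text
print(matches("unknown", "no"))     # True: unanchored patterns match substrings
print(matches("unknown", "^no$"))   # False: anchoring restricts to the exact word
```

If the operator is indeed unanchored, a short state name such as "no" would also match longer replies that merely contain it, so anchoring the pattern in the objects file may be worth considering.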
Here is a sample mail alert from SNMPLiebert:
PIKT ALERT
Thu Jun 14 02:30:08 2007
hamburg
EMERGENCY:
SNMPLiebert
Report worrisome Liebert SNMP conditions
liebert-air-1.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 83
>= mailval 75
And here is a sample page message:
liebert-air-1.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 83 >= pageval 80
Since this script employs paging, we might want to use all the safeguards and precautions described in Script Development and Testing to ensure we don't accidentally page co-workers during the script development and testing process.
There's much more we could do with PIKT and SNMP, including writing scripts to control devices, not just report error and fault conditions. In any case, with PIKT we have a powerful whip for taming the SNMP beast.