SNMP Monitoring

(NOTE:  Some of the techniques shown or described on this page--marked in purple--require new features in the latest official PIKT 1.19.0 release (pikt-current.tar.gz) that are unavailable in any previous version.)

SNMP (Simple Network Management Protocol) is a widely used protocol for monitoring networked devices, such as computers, routers, printers, and even UPSes.  Following is a discussion of how to integrate PIKT with SNMP, for example using PIKT to monitor the health of machine room air conditioning and power units.

In this example, we will be using Net-SNMP, an Open Source SNMP toolkit.  Refer to the Net-SNMP Home Page for more information.

We also make reference to several products from the Liebert Corporation, a division of Emerson Network Power.  In particular, we will use PIKT to monitor Liebert air conditioners and UPSes.

First, we write a PIKT objects.cfg file, to specify the devices, their associated MIBs (Management Information Bases), OIDs (Object Identifiers) of special interest, device states, and threshold values (maximums and minimums).  (Note that, for display purposes, we are forced to linewrap lines such as '// device ... pval' and 'liebert-air-1.acme.com ... 75 80'.  In reality, in the actual snmp_liebert_objects.cfg file, those would be very wide unbroken lines.)

///////////////////////////////////////////////////////////////////////////////
//
// snmp_liebert_objects.cfg
//
///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////

#indent

///////////////////////////////////////////////////////////////////////////////

// device                               mib
        oid
        typ             mval            pval

///////////////////////////////////////////////////////////////////////////////

// lgpEnvTemperatureMeasurementDegF OBJECT-TYPE
//    SYNTAX      Integer32
//    UNITS       "degrees Fahrenheit"
//    MAX-ACCESS  read-only
//    STATUS      current
//    DESCRIPTION
//        "The measured temperature value."
//    ::= { lgpEnvTemperatureEntryDegF 3 }
//
// /usr/local/bin/snmpget -M /usr/local/share/snmp/mibs -m ALL
//    liebert-air-1.acme.com 1.3.6.1.4.1.476.1.42.3.4.1.2.3.1.3.1

liebert-air-1.acme.com            LIEBERT_GP_ENV.MIB
        LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1
        +               75              80
liebert-air-2.acme.com            LIEBERT_GP_ENV.MIB
        LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1
        +               90              100

///////////////////////////////////////////////////////////////////////////////

// lgpEnvHumidityMeasurementRel OBJECT-TYPE
//    SYNTAX      Integer32
//    UNITS       "percent Relative Humidity"
//    MAX-ACCESS  read-only
//    STATUS      current
//    DESCRIPTION
//        "The measured humidity value."
//    ::= { lgpEnvHumidityEntryRel 3 }

liebert-air-1.acme.com            LIEBERT_GP_ENV.MIB
        LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1
        +               60              80
liebert-air-2.acme.com            LIEBERT_GP_ENV.MIB
        LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1
        +               80              100

///////////////////////////////////////////////////////////////////////////////

...

///////////////////////////////////////////////////////////////////////////////

// lgpPwrOutputToLoadOnInverter OBJECT-TYPE
//    SYNTAX      INTEGER { yes(1), no(2)}
//    MAX-ACCESS  read-only
//    STATUS      current
//    DESCRIPTION
//        "The present source of output power is the Inverter."
//    ::= { lgpPwrStatus 7 }
//
// /usr/local/bin/snmpget -M /usr/local/share/snmp/mibs -m ALL
//    liebert-power-1.acme.com 1.3.6.1.4.1.476.1.42.3.5.3.7.0

liebert-power-1.acme.com          LIEBERT_GP_POWER.MIB
        LIEBERT-GP-POWER-MIB::lgpPwrOutputToLoadOnInverter.0
        ?               no              no

///////////////////////////////////////////////////////////////////////////////

// lgpPwrBatteryCapacity OBJECT-TYPE
//    SYNTAX      Integer32
//    UNITS       "percent"
//    MAX-ACCESS  read-only
//    STATUS      current
//    DESCRIPTION
//        "The present percentage of battery capacity."
//    ::= { lgpPwrBattery 19 }
//
// /usr/local/bin/snmpget -M /usr/local/share/snmp/mibs -m ALL
//    liebert-power-1.acme.com 1.3.6.1.4.1.476.1.42.3.5.1.19.0

liebert-power-1.acme.com          LIEBERT_GP_POWER.MIB
        LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0
        -               60              40

///////////////////////////////////////////////////////////////////////////////

#unindent

///////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////
Most of snmp_liebert_objects.cfg is comments.  The only lines of real consequence are the uncommented lines beginning with device names, such as 'liebert-power-1.acme.com'.

snmp_liebert_objects.cfg is an #include file, referenced in objects.cfg as:

#if snmpmaster

SNMPLiebert

#include <objects/snmp_liebert_objects.cfg>

#endif  // snmpmaster
where snmpmaster is defined in systems.cfg as the piktmaster system.

We would install the SNMPLiebert.obj file on the snmpmaster using the piktc command:

# piktc -iv +O SNMPLiebert +H snmpmaster
The resulting SNMPLiebert.obj file would look like:
liebert-air-1.acme.com            LIEBERT_GP_ENV.MIB
        LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1
        +               75              80
liebert-air-2.acme.com            LIEBERT_GP_ENV.MIB
        LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1
        +               90              100
...
liebert-power-1.acme.com          LIEBERT_GP_POWER.MIB
        LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0
        -               60              40
that is, just the 'liebert...' lines and no comment lines.  (Again, the displayed lines in reality would be unbroken.)

Next, we show the monitoring script, SNMPLiebert:

///////////////////////////////////////////////////////////////////////////////
//
// liebert_alarms.cfg
//
///////////////////////////////////////////////////////////////////////////////

SNMPLiebert

        init
                status =piktstatus
                level =piktlevel
                task "Report worrisome Liebert SNMP conditions"
                input file "=objdir/SNMPLiebert.obj"
                dat $dev  1
                dat $mib  2
                dat $oid  3
                dat $typ  4
                dat $mval 5
                dat $pval 6
                keys $dev $oid

        rule
                switch $typ
                        case "+"
#ifndef debug
                                set #mailval = #val($mval)
                                set #pageval = #val($pval)
#elsedef
                                set #mailval = 0
                                set #pageval = 0
#endifdef
                                breaksw
                        case "-"
#ifndef debug
                                set #mailval = #val($mval)
                                set #pageval = #val($pval)
#elsedef
                                set #mailval = 1000000
                                set #pageval = 1000000
#endifdef
                                breaksw
                        case "?"
#ifndef debug
                                set $mailtxt = $mval
                                set $pagetxt = $pval
#elsedef
                                set $mailtxt = "."
                                set $pagetxt = "."
#endifdef
                                breaksw
                endsw

        rule
                set $snmpcmd = "=snmpget -m '=snmpmibsdir/$mib' -Ovq $dev $oid 2>/dev/null"
#ifdef debug
                output $snmpcmd
#endifdef

        rule
                set $oidtxt = $command("$snmpcmd | =awk '{print \$1}'")
#ifdef debug
                output $oidtxt
#endifdef

        rule    // log the reading
                =output_alarm_log("$dev: $oid $oidtxt")

        rule
                if $typ ne "?"
                        set #oidval = #val($oidtxt)
                fi

        rule    // for exceeding max thresholds
                if $typ eq "+"
                        set $msg = "$dev: $oid $text(#oidval) >= mailval $text(#mailval)"
                        // email if greater than or equal to #mailval,
                        // and thereafter only if it is rising
                        if    #oidval >= #mailval
                           && (! #defined(%oidval) || #oidval > %oidval )
                                output mail $msg
                                =shout($msg, =always)
                        fi
                        // page if greater than or equal to #pageval,
                        // but only once every hour
                        if    #oidval >= #pageval
                                set $msg = "$dev: $oid $text(#oidval) >= pageval $text(#pageval)"
                                =hourly(=page($msg, =allpager, =always), )
                        fi
                fi

        rule    // for falling short of min thresholds
                if $typ eq "-"
                        set $msg = "$dev: $oid $text(#oidval) <= mailval $text(#mailval)"
                        // email if less than or equal to #mailval,
                        // and thereafter only if it is falling
                        if    #oidval <= #mailval
                           && (! #defined(%oidval) || #oidval < %oidval )
                                output mail $msg
                                =shout($msg, =always)
                        fi
                        // page if less than or equal to #pageval,
                        // but only once every hour
                        if    #oidval <= #pageval
                                set $msg = "$dev: $oid $text(#oidval) <= pageval $text(#pageval)"
                                =hourly(=page($msg, =allpager, =always), )
                        fi
                fi

        rule    // for error conditions
                if $typ eq "?"
                        set $msg = "$dev: $oid $oidtxt"
                        // email if $mailtxt
                        if $oidtxt =~~ $mailtxt
                                output mail $msg
                                =shout($msg, =always)
                        fi
                        // page if $pagetxt,
                        // but only once every hour
                        if $oidtxt =~~ $pagetxt
                                =hourly(=page($msg, =allpager, =always), )
                        fi
                fi

///////////////////////////////////////////////////////////////////////////////
In the init section, we use dat statements to assign the six fields in each SNMPLiebert.obj line to the six variables ($dev, ..., $pval) shown.  Because we also reference history values (for example, %oidval) later on in the script, we specify our unique lookup keys with the statement 'keys $dev $oid'.
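For readers more at home in a general-purpose language, here is a rough Python analogue of what the dat and keys statements accomplish.  It assumes, as the displayed layout suggests, that each record in SNMPLiebert.obj is one unbroken line of six whitespace-separated fields; the names ObjRecord and parse_obj are illustrative only and are not part of PIKT.

```python
from typing import Dict, NamedTuple, Tuple


class ObjRecord(NamedTuple):
    """One SNMPLiebert.obj line, field for field (dat $dev 1 ... dat $pval 6)."""
    dev: str   # device hostname
    mib: str   # MIB file name, relative to the MIB directory
    oid: str   # symbolic OID to poll
    typ: str   # "+" (max), "-" (min), or "?" (state pattern)
    mval: str  # mail threshold or mail pattern
    pval: str  # page threshold or page pattern


def parse_obj(text: str) -> Dict[Tuple[str, str], ObjRecord]:
    """Parse .obj text into records keyed by (dev, oid), like 'keys $dev $oid'."""
    records: Dict[Tuple[str, str], ObjRecord] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 6:
            raise ValueError(f"line {lineno}: expected 6 fields, got {len(fields)}")
        rec = ObjRecord(*fields)
        records[(rec.dev, rec.oid)] = rec
    return records


if __name__ == "__main__":
    sample = (
        "liebert-air-1.acme.com LIEBERT_GP_ENV.MIB "
        "LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 + 75 80\n"
    )
    for key, rec in parse_obj(sample).items():
        print(key, rec.typ, rec.mval, rec.pval)
```

The composite (dev, oid) key matters because the same OID appears for several devices and each device has several OIDs, so a history value such as %oidval has to be tracked per pair.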

In the first script rule, we assign the thresholds, i.e., the mail and page triggers.  The "+" $typ refers to values we are concerned might go too high (such as temperatures).  The "-" $typ refers to values we are concerned might go too low (such as battery capacity).  The "?" $typ refers to fault or failure states, which are matched as text patterns rather than compared numerically.

For testing purposes, by means of the '#ifndef debug ... #elsedef ... #endifdef' directives, we set up alternate thresholds and device states, for example a temperature #mailval of 0.  When debugging, we can count on any recorded temperature exceeding the artificially low threshold of 0; likewise, the "-" thresholds of 1000000 catch every reading, and the "." pattern matches any non-empty "?" reading.
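The threshold selection in the first rule, including the debug overrides, can be modelled as an ordinary function.  In this Python sketch the debug parameter stands in for the 'debug' define, and float() stands in for #val() (an assumption; PIKT's own conversion rules may differ).  The function name thresholds is hypothetical.

```python
from typing import Tuple, Union

Trigger = Union[float, str]  # numeric threshold, or text pattern for "?"


def thresholds(typ: str, mval: str, pval: str, debug: bool = False) -> Tuple[Trigger, Trigger]:
    """Return the (mail, page) triggers for one record, following rule 1.

    With debug=True the triggers are chosen so that virtually every reading
    fires: 0 for maximums, 1000000 for minimums, "." (any character) for patterns.
    """
    if typ == "+":
        return (0.0, 0.0) if debug else (float(mval), float(pval))
    if typ == "-":
        return (1_000_000.0, 1_000_000.0) if debug else (float(mval), float(pval))
    if typ == "?":
        return (".", ".") if debug else (mval, pval)
    raise ValueError(f"unknown type {typ!r}")


if __name__ == "__main__":
    print(thresholds("+", "75", "80"))
    print(thresholds("+", "75", "80", debug=True))
```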

In the second rule, we set an SNMP command string.  "=snmpget" and "=snmpmibsdir" are macros defined in macros.cfg as:

snmpget		/usr/local/bin/snmpget

...

snmpmibsdir     /usr/local/share/snmp/mibs
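To make the macro expansion concrete, here is a Python sketch that builds the same snmpget invocation as the $snmpcmd string in the second rule, using the two macro values above.  The function name snmpget_argv is hypothetical; the flags are exactly those in the original command.

```python
from typing import List

SNMPGET = "/usr/local/bin/snmpget"          # value of the =snmpget macro
SNMPMIBSDIR = "/usr/local/share/snmp/mibs"  # value of the =snmpmibsdir macro


def snmpget_argv(dev: str, mib: str, oid: str) -> List[str]:
    """Build the argument vector that rule 2's $snmpcmd string expands to.

    -m is given the MIB file path, as in the original command, and -Ovq
    asks for the value only (-Ov) in quick format (-Oq). The shell's
    2>/dev/null is left out; a caller can discard stderr itself.
    """
    return [SNMPGET, "-m", f"{SNMPMIBSDIR}/{mib}", "-Ovq", dev, oid]


if __name__ == "__main__":
    print(" ".join(snmpget_argv(
        "liebert-air-1.acme.com",
        "LIEBERT_GP_ENV.MIB",
        "LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1",
    )))
```

Outside PIKT, passing such a list to subprocess.run(argv, capture_output=True, text=True) avoids shell quoting altogether.  Neither form includes -v or -c, so the SNMP version and community presumably come from the local Net-SNMP configuration.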

In the third rule, using the $command() function, we issue the snmpget command, i.e., poll the indicated device, and assign the first field of its output (extracted by awk) to the variable $oidtxt.
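The awk step simply keeps the first whitespace-delimited word of the reply.  A Python equivalent might look like this, assuming (as -Ovq normally yields) a single line of output; first_field is a hypothetical name.

```python
def first_field(output: str) -> str:
    """Emulate `| awk '{print $1}'` for snmpget's one-line -Ovq output.

    Returns the first whitespace-delimited token, or "" when snmpget printed
    nothing on stdout (for example, after a timeout whose error message went
    to stderr, which the original command discards).
    """
    tokens = output.split()
    return tokens[0] if tokens else ""


if __name__ == "__main__":
    for raw in ["69\n", "yes\n", ""]:
        print(repr(first_field(raw)))
```

Like the awk original, this would cut a multi-word string value down to its first word, which is harmless for the integer and enumerated values polled here.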

In the fourth rule, we log the reading using the =output_alarm_log macro.  Resulting log output, in the file =logdir/SNMPLiebert.log, might look something like this (again linewrapped for display):

...
Jun 14 05:00:02 hamburg pikt[31621]: [ID 1 INFO] liebert-air-1.acme.com:
  LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 69
Jun 14 05:00:03 hamburg pikt[31621]: [ID 1 INFO] liebert-air-2.acme.com:
  LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 78
Jun 14 05:00:04 hamburg pikt[31621]: [ID 1 INFO] liebert-air-1.acme.com:
  LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1 35
Jun 14 05:00:05 hamburg pikt[31621]: [ID 1 INFO] liebert-air-2.acme.com:
  LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1 48
...
Jun 14 05:00:06 hamburg pikt[31621]: [ID 1 INFO] liebert-power-1.acme.com:
  LIEBERT-GP-POWER-MIB::lgpPwrOutputToLoadOnInverter.0 yes
Jun 14 05:00:10 hamburg pikt[31621]: [ID 1 INFO] liebert-power-1.acme.com:
  LIEBERT-GP-POWER-MIB::lgpPwrBatteryCapacity.0 84
...
We could scope out the temperature history of the liebert-air-1 device using the command:
# egrep liebert-air-1 /pikt/var/log/SNMPLiebert.log | egrep -i temp
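If we wanted the readings themselves rather than whole log lines, for example to graph them, a small Python filter could stand in for the egrep pipeline.  This is a sketch that assumes each log entry sits on one unbroken line ending in "device: oid value", as in the sample above; the names ENTRY and readings are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the tail of an unbroken SNMPLiebert.log entry:
#   "... [ID 1 INFO] <device>: <oid> <value>"
ENTRY = re.compile(r"\] (?P<dev>\S+): (?P<oid>\S+) (?P<val>\S+)$")


def readings(lines: Iterable[str], dev: str, oid_word: str) -> Iterator[Tuple[str, str]]:
    """Yield (oid, value) for entries whose device name contains `dev` and
    whose OID contains `oid_word`, ignoring case for the OID test only
    (compare: egrep DEV file | egrep -i WORD)."""
    for line in lines:
        m = ENTRY.search(line.rstrip("\n"))
        if m and dev in m["dev"] and oid_word.lower() in m["oid"].lower():
            yield m["oid"], m["val"]


if __name__ == "__main__":
    sample = [
        "Jun 14 05:00:02 hamburg pikt[31621]: [ID 1 INFO] liebert-air-1.acme.com: "
        "LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 69",
        "Jun 14 05:00:04 hamburg pikt[31621]: [ID 1 INFO] liebert-air-1.acme.com: "
        "LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvHumidityMeasurementRel.1 35",
    ]
    for oid, val in readings(sample, "liebert-air-1", "temp"):
        print(oid, val)
```

Against the real log, an open file object for /pikt/var/log/SNMPLiebert.log can be passed in place of the sample list.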
In the fifth rule, for the "+" and "-" value types, we convert the $oidtxt string to the numerical value #oidval.

In the sixth rule, we test against the max thresholds.  If the measured value meets or exceeds the mail threshold, and thereafter only if the measured value is still rising, we send an e-mail message reporting the fault condition.
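The "thereafter only if rising" condition is easiest to see by replaying a series of polls.  In this Python sketch, previous plays the role of %oidval (the reading from the script's previous run); the function names mail_on_max and simulate are hypothetical, and the readings are made up for illustration.

```python
from typing import List, Optional, Sequence


def mail_on_max(value: float, mailval: float, previous: Optional[float]) -> bool:
    """Rule 6 mail test: at or above the threshold, and either no history
    yet (! #defined(%oidval)) or higher than last time."""
    return value >= mailval and (previous is None or value > previous)


def simulate(values: Sequence[float], mailval: float) -> List[bool]:
    """Apply mail_on_max to successive runs, carrying each reading forward
    as the next run's history value."""
    previous: Optional[float] = None
    sent = []
    for value in values:
        sent.append(mail_on_max(value, mailval, previous))
        previous = value
    return sent


if __name__ == "__main__":
    # liebert-air-1's temperature mail threshold is 75 in snmp_liebert_objects.cfg
    print(simulate([69, 76, 76, 78, 77, 83], 75))
    # [False, True, False, True, False, True]
```

A reading that holds steady above the threshold thus produces one message rather than one per run, yet any further rise is still reported.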

Just in case we are away from our mail reader (doing other work in other virtual screens or terminal windows), we also wall the piktmaster system using the =shout() macro:

shout(M, H)
#if piktmaster
                        if (H)
                                doexec wait "=echo '(M)' | =wall"
                        fi
#else
                        =piktnullchar
#endif
(We could easily change the #if ... #endif in =shout(), i.e., wall systems in addition to, or other than, the piktmaster.)

If the page threshold is met or exceeded, we also send a page, but at most once an hour, using the macros:

hourly(A1, A2)
                        set #tv60 = #now()
                        if ! #defined(%tv60) || (#tv60 - %tv60 >= 60*60 - =driftfactor)
                                (A1)
                        else
                                (A2)
                                set #tv60 = %tv60
                        fi

...

page(M, R, H)   // send a page message (M) to recipients (pager phone alias)
                // (R) but only during hours (H), and only if =etcdir/nopage
                // and /tmp/nopage don't exist (i.e., 'touch /pikt/etc/nopage'
                // or 'touch /tmp/nopage' will block all paging)
                // sample use:  =page($host is sick/down, =pagesysadmins, =allhours(#now()))
                if ! -e "=etcdir/nopage" && ! -e "/tmp/nopage"
                        if (H)
                                doexec wait "=echo '(M)' | =mailx -a 'From: piktadmin' -s '(M)' (R)"
#if piktmaster
                                =shout((M), =always)
#endif
                        fi
                fi

...

#ifdef debug
allpager        =piktadmin      // brahms\@acme.com
#elsedef
allpager        emergency\@acme.com
#endifdef

...

always          #true()
Note how we also =shout() within the =page() macro, but only on the piktmaster system.  (Here too, we could easily change the #if ... #endif around =shout() to wall additional or other systems.)
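The =hourly() throttle is the least obvious of these macros.  Below is a Python sketch of its logic, assuming #now() yields seconds; the class name Hourly and the drift value of 60 seconds are hypothetical, since the definition of =driftfactor is not shown here.  The A2 branch is empty in this script's calls, so the sketch omits it.

```python
import time
from typing import Callable, Optional

DRIFT_FACTOR = 60  # hypothetical; the actual =driftfactor value is not shown


class Hourly:
    """Rate limiter modelled on the =hourly(A1, A2) macro.

    `last` plays the role of %tv60. The action runs when there is no
    previous page time, or when at least one hour minus the drift factor
    has elapsed; only then is the stored time advanced. Otherwise the old
    time is kept, as the macro's 'set #tv60 = %tv60' does.
    """

    def __init__(self, drift: float = DRIFT_FACTOR) -> None:
        self.drift = drift
        self.last: Optional[float] = None

    def __call__(self, action: Callable[[], None], now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if self.last is None or now - self.last >= 60 * 60 - self.drift:
            action()
            self.last = now
            return True
        return False


if __name__ == "__main__":
    pager = Hourly(drift=60)
    for t in [0, 1800, 3540, 3600]:
        fired = pager(lambda: print("page sent at", t), now=t)
        print(t, fired)
```

Subtracting a drift factor presumably lets a run that arrives slightly less than 60 minutes after the last page still qualify, rather than waiting another full polling interval.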

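In the seventh rule, we likewise test against the min thresholds: we mail when the value is at or below the mail threshold (thereafter only if it is still falling), and page, at most once an hour, when it is at or below the page threshold.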
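Applied to the battery capacity record (mail at 60, page at 40), the seventh rule's two decisions can be sketched in Python as follows.  The function name min_alerts and the sequence of capacity readings are hypothetical; the hourly paging limit is not modelled here.

```python
from typing import Optional, Tuple


def min_alerts(value: float, mailval: float, pageval: float,
               previous: Optional[float]) -> Tuple[bool, bool]:
    """Rule 7 decisions for a "-" record: (mail, page_candidate).

    Mail when value <= mailval and there is no history yet or the value is
    still falling. The page branch is entered whenever value <= pageval; in
    the real script =hourly() then limits how often a page goes out.
    """
    mail = value <= mailval and (previous is None or value < previous)
    page = value <= pageval
    return mail, page


if __name__ == "__main__":
    # lgpPwrBatteryCapacity.0 thresholds from the objects file: mail 60, page 40
    previous = None
    for capacity in [84, 55, 55, 38]:
        print(capacity, min_alerts(capacity, 60, 40, previous))
        previous = capacity
```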

In the eighth and final rule, we pattern match against specified error strings, for example a reading of "no" for lgpPwrOutputToLoadOnInverter, meaning the present source of output power is not the inverter.  (When in debug mode, we pattern match against the "." wildcard, which matches any non-empty reading.)
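The "?" test can be approximated with a regular-expression search.  This Python sketch assumes that =~~ is PIKT's case-insensitive match operator and that it searches anywhere in the string; pattern_alert is a hypothetical name.

```python
import re


def pattern_alert(reading: str, pattern: str) -> bool:
    """Approximate `$oidtxt =~~ pattern` with a case-insensitive,
    unanchored regular-expression search."""
    return re.search(pattern, reading, re.IGNORECASE) is not None


if __name__ == "__main__":
    for reading in ["yes", "no", ""]:
        print(repr(reading),
              pattern_alert(reading, "no"),  # production mail/page pattern
              pattern_alert(reading, "."))   # debug pattern
```

Because the search is unanchored, "no" would also match any longer reading containing those letters; a pattern such as ^no$ would be stricter if that ever mattered.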

Here is a sample mail alert from SNMPLiebert:

                               PIKT ALERT
                         Thu Jun 14 02:30:08 2007
                                hamburg

EMERGENCY:
    SNMPLiebert
        Report worrisome Liebert SNMP conditions

        liebert-air-1.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 83
          >= mailval 75
And here is a sample page message:
liebert-air-1.acme.com: LIEBERT-GP-ENVIRONMENTAL-MIB::lgpEnvTemperatureMeasurementDegF.1 83
  >= pageval 80
Since this script employs paging, we might want to use all the safeguards and precautions described in Script Development and Testing to ensure we don't accidentally page co-workers during the script development and testing process.

There's much more we could do with PIKT and SNMP, including writing scripts to control devices, not just report error and fault conditions.  Whatever we do, with PIKT we have a powerful whip for taming the SNMP beast.

Page best viewed at 1024x768 or greater.   Page last updated 2019-01-12.   This site is PIKT® powered.
Copyright © 1998-2019 Robert Osterlund. All rights reserved.