High Load Averages

In this example, we report perilously high load averages.

The LoadAverage script might send an alert message like the following:

                                PIKT ALERT
                         Tue Feb 26 16:54:09 2002
                                 murmansk

URGENT:
    LoadAverage
        Report perilously high system load averages

        uptime - 19:40:10 up 120 days,  4:46,  1 user,  load average: 10.75, 6.35, 4.22

        top - 19:40:13 up 120 days,  4:46,  1 user,  load average: 10.75, 6.35, 4.22
        Tasks:  91 total,   2 running,  89 sleeping,   0 stopped,   0 zombie
        Cpu(s):  0.0% us,  0.4% sy,  0.0% ni, 96.9% id,  2.6% wa,  0.0% hi,  0.1% si
        Mem:   2058228k total,  2048120k used,    10108k free,   104816k buffers
        Swap:  4016168k total,      240k used,  4015928k free,  1756208k cached

          PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
        22516 root      15   0  8032  748  616 R   36  0.0   0:30.33 tar                
          177 root      15   0     0    0    0 D    2  0.0  31:32.46 kswapd0            
         3714 root      10  -5     0    0    0 D    2  0.0  42:24.87 kjournald          
            1 root      16   0  3632  596  508 S    0  0.0   0:03.18 init
        ...

The script follows.

LoadAverage

        init
                status =piktstatus
                level =piktlevel
                task "Report perilously high system load averages"
                input proc "=uptime"
                dat $ky  $-3    // invariant key, "average:",
                                // as in "load average:"
                dat $a1  $-2
                dat $a5  $-1
                dat $a15 $
                keys $ky

        begin
#if london | copenhagen
                if $alert() =~ "EMERGENCY"
                        set #lalim = 15.0
                elsif $alert() =~ "Urgent"
                        set #lalim = 10.0
#else
                if #pid("dd") > 0
                        set #lalim = 15.0
                elsif $alert() =~ "EMERGENCY"
                        set #lalim = 10.0
                elsif $alert() =~ "Urgent"
                        set #lalim = 5.0
#endif
                else // if $alert() =~ "LoadAverages"
                        set #lalim = 1.0
                fi

        rule    // dispose of trailing comma, and set value
                set #la1 = #value($chop($a1,1))

#ifdef debug
        rule
                output "\#la1 is $text(#la1), \#lalim is $text(#lalim)"
#endifdef

        rule    // if exceeds threshold
                if #la1 >= #lalim
                        // always report if manual LoadAverages script
                        if $alert() eq "LoadAverages"
                                output $trim($inline)
                        else
                                // otherwise report only if load avg is rising
                                // (is at least one level higher than before)
#if missioncritical
                                =hourly(if #trunc(#la1) > #trunc(%la1)
                                        output mail "uptime - $trim($inline)"
                                        output mail =newline =toptop(1000)
                                        fi, )
#else
                                =every_four_hours(if #trunc(#la1) > #trunc(%la1)
                                                  output mail "uptime - $trim($inline)"
                                                  output mail =newline =toptop(1000)
                                                  fi, )
#endif
                        fi
                fi

        rule    // only log load averages for Urgent alerts
                if $alert() eq "Urgent"
                        =output_alarm_log($inline)
                fi
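
To make the script's logic easier to follow, here is a rough sketch of the same steps in Python. It is not PIKT code and not part of the PIKT distribution. The function names, the relaxed_host and dd_running parameters, and the idea that %la1 holds the load average recorded on the previous run are all assumptions made for illustration. The sketch does not model the =hourly and =every_four_hours macros.

```python
import math


def parse_uptime(line):
    """Pick the key and the three load averages from the END of an uptime line.

    Mirrors the dat statements: $-3 is the key, $-2, $-1 and $ are the averages.
    """
    fields = line.split()
    key = fields[-4]                      # "average:", the invariant key
    la1 = float(fields[-3].rstrip(","))   # drop trailing comma, like $chop($a1,1)
    la5 = float(fields[-2].rstrip(","))
    la15 = float(fields[-1])              # last field has no trailing comma
    return key, la1, la5, la15


def load_limit(alert, relaxed_host=False, dd_running=False):
    """Choose the threshold the way the begin section does.

    relaxed_host stands in for the '#if london | copenhagen' branch and
    dd_running for '#pid("dd") > 0'; both names are hypothetical.
    A substring test stands in for the =~ match on a literal string.
    """
    if relaxed_host:
        if "EMERGENCY" in alert:
            return 15.0
        if "Urgent" in alert:
            return 10.0
    else:
        if dd_running:
            return 15.0
        if "EMERGENCY" in alert:
            return 10.0
        if "Urgent" in alert:
            return 5.0
    return 1.0  # the manual LoadAverages alert


def is_rising(la1, previous_la1):
    """True when the load average has climbed at least one whole level."""
    return math.trunc(la1) > math.trunc(previous_la1)


if __name__ == "__main__":
    sample = ("19:40:10 up 120 days,  4:46,  1 user,  "
              "load average: 10.75, 6.35, 4.22")
    key, la1, la5, la15 = parse_uptime(sample)
    limit = load_limit("Urgent")
    print(key, la1, limit, la1 >= limit, is_rising(la1, 9.9))
```

Counting fields from the end of the line, as the dat statements do, keeps the parse stable even though the front of the uptime output changes shape (for example, "120 days, 4:46" versus a bare "4:46").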

This is just one example. You could add rules, or write new scripts, to report and possibly kill runaway processes, report unusually high per-user process counts, report and possibly kill forbidden processes, report extremely high numbers of zombie and defunct processes, or log special process accounting data.

For more examples, see Samples.

 
Page best viewed at 1024x768 or greater.   Page last updated 2008-03-27.   This site is PIKT® powered.
PIKT® is a registered trademark of the University of Chicago.   Copyright © 1998-2008 Robert Osterlund. All rights reserved.