High Load Averages

In this example, we report perilously high load averages.

The LoadAverage script might send an alert message like the following:

                                PIKT ALERT
                         Tue Feb 26 16:54:09 2002
                                 murmansk

URGENT:
    LoadAverage
        Report perilously high system load averages

        uptime - 19:40:10 up 120 days,  4:46,  1 user,  load average: 10.75, 6.35, 4.22

        top - 19:40:13 up 120 days,  4:46,  1 user,  load average: 10.75, 6.35, 4.22
        Tasks:  91 total,   2 running,  89 sleeping,   0 stopped,   0 zombie
        Cpu(s):  0.0% us,  0.4% sy,  0.0% ni, 96.9% id,  2.6% wa,  0.0% hi,  0.1% si
        Mem:   2058228k total,  2048120k used,    10108k free,   104816k buffers
        Swap:  4016168k total,      240k used,  4015928k free,  1756208k cached

          PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
        22516 root      15   0  8032  748  616 R   36  0.0   0:30.33 tar                
          177 root      15   0     0    0    0 D    2  0.0  31:32.46 kswapd0            
         3714 root      10  -5     0    0    0 D    2  0.0  42:24.87 kjournald          
            1 root      16   0  3632  596  508 S    0  0.0   0:03.18 init
        ...
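
The uptime line in this alert ends with the 1-, 5-, and 15-minute load averages, which is why the script below picks them out by counting fields from the end of the line. As a rough illustration only, here is the same idea in Python rather than PIKT. The function name parse_load_averages is hypothetical, and the sketch assumes the averages use a period as the decimal separator.

```python
# Illustrative Python only, not PIKT: pick fields by counting from the end.
def parse_load_averages(uptime_line):
    """Return (1-min, 5-min, 15-min) load averages from an uptime-style line."""
    fields = uptime_line.split()
    # The last three whitespace-separated fields are the averages;
    # the first two carry a trailing comma that has to be dropped.
    one, five, fifteen = fields[-3:]
    return (float(one.rstrip(",")),
            float(five.rstrip(",")),
            float(fifteen.rstrip(",")))


line = "19:40:10 up 120 days,  4:46,  1 user,  load average: 10.75, 6.35, 4.22"
print(parse_load_averages(line))  # (10.75, 6.35, 4.22)
```

Counting from the end keeps the sketch independent of the uptime portion, whose field count changes with the number of days, hours, and users reported.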

The script follows.

LoadAverage

        init
                status =piktstatus
                level =piktlevel
                task "Report perilously high system load averages"
                input proc "=uptime"
                dat $ky  $-3    // invariant key, "average:",
                                // as in "load average:"
                dat $a1  $-2
                dat $a5  $-1
                dat $a15 $
                keys $ky

        begin
#if london | copenhagen
                if $alert() =~ "EMERGENCY"
                        set #lalim = 15.0
                elsif $alert() =~ "Urgent"
                        set #lalim = 10.0
#else
                if #pid("dd") > 0
                        set #lalim = 15.0
                elsif $alert() =~ "EMERGENCY"
                        set #lalim = 10.0
                elsif $alert() =~ "Urgent"
                        set #lalim = 5.0
#endif
                else // if $alert() =~ "LoadAverages"
                        set #lalim = 1.0
                fi

        rule    // dispose of trailing comma, and set value
                set #la1 = #value($chop($a1,1))

#ifdef debug
        rule
                output "\#la1 is $text(#la1), \#lalim is $text(#lalim)"
#endifdef

        rule    // if exceeds threshold
                if #la1 >= #lalim
                        // always report if manual LoadAverages script
                        if $alert() eq "LoadAverages"
                                output $trim($inline)
                        else
                                // otherwise, report only if load avg is rising
                                // (is at least one level higher than before)
#if missioncritical
                                =hourly(if #trunc(#la1) > #trunc(%la1)
                                        output mail "uptime - $trim($inline)"
                                        output mail =newline =toptop(1000)
                                        fi, )
#else
                                =every_four_hours(if #trunc(#la1) > #trunc(%la1)
                                                  output mail "uptime - $trim($inline)"
                                                  output mail =newline =toptop(1000)
                                                  fi, )
#endif
                        fi
                fi

        rule    // only log load averages for Urgent alerts
                if $alert() eq "Urgent"
                        =output_alarm_log($inlin)
                fi
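
The threshold rule boils down to two tests: the 1-minute average must reach the limit, and its integer part must be higher than it was before. A minimal Python sketch of that logic follows, assuming %la1 holds the value of #la1 from the previous run. The name should_alert is hypothetical, and the sketch leaves out the manual-alert case and the =hourly and =every_four_hours wrappers.

```python
import math


def should_alert(current, previous, limit):
    """Hypothetical helper: alert only when the load is at or over the limit
    and its integer part has climbed since the previous check."""
    if current < limit:
        return False
    return math.trunc(current) > math.trunc(previous)


print(should_alert(10.75, 9.80, 5.0))   # True: over the limit and up a level
print(should_alert(10.75, 10.10, 5.0))  # False: over the limit, same level
print(should_alert(3.20, 1.00, 5.0))    # False: below the limit
```

Comparing truncated values means a load that hovers between, say, 10.1 and 10.9 does not trigger a fresh message on every run.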

This is just one example program.  You could add rules or write new scripts to, for example, report and possibly kill runaway processes; report unusually high per-user process counts; report and possibly kill forbidden processes; report extremely high numbers of zombie and defunct processes; or log special process accounting data.

For more examples, see Samples.

 
Page best viewed at 1024x768 or greater.   Page last updated 2019-01-12.   This site is PIKT® powered.
Copyright © 1998-2019 Robert Osterlund. All rights reserved.