Runaway Process Macro

runaway_process_alarms_macros.cfg defines a script macro that reports runaway processes, that is, processes using an excessive percentage of either CPU or MEM (memory).

runaway_process(S, s, F, LC, LU, LE, B)
 
        init
                status =piktstatus
                level =piktlevel
                task "Report runaway processes using a high % of the (S)"
                input proc "=psls | =trim(160) | =awk '\$(F)>=(LC)'"
                filter "=egrep '^[A-Za-z0-9\-]+[ ]+[0-9]+ ' | =zapinterpreters"
                dat #cpu  3
                dat #mem  4
                dat $proc 11
                keys $proc
        begin
                set #doheader = #true()
                doexec wait "=top -b -n1 -d1 2>/dev/null > =hstdir/log/top." . $alarm()
                doexec wait "=psall > =hstdir/log/ps." . $alarm()

#ifdef debug
        rule
                output mail "$text(#cpu,1) $text(#mem,1) $basename($proc)"
#endifdef

        rule    // permanent bypasses
                if $proc =~~ "=nonesuch"
                        next
                fi

        rule    // special bypasses
                if $proc =~~ "(B)"
                        next
                fi

        rule
                if $alert() =~ "RED|EMERGENCY"
                        if #(s) < (LE)
                                next
                        fi
                fi

        rule
                if $alert() =~ "Urgent"
                        if #(s) < (LU)
                                next
                        fi
                fi

        rule
#if missioncritical
                if $alert() =~ "RED|EMERGENCY"
                        =periodically(if #doheader
                                      output mail $command("=psall\ | =head -n 1")
                                      set #doheader = #false()
                                      fi
                                      output mail $inlin, , 60)
                elsif $alert() =~ "Urgent"
                        =periodically(if #doheader
                                      output mail $command("=psall\ | =head -n 1")
                                      set #doheader = #false()
                                      fi
                                      output mail $inlin, , 120)
                else
                        =periodically(if #doheader
                                      output mail $command("=psall\ | =head -n 1")
                                      set #doheader = #false()
                                      fi
                                      output mail $inlin, , 240)
                fi
#else
                if $alert() =~ "RED|EMERGENCY"
                        =periodically(if #doheader
                                      output mail $command("=psall\ | =head -n 1")
                                      set #doheader = #false()
                                      fi
                                      output mail $inlin, , 120)
                elsif $alert() =~ "Urgent"
                        =periodically(if #doheader
                                      output mail $command("=psall\ | =head -n 1")
                                      set #doheader = #false()
                                      fi
                                      output mail $inlin, , 240)
                else
                        =daily(if #doheader
                               output mail $command("=psall\ | =head -n 1")
                               set #doheader = #false()
                               fi
                               output mail $inlin, )
                fi
#endif

        end
                if ! #doheader
                        output mail =newline
                        =outputfile(mail, "=hstdir/log/top." . $alarm())
                        output mail =newline
                        =outputfile(mail, "=hstdir/log/ps." . $alarm())
                fi
                quit

You might invoke the =runaway_process() macro in your alarms.cfg file like this:

///////////////////////////////////////////////////////////////////////////////
 
RunawayCPUProcs
 
        =runaway_process(CPU, cpu, 3, 90.0, 99.0, 100.0, =nonesuch)
 
///////////////////////////////////////////////////////////////////////////////
 
RunawayMEMProcs
 
        =runaway_process(MEM, mem, 4, 40.0, 50.0, 60.0, =nonesuch)
 
///////////////////////////////////////////////////////////////////////////////
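To make the parameter mapping concrete: in the RunawayMEMProcs invocation, F is 4 and LC is 40.0, so the input pipeline's awk test becomes '$4>=40.0'. The following is a rough Python sketch of that field test, not PIKT code. The helper names (field, passes_input_filter) are hypothetical, and treating a non-numeric field as a non-match is an assumption of this sketch rather than a statement about awk's behavior.

```python
def field(line, n):
    """Return the n-th whitespace-separated field (1-indexed, like awk's $n)."""
    return line.split()[n - 1]


def passes_input_filter(line, f, lc):
    """Mimic the macro's awk test '$(F)>=(LC)' for numeric fields.

    Assumption: a field that is missing or not a number never passes.
    """
    try:
        return float(field(line, f)) >= lc
    except (ValueError, IndexError):
        return False


# The ps line from the sample report on this page.
PS_LINE = ("kik      27978  7.7 51.0 559692 528512 ?       S    "
           "08:04  34:53 /usr/local/acme/bin/prodtool")

print(passes_input_filter(PS_LINE, 4, 40.0))  # MEM test: 51.0 >= 40.0
print(passes_input_filter(PS_LINE, 3, 90.0))  # CPU test: 7.7 >= 90.0
print(field(PS_LINE, 11))                     # the field read by 'dat $proc 11'
```

With this sample line, the MEM test passes, the CPU test does not, and field 11 is the command path, which is why the process shows up under RunawayMEMProcs but would not under RunawayCPUProcs.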

Note that we can invoke either of these scripts in various alert groups throughout our alerts.cfg, and the script will adapt to its alert level (by means of, for example, 'if $alert() =~ "Urgent" ... fi').
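The alert-level logic in the macro's two threshold rules can be paraphrased in Python as follows. This is only a sketch of the decision, not PIKT code. The function name should_report is hypothetical, and the alert name "Warning" in the usage note stands for any group that matches neither pattern.

```python
import re


def should_report(alert, value, lc, lu, le):
    """Decide whether a process with the given CPU/MEM percentage is reported.

    lc, lu, le correspond to the macro's LC, LU and LE arguments.
    """
    # The input pipeline only admits lines at or above LC.
    if value < lc:
        return False
    # RED and EMERGENCY groups skip anything below LE.
    if re.search(r"RED|EMERGENCY", alert) and value < le:
        return False
    # Urgent groups skip anything below LU.
    if re.search(r"Urgent", alert) and value < lu:
        return False
    return True
```

With the RunawayMEMProcs thresholds (40.0, 50.0, 60.0), a process at 51.0% MEM would be reported in an Urgent group but skipped in an EMERGENCY group, while a process at 45.0% would be reported only in a group such as "Warning" that matches neither pattern.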

In an Urgent group, output from the RunawayMEMProcs script might look like this:

URGENT:
    RunawayMEMProcs
        Report runaway processes using a high % of the MEM
 
        USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
         
        kik      27978  7.7 51.0 559692 528512 ?       S    08:04  34:53 /usr/local/acme/bin/prodtool
         
        top - 15:32:07 up 30 days,  6:51,  2 users,  load average: 0.27, 0.16, 0.10
        Tasks: 104 total,   2 running, 102 sleeping,   0 stopped,   0 zombie
        Cpu(s):  1.4% us,  0.2% sy,  0.0% ni, 98.0% id,  0.4% wa,  0.0% hi,  0.0% si
        Mem:   1034932k total,   984324k used,    50608k free,     5604k buffers
        Swap:  4016168k total,    28252k used,  3987916k free,   119952k cached
         
          PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
        28044 kik       16   0  271m 168m  23m R   95 16.7  42:51.45 firefox-bin        
         7940 root      15   0  187m  52m 4260 S    6  5.2 151:00.56 X                  
            1 root      16   0  1468  472  448 S    0  0.0   0:01.92 init               
            2 root      RT   0     0    0    0 S    0  0.0   0:00.11 migration/0        
            3 root      34  19     0    0    0 S    0  0.0   0:00.01 ksoftirqd/0   
        [...]
        27978 kik       17   0  546m 516m  17m S    0 51.1  34:53.39 prodtool
        [...]

Note also that, in addition to reporting the runaway process, we include top and ps output to give the runaway some context.

For more examples, see Samples.

 
Page best viewed at 1024x768 or greater.   Page last updated 2019-01-12.   This site is PIKT® powered.
Copyright © 1998-2019 Robert Osterlund. All rights reserved.