Site-Wide System Scanning

(NOTE: Some of the techniques shown or described on this page require new features in the latest official PIKT 1.19.0 release (pikt-current.tar.gz) that are unavailable in any previous version.)

Actually, in addition to the Urgent and Critical alert groups, we make use of DmesgScan in one other context within alerts.cfg:

///////////////////////////////////////////////////////////////////////////////

ScanDmesg

        status          active
        level           info

        scripts         DmesgScan

///////////////////////////////////////////////////////////////////////////////
We have similar brief stanzas in alerts.cfg for the other scanning scripts, and a macro in macros.cfg that references them all:
scripts
                DownSystems
                DownServers
                DownClients

                DownRpc
                DownRpcServers
                DownRpcClients

                SysReboots

                ScanDmesg
                ScanSyslogCritical
                ScanSyslogKernel

                LoadAverages
                Processes
                Zombies
                RunawayCPUProcs
                RunawayMEMProcs
                CPUUsage
                ...

                section DownProd
                section MissingAcmeProcesses
                ...
We also make a few additions to, and adjustments of, some scripts, for example the =service_downage() script macro:
service_downage(S, I, T, A, M)

        ...

        rule
                if $alert() =~~ "client"
                        if " $missioncritical " =~ " $host "
                                set $state = %state
                                next
                        fi
                fi

        rule
                if $alert() =~ "(A)"
                        output $host
                        output =newline
                fi

        ...

        rule
                if (T)
                        if $alert() =~ "(A)"
                                output "$host's (S) services are down ((M))"
                                output =newline
                        else
                                ...
                        fi
                        next
                fi
Note that we have
  • added a new macro argument, (A), to the =service_downage() script
  • added a new rule to skip mission-critical systems when running client scripts, such as DownRpcClients
  • added a new rule to output the $host and a newline, depending on the alert context
  • added a new if block to report to screen (not send e-mail via 'output mail') if a host's services are down
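The mission-critical rule relies on a small idiom. Both the list and the host name are padded with spaces before matching, so a host matches only as a whole word. Without the padding, a host named rome would also match a list entry such as romeo.

Below is a minimal bash sketch of the same idiom. It assumes $missioncritical is a space-separated list of host names. The function name is_mission_critical and the sample host list are hypothetical. Unlike PIKT's =~, which treats the right-hand side as a regular expression, the quoted pattern here is a literal substring test.

```bash
#!/bin/bash
# Hypothetical helper: succeed if host $2 appears as a whole word
# in the space-separated list $1.
is_mission_critical () {
        local list=$1 host=$2
        # Pad both sides with spaces so that "rome" does not match "romeo".
        # Quoting the right-hand side makes this a literal (non-glob) match.
        [[ " $list " == *" $host "* ]]
}

missioncritical="hamburg rome zurich"   # hypothetical mission-critical list
for host in rome romeo; do
        if is_mission_critical "$missioncritical" "$host"; then
                echo "$host: mission-critical, skipped by client scripts"
        else
                echo "$host: checked by client scripts"
        fi
done
```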
Note, too, that we must also change the =service_downage() invocations, adding the new fourth argument, (A), as in:
#if piktmaster

RpcDown

        =service_downage(RPC, =piktc -L +H pikt -H down sick, =rpcfail($host),
                         DownRpc|DownRpcServers|DownRpcClients,
                         rpcinfo -p failure)

#endif
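The new fourth argument is used as a regular expression. Inside the macro, $alert() =~ "(A)" becomes a match against DownRpc|DownRpcServers|DownRpcClients. The screen-reporting rules therefore fire only when RpcDown runs under one of those alert names.

Here is a bash analogue of that test, in which the alert parameter stands in for PIKT's $alert(). The function name reports_to_screen is hypothetical.

```bash
#!/bin/bash
# Regular expression built from the fourth macro argument, (A).
pattern='DownRpc|DownRpcServers|DownRpcClients'

# Hypothetical helper: succeed if the alert name matches (A).
reports_to_screen () {
        local alert=$1   # stands in for PIKT's $alert()
        # Unquoted $pattern is treated as an extended regex; like the PIKT
        # =~ test, it is unanchored, so any name containing a match succeeds.
        [[ $alert =~ $pattern ]]
}

for alert in DownRpcClients Urgent; do
        if reports_to_screen "$alert"; then
                echo "$alert: print \$host and report to screen"
        else
                echo "$alert: normal alert handling"
        fi
done
```

Because the pattern is unanchored, DownRpc alone would already match all three names. Writing it as ^(DownRpc|DownRpcServers|DownRpcClients)$ would make the test exact.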
We then install the listed scripts (with their revised script macro definitions and macro invocations) on the piktmaster and elsewhere using the piktc command:
# piktc -ivU +A =scripts -H down sick
Now, at the piktmaster system, we can interactively issue commands to check on the health of our systems, for example:

A pikt command, run on the piktmaster, that polls all servers, reporting if any of them are down:

# pikt -U +A DownServers 2>&1 | tee /tmp/DownServers.out
A pikt command, run on the piktmaster, that polls all clients, reporting if the RPC services on any client system are down:
# pikt -U +A DownRpcClients 2>&1 | tee /tmp/DownRpcClients.out
(RPC services are essential to PIKT operations, so if RPC is down on any client system, the piktmaster can't communicate with that system, and we would want to know that.)
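The two checks above can be roughly approximated in plain bash: one ping per host for reachability, then rpcinfo -p for the RPC portmapper. This is a rough sketch only, not how DownServers or DownRpcClients are actually implemented, and its output only loosely mimics theirs. The function name poll_hosts is hypothetical. The ping -c 1 -W 2 flags assume Linux iputils ping; BSD ping interprets -W differently. Both check commands are held in variables so they can be swapped out.

```bash
#!/bin/bash
# Check commands; override these (e.g. for testing) before calling poll_hosts.
PING_CMD=${PING_CMD:-"ping -c 1 -W 2"}   # assumes Linux iputils ping
RPC_CMD=${RPC_CMD:-"rpcinfo -p"}

# Hypothetical helper: print each host name, followed by a message
# if the host fails the ping check or, failing that, the RPC check.
poll_hosts () {
        local host
        for host in "$@"; do
                echo "$host"
                # $PING_CMD and $RPC_CMD are left unquoted on purpose so that
                # they split into a command name plus its options.
                if ! $PING_CMD "$host" >/dev/null 2>&1; then
                        echo "$host is down, or off the network (ping failure)"
                elif ! $RPC_CMD "$host" >/dev/null 2>&1; then
                        echo "$host's RPC services are down (rpcinfo -p failure)"
                fi
                echo ""
        done
}
```

A call such as poll_hosts hamburg salonika oslo (hypothetical host names) would then print one block per host. A message line appears only for hosts that fail a check.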

A piktc command, also run on the piktmaster, to run a Pikt script remotely on any PIKT slave system, for example:

# piktc -xvU +C "/pikt/bin/pikt -U +A ScanDmesg" +H helsinki
And likewise for many other single-purpose scanning scripts that look for problems on any system within our PIKT network.

In fact, we have aggregated all of these system check scripts into one uber-script, /pikt/lib/programs/scansys.sh:

#!/bin/bash

function divider () {
        echo ""
        echo "###############################################################################"
        echo ""
}

function section () {
        divider
        echo $1
        echo ""
}

if [ "$1" = "-s" ]; then
        SYS=server
elif [ "$1" = "-c" ]; then
        SYS=client
else
        SYS=pikt
fi

section DownSystems
if [ $SYS = "server" ]; then
        /pikt/bin/pikt -U +A DownServers 2>&1 | tee /tmp/DownSystems.out
elif [ $SYS = "client" ]; then
        /pikt/bin/pikt -U +A DownClients 2>&1 | tee /tmp/DownSystems.out
else
        /pikt/bin/pikt -U +A DownSystems 2>&1 | tee /tmp/DownSystems.out
fi

section DownRpc
if [ $SYS = "server" ]; then
        /pikt/bin/pikt -U +A DownRpcServers 2>&1 | tee /tmp/DownRpc.out
elif [ $SYS = "client" ]; then
        /pikt/bin/pikt -U +A DownRpcClients 2>&1 | tee /tmp/DownRpc.out
else
        /pikt/bin/pikt -U +A DownRpc 2>&1 | tee /tmp/DownRpc.out
fi

section SysReboots
# A trailing backslash continues the piktc command onto the next line
/pikt/bin/piktc -xU +C "hostname; /pikt/bin/pikt -U +A SysReboots; echo" \
  +H $SYS -H down sick 2>&1 | tee /tmp/SysReboots.out

section ScanDmesg
/pikt/bin/piktc -xU +C "hostname; /pikt/bin/pikt -U +A ScanDmesg; echo" \
  +H $SYS -H down sick 2>&1 | tee /tmp/ScanDmesg.out

...

if [ $SYS = "server" -o $SYS = "pikt" ]; then

        section DownProd
        /pikt/bin/pikt -U +A DownProd 2>&1 | tee /tmp/DownProd.out

        section MissingAcmeProcesses
        /pikt/bin/piktc -xU +C "hostname; /pikt/bin/pikt -U +A MissingAcmeProcesses; echo" \
                        +H acmeserver -H down sick 2>&1 | tee /tmp/MissingAcmeProcesses.out

        ...

fi

divider

exit
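Every section in scansys.sh follows the same pattern: print the section header, run one command, and tee its output to a /tmp file named after the section. As the list of sections grows, that pattern could be folded into a helper.

The following is a hypothetical restructuring, not the script we actually use. The run_section name and the $OUTDIR variable (default /tmp) are our own. The command is passed as separate arguments, so its quoting survives intact.

```bash
#!/bin/bash

OUTDIR=${OUTDIR:-/tmp}   # hypothetical: where the per-section .out files go

divider () {
        echo ""
        echo "###############################################################################"
        echo ""
}

# run_section NAME COMMAND [ARGS...]
# Print a section header, then run the command, copying its combined
# stdout and stderr to $OUTDIR/NAME.out, as scansys.sh does with tee.
run_section () {
        local name=$1
        shift
        divider
        echo "$name"
        echo ""
        "$@" 2>&1 | tee "$OUTDIR/$name.out"
}

# Choose the system class from the first argument, as scansys.sh does.
case "$1" in
        -s) SYS=server ;;
        -c) SYS=client ;;
        *)  SYS=pikt ;;
esac
```

With this helper, a section such as SysReboots shrinks to a single call: run_section SysReboots /pikt/bin/piktc -xU +C "hostname; /pikt/bin/pikt -U +A SysReboots; echo" +H "$SYS" -H down sick. Sections that pick a different script per system class would keep their if/elif around the call.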
Now we can issue the command, in this case for server systems (specified in the PIKT systems.cfg file):
# /pikt/lib/programs/scansys.sh -s 2>&1 | tee /tmp/scansys.servers.out
or the command, in this case for client systems (defined in systems.cfg):
# /pikt/lib/programs/scansys.sh -c 2>&1 | tee /tmp/scansys.clients.out
or this command, without any program argument, for all systems:
# /pikt/lib/programs/scansys.sh 2>&1 | tee /tmp/scansys.all.out
and get output like this:

###############################################################################

DownSystems

hamburg

salonika

valencia

oslo is down, or off the network (ping failure)

rome

...

###############################################################################

DownRpc

hamburg

salonika

valencia

oslo
oslo's RPC services are down (rpcinfo -p failure)

rome

...

###############################################################################

SysReboots

hamburg

salonika
reboot   system boot  2.6.17-gentoo-r5 Thu May 10 10:31          (13:59)

valencia

oslo

rome

...

###############################################################################

ScanDmesg

hamburg

salonika

valencia

...

glasgow
usb 2-1.2: reset low speed USB device using ohci_hcd and address 7
usb 2-1.2: reset low speed USB device using ohci_hcd and address 9
usb 2-1.2: reset low speed USB device using ohci_hcd and address 11
hub 2-1:1.0: cannot reset port 3 (err = -110)
hub 2-1:1.0: cannot reset port 3 (err = -110)

...

###############################################################################

MissingAcmeProcesses

...

granada

liverpool

zurich
ltdapi
ltdsrv
acmesrv

...

###############################################################################
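On a large network most hosts come back clean, so the few interesting lines are easy to miss in output like the above. A small filter can reduce a saved report, for example /tmp/scansys.servers.out, to just the hosts with findings.

This sketch assumes the layout shown above: sections separated by ### divider lines, and one blank-line-separated block per host. A block counts as a finding if it has more than one line, or if it is a single line containing a space (such as the ping-failure message). The function name scan_findings is hypothetical. The filter uses awk's paragraph mode (RS set to the empty string).

```bash
#!/bin/bash
# scan_findings FILE: print only the section names and host blocks
# that contain findings in saved scansys.sh output.
scan_findings () {
        awk '
                # Paragraph mode: each blank-line-separated block is one record,
                # and each line of the block is one field.
                BEGIN { RS = ""; FS = "\n" }
                # A divider line means the next block is a section name.
                /^#+$/ { header = 1; next }
                header { section = $0; header = 0; shown = 0; next }
                # A multi-line block, or a one-line message, is a finding.
                NF > 1 || $0 ~ / / {
                        if (!shown) { print "== " section " =="; shown = 1 }
                        print $0
                        print ""
                }
        ' "$1"
}
```

Running scan_findings /tmp/scansys.servers.out would then list, per section, only blocks like salonika's reboot record or glasgow's USB errors. Bare host names with nothing to report are dropped.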
With all of this, we have a handy means of scanning all of our systems and all log files on demand. We can do so routinely (for example, when we first come into work each day, when we leave for the day, or especially when we leave for the weekend) or whenever the need arises (for example, when unspecified problems are reported somewhere on our network).

So, you can see that Pikt scripts are not just for automated problem reporting (and fixing) by way of piktd.  We can also run them interactively, either singly, on any PIKT system, or en masse, on the piktmaster, as described above.  We'll have more to say about using PIKT interactively at the command line in Enhancing the Command Line.

Page last updated 2008-02-27.
PIKT® is a registered trademark of the University of Chicago.   Copyright © 1998-2008 Robert Osterlund. All rights reserved.