Systems Down Macro

The systems_down_alarms_macros.cfg file defines a script macro, =systems_down(), for reporting when systems go down or drop off the network and, optionally, when they come back up.

///////////////////////////////////////////////////////////////////////////////
//
// systems_down_alarms_macros.cfg
//
///////////////////////////////////////////////////////////////////////////////
 
systems_down(SYS)
	init
		status =piktstatus
		level =piktlevel
		task "Report systems down or off the network"
		input proc "(SYS)"
		dat $host 1
		keys $host

	begin
		// initialize the mission-critical systems list
		=initmisscrit
		if $alert() =~ "DownSystems|DownServers|DownClients"
			set #interactive = #true()
		else
			set #interactive = #false()
		fi

	rule	// determine if host is mission-critical
		=setmisscrit

	rule
		=bypass_server_reboots

	rule	// if high-level alert, bypass non mission-critical systems
		if =highlevelalert
			if ! #misscrit
				set $state = %state
				next
			fi
		fi

	rule	// if low-level alert, bypass mission-critical systems
		if ! =highlevelalert
			if #misscrit
				set $state = %state
				next
			fi
		fi

#ifdef debug
	rule
		output $host
#elsedef
	rule
		if #interactive
			output $host
			output =newline
		fi
#endifdef

	rule	// initialize messages
		set $dnmsg = "$host is down, or off the network (ping failure)"
		set $upmsg = "$host is (back) up"

	rule	// initially, assume system is up
		set $state = "+"

	rule	// ping the host
		// do the initial poll quickly
		if =pingfail($host, 1, 1)
			// if the first ping failed, try again with more
			// retries and longer timeouts
			if =pingfail($host, 3, 5)
				set $state = "-"
			fi
		fi

	rule
		if $state eq "+"
#ifdef verbose
			// for high-level alerts, report if systems back up
			if =highlevelalert
				// if state was "-", is now "+", report change
				if #defined(%state) && $state ne %state
					output mail $upmsg
				fi
			fi
#endifdef
			next
		fi

	// only down hosts after this point

	rule	// report downages for interactive scripts
		if #interactive
			output $dnmsg
			output =newline
			next
		fi

	// non-interactive scripts after this point

	rule	// page, but only periodically, if highest-level alert
		if =highestlevelalert
			=hourly(=page($dnmsg, =allpager, =always), )
		fi

	rule	// for all systems, always report new downages
		if $state ne %state
			output mail $dnmsg
			next
		fi

	rule	// for mission-critical systems, report continuing downages,
		// but only periodically
		if #misscrit
			=every_four_hours(output mail $dnmsg, )
			if =highestlevelalert
				=hourly(=output_other_mail(SYSDOWN, 'PIKT SysDown', =sysadmins, $dnmsg), )
			fi
			next
		fi

	end
		quit
 
///////////////////////////////////////////////////////////////////////////////
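The decision logic in the listing above can be restated outside PIKT. The following is a minimal Python sketch, not part of PIKT and not how the macro is implemented. The names `skip_host`, `probe`, and `report`, and the injected `ping_fails` callable standing in for the =pingfail() macro, are hypothetical. The two numeric ping arguments are passed through unchanged, since the listing's comments say only that the second attempt uses more retries and longer timeouts. The periodic paging and repeat mailing for continuing downages are left out.

```python
from typing import Callable, Optional

# Hypothetical stand-in for =pingfail(host, a, b): returns True if the ping failed.
PingFails = Callable[[str, int, int], bool]


def skip_host(high_level_alert: bool, mission_critical: bool) -> bool:
    """High-level alerts handle only mission-critical hosts;
    low-level alerts handle only the rest."""
    return high_level_alert != mission_critical


def probe(host: str, ping_fails: PingFails) -> str:
    """Return "+" if the host answers, "-" if both ping attempts fail."""
    # Quick first poll; the slower second attempt runs only if the first fails.
    if ping_fails(host, 1, 1) and ping_fails(host, 3, 5):
        return "-"
    return "+"


def report(host: str, state: str, prev_state: Optional[str],
           verbose_high_level: bool = False) -> Optional[str]:
    """Return the message to mail on a state change, or None if nothing is new."""
    if state == "+":
        # Recoveries are reported only in the verbose, high-level case,
        # and only when a previous state is known and differs.
        if verbose_high_level and prev_state is not None and prev_state != state:
            return f"{host} is (back) up"
        return None
    # Down host: report only when the state differs from the previous run.
    if state != prev_state:
        return f"{host} is down, or off the network (ping failure)"
    return None
```

Injecting `ping_fails` keeps the sketch free of real network calls, so the two-stage polling and the state comparison can be exercised with a fake.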

You might invoke the =systems_down() macro in your alarms.cfg file like this:

///////////////////////////////////////////////////////////////////////////////
//
// downage_alarms.cfg
//
///////////////////////////////////////////////////////////////////////////////
 
#if piktmaster
 
SysDown
        =systems_down(=piktc -L -H down)

#endif

///////////////////////////////////////////////////////////////////////////////

#if piktmaster | piktmistress

ACDown

        =systems_down(=piktc -L +H ac)

#endif

///////////////////////////////////////////////////////////////////////////////

#if piktmaster | piktmistress

PowerDown

        =systems_down(=piktc -L +H power)

#endif
 
///////////////////////////////////////////////////////////////////////////////

where 'down' is a host group of known down systems (specified in down_systems.cfg), 'ac' is a host group of networked air-conditioning systems, and 'power' is a host group of networked power supply systems.

Since monitoring air conditioning and power unit downages is so vital, we run these scripts on both the piktmaster system and the so-called 'piktmistress', an alias (specified in systems.cfg) for a system that backs up the piktmaster for certain crucial functions.

In the SysDown macro invocation, the macro argument '=piktc -L -H down' is a call to piktc to list all systems except for known down systems.  Similarly, the macro arguments in ACDown and PowerDown have piktc list the air conditioning and power systems, respectively.
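As a rough illustration of the host selection just described, the sketch below models group inclusion and exclusion with Python sets. It is not how piktc is implemented. The function `select_hosts`, the host names, and the group memberships are all hypothetical, chosen only to mirror the '-H down' and '+H ac' arguments above.

```python
def select_hosts(all_hosts, groups, include=None, exclude=None):
    """Model '+H group' (only that group) and '-H group' (everything but it)."""
    # With no include group, start from every known host.
    selected = set(all_hosts) if include is None else set(groups[include])
    if exclude is not None:
        selected -= set(groups[exclude])
    return sorted(selected)


# Hypothetical inventory and host groups.
all_hosts = ["kiev", "oslo", "paris", "ac1", "ups1"]
groups = {"down": ["paris"], "ac": ["ac1"], "power": ["ups1"]}

sysdown_hosts = select_hosts(all_hosts, groups, exclude="down")   # like '-H down'
acdown_hosts = select_hosts(all_hosts, groups, include="ac")      # like '+H ac'
powerdown_hosts = select_hosts(all_hosts, groups, include="power")  # like '+H power'
```

Excluding the 'down' group keeps hosts that are already known to be down from triggering fresh alerts on every run.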

Output from this script might look like this:

URGENT:
    SysDown
       Report systems down or off the network

       oslo is down, or off the network (ping failure)
       manchester is down, or off the network (ping failure)
       kiev is (back) up

For more examples, see Samples.

 
Page best viewed at 1024x768 or greater.   Page last updated 2019-01-12.   This site is PIKT® powered.
Copyright © 1998-2019 Robert Osterlund. All rights reserved.