Our mail situation is quieting down, thankfully. We still experience an "imapd storm" every couple of days, but the Pikt script described last week is keeping these storms manageable.
One new problem, though, is with our NIS service. Our NIS master, a Sparc 10, managed just fine when the passwd map was ~4,000 lines, but with the addition of 32,000+ accounts, that system has been huffing and puffing making the maps. Sometimes the maps get corrupted, or don't get pushed to the slave servers properly, or fail in some way we haven't pinned down.
One fix for this NIS malfunction is to move NIS master service to a new machine. That's in the cards, but not until interim week, a month and a half from now. For now, we just can't risk further, possibly worse, problems by messing with our basic NIS setup.
Here is one of those Pikt alarms I've been thinking about for quite a while but didn't get around to implementing until the latest crises forced me to:
NISChkEmergency

        init
                status active
                level emergency
                task "Report NIS service malfunctions"

        begin
                set $state = "+"        // initialize
                pause #random(60)       // up to a 1-minute pause so we don't
                                        // hit the NIS servers all at once
                                        // (although alert timing drift may
                                        // already have solved this problem)

                // add more cases as warranted

                // this is a required test, because nextuid should be the
                // last entry in our NIS passwd file
                set $c = $command("=ypmatch nextuid passwd 2>&1")
                if $c !~ "^nextuid:"
                        output mail $c
                        set $state = "-"
                fi

                set $c = $command("=ypmatch brahms passwd 2>&1")
                if $c !~ "^brahms:"
                        output mail $c
                        set $state = "-"
                fi

                set $c = $command("=ypmatch 508 passwd.byuid 2>&1")
                if $c !~ "^brahms:"
                        output mail $c
                        set $state = "-"
                fi

                set $c = $command("=ypmatch johannes.brahms aliases 2>&1")
                if $c !~ "^brahms@"
                        output mail $c
                        set $state = "-"
                fi

                set $c = $command("=ypmatch hamburg hosts 2>&1")
                if $c !~ "^111.222"
                        output mail $c
                        set $state = "-"
                fi

#if nisserver
                if $command("=ypcat passwd | =tail -10 | =wc -l") !~ "10"
#else
                if $command("=ypcat passwd | =head -20 | =wc -l") !~ "20"
#endif
                        set $c = $command("=ypcat passwd 2>&1") // get first line (err msg) only
                        output mail $c
                        set $state = "-"
                fi

# if nismaster
                do #split($command("=wc -l /etc/NIS/passwd"))
                set #lcurr = #val($1)
                do #split($command("=wc -l /etc/NIS/passwd.nightly.backup"))
                set #lback = #val($1)
                if #lcurr < #lback - 10         // allow for limited truncation
                        output mail "NIS passwd file may be too small!  Current line count is $text(#lcurr), was $text(#lback) in passwd.nightly.backup."
                        set $state = "-"
                fi
# endif

        end
                if $state eq "-"
                        set $server = $command("=ypwhich 2>&1")
                        output mail "NIS ON $upper($server) IS SICK/DOWN!"
                fi

# ifdef page
                // page just once per downage
                if $state eq "-" && ( ! #defined(%state) || $state ne %state )
                        // exec wait "echo 'NIS on $server is sick/down' |
                        //            =mailx -s 'NIS on $server is sick/down'
                        //            pagebrahms\@egbdf"
                        exec wait "echo 'NIS on $server is sick/down' | =mailx -s 'NIS on $server is sick/down' =pagesysadmins"
                endif
# endifdef
This should be self-explanatory, except possibly for =pagesysadmins, which is a macro defined in macros.cfg as
pagesysadmins pagedonizetti\@egbdf pagebrahms\@egbdf pageliszt\@egbdf
and "pagebrahms\@egbdf", in turn, is an email alias resolving to my pager number.
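For readers less familiar with Pikt, the three core ideas in the alarm can be sketched in Python: match each lookup's output against an expected pattern, flag a passwd file that has shrunk by more than a small margin, and page only on the transition into a bad state. This is an illustration under assumptions, not part of the Pikt configuration. The names `CHECKS`, `run_shell`, `failed_checks`, `looks_truncated`, and `should_page` are all hypothetical, and `previous` stands in for what `%state` appears to provide in the script, namely the value of `$state` from the prior run.

```python
import re
import subprocess
from typing import Callable, List, Optional, Tuple

# (command, regex the output must match) pairs, mirroring three of the
# ypmatch tests in the alarm. The account names are the script's own examples.
CHECKS: List[Tuple[str, str]] = [
    ("ypmatch nextuid passwd", r"^nextuid:"),
    ("ypmatch brahms passwd", r"^brahms:"),
    ("ypmatch 508 passwd.byuid", r"^brahms:"),
]


def run_shell(command: str) -> str:
    """Run a command through the shell; return stdout and stderr combined,
    which is what the script's trailing 2>&1 achieves."""
    done = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return done.stdout


def failed_checks(
    checks: List[Tuple[str, str]],
    runner: Callable[[str], str] = run_shell,
) -> List[str]:
    """Return the output of every check that does not match its pattern."""
    failures = []
    for command, pattern in checks:
        output = runner(command)
        if not re.search(pattern, output):
            failures.append(output)  # the alarm mails this text
    return failures


def looks_truncated(current: int, backup: int, slack: int = 10) -> bool:
    """True when the current file is more than `slack` lines shorter than
    the nightly backup (the 'allow for limited truncation' test)."""
    return current < backup - slack


def should_page(state: str, previous: Optional[str]) -> bool:
    """Page only when NIS is sick now ('-') and either there is no prior
    state or the prior state was different, i.e. once per downage."""
    return state == "-" and (previous is None or state != previous)
```

Passing a canned `runner` instead of `run_shell` lets the matching logic be exercised on a machine with no NIS binding at all, and `should_page("-", "-")` being false is what keeps a long outage from paging on every run.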
I've thought of ways we could, if the bound-to NIS server is sick/down, auto-force a rebinding to an alternative server. (We can't do that easily for reasons I won't go into here.) As with most Pikt scripts, we'll revise and tweak this one over time.
Got anything interesting/new you would like to share?
For more examples, see Developer's Notes.