Temporary Glitches

[posted 2000/05/10]

Successfully running validation tests on new PIKT code is not enough, because your results are only as good as your tests.  You also have to run the new code in an actual production environment.

We're running the latest pikt-1.10.0pre5 code on our master machine and are suddenly seeing these alerts:

                                PIKT ALERT
                         Wed May 10 12:55:02 2000
                                  vienna

DEBUG:
    AlertChkCritical
        Detect PIKT alert/service daemon failures/restarts/redundancies

        piktd appears to be hung/dead on rheims
        piktd appears to be hung/dead on nantes
        ...

It seems like the piktd daemon is dead everywhere!

I cp'ed Debug.alt to Test.alt, edited out every alarm script except for AlertChkCritical, then added another output line to the last rule of AlertChkCritical:

        rule
                do #split ( $time , ":" )
[new -->]       output "$time, $[1], $[2], $[3]"
                set #t = #datevalue ( #yrnow - #if ( #monnow == 1 &&
                                      $mon eq "Dec" , 1 , 0 ) ,
                                      #monthnumber ( $mon ) , #date ) +
                                      #timevalue ( #val ( $[1] ) , #val ( $[2] ) ,
                                                   #val ( $[3] ) )
                if #timenow - #t > #hrs * 60 * 60
                output mail "piktd appears to be hung/dead on $sys"
                endif
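
To make the rule's arithmetic easier to follow, here is a rough Python sketch of the same check. It is not PIKT code. It assumes that $mon, #date, and $time hold the month name, day of month, and hh:mm:ss time of the last piktd_log entry, and that #datevalue() plus #timevalue() yields a timestamp in seconds. The function names are hypothetical.

```python
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def last_log_time(mon, day, time_str, now):
    """Rebuild the timestamp of a log entry that records no year."""
    # Equivalent of: do #split ( $time , ":" )
    hh, mm, ss = (int(field) for field in time_str.split(":"))
    # A "Dec" entry read in January belongs to the previous year.
    year = now.year - (1 if now.month == 1 and mon == "Dec" else 0)
    return datetime(year, MONTHS.index(mon) + 1, day, hh, mm, ss)

def appears_hung(mon, day, time_str, now, hrs):
    """True if the last log entry is more than hrs hours old."""
    age = (now - last_log_time(mon, day, time_str, now)).total_seconds()
    return age > hrs * 60 * 60
```

With the alert's own timestamp as "now" (Wed May 10 12:55:02 2000) and an assumed threshold of one hour, a last entry at 12:41:00 on May 10 is only about 14 minutes old, so a correctly working rule would stay silent.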

After doing a 'pikt +A Test', I got these results:

12:41:00, :, ÿ, ÿ
piktd appears to be hung/dead on rheims
12:42:00, :, ÿ, ÿ
piktd appears to be hung/dead on nantes
...
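
For comparison, this is what a correct split of the same time string should produce, sketched in Python rather than PIKT. The helper name is hypothetical; the "observed" list is copied from the test output above.

```python
time_field = "12:41:00"

# What a working split on ":" should return for $[1], $[2], $[3].
expected = time_field.split(":")

# What the extra output line actually printed for those three fields.
observed = [":", "ÿ", "ÿ"]

def looks_like_clock_fields(fields):
    """True if fields look like hours, minutes, seconds: three all-digit strings."""
    return len(fields) == 3 and all(f.isdigit() for f in fields)

print(expected, looks_like_clock_fields(expected))
print(observed, looks_like_clock_fields(observed))
```

The time string itself printed correctly, so the garbage is confined to the split fields, which is what points the finger at #split().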

I recognized this instantly as a bug in the revised #split() implementation. I'll fix the bug later this week.  In the meantime, we don't want to continue seeing bogus messages about supposed piktd failure.  So, I commented out the last rule of AlertChkCritical and added a =remind() macro to remind me a week from now to reactivate the rule:

        rule    // if it's been more than #hrs hours since the last piktd_log
                // entry, piktd is not logging, hence appears to be hung/dead;
        =remind(2000, 5, 17, "REACTIVATE FINAL ALERTCHKCRITICAL RULE AFTER DEBUG")
//              do #split($time, ":")
//              set #t =   #datevalue(#yrnow - #if(#monnow == 1 &&
//                                                 $mon eq "Dec", 1, 0),
//                                    #monthnumber($mon), #date)
//                       + #timevalue(#val($[1]), #val($[2]), #val($[3]))
//              if #timenow - #t > #hrs*60*60
//                      output mail "piktd appears to be hung/dead on $sys"
//              endif

(It perhaps goes without saying that I then reinstalled with 'piktc -iv +A Debug +H vienna'.)
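
The idea behind a dated reminder can be sketched in a few lines of Python. This is only an illustration of the technique: the real =remind() macro is defined in the PIKT configuration and may work differently, and the function below is hypothetical.

```python
from datetime import date

def remind(year, month, day, message, today=None):
    """Return the message once the due date has arrived, else None."""
    today = today or date.today()
    return message if today >= date(year, month, day) else None

# Stays quiet until the due date, then starts returning the reminder text.
note = remind(2000, 5, 17, "REACTIVATE FINAL ALERTCHKCRITICAL RULE AFTER DEBUG")
if note:
    print(note)
```

Tying the reminder to the disabled rule itself means the workaround carries its own expiry date, instead of relying on someone remembering to undo it.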

Consider doing something like this when faced with similar temporary glitches.

For more examples, see Developer's Notes.

 
Page last updated 2019-01-12.
Copyright © 1998-2019 Robert Osterlund. All rights reserved.