We mostly organize our alarms into alert groupings corresponding to syslog levels, the familiar Emergency (syslog's "emerg"), Urgent ("alert"), Critical ("crit"), and so on. We try to confine relatively unimportant stuff to Warning, Notice, and Info. In Warning, we put stuff that should be looked at daily but needn't be acted on immediately. In Notice and Info, we put stuff that may be safely ignored. In the higher level alerts go things that should not be ignored or postponed.
So on busy days I often simply delete Notice and Info alert mail without looking at it. I glance through the Warnings, but most of the time I haven't attended to them by the end of the day. Usually, then, one of the first things I do the next day is delete the previous day's Warning messages.
One group of warnings I've been in the habit of ignoring lately looks like this:
WARNING:
    UserDirsChkWarning
        Check for userdir anomalies

        /pub/mus_disk_1/couperin may be orphaned, last mod was 424 days ago
          nis passwd entry: couperin:v5KnGT83pRJZ6:453:45:Frances Couperin:
        /home/kiev/couperin:/local/bin/ksh
        /pub/mus_disk_1/liszt may be orphaned, last mod was 695 days ago
          nis passwd entry: liszt:xx58zK9TZyVH2:7204:45:Franz Liszt:
        /home/kiev/liszt:/local/bin/tcsh
          mail file: -rw-rw---- 1 liszt mail 5117 Aug 12 18:54 /var/mail/liszt
        ...
Not only is it getting to be a nuisance seeing the same messages day after day (I have #undef'ed stifle), the number of such messages is growing, and for security and other reasons (like reclaiming idle disk space), we really should dispose of orphaned user accounts.
Determining which accounts are safe to retire is actually a very complex matter. If Prof. Bigshot uses his account only to pop mail off the server down to his PC mail client and hasn't logged in for several years, and I blow away his "orphaned" account so that he starts losing e-mail, there will be hell to pay! Or take users who leave this place: we extend their account privileges for a certain time before cancelling them. One problem is that for some account classes it's not very clear how long the grace period should be, and for others the grace period seems to be "indefinitely" or "forever".
I could go on describing the complexities, but suffice it to say that keeping track of 3,500+ current user accounts (not to mention 30,000+ alumni accounts) is a constant challenge, one I've been reluctant to face head on.
Today I implemented something I've been meaning to do for a long time: a new PIKT script, FindAuthUsersAdmin, and supporting configurations designed to cut down on some of the noise.
I know that several of our e-mail lists are closely managed and kept relatively up to date. If Prof. Bigshot moves on to MIT, he goes off the faculty list soon thereafter. Graduating MBA students are purged from the mba-students list several months after graduation. Knowing that our e-mail lists are up-to-date and accurate, I coded this new script:
///////////////////////////////////////////////////////////////////////////////

#ifndef generic
#if moscow

FindAuthUsersAdmin

        // from the actively maintained mail lists,
        // generate a file listing all authorized
        // user accounts

        init
                status active
                level debug
                task "Update the list of authorized user accounts."
                input proc "echo 'mba-students phd-students faculty staff computing-services labs' | =onecol"
//              input proc "echo 'faculty' | =onecol"   // for testing

        begin
                if #fopen(AUTHUSERS, "=authusers", "w") == #err()
                        output mail "Can't open =authusers file!"
                        quit
                endif

        rule
                // give priority to new mailman lists over old legacy
                // majordomo lists
                if -d "/opt/mailman/lists/$inlin"
                        do #popen(MEMBERS, "/opt/mailman/bin/list_members $inlin", "r")
                elseif -f "/opt/mail/lists/$inlin"
                        do #popen(MEMBERS, "=cat /opt/mail/lists/$inlin", "r")
                else
                        output mail "Can't determine $inlin list membership!"
                        next
                endif

                // the resolution algorithm is not perfect but is good enough
                // for our purposes
                while #read(MEMBERS) > 0
#ifdef debug
                        output "$rdlin"
#endifdef
                        set #bypass = #false()
                        // dispose of "@..."
                        set #i = #index($rdlin, "\@")
                        if #i
                                if $rdlin !~ "\@egbdf"
#ifdef debug
                                        output mail "bypassing $rdlin: non-egbdf address"
#endifdef
//                                      set #bypass = #true()
                                        cont    // bypass if not egbdf address
                                endif
                                set $user = $left($rdlin, #i-1)
                        else
                                set $user = $rdlin
                        endif
                        // keep resolving dotted alias until we have
                        // a simple user name
                        set $aliases = " "
                        while $user =~ "\\."
#ifdef debug
                                output "$user"
#endifdef
                                if #find($aliases, " $user ")
                                        output mail "Can't resolve alias $user: circular reference"
                                        set #bypass = #true()
                                        break
                                else
                                        set $aliases .= "$user "
                                endif
                                set $match = $command("=ypmatch $user aliases 2>&1")
                                if $match !~ "no such key"
                                        set $user = $match
                                        // dispose of "@..." that might result
                                        // from resolving dotted alias
                                        set #i = #index($user, "\@")
                                        if #i
                                                if $user !~ "\@egbdf"
#ifdef debug
                                                        output mail "bypassing $user: non-egbdf address"
#endifdef
                                                        set #bypass = #true()
                                                        break   // bypass if not egbdf address
                                                endif
                                                set $user = $left($user, #i-1)
                                        endif
                                else
                                        output mail "Can't resolve alias $user: no such key"
                                        set #bypass = #true()
                                        break
                                endif
                        endwhile
                        if #bypass
                                cont
                        endif
                        // dispose of "@..." that might result
                        // from resolving dotted alias
                        set #i = #index($user, "\@")
                        if #i
                                if $user !~ "\@egbdf"
                                        cont    // skip if not egbdf address
                                endif
                                set $user = $left($user, #i-1)
                        endif
                        do #write(AUTHUSERS, $user)
                endwhile
                do #pclose(MEMBERS)

        end
                do #fclose(AUTHUSERS)

#endif  // moscow
#endifdef  // generic

///////////////////////////////////////////////////////////////////////////////
Basically, what this script does is take our all-inclusive mail lists, some of them spit out by GNU Mailman's list_members command and others in the form of Majordomo flat text files, and convert addresses like

    francois.couperin
    anton.bruckner@egbdf
    joplin
    email@example.com
    firstname.lastname@example.org
    ...

to

    couperin
    bruckner
    joplin
    [no "debussey" here because claude has moved on]
    copland
    ...
Some thoughts about this script: In several places (determining if an address is an "@egbdf" address), this might benefit from user-definable functions. The Pikt script language will have them some day. The script is not perfect but is "good enough". I debated whether to write the thing in Perl but decided that was overkill. It was a close call, though.
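For readers who don't read Pikt script, here is a rough Python sketch of the same resolution idea. It is an illustration only, not what runs at our site: the function name resolve_user and the in-memory aliases dictionary are hypothetical stand-ins (the real script asks NIS via ypmatch), and the domain test is simplified to "host part starts with egbdf".

```python
def resolve_user(address, aliases, domain="egbdf"):
    """Return a plain user name for a list address, or None if it is bypassed.

    `aliases` is a hypothetical dict standing in for the NIS aliases map.
    """

    def strip_domain(addr):
        # Drop "@..." from a local address; reject addresses at other domains.
        local, at, host = addr.partition("@")
        if at and not host.startswith(domain):
            return None
        return local

    user = strip_domain(address.strip())
    seen = set()
    # Keep resolving dotted aliases until a simple user name remains.
    while user is not None and "." in user:
        if user in seen:
            return None  # circular alias reference
        seen.add(user)
        target = aliases.get(user)
        if target is None:
            return None  # no such alias key
        user = strip_domain(target)
    return user


if __name__ == "__main__":
    demo_aliases = {"francois.couperin": "couperin",
                    "anton.bruckner": "bruckner@egbdf"}
    for addr in ("francois.couperin", "anton.bruckner@egbdf", "joplin"):
        print(addr, "->", resolve_user(addr, demo_aliases))
```

Having the "strip the @ part" step as a helper is the sort of thing user-definable functions would buy in the Pikt version, where the same few lines currently appear three times.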
=authusers is a macro defined in macros/piktfiles_fil_macros.cfg as:
/usr/local/etc is a universal NFS-shared directory tree at our site, so every machine has access to the same authusers file.
I added the following to our alerts.cfg:
#ifndef generic
#if moscow

Admin
        timing          45 3 * * 1      // only do this once weekly
                                        // early in the monday am
        mailcmd         "=piktmail -s 'PIKT Alert on =pikthostname: Admin' =piktadmin"
        scripts
                FindAuthUsersAdmin

#endif  // moscow
#endifdef  // generic
Note that I run this only once a week. It's not crucial keeping the authusers file constantly up-to-date. The script processes 3,500+ addresses in just a couple of minutes, so I could do daily updates if I really wanted to.
I also added the following to objects.cfg:
// for UpdateFiles, the number is the fileage (in days) beyond which
// a file is deemed out of date

UpdateFiles
        ...
#if moscow
        ...
        =authusers 7
#endif
I have other monitoring in place to tell me if the Admin script fails to run on moscow. This UpdateFiles entry is just added insurance.
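The idea behind that UpdateFiles entry is just a file-age test. A minimal Python sketch of such a test follows; the function name is_out_of_date is hypothetical, and treating a missing file as out of date is my assumption for the sketch, not a statement of how the PIKT alarm behaves.

```python
import os
import time


def is_out_of_date(path, max_age_days, now=None):
    """True if the file is missing or was last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # Assumption for this sketch: an unreadable or missing file counts as stale.
        return True
    return (now - mtime) > max_age_days * 86400
```

With a limit of 7 days and a weekly Monday run, a single missed run is enough to push the file over the limit.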
In alarms.cfg, I changed UserDirsChkWarning as follows:
                if    #fa >= 365
#ifdef stifle
                   && #fa % 15 == 0     // report only every 15 days
#endifdef
                   && $name !~ "lost\\+found"
                   && $access !~ "^l"
                        if $name =~ "=sysadmins" || $name =~ "=compsys"
                                exec wait "=touch $name"
                        elseif    $substr($name,#rindex($name,"/")+1,1) eq "c"  // comp account
                               && #fa < 730
                                // do nothing, just skip this rule; i suppose
                                // it's possible that some comp's never access
                                // their homedirs their entire two years here
                        else
// NEW STUFF BEGINS HERE
                                if $command("=grep $owner =authusers 2>/dev/null") eq ""
// NEW STUFF ENDS HERE
                                        output mail "$name may be orphaned, last mod was $text(#fa) days ago"
                                        set $user = $substr($name, #rindex($name,"/")+1)
                                        output mail " nis passwd entry: " . $command("=ypmatch $user passwd 2>/dev/null")
                                        if -e "=maildir/$user"
                                                output mail " mail file: " . $command("=ll =maildir/$user")
                                        endif
//                                      output mail =nl
// NEW STUFF BEGINS HERE
                                else    // owner is an authorized user
#if madrid | mus | perf | ! cssys       // only touch on shared, user machines
                                        exec "=touch $name"
#endif
                                endif
// NEW STUFF ENDS HERE
                        endif
                endif
I've removed some of the usual '#ifndef generic' and '#ifdef doexec' clutter to make this clearer.
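The branching in that alarm fragment can be summarized in a few lines of Python. This is a simplified sketch, not a translation: classify_homedir and its parameters are hypothetical names, the sysadmin/compsys regex match is reduced to membership in an admin_dirs set, and the stifle option and the per-host restriction on touching are left out.

```python
def classify_homedir(name, age_days, owner, authorized,
                     admin_dirs=frozenset(), is_symlink=False):
    """Return 'skip', 'touch', or 'warn' for one home directory."""
    # Only directories at least a year old are of interest at all.
    if age_days < 365 or "lost+found" in name or is_symlink:
        return "skip"
    # Admin-owned directories are simply refreshed.
    if name in admin_dirs:
        return "touch"
    # "c..." (comp) accounts get two years before anything happens.
    base = name.rsplit("/", 1)[-1]
    if base.startswith("c") and age_days < 730:
        return "skip"
    # The new part: authorized owners are touched, everyone else is reported.
    return "touch" if owner in authorized else "warn"


if __name__ == "__main__":
    print(classify_homedir("/pub/mus_disk_1/liszt", 695, "liszt", {"liszt"}))
    print(classify_homedir("/pub/mus_disk_1/liszt", 695, "liszt", set()))
```

The only behavior the new stuff adds is the final line: before the change, every directory reaching that point produced a warning.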
I debated whether or not to make =authusers an internal macro and not an external file. That is, I considered having FindAuthUsersAdmin write a one-line #include file like
roussel urbach bizet sibelius purcell franck gounod rossini ...
which I could then include into macros.cfg thusly:
authusers #include <macros/authusers_macros.cfg>
Done as an include file, the test above would become instead:
// NEW STUFF BEGINS HERE
                                if #find(" =authusers ", $owner)
// NEW STUFF ENDS HERE
One problem here is that in the client-side Warning.alt file, we would have a *very* long #find():
if #find(" roussel urbach bizet sibelius purcell franck ...", $owner)
Doing a string lookup in RAM is faster than $command()'ing a grep on a disk file, but I'm not sure that this wouldn't overrun GNU flex's internal buffer (pikt can handle very long quoted strings, but flex probably can't). This logical test is not called too often--only if a home directory is found to be a year or two old--so I can live with the slower grep and file access.
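To make the trade-off concrete, here is what an in-memory lookup amounts to, sketched in Python rather than Pikt. The helper names load_authorized and is_authorized are hypothetical. One difference worth noting: set membership matches whole names only, whereas a plain grep for the owner also matches any line that merely contains it.

```python
def load_authorized(lines):
    """Build a set of user names from an iterable of lines, one name per line."""
    return {line.strip() for line in lines if line.strip()}


def is_authorized(owner, authorized):
    """Exact, whole-name membership test."""
    return owner in authorized


if __name__ == "__main__":
    names = load_authorized(["roussel\n", "urbach\n", "bizet\n"])
    print(is_authorized("bizet", names))   # whole name is present
    print(is_authorized("biz", names))     # a substring alone does not match
```

In practice load_authorized would be handed the open authusers file, since iterating over a file object yields its lines.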
I'm sure you could think up your own different solutions and approaches.
After running the new Admin.alt on moscow, thereby generating the authusers file, I added UserDirsChkWarning to the Test alert and installed on kiev with:
piktc -iv -D doexec +A Test +H kiev
Observing the correct results going to stdout, I deleted the Test alert from kiev with
piktc -tv +A Test +H kiev
and installed the revised Warning alert, also all objects to reflect things like the UpdateFiles addition, with
piktc -iv +A Warning +O all -H downsys
With all of these changes in place, supposed "orphaned" home directories will now be "touched" (given a current date stamp) for authorized accounts, and we'll no longer receive bogus warnings about them.
I am tempted now to do things like Unix mv truly "orphaned" home
directories to "$name.REMOVE_ME" or "$name.REMOVE_ME_ON_001231" and do other
automated fancy stuff, but I like to take things a step at a time. I'm sure
I will complicate the script and data configurations in due course, but for
now I'm content knowing that my Warning alert messages will be significantly
smaller from now on.
For more examples, see Developer's Notes.