A couple of months ago, when we were struggling through our e-mail crisis (caused by merging our 34,000+ alumni mail operation with our 4,000+ current-user main mail operation), one of the problems that beset us was persistent Python processes that lasted for days and pushed the system load average to 4 and above. Eventually, we traced the cause to bad addresses in our GNU Mailman lists (we run 450+ of them). I cleaned out the bad addresses (most of them, anyway) by hand but was unable to set up bad-address monitoring. Until now, that is. Here is a new Pikt script we've installed on our primary e-mail server:
#if moscow

MailmanAddressesChkWarning

    init
        status active
        level warning
        task "Check to see that local accounts/aliases referenced in Mailman lists are legitimate."
        input file "=mmusers"
        filter "=grep '\@egbdf'"
        seps ":@"
        dat $list 1
        dat $acct 2

    begin
        // because this takes quite a while, check on mondays only
        if #weekday() != 2
            quit
        endif

        // record all users in an instance-count assoc array
        if #fopen(ALLUSERS, "=allusers", "r") != #err()
            while #read(ALLUSERS) > 0
# ifdef debug
                // output every 100th user to screen
                =incr(#u)
                if #u % 100 == 0
                    output "$text(#u) $rdlin"
                endif
# endifdef
                =incr(#users[$rdlin])
            endwhile
            do #fclose(ALLUSERS)
        else
            output mail "can't open =allusers for reading!"
            quit
        endif

        // record all local aliases in an instance-count assoc array
        if #popen(ALIFILES, "=ls -1 /etc/mail/ali*", "r")
            while #read(ALIFILES) > 0
                if $rdlin !~ "^/etc/mail/(aliases|ali[[:digit:]])$"
                    cont
                endif
                set $alifile = $rdlin
                if #fopen(ALIASES, $alifile, "r")
                    while #read(ALIASES) > 0
# ifdef debug
                        // output every 100th user
                        // to screen
                        =incr(#a)
                        if #a % 100 == 0
                            output "$text(#a) $rdlin"
                        endif
# endifdef
                        if    $rdlin =~ "^[[:alpha:]]"
                           && #split($rdlin, ":") == 2
                            =incr(#aliases[$1])
                        endif
                    endwhile
                    do #fclose(ALIASES)
                else
                    do #pclose(ALIFILES)
                    output mail "can't open $alifile for reading!"
                    quit
                endif
            endwhile
            do #pclose(ALIFILES)
        else
            output mail "can't determine aliases files!"
            quit
        endif

# ifdef debug
    rule
        // output every user to screen
        output "$text(#innum()) $acct"
        // quit at 1000th user
        //if #innum() == 1000
        //    quit
        //endif
# endifdef

    rule
        if    ! #defined(#users[$acct])
           && ! #defined(#aliases[$acct])
           && $command("=ypmatch $acct aliases 2>&1") =~~ "no such"
           && $command("=ls /opt/mailman/lists/$acct 2>&1") =~~ "no such"
            output mail "bad address: $inlin"
        endif

#endif // moscow

#endifdef // generic
"=mmusers" is an alias pointing to a file with all list-user pairs like so:
weekend-students:email@example.com
weekend-students:firstname.lastname@example.org
weekend-students:email@example.com
weekend-students:firstname.lastname@example.org
...
"=allusers" is simply a list of all user accounts, e.g.,
groucho
harpo
chico
zeppo
...
We generate these lists with a couple of Pikt Admin scripts. Look in the 1.10.0 (and now 1.10.1) configs_samples if you're interested.
Since this script has a lot of data to chunk, we've decided to run it just once a week, early on Monday morning. We already have an Info alert that runs every Monday morning, but to highlight the bad-address report we include this alarm in the daily Warning run and restrict it to Mondays with this guard at the top of the begin section:

        if #weekday() != 2
            quit
        endif
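For readers more at home outside Pikt, the same "run daily, act on Mondays only" guard can be sketched in Python. This is only an analogue, not Pikt code: the function name should_run is made up for illustration, and note that Python numbers Monday as 0, whereas the script above tests #weekday() against 2.

```python
import datetime


def should_run(today):
    """Return True only on Mondays (Python's date.weekday() gives Monday == 0)."""
    return today.weekday() == 0


# The job is scheduled every day; the guard decides whether to do the work.
today = datetime.date.today()
print("run check" if should_run(today) else "skip until Monday")
```

Keeping the guard inside the script, rather than in the scheduler, is what lets the alarm live in the daily Warning run.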
To store the list of user accounts in memory, we use an instance-count associative array, #users, which the first loop of the script fills with =incr(#users[$rdlin]).
=incr() is a macro defined in macros.cfg as:
incr(N)
        if ! #defined((N))
            set (N) = 1
        else
            set (N) += 1
        fi
You must try to remember that, unlike in Perl for example, uninitialized Pikt variables don't default to 0. You can't just do a
set #users[$rdlin] += 1
because that presupposes a default value, which Pikt doesn't do. (An attempted access of an uninitialized variable like the one above would generate a WARNING in the script .log file. You don't get that WARNING with the =incr() macro because it does the explicit initialization when necessary--i.e., if the variable is not yet #defined().)
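A rough Python analogue may make the point concrete, since a plain Python dict likewise has no default for a missing key. This is a sketch of the initialize-or-increment idea only. The helper incr and the sample names are illustrative, not part of Pikt.

```python
def incr(counts, key):
    """Mimic the =incr() macro: set to 1 on first sight, otherwise add 1."""
    if key not in counts:
        counts[key] = 1
    else:
        counts[key] += 1
    return counts[key]


users = {}
for name in ["groucho", "harpo", "groucho"]:
    incr(users, name)

# A bare "+= 1" on a key that was never set fails, because there is no default.
try:
    users["zeppo"] += 1
    missing_key_failed = False
except KeyError:
    missing_key_failed = True

print(users)
```

The later membership test ("is this account in the array at all?") only needs the key to exist, so the counts themselves are a by-product.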
On the mailserver, we store our regular aliases (including Mailman list aliases) in /etc/mail/aliases, and our alumni aliases in /etc/mail/ali[0-9]. Here, too, we store all local aliases in memory in an instance-count associative array, #aliases, which the second loop of the script fills with =incr(#aliases[$1]).
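The alias-collecting loop can be sketched in Python as follows. It is an approximation under stated assumptions: the function names are invented, and it assumes that #split($rdlin, ":") == 2 means "the line splits into exactly two colon-separated fields", so lines with a second colon are skipped.

```python
import re


def is_alias_file(path):
    """Accept only /etc/mail/aliases and /etc/mail/ali0 through /etc/mail/ali9."""
    return re.fullmatch(r"/etc/mail/(aliases|ali[0-9])", path) is not None


def count_aliases(lines, counts=None):
    """Count alias names from lines shaped like 'name: target'."""
    counts = {} if counts is None else counts
    for line in lines:
        line = line.rstrip("\n")
        parts = line.split(":")
        # Skip comments, blanks, and continuation lines (no leading letter),
        # and anything that does not split into exactly two fields.
        if re.match(r"[A-Za-z]", line) and len(parts) == 2:
            counts[parts[0]] = counts.get(parts[0], 0) + 1
    return counts


sample = ["# comment", "groucho: gmarx", "weird: |prog:arg", "harpo: hmarx", "groucho: other"]
print(count_aliases(sample))
```

The file-name filter matters because the ls pattern /etc/mail/ali* would otherwise also pick up files such as a compiled aliases database.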
For our input loop, we only look at local addresses:
        input file "=mmusers"
        filter "=grep '\@egbdf'"
We'd like to consider all addresses, but this is much, much better than nothing (the great majority of addresses are local mailboxes).
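As an illustration of what the input filter and the seps ":@" setting accomplish together, here is a small Python sketch. The function name parse_pair and the domain egbdf.example are placeholders (the real local domain is not spelled out here). The sketch assumes that the filter keeps lines containing "@egbdf" and that the separators split each line on either ":" or "@".

```python
import re


def parse_pair(line, local_marker="@egbdf"):
    """Return (list, acct) for a local address line, or None if not local."""
    line = line.strip()
    if local_marker not in line:  # the grep-style filter
        return None
    fields = re.split(r"[:@]", line)  # split on ':' or '@', like seps ":@"
    if len(fields) < 2:
        return None
    return fields[0], fields[1]  # field 1 is the list, field 2 the account


print(parse_pair("weekend-students:groucho@egbdf.example"))
print(parse_pair("weekend-students:email@example.com"))
```

Only the part before the "@" reaches the final rule, which is why the check can compare it directly against account and alias names.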
In the final rule, we check
- if the account (or alias) is not in the allusers list
- if the account (or alias) is not in the local aliases list
- if the account (or alias) is not in the NIS aliases database
- if the "account" (or "alias") is not a Mailman list
If all of the above are true, we conclude that we have a bad local address. For now, we just note this in an 'output mail' statement. Eventually, as we grow more confident about this screening, we can automate the removal of the bad addresses using the appropriate GNU Mailman script commands.
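The four-way test in the final rule can be sketched in Python like this. It is a sketch under assumptions, not the Pikt rule itself: is_bad_address is an invented name, and the two lookup callables stand in for the ypmatch and ls commands, which the real script runs through the shell.

```python
def is_bad_address(acct, users, aliases, in_nis_aliases, is_mailman_list):
    """True only if the name is unknown to all four sources."""
    return (
        acct not in users                  # not a user account
        and acct not in aliases            # not a local alias
        and not in_nis_aliases(acct)       # stand-in for the ypmatch lookup
        and not is_mailman_list(acct)      # stand-in for the list-directory lookup
    )


users = {"groucho": 1, "harpo": 1}
aliases = {"postmaster": 1}
nis = {"chico"}
lists = {"weekend-students"}

candidates = ["groucho", "postmaster", "chico", "weekend-students", "zeppo"]
bad = [a for a in candidates
       if is_bad_address(a, users, aliases, nis.__contains__, lists.__contains__)]
print(bad)  # prints ['zeppo']
```

In this Python version the in-memory checks come first and "and" short-circuits, so the more expensive external lookups run only for names that the two arrays do not already vouch for.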
(By the way, it was while processing close to 80,000 account and alias names that we observed that weird "nan" assoc array index bug. There's nothing like chunking on real data for discovering program flaws.)
This routine is not perfect, for it doesn't register all accounts and
aliases local to the dozens of private faculty desktops in our domain.
Still, most of those are in NIS, and their number is dwarfed by our main
central account and alias databases. So this script should discover 99+%
of our bad Mailman addresses--the local ones, that is.
For more examples, see Developer's Notes.