Swap Check Example
Case Study 3: SwapChk
We also monitor whether systems are running out of swap space. For that purpose, we use the SwapChk script, a portion of which is shown in Listing 5.
Listing 5: SwapChk (fragment)
    rule
        output log "=swapchk_log" $inline()

    end
        // only report if use is very high and increased by at
        // least 5% since last time (hence don't report when
        // swap use is high but declining)
        set #use = (#blksum-#fresum)/#blksum
        if    ( #use >= 80% ) &&
              (    ( ! #defined(%use) )
                || ( %use < 80% )
                || ( #use - %use >= 5% )
              )
            output mail "swap utilization is $text(100*#use,0)%:=newline"
            output mail "swapfile dev swaplo blocks free"
            for #i=1 #i<=#innum() #i+=1
                output mail $line[#i]
            endfor
            output mail =newline
            output mail $command("=dfk /tmp | =behead(1)")
            =dutop(10, /tmp)
            output mail "contents of /tmp:=newline"
            do #popen(LL, "=ll /tmp", "r")
            while #read(LL) > 0
                output mail $rdlin
            endwhile
            do #pclose(LL)
            output mail =newline
            =toptop(20)
        endif
The input for this script comes from 'input proc "=swap -l | =behead(1)"'. The rule shown at the top of the fragment, which is the last rule before the end section, logs every input line. This might come in handy some day if we need data to justify the purchase of additional RAM.
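The fragment does not show how #blksum and #fresum are accumulated. As a rough illustration only, the following Python sketch (not Pikt code) assumes they are the sums of the "blocks" and "free" columns of the swap -l output after the header line has been removed. The function name sum_swap_columns is hypothetical; the sample lines are taken from the report in Listing 6.

```python
def sum_swap_columns(lines):
    """Sum the 'blocks' and 'free' columns of `swap -l` output.

    Assumes the header line has already been stripped and that the
    columns are: swapfile, dev, swaplo, blocks, free.
    """
    blksum = 0
    fresum = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        blksum += int(fields[3])  # total blocks in this swap area
        fresum += int(fields[4])  # free blocks in this swap area
    return blksum, fresum


# The two swap areas from the sample report
sample = [
    "/dev/dsk/c0t0d0s1 32,1 16 1003184 24384",
    "/pub/perf_disk_20/swap - 16 524272 0",
]
blksum, fresum = sum_swap_columns(sample)
print(blksum, fresum)  # prints: 1527456 24384
```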
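After all input has been processed, we compute #use, the fraction of swap blocks in use (reported as a percentage). We send an alert only if #use is 80% or greater and at least one of the following also holds: %use is not defined (because this is the first alarm run, say), %use was less than 80% previously, or #use has gone up by at least 5 percentage points over the previous %use. This way we are not alerted repeatedly while swap use is high but steady or declining. When the condition is met, we format a report and send it off as alert mail. Listing 6 is a sample report: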
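To make the alert condition concrete, here is a minimal Python sketch of the same decision (an analogue, not Pikt code). The names swap_use and should_alert and the parameter names are hypothetical; prev_use plays the role of %use, with None standing for "not defined". The block totals are those of the sample report in Listing 6.

```python
def swap_use(blksum, fresum):
    """Fraction of swap blocks in use; assumes blksum is nonzero."""
    return (blksum - fresum) / blksum


def should_alert(use, prev_use=None, threshold=0.80, min_rise=0.05):
    """Mirror the condition in the listing's end section.

    Alert only when use is at or above the threshold AND one of:
    no previous value, previous value below the threshold, or a
    rise of at least min_rise since the previous run.
    """
    if use < threshold:
        return False
    return (
        prev_use is None
        or prev_use < threshold
        or use - prev_use >= min_rise
    )


use = swap_use(1527456, 24384)     # block totals from the sample report
print(round(100 * use))            # prints: 98
print(should_alert(use))           # first run: True
print(should_alert(0.90, 0.95))    # high but declining: False
print(should_alert(0.98, 0.90))    # high and up 8 points: True
```

Because the comparison uses floating-point arithmetic, a rise of exactly 5 points may land on either side of the test; a real implementation might compare integer percentages instead.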
Listing 6: SwapChk (sample report)
                                PIKT ALERT
                         Thu Aug 17 21:20:14 2000
                                  paris6

    URGENT:
        SwapChk
            Report when swap use is high

            swap utilization is 98%:

            swapfile                dev   swaplo  blocks   free
            /dev/dsk/c0t0d0s1       32,1      16 1003184  24384
            /pub/perf_disk_20/swap  -         16  524272      0

            swap    803568  757800   45768    95%    /tmp

            758376  /tmp/SAS_worka0000420D
            8       /tmp/screens
            240     /tmp/ups_data
            ...

            contents of /tmp:

            total 544
            drwx------   2 freil    perf         629 Aug 17 21:18 SAS_work
            drwxr-xr-x   2 root     other         69 Aug 16 06:15 screens
            -rw-rw-r--   1 root     sys       239160 Aug 16 11:12 ups_data
            ...

            last pid: 17014;  load averages:  0.20,  0.23,  0.23    21:20:21
            54 processes:  46 sleeping, 3 zombie, 4 stopped, 1 on cpu
            Memory: 128M real, 1576K free, 738M swap in use, 7984K swap free

              PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU CMD
            16845 freil      1  35    0   12M 3336K sleep   4:27  9.28% r3
            16909 freil      3  35    0 6432K 1464K sleep   1:37  5.21% sas
            16969 root       1  33    0 4872K 2792K sleep   0:00  2.80% pikt
            ...
PIKT has automatically assembled all the diagnostic information we need to assess the situation. Moreover, once we have identified user freil as the memory hog, we can simply add a few comments to the top of this alert e-mail and forward it to freil. This demonstrates one advantage of using e-mail as PIKT's primary notification mechanism.
We could also, at least under certain circumstances or on certain systems, augment swap space on the fly by adding the appropriate Pikt exec statements.