Swap Check Example
Case Study 3: SwapChk
We also monitor whether systems are running out of swap space. For that
purpose, we use the SwapChk script, a portion of which is shown in Listing 5.
Listing 5: SwapChk (fragment)
rule
        output log "=swapchk_log" $inline()

end     // only report if use is very high and increased by at
        // least 5% since last time (hence don't report when
        // swap use is high but declining)
        set #use = (#blksum-#fresum)/#blksum
        if    ( #use >= 80% )
           && (    ( ! #defined(%use) )
                || ( %use < 80% )
                || ( #use - %use >= 5% )
              )
                output mail "swap utilization is $text(100*#use,0)%:=newline"
                output mail "swapfile dev swaplo blocks free"
                for #i=1 #i<=#innum() #i+=1
                        output mail $line[#i]
                endfor
                output mail =newline
                output mail $command("=dfk /tmp | =behead(1)")
                =dutop(10, /tmp)
                output mail "contents of /tmp:=newline"
                do #popen(LL, "=ll /tmp", "r")
                while #read(LL) > 0
                        output mail $rdlin
                endwhile
                do #pclose(LL)
                output mail =newline
                =toptop(20)
        endif
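The alert condition in the end section of Listing 5 can be restated outside Pikt to make its behavior easier to check. The following Python sketch is only an illustration, not part of SwapChk: the function name should_alert and the constants are hypothetical, and it assumes that Pikt's 80% and 5% correspond to the fractions 0.80 and 0.05 and that an undefined %use can be modeled as None.

```python
from typing import Optional

HIGH_WATER = 0.80  # assumed equivalent of the 80% threshold in Listing 5
MIN_RISE = 0.05    # assumed equivalent of the 5% increase in Listing 5


def should_alert(use: float, prev_use: Optional[float]) -> bool:
    """Return True when the Listing 5 condition would send alert mail.

    use      -- current swap utilization as a fraction (0.0 to 1.0)
    prev_use -- utilization from the previous run, or None when no
                previous value exists (modeling an undefined %use)
    """
    # Never report while utilization is below the threshold.
    if use < HIGH_WATER:
        return False
    # Above the threshold: report on the first run, on a fresh
    # crossing of the threshold, or on a further significant rise.
    return (
        prev_use is None
        or prev_use < HIGH_WATER
        or use - prev_use >= MIN_RISE
    )
```

Under these assumptions, should_alert(0.90, 0.95) is False: swap use is high but declining, which is the case the comment in Listing 5 says should not be reported.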
The input for this script comes from 'input proc "=swap -l | =behead(1)"'. The last rule above logs all input. This might come in handy some day if we need data to justify the purchase of additional RAM.
At the end of all input, we compute #use, the fraction of swap blocks in
use. We format a report and send it off as alert mail only if #use is
greater than or equal to 80% and at least one of the following also holds:
%use is not defined (because this is the first alarm run, say), %use was
less than 80% previously, or #use has gone up by at least 5% over the
previous %use. Listing 6 is a sample report:
Listing 6: SwapChk (sample report)
PIKT ALERT
Thu Aug 17 21:20:14 2000
paris6
URGENT:
SwapChk
Report when swap use is high
swap utilization is 98%:
swapfile dev swaplo blocks free
/dev/dsk/c0t0d0s1 32,1 16 1003184 24384
/pub/perf_disk_20/swap - 16 524272 0
swap 803568 757800 45768 95% /tmp
758376 /tmp/SAS_worka0000420D
8 /tmp/screens
240 /tmp/ups_data
...
contents of /tmp:
total 544
drwx------ 2 freil perf 629 Aug 17 21:18 SAS_work
drwxr-xr-x 2 root other 69 Aug 16 06:15 screens
-rw-rw-r-- 1 root sys 239160 Aug 16 11:12 ups_data
...
last pid: 17014; load averages: 0.20, 0.23, 0.23 21:20:21
54 processes: 46 sleeping, 3 zombie, 4 stopped, 1 on cpu
Memory: 128M real, 1576K free, 738M swap in use, 7984K swap free
PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU CMD
16845 freil 1 35 0 12M 3336K sleep 4:27 9.28% r3
16909 freil 3 35 0 6432K 1464K sleep 1:37 5.21% sas
16969 root 1 33 0 4872K 2792K sleep 0:00 2.80% pikt
...
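The 98% figure in the sample report can be reproduced from the two swap -l lines it contains. The Python sketch below is an illustration only: the function name swap_utilization is hypothetical, and it assumes that #blksum and #fresum in Listing 5 are the sums of the blocks and free columns, accumulated by rules that the fragment does not show.

```python
# The two swap device lines from the sample report in Listing 6.
SAMPLE_SWAP_LINES = [
    "/dev/dsk/c0t0d0s1 32,1 16 1003184 24384",
    "/pub/perf_disk_20/swap - 16 524272 0",
]


def swap_utilization(lines):
    """Sum the blocks and free columns and return the used fraction."""
    blocks_total = 0  # assumed counterpart of #blksum
    free_total = 0    # assumed counterpart of #fresum
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        blocks_total += int(fields[3])  # "blocks" column
        free_total += int(fields[4])    # "free" column
    if blocks_total == 0:
        raise ValueError("no swap devices found in input")
    return (blocks_total - free_total) / blocks_total


use = swap_utilization(SAMPLE_SWAP_LINES)
print(f"swap utilization is {100 * use:.0f}%")  # prints "swap utilization is 98%"
```

With 1,527,456 blocks in total and 24,384 free, the used fraction is about 0.984, which rounds to the 98% shown in the report.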
PIKT has automatically assembled all the diagnostic information we need to assess the situation. Moreover, once we have identified user freil as the memory hog, we can simply add a few comments to the top of this alert e-mail and forward it to freil--demonstrating one advantage of using e-mail as PIKT's primary notification mechanism.
We could also, at least under certain circumstances or on certain systems,
augment swap space on the fly by adding the appropriate Pikt exec statements.