Report Recent Google Googlebot and Mediabot Visits

googlebot.pl is a simple Perl script that scans the Apache web server access logs and reports the most recent visits by Googlebot (Google's search engine spider) and the AdSense Mediabot (Google's ad server crawler). (The ad server agent identifies itself as "Mediapartners-Google".)

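googlebot.pl might produce output like the following, where each line gives a page, the time of the latest Googlebot visit, and the time of the latest Mediabot visit: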

authors.html  |  09/Sep/2005:01:45:05  |  20/Aug/2005:08:37:10
changes.html  |  05/Sep/2005:19:00:56  |  07/Apr/2005:04:17:56
...
faq.html  |  09/Sep/2005:01:46:59  |  08/Sep/2005:09:51:33
index.html  |  08/Sep/2005:21:52:05  |  08/Sep/2005:10:28:15
intro/intro.html  |  09/Sep/2005:01:47:26  |  06/Sep/2005:17:49:29
...

Here is the Perl script, which takes the site name as its only argument:

#!/usr/bin/perl

$site = $ARGV[0] or die "usage: googlebot.pl site\n";

# inventory the pages
# group the two -name tests so that -print applies to both .htm and .html files
open (PAGES, "/usr/bin/find /var/www/html/$site \\( -name \\*.htm -o -name \\*.html \\) -print |");
while (<PAGES>) {
        chomp;
        s/\.\///g;
        s/\/var\/www\/html\/$site\///;
        $pages{$_}++;
}
close(PAGES);

# scan the access log(s), and for each accessed page, note the latest access
# date and time
open(LOG, "/bin/cat /var/log/httpd/access_log_$site.07* /var/log/httpd/access_log_$site |");
while (<LOG>) {
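        # skip lines that are not combined-format GET/HEAD requests; the /x
        # flag lets the pattern span two lines
        next unless /^.+\[(\d+\/\w+\/\d+:\d+:\d+:\d+).+\"(get|head)\s(\S+)\s
         .+\"\s+\d+\s+\S+\s?\"([^\"]+)\".+\"([^\"]+)\"$/ix;
        $date = $1;
        $page = $3;
        $agent = $5;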
        if ($page =~ /\/pikt\/(.+)/) {
                $page = $1;
        }
        if (($page eq "/") || ($page eq "/pikt/")) {
                $page = "index.html";
        }
        if ($page =~ /^\/(.+)/) {
                $page = $1;
        }
        if ($agent =~ /(googlebot|google\.com\/bot\.html)/i) {
                $googlebot{$page} = $date;
        }
        if ($agent =~ /mediapartners-google/i) {
                $mediabot{$page} = $date;
        }
}
close(LOG);

# for all inventoried pages, report the date and time of last googlebot and
# mediabot access
foreach $page (sort keys %pages) {
        print "$page  |  $googlebot{$page}  |  $mediabot{$page}\n";
}

exit 0;
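
To make the log-parsing step easier to follow, here is a minimal sketch that applies the same pattern to a single line and then normalizes the page name the way the script does. The helper names parse_line and normalize_page are hypothetical (they are not part of googlebot.pl), the sample log line is made up for illustration, and the pattern is written with the /x flag so that it can be split across commented lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse one combined-format log line with the script's
# pattern. Returns (date, page, agent), or an empty list if nothing matches.
sub parse_line {
    my ($line) = @_;
    return unless $line =~ m{
        ^.+\[(\d+/\w+/\d+:\d+:\d+:\d+)   # capture 1: timestamp without zone
        .+"(get|head)\s(\S+)\s           # capture 2: method, capture 3: path
        .+"\s+\d+\s+\S+\s?               # protocol, status code, byte count
        "([^"]+)"                        # capture 4: referer
        .+"([^"]+)"$                     # capture 5: user agent
    }ix;
    return ($1, $3, $5);
}

# Hypothetical helper: the same page-name cleanup as in the script.
sub normalize_page {
    my ($page) = @_;
    $page = $1 if $page =~ m{/pikt/(.+)};          # drop the /pikt/ prefix
    return 'index.html' if $page eq '/' || $page eq '/pikt/';
    $page =~ s{^/(?=.)}{};                         # drop a leading slash
    return $page;
}

# Made-up sample line, for illustration only.
my $line = '192.0.2.1 - - [09/Sep/2005:01:45:05 -0500] '
         . '"GET /pikt/authors.html HTTP/1.1" 200 5120 "-" '
         . '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"';

my ($date, $page, $agent) = parse_line($line);
my $bot = $agent =~ /googlebot|google\.com\/bot\.html/i ? 'googlebot'
        : $agent =~ /mediapartners-google/i             ? 'mediabot'
        :                                                 'other';
print join('  |  ', $date, normalize_page($page), $bot), "\n";
```

Returning an empty list on a failed match keeps stale capture values from an earlier line out of the results, which is why the sketch checks the match before reading the captures.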

googlebot.pl is called by the GooglebotVisitPIKT and GoogleMediabotVisitPIKT Pikt scripts.

For more examples, see Samples.

 
Page best viewed at 1024x768 or greater.   Page last updated 2019-01-12.   This site is PIKT® powered.
Copyright © 1998-2019 Robert Osterlund. All rights reserved.