Sunday, September 30, 2012

Extracting URLs from Google search result pages in HTML (Perl script)

If you have run a site: query on Google and saved the search results page (SERP) as HTML, here is a simple Perl script that extracts that site's URLs from the saved HTML:

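use strict;
use warnings;

# Saved Google results page and the URL prefix to look for.
my $file = "f:/tmp/v1.htm";
my $url_starts_with = "http://www.creditcardpaymentgateways.in/2012.php";

# Read the whole file into $content in one go.
my $content;
{
    local $/;    # undefine the record separator so <$fh> returns the entire file
    open my $fh, '<', $file or die "Can't open $file for reading: $!";
    $content = <$fh>;
    close $fh;
}

# Collect every URL that starts with the prefix; the hash keys remove duplicates.
# \Q...\E quotes the prefix so that "." and "?" in it match literally.
my %map;
while ($content =~ m#(\Q$url_starts_with\E[^"]+)#gi) {
    $map{$1} = 1;
}

# Print the unique URLs in sorted order, one per line.
print "$_\n" for sort keys %map;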
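
To try the matching step without a saved results page, here is a minimal sketch that runs the same idea on an inline HTML snippet. The sample HTML, the example.com prefix and the extract_urls helper are all made up for illustration and are not part of the script above. One thing it tries to show: characters such as "." and "?" are regex metacharacters, so quoting the prefix with \Q...\E is a safer way to make it match literally.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for a saved results page.
my $html = <<'HTML';
<a href="http://www.example.com/2012.php?id=1">first</a>
<a href="http://www.example.com/2012.php?id=2">second</a>
<a href="http://www.example.com/2012.php?id=1">duplicate</a>
<a href="http://www.example.com/other.php">unrelated</a>
HTML

# Hypothetical prefix; replace with the start of the URLs you are after.
my $prefix = "http://www.example.com/2012.php";

# Return the sorted, de-duplicated URLs in $text that begin with $prefix.
sub extract_urls {
    my ($text, $prefix) = @_;
    my %seen;
    # \Q...\E makes "." and "?" in the prefix match literally;
    # [^"]+ then consumes the rest of the URL up to the closing quote.
    while ($text =~ m#(\Q$prefix\E[^"]+)#gi) {
        $seen{$1} = 1;
    }
    my @urls = sort keys %seen;
    return @urls;
}

print "$_\n" for extract_urls($html, $prefix);
```

With this sample, the duplicate link should be printed only once and the unrelated other.php link should be skipped, since it does not start with the prefix.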
