How to use Perl to fetch website details

Whenever I browsed a good technical website, I got curious about its page rank and Alexa rank, where it is hosted, who handles its mail, when and by whom the domain was registered, when it will expire, and so on.
Finding all of this meant visiting several different websites, so I decided to write a script that fetches the details for me, and came up with this site info script in Perl.
Fully working code is available here.

What do we need to run this script successfully?

  • A Perl interpreter must be installed. On Linux it is usually available already; otherwise you need to install it.
  • We need the four Perl modules listed below:
    • Net::DNS -> to look up DNS information for the domain name
    • Net::Whois::Raw (imported with qw( whois )) -> to get domain information from whois servers
    • WWW::Google::PageRank -> to get the page rank for a website
    • LWP::Simple -> to fetch the content for a given URL
  • If you don’t have these modules installed, install them before using the script; it will work only if all of them are available.
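A quick way to see which of the modules above are present is to try loading each one. This is only a minimal sketch: the helper name module_available is made up for illustration and is not part of the original script.

```perl
use strict;
use warnings;

# Modules the article lists as prerequisites.
my @required = qw(Net::DNS Net::Whois::Raw WWW::Google::PageRank LWP::Simple);

# Hypothetical helper: true if the named module can be loaded by this Perl.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Net::DNS -> Net/DNS.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Print one status line per required module.
for my $module (@required) {
    printf "%-24s %s\n", $module,
        module_available($module) ? "installed" : "MISSING";
}
```

Anything reported as MISSING can be installed from CPAN with your usual CPAN client before running the script.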
What does it do?
This script gathers information such as:
  • DNS Servers
  • Mail Servers
  • Google Page Rank
  • Alexa Rank
  • Creation Date
  • Expiration Date
For now the script displays only this information, and it may not work for every domain name.
How does it do it?
  1. We create a Net::DNS resolver object to get the name server and mail server details, i.e.
    	my $res = Net::DNS::Resolver->new();
    	
Then, to get the name server details, just write:

my $name_servers = $res->query($domain, "NS");

To get the list of mail servers, you can use:

my @mx = mx($res, $domain);

This is just another way to use the Net::DNS resolver object $res.
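The two calls above return record objects rather than plain strings, so here is a minimal sketch of how the results might be printed. The helper names name_servers and mail_servers are my own for illustration, and the lookup runs only when a domain is passed on the command line.

```perl
use strict;
use warnings;
use Net::DNS;

# Hypothetical helper: name server host names for a domain (empty list on failure).
sub name_servers {
    my ($res, $domain) = @_;
    my $packet = $res->query($domain, "NS") or return;
    return map { $_->nsdname } grep { $_->type eq "NS" } $packet->answer;
}

# Hypothetical helper: "preference host" strings for the domain's mail servers.
sub mail_servers {
    my ($res, $domain) = @_;
    return map { $_->preference . " " . $_->exchange } mx($res, $domain);
}

# Run a lookup only when a domain is given on the command line.
if (my $domain = shift @ARGV) {
    my $res = Net::DNS::Resolver->new();
    print "NS: $_\n" for name_servers($res, $domain);
    print "MX: $_\n" for mail_servers($res, $domain);
}
```

Saved as, say, dns_info.pl (an assumed file name), it could be run as perl dns_info.pl example.com.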

  2. To get the Google PageRank, you need this code:
my $page_rank = WWW::Google::PageRank->new();
my $rank = $page_rank->get("http://www.$domain_name");
  3. To get the Alexa rank of a domain, we fetch the page from Alexa’s official website using the get function defined in LWP::Simple, i.e.
my $url = "http://www.alexa.com/siteinfo/$domain";
my $content = get($url);

and then it’s just a matter of finding the tag that holds the Alexa rank. This method will stop working if Alexa’s website changes the way it displays the content. (But then again, we can tweak the regular expression to match.)

At present, the regular expression to get the Alexa rank from that HTML page is:

#Regular Expression to Find Global Rank
my @alexa_rank = $content =~ m/globe-sm.jpg(.*?)"\/>\n(.*?)<\/div>/gis;
print "Alexa Rank: $alexa_rank[1]<br />";
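To see what the pattern captures, here is a small sketch that runs it against a hand-written HTML fragment. The fragment is an assumption shaped to fit the regular expression, not real Alexa output, and extract_alexa_rank is a made-up helper name. The sketch also guards against the case where get() fails and returns undef.

```perl
use strict;
use warnings;

# Hypothetical HTML fragment shaped to fit the pattern; not real Alexa output.
my $sample = <<'HTML';
<div class="rank">
<img src="/images/icons/globe-sm.jpg" alt="Global"/>
1,234</div>
HTML

# Apply the article's pattern and return the rank, or undef if nothing matched.
sub extract_alexa_rank {
    my ($content) = @_;
    return undef unless defined $content;    # get() returns undef on failure
    my @alexa_rank = $content =~ m/globe-sm.jpg(.*?)"\/>\n(.*?)<\/div>/gis;
    return undef unless @alexa_rank >= 2;    # second capture holds the rank
    (my $rank = $alexa_rank[1]) =~ s/^\s+|\s+$//g;    # trim whitespace
    return $rank;
}

my $rank = extract_alexa_rank($sample);
print "Alexa Rank: ", (defined $rank ? $rank : "not found"), "\n";
```

Checking for undef before printing avoids a warning, and an empty rank, when the page layout no longer matches.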

To get the site details, we first query the whois servers and then use regular expressions to pull out the desired fields.

my $site = whois($domain_name); # will return the output from whois server

Now we need regular expressions to get the expiration date, creation date, modified date, etc.
Here is one of the regular expressions used in this script:

my @created = $site =~ m/registration\s+date(.*?):\s*(.*?)\n+/sgi;
print "Created on: $created[1]\n";
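Since whois servers do not agree on field names, one possible variation is to try a few label patterns in turn. The sketch below works on a hand-written whois response. The sample text, the alternative labels, and the helper name whois_field are all assumptions for illustration; they are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical whois output; real servers use different labels and layouts.
my $site = <<'WHOIS';
Domain Name: EXAMPLE.COM
Domain Registration Date: 2008-03-12
Domain Expiration Date: 2030-03-12
WHOIS

# Return the value after the first label that matches, or undef if none does.
sub whois_field {
    my ($text, @labels) = @_;
    for my $label (@labels) {
        # Label, optional extra words, a colon, then the rest of that line.
        return $1 if $text =~ m/(?:$label)[^:\n]*:[ \t]*([^\r\n]+)/i;
    }
    return undef;
}

my $created = whois_field($site, 'registration\s+date', 'creation\s+date', 'created\s+on');
my $expires = whois_field($site, 'expiration\s+date', 'expiry\s+date', 'expires\s+on');
print "Created on: ", $created // "unknown", "\n";
print "Expires on: ", $expires // "unknown", "\n";
```

Unlike the pattern above, this one stops at the end of the line instead of relying on the /s modifier, so a missing value cannot swallow the following lines.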

I have explained the steps because the fully working code was posted earlier and many users asked how it works and how to run it. You can copy fully working code from here.

Please give credit to the authors when sharing this code.
If you find an error, if the script doesn’t work properly, or if you want more information than these details, please post a comment here.

Note: a whois query may take time if it’s not a TLD (Top Level Domain), and it may show no result for a hosting company that runs its own hosting servers, registrars, etc., such as facebook.com, google.com or godaddy.com.
