Using Python to get a website's PR and Baidu weight

Source: Internet
Author: User

Last time I used the requests library to write some simple code that crawls the links on a page. Extending that idea, we can also use requests to get our site's PR and Baidu weight; the principle is similar. Finally, we can even write a loop to query this information for a whole batch of websites.

First, Google PR, whose full name is PageRank. It is Google's own rating of a website and is widely used in SEO, so it should be familiar to everyone. Since the rating comes from Google itself, there is of course an official interface for retrieving it, and that is what we use here.

The code is as follows:

import re
import requests

# Seed for the checksum; the hash depends on every character of this string.
GPR_HASH_SEED = ("Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. "
                 "Yes, I'm talking to you, scammer.")

def google_hash(value):
    magic = 0x1020345
    for i in range(len(value)):
        magic ^= ord(GPR_HASH_SEED[i % len(GPR_HASH_SEED)]) ^ ord(value[i])
        # 32-bit rotate left by 9 bits (right shift by 32 - 9 = 23)
        magic = (magic >> 23 | magic << 9) & 0xFFFFFFFF
    return "8%08x" % (magic)

def getPR(www):
    try:
        url = 'http://toolbarqueries.google.com/tbr?' \
              'client=navclient-auto&ch=%s&features=rank&q=info:%s' % (google_hash(www), www)
        response = requests.get(url)
        # The body looks like "Rank_1:1:0"; the digits after the second colon are the PR.
        rex = re.search(r'(.*?:.*?:)(\d+)', response.text)
        return rex.group(2)
    except Exception:
        # Network failure or unexpected body: no PR available.
        return None

Usage: pass in a domain name, and the function returns its PR value.

The google_hash function is just an algorithm that computes a hash-like value from the domain name and returns it. We need not care how it is implemented; the function to look at is getPR. The official interface Google provides is: http://toolbarqueries.google.com/tbr?client=navclient-auto&ch={hash}&features=rank&q=info:{domain name}
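For readers who do want to peek inside the hash, the only non-obvious step is the line that combines a right shift with `<< 9` and then masks with 0xFFFFFFFF. Assuming the usual 32-bit rotation, the right-shift count that pairs with `<< 9` is 32 - 9 = 23. The sketch below illustrates that idea only; `rotl32` is a name made up for this example and is not part of the original code.

```python
def rotl32(x, n):
    """Rotate a 32-bit unsigned integer left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF


if __name__ == "__main__":
    x = 0x1020345  # the initial "magic" value used by the hash
    # Same expression shape as the hash step, with 23 = 32 - 9 (assumed).
    assert rotl32(x, 9) == (x >> 23 | x << 9) & 0xFFFFFFFF
    # The top bit wraps around to bit 8.
    print("%08x" % rotl32(0x80000000, 9))  # prints 00000100
```

Each character of the domain is XORed into the running value, and the rotation then spreads its bits across the whole 32-bit word.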

For {hash}, we pass the domain name to google_hash() and use the value it returns. For example, for my site's domain www.leavesongs.com the Google hash is 8b1e6ad00, so the query URL becomes: http://toolbarqueries.google.com/tbr?client=navclient-auto&ch=8b1e6ad00&features=rank&q=info:www.leavesongs.com

Visiting it returns Rank_1:1:0. The number after the second colon is the PR; my site has no PR, so the value is 0.

So we use requests.get() to fetch the URL we constructed, get back a result such as Rank_1:1:0, and finally extract the PR value 0 with a regular expression or some other method.
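The two steps just described, building the URL and pulling the number out of the response, can be tried offline. This is only a sketch: `build_pr_url` and `parse_pr` are hypothetical helper names, the checksum is passed in rather than computed, and, unlike the function above, `parse_pr` returns an integer and uses a stricter pattern.

```python
import re


def build_pr_url(domain, checksum):
    """Hypothetical helper: assemble the toolbar query URL for a domain."""
    return ("http://toolbarqueries.google.com/tbr?client=navclient-auto"
            "&ch=%s&features=rank&q=info:%s" % (checksum, domain))


def parse_pr(body):
    """Hypothetical helper: extract the PR from a body like 'Rank_1:1:0'."""
    match = re.search(r"Rank_\d+:\d+:(\d+)", body, re.IGNORECASE)
    # Return None when the body does not look like a rank line.
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Domain and checksum are the example values quoted in the article.
    print(build_pr_url("www.leavesongs.com", "8b1e6ad00"))
    print(parse_pr("Rank_1:1:0"))  # prints 0
```

Keeping the parsing separate from the HTTP call makes it easy to check the regular expression against a saved response before sending any real requests.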

That is how the getPR function works. Now let's look at how to get the Baidu weight.

Baidu weight is not an official Baidu metric; it is a value calculated by several third-party websites, so there is no interface for it like the one for PR. We therefore have to scrape the value from one of these third-party sites. Here is the function that gets the Baidu weight:

The code is as follows:

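def getBR(www):
    try:
        url = 'http://mytool.chinaz.com/baidusort.aspx?host=%s&sortType=0' % (www,)
        response = requests.get(url)
        data = response.text
        # The weight is the number inside the first <font> tag of the "siteinfo" div.
        rex = re.search(r'(<div class="siteinfo">.+?<font.+?>)(\d*?)(</font>)', data, re.I)
        return rex.group(2)
    except Exception:
        # Network failure or page layout not matched: no weight available.
        return None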

Usage is the same: pass in a domain name, and the function returns the weight value.

The page I scrape is the weight lookup page of Webmaster Tools (chinaz.com): http://mytool.chinaz.com/baidusort.aspx?host={domain}&sortType=0

My regular expression is: (<div class="siteinfo">.+?<font.+?>)(\d*?)(</font>). Look at the page source and you will see why it is written this way.
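To see what the three groups of that expression capture, it can be run against a small piece of markup. The HTML below is a made-up sample shaped to fit the pattern, not a copy of the real page, and `extract_weight` is a hypothetical helper name. The `re.S` flag is an addition in this sketch so that `.+?` can cross line breaks.

```python
import re

# Pattern from the article. re.I ignores case in tag and class names;
# re.S (added here) lets ".+?" match across newlines in the markup.
WEIGHT_RE = re.compile(
    r'(<div class="siteinfo">.+?<font.+?>)(\d*?)(</font>)', re.I | re.S)


def extract_weight(html):
    """Return the digits captured by group 2, or None if nothing matches."""
    match = WEIGHT_RE.search(html)
    return match.group(2) if match else None


# Hypothetical markup, only meant to exercise the pattern.
SAMPLE_HTML = '''
<div class="siteinfo">
  Baidu weight: <font color="blue">3</font>
</div>
'''

if __name__ == "__main__":
    print(extract_weight(SAMPLE_HTML))  # prints 3
```

Group 1 anchors the match at the siteinfo block, group 2 holds the weight digits, and group 3 is the closing tag. A scraper like this breaks as soon as the page layout changes, so it is worth checking for None.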

OK, now let's fetch the PR and Baidu weight for a batch of websites:
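A batch run can be as simple as a loop over a list of domains. The sketch below is one possible shape for it, not the original batch script: `batch_query` and `format_row` are hypothetical names, and the two lookup functions are passed in as arguments so the loop can be tried without network access.

```python
def batch_query(domains, get_pr, get_br):
    """Query PR and Baidu weight for each domain, one after another."""
    return [(domain, get_pr(domain), get_br(domain)) for domain in domains]


def format_row(domain, pr, br):
    # Both lookups return None on failure; show that as "-".
    return "%-28s PR=%-3s BR=%s" % (domain,
                                    "-" if pr is None else pr,
                                    "-" if br is None else br)


if __name__ == "__main__":
    # Stand-in lookup so the sketch runs offline; replace it with the
    # real getPR and getBR functions. It always reports "no value".
    def no_value(domain):
        return None

    for row in batch_query(["www.leavesongs.com"], no_value, no_value):
        print(format_row(*row))
```

With the real functions, the call would look like `batch_query(domains, getPR, getBR)`.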

Here is the result: [screenshot of the batch query output]

Scanning in a single thread is rather slow; running 10 or 20 threads for the batch lookup should be considerably faster.
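One way to do that in Python 3 is the standard library's thread pool. This is a minimal sketch under that assumption, with `batch_query_threaded` as a hypothetical name; `lookup` stands for any one-argument function such as the PR or weight lookup.

```python
from concurrent.futures import ThreadPoolExecutor


def batch_query_threaded(domains, lookup, workers=10):
    """Run lookup(domain) for every domain on a pool of worker threads."""
    domains = list(domains)  # allow any iterable and keep a stable order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input domains.
        values = list(pool.map(lookup, domains))
    return list(zip(domains, values))


if __name__ == "__main__":
    # len() stands in for a real lookup so the sketch runs offline.
    print(batch_query_threaded(["a.com", "bb.com"], len, workers=2))
```

Because the work is almost entirely waiting on HTTP responses, threads help here despite the interpreter lock. A third-party site may still throttle or block a burst of requests, so a small worker count is the safer starting point.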
