Using Python to get a website's PR and Baidu weight

Last Update:2016-06-06 Source: Internet

Author: User


Last time I used the requests library to write some simple code that crawls the links on a page. By extension, we can use the same library to get a website's PR and Baidu weight; the principle is similar. In the end we can even write a loop that queries this information for many sites in bulk.

Let's start with Google PR, whose full name is PageRank. It is Google's own rating of a website and should be familiar to anyone who does SEO. Since Google issues the rating itself, there is of course an official interface for retrieving it, and that interface is what we use to get the PR.

The code is as follows:

    import re
    import requests

    GPR_HASH_SEED = ("Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. "
                     "Yes, I'm talking to you, scammer.")

    def google_hash(value):
        # Fold each character into a 32-bit checksum, rotating left by 9 bits
        # per step (a right shift by 23 is the other half of that rotation).
        magic = 0x1020345
        for i in range(len(value)):
            magic ^= ord(GPR_HASH_SEED[i % len(GPR_HASH_SEED)]) ^ ord(value[i])
            magic = (magic >> 23 | magic << 9) & 0xFFFFFFFF
        return "8%08x" % (magic)

    def getpr(www):
        try:
            url = 'http://toolbarqueries.google.com/tbr?' \
                  'client=navclient-auto&ch=%s&features=rank&q=info:%s' % (google_hash(www), www)
            response = requests.get(url)
            rex = re.search(r'(.*?:.*?:)(\d+)', response.text)
            return rex.group(2)
        except Exception:
            # Network errors and unparseable responses both yield None.
            return None

How to use: pass in a domain name, and the function returns its PR value.

google_hash() is just an algorithm that computes a hash-like value from a domain name and returns it. We don't need to care how it is implemented; the function to look at is getpr(). Google's official interface looks like this: http://toolbarqueries.google.com/tbr?client=navclient-auto&ch={hash}&features=rank&q=info:{domain}

For {hash} we call google_hash(), passing in the domain name and getting back the corresponding hash value. For example, for my own site, www.leavesongs.com, the Google hash is 8b1e6ad00, so the query URL we construct is: http://toolbarqueries.google.com/tbr?client=navclient-auto&ch=8b1e6ad00&features=rank&q=info:www.leavesongs.com

Visit it and you get rank_1:1:0. The number after the second colon is the PR. My site has no PR, so the value is 0.

So we use requests.get() to fetch the constructed URL, receive a result like rank_1:1:0, and finally extract the PR value, 0, with a regular expression or by some other means.

That is how the getpr() function works. Next, let's look at how to get the Baidu weight.
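As a quick check of the URL construction described above, here is a minimal Python 3 sketch that needs no network access. It reuses the article's hash routine; `build_pr_url` is a helper name introduced only for this illustration.

```python
GPR_HASH_SEED = (
    "Mining PageRank is AGAINST GOOGLE'S TERMS OF SERVICE. "
    "Yes, I'm talking to you, scammer."
)


def google_hash(value):
    """Return the toolbar checksum string for a domain name."""
    magic = 0x1020345
    for i, ch in enumerate(value):
        magic ^= ord(GPR_HASH_SEED[i % len(GPR_HASH_SEED)]) ^ ord(ch)
        # 32-bit rotate left by 9 bits.
        magic = (magic >> 23 | magic << 9) & 0xFFFFFFFF
    return "8%08x" % magic


def build_pr_url(domain):
    """Hypothetical helper: fill {hash} and {domain} into the query URL."""
    return (
        "http://toolbarqueries.google.com/tbr"
        "?client=navclient-auto&ch=%s&features=rank&q=info:%s"
        % (google_hash(domain), domain)
    )


print(google_hash("www.leavesongs.com"))   # 8b1e6ad00
print(build_pr_url("www.leavesongs.com"))
```

The printed hash agrees with the value quoted above for www.leavesongs.com.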
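The extraction step can be tried on its own, without calling the interface. The sketch below assumes the response body has exactly the three colon-separated fields shown above; `parse_pr` and `parse_pr_split` are names made up for this example, the second one illustrating the "other means" with a plain string split.

```python
import re


def parse_pr(body):
    """Extract the PR with the same regular expression used in the article."""
    match = re.search(r"(.*?:.*?:)(\d+)", body)
    return match.group(2) if match else None


def parse_pr_split(body):
    """Alternative without a regex: take the third colon-separated field."""
    parts = body.strip().split(":")
    if len(parts) == 3 and parts[2].isdigit():
        return parts[2]
    return None


print(parse_pr("rank_1:1:0"))
print(parse_pr_split("rank_1:1:0\n"))
```

Both functions return the PR as a string and return None when the body does not have the expected shape, which mirrors the None fallback of the article's function.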

Baidu weight is not a metric published by Baidu itself. It is a value calculated by various third-party websites, so there is no official interface like the one for PR, and we have to scrape the value from one of those third-party sites. Here is the function that gets the Baidu weight:

The code is as follows:

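    def getbr(www):
        try:
            url = 'http://mytool.chinaz.com/baidusort.aspx?host=%s&sortType=0' % (www,)
            response = requests.get(url)
            data = response.text
            # NOTE: the HTML tags that belong inside this pattern are missing from
            # the published text, so as printed group 2 only matches an empty
            # string. Restore the tags from the page source before relying on it.
            rex = re.search(r'(.+?)(\d*?)()', data, re.I)
            return rex.group(2)
        except Exception:
            return None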

Usage is the same: pass in a domain name, and the function returns its weight.

The page I scrape is the weight lookup page of the webmaster tools site (chinaz.com): http://mytool.chinaz.com/baidusort.aspx?host={domain}&sortType=0

My regular expression is (.+?)(\d*?)(), with the HTML tags that belong inside it missing here. Look at the page's source code and you will see how to write the pattern.
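To show the shape of such a three-group pattern, here is a sketch run against a made-up HTML fragment. The markup in `SAMPLE_HTML` and the tags in `WEIGHT_RE` are purely hypothetical and are not the real markup of the chinaz page; the actual tags have to be read from the page source.

```python
import re

# Hypothetical markup, invented for this illustration only.
SAMPLE_HTML = '<div class="weight">Baidu weight: <b>3</b></div>'

# Group 1: everything up to the tag that opens the number.
# Group 2: the digits we want.
# Group 3: the closing tag that ends the number.
WEIGHT_RE = re.compile(r'(<div class="weight">.+?<b>)(\d*?)(</b>)', re.I)


def parse_weight(html):
    """Return the digits captured by group 2, or None if nothing matches."""
    match = WEIGHT_RE.search(html)
    return match.group(2) if match else None


print(parse_weight(SAMPLE_HTML))
```

The literal tags on either side are what force the lazy `\d*?` in the middle to capture the whole number.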

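OK, now let's get the PR and Baidu weight for a batch of sites by looping over their domain names and calling both functions on each one. (The batch script and its results listing are missing from this copy.)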
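A bulk query of this kind might look like the sketch below. It is not the author's script: `query_sites` is a hypothetical helper that takes the lookup functions as arguments, so it can be tried with stand-ins before pointing it at the real functions.

```python
def query_sites(domains, lookups):
    """Call every lookup on every domain and collect one row per domain.

    `lookups` maps a column name to a function that takes a domain name
    and returns a value (or None on failure).
    """
    rows = []
    for domain in domains:
        row = {"domain": domain}
        for name, lookup in lookups.items():
            row[name] = lookup(domain)
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Stand-in lookups; with the article's functions this would be
    # {"pr": getpr, "br": getbr}.
    fake_lookups = {"pr": lambda d: "0", "br": lambda d: None}
    for row in query_sites(["www.leavesongs.com", "example.com"], fake_lookups):
        print(row)
```

Passing the lookups in as a dictionary keeps the loop independent of how each value is fetched, so another metric can be added without touching it.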

Scanning sequentially in a single process is a little slow. Running the lookups in 10 to 20 threads should make a bulk query considerably faster.
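One possible way to do this in Python 3 is the standard library's thread pool. The sketch below assumes the lookup is a function of one domain name that handles its own errors, as the article's functions do; `query_concurrently` is a name introduced for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def query_concurrently(domains, lookup, max_workers=10):
    """Run `lookup` on each domain in a pool of worker threads.

    Returns a dict mapping each domain to its result. Executor.map yields
    results in input order, so zip pairs them up correctly.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(lookup, domains))
    return dict(zip(domains, values))


if __name__ == "__main__":
    # Stand-in lookup; replace `len` with a real lookup such as getpr.
    print(query_concurrently(["www.leavesongs.com", "example.com"], len))
```

Threads suit this job because each lookup spends most of its time waiting on the network rather than computing.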



This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
