Python and shell implementation for querying Google keyword rankings

Source: Internet
Author: User
This article mainly introduces Python and shell code for querying Google keyword rankings; those who need it can refer to it. My wife's company does SEO, and there are a lot of keywords and websites to check. It was painful to watch her grind through the same searches by hand, so I spent some time writing a Python script that looks up a site's ranking for each keyword.

Before writing this script, I also searched the web for existing Google ranking scripts. Many of them use Google's APIs, but when I tested them they didn't work, so I wrote my own.

The script is as follows (I grabbed a few keywords and sites for testing):

#vim keyword.py
import urllib,urllib2,cookielib,re,sys,os,time,random
cj = cookielib.CookieJar()
vibramkey=['cheap+five+fingers','vibram+five+fingers']
beatskey=['beats+by+dre','beats+by+dre+cheap']
vibramweb=['vibramforshoes.com','vibramfivetoeshoes.net','vibramfivefingersshoesx.com']
beatsweb=['beatsbydre.com','justlovebeats.com']
allweb=['vibramweb','beatsweb']

def serchkey(key,start):
    url="http://www.google.com/search?hl=en&q=%s&revid=33815775&sa=X&ei=X6CbT4GrIoOeiQfth43GAw&ved=0CIgBENUCKAY&start=%s" %(key,start)
    try:
        opener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        opener.addheaders=[('User-agent','Opera/9.23')]
        urllib2.install_opener(opener)
        req=urllib2.Request(url)
        response=urllib2.urlopen(req)
        content=response.read()
        f=open('google','w')
        f.write(content)
        f.close()
        tiqu=os.popen("grep -ioP '(?<=<cite>).*?(?=</cite>)' google|sed -r 's/(<*\/*cite>|<\/*b>)//g'").readlines()
    except:
        changeip()
    else:
        for yuming in pinpai:
            a=1
            for shouyuming in tiqu:
                real=shouyuming.find(yuming)
                if real>=0:
                    if start==0:
                        page=1
                    elif start==10:
                        page=2
                    elif start==20:
                        page=3
                    elif start==30:
                        page=4
                    else:
                        page=5
                    lastkey=key.replace("+"," ")
                    xinxi="%s\t\t %s\t\t page %s,rank %s\n" %(yuming,lastkey,page,a)
                    xinxifile=open('index.html','a')
                    xinxifile.write(xinxi)
                    xinxifile.close()
                a=a+1

def changeip():
    ip=random.randint(0,2)
    de="route del -host google.com"
    add="route add -host google.com eth1:%s" %ip
    os.system(de)
    os.system(add)
    print "changip to %s" %ip

pinpaiid=0
for x in vibramkey,beatskey:
    if pinpaiid==0:
        pinpai=vibramweb
    elif pinpaiid==1:
        pinpai=beatsweb
    pinpaiid=pinpaiid+1
    for key in x:
        for start in 0,10,20,30,40:
            serchkey(key,start)
        changeip()
os.system("sh paiban.sh")

#vim paiban.sh
#! /bin/bash
sort index.html -o index.html
line=`wc -l index.html|awk '{print $1}'`
yuming2=`sed -n 1p index.html|awk '{print $1}'`
for i in `seq 2 $line`
do
    yuming=`sed -n "$i"p index.html|awk '{print $1}'`
    if [ $yuming == $yuming2 ];then
        sed -i ""$i"s/"$yuming"/\t\t/g" index.html
    else
        yuming2=$yuming
    fi
done
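Assuming both files sit in the same directory, the whole pipeline is launched with a single command; keyword.py calls paiban.sh itself through os.system at the end, so the shell script never needs to be run by hand (note that changeip() runs route commands, which need root and multiple IP aliases on eth1):

python keyword.py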

This script is divided into two parts. The first part is the Python code that searches the Google result pages for each keyword. My wife said only the first five pages matter for each keyword, so only the first five pages are queried.
The second part typesets the query results; that is what paiban.sh, called at the bottom, does. The final result is in the following format:

Website 1 keyword 1 page number
Keyword 2 page number
Keyword 3 page number

Website 2 keyword 1 page number
Keyword 2 page number
Keyword 3 page number
The following describes the program.

import urllib,urllib2,cookielib,re,sys,os,time,random   # load the modules
cj = cookielib.CookieJar()
vibramkey=['cheap+five+fingers','vibram+five+fingers']  # keyword group 1 to query; each quoted item is one keyword
beatskey=['beats+by+dre','beats+by+dre+cheap']          # same as above: keyword group 2, another set of keywords
vibramweb=['vibramforshoes.com','vibramfivetoeshoes.net','vibramfivefingersshoesx.com']  # the websites to check for keyword group 1
beatsweb=['beatsbydre.com','justlovebeats.com']         # the websites to check for keyword group 2
allweb=['vibramweb','beatsweb']                         # the group of all website lists

def serchkey(key,start):
    # key is the keyword to query and start is the page offset. A Google results page
    # holds only 10 organic records apart from the ads: start=0 gives records 1-10
    # (page 1), start=10 gives page 2, and so on.
    url="http://www.google.com/search?hl=en&q=%s&revid=33815775&sa=X&ei=X6CbT4GrIoOeiQfth43GAw&ved=0CIgBENUCKAY&start=%s" %(key,start)   # the query URL
    try:
        opener=urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
        opener.addheaders=[('User-agent','Opera/9.23')]  # simulate a browser
        urllib2.install_opener(opener)
        req=urllib2.Request(url)                         # fetch the URL with urllib2
        response=urllib2.urlopen(req)
        content=response.read()                          # read the page source, as a browser would see it
        f=open('google','w')
        f.write(content)                                 # save what was read into a file named google
        f.close()
        # a system command: grep's zero-width assertions pull the ten result domains out of the saved page
        tiqu=os.popen("grep -ioP '(?<=<cite>).*?(?=</cite>)' google|sed -r 's/(<*\/*cite>|<\/*b>)//g'").readlines()
    except:
        changeip()   # too many requests may get blocked by Google, so a function for changing
                     # the IP is defined below; if the try fails, the IP change is executed
    else:
        for yuming in pinpai:                # loop over the websites we want to check
            a=1
            for shouyuming in tiqu:          # loop over the domains extracted from the results
                real=shouyuming.find(yuming) # compare the domain found in the results with the one we want
                if real>=0:
                    if start==0:
                        page=1
                    elif start==10:
                        page=2
                    elif start==20:
                        page=3
                    elif start==30:
                        page=4
                    else:
                        page=5               # work out which Google results page the domain was found on
                    lastkey=key.replace("+"," ")   # strip the plus signs out of the keyword we defined
                    print yuming,lastkey,page,a
                    xinxi="%s\t\t %s\t\t page %s,rank %s\n" %(yuming,lastkey,page,a)
                    xinxifile=open('index.html','a')
                    xinxifile.write(xinxi)   # write the information found to the index.html file
                    xinxifile.close()
                a=a+1

def changeip():
    # change the IP used during the queries; if the machine has only one IP, this is not needed
    ip=random.randint(0,2)                           # generate a random number
    de="route del -host google.com"                  # the command that deletes the route
    add="route add -host google.com eth1:%s" %ip     # the command that adds the new route
    os.system(de)                                    # run the delete-route command
    os.system(add)                                   # run the add-route command
    print "changip to %s" %ip                        # print which route is now in use

pinpaiid=0
for x in vibramkey,beatskey:    # loop over all the keyword groups
    if pinpaiid==0:             # match each keyword group with the website group it should be checked against
        pinpai=vibramweb
    elif pinpaiid==1:
        pinpai=beatsweb
    pinpaiid=pinpaiid+1
    for key in x:               # loop over the keywords in the group
        for start in 0,10,20,30,40:   # the five Google result pages
            serchkey(key,start)
        changeip()              # change the IP after each keyword has been queried
os.system("sh paiban.sh")
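For what it's worth, urllib2 and cookielib only exist on Python 2; on Python 3 they became urllib.request and http.cookiejar. A minimal sketch of just the fetch step for Python 3, keeping only the hl, q, and start parameters of the original URL (fetch_page is a hypothetical helper, not part of the original script):

import urllib.request, http.cookiejar

def fetch_page(key, start):
    # same idea as serchkey's fetch: a cookie-aware opener pretending to be Opera
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    opener.addheaders = [('User-agent', 'Opera/9.23')]
    url = "http://www.google.com/search?hl=en&q=%s&start=%s" % (key, start)
    content = opener.open(url).read().decode('utf-8', 'ignore')
    page = start // 10 + 1   # same mapping as the if/elif chain: 0->1, 10->2, ..., 40->5
    return page, content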

After the script above has run, let's take a look at the contents of the index.html file:



#cat index.html
vibramforshoes.com        cheap five fingers       page 1,rank 3
vibramfivetoeshoes.net    cheap five fingers       page 5,rank 5
vibramforshoes.com        vibram five fingers      page 1,rank 6
vibramfivetoeshoes.net    vibram five fingers      page 5,rank 10
beatsbydre.com            beats by dre             page 1,rank 1
justlovebeats.com         beats by dre             page 5,rank 7
beatsbydre.com            beats by dre cheap       page 2,rank 2
beatsbydre.com            beats by dre cheap       page 2,rank 3
beatsbydre.com            beats by dre cheap       page 5,rank 10

This is messy, so how do we get the format described earlier, where each site is followed by all of its keywords? That is what the paiban.sh script is for. We put the call to paiban.sh at the very end of the py program, so once the py script finishes, the typesetting runs automatically and nothing extra has to be executed. Let's look at the difference below; the steps are also commented in the py program.

#sh paiban.sh
#cat index.html
beatsbydre.com            beats by dre cheap       page 2,rank 2
                          beats by dre cheap       page 2,rank 3
                          beats by dre cheap       page 5,rank 10
                          beats by dre             page 1,rank 1
justlovebeats.com         beats by dre             page 5,rank 7
vibramfivetoeshoes.net    cheap five fingers       page 5,rank 5
                          vibram five fingers      page 5,rank 10
vibramforshoes.com        cheap five fingers       page 1,rank 3
                          vibram five fingers      page 1,rank 6

This gives us the result described earlier. The layout is clear: which site ranks for which keyword, and on which page and at what position, can all be seen at a glance.

Let me also explain the paiban.sh script.

#vim paiban.sh
#! /bin/bash
sort index.html -o index.html                        # sort index.html and write the result back to index.html
line=`wc -l index.html|awk '{print $1}'`             # count how many lines the file has
yuming2=`sed -n 1p index.html|awk '{print $1}'`      # grab the domain name on the first line into yuming2
for i in `seq 2 $line`                               # walk the file from the second line on, reading each domain
do
    yuming=`sed -n "$i"p index.html|awk '{print $1}'`
    if [ $yuming == $yuming2 ];then
        sed -i ""$i"s/"$yuming"/\t\t/g" index.html   # if this line's domain matches yuming2, blank it out
    else
        yuming2=$yuming                              # otherwise remember this line's domain as the new yuming2
    fi
done
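If you would rather not depend on the shell at all, the same typesetting step can be sketched in a few lines of Python: sort the file, then blank out a domain whenever it repeats the previous line's. This is only an illustration and assumes the tab-separated lines written by keyword.py:

# a hypothetical pure-Python stand-in for paiban.sh
lines = sorted(open('index.html').readlines())
prev = None
out = []
for line in lines:
    fields = line.split()
    yuming = fields[0] if fields else ''
    if yuming and yuming == prev:
        line = line.replace(yuming, '\t\t', 1)   # blank the repeated domain, like the sed substitution
    else:
        prev = yuming
    out.append(line)
open('index.html', 'w').writelines(out)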


Okay. This script is really useful and gets used every day; it has cut her workload quite a bit... haha. If anything is unclear, feel free to discuss it with me on QQ: 410018348.
