Boosting web page click counts through proxies with Python
Update: exception handling conditions
@time 2013-08-03 Update: fixed the loop-counting and random-wait-time issues
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''
This script implements automated page clicks. Besides that main function, it demonstrates three techniques:
1. Pick a proxy IP at random and access the target site through it, so the source IP does not get blocked.
2. After visiting a page, sleep a random number of seconds before visiting again, to avoid interception by the layer-4-to-7 filtering devices in front of the site.
3. Modify the HTTP User-Agent header, which some sites and layer-4-to-7 devices inspect.
Created on 2013-7-14
@author: QQ136354553
'''
import urllib2, re, time, urllib, proxyip, random, user_agents

def gethtml(url):
    proxy_ip = random.choice(proxyip.proxy_list)  # pick a random IP from proxy_list
    print proxy_ip
    proxy_support = urllib2.ProxyHandler(proxy_ip)
    opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler)
    urllib2.install_opener(opener)
    request = urllib2.Request(url)
    user_agent = random.choice(user_agents.user_agents)  # pick a random User-Agent string
    request.add_header('User-Agent', user_agent)          # modify the User-Agent header
    print user_agent
    html = urllib2.urlopen(request).read()
    return proxy_ip

urls = ['http://www.25shiyan.com/?fromuid=16',
        'http://www.25shiyan.com/forum.php?mod=viewthread&tid=37840&extra=page%3D1',
        'http://www.25shiyan.com/forum.php?mod=viewthread&tid=36786&extra=page%3D1']
count_true, count_false, count = 0, 0, 0
proxy_ip = None  # so the except blocks can report something even if the very first call fails
while True:
    for url in urls:
        count += 1
        try:
            proxy_ip = gethtml(url)
        except urllib2.HTTPError:   # must be caught before URLError, its parent class
            print 'HTTPError! The bad proxy is %s' % proxy_ip
            count_false += 1
        except urllib2.URLError:
            print 'URLError! The bad proxy is %s' % proxy_ip
            count_false += 1
        except Exception:
            print 'Unknown error! The bad proxy is %s' % proxy_ip
            count_false += 1
        randomtime = random.uniform(1, 3)  # random float between 1 and 3
        time.sleep(randomtime)             # random wait between visits
        print '%d errors, %d OK, total %d' % (count_false, count - count_false, count)
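Since the errors above mostly come from dead proxies, it can help to probe each proxy before it enters the rotation. Below is a minimal sketch, not part of the original script: is_alive, the test URL, and the timeout are illustrative choices, and the helper simply opens a page through the proxy and reads a few bytes.

# Hypothetical helper (not in the original): pre-test a proxy with a short timeout.
import urllib2, socket

def is_alive(proxy_ip, test_url='http://www.baidu.com', timeout=5):
    opener = urllib2.build_opener(urllib2.ProxyHandler(proxy_ip))
    try:
        opener.open(test_url, timeout=timeout).read(128)  # read a little to confirm data flows
        return True
    except (urllib2.URLError, socket.timeout):
        return False

# possible usage: live_proxies = [p for p in proxyip.proxy_list if is_alive(p)]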
######################
The two modules imported above, proxyip and user_agents, are as follows:
######################
proxyip.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
proxy_list = [
    {'http': 'http://59.53.67.215:80'},
    {'http': 'http://60.161.14.77:8001'},
    {'http': 'http://61.144.14.68:80'},
    {'http': 'http://61.144.68.180:9999'},
    {'http': 'http://61.164.108.84:8844'},
    {'http': 'http://61.166.55.153:11808'}
]
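Each entry maps a URL scheme to a proxy URL, which is exactly the dictionary shape that urllib2.ProxyHandler expects, so a single entry can be plugged in directly:

proxy_support = urllib2.ProxyHandler({'http': 'http://59.53.67.215:80'})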
###########################
user_agents.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
import random
user_agents = [
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11',
    'Opera/9.25 (Windows NT 5.1; U; en)',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
    'Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.5 (like Gecko) (Kubuntu)',
    'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12',
    'Lynx/2.8.5rel.1 libwww-FM/2.14 SSL-MM/1.4.1 GNUTLS/1.2.9'
]
####################################
1. The proxy IP list is currently a static list; fetching proxies dynamically is not implemented yet and is left for follow-up (a rough sketch of the idea appears after this list).
2. The URL list is also not handled well; the original plan was to start from one main entry page and crawl its sub-links, which is likewise not implemented yet (see the second sketch below).
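For point 1, one possible direction is to scrape a free proxy-list page at startup and build proxy_list from it. This is only a sketch under assumptions: PROXY_PAGE is a placeholder URL, and the regex assumes the page shows addresses as plain ip:port pairs.

# Hypothetical sketch of dynamic proxy acquisition; PROXY_PAGE is a placeholder.
import urllib2, re

PROXY_PAGE = 'http://example.com/free-proxy-list'  # placeholder, not a real source

def fetch_proxy_list():
    html = urllib2.urlopen(PROXY_PAGE).read()
    pairs = re.findall(r'(\d{1,3}(?:\.\d{1,3}){3}):(\d{2,5})', html)  # ip:port pairs
    return [{'http': 'http://%s:%s' % (ip, port)} for ip, port in pairs]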
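For point 2, a rough starting point is to pull href values out of the entry page and feed them back into urls. The sketch below is hypothetical: it keeps only absolute http links and deduplicates with a set, with no filtering by site or thread.

# Hypothetical sketch of collecting sub-links from one entry page.
import urllib2, re

def collect_links(entry_url):
    html = urllib2.urlopen(entry_url).read()
    links = re.findall(r'href=[\'"]?(http://[^\'" >]+)', html)  # absolute http links only
    return list(set(links))  # deduplicate

# possible usage: urls = collect_links('http://www.25shiyan.com/')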