A Python crawler for fetching a proxy IP list

Source: Internet
Author: User
Tags: get ip

I have been practicing writing crawlers recently. As a test I started by scraping pictures from an image site, but after a few dozen downloads the requests began returning 403 errors: the site's server had noticed the crawler and blocked me.

So I need to use proxy IPs. To make things easier later on, I decided to write a crawler that collects proxy IPs automatically. As the saying goes, sharpening the axe does not delay the woodcutting!
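As a rough sketch of what using a proxy IP looks like in practice, here is one way to do it with Python 3's urllib.request. The helper name build_proxy_opener and the sample addresses are placeholders made up for illustration, and the proxies are assumed to be plain HTTP proxies given as ip:port strings.

```python
import random
import urllib.request


def build_proxy_opener(proxy_list):
    """Pick one 'ip:port' entry at random and return (opener, chosen_proxy)."""
    proxy = random.choice(proxy_list)
    # Route both http and https requests through the chosen HTTP proxy.
    handler = urllib.request.ProxyHandler({
        'http': 'http://' + proxy,
        'https': 'http://' + proxy,
    })
    return urllib.request.build_opener(handler), proxy


# Placeholder addresses from a documentation-only range, not real proxies.
opener, proxy = build_proxy_opener(['192.0.2.10:8080', '192.0.2.11:3128'])
print('Using proxy', proxy)
# opener.open('http://example.com/', timeout=10) would send the request via that proxy.
```

If a request through one proxy starts failing, calling the helper again picks another entry from the list.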

First, the result of a run: the function returns a list of proxy addresses, each an ip:port string.

Without further ado, here is the code:

  

    # -*- coding: utf-8 -*-
    # Written for Python 2 (urllib2 and the print statement).
    import re
    import urllib2


    # Obtain some IPs and ports for the spider from one site, xicidaili.com.
    class ObtainProxy:

        def __init__(self, region='Domestic General'):
            # Map each proxy category to its path on the site.
            self.region = {'Domestic General': 'nt/',
                           'Domestic High Stealth': 'nn/',
                           'Foreign General': 'wt/',
                           'Foreign High Stealth': 'wn/',
                           'SOCKS': 'qq/'}
            self.url = 'http://www.xicidaili.com/' + self.region[region]
            self.header = {}
            self.header['User-Agent'] = ('Mozilla/5.0 (Windows NT 6.1; WOW64) '
                                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                                         'Chrome/31.0.1650.63 Safari/537.36')

        def get_prpxy(self):
            req = urllib2.Request(self.url, headers=self.header)
            resp = urllib2.urlopen(req)
            content = resp.read()
            # Each match is an (ip, port) tuple taken from adjacent table cells.
            self.get_ip = re.findall(
                r'(\d+\.\d+\.\d+\.\d+)</td>\s*<td>(\d+)</td>', content)
            self.pro_list = []
            for each in self.get_ip:
                a_info = each[0] + ':' + each[1]
                self.pro_list.append(a_info)
            return self.pro_list

        def save_pro_info(self):
            # Call get_prpxy() first; it fills in self.get_ip.
            with open('proxy', 'w') as f:
                for each in self.get_ip:
                    a_info = each[0] + ':' + each[1] + '\n'
                    f.writelines(a_info)


    if __name__ == '__main__':
        proxy = ObtainProxy()
        print proxy.get_prpxy()
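The listing uses urllib2, which only exists in Python 2. For anyone on Python 3, here is a minimal sketch of the same idea, with the parsing step split out so it can be tried without network access. The function names, the sample HTML, and the slightly more whitespace-tolerant pattern are assumptions for illustration; the real page markup may differ.

```python
import re
import urllib.request

# An IP cell followed by a port cell, allowing whitespace around the values.
PROXY_PATTERN = re.compile(
    r'(\d+\.\d+\.\d+\.\d+)\s*</td>\s*<td>\s*(\d+)\s*</td>')


def parse_proxies(html):
    """Extract 'ip:port' strings from table HTML."""
    return [ip + ':' + port for ip, port in PROXY_PATTERN.findall(html)]


def fetch_proxies(url, user_agent='Mozilla/5.0'):
    """Download a page and parse it (needs network access)."""
    req = urllib.request.Request(url, headers={'User-Agent': user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        html = resp.read().decode('utf-8', errors='replace')
    return parse_proxies(html)


# Hypothetical sample markup, made up to exercise the pattern.
SAMPLE_HTML = """
<tr><td>192.0.2.10</td>
    <td>8080</td><td>HTTP</td></tr>
<tr><td>198.51.100.7</td><td>3128</td><td>HTTPS</td></tr>
"""

print(parse_proxies(SAMPLE_HTML))
# ['192.0.2.10:8080', '198.51.100.7:3128']
```

Keeping parse_proxies separate from fetch_proxies means the regular expression can be checked against saved HTML before pointing the crawler at a live site.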

It works quite well.

  
