This article shares a Python example for fetching proxy IPs. It may be a useful reference for anyone who needs it.

When crawling data, we often run into sites that block repeated access from the same IP. In that case we should use proxy IPs, disguising ourselves before each visit so that the "enemy" cannot detect us.
OK, let's have a pleasant start!

This is the file that gets the proxy IPs; I have modularized it into three functions.

Note: the comments in the code are in English. This is simply more convenient when writing code; a word or two of English is enough.
```python
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""Author: dasuda"""
import urllib2
import re
import socket
import threading

IP_data = []             # "ip:port" strings scraped from the site
IP_data_checked = []     # proxies that passed the availability check
available_table = []     # indices of the available proxies in IP_data
lock = threading.Lock()  # one shared lock for all checker threads


def GetIP(url_target):
    """Scrape proxy IPs and return the ones that pass the check."""
    patternIP = re.compile(r'(?<=<td>)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
    patternPORT = re.compile(r'(?<=<td>)\d{2,5}(?=</td>)')
    print "Now, start to refresh proxy IP..."
    for page in range(1, 4):
        url = 'http://www.xicidaili.com/nn/' + str(page)
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64)"}
        request = urllib2.Request(url=url, headers=headers)
        response = urllib2.urlopen(request)
        content = response.read()
        findIP = re.findall(patternIP, str(content))
        findPORT = re.findall(patternPORT, str(content))
        # assemble the IP and port
        for i in range(len(findIP)):
            findIP[i] = findIP[i] + ":" + findPORT[i]
        IP_data.extend(findIP)
        print "Get page", page
    print "Refresh done!!!"
    # use multithreading to check availability
    mul_thread_check(url_target)
    return IP_data_checked


def check_one(url_check, i):
    """Thread function: try url_check through proxy IP_data[i]."""
    # set timeout
    socket.setdefaulttimeout(8)
    try:
        ppp = {"http": IP_data[i]}
        proxy_support = urllib2.ProxyHandler(ppp)
        opener_check = urllib2.build_opener(proxy_support)
        urllib2.install_opener(opener_check)
        request = urllib2.Request(url_check)
        request.add_header('User-Agent', "Mozilla/5.0 (Windows NT 10.0; WOW64)")
        html = urllib2.urlopen(request).read()
        lock.acquire()
        print IP_data[i], 'is OK'
        # record the index of the available proxy
        available_table.append(i)
        lock.release()
    except Exception:
        lock.acquire()
        print 'Error'
        lock.release()


def mul_thread_check(url_mul_check):
    threads = []
    for i in range(len(IP_data)):
        # create one thread per proxy IP
        thread = threading.Thread(target=check_one, args=[url_mul_check, i])
        threads.append(thread)
        thread.start()
        print "New thread start", i
    for thread in threads:
        thread.join()
    # assemble IP_data_checked[] from the recorded indices
    for idx in available_table:
        IP_data_checked.append({'http': IP_data[idx]})
    print "Available proxy IP:", len(available_table)
```
First, GetIP(url_target): the main function. Its parameter is the URL used to verify proxy IP availability; Ipchina is recommended.

The proxy IPs are fetched from http://www.xicidaili.com/nn/, a free proxy IP site. However, not every IP listed there actually works. Depending on your actual location, network conditions, the target server, and so on, usually less than 20% are usable; at least that has been my experience.

The http://www.xicidaili.com/nn/ page is fetched in the normal (non-proxied) way, and the required IPs and corresponding ports are extracted from the returned page with regular expressions, as follows:
```python
patternIP = re.compile(r'(?<=<td>)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
patternPORT = re.compile(r'(?<=<td>)\d{2,5}(?=</td>)')
...
findIP = re.findall(patternIP, str(content))
findPORT = re.findall(patternPORT, str(content))
```
You can refer to other articles on how to construct regular expressions.

The scraped IPs are stored in findIP and the corresponding ports in findPORT; the two lists are aligned by index. A page normally yields 100 IPs.
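To see the two patterns in action, here is a small self-contained Python 3 sketch run against a hypothetical HTML fragment in the same `<td>` layout (the fragment and its values are made up for illustration):

```python
import re

# Hypothetical snippet in the same <td> layout as the proxy listing page
content = "<td>121.31.199.31</td><td>8123</td><td>110.73.0.4</td><td>8888</td>"

# Same patterns as in the article: lookbehind/lookahead anchored on the <td> tags
patternIP = re.compile(r'(?<=<td>)\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
patternPORT = re.compile(r'(?<=<td>)\d{2,5}(?=</td>)')

findIP = re.findall(patternIP, content)
findPORT = re.findall(patternPORT, content)
print(findIP)    # ['121.31.199.31', '110.73.0.4']
print(findPORT)  # ['8123', '8888']
```

The port pattern cannot accidentally match inside an IP cell, because the lookahead requires the digits to be followed immediately by `</td>`.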
Next, the IP and port are stitched together into "ip:port" strings.
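The stitching step amounts to joining the two index-aligned lists; a minimal Python 3 sketch with made-up values:

```python
# Hypothetical scraped values, aligned by index as on the listing page
findIP = ['121.31.199.31', '110.73.0.4']
findPORT = ['8123', '8888']

# Join each IP with its port, as the article's for-loop does in place
IP_data = [ip + ":" + port for ip, port in zip(findIP, findPORT)]
print(IP_data)  # ['121.31.199.31:8123', '110.73.0.4:8888']
```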
Finally, the availability check.

Second, check_one(url_check, i): the thread function.

This function accesses url_check through the proxy, still as a normal page request. If the page comes back, the proxy IP works, and its current index value is recorded so that all available IPs can be collected afterwards.
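A rough Python 3 equivalent of this check, using urllib.request instead of the article's urllib2. The fetch step is injectable here purely so the recording logic can be demonstrated offline, and the lock is a single module-level object shared by all threads (a per-thread Lock would provide no real synchronization):

```python
import socket
import threading
import urllib.request

IP_data = ['121.31.199.31:8123']  # hypothetical candidate proxies
available_table = []              # indices of proxies that responded
lock = threading.Lock()           # one shared lock for all checker threads

def check_one(url_check, i, fetch=None):
    """Try url_check through proxy IP_data[i]; record i on success."""
    socket.setdefaulttimeout(8)
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({'http': IP_data[i]}))
    opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64)')]
    try:
        # fetch is injectable only so the function can be exercised offline
        (fetch or (lambda: opener.open(url_check).read()))()
        with lock:  # guard the shared list (and interleaved printing)
            print(IP_data[i], 'is OK')
            available_table.append(i)
    except Exception:
        with lock:
            print(IP_data[i], 'failed')

# Offline demo with a stub fetch that "succeeds":
check_one('http://example.com', 0, fetch=lambda: b'ok')
print(available_table)  # [0]
```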
Third, mul_thread_check(url_mul_check): multi-threaded checking.

This function checks proxy IP availability with multiple threads, opening one thread per IP.
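The thread bookkeeping can be sketched like this in Python 3, with a stub in place of the real network check so the flow is runnable offline (the IPs and the alive flags are made up):

```python
import threading

IP_data = ['1.2.3.4:80', '5.6.7.8:8080', '9.9.9.9:3128']  # hypothetical
available_table = []
lock = threading.Lock()

def check_one_stub(i, alive):
    # stand-in for the real proxy check; alive says whether it would succeed
    if alive:
        with lock:
            available_table.append(i)

# one thread per candidate IP, as in the article
threads = []
for i, alive in enumerate([True, False, True]):
    t = threading.Thread(target=check_one_stub, args=(i, alive))
    threads.append(t)
    t.start()
for t in threads:
    t.join()

# assemble the checked list in the article's final format
# (sorted, since thread completion order is nondeterministic)
IP_data_checked = [{'http': IP_data[i]} for i in sorted(available_table)]
print(IP_data_checked)
```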
The project calls GetIP() directly, passing in the URL used for the availability check, and gets back the list of proxies that passed. Each entry is already wrapped in the dict form that ProxyHandler expects:

[{'http': 'ip1:port1'}, {'http': 'ip2:port2'}, ...]
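An entry from the checked list can be plugged straight into a proxy handler; here is a Python 3 sketch using urllib.request (the proxy address is hypothetical, and the actual network call is left commented out):

```python
import urllib.request

# One entry from the checked list (hypothetical proxy address)
proxy = {'http': '121.31.199.31:8123'}

# Build an opener that routes http traffic through that proxy
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxy))
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64)')]

# html = opener.open('http://www.xicidaili.com/nn/').read()  # network call
print(proxy)
```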