When writing Python crawlers, we sometimes need IP proxies. I happened to find a site that lists free proxy IPs: http://www.xicidaili.com/nn/. However, it turns out that many of the IPs listed there do not actually work, so I wrote a Python script that detects which proxy IPs are usable. The script is as follows:
#encoding=utf8
import urllib2
import urllib
import socket
from bs4 import BeautifulSoup

user_agent = 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0'
header = {}
header['User-Agent'] = user_agent

'''Get all proxy IP addresses'''
def getProxyIp():
    proxy = []
    # only the first listing page is crawled; widen the range for more pages
    for i in range(1, 2):
        try:
            url = 'http://www.xicidaili.com/nn/' + str(i)
            req = urllib2.Request(url, headers=header)
            res = urllib2.urlopen(req).read()
            soup = BeautifulSoup(res, 'html.parser')
            ips = soup.findAll('tr')
            for x in range(1, len(ips)):  # skip the table header row
                ip = ips[x]
                tds = ip.findAll('td')
                ip_temp = tds[1].contents[0] + '\t' + tds[2].contents[0]
                proxy.append(ip_temp)
        except:
            continue
    return proxy

'''Verify that the obtained proxy IPs are usable'''
def validateIp(proxy):
    url = 'http://ip.chinaz.com/getip.aspx'
    f = open('E:\\ip.txt', 'w')
    socket.setdefaulttimeout(3)
    for i in range(0, len(proxy)):
        try:
            ip = proxy[i].strip().split('\t')
            proxy_host = 'http://' + ip[0] + ':' + ip[1]
            proxy_temp = {'http': proxy_host}
            # if the request through this proxy succeeds, record the proxy
            res = urllib.urlopen(url, proxies=proxy_temp).read()
            f.write(proxy[i] + '\n')
            print proxy[i]
        except Exception, e:
            continue
    f.close()

if __name__ == '__main__':
    proxy = getProxyIp()
    validateIp(proxy)
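Note that the script above is Python 2 only (urllib2 and the print statement no longer exist in Python 3). For readers on Python 3, here is a minimal sketch of the same idea using the third-party requests and beautifulsoup4 packages; the table-cell indexes, URLs, and timeouts are carried over from the original script, and the site's page layout is assumed to be unchanged:

# Minimal Python 3 sketch of the same two steps: scrape, then validate.
# Assumes: pip install requests beautifulsoup4
import requests
from bs4 import BeautifulSoup

HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:43.0) '
                         'Gecko/20100101 Firefox/43.0'}

def get_proxy_ips(pages=1):
    """Scrape (ip, port) pairs from the first `pages` listing pages."""
    proxies = []
    for i in range(1, pages + 1):
        try:
            res = requests.get('http://www.xicidaili.com/nn/' + str(i),
                               headers=HEADERS, timeout=5)
            soup = BeautifulSoup(res.text, 'html.parser')
            for row in soup.find_all('tr')[1:]:  # skip the table header row
                tds = row.find_all('td')
                if len(tds) > 2:
                    proxies.append((tds[1].get_text(), tds[2].get_text()))
        except requests.RequestException:
            continue
    return proxies

def validate_ips(proxies, out_path='ip.txt'):
    """Write to out_path only the proxies a test request gets through."""
    with open(out_path, 'w') as f:
        for host, port in proxies:
            proxy = {'http': 'http://%s:%s' % (host, port)}
            try:
                requests.get('http://ip.chinaz.com/getip.aspx',
                             proxies=proxy, timeout=3)
                f.write('%s\t%s\n' % (host, port))
                print(host, port)
            except requests.RequestException:
                continue

if __name__ == '__main__':
    validate_ips(get_proxy_ips())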
After a successful run, open ip.txt on drive E: to see the proxy IPs and ports that passed validation.
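Each line of the file holds one proxy, with the IP and port separated by a tab (the ip_temp format built in getProxyIp). With placeholder values rather than real output, a line looks like this:

1.2.3.4	8080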
This only crawls the IPs on the first page; you can crawl a few more pages if you need to. The site is updated all the time, so it is best to stick to the first few pages, where the entries are freshest. A small usage sketch follows.
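Once ip.txt exists, a crawler can send its requests through one of the validated proxies. This is only a hypothetical usage sketch in the same Python 2 style as the script above; the file path and test URL are the ones used there:

# Hypothetical usage sketch: pick a random validated proxy from ip.txt
# and route a urllib2 request through it (Python 2, like the main script).
import random
import urllib2

with open('E:\\ip.txt') as f:
    ip, port = random.choice(f.read().splitlines()).split('\t')

opener = urllib2.build_opener(
    urllib2.ProxyHandler({'http': 'http://%s:%s' % (ip, port)}))
print opener.open('http://ip.chinaz.com/getip.aspx', timeout=3).read()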
"Python script"-python find available proxy IPs