Getting available free proxies with Python

Source: Internet
Author: User

When a crawler repeatedly hits the same site, the site's IP-based anti-crawler mechanism often bans it. Routing requests through a proxy works around this. Many sites on the Internet publish up-to-date lists of free proxies. Some of the proxy hosts on these lists work, but others do not, so the lists need further filtering. With Python it is easy to filter a list down to the proxies that are actually available.
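
As a minimal sketch of what "using a proxy" looks like in code: the requests library accepts a proxies mapping from URL scheme to proxy URL. Before spending a network round trip on a scraped entry, it can also help to discard strings that are not even shaped like host:port. The helper names is_host_port and to_requests_proxies are hypothetical, and the sample addresses come from ranges reserved for documentation.

```python
import re

# Hypothetical helpers for illustration; only the {'http': 'http://host:port'}
# mapping shape is what requests expects for its proxies= argument.
_HOST_PORT = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$')


def is_host_port(text):
    """Return True if text looks like an IPv4 'host:port' pair."""
    match = _HOST_PORT.match(text.strip())
    if not match:
        return False
    octets = [int(part) for part in match.group(1).split('.')]
    port = int(match.group(2))
    return all(octet <= 255 for octet in octets) and 1 <= port <= 65535


def to_requests_proxies(host_port):
    """Build the scheme-to-proxy mapping for a plain HTTP proxy."""
    return {'http': 'http://' + host_port.strip()}


# Sample strings as they might come out of a scraped table (made up).
scraped = ['192.0.2.1:8080', 'n/a', '203.0.113.5:99999', ' 198.51.100.7:3128 ']
candidates = [s.strip() for s in scraped if is_host_port(s)]
mappings = [to_requests_proxies(c) for c in candidates]
```

Each mapping could then be passed as requests.get(url, proxies=mapping, timeout=5). The format check only rejects obviously malformed entries; whether a proxy actually works still has to be tested with a real request.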

Take the IPCN country free proxy list, a site that publishes free proxy information, as an example. The program below crawls the proxy information the site provides and filters out the proxy hosts that are available. It mainly uses requests and lxml. The detailed code is:

# -*- coding: utf-8 -*-
import requests
from lxml import etree

# Assumed value: the count passed in the original listing did not survive.
MAX_PROXIES = 10


def get_proxies_from_site():
    url = 'http://proxy.ipcn.org/country/'
    xpath = '/html/body/div[last()]/table[last()]/tr/td/text()'
    r = requests.get(url)
    tree = etree.HTML(r.text)
    results = tree.xpath(xpath)
    # Strip whitespace and drop empty table cells.
    proxies = [line.strip() for line in results if line.strip()]
    return proxies


# Use the page at http://lwons.com/wx to test whether a proxy host is available
def get_valid_proxies(proxies, count):
    url = 'http://lwons.com/wx'
    results = []
    cur = 0
    for p in proxies:
        proxy = {'http': 'http://' + p}
        succeed = False
        try:
            # The timeout keeps a dead proxy from hanging the loop.
            r = requests.get(url, proxies=proxy, timeout=10)
            # Compare case-insensitively against the expected page body.
            if r.text.strip().lower() == 'default':
                succeed = True
        except Exception:
            print('ERROR:', p)
            succeed = False
        if succeed:
            print('succeed:', p)
            results.append(p)
            cur += 1
            if cur >= count:
                break
    # Return the working proxies so the caller can count them.
    return results


if __name__ == '__main__':
    print('Get ' + str(len(get_valid_proxies(get_proxies_from_site(), MAX_PROXIES))) + ' proxies')
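
The check in get_valid_proxies tests one proxy at a time, so a list with many dead hosts can take a long time to work through. One possible variation, sketched here under assumptions and not part of the original program, runs the checks in a thread pool. The function filter_proxies and its check parameter are hypothetical names; check stands for any callable that takes a "host:port" string and returns whether the proxy worked, for example a wrapper around the requests.get call above.

```python
from concurrent.futures import ThreadPoolExecutor


def _safe(check, proxy):
    """Run check(proxy), treating any exception as a failed proxy."""
    try:
        return bool(check(proxy))
    except Exception:
        return False


def filter_proxies(proxies, check, count, workers=8):
    """Return up to `count` proxies that pass `check`, in input order.

    Every proxy is submitted to the pool up front, so all checks still run;
    only the collected result is capped at `count`.
    """
    proxies = list(proxies)  # allow generators and keep a stable order
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = pool.map(lambda p: _safe(check, p), proxies)
        for proxy, ok in zip(proxies, outcomes):
            if ok:
                results.append(proxy)
                if len(results) >= count:
                    break
    return results


def fake_check(proxy):
    """Stand-in checker for the demo: pretend only port 80 proxies work."""
    return proxy.endswith(':80')


sample = ['192.0.2.1:80', '192.0.2.2:8080', '192.0.2.3:80', '192.0.2.4:80']
working = filter_proxies(sample, fake_check, 2)
```

Passing the checker in as a parameter keeps the network call separate from the filtering logic, which makes the filter easy to exercise with a fake checker as in the demo.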
