Chapter 1.7 Using an IP proxy with Scrapy


1 Proxy middleware
The core of the code is to randomly select a proxy IP and port. As for where the proxy IPs and ports come from, you can buy them from a proxy provider or scrape free ones from the Internet.

# -*- coding: utf-8 -*-
"""
Created on June 14, 2017
@author: Dzm
"""
from eie.middlewares import udf_config
from eie.service.EieIpService import EieIpService

logger = udf_config.logger
eie_ip_service = EieIpService()


class ProxyMiddleware(object):
    """IP proxy middleware"""

    def process_request(self, request, spider):
        """Attach an IP proxy to the outgoing request"""
        if request.url.startswith('http://'):
            proxy = eie_ip_service.select_rand('HTTP')
            if proxy:
                proxies = 'http://{}:{}'.format(proxy['ip'], proxy['port'])
            else:
                raise Exception('No suitable IP proxy found')
        else:
            proxy = eie_ip_service.select_rand('HTTPS')
            if proxy:
                proxies = 'https://{}:{}'.format(proxy['ip'], proxy['port'])
            else:
                raise Exception('No suitable IP proxy found')
        request.meta['proxy'] = proxies
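The mechanism above can be demonstrated without a running Scrapy project. The sketch below is a minimal, self-contained version of the same idea: `StaticIpService` and `DummyRequest` are hypothetical stand-ins (not part of Scrapy or the eie project) for the proxy service and the Scrapy `Request` object; the only real contract is that Scrapy's built-in `HttpProxyMiddleware` honors `request.meta['proxy']`.

```python
import random


class StaticIpService(object):
    """Stand-in for EieIpService: serves proxies from a fixed list."""

    def __init__(self, proxies):
        self._proxies = proxies

    def select_rand(self, scheme):
        pool = [p for p in self._proxies if p['scheme'] == scheme]
        return random.choice(pool) if pool else None


class DummyRequest(object):
    """Stand-in for scrapy.Request: just a url and a meta dict."""

    def __init__(self, url):
        self.url = url
        self.meta = {}


class SimpleProxyMiddleware(object):
    def __init__(self, ip_service):
        self.ip_service = ip_service

    def process_request(self, request, spider):
        scheme = 'http' if request.url.startswith('http://') else 'https'
        proxy = self.ip_service.select_rand(scheme)
        if proxy is None:
            raise Exception('No suitable IP proxy found')
        # Scrapy's HttpProxyMiddleware reads request.meta['proxy']
        request.meta['proxy'] = '{}://{}:{}'.format(scheme, proxy['ip'], proxy['port'])


service = StaticIpService([{'scheme': 'http', 'ip': '1.2.3.4', 'port': 8080}])
mw = SimpleProxyMiddleware(service)
req = DummyRequest('http://example.com/')
mw.process_request(req, None)
print(req.meta['proxy'])  # http://1.2.3.4:8080
```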

2 Settings Configuration
Pay attention to the relative order of RetryMiddleware and ProxyMiddleware here. If the order is wrong, Scrapy will not retry after a connection timeout and will just keep timing out on the same dead proxy. (I asked about this in a QQ group and nobody could answer.)
The order of the middlewares matters; for a detailed explanation, see the article on the configuration order of DOWNLOADER_MIDDLEWARES in Scrapy.

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    'eie.middlewares.random_user_agent.RandomUserAgent': 400,
    # ProxyMiddleware runs after RetryMiddleware so that retried requests
    # are reassigned a fresh proxy (the priority numbers here are
    # illustrative; the key point is the relative order)
    'eie.middlewares.proxy_middleware.ProxyMiddleware': 543,
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 500,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 750,
}

The default DOWNLOADER_MIDDLEWARES_BASE order is listed below for reference:

DOWNLOADER_MIDDLEWARES_BASE = {
    'scrapy.contrib.downloadermiddleware.robotstxt.RobotsTxtMiddleware': 100,
    'scrapy.contrib.downloadermiddleware.httpauth.HttpAuthMiddleware': 300,
    'scrapy.contrib.downloadermiddleware.downloadtimeout.DownloadTimeoutMiddleware': 350,
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': 400,
    'scrapy.contrib.downloadermiddleware.retry.RetryMiddleware': 500,
    'scrapy.contrib.downloadermiddleware.defaultheaders.DefaultHeadersMiddleware': 550,
    'scrapy.contrib.downloadermiddleware.redirect.MetaRefreshMiddleware': 580,
    'scrapy.contrib.downloadermiddleware.httpcompression.HttpCompressionMiddleware': 590,
    'scrapy.contrib.downloadermiddleware.redirect.RedirectMiddleware': 600,
    'scrapy.contrib.downloadermiddleware.cookies.CookiesMiddleware': 700,
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 750,
    'scrapy.contrib.downloadermiddleware.chunked.ChunkedTransferMiddleware': 830,
    'scrapy.contrib.downloadermiddleware.stats.DownloaderStats': 850,
    'scrapy.contrib.downloadermiddleware.httpcache.HttpCacheMiddleware': 900,
}
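To see why the priority numbers control the order, the merge logic Scrapy applies can be sketched as follows: the project's DOWNLOADER_MIDDLEWARES dict is merged over the base dict, a value of None disables an entry, and the rest are sorted by priority (lower numbers run closer to the engine, i.e. earlier on requests). The function and the shortened middleware names below are illustrative, not Scrapy internals.

```python
def resolve_middleware_order(base, overrides):
    """Merge overrides into base, drop None entries, sort by priority."""
    merged = dict(base)
    merged.update(overrides)
    enabled = {name: prio for name, prio in merged.items() if prio is not None}
    return [name for name, prio in sorted(enabled.items(), key=lambda kv: kv[1])]


base = {
    'useragent.UserAgentMiddleware': 400,
    'retry.RetryMiddleware': 500,
    'httpproxy.HttpProxyMiddleware': 750,
}
overrides = {
    'useragent.UserAgentMiddleware': None,  # disabled by the project
    'myproject.ProxyMiddleware': 543,       # between retry and httpproxy
}
print(resolve_middleware_order(base, overrides))
# ['retry.RetryMiddleware', 'myproject.ProxyMiddleware', 'httpproxy.HttpProxyMiddleware']
```

This makes the ordering rule from the previous section concrete: because 543 sits between 500 and 750, the proxy middleware sees every request after RetryMiddleware has rescheduled it and before HttpProxyMiddleware applies `request.meta['proxy']`.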

3 Check that the IP proxy works
From the response (shown in a screenshot in the original article), you can confirm that the request really went out through the proxy address.
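Before checking responses, it is worth sanity-checking the string placed in `request.meta['proxy']` itself: a common mistake is omitting the scheme, which makes Scrapy reject the proxy. The helper below is our own (not part of Scrapy) and only validates the `http(s)://ip:port` shape.

```python
import re

# Matches 'http://1.2.3.4:8080' or 'https://1.2.3.4:8080' style strings.
PROXY_RE = re.compile(r'^https?://\d{1,3}(?:\.\d{1,3}){3}:\d{1,5}$')


def looks_like_proxy(value):
    """Return True if value has the scheme://ip:port shape Scrapy expects."""
    return bool(PROXY_RE.match(value))


print(looks_like_proxy('http://1.2.3.4:8080'))  # True
print(looks_like_proxy('1.2.3.4:8080'))         # False (scheme missing)
```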

4 Proxy background
For what an IP proxy is, what it is used for, and the pros and cons of using proxy IPs, see the related articles on IP proxies.
5 Getting a random proxy

def select_rand_zhimadaili(self):
    """Get an IP proxy list from the Zhima proxy service
    @see: http://http.zhimadaili.com/index/api/newapi.html
    """
    try:
        proxy_keys_list = self.redis.keys()
        if proxy_keys_list:
            # pick a cached proxy at random
            key = random.randint(0, len(proxy_keys_list) - 1)
            # proxy_list = json.load(self.redis.get(proxy_keys_list[key]))
            strs = self.redis.get(proxy_keys_list[key])
            return eval(strs)
        else:
            socket = urllib2.urlopen('http://http.zhimadaili.com/index/api/new_get_use_ips.html?num=20&type=2&pro=0&city=0&yys=0&port=1&time=1')
            result = socket.read()
            socket.close()
            result = json.loads(result)
            proxy_list = result['data']
            for proxy in proxy_list:
                name = '{}:{}'.format(proxy['ip'], proxy['port'])
                self.redis.setex(name, proxy, 600)  # illustrative TTL; unreadable in the source
            key = random.randint(0, len(proxy_list) - 1)
            return proxy_list[key]
    except Exception as e:
        logger.error('Request to zhimadaili.com for proxies failed, reason: %s' % e)
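The API call above returns JSON shaped roughly like `{"data": [{"ip": ..., "port": ...}]}` (field names inferred from the code). The sketch below parses such a payload offline, building the same `ip:port` keys the method caches in Redis; the sample payload and the helper name are our own.

```python
import json
import random

# A fabricated sample of the payload shape the code above expects.
sample = json.dumps({
    "data": [
        {"ip": "1.2.3.4", "port": 8080},
        {"ip": "5.6.7.8", "port": 3128},
    ]
})


def parse_proxy_payload(raw):
    """Turn the raw JSON body into a list of 'ip:port' cache keys."""
    proxy_list = json.loads(raw)['data']
    return ['{}:{}'.format(p['ip'], p['port']) for p in proxy_list]


keys = parse_proxy_payload(sample)
print(keys)                       # ['1.2.3.4:8080', '5.6.7.8:3128']
print(random.choice(keys) in keys)  # True
```

Note that `random.choice` already picks one element uniformly, which is simpler and safer than indexing with a separately generated key.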

The following method instead picks a random proxy from a database, in this case MySQL:

def select_rand_db(self, types=None):
    if types:
        sql = "SELECT ip, port, types FROM eie_ip WHERE types='{}' ORDER BY rand() LIMIT 1".format(types)
    else:
        sql = "SELECT ip, port, types FROM eie_ip ORDER BY rand() LIMIT 1"
    df = pd.read_sql(sql, self.engine)
    results = json.loads(df.to_json(orient='records'))
    if results and len(results) == 1:
        return results[0]
    return None
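The same "random row" idea can be tried with only the standard library: SQLite's `RANDOM()` plays the role of MySQL's `rand()`. The table and column names below mirror `eie_ip` from the method above, and the query is parameterized rather than built with `format()`, which also avoids SQL injection when `types` comes from untrusted input.

```python
import sqlite3

# In-memory database standing in for the MySQL eie_ip table.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE eie_ip (ip TEXT, port INTEGER, types TEXT)")
conn.executemany("INSERT INTO eie_ip VALUES (?, ?, ?)", [
    ('1.2.3.4', 8080, 'HTTP'),
    ('5.6.7.8', 3128, 'HTTPS'),
])


def select_rand(conn, types=None):
    """Return one random proxy row as a dict, or None if the table is empty."""
    if types:
        row = conn.execute(
            "SELECT ip, port, types FROM eie_ip WHERE types=? "
            "ORDER BY RANDOM() LIMIT 1", (types,)).fetchone()
    else:
        row = conn.execute(
            "SELECT ip, port, types FROM eie_ip "
            "ORDER BY RANDOM() LIMIT 1").fetchone()
    if row:
        return {'ip': row[0], 'port': row[1], 'types': row[2]}
    return None


print(select_rand(conn, 'HTTP'))  # {'ip': '1.2.3.4', 'port': 8080, 'types': 'HTTP'}
```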
