Python crawler DNS resolution cache: a worked example
This article describes a DNS resolution cache method for Python crawlers, shared here for reference. The details are as follows:
Preface:
This is the core code of the DNS resolution cache module from a Python crawler. I wrote it last year; take a look if you are interested.
Resolving a domain name typically takes between 10 and 60 milliseconds. That seems insignificant, but it cannot be ignored in a large crawler. For example, suppose we crawl Sina Weibo and issue 10 million requests to the same domain name (which is not even that many). The lookups alone then cost between 100,000 and 600,000 seconds, and a day has only 86,400 seconds. In other words, DNS resolution by itself takes anywhere from just over one day to nearly a week. Adding a DNS resolution cache in this situation makes an obvious difference.
The code comes first; the explanation follows.
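The arithmetic behind that estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the request count and the 10 to 60 ms range are the figures quoted above, and the names REQUESTS and dns_overhead_seconds are made up for illustration.

```python
# Back-of-the-envelope estimate of the time spent on DNS lookups alone.
REQUESTS = 10 * 1000 * 1000      # 10 million requests to the same domain
SECONDS_PER_DAY = 86400

def dns_overhead_seconds(requests, lookup_ms):
    """Total seconds spent resolving if every request does its own lookup."""
    return requests * lookup_ms / 1000.0

low = dns_overhead_seconds(REQUESTS, 10)    # 100000.0 seconds
high = dns_overhead_seconds(REQUESTS, 60)   # 600000.0 seconds

print("%.0f to %.0f seconds" % (low, high))
print("%.1f to %.1f days" % (low / SECONDS_PER_DAY, high / SECONDS_PER_DAY))
# prints:
# 100000 to 600000 seconds
# 1.2 to 6.9 days
```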
Code:
# encoding=utf-8
# ---------------------------------------
#   Version: 0.1
#   Date: 2016-04-26
#   Author: Nine tea <bone_ace@163.com>
#   Development environment: Win64 + Python 2.7
# ---------------------------------------

import socket
# from gevent import socket

_dnscache = {}


def _setDNSCache():
    """DNS cache"""

    def _getaddrinfo(*args, **kwargs):
        if args in _dnscache:
            # print str(args) + " in cache"
            return _dnscache[args]
        else:
            # print str(args) + " not in cache"
            _dnscache[args] = socket._getaddrinfo(*args, **kwargs)
            return _dnscache[args]

    # Patch only once: keep the original resolver as socket._getaddrinfo
    if not hasattr(socket, '_getaddrinfo'):
        socket._getaddrinfo = socket.getaddrinfo
        socket.getaddrinfo = _getaddrinfo
Note:
There is nothing difficult here: the lookup results are cached at the socket level, so the same name is not resolved again and again.
You can put the code above in a dns_cache.py file and call its _setDNSCache() method from your crawler framework.
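To see the effect without touching the network, the sketch below applies the same wrapping idea to a stub resolver. The names fake_getaddrinfo and install_dns_cache, the example.com host, and the 192.0.2.1 address are illustrative assumptions and not part of the original module.

```python
import socket

calls = []  # records every lookup that reaches the underlying resolver


def fake_getaddrinfo(*args, **kwargs):
    """Hypothetical stand-in for the system resolver (no network needed)."""
    calls.append(args)
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', ('192.0.2.1', args[1]))]


def install_dns_cache(resolver):
    """Return a caching wrapper around `resolver`, keyed on positional args."""
    cache = {}

    def cached(*args, **kwargs):
        if args not in cache:
            cache[args] = resolver(*args, **kwargs)
        return cache[args]

    return cached


original = socket.getaddrinfo
socket.getaddrinfo = install_dns_cache(fake_getaddrinfo)
try:
    for _ in range(1000):
        result = socket.getaddrinfo('example.com', 80)
finally:
    socket.getaddrinfo = original  # always restore the real function

print(len(calls))  # 1: only the first lookup reached the resolver
```

In a real crawler the equivalent step would be to import dns_cache and call dns_cache._setDNSCache() once at startup, before any request is issued. Note that in this sketch, as in the code above, entries never expire and keyword arguments are not part of the cache key.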
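Pay particular attention if you use gevent coroutines and call monkey.patch_all(): by that point the crawler has switched to the socket in gevent, so the DNS resolution cache module should also use gevent's socket (that is, use the commented-out "from gevent import socket" import in place of "import socket").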