Proxy types: transparent proxy, anonymous proxy, distorting proxy, and high-anonymity proxy. Here I will write down some knowledge about using proxies in Python crawlers, along with a proxy pool class, so that you can cope with the various complicated crawling problems you run into at work.
Using a proxy with the urllib module
Using a proxy with urllib/urllib2 is somewhat cumbersome. You first build a ProxyHandler object, then use it to build the opener that opens web pages, and finally install that opener so that subsequent requests go through it.
The proxy format is "http://127.0.0.1:80". If the proxy requires an account and password, the format is "http://user:password@127.0.0.1:80".
import urllib.request

proxy = "http://127.0.0.1:80"

# Create a ProxyHandler object
proxy_support = urllib.request.ProxyHandler({'http': proxy})
# Create an opener object
opener = urllib.request.build_opener(proxy_support)
# Install the opener so that urllib.request.urlopen() uses it
urllib.request.install_opener(opener)
# Open a URL
r = urllib.request.urlopen('http://youtube.com', timeout=500)
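If installing the opener globally is not desirable, the opener can also be used directly. The following is a minimal sketch of the same ProxyHandler steps wrapped in a helper; the function name make_proxy_opener and the proxy address are illustrative assumptions, and nothing is sent over the network until open() is called.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Return an opener that routes http and https requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical local proxy address, same as in the example above.
opener = make_proxy_opener("http://127.0.0.1:80")

# Only requests made through this opener use the proxy;
# plain urllib.request.urlopen() calls are left untouched.
# response = opener.open("http://youtube.com", timeout=10)
# print(response.status)
```

This keeps the proxy setting local to one opener object, which may be easier to reason about when different parts of a crawler need different proxies.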
Using a proxy with the requests module
Using a proxy with requests is much simpler than with urllib. The following uses a single proxy as an example. If you use multiple proxies, you can build a class around a session to manage them.
To use a proxy, you can configure a single request by providing the proxies parameter for any request method:
import requests

proxies = {
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:2080",
}
r = requests.get("http://youtube.com", proxies=proxies)
print(r.text)
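For the multiple-proxy case mentioned earlier, one possible shape for a small pool class is sketched below. This is only an illustration under stated assumptions, not a definitive implementation: the class name ProxyPool, its method names, and the proxy addresses are all hypothetical, and it uses simple round-robin rotation with a set of proxies marked as bad.

```python
import itertools


class ProxyPool:
    """Rotate through a fixed list of proxy URLs, skipping ones marked as bad."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("proxy_urls must not be empty")
        self._proxies = list(proxy_urls)
        self._bad = set()
        self._cycle = itertools.cycle(self._proxies)

    def next_proxy(self):
        """Return the next usable proxy URL in round-robin order."""
        # Look at each proxy at most once per call before giving up.
        for _ in range(len(self._proxies)):
            candidate = next(self._cycle)
            if candidate not in self._bad:
                return candidate
        raise RuntimeError("no usable proxies left in the pool")

    def next_proxies(self):
        """Return a dict in the shape of the requests `proxies` parameter."""
        proxy = self.next_proxy()
        return {"http": proxy, "https": proxy}

    def mark_bad(self, proxy_url):
        """Exclude a proxy that failed from future rotation."""
        self._bad.add(proxy_url)


# Hypothetical proxy addresses, for illustration only.
pool = ProxyPool(["http://127.0.0.1:3128", "http://127.0.0.1:2080"])
first = pool.next_proxies()
second = pool.next_proxies()
```

Each dict returned by next_proxies() could then be passed as requests.get(url, proxies=...), with mark_bad() called when a request through that proxy fails.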
You can also configure the proxy through the environment variables HTTP_PROXY and HTTPS_PROXY.
export HTTP_PROXY="http://127.0.0.1:3128"
export HTTPS_PROXY="http://127.0.0.1:2080"
python
>>> import requests
>>> r = requests.get("http://youtube.com")
>>> print(r.text)
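The same variables can also be set from inside a Python process. The sketch below is an assumption-laden illustration: it sets both the uppercase and lowercase names (some tools only read the lowercase form) and then uses the standard library's urllib.request.getproxies() to show what proxy mapping is detected from the environment. The addresses are the same placeholder values as above.

```python
import os
import urllib.request

# Placeholder proxy addresses; replace with real ones.
settings = {
    "HTTP_PROXY": "http://127.0.0.1:3128",
    "HTTPS_PROXY": "http://127.0.0.1:2080",
}

for name, value in settings.items():
    os.environ[name] = value
    os.environ[name.lower()] = value  # lowercase variant for tools that expect it

# getproxies() builds a scheme -> proxy URL mapping from the *_proxy variables.
detected = urllib.request.getproxies()
print(detected.get("http"), detected.get("https"))
```

Variables set this way only affect the current process and any child processes it starts.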
If your proxy needs to use HTTP Basic Auth, you can use the http://user:password@host/ syntax:
proxies = {
    "http": "http://user:pass@127.0.0.1:3128/",
}
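One detail worth noting with this syntax: a user name or password containing characters such as "@" or ":" would be misread as part of the URL structure unless it is percent-encoded. A minimal sketch of building such a URL safely follows; the helper name build_proxy_url and the credentials are hypothetical.

```python
from urllib.parse import quote


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding the credentials if given."""
    if user is None:
        return f"{scheme}://{host}:{port}"
    # safe="" also encodes "/" so every reserved character is escaped.
    credentials = f"{quote(user, safe='')}:{quote(password or '', safe='')}"
    return f"{scheme}://{credentials}@{host}:{port}"


# Hypothetical credentials containing characters that would break the URL.
proxies = {"http": build_proxy_url("127.0.0.1", 3128, "user", "p@ss:word")}
print(proxies["http"])
```

The resulting dict has the same shape as the proxies example above and can be passed to requests in the same way.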
Using a proxy in Python is very simple. The most important thing is to find a stable and reliable proxy. If you have any questions, please leave a message.