Proxy types: transparent proxies, anonymous proxies, distorting proxies, and elite (high-anonymity) proxies. Below are some notes on using proxies in Python crawlers, including a simple proxy pool class. With these it is easy to handle all kinds of tricky crawling problems at work.
Using a proxy with the urllib module
Using a proxy with urllib/urllib2 is cumbersome: you need to build a ProxyHandler object, use it to build the opener object that opens web pages, and then install that opener for subsequent requests.
The proxy format is "http://127.0.0.1:80"; with a username and password it is "http://user:password@127.0.0.1:80".
Proxy= "HTTP://127.0.0.1:80" # Create a Proxyhandler object Proxy_support=urllib.request.proxyhandler ({' http ':p Roxy}) # Create a Opener Object opener = Urllib.request.build_opener (proxy_support) # Load Openerurllib.request.install_opener for request ( Opener) # Open a urlr = Urllib.request.urlopen (' http://youtube.com ', timeout = 500)
Using a proxy with the requests module
Using a proxy with requests is much simpler than with urllib. Below is an example with a single proxy; to make many requests share the same proxy configuration, you can use the Session class, as shown in the sketch after the single-proxy example.
If you need to use a proxy, you can configure a single request by passing the proxies parameter to any request method:
import requests

proxies = {
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:2080",
}
r = requests.get("http://youtube.com", proxies=proxies)
print(r.text)
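As mentioned above, a Session lets every request share one proxy configuration instead of repeating the proxies argument. A minimal sketch, reusing the placeholder addresses from the example above:

import requests

session = requests.Session()
# Proxies set on the session apply to every request made through it
session.proxies = {
    "http": "http://127.0.0.1:3128",
    "https": "http://127.0.0.1:2080",
}
r = session.get("http://youtube.com")
print(r.text)

A Session also reuses the underlying TCP connection, which is noticeably faster when crawling many pages through the same proxy.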
You can also configure proxies with the environment variables http_proxy and https_proxy.
export http_proxy="http://127.0.0.1:3128"
export https_proxy="http://127.0.0.1:2080"

$ python
>>> import requests
>>> r = requests.get("http://youtube.com")
>>> print(r.text)
If your proxy needs HTTP Basic Auth, you can use the http://user:password@host/ syntax:
proxies = {
    "http": "http://user:pass@127.0.0.1:3128/",
}
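The proxy pool class mentioned at the start was not shown above; here is a minimal sketch of what such a class could look like. The class name, method names, and proxy addresses are all placeholders, and the rotation strategy (random choice, drop on failure) is just one simple option:

import random
import requests

class ProxyPool:
    # Minimal proxy pool: picks a random proxy per request and drops
    # proxies that fail. All addresses used here are placeholders.
    def __init__(self, proxies):
        self.proxies = list(proxies)

    def get(self):
        # Fail early if every proxy has already been dropped
        if not self.proxies:
            raise RuntimeError("proxy pool is empty")
        return random.choice(self.proxies)

    def remove(self, proxy):
        # Drop a proxy that stopped working
        if proxy in self.proxies:
            self.proxies.remove(proxy)

    def fetch(self, url, retries=3):
        # Try up to `retries` different proxies before giving up
        for _ in range(retries):
            proxy = self.get()
            try:
                return requests.get(
                    url,
                    proxies={"http": proxy, "https": proxy},
                    timeout=10,
                )
            except requests.RequestException:
                self.remove(proxy)
        raise RuntimeError("no working proxy found")

pool = ProxyPool(["http://127.0.0.1:3128", "http://127.0.0.1:2080"])
r = pool.fetch("http://youtube.com")

In real crawling you would also want to refill the pool from a proxy source and perhaps track per-proxy failure counts instead of dropping on the first error, but the structure stays the same.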
Using proxies in Python is very simple; the most important thing is to find proxies that are stable and reliable. If you have questions, feel free to leave a comment.