The previous section gave a brief introduction to urllib2. This section collects some details of how to use it.
1. Proxy settings
By default, urllib2 uses the http_proxy environment variable to set the HTTP proxy.
If you want to control the proxy explicitly in your program, without being affected by environment variables, you can use a ProxyHandler.
Create a new file, test14, to implement a simple proxy demo:
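import urllib2

enable_proxy = True
# Handler that routes HTTP requests through the given proxy
proxy_handler = urllib2.ProxyHandler({"http": "http://some-proxy.com:8080"})
# Handler with an empty mapping: no proxy is used
null_proxy_handler = urllib2.ProxyHandler({})
if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)
# Make this opener the global default used by urllib2.urlopen()
urllib2.install_opener(opener)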
One detail to note here is that urllib2.install_opener() sets urllib2's global opener.
This is convenient for later use, but it does not allow finer-grained control, such as using two different proxy settings in the same program.
A better practice is to leave the global setting alone: do not call install_opener, and call the opener's open method directly instead of the global urlopen function.
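As a minimal sketch of that advice, the snippet below builds two independent openers and leaves the global opener untouched. The helper name make_opener is hypothetical, the proxy URL is the placeholder from the demo above, and the import fallback to urllib.request is an assumption added only so the sketch also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 equivalent module


def make_opener(proxy_url=None):
    """Hypothetical helper: build an opener with its own proxy setting."""
    # An empty dict means "no proxy", regardless of environment variables.
    proxies = {"http": proxy_url} if proxy_url else {}
    return urllib2.build_opener(urllib2.ProxyHandler(proxies))


proxied_opener = make_opener("http://some-proxy.com:8080")  # placeholder proxy
direct_opener = make_opener()

# Each opener carries its own settings; install_opener() is never called,
# so urllib2.urlopen() keeps its default behaviour.
# response = proxied_opener.open("http://www.baidu.com/")
# response = direct_opener.open("http://www.baidu.com/")
```

The two open calls are left commented out because the proxy address is only a placeholder.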
2. Timeout settings
In older versions of Python (before 2.6), the urllib2 API did not expose a timeout setting. The only way to set a timeout was to change the socket module's global timeout value.
import urllib2
import socket

socket.setdefaulttimeout(10)  # time out after 10 seconds
urllib2.socket.setdefaulttimeout(10)  # another way to do the same thing
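Because this default is process-wide, it applies to every socket created afterward, not only to the ones urllib2 opens. The following small sketch, which is an illustration rather than part of the original demo, shows a fresh socket inheriting the value and then restores the previous default.

```python
import socket

previous = socket.getdefaulttimeout()  # remember whatever was set before
socket.setdefaulttimeout(10)
try:
    probe = socket.socket()  # any newly created socket picks up the default
    inherited = probe.gettimeout()
    probe.close()
finally:
    socket.setdefaulttimeout(previous)  # restore the earlier global value

print(inherited)
```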
Since Python 2.6, the timeout can be set directly through the timeout parameter of urllib2.urlopen().
import urllib2
response = urllib2.urlopen('http://www.google.com', timeout=10)
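When the timeout expires, the call raises an exception, so it is usually wrapped in a try block. Here is a hedged sketch: the helper name fetch and its opener argument are hypothetical, and the fallback import is an assumption so that the code also runs on Python 3.

```python
import socket

try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 equivalent module


def fetch(url, timeout=10, opener=None):
    """Hypothetical helper: return the response body, or None on failure."""
    # Use a specific opener if one is given, otherwise the global urlopen.
    open_url = opener.open if opener is not None else urllib2.urlopen
    try:
        response = open_url(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except (urllib2.URLError, socket.timeout) as error:
        # Connection failures usually surface as URLError; a stalled read
        # can surface as a bare socket.timeout.
        print("request failed: %s" % error)
        return None
```

A call would look like body = fetch('http://www.google.com', timeout=10), with None signalling that the request failed or timed out.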
3. Add a specific Header to the HTTP Request
To add a header, you need to use a Request object:
import urllib2
request = urllib2.Request('http://www.baidu.com/')
request.add_header('User-Agent', 'fake-client')
response = urllib2.urlopen(request)
print response.read()
Some headers deserve special attention, because servers check them:
User-Agent: Some servers or proxies use this value to decide whether the request was made by a browser.
Content-Type: When you use a REST interface, the server checks this value to determine how to parse the content of the HTTP body. Common values are:
application/xml: used for XML RPC calls, such as RESTful/SOAP calls
application/json: used for JSON RPC calls
application/x-www-form-urlencoded: used when a browser submits a web form
When you use a RESTful or SOAP service provided by a server, a wrong Content-Type setting can cause the server to refuse the request.
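To make the Content-Type point concrete, here is a minimal sketch that prepares a JSON request. The helper name build_json_request, the URL http://example.com/api, and the payload are all hypothetical, and the fallback import is an assumption so that the code also runs on Python 3.

```python
import json

try:
    import urllib2  # Python 2, as used in this article
except ImportError:
    import urllib.request as urllib2  # assumption: Python 3 equivalent module


def build_json_request(url, payload):
    """Hypothetical helper: prepare a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    request = urllib2.Request(url, data=body)  # supplying data makes it a POST
    request.add_header("Content-Type", "application/json")
    return request


request = build_json_request("http://example.com/api", {"name": "value"})
# urllib2 stores header names capitalized, so the lookup key is "Content-type".
print(request.get_header("Content-type"))
# response = urllib2.urlopen(request)  # not executed: the URL is a placeholder
```

If the body were form data instead, the header would be application/x-www-form-urlencoded and the body would be encoded accordingly.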