There are three main ways to access a web page from Python 2: urllib, urllib2 and httplib.
urllib is simple but relatively limited; httplib is simple and powerful but does not keep a session (cookies) for you. The examples below use urllib2.
1. The simplest page access (get the server's response)
import urllib, urllib2
res = urllib2.urlopen(url)
print res.read()
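For comparison, here is a minimal sketch of the same fetch in Python 3, where urllib2 was merged into urllib.request and print became a function. `fetch_text` is a hypothetical helper, and falling back to UTF-8 when the response names no charset is an assumption, not something urllib does for you.

```python
import urllib.request


def fetch_text(url, timeout=10.0):
    """Fetch `url` and return the response body decoded as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()  # bytes in Python 3
        # Use the charset from the Content-Type header; assume UTF-8 if none is given
        charset = resp.headers.get_content_charset() or "utf-8"
    return body.decode(charset)
```

Usage would be print(fetch_text(url)). Because read() returns bytes in Python 3, the helper decodes explicitly instead of printing raw bytes.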
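2. Send data with GET or POST
data = {"name": "Hank", "passwd": "HJZ"}
urllib2.urlopen(url, urllib.urlencode(data))           # POST: data goes in the request body
urllib2.urlopen(url + "?" + urllib.urlencode(data))    # GET: data goes in the query string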
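The GET/POST difference can be made explicit with two small builders. This is a Python 3 sketch (urlencode lives in urllib.parse there, and a POST body must be bytes); `build_get_request`, `build_post_request` and the example.com URL are hypothetical.

```python
import urllib.parse
import urllib.request


def build_get_request(url, params):
    """GET: encode params into the query string; the request has no body."""
    query = urllib.parse.urlencode(params)
    sep = "&" if "?" in url else "?"  # append to an existing query string if present
    return urllib.request.Request(url + sep + query)


def build_post_request(url, params):
    """POST: encode params into the request body (must be bytes in Python 3)."""
    body = urllib.parse.urlencode(params).encode("ascii")  # urlencode output is ASCII
    return urllib.request.Request(url, data=body)
```

Either request is then sent with urllib.request.urlopen(req); Request picks POST automatically whenever data is not None.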
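3. Add HTTP headers
header = {"User-Agent": "mozilla-firefox5.0"}
# urlopen's third argument is a timeout, not headers, so wrap everything in a Request
req = urllib2.Request(url, urllib.urlencode(data), header)
urllib2.urlopen(req)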
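A Python 3 sketch of the same idea: put the URL, the optional body and the headers into one Request object. `build_request` is a hypothetical helper. Note that Request stores header names through str.capitalize(), so a header passed as "user-agent" is looked up afterwards as "User-agent".

```python
import urllib.parse
import urllib.request


def build_request(url, params=None, headers=None):
    """Bundle URL, optional POST body and extra headers into one Request (hypothetical helper)."""
    body = None
    if params is not None:
        body = urllib.parse.urlencode(params).encode("ascii")
    # Each header name is normalized with str.capitalize() by Request.add_header
    return urllib.request.Request(url, data=body, headers=headers or {})
```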
# The following is reposted from another source:
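Using openers and handlers
opener = urllib2.build_opener(handler)   # handler: any handler instance, e.g. those below
urllib2.install_opener(opener)           # later urllib2.urlopen() calls go through this opener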
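A handler is just an object with specially named methods: the opener calls <protocol>_request(req) on each handler before sending, <protocol>_response(req, response) afterwards, and <protocol>_open(req) to perform the fetch. Below is a Python 3 sketch of a custom pre-processing handler; `DefaultHeaderHandler` is hypothetical, and giving it handler_order 100 is a choice made so it runs before the built-in HTTPHandler (order 500) fills in urllib's default User-agent.

```python
import urllib.request


class DefaultHeaderHandler(urllib.request.BaseHandler):
    """Hypothetical handler that adds a header to every outgoing request."""

    # Lower than the default 500, so this runs before HTTPHandler adds its own User-agent
    handler_order = 100

    def __init__(self, name, value):
        self.header_name = name
        self.header_value = value

    def http_request(self, req):
        # Request stores header names capitalized ("User-agent"), so check that form
        if not req.has_header(self.header_name.capitalize()):
            req.add_header(self.header_name, self.header_value)
        return req

    # Reuse the same pre-processing hook for https:// URLs
    https_request = http_request


opener = urllib.request.build_opener(
    DefaultHeaderHandler("User-Agent", "mozilla-firefox5.0"))
```

Calling opener.open(url) keeps the handler local to that opener; install_opener(opener) makes every plain urlopen() call use it, which is convenient but changes global state.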
4. Keep a session (cookies)
import cookielib
cj = cookielib.CookieJar()
cjhandler = urllib2.HTTPCookieProcessor(cj)   # stores cookies and sends them back
opener = urllib2.build_opener(cjhandler)
urllib2.install_opener(opener)
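In Python 3, cookielib became http.cookiejar and HTTPCookieProcessor moved to urllib.request. A minimal sketch, with `make_session_opener` as a hypothetical helper:

```python
import http.cookiejar
import urllib.request


def make_session_opener():
    """Return (opener, jar): cookies set by responses are kept in `jar`
    and sent back automatically on later requests made through `opener`."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

A login request made with opener.open(...) stores whatever cookies the server sets, and subsequent opener.open(...) calls send them back; the jar can be iterated to inspect them.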
5. Add HTTP Basic authentication
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
top_level_url = "http://www.163.com/"
# realm=None matches any realm; username/password are your own credentials
password_mgr.add_password(None, top_level_url, username, password)
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
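The Python 3 equivalent, sketched as a hypothetical `make_basic_auth_opener` helper. With the standard HTTPBasicAuthHandler, credentials are sent only after the server replies 401 with a Basic challenge, and they are used for any URL under top_level_url.

```python
import urllib.request


def make_basic_auth_opener(top_level_url, username, password):
    """Opener that answers HTTP Basic auth challenges for URLs under top_level_url."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: fall back to these credentials whatever realm the server names
    mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(mgr)
    return urllib.request.build_opener(handler)
```

Usage: opener = make_basic_auth_opener("http://www.163.com/", username, password), then opener.open(url) for any page under that prefix.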
6. Use a proxy
proxy_support = urllib2.ProxyHandler({"http": "http://1.2.3.4:3128/"})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)
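A Python 3 sketch for proxies; `make_proxy_opener` is hypothetical and 1.2.3.4:3128 is the placeholder address from above. Passing an empty dict is the documented way to disable proxies, including those picked up from environment variables such as http_proxy.

```python
import urllib.request


def make_proxy_opener(proxies):
    """Opener that routes requests through `proxies`, e.g. {"http": "http://host:port/"}.
    An empty dict disables proxying, ignoring proxy environment variables."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```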
7. Set a timeout
import socket
socket.setdefaulttimeout(5)   # seconds; affects every socket created afterwards
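socket.setdefaulttimeout changes every socket the process creates afterwards. Since Python 2.6, urllib2.urlopen (and urllib.request.urlopen in Python 3) also accepts a per-call timeout argument. A Python 3 sketch that treats a timeout as "no answer"; `fetch_with_timeout` and returning None on timeout are assumptions for illustration.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_timeout(url, timeout=5.0):
    """Return the body as bytes, or None if the server is too slow.

    The timeout (seconds) applies only to this call, unlike
    socket.setdefaulttimeout(), which is process-wide.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except socket.timeout:
        # Raised directly when the response stalls after the request was sent
        return None
    except urllib.error.URLError as err:
        # A timeout while connecting or sending arrives wrapped in URLError
        if isinstance(err.reason, socket.timeout):
            return None
        raise
```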
Python's library for handling the HTTP protocol: urllib2