[Python crawler learning notes (1)] Summary of urllib2 library knowledge points
1. Concepts of opener and handler in urllib2

1.1 Openers:
When you fetch a URL, you use an opener (an instance of urllib2.OpenerDirector). Normally we use the default opener via urlopen, but you can create customized openers with build_opener. You will want to create your own opener whenever you need specific handlers installed, for example an opener that handles cookies, or an opener that does not follow redirects.
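As a concrete illustration (my own sketch, not from the original post): build_opener returns an OpenerDirector, and install_opener makes it the process-wide default so that plain urlopen calls use it. The try/except import is an addition so the snippet also runs on Python 3, where urllib2 became urllib.request.

```python
try:
    import urllib2             # Python 2
    import cookielib
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent module
    import http.cookiejar as cookielib

# A cookie-aware opener: the default opener behind urlopen() keeps no cookies.
cookie_jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))

# Option 1: use the opener directly.
# response = opener.open('http://example.com/')

# Option 2: install it globally so plain urlopen() uses it from now on.
urllib2.install_opener(opener)
```

Constructing and installing the opener needs no network access; only opener.open() actually makes a request.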
The following shows how to use handlers and an opener to simulate a login through a proxy IP address (cookie handling is required).
self.proxy = urllib2.ProxyHandler({'http': self.proxy_url})
self.cookie = cookielib.LWPCookieJar()
self.cookie_handler = urllib2.HTTPCookieProcessor(self.cookie)
self.opener = urllib2.build_opener(self.cookie_handler, self.proxy, urllib2.HTTPHandler)
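The fragment above lives inside some class; here is a minimal self-contained sketch of what that class might look like. The class name, the open method, and the proxy address are my own assumptions for illustration; the compatibility import is added so the code also runs on Python 3.

```python
try:
    import urllib2             # Python 2
    import cookielib
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent module
    import http.cookiejar as cookielib


class ProxyLoginClient(object):
    """Hypothetical wrapper: a cookie-keeping opener that routes
    HTTP traffic through a proxy."""

    def __init__(self, proxy_url):
        self.proxy_url = proxy_url
        # Route http:// requests through the given proxy (key is the
        # lowercase scheme name).
        self.proxy = urllib2.ProxyHandler({'http': self.proxy_url})
        # LWPCookieJar can also persist cookies via .save()/.load().
        self.cookie = cookielib.LWPCookieJar()
        self.cookie_handler = urllib2.HTTPCookieProcessor(self.cookie)
        self.opener = urllib2.build_opener(
            self.cookie_handler, self.proxy, urllib2.HTTPHandler)

    def open(self, url, data=None):
        # Cookies set by the server are captured in self.cookie automatically.
        return self.opener.open(url, data)


# Constructing the client needs no network access; opening a URL would.
client = ProxyLoginClient('http://127.0.0.1:8080')  # hypothetical proxy address
```

Passing urllib2.HTTPHandler explicitly is harmless but optional: build_opener already adds the default handlers and only replaces those your arguments override.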
1.2 Handlers:
Openers use handlers, and all the "heavy lifting" is done by the handlers. Each handler knows how to open URLs for a particular protocol, or how to handle one aspect of opening a URL, such as HTTP redirects or HTTP cookies.
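The "opener that does not handle redirections" mentioned in section 1.1 is built by swapping in a custom handler. A sketch of one way to do it (my own illustration, not from the post): subclass HTTPRedirectHandler so the 3xx response is raised as an error instead of being silently followed.

```python
try:
    import urllib2             # Python 2
except ImportError:
    import urllib.request as urllib2    # Python 3 equivalent module


class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    """A handler that refuses to follow HTTP redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes the parent class surface the 3xx response
        # as an HTTPError instead of following it to the new URL.
        return None


# build_opener accepts handler classes as well as instances; this custom
# handler replaces the default HTTPRedirectHandler in the chain.
no_redirect_opener = urllib2.build_opener(NoRedirectHandler)
```

With this opener, a request that hits a 301/302 raises HTTPError, which the caller can catch to inspect the Location header itself.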