urllib2 Basic Operations

1. Open a Web page (urlopen)
Open a Web page
import urllib2

response = urllib2.urlopen('http://www.baidu.com')
html = response.read()
print html
urlopen commonly takes three parameters:
urllib2.urlopen(url, data, timeout)
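As a quick sketch of the signature (the URL is only a placeholder), all three parameters can be passed together; data=None keeps the request a GET:

import urllib2

# data=None sends a GET; timeout is in seconds
response = urllib2.urlopen('http://www.baidu.com', None, 5)
print response.getcode()  # HTTP status code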
Use of the data parameter (GET)
import urllib
import urllib2

data = {'email': 'myemail', 'password': 'password'}
params = urllib.urlencode(data)  # encode the query string
# url is assumed to hold the target address
response = urllib.urlopen("%s?%s" % (url, params))
Use of the data parameter (POST)
import urllib
import urllib2

data = {'email': 'myemail', 'password': 'password'}
data = urllib.urlencode(data)          # encode the form body
response = urllib2.urlopen(url, data)  # passing data makes this a POST
If the data parameter is supplied, the request is sent as a POST; if it is omitted, the request is a plain GET.
Use of the timeout parameter
When the network is poor or the server is misbehaving, a request can take a long time or hang; setting a timeout avoids waiting indefinitely.
import urllib2

response = urllib2.urlopen('http://www.baidu.com', timeout=1)
print(response.read())
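When the timeout expires, urlopen raises an exception rather than returning; a minimal sketch of handling it (the tiny 0.1-second timeout is only to force the error):

import socket
import urllib2

try:
    response = urllib2.urlopen('http://www.baidu.com', timeout=0.1)
    print response.read()
except urllib2.URLError as e:
    # a timeout surfaces as a URLError whose reason is a socket.timeout
    if isinstance(e.reason, socket.timeout):
        print 'request timed out'
    else:
        raise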
2. Open a Web page (Request)
Open a Web page
(Note: this snippet uses Python 3's urllib.request, into which urllib2 was merged; the Python 2 equivalent is urllib2.Request.)

import urllib.request

request = urllib.request.Request('https://www.baidu.com')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))
Specify the request header
import urllib2

# Build the request headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64)"}
# Package the request
request = urllib2.Request(url=url, headers=headers)
response = urllib2.urlopen(request)
content = response.read().decode('utf-8')
print content
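Headers can also be attached one at a time after the Request object is created, using Request.add_header; a minimal sketch (url is assumed to be defined, as above):

import urllib2

request = urllib2.Request(url)
request.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64)')
response = urllib2.urlopen(request)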
3. Advanced
Add a proxy
import urllib2

# Custom headers
headers = {
    'Host': 'www.dianping.com',
    'Cookie': 'jsessionid=f1c38c2f1a7f7bf3bcb0c4e3ccdbe245; aburl=1; cy=2',
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.0.249.0 Safari/532.5',
}
proxy_handler = urllib2.ProxyHandler({'http': 'http://host:port'})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)
request = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(request)
content = response.read().decode('utf-8')
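If the proxy itself requires authentication, ProxyHandler also accepts credentials embedded in the proxy URL; a sketch with placeholder user, password, host, and port values:

import urllib2

proxy_handler = urllib2.ProxyHandler({'http': 'http://user:password@host:port'})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)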
Manipulating Cookies
import urllib2
import cookielib

cookie = cookielib.CookieJar()
cookie_s = urllib2.HTTPCookieProcessor(cookie)  # create a cookie processor
opener = urllib2.build_opener(cookie_s)         # build an opener
urllib2.install_opener(opener)
response = urllib2.urlopen('http://www.dianping.com').read()  # read the page
print response  # page HTML
# Inspect the cookies
print cookie, type(cookie)
for item in cookie:
    print 'name:' + item.name + ' - value:' + item.value
Save cookies
def savecookie():
    # File in which to store the cookies
    filename = 'cookie.txt'
    # Declare a MozillaCookieJar object to hold the cookies; it writes them to the file
    cookie = cookielib.MozillaCookieJar(filename)
    # Create a cookie processor
    handler = urllib2.HTTPCookieProcessor(cookie)
    # Build an opener
    opener = urllib2.build_opener(handler)
    # Make the request
    res = opener.open('http://www.baidu.com')
    # Save cookies to the file.
    # ignore_discard: save cookies even if they are marked to be discarded;
    # ignore_expires: write to the file even if it already exists, overwriting it
    cookie.save(ignore_discard=True, ignore_expires=True)
Read cookies back from the file
def getcookie():
    # Create a MozillaCookieJar object
    cookie = cookielib.MozillaCookieJar()
    # Load the cookie content from the file into the jar
    cookie.load('cookie.txt', ignore_discard=True, ignore_expires=True)
    # Print the cookies to show the load succeeded
    for item in cookie:
        print 'name:' + item.name + ' - value:' + item.value
    # Build an opener that carries the loaded cookies
    handler = urllib2.HTTPCookieProcessor(cookie)
    opener = urllib2.build_opener(handler)
    res = opener.open('http://www.baidu.com')
    print res.read()
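Putting the two helpers together, save first and then read back (assuming both functions are defined as above):

if __name__ == '__main__':
    savecookie()  # fetch www.baidu.com and write its cookies to cookie.txt
    getcookie()   # load them back and reuse them for a new request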
A complete example:
import urllib2
import cookielib
import json

def my_cookie_test():
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.0.249.0 Safari/532.5',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6,zh-TW;q=0.4',
        'Connection': 'keep-alive',
        'Cookie': 'cy=2; _lxsdk_cuid=16000a1a16cc8-0629d2ca3b9f7-40544230-100200-16000a1a16dc8; '
                  '_lxsdk=16000a1a16cc8-0629d2ca3b9f7-40544230-100200-16000a1a16dc8; '
                  '_lxsdk_s=16000a1a16f-c56-870-2aa%7c%7c23; '
                  '_hc.v=44792549-7147-7394-ac0a-eefed1fa19a2.1511839081; s_viewtype=10',
        'Host': 'www.dianping.com',
        'Referer': 'http://www.dianping.com/shop',
        'Upgrade-Insecure-Requests': 1,
    }
    # Jar that will receive the cookies set by the response
    cj_a = cookielib.CookieJar()
    cj_s = urllib2.HTTPCookieProcessor(cj_a)
    proxy_s = urllib2.ProxyHandler({'http': '0.0.0.0:8080'})
    opener = urllib2.build_opener(proxy_s, cj_s)
    urllib2.install_opener(opener)
    try:
        request = urllib2.Request("http://www.dianping.com/shop/000000/", headers=headers)
        response = urllib2.urlopen(request)
        content = response.read().decode('utf-8')  # the page HTML
        print content
        cookie_data = {}
        for item in cj_a:
            # print 'request: name:' + item.name + ' - value:' + item.value
            cookie_data[item.name] = item.value
        cookie_str = json.dumps(cookie_data)
        with open('cookie.txt', 'w') as f:
            f.write(cookie_str)
        print "Cookie information has been saved locally"
    except Exception as e:
        print e
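To reuse the cookies that my_cookie_test saved as JSON, one approach (a sketch; load_cookie_header is a hypothetical helper, not part of urllib2) is to read the file back and rebuild a Cookie header string:

import json
import urllib2

def load_cookie_header(path='cookie.txt'):
    # rebuild a 'name=value; name=value' Cookie header from the saved JSON
    with open(path) as f:
        cookie_data = json.load(f)
    return '; '.join('%s=%s' % (k, v) for k, v in cookie_data.items())

request = urllib2.Request('http://www.dianping.com/shop',
                          headers={'Cookie': load_cookie_header()})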
Web information extraction... coming in the next installment.