Web crawling: reading the network resource specified by a URL from the network stream and saving it locally.
In Python, urllib2 is used to crawl web pages. It provides a very simple interface in the form of the urlopen function:

urlopen(url, data, timeout)
  url: the URL to access
  data: the data to be transferred when the URL is accessed
  timeout: the timeout setting
import urllib2
response = urllib2.urlopen('http://www.hao123.com')  # call the urlopen method of the urllib2 library, passing in a URL
html = response.read()  # the response object has a read method that returns the page content obtained
print(html)
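The data and timeout parameters from the signature above do not appear in that snippet. A minimal sketch of how they are passed, written against Python 3's urllib (where urllib2's functionality now lives); the form field names here are made up for illustration, and www.hao123.com is just the example host from these notes:

```python
from urllib.parse import urlencode
from urllib.request import urlopen
from urllib.error import URLError

# the data argument must be url-encoded bytes; field names are illustrative
data = urlencode({'q': 'python', 'page': 1}).encode('ascii')
print(data)  # b'q=python&page=1'

# supplying data makes the request a POST; timeout is in seconds
try:
    response = urlopen('http://www.hao123.com', data=data, timeout=5)
    print(response.status)
except URLError as err:
    # covers DNS failures, refused connections, and timeouts
    print('request failed:', err.reason)
```

The try/except is worth keeping in real crawling code: network errors are routine, and urlopen signals them with URLError rather than returning an empty response.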
urllib2 uses a Request object to represent the HTTP request you are making.
In its simplest form, you create a Request object with the address you want to fetch.
Calling urlopen and passing in the Request object returns a response object for that request.
The response object behaves like a file object, so you can call read() on it.
import urllib2
req = urllib2.Request('http://www.hao123.com')
response = urllib2.urlopen(req)  # the returned information is stored in the response object
the_page = response.read()
print(the_page)
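To see that a Request object really does model the HTTP request, you can inspect one without touching the network. A small sketch using Python 3's urllib.request, where urllib2's Request class now lives; www.hao123.com is again the example host from these notes:

```python
from urllib.request import Request

# with no data the request is a GET; supplying body bytes makes it a POST
get_req = Request('http://www.hao123.com')
post_req = Request('http://www.hao123.com', data=b'q=python')

print(get_req.full_url)       # http://www.hao123.com
print(get_req.get_method())   # GET
print(post_req.get_method())  # POST
```

Because the Request is a plain object, you can also attach headers to it before calling urlopen, which is how crawlers typically set a User-Agent.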
Python Notes-Crawler 2