urlopen
    import urllib2

    # Send a request to the specified URL and return the server's response as a file-like object
    response = urllib2.urlopen("http://www.baidu.com/")

    # The file-like object supports read(), which returns the whole response body as a string
    html = response.read()

    print html
The code above fetches the same content you would see by choosing "View Source" on the Baidu homepage. It is about as simple as a request can be.
Request
In the previous example, the argument to urlopen() was a URL string.
However, if you need to do something more complex, such as adding HTTP headers, you must create a Request instance and pass it to urlopen(). The URL you want to access is then passed as an argument to the Request constructor.
    # -*- coding: utf-8 -*-
    import urllib2

    url = "http://www.baidu.com/"

    # Pass the URL to Request() to construct a Request object
    request = urllib2.Request(url)

    # Pass the Request object to urlopen(), which sends it to the server and returns the response
    response = urllib2.urlopen(request)

    html = response.read()

    print html
The output is the same as in the previous example.
When you create a Request instance, you can set two more parameters in addition to the URL:
data (default None): the data submitted along with the request (for example, the body of a POST). When data is supplied, the HTTP request method changes from "GET" to "POST".
headers (default: an empty dictionary): a dictionary of key-value pairs for the HTTP headers to send.
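The effect of the data parameter can be checked without sending anything over the network. The following is a minimal sketch, not part of the original examples: the form field `key`/`value` is hypothetical, and the import fallback is an assumption added so the same code also runs on Python 3, where urllib2 became urllib.request.

```python
# Use urllib2 on Python 2 (as in this article), urllib.request on Python 3.
try:
    import urllib2 as urlrequest
    from urllib import urlencode
except ImportError:
    import urllib.request as urlrequest
    from urllib.parse import urlencode

url = "http://www.baidu.com/"  # same URL as the other examples

# Hypothetical form data, used only to illustrate the data parameter
form = {"key": "value"}
data = urlencode(form).encode("utf-8")  # the body must be bytes on Python 3

headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}

# No data: the request will be sent as GET
get_request = urlrequest.Request(url, headers=headers)

# With data: the request will be sent as POST
post_request = urlrequest.Request(url, data=data, headers=headers)

# Nothing is sent here; get_method() only reports which method would be used
print(get_request.get_method())
print(post_request.get_method())
```

Passing either object to urlopen() would send the request. The sketch stops short of that so it can run offline.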
User-Agent
A crawler needs to disguise itself as a well-known browser, which it does by sending that browser's User-Agent header.
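To see why the disguise is needed, it helps to look at what the library sends when no User-Agent is set. The sketch below is an illustration rather than part of the original examples. It reads the default headers from an opener created with build_opener(), on the assumption that urlopen() uses an opener with the same defaults. The import fallback is only there so the code also runs on Python 3.

```python
# Use urllib2 on Python 2 (as in this article), urllib.request on Python 3.
try:
    import urllib2 as urlrequest
except ImportError:
    import urllib.request as urlrequest

# An opener carries the headers that are added to every request by default
opener = urlrequest.build_opener()
default_headers = dict(opener.addheaders)

# The default value identifies the client as a Python script, not a browser
default_user_agent = default_headers["User-agent"]
print(default_user_agent)
```

A server that filters on User-Agent can recognize this value easily, which is why the following examples replace it.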
    # -*- coding: utf-8 -*-
    import urllib2

    url = "http://www.baidu.com/"
    headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}

    # Pass the URL and headers together to Request() to construct a Request object
    request = urllib2.Request(url, headers=headers)

    # Pass the Request object to urlopen(), which sends it to the server and returns the response
    response = urllib2.urlopen(request)

    html = response.read()

    print html
Adding more header information
    # -*- coding: utf-8 -*-
    import urllib2

    url = "http://www.baidu.com/"
    headers = {"User-Agent": "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"}

    # Pass the URL and headers together to Request() to construct a Request object
    request = urllib2.Request(url, headers=headers)

    # Call request.add_header() to add or modify a specific header
    request.add_header("Connection", "keep-alive")

    # Pass the Request object to urlopen(), which sends it to the server and returns the response
    response = urllib2.urlopen(request)

    html = response.read()

    print html
Randomly adding/modifying the User-Agent
    import urllib2
    import random

    url = "http://www.itcast.cn"

    ua_list = [
        "Mozilla/5.0 (Windows NT 6.1;) Apple ....",
        "Mozilla/5.0 (X11; CrOS i686 2268.111.0) ...",
        "Mozilla/5.0 (Macintosh; U PPC Mac OS X ....",
        "Mozilla/5.0 (Macintosh; Intel Mac OS ..."
    ]

    # Pick one User-Agent string at random for this request
    user_agent = random.choice(ua_list)

    request = urllib2.Request(url)

    # You can also add or modify a specific header by calling request.add_header()
    request.add_header("User-Agent", user_agent)

    # When reading a header back, the name has its first letter uppercase and all the rest lowercase
    request.get_header("User-agent")

    response = urllib2.urlopen(request)

    html = response.read()
    print html
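The "first letter uppercase, the rest lowercase" rule can be checked offline, since a Request object stores each header name in capitalized form as soon as the header is added. The following is a minimal sketch under stated assumptions: the two User-Agent strings are hypothetical placeholders rather than real browser strings, and the import fallback is only there so the code also runs on Python 3.

```python
import random

# Use urllib2 on Python 2 (as in this article), urllib.request on Python 3.
try:
    import urllib2 as urlrequest
except ImportError:
    import urllib.request as urlrequest

# Hypothetical placeholder values; a real crawler would list full browser strings
ua_list = [
    "ExampleBrowser/1.0",
    "ExampleBrowser/2.0",
]

request = urlrequest.Request("http://www.itcast.cn")

# The header name is normalized on the way in, whatever case is used here
request.add_header("user-agent", random.choice(ua_list))

# Lookups are exact-match, so only the capitalized form finds the header
found = request.get_header("User-agent")
missing_lower = request.get_header("user-agent")
missing_title = request.get_header("User-Agent")

print(found)          # one of the strings in ua_list
print(missing_lower)  # None
print(missing_title)  # None

# Adding the same header again, in any case, replaces the stored value
request.add_header("USER-AGENT", "ExampleBrowser/3.0")
print(request.get_header("User-agent"))
```

So the case used when adding a header does not matter, but reading it back with get_header() only works with the "User-agent" spelling.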
Basic use of urllib2