I. The most basic usage
import urllib2
url = r'http://www.baidu.com'
html = urllib2.urlopen(url).read()
print html
The client and server communicate through requests and responses: the client first sends a request to the server, then receives the response the server returns.
urllib2 provides a Request class that lets you construct a Request object before sending it, and then send the request with urllib2.urlopen.
import urllib2
url = r'http://www.baidu.com'
req = urllib2.Request(url)
html = urllib2.urlopen(req).read()
print html
The example above uses
req = urllib2.Request(url)
to instantiate a Request object, and then uses urllib2.urlopen(req) to open the page.
Notice that when the Request object is instantiated, the url argument is required, while several other parameters have default values. Among them, data and headers are used most often; sites that require logging in before browsing usually need these two parameters.
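This tutorial targets Python 2's urllib2; on Python 3 the same Request class lives in urllib.request. As a minimal sketch of passing data and headers (the URL and header value below are placeholders, and no request is actually sent):

```python
# Python 3 sketch: urllib2.Request became urllib.request.Request.
# The URL and header values are illustrative placeholders only.
from urllib.parse import urlencode
from urllib.request import Request

values = {'name': 'Michael Foord', 'location': 'Northampton'}
data = urlencode(values).encode('utf-8')  # in Python 3, POST data must be bytes

req = Request('http://www.example.com/',
              data=data,
              headers={'User-Agent': 'Mozilla/5.0'})

# Supplying data switches the request method from GET to POST.
print(req.get_method())   # POST
print(req.full_url)       # http://www.example.com/
```

Nothing is opened here; the Request object is just a container for the URL, data, and headers until it is handed to urlopen.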
import urllib
import urllib2
url = 'http://www.baidu.com/'
values = {'name': 'Michael Foord', 'location': 'Northampton', 'language': 'Python'}
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
print the_page
This example POSTs some data to Baidu. It returns an error page, which is perfectly normal: visiting Baidu does not require POSTing anything, and when we POST data anyway, Baidu cannot find a corresponding page and returns an error.
That was POSTing data; the GET method works too, with a small change to the code above.
Baidu performs queries through http://www.baidu.com/s?wd=XXX, so we need to urlencode the dictionary {'wd': 'XXX'}.
#coding: utf-8
import urllib
import urllib2
url = 'http://www.baidu.com/s'
values = {'wd': 'Yang Yanxing'}
data = urllib.urlencode(values)
print data
url2 = url + '?' + data
response = urllib2.urlopen(url2)
the_page = response.read()
print the_page
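The query-string construction above relies only on urlencode, which survives in Python 3 as urllib.parse.urlencode. A small offline sketch (the search term is arbitrary):

```python
# Python 3 sketch of building the same ?wd=... query string.
from urllib.parse import urlencode

params = urlencode({'wd': 'python'})        # -> 'wd=python'
url = 'http://www.baidu.com/s' + '?' + params
print(url)  # http://www.baidu.com/s?wd=python
```

urlencode also percent-encodes non-ASCII values, so the dictionary can safely contain a search term like the one in the example above.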
The following mocks logging in to Renren and then displays the home page, as a detailed example of using cookies. First is the example given in the documentation; we will modify it to do what we want.
import cookielib, urllib2
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
#coding: utf-8
import urllib2, urllib
import cookielib
url = r'http://www.renren.com/ajaxLogin'
# create a cookie container cj
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# encode the data to be POSTed (email and password hold your login credentials)
data = urllib.urlencode({"email": email, "password": password})
r = opener.open(url, data)
print cj
When you can see cookies in cj, you have accessed the login page. Whether you are actually logged in you cannot tell yet; visit http://www.renren.com/home to check.
Two points in the code above deserve explanation; it also took me a while to understand them.
Why open the URL with the opener object instead of urllib2.urlopen? It is not that urllib2.urlopen cannot be used here; with a small change it can. The opener is created by urllib2.build_opener, but you can think of it this way: it has been built, yet not installed for use, so urllib2 itself does not have its properties and methods. If you want urllib2 to use this opener as well, call urllib2.install_opener(opener) to "install" it; after installation you can operate through urllib2 directly.
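The same build/install pattern can be sketched offline in Python 3, where cookielib became http.cookiejar and the opener functions live in urllib.request (no request is actually sent here):

```python
# Python 3 sketch: cookielib became http.cookiejar; build_opener and
# install_opener live in urllib.request. No network request is made here.
import http.cookiejar
import urllib.request

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# After install_opener, plain urllib.request.urlopen calls go through
# this opener, so its CookieJar is used automatically.
urllib.request.install_opener(opener)

# The cookie processor is now part of the opener's handler chain:
handlers = [type(h).__name__ for h in opener.handlers]
print('HTTPCookieProcessor' in handlers)  # True
```

This is exactly the distinction the paragraph above makes: build_opener creates the opener, and install_opener is what makes the module-level urlopen use it.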
#coding: utf-8
import urllib2, urllib
import cookielib
url = r'http://www.renren.com/ajaxLogin'
# create a cookie container cj
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
# encode the data to be POSTed (email and password hold your login credentials)
data = urllib.urlencode({"email": email, "password": password})
# r = opener.open(url, data)  # without the urllib2.install_opener call above, you would have to write this instead
r = urllib2.urlopen(url, data)
html = urllib2.urlopen('http://www.renren.com/home').read()
print html
Similarly, urllib2 also has proxy-related handlers; the basic idea is much the same.
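As a sketch of that idea (the proxy address below is a made-up placeholder and nothing is opened), Python 3's ProxyHandler slots into build_opener exactly like the cookie processor:

```python
# Python 3 sketch: a ProxyHandler is installed the same way as the cookie
# processor. The proxy address is an illustrative placeholder.
import urllib.request

proxies = {'http': 'http://127.0.0.1:8080'}
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))

# The handler is now part of the opener's chain; opener.open(url) would
# route http:// requests through the configured proxy.
print(any(type(h).__name__ == 'ProxyHandler' for h in opener.handlers))  # True
```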
II. Exception Handling
A URLError exception is raised when urlopen() cannot handle the response. HTTPError is a subclass of URLError that is raised only when an HTTP URL is accessed.
1. URLError
URLError is typically raised when there is no network connection (no route to the target server) or the target server does not exist. In this case, the exception object has a reason attribute (a tuple of the error code and the error message).
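The reason attribute can be inspected without touching the network, since Python 3's urllib.error.URLError (the successor of urllib2.URLError) simply stores whatever reason it is given. The message below is invented for illustration:

```python
# Python 3 sketch: constructing a URLError by hand to show its attributes.
from urllib.error import URLError

e = URLError('no route to host')   # illustrative reason string
print(e.reason)   # no route to host
```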
#!/usr/bin/env python
#coding=utf-8
import urllib2
url = "http://www.baidu.com/"
try:
    response = urllib2.urlopen(url)
except urllib2.URLError, e:
    print e.reason
2. HTTPError
Every HTTP response returned from the server carries a status code. Some status codes indicate that the server could not fulfill the request. The default handlers can deal with some of them for us (for example, when the response is a redirect, urllib2 automatically fetches the redirected page for us). Others the urllib2 module cannot handle, and then urlopen raises an HTTPError, typically for codes such as 404 or 401.
An instance of HTTPError has an integer code attribute, which is the error status code returned by the server.
The default handlers of the urllib2 module handle redirects (status codes in the 300 range), and status codes in the 100-299 range indicate success. Therefore, the status codes that raise HTTPError fall in the 400-599 range.
When an error occurs, the server returns an HTTP error code and an error page. You can use the HTTPError instance as the returned page, which means the HTTPError instance has not only the code attribute, but also read, geturl, and info methods.
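Both claims, that HTTPError subclasses URLError and that it behaves like a response page with code and read, can be checked offline with Python 3's urllib.error. All field values below are invented for illustration:

```python
# Python 3 sketch: HTTPError subclasses URLError and acts like a response.
import io
from urllib.error import HTTPError, URLError

print(issubclass(HTTPError, URLError))   # True

# Build an HTTPError by hand: HTTPError(url, code, msg, hdrs, fp).
body = io.BytesIO(b'<html>not found</html>')
e = HTTPError('http://example.com/x', 404, 'Not Found', None, body)

print(e.code)      # 404
print(e.reason)    # Not Found
print(e.read())    # b'<html>not found</html>'
```

In real use you would never construct one yourself; urlopen raises it for you, and the same attributes are available in the except block.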
#!/usr/bin/env python
#coding=utf-8
import urllib2
url = "http://cs.scu.edu.cn/~duanlei"
try:
    response = urllib2.urlopen(url)
except urllib2.HTTPError, e:
    print e.code
    print e.read()
3. Summary
If you want to handle both URLError and HTTPError in your code, there are two approaches, as follows:
#!/usr/bin/env python
#coding=utf-8
import urllib2
url = "xxxxxx"  # the URL to visit
try:
    response = urllib2.urlopen(url)
except urllib2.HTTPError, e:  # HTTPError must come before URLError
    print "The server couldn't fulfill the request"
    print "Error code:", e.code
    print "Return content:", e.read()
except urllib2.URLError, e:
    print "Failed to reach the server"
    print "The reason:", e.reason
else:
    pass  # nothing went wrong; handle other cases here
#!/usr/bin/env python
#coding=utf-8
import urllib2
url = "http://xxx"  # the URL to visit
try:
    response = urllib2.urlopen(url)
except urllib2.URLError, e:
    if hasattr(e, "reason"):
        print "Failed to reach the server"
        print "Reason:", e.reason
    elif hasattr(e, "code"):
        print "The server couldn't fulfill the request"
        print "Error code:", e.code
        print "Return content:", e.read()
else:
    pass  # nothing went wrong; handle other cases here
By comparison, the second approach to exception handling is preferable.