Learning Python's Web Modules: urllib2

Source: Internet
Author: User

This post continues the series on Python's web modules with urllib2, a module that grew out of urllib and offers a higher-level interface than urllib.


1 Introduction to urllib2

urllib2 is a library bundled with Python for accessing web pages and local files.

Compared with urllib, the notable differences are:

1) urllib2 can accept an instance of the Request class, which lets you set the headers of a URL request. urllib accepts only a URL. This means you cannot disguise the User-Agent string when using urllib.

2) urllib provides the urlencode function to encode the data being sent, and urllib2 does not. This is why urllib is often used together with urllib2.
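The division of labour described above can be sketched as follows. This is a minimal illustration, not part of the original post: the URL and User-Agent string are placeholders, no request is actually sent, and the import shim is an assumption added so the sketch also runs on Python 3, where urllib2 was merged into urllib.request.

```python
# Compatibility shim (an assumption of this sketch): on Python 3 the same
# names live in urllib.parse and urllib.request.
try:
    from urllib import urlencode            # Python 2
    import urllib2
except ImportError:
    from urllib.parse import urlencode      # Python 3
    import urllib.request as urllib2

# urllib's job: encode form fields. A list of pairs keeps the order fixed.
fields = [('name', '51cto'), ('location', '51cto')]
query = urlencode(fields)

# urllib2's job: carry headers such as User-Agent on a Request object.
# The URL and agent string are placeholders; nothing is sent here.
req = urllib2.Request(
    'http://www.example.com/form',
    headers={'User-Agent': 'Mozilla/5.0 (compatible; example)'},
)

print(query)
print(req.get_header('User-agent'))  # header names are stored capitalized
```

Passing a list of pairs rather than a dict to urlencode is only a convenience to make the field order predictable.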


2 Common urllib2 methods

2.1 urllib2.urlopen

urlopen() is the simplest way to make a request: it opens a URL and returns a file-like object, which you then use to read the returned content.

urllib2.urlopen(url[, data][, timeout])

Parameters:

    • url: either a string containing a URL or an instance of the urllib2.Request class.

    • data: the data to send, already encoded (typically with urllib.urlencode()). Without the data parameter the request is a GET; when data is supplied the request is a POST.

    • timeout: an optional timeout in seconds for blocking operations on the request. If it is not set, the global default timeout is used. It only takes effect for HTTP, HTTPS, and FTP connections.
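How the data parameter switches the request from GET to POST can be checked without touching the network by building Request objects and asking for their method. This is a small sketch under assumptions: the URL and the query field are placeholders, and the import shim is only there so the code also runs on Python 3.

```python
try:
    from urllib import urlencode            # Python 2
    import urllib2
except ImportError:
    from urllib.parse import urlencode      # Python 3
    import urllib.request as urllib2

url = 'http://www.example.com/search'       # placeholder URL

# No data: the request is a GET.
get_req = urllib2.Request(url)

# With data: the request becomes a POST. Encoding to bytes is required on
# Python 3 and harmless on Python 2.
payload = urlencode([('q', 'python')]).encode('ascii')
post_req = urllib2.Request(url, payload)

print(get_req.get_method())    # GET
print(post_req.get_method())   # POST

# Either object can be handed to urlopen(); timeout is given in seconds:
# response = urllib2.urlopen(post_req, timeout=10)
```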

Suppose urlopen() returns the file-like object u. It supports the following common methods:

    • u.read([nbytes]) reads nbytes of data as a byte string

    • u.readline() reads a single line of text as a byte string

    • u.readlines() reads all input lines and returns them as a list

    • u.close() closes the connection

    • u.getcode() returns the HTTP response code as an integer, such as 200 on success or 404 when the file is not found

    • u.geturl() returns the actual URL of the data retrieved, taking any redirects into account

    • u.info() returns a mapping object with meta-information about the URL. For HTTP, it contains the HTTP headers of the server response. For FTP, the returned headers contain 'Content-length'. For local files, the returned headers contain the 'Content-length' and 'Content-type' fields.

Note:

The file-like object u operates in binary mode. If you need to process the response data as text, decode it using the codecs module or a similar method.
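The file-like interface and the decoding step can be tried offline by opening a local file through a file: URL. The sketch below makes several assumptions that are not in the original post: the temporary file stands in for a remote page, UTF-8 is the assumed encoding, and the import shim is added so the code also runs on Python 3.

```python
import codecs
import os
import tempfile

try:
    import urllib2                                   # Python 2
    from urllib import pathname2url
except ImportError:
    import urllib.request as urllib2                 # Python 3
    from urllib.request import pathname2url

# A small UTF-8 file stands in for a remote page (assumption of this sketch).
raw = u'first line\nsecond line: \u00e9\n'.encode('utf-8')
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as f:
    f.write(raw)

u = urllib2.urlopen('file:' + pathname2url(path))
try:
    first = u.readline()                   # one line, as a byte string
    rest = u.read()                        # everything that is left
    length = u.info()['Content-length']    # header reported for local files
finally:
    u.close()
    os.remove(path)

# The object works in binary mode, so decode before treating it as text.
text = codecs.decode(first + rest, 'utf-8')
print(text)
print(length)
```

For a real HTTP response the encoding would normally come from the Content-Type header or the page itself rather than being assumed.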

Example:

>>> import urllib2
>>> res = urllib2.urlopen('http://www.51cto.com')
>>> res.read()
... (the page source)
>>> res = urllib2.urlopen('http://www.51cto.com')  # reopen: read() consumed the body
>>> res.readline()
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\r\n'
>>> res.readlines()
... (the page source as a list of lines)
>>> res.info()

2.2 urllib2.Request

Creates a new Request instance.

Request(url[, data][, headers][, origin_req_host][, unverifiable])

For a relatively simple request, the url parameter of urlopen() can be a plain URL string. For more complex operations, such as modifying the HTTP headers, create a Request instance and pass it as the url parameter.

Parameters:

    • url: the URL, given as a string.

    • data: the data submitted along with the URL (such as the data to POST). Note that providing the data parameter changes the HTTP request from 'GET' to 'POST'.

    • headers: a dictionary of key-value mappings representing the HTTP headers (that is, what is included in the headers to be submitted).

    • origin_req_host: typically the name of the host from which the request originates.

    • unverifiable: set to True if the request is for a URL that cannot be verified (usually a URL that was not entered directly by the user, such as the URL of an image embedded in a page).

Assuming a Request instance r, the following are some of its more important methods:

    • r.add_data(data) adds data to the request. If the request is an HTTP request, the method changes to 'POST'. The data is submitted to the specified URL. Note that this method does not append to any data set previously; it replaces the previous data with the current data.

    • r.add_header(key, val) adds header information to the request. key is the header name and val is the header value; both parameters are strings.

    • r.add_unredirected_header(key, val) does the same as the above, but the header is not added to redirected requests.

    • r.set_proxy(host, type) prepares the request for connecting through a proxy server. It replaces the original host with host and the original request type with type.
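The header and proxy methods listed above can be exercised on a Request object without sending anything. This is a sketch under assumptions: the URL, header names, token value, and proxy address are all made up for illustration, and the import shim is only there so the code also runs on Python 3. add_data is left out because it is not available in Python 3's urllib.request.

```python
try:
    import urllib2                        # Python 2
except ImportError:
    import urllib.request as urllib2      # Python 3

req = urllib2.Request('http://www.example.com/')    # placeholder URL

# add_header(): the header is kept if the request is redirected.
req.add_header('Accept-Language', 'en')

# add_unredirected_header(): the header goes only with the original request.
req.add_unredirected_header('X-Example-Token', 'abc123')   # hypothetical header

# Header names are stored capitalized ("Accept-language").
print(req.has_header('Accept-language'))
print(req.get_header('X-example-token'))

# set_proxy(): the connection will go to the proxy host instead.
req.set_proxy('proxy.example.com:3128', 'http')     # hypothetical proxy
print(req.host)
```

Because the two kinds of headers are stored separately, a header such as an authentication token can be kept from leaking to whatever host a redirect points at.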

Example:

1 Submitting data to a web page:

>>> import urllib
>>> import urllib2
>>> url = 'http://www.51cto.com'
>>> info = {'name': '51cto', 'location': '51cto'}
>>> # info must be encoded into a format urllib2 understands; urllib does that here
>>> data = urllib.urlencode(info)
>>> data
'name=51cto&location=51cto'
>>> request = urllib2.Request(url, data)
>>> response = urllib2.urlopen(request)
>>> the_page = response.read()

2 Modifying the request headers:

Sometimes the program is correct, but the server still denies access. Why? The problem lies in the headers of the request. Some servers are picky and do not like being accessed by programs. In that case you need to disguise your program as a browser when making the request. How a request identifies itself is carried in its headers.


When using a REST interface, the server checks the Content-Type field to determine how the content of the HTTP body should be parsed.


>>> import urllib
>>> import urllib2
>>> url = 'http://www.51cto.com'
>>> # put user_agent into the header information
>>> user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
>>> values = {'name': '51cto', 'location': '51cto', 'language': 'Python'}
>>> headers = {'User-Agent': user_agent}
>>> data = urllib.urlencode(values)
>>> req = urllib2.Request(url, data, headers)
>>> response = urllib2.urlopen(req)
>>> the_page = response.read()
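For the REST case mentioned above, the same Request mechanism can carry a JSON body together with a matching Content-Type header. The following is only a sketch: the endpoint URL and the payload fields are hypothetical, nothing is sent, and the import shim is an assumption so the code also runs on Python 3.

```python
import json

try:
    import urllib2                        # Python 2
except ImportError:
    import urllib.request as urllib2      # Python 3

# Serialize the payload as JSON instead of form-encoding it.
body = json.dumps({'name': '51cto', 'language': 'Python'}).encode('utf-8')

req = urllib2.Request(
    'http://api.example.com/items',       # hypothetical endpoint
    data=body,
    # Tells the server how to parse the HTTP body.
    headers={'Content-Type': 'application/json'},
)

print(req.get_method())                   # POST, because data was supplied
print(req.get_header('Content-type'))
```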

2.3 Exception handling

urlopen raises URLError when it cannot handle a response.


There are two exception classes, urllib2.URLError and urllib2.HTTPError:

HTTPError is a subclass of URLError that is raised in specific cases for HTTP URLs.


URLError:
Usually, URLError is raised because there is no network connection (no route to the specified server) or the specified server does not exist. In this case, the exception raised has a reason attribute, a tuple containing an error code and a text error message.


#!/usr/bin/env python
# -*- coding: utf-8 -*-
import urllib2

# The host name has one extra "m" (.comm), so the lookup fails
req = urllib2.Request('http://www.51cto.comm')
try:
    urllib2.urlopen(req)
except urllib2.URLError as e:
    print e
    print e.reason

Results:


<urlopen error [Errno 11004] getaddrinfo failed>
[Errno 11004] getaddrinfo failed
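Because HTTPError is a subclass of URLError, a handler that wants to tell the two apart has to catch HTTPError first. The helper below is a hypothetical sketch, not something from the original post: the name fetch, its return shape, and the file: URL used to trigger a failure offline are all assumptions, and the import shim is only there so the code also runs on Python 3.

```python
try:
    import urllib2                        # Python 2
except ImportError:
    import urllib.request as urllib2      # Python 3


def fetch(url, timeout=5):
    """Return (ok, detail) instead of letting urllib2 exceptions escape."""
    try:
        response = urllib2.urlopen(url, timeout=timeout)
    except urllib2.HTTPError as e:
        # The server answered, but with an error status such as 404.
        # This clause must come first: HTTPError is also a URLError.
        return False, 'HTTP error %d' % e.code
    except urllib2.URLError as e:
        # No usable answer at all: no network, unknown host, missing file.
        return False, 'could not reach the server: %s' % (e.reason,)
    try:
        return True, response.read()
    finally:
        response.close()


# A file: URL that does not exist fails without needing a network.
print(fetch('file:///no-such-directory/missing.txt'))
```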


The above covers the basic usage of urllib2. If you want to dig deeper:

http://zhuoqiang.me/python-urllib2-usage.html

The difference between urllib and urllib2:

http://www.cnblogs.com/yuxc/archive/2011/08/01/2124073.html



This article is from the "a struggling small operation" blog. Please be sure to keep this source attribution: http://yucanghai.blog.51cto.com/5260262/1697135
