Python Crawler requests Module


Seven main methods of the requests library

1. requests.request(method, url, **kwargs)

Constructs a request; it is the base method that underpins all the methods below.

method: the request method, one of seven kinds such as GET, PUT, POST, and so on;

url: the URL of the page to fetch;

**kwargs: optional parameters that control access, 13 in total;

method: the request method

GET: requests the resource at the URL location;

HEAD: requests only the header information of the resource;

POST: appends new data to the resource at the URL location;

PUT: stores a resource at the URL location, overwriting the resource originally there;

PATCH: requests a local update of the resource at the URL location, changing part of its content;

DELETE: requests deletion of the resource stored at the URL location;

**kwargs: optional parameters that control access (a combined example follows this list)

params: dictionary or byte sequence, appended to the URL as query parameters;

data: dictionary, byte sequence, or file object, sent as the body of the request;

json: data in JSON format, sent as the body of the request;

headers: dictionary, custom HTTP headers;

cookies: dictionary or CookieJar, cookies sent with the request;

auth: tuple, enables HTTP authentication;

files: dictionary, for transferring files;

timeout: sets the timeout, in seconds;

proxies: dictionary, sets proxy servers for access and can include login authentication;

allow_redirects: True/False, default True; switch for following redirects;

stream: True/False, default False; when False the response body is downloaded immediately, when True the download is deferred until the content is accessed;

verify: True/False, default True; switch for verifying the SSL certificate;

cert: path to a local SSL client certificate;
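
A minimal sketch combining several of these parameters in one call; the httpbin.org endpoint is used purely for illustration and is not part of the original text:

import requests

# one GET request exercising several of the control parameters above
r = requests.request(
    "GET",
    "https://httpbin.org/get",               # demo endpoint, illustrative only
    params={"q": "python"},                   # appended to the URL as ?q=python
    headers={"User-Agent": "my-crawler/1.0"},
    cookies={"session": "abc123"},
    timeout=10,                               # seconds
    allow_redirects=True,
    verify=True,                              # verify the SSL certificate (default)
)
print(r.url)          # https://httpbin.org/get?q=python
print(r.status_code)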

2. requests.get(url, params=None, **kwargs)

The main method for fetching an HTML page, corresponding to HTTP GET;

params: extra parameters appended to the URL, in dictionary or byte-stream format, optional;

**kwargs: 12 parameters that control access;
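
A quick sketch (the httpbin.org endpoint is again illustrative):

import requests

# params is encoded into the query string: .../get?key=value
r = requests.get("https://httpbin.org/get", params={"key": "value"}, timeout=10)
print(r.url)
print(r.status_code)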

3. requests.head(url, **kwargs)

Method for fetching the header information of an HTML page, corresponding to HTTP HEAD;

**kwargs: 13 parameters that control access;
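
A short sketch against the same illustrative endpoint:

import requests

# HEAD retrieves only the headers; the response carries no body
r = requests.head("https://httpbin.org/get", timeout=10)
print(r.headers.get("Content-Type"))
print(repr(r.text))   # '' -- empty, since a HEAD response has no body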

4. requests.post(url, data=None, json=None, **kwargs)

Method for submitting a POST request to an HTML page, corresponding to HTTP POST;

data: dictionary, byte sequence, or file, sent as the body of the request;

json: data in JSON format, sent as the body of the request;

**kwargs: 11 parameters that control access;
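
A sketch of both body styles (illustrative endpoint):

import requests

# data= sends the dictionary as form-encoded fields
r = requests.post("https://httpbin.org/post", data={"key": "value"}, timeout=10)

# json= serializes the dictionary to JSON and sets the Content-Type header
r = requests.post("https://httpbin.org/post", json={"key": "value"}, timeout=10)
print(r.status_code)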

5. requests.put(url, data=None, **kwargs)

Method for submitting a PUT request to an HTML page, corresponding to HTTP PUT;

data: dictionary, byte sequence, or file, sent as the body of the request;

**kwargs: 12 parameters that control access;
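
A sketch (illustrative endpoint):

import requests

# PUT stores data at the URL, replacing whatever resource was there
r = requests.put("https://httpbin.org/put", data={"key": "new value"}, timeout=10)
print(r.status_code)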

6. requests.patch(url, data=None, **kwargs)

Method for submitting a local-modification request to an HTML page, corresponding to HTTP PATCH;

data: dictionary, byte sequence, or file, sent as the body of the request;

**kwargs: 12 parameters that control access;
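
A sketch (illustrative endpoint); unlike PUT, PATCH submits only the fields that change rather than resending the whole resource:

import requests

# PATCH sends a partial update
r = requests.patch("https://httpbin.org/patch", data={"key": "updated"}, timeout=10)
print(r.status_code)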

7. requests.delete(url, **kwargs)

Method for submitting a DELETE request to an HTML page, corresponding to HTTP DELETE;

**kwargs: 13 parameters that control access;
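
A sketch (illustrative endpoint):

import requests

# DELETE asks the server to remove the resource at the URL
r = requests.delete("https://httpbin.org/delete", timeout=10)
print(r.status_code)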

Two important objects of the requests library: the Request object and the Response object. The Response object, which contains the content returned by the server, is covered below.

Properties of the Response object (a combined sketch follows the list)

1. r.status_code

The return status of the HTTP request; 200 indicates success, 404 or another code indicates failure;

2. r.text

The string form of the HTTP response content, i.e., the page content corresponding to the URL;

3. r.encoding

The encoding of the response content as guessed from the HTTP header;

if charset is not present in the header, the encoding is assumed to be ISO-8859-1; r.text displays the page content according to r.encoding;

4. r.apparent_encoding

The encoding of the response content as analyzed from the content itself (an alternative to r.encoding);

5. r.content

The binary form of the HTTP response content;
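
A short sketch showing the five properties together (any page works; www.baidu.com is also used later in this article):

import requests

r = requests.get("https://www.baidu.com", timeout=10)
print(r.status_code)               # 200 on success
print(r.encoding)                  # guessed from the header, often ISO-8859-1
print(r.apparent_encoding)         # analyzed from the content, e.g. utf-8
r.encoding = r.apparent_encoding   # switch so r.text decodes correctly
print(r.text[:200])                # first 200 characters of the page text
print(len(r.content))              # size of the raw binary body in bytes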

Exceptions of the requests library

1. requests.ConnectionError

Network connection error exception, such as a DNS query failure or a refused connection;

2. requests.HTTPError

HTTP error exception;

3. requests.URLRequired

URL missing exception;

4. requests.TooManyRedirects

The maximum number of redirects was exceeded, producing a redirect exception;

5. requests.ConnectTimeout

Timeout while connecting to the remote server;

6. requests.Timeout

The request to the URL timed out, producing a timeout exception;

7. r.raise_for_status()

If the status code is not 200, raises requests.HTTPError (see the sketch after this list);
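
A minimal sketch of catching these exceptions individually rather than with a bare except (the 404 endpoint is illustrative):

import requests

try:
    r = requests.get("https://httpbin.org/status/404", timeout=5)
    r.raise_for_status()           # status is 404, so this raises requests.HTTPError
except requests.ConnectionError:
    print("network connection failed")
except requests.Timeout:
    print("request timed out")
except requests.HTTPError as e:
    print("HTTP error:", e)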

Common code framework for crawling web pages

import requests

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()                 # if the status is not 200, raise HTTPError
        r.encoding = r.apparent_encoding     # decode using the encoding analyzed from the content
        return r.text
    except:
        return "Generate Exception"

if __name__ == "__main__":
    url = "http://www.baidu.com"             # a scheme (http://) is required, or requests raises MissingSchema
    print(getHTMLText(url))
