Requests: HTTP for Humans - Crawler Tutorial

Tags: auth, connection pooling, response code, sessions, SSL certificate

Requests supports HTTP keep-alive and connection pooling, supports using cookies to maintain sessions, supports file uploads, supports automatic decoding of response content, and supports automatic encoding of internationalized URLs and POST data.
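
File upload, for example, comes down to a single files parameter. A minimal sketch, assuming a local file report.txt exists and using httpbin.org/post (a public echo service) as a stand-in target:

import requests

# httpbin.org/post simply echoes back what it receives, handy for testing
# (assumption: a local file report.txt exists)
url = "http://httpbin.org/post"
files = {"file": open("report.txt", "rb")}

# Requests builds the multipart/form-data body automatically
response = requests.post(url, files=files)
print(response.text)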

Requests' documentation is very complete, and the Chinese documentation is quite good as well. Requests can fully meet the needs of today's web, supports Python 2.6-3.5, and runs perfectly under PyPy.

Open Source Address: https://github.com/kennethreitz/requests

Chinese API documentation: http://docs.python-requests.org/zh_cn/latest/index.html

Installation

Install with pip, or use easy_install:

$ pip install requests

$ easy_install requests
Basic GET requests (headers and params parameters)

1. The most basic GET request can be sent directly:

import requests

response = requests.get("http://www.baidu.com/")

# can also be written like this
# response = requests.request("get", "http://www.baidu.com/")
2. Add headers and query parameters

If you want to add headers, pass the headers parameter to set the information in the request headers. If you want to pass parameters in the URL, use the params parameter.

import requests

kw = {'wd': '长城'}  # "Great Wall"

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

# params accepts a dictionary or a string of query parameters;
# a dictionary is converted to URL encoding automatically, so urlencode() is not needed
response = requests.get("http://www.baidu.com/s?", params=kw, headers=headers)

# View the response body; response.text returns data in Unicode format
print(response.text)

# View the response body; response.content returns byte-stream data
print(response.content)

# View the full URL
print(response.url)

# View the response character encoding
print(response.encoding)

# View the response status code
print(response.status_code)

Run results:

......

......

'http://www.baidu.com/s?wd=%E9%95%BF%E5%9F%8E'

'utf-8'

200

When using response.text, Requests automatically decodes the response content based on the HTTP response's text encoding, and most Unicode character sets are decoded seamlessly.
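
If that guess is ever wrong and response.text comes out garbled, the encoding can be overridden by hand before reading the text. A small sketch, assuming the page is actually UTF-8:

import requests

response = requests.get("http://www.baidu.com/")

# Requests guesses the encoding from the HTTP headers; if the guess is
# wrong, set it explicitly before touching response.text
# (assumption: this page really is UTF-8)
response.encoding = 'utf-8'
print(response.text)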

When using response.content, what is returned is the raw binary byte stream of the server's response, which can be used to save binary files such as images, as in the sketch below.
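
A minimal sketch of saving an image that way (the image URL is only a placeholder):

import requests

# The image URL is only a placeholder for illustration
response = requests.get("http://www.baidu.com/img/bd_logo1.png")

# response.content is the raw byte stream, so write it in binary mode
with open("logo.png", "wb") as f:
    f.write(response.content)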

Basic POST request (data parameter)

1. The most basic POST request can use the post method directly:

response = requests.post("http://www.baidu.com/", data=data)
2. Passing in data

For POST requests, we usually need to add some parameters. The most basic way to pass them is through the data parameter.

import requests

formdata = {
    "type": "AUTO",
    "i": "i love python",
    "doctype": "json",
    "xmlVersion": "1.8",
    "keyfrom": "fanyi.web",
    "ue": "UTF-8",
    "action": "FY_BY_ENTER",
    "typoResult": "true"
}

url = "http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule&smartresult=ugc&sessionFrom=null"

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"}

response = requests.post(url, data=formdata, headers=headers)

print(response.text)

# If the response is JSON, it can be displayed directly
print(response.json())

Run results:

{"type": "EN2ZH_CN", "errorCode": 0, "elapsedTime": 2, "translateResult": [[{"src": "i love python", "tgt": "我喜欢python"}]], "smartResult": {"type": 1, "entries": ["", "肆文", "高德纳"]}}

{u'errorCode': 0, u'elapsedTime': 0, u'translateResult': [[{u'src': u'i love python', u'tgt': u'\u6211\u559c\u6b22python'}]], u'smartResult': {u'type': 1, u'entries': [u'', u'\u8086\u6587', u'\u9ad8\u5fb7\u7eb3']}, u'type': u'EN2ZH_CN'}
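
Since response.json() parses the body into an ordinary Python dict, individual fields can be read directly. A small sketch, continuing from the response above and following the result structure shown in the run results:

# Continuing from the response above; the key paths follow the
# result structure shown in the run results
result = response.json()
print(result["translateResult"][0][0]["tgt"])  # the translated text
print(result["errorCode"])                     # 0 means success
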
Proxies (proxies parameter)

If you need to use a proxy, you can configure individual requests by passing the proxies parameter to any request method:

import requests

# Choose a different proxy depending on the protocol type
proxies = {
    "http": "http://12.34.56.79:9527",
    "https": "http://12.34.56.79:9527",
}

response = requests.get("http://www.baidu.com", proxies=proxies)
print(response.text)

You can also configure proxies through the local environment variables HTTP_PROXY and HTTPS_PROXY:

export HTTP_PROXY="http://12.34.56.79:9527"
export HTTPS_PROXY="https://12.34.56.79:9527"
Private proxy authentication (specific format) and web client authentication (auth parameter)

Private proxy
import requests

# If the proxy needs HTTP Basic Auth, use the following format:
proxy = {"http": "mr_mao_hacker:sffqry9r@61.158.163.130:16816"}

response = requests.get("http://www.baidu.com", proxies=proxy)

print(response.text)
Web client authentication

If web client authentication is required, add auth = (username, password):

import requests

auth = ('test', '123456')

response = requests.get('http://192.168.199.107', auth=auth)

print(response.text)

urllib2 runs off in tears... Cookies and Sessions

Cookies

If a response contains cookies, we can get them from the response's cookies attribute:

import requests

response = requests.get("http://www.baidu.com/")

# Returns a CookieJar object:
cookiejar = response.cookies

# Convert the CookieJar into a dictionary:
cookiedict = requests.utils.dict_from_cookiejar(cookiejar)

print(cookiejar)

print(cookiedict)

Run results:

<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>

{'BDORZ': '27315'}
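
Cookies can also be sent with a request: every request method accepts a cookies parameter that takes a dictionary (or a CookieJar). A minimal sketch, reusing the BDORZ cookie from the result above purely as an illustration:

import requests

# The cookie value here just mirrors the run result above, purely for illustration
cookies = {"BDORZ": "27315"}

response = requests.get("http://www.baidu.com/", cookies=cookies)
print(response.status_code)
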
Sessions

In Requests, the Session object is a very commonly used object that represents a user session: from the moment the client browser connects to the server until the client browser disconnects.

Sessions let us persist certain parameters across requests, for example keeping cookies between all requests issued by the same Session instance.

Implementing a Renren login

import requests

# 1. Create a Session object, which can hold cookie values
ssion = requests.session()

# 2. Prepare the headers
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

# 3. The username and password needed to log in
data = {"email": "mr_mao_hacker@163.com", "password": "alarmchime"}

# 4. Send a request with the username and password; the post-login cookie values are saved in ssion
ssion.post("http://www.renren.com/PLogin.do", data=data)

# 5. ssion now holds the user's post-login cookies, so it can directly access pages that require login
response = ssion.get("http://www.renren.com/410043129/profile")

# 6. Print the response content
print(response.text)

Handling HTTPS requests: SSL certificate verification

Requests can also verify SSL certificates for HTTPS requests. To check a host's SSL certificate, use the verify parameter (it can also be omitted):

import requests

response = requests.get("https://www.baidu.com/", verify=True)

# verify can also be omitted; it defaults to True
# response = requests.get("https://www.baidu.com/")
print(response.text)

Run Result:

<!DOCTYPE html>
<!--STATUS OK-->
If SSL certificate verification fails, or the server's security certificate is not trusted, an SSLError is raised; 12306's certificate is said to be self-made:

To test:

import requests

response = requests.get("https://www.12306.cn/mormhweb/")
print(response.text)

Sure enough:

SSLError: ("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",)

If we want to skip 12306's certificate verification and make the request anyway, set verify to False:

r = requests.get("https://www.12306.cn/mormhweb/", verify=False)
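
One caveat: with verify=False, recent versions of Requests emit an InsecureRequestWarning on every request. It can be silenced like this (fine for a quick test, not recommended in production):

import requests

# Silence the InsecureRequestWarning triggered by verify=False
# (acceptable for a quick test, not recommended in production)
requests.packages.urllib3.disable_warnings()

r = requests.get("https://www.12306.cn/mormhweb/", verify=False)
print(r.status_code)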
