Python crawler series (ii): Requests Basics

Source: Internet
Author: User

1. Sending a request

import requests

# Get data
# r is a Response object; it contains the content returned by the request
r = requests.get('https://github.com/timeline.json')
print(r.content)

The printed result is a bytes object:

b'{"message": "Hello there, wayfaring stranger. If you\xe2\x80\x99re reading this then you probably didn\xe2\x80\x99t see the blog post a couple of years back announcing that this API would go away: GitHub API v2: End of Life. Fear not, you should be able to get what you need from the shiny new Events API instead.", "documentation_url": "Events | GitHub Developer Guide"}'

The other four request methods in the HTTP protocol are sent the same way:

r = requests.put('http://httpbin.org/put')
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')

2. Passing URL parameters

Both of the following calls pass parameters through the URL. The parameter passed to params must be a dictionary:

import requests

payload1 = {'key1': 'value1', 'key2': 'value2'}
r1 = requests.get('http://httpbin.org/get', params=payload1)
print(r1.url)

payload2 = {'key1': 'value1', 'key2': ['value2', 'value3']}
r2 = requests.get('http://httpbin.org/get', params=payload2)
print(r2.url)

The corresponding results:

http://httpbin.org/get?key1=value1&key2=value2

http://httpbin.org/get?key1=value1&key2=value2&key2=value3

Note the difference: a list value produces one query-string entry per element.
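The same encoding can be inspected without any network traffic by building a prepared request (the URL and keys below are the ones from the example above):

```python
import requests

payload = {'key1': 'value1', 'key2': ['value2', 'value3']}
# .prepare() performs the URL encoding locally; no connection is opened
prepared = requests.Request('GET', 'http://httpbin.org/get',
                            params=payload).prepare()
print(prepared.url)  # http://httpbin.org/get?key1=value1&key2=value2&key2=value3
```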

3. Response Content

r = requests.get('https://github.com/timeline.json')
# Get the response body as text
print(r.text)
# Get the encoding used to decode the body
print(r.encoding)
# Change the encoding; the new value is used the next time r.text is read
r.encoding = 'iso-8859-1'

Note that the same bytes decode to different characters under different encodings.

To treat the content as binary, wrap the raw bytes in a BytesIO:

from io import BytesIO

i = BytesIO(r.content)

Convert the content to a JSON object:

print(r.json())

Note: a successful call to r.json() does not imply a successful response. Some servers include a JSON object in a failed response (such as the error details of an HTTP 500), and that JSON will be decoded just the same. To check whether the request itself succeeded, use r.raise_for_status() or compare r.status_code against what you expect.
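To see the difference, a Response object can be filled in by hand (this is for demonstration only; real code gets its Response back from requests.get or requests.post):

```python
import requests

# A fabricated 500 response with a JSON error body (demonstration only)
r = requests.models.Response()
r.status_code = 500
r._content = b'{"error": "server exploded"}'

# r.json() decodes the body happily even though the request failed...
body = r.json()

# ...so check the status first
failed = False
try:
    r.raise_for_status()
except requests.exceptions.HTTPError:
    failed = True
print(body, failed)
```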

Raw response content

What is the raw content? It is the byte stream read directly from the socket between the client and the server. To get it, set stream=True on the request; r.raw is then a urllib3 response object.

r = requests.get('https://github.com/timeline.json', stream=True)

# Read the first 100 bytes from the stream

r.raw.read(100)

However, if you want to save the returned data to a file, you should consume the stream like this:

with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size):
        fd.write(chunk)

That is, use r.iter_content rather than r.raw when saving to disk.
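The chunked-write loop above can be exercised without any network by substituting an in-memory stream for the response (the file, its contents, and the chunk size here are made up for the demonstration):

```python
import tempfile
from io import BytesIO

# An in-memory stream standing in for a streamed response body
stream = BytesIO(b'some,data,to,send\nanother,row,to,send\n')
chunk_size = 8

with tempfile.NamedTemporaryFile(delete=False) as fd:
    saved_path = fd.name
    # Same pattern as iter_content: write fixed-size chunks until exhausted
    for chunk in iter(lambda: stream.read(chunk_size), b''):
        fd.write(chunk)

with open(saved_path, 'rb') as f:
    print(f.read())
```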

4. Customizing request headers

url = 'https://api.github.com/some/endpoint'
headers = {'user-agent': 'my-app/0.0.1'}

# In plain terms, this just attaches extra information to the request
r = requests.get(url, headers=headers)

There are a few points to note. Custom headers have a lower priority than some more specific sources of information. For example:

If user authentication information is set in .netrc, an Authorization header set with headers= will not take effect; and if the auth= parameter is set, the .netrc settings are ignored.

If the request is redirected to another host, the Authorization header is removed.

A Proxy-Authorization header is overridden by proxy credentials provided in the URL.

When the length of the content can be determined, the Content-Length header is rewritten.

Further, requests does not change its behavior based on the specifics of a custom header; all of the header information is simply passed along in the final request.

Note: All header values must be string, bytestring, or Unicode.
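A prepared request shows the headers that will actually be sent, again without touching the network (the URL and header are the ones from the example above; note the case-insensitive lookup):

```python
import requests

req = requests.Request('GET', 'https://api.github.com/some/endpoint',
                       headers={'user-agent': 'my-app/0.0.1'}).prepare()
# Header lookup is case-insensitive
print(req.headers['User-Agent'])  # my-app/0.0.1
```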

5. More complex POST requests

import requests
import json

# Pass a list of tuples
payload1 = (('key1', 'value1'), ('key1', 'value2'))
r1 = requests.post('http://httpbin.org/post', data=payload1)
# Pass a dictionary
payload2 = {'key1': 'value1', 'key2': 'value2'}
r2 = requests.post('http://httpbin.org/post', data=payload2)
# Pass a JSON string
url1 = 'https://api.github.com/some/endpoint'
payload3 = {'some': 'data'}
r3 = requests.post(url1, data=json.dumps(payload3))
# Pass a JSON object (requests serializes it for you)
url2 = 'https://api.github.com/some/endpoint'
payload4 = {'some': 'data'}
r4 = requests.post(url2, json=payload4)
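The difference between data= and json= shows up in the prepared request body and Content-Type header; nothing is actually sent here:

```python
import requests

form = requests.Request('POST', 'http://httpbin.org/post',
                        data={'key1': 'value1'}).prepare()
print(form.body)                      # key1=value1
print(form.headers['Content-Type'])   # application/x-www-form-urlencoded

js = requests.Request('POST', 'http://httpbin.org/post',
                      json={'some': 'data'}).prepare()
print(js.body)                        # b'{"some": "data"}'
print(js.headers['Content-Type'])     # application/json
```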

6. File transfer

import requests

url = 'http://httpbin.org/post'
# files = {'file': open('report.xls', 'rb')}
# Explicitly set the filename, content type and extra headers:
# files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
# Send a string as a file
files = {'file': ('report.xls', 'some,data,to,send\nanother,row,to,send\n')}
r = requests.post(url, files=files)
print(r.text)

Note: the official documentation recommends requests-toolbelt for sending multiple files. We will cover it later.
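To see what the multipart body looks like without posting anything, a prepared request can be inspected (the filename and contents are made up):

```python
import requests

files = {'file': ('report.xls', 'some,data,to,send\nanother,row,to,send\n')}
prepared = requests.Request('POST', 'http://httpbin.org/post',
                            files=files).prepare()
# The body is a multipart/form-data payload: the filename and the file
# contents sit between generated boundary markers
print(prepared.headers['Content-Type'])           # multipart/form-data; boundary=...
print(b'filename="report.xls"' in prepared.body)  # True
```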

7. Response Status Code

r = requests.get('http://httpbin.org/get')
print(r.status_code)
# requests.codes lets you look up status codes by name
print(r.status_code == requests.codes.ok)
bad_r = requests.get('http://httpbin.org/status/404')
print(bad_r.status_code)
# When the request went wrong, raise_for_status() raises the exception explicitly
bad_r.raise_for_status()
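The requests.codes lookup itself needs no network, so the mapping can be checked directly:

```python
import requests

# requests.codes maps human-readable names to numeric status codes
print(requests.codes.ok)         # 200
print(requests.codes.not_found)  # 404
```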


8. Response header

import requests

r = requests.get('http://httpbin.org/get')
print(r.status_code)
# Get the response headers; they behave like a (case-insensitive) dictionary
print(r.headers)
print(r.headers['Content-Type'])
print(r.headers.get('Content-Type'))

9.Cookie

import requests

url = 'http://example.com/some/cookie/setting/url'
r = requests.get(url)
# Get the cookies returned with the response
r.cookies['example_cookie_name']

url = 'http://httpbin.org/cookies'
# Send your own cookies with the request; often used after a simulated login
cookies = dict(cookies_are='working')
r = requests.get(url, cookies=cookies)
r.text
# Cookies come back in a RequestsCookieJar, which behaves like a dictionary
# and can also scope cookies to a domain and path (this is not cross-site
# magic; it simply lets you reuse cookies, e.g. to skip logging in again)
jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')
url = 'http://httpbin.org/cookies'
r = requests.get(url, cookies=jar)
r.text
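The dictionary-like behavior of RequestsCookieJar can be tried entirely offline (the cookie names and values are the ones from the example above):

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

# Dictionary-style access ignores domain and path
print(jar['tasty_cookie'])  # yum
# get_dict() can filter by domain and path
print(jar.get_dict(domain='httpbin.org', path='/cookies'))
```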

10. Redirection and request history

By default, requests automatically handles all redirects except for HEAD requests. You can use the response object's history attribute to track the redirects.

What is redirection? You request address A but are automatically sent on to address B.

In the example below, the 301 that comes back represents a permanent redirect. Don't dwell on it too much; just remember it.

What needs understanding here is why the request is redirected at all when we clearly access a single address: http://github.com answers with a redirect to the canonical https://github.com/.

r.history is a list of the Response objects that were created in order to complete the request, sorted from the oldest to the most recent response.

r = requests.get('http://github.com')
print(r.url)
print(r.history)

Disable redirection:

When using GET, OPTIONS, POST, PUT, PATCH, or DELETE, you can disable redirect handling with the allow_redirects parameter:

r = requests.get('http://github.com', allow_redirects=False)
print(r.status_code)
print(r.history)

Enable redirect handling for HEAD:

r = requests.head('http://github.com', allow_redirects=True)
print(r.history)

11. Timeout

r = requests.get('http://github.com', timeout=0.001)

timeout is very useful: if you don't set one and the server never responds, the program blocks indefinitely. Note that timeout is not a time limit on the entire response download. An exception is raised if the server has not answered within timeout seconds; more precisely, if no bytes have been received on the underlying socket for timeout seconds.

12. Errors and exceptions

requests raises a ConnectionError exception when it encounters a network problem such as a DNS failure or a refused connection.

If an HTTP request returns an unsuccessful status code, Response.raise_for_status() raises an HTTPError exception.

If the request times out, a Timeout exception is raised.

If the request exceeds the configured maximum number of redirects, a TooManyRedirects exception is raised.

All exceptions that requests explicitly raises inherit from requests.exceptions.RequestException.
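The hierarchy can be confirmed offline with a few issubclass checks:

```python
from requests.exceptions import (ConnectionError, HTTPError,
                                 RequestException, Timeout, TooManyRedirects)

# Every exception requests raises on purpose derives from RequestException,
# so a single except clause can catch them all
for exc in (ConnectionError, HTTPError, Timeout, TooManyRedirects):
    print(exc.__name__, issubclass(exc, RequestException))
```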

By now we have a basic understanding of requests. Tomorrow, we will look further into requests' advanced tricks.

I just hope the company's new colleagues, Niu Mei included, will take a moment to read this carefully, run the code, and see the effect for themselves.
