The requests Library for Python Crawlers


Requests Library Introduction

requests is a third-party library for sending HTTP requests, compatible with Python 2 and Python 3.


Installation:

pip install requests

Use:

import requests

# send a request
response = requests.get(url)
response = requests.post(url)

Response Content
The value returned by a request is a Response object, which encapsulates the data the HTTP server returned.
Main properties and methods of the Response object:

response.status_code    the status code returned
response.headers        the returned header information, a dictionary-like type
response.content        the raw response data as bytes; generally used for images, audio, and video
response.encoding       the decoding format for text data; set encoding first, then read the text, to solve garbling problems
response.text           the response page source, decoded to a string
response.cookies        the cookies returned by the server
response.json()         when the result is data in JSON format, converts it to a dictionary

response = requests.get('http://www.baidu.com')
print(response.status_code)       # status code
print(response.headers)           # header information returned by the server
print(response.content)           # raw data, bytes type
print(response.content.decode())  # page source code, already decoded
print(response.text)              # page source code; if the returned Content-Type header has a charset attribute,
                                  # the text is decoded according to that charset; if there is no charset and the
                                  # type is text, ISO-8859-1 is used, so Chinese comes out garbled
response.encoding = 'utf-8'
print(response.text)              # set the decoding method to utf-8 first to solve the garbled Chinese
print(response.cookies)

Query parameters

Sending query parameters with a GET request (automatic URL concatenation)

import requests

payload = {'wd': 'python'}
response = requests.get('http://www.baidu.com/s?', params=payload)
response.encoding = 'utf-8'
print(response.text)
print(response.url)   # prints the URL of the final request: http://www.baidu.com/s?wd=python

Submitting parameters with a POST request

import requests

data = {'user': 'qqq'}   # form parameters
response = requests.post('http://httpbin.org/post', data=data)
response.encoding = 'utf-8'
print(response.text)

Timeout settings

import requests

response = requests.get('https://www.google.com', timeout=5)   # if there is no answer within 5 seconds, a Timeout error is raised; catch it with exception handling
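The comment above mentions exception handling; as a minimal sketch, the timeout can be caught with requests.exceptions.Timeout:

import requests

try:
    response = requests.get('https://www.google.com', timeout=5)
except requests.exceptions.Timeout:
    print('No answer within 5 seconds, the request timed out')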

Cookie processing

For example, after logging in to a page, save the cookies, then pass them in on subsequent requests.

import requests

data = {
    'account_name': 'Asda',
    'password': 'qwe123'
}
result = requests.post('https://qiye.163.com/login/', data=data)
if result.status_code == 200:
    cookies = result.cookies
    response = requests.get('https://qiye.163.com/News', cookies=cookies)

This way, subsequent requests carry the post-login cookies and can access data that requires login normally.

Session

Sessions are used to maintain the communication state between the client and the server.

import requests

session = requests.session()
session.get(url)   # the session object's API is basically the same as requests'; when requesting through the
                   # session, cookies are saved automatically and carried on the next request, which is convenient
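As an illustration, here is the 163.com login from the cookie section rewritten with a session (a sketch using the same assumed credentials; the session carries the cookies by itself):

import requests

session = requests.session()
data = {'account_name': 'Asda', 'password': 'qwe123'}
session.post('https://qiye.163.com/login/', data=data)   # login cookies are saved on the session
response = session.get('https://qiye.163.com/News')      # cookies are sent automatically, no cookies= argument needed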

SSL certificate verification: verify=False

requests.get('https://www.12306.com', verify=False)   # verify defaults to True; when set to False the certificate is not validated
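Requests made with verify=False emit an InsecureRequestWarning through urllib3; the GitHub login example below silences it with urllib3.disable_warnings(), and a minimal sketch of the same approach is:

import urllib3
import requests

urllib3.disable_warnings()   # suppress the InsecureRequestWarning triggered by verify=False
requests.get('https://www.12306.com', verify=False)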

Setting header information

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}
response = requests.get('https://www.zhihu.com', headers=headers)   # send the request with header information added; without it, zhihu denies access

Redirects: allow_redirects=False

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}
response = requests.get('https://www.zhihu.com', headers=headers, allow_redirects=False)   # turn off redirects
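With redirects left on (the default), response.history records the intermediate responses; a minimal sketch for inspecting where a request was redirected:

import requests

response = requests.get('http://github.com')   # http redirects to https
print(response.history)   # list of intermediate responses, e.g. [<Response [301]>]
print(response.url)       # the final URL after redirects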

Setting up a proxy

proxies = {
    'http': 'http://183.232.188.18:80',
    'https': 'https://183.232.188.18:80'
}
response = requests.get('http://www.baidu.com', proxies=proxies)   # send the request through a proxy

Converting JSON-format data

r = requests.get('http://httpbin.org/ip')
print(r.json())   # when the returned data is in JSON format, the json() method converts it to a dictionary directly
Example: Using requests to emulate GitHub login
" "idea: GitHub login needs to carry the cookies on the homepage and set up the UA in the header message, and a token parameter in the Post form is required to request the first page to get" "ImportReImportRequestsImporturllib3urllib3.disable_warnings ()#Cancel Warningdefget_params (): Start_url='Https://github.com/login' #get cookies and token parameters from the login pageResponse = Requests.get (Start_url,verify=false)#Turn off SSL authenticationcookies =Response.Cookies#print (Response.text)token = Re.findall (r'<input type= "hidden" name= "Authenticity_token" value= "(. *?)"/>', Response.text) [0]#regular take out token    returnCookies,tokendeflogin (): Post_url='https://github.com/session' #真正登录提交数据的页面Cookies,token=Get_params ()#headers inside attention to have referer, indicating is from the chain took over, anti-theft chainheaders = {        'Host':'github.com',        'Referer':'Https://github.com/login',        'user-agent':'mozilla/5.0 (Windows NT 10.0; WOW64) applewebkit/537.36 (khtml, like Gecko) chrome/65.0.3325.181 safari/537.36',        'accept-encoding':'gzip, deflate, BR',    }    #data is captured through a packet, which is the form parameter submitted at logindata = {        'Commit':' Sign In',        'UTF8':'?',        'Authenticity_token': Token,'Login':'xxxxxx',        'Password':'xxxxxxxx',} R= Requests.post (url=post_url,data=data,headers=headers,cookies=cookies,verify=False)Print(R.text)if __name__=='__main__': Login ()

Finally, search the output text for "Start a project" (the text that appears on the GitHub home page when you visit it in a browser after logging in).

Finding it indicates the login succeeded!
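As a sketch, this check can be automated at the end of login() (a hypothetical addition, not part of the original script):

# inside login(), after the POST:
if 'Start a project' in r.text:
    print('Login succeeded')
else:
    print('Login failed')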
