The requests Library for Python Crawlers


Requests Library Introduction

requests is a third-party library for sending HTTP requests, compatible with Python 2 and Python 3.


Installation:

pip install requests

Use:

import requests

# send a request
response = requests.get(url)
response = requests.post(url)

Response Content
The value returned by a request is a Response object, which encapsulates the data the HTTP server returned.
Main properties and methods of the Response object:

response.status_code    the status code returned
response.headers        the returned header information, a dictionary-like type
response.content        the raw response data as bytes; generally used for images, audio, and video
response.encoding       the decoding format for text data; set encoding first, then read the text, to solve garbling problems
response.text           the response page source, decoded to a string
response.cookies        the cookies returned by the server
response.json()         when the result is data in JSON format, converts it to a dictionary

response = requests.get('http://www.baidu.com')
print(response.status_code)       # status code
print(response.headers)           # header information returned by the server
print(response.content)           # raw data, bytes type
print(response.content.decode())  # page source code, already decoded
print(response.text)              # page source code; if the returned Content-Type header has a charset attribute,
                                  # the text is decoded according to that charset; if there is no charset and the
                                  # type is text, ISO-8859-1 is used, so Chinese comes out garbled
response.encoding = 'utf-8'
print(response.text)              # set the decoding method to utf-8 first to solve the garbled Chinese
print(response.cookies)

Query parameters

Sending query parameters with a GET request (automatic URL concatenation)

import requests

payload = {'wd': 'python'}
response = requests.get('http://www.baidu.com/s?', params=payload)
response.encoding = 'utf-8'
print(response.text)
print(response.url)   # prints the URL of the final request: http://www.baidu.com/s?wd=python

Submitting parameters with a POST request

import requests

data = {'user': 'qqq'}   # form parameters
response = requests.post('http://httpbin.org/post', data=data)
response.encoding = 'utf-8'
print(response.text)

Timeout settings

import requests

response = requests.get('https://www.google.com', timeout=5)   # if there is no answer within 5 seconds, a Timeout error is raised; catch it with exception handling
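The comment above mentions exception handling; as a minimal sketch, the timeout can be caught with requests.exceptions.Timeout:

import requests

try:
    response = requests.get('https://www.google.com', timeout=5)
except requests.exceptions.Timeout:
    print('No answer within 5 seconds, the request timed out')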

Cookie processing

For example, after logging in to a page, save the cookies, then pass them in on subsequent requests.

import requests

data = {
    'account_name': 'Asda',
    'password': 'qwe123'
}
result = requests.post('https://qiye.163.com/login/', data=data)
if result.status_code == 200:
    cookies = result.cookies
    response = requests.get('https://qiye.163.com/News', cookies=cookies)

This way, subsequent requests carry the post-login cookies and can access data that requires login normally.

Session

Sessions are used to maintain the communication state between the client and the server.

import requests

session = requests.session()
session.get(url)   # the session object's API is basically the same as requests'; when requesting through the
                   # session, cookies are saved automatically and carried on the next request, which is convenient
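As an illustration, here is the 163.com login from the cookie section rewritten with a session (a sketch using the same assumed credentials; the session carries the cookies by itself):

import requests

session = requests.session()
data = {'account_name': 'Asda', 'password': 'qwe123'}
session.post('https://qiye.163.com/login/', data=data)   # login cookies are saved on the session
response = session.get('https://qiye.163.com/News')      # cookies are sent automatically, no cookies= argument needed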

SSL certificate verification: verify=False

requests.get('https://www.12306.com', verify=False)   # verify defaults to True; when set to False the certificate is not validated
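Requests made with verify=False emit an InsecureRequestWarning through urllib3; the GitHub login example below silences it with urllib3.disable_warnings(), and a minimal sketch of the same approach is:

import urllib3
import requests

urllib3.disable_warnings()   # suppress the InsecureRequestWarning triggered by verify=False
requests.get('https://www.12306.com', verify=False)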

Setting header information

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}
response = requests.get('https://www.zhihu.com', headers=headers)   # send the request with header information added; without it, zhihu denies access

Redirects: allow_redirects=False

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}
response = requests.get('https://www.zhihu.com', headers=headers, allow_redirects=False)   # turn off redirects
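With redirects left on (the default), response.history records the intermediate responses; a minimal sketch for inspecting where a request was redirected:

import requests

response = requests.get('http://github.com')   # http redirects to https
print(response.history)   # list of intermediate responses, e.g. [<Response [301]>]
print(response.url)       # the final URL after redirects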

Setting up a proxy

proxies = {
    'http': 'http://183.232.188.18:80',
    'https': 'https://183.232.188.18:80'
}
response = requests.get('http://www.baidu.com', proxies=proxies)   # send the request through a proxy

Converting JSON-format data

r = requests.get('http://httpbin.org/ip')
print(r.json())   # when the returned data is in JSON format, the json() method converts it to a dictionary directly
Example: Using requests to emulate GitHub login
" "idea: GitHub login needs to carry the cookies on the homepage and set up the UA in the header message, and a token parameter in the Post form is required to request the first page to get" "ImportReImportRequestsImporturllib3urllib3.disable_warnings ()#Cancel Warningdefget_params (): Start_url='Https://github.com/login' #get cookies and token parameters from the login pageResponse = Requests.get (Start_url,verify=false)#Turn off SSL authenticationcookies =Response.Cookies#print (Response.text)token = Re.findall (r'<input type= "hidden" name= "Authenticity_token" value= "(. *?)"/>', Response.text) [0]#regular take out token    returnCookies,tokendeflogin (): Post_url='https://github.com/session' #真正登录提交数据的页面Cookies,token=Get_params ()#headers inside attention to have referer, indicating is from the chain took over, anti-theft chainheaders = {        'Host':'github.com',        'Referer':'Https://github.com/login',        'user-agent':'mozilla/5.0 (Windows NT 10.0; WOW64) applewebkit/537.36 (khtml, like Gecko) chrome/65.0.3325.181 safari/537.36',        'accept-encoding':'gzip, deflate, BR',    }    #data is captured through a packet, which is the form parameter submitted at logindata = {        'Commit':' Sign In',        'UTF8':'?',        'Authenticity_token': Token,'Login':'xxxxxx',        'Password':'xxxxxxxx',} R= Requests.post (url=post_url,data=data,headers=headers,cookies=cookies,verify=False)Print(R.text)if __name__=='__main__': Login ()

Finally, search the output text for "Start a project" (the text that appears on the GitHub home page when you visit it in a browser after logging in).

Finding it indicates the login succeeded!
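As a sketch, this check can be automated at the end of login() (a hypothetical addition, not part of the original script):

# inside login(), after the POST:
if 'Start a project' in r.text:
    print('Login succeeded')
else:
    print('Login failed')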
