Requests Library Introduction
Requests is a third-party library for sending HTTP requests; it is compatible with both Python 2 and Python 3.
Installation:
```
pip install requests
```
Use:
```
import requests
```
Send Request
```
response = requests.get(url)
response = requests.post(url)
```
Response Content
The value returned by a request is a Response object, which wraps the data returned by the HTTP server.
Response object main properties and methods:
| Property / Method | Description |
| --- | --- |
| response.status_code | Status code returned by the server |
| response.headers | Response headers, as a dictionary |
| response.content | Raw response body as bytes; typically used for images, audio, and video |
| response.encoding | Encoding used to decode the text; set it before reading response.text to fix garbled characters |
| response.text | Page source as a string, decoded using response.encoding |
| response.cookies | Cookies returned by the server |
| response.json() | Parses a JSON response body directly into a dictionary |
```
import requests

response = requests.get('http://www.baidu.com')
print(response.status_code)       # status code
print(response.headers)           # header information returned by the server
print(response.content)           # raw data, bytes type
print(response.content.decode())  # page source, already decoded
print(response.text)              # page source decoded as ISO-8859-1 when the returned
                                  # Content-Type header has no charset, so Chinese text is
                                  # garbled; with a charset, decoding follows that value
response.encoding = 'utf-8'
print(response.text)              # set the decoding to utf-8 first to fix the garbled Chinese
print(response.cookies)           # cookies returned by the server
```
Query parameters
GET requests send parameters in the URL (URL splicing); the params argument builds the query string for you:
```
import requests

payload = {'wd': 'python'}
response = requests.get('http://www.baidu.com/s?', params=payload)
response.encoding = 'utf-8'
print(response.text)
print(response.url)  # prints the final requested URL: http://www.baidu.com/s?wd=python
```
POST request: submitting parameters
```
import requests

data = {'user': 'qqq'}  # form parameters
response = requests.post('http://httpbin.org/post', data=data)
response.encoding = 'utf-8'
print(response.text)
```
Timeout settings
```
import requests

# if the server does not answer within 5 seconds, a Timeout error is raised
response = requests.get('https://www.google.com', timeout=5)
```
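To act on the timeout instead of crashing, wrap the call in a try/except; a minimal sketch, using httpbin's /delay endpoint (an assumed test target) to force the timeout:

```
import requests

try:
    # /delay/10 makes the server wait 10 seconds, so a 5-second timeout always fires
    response = requests.get('http://httpbin.org/delay/10', timeout=5)
except requests.exceptions.Timeout:
    print('request timed out after 5 seconds')
```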
Cookie processing
For example, after logging in, save the cookies and pass them in on subsequent requests:
```
import requests

data = {'account_name': 'Asda', 'password': 'qwe123'}
result = requests.post('https://qiye.163.com/login/', data=data)
if result.status_code == 200:
    cookies = result.cookies
    response = requests.get('https://qiye.163.com/News', cookies=cookies)
```

This way, subsequent requests carry the post-login cookies and can access logged-in pages normally.
Session
A session maintains the communication state between client and server across requests.
```
import requests

session = requests.session()
response = session.get('http://www.baidu.com')
# the session object's API is essentially the same as the requests module's;
# cookies are saved automatically and carried along on the next request
```
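A small sketch of that cookie persistence, using httpbin's cookie endpoints as an assumed test target:

```
import requests

session = requests.session()
# the first response sets a cookie; the session stores it...
session.get('http://httpbin.org/cookies/set/token/abc123')
# ...and sends it back automatically on the next request
response = session.get('http://httpbin.org/cookies')
print(response.json())  # expected: {'cookies': {'token': 'abc123'}}
```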
SSL certificate verification: verify=False
```
import requests

# verify defaults to True; set it to False to skip certificate validation
requests.get('https://www.12306.com', verify=False)
```
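Note that skipping verification makes urllib3 print an InsecureRequestWarning for every request; it can be silenced with urllib3.disable_warnings(), as the GitHub login example below also does:

```
import requests
import urllib3

urllib3.disable_warnings()  # suppress the InsecureRequestWarning triggered by verify=False
response = requests.get('https://www.12306.com', verify=False)
```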
Headers (request header information)
```
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}
# send the request with header information; without it, Zhihu denies access
response = requests.get('https://www.zhihu.com', headers=headers)
```
Redirects: allow_redirects=False
```
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36'}
# turn off automatic redirects
response = requests.get('https://www.zhihu.com', headers=headers, allow_redirects=False)
```
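With redirects turned off, the 3xx response itself comes back instead of the destination page; a sketch against httpbin's /redirect endpoint (an assumed test target):

```
import requests

response = requests.get('http://httpbin.org/redirect/1', allow_redirects=False)
print(response.status_code)           # 302: the redirect is returned, not followed
print(response.headers['Location'])   # the URL the server wanted to send us to
```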
Setting a proxy
```
import requests

proxies = {'http': '183.232.188.18:80', 'https': '183.232.188.18:80'}
# send the request through a proxy
response = requests.get(url='http://www.baidu.com', proxies=proxies)
```
Converting JSON data
```
import requests

r = requests.get('http://httpbin.org/ip')
# when the returned data is in JSON format, json() converts it directly to a dictionary
print(r.json())
```
Example: Using requests to emulate GitHub login
" "idea: GitHub login needs to carry the cookies on the homepage and set up the UA in the header message, and a token parameter in the Post form is required to request the first page to get" "ImportReImportRequestsImporturllib3urllib3.disable_warnings ()#Cancel Warningdefget_params (): Start_url='Https://github.com/login' #get cookies and token parameters from the login pageResponse = Requests.get (Start_url,verify=false)#Turn off SSL authenticationcookies =Response.Cookies#print (Response.text)token = Re.findall (r'<input type= "hidden" name= "Authenticity_token" value= "(. *?)"/>', Response.text) [0]#regular take out token returnCookies,tokendeflogin (): Post_url='https://github.com/session' #真正登录提交数据的页面Cookies,token=Get_params ()#headers inside attention to have referer, indicating is from the chain took over, anti-theft chainheaders = { 'Host':'github.com', 'Referer':'Https://github.com/login', 'user-agent':'mozilla/5.0 (Windows NT 10.0; WOW64) applewebkit/537.36 (khtml, like Gecko) chrome/65.0.3325.181 safari/537.36', 'accept-encoding':'gzip, deflate, BR', } #data is captured through a packet, which is the form parameter submitted at logindata = { 'Commit':' Sign In', 'UTF8':'?', 'Authenticity_token': Token,'Login':'xxxxxx', 'Password':'xxxxxxxx',} R= Requests.post (url=post_url,data=data,headers=headers,cookies=cookies,verify=False)Print(R.text)if __name__=='__main__': Login ()
Finally, search the printed response text for "Start a project" (the text GitHub shows on its home page when you open it in a browser while logged in). If it appears, the login succeeded!
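The check can also be done in code rather than by eye; a minimal sketch that would replace the final print(r.text) inside login() above:

```
    r = requests.post(url=post_url, data=data, headers=headers, cookies=cookies, verify=False)
    # 'Start a project' only appears on the logged-in home page
    print('login succeeded' if 'Start a project' in r.text else 'login failed')
```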