Requests is a simple, easy-to-use HTTP library for Python that is much simpler to use than urllib.
Because it is a third-party library, it must be installed from the command line before use:
pip install requests
Once the installation is complete, import it; if the import raises no errors, you can start using it.
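If you want to confirm that the installation worked, a quick check such as the following sketch (the version number printed will depend on your machine) is enough:

import requests
print(requests.__version__)  # prints the installed Requests version if the import succeeds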
Basic usage:
requests.get() sends a request to the target site and returns a Response object.
import requests

response = requests.get('http://www.baidu.com')
print(response.status_code)  # print the status code
print(response.url)          # print the request URL
print(response.headers)      # print the header information
print(response.cookies)      # print the cookie information
print(response.text)         # print the page source as text
print(response.content)      # print the body as a byte stream
Result:
Status code: 200
URL: http://www.baidu.com
Header information, cookies, and the page source follow.
Various Request methods:
import requests

requests.get('http://httpbin.org/get')
requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')
Basic GET Request
import requests

response = requests.get('http://httpbin.org/get')
print(response.text)
Results
GET request with parameters:
The first way is to put the parameters directly in the URL:
import requests

response = requests.get('http://httpbin.org/get?name=gemey&age=22')
print(response.text)
Results
The other way is to put the parameters in a dict and pass it as the params argument when making the request:
import requests

data = {
    'name': 'Tom',
    'age': 22
}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)
The result is the same as above.
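To see that the dict really was encoded into the query string, you can print response.url; this small sketch is only for illustration:

import requests

data = {'name': 'Tom', 'age': 22}
response = requests.get('http://httpbin.org/get', params=data)
print(response.url)  # something like http://httpbin.org/get?name=Tom&age=22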
Parsing JSON
import requests

response = requests.get('http://httpbin.org/get')
print(response.text)
print(response.json())  # response.json() is equivalent to json.loads(response.text)
print(type(response.json()))
Results
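As a quick check of the comment above, this illustrative sketch shows that response.json() and json.loads(response.text) give the same dict:

import json
import requests

response = requests.get('http://httpbin.org/get')
print(response.json() == json.loads(response.text))  # True: both parse the JSON body into a dict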
Saving a binary file
The binary content is in response.content:
import requests

response = requests.get('http://img.ivsky.com/img/tupian/pre/201708/30/kekeersitao-002.jpg')
b = response.content
with open('f://fengjing.jpg', 'wb') as f:
    f.write(b)
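For larger files it may be better to stream the download instead of holding the whole body in memory; the sketch below uses the documented stream=True option and iter_content() (URL, path, and chunk size are just examples):

import requests

response = requests.get('http://img.ivsky.com/img/tupian/pre/201708/30/kekeersitao-002.jpg', stream=True)
with open('f://fengjing.jpg', 'wb') as f:
    for chunk in response.iter_content(chunk_size=8192):  # write the file piece by piece
        f.write(chunk)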
Add header information to your request
import requests

heads = {}
heads['User-Agent'] = 'Mozilla/5.0 ' \
                      '(Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 ' \
                      '(KHTML, like Gecko) Version/5.1 Safari/534.50'
response = requests.get('http://www.baidu.com', headers=heads)
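To confirm the header was actually sent, one option (an illustrative sketch, not part of the original example) is to request http://httpbin.org/get, which echoes the request headers back in its JSON response:

import requests

heads = {'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50'}
response = requests.get('http://httpbin.org/get', headers=heads)
print(response.json()['headers']['User-Agent'])  # httpbin reports the User-Agent it received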
Using proxies
Like headers, the proxies parameter is also a dict.
The example below uses requests to crawl the IPs, ports, and proxy types from a free proxy site.
Because they are free, the proxy addresses become invalid very quickly.
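Before the full crawler, here is a minimal sketch of the proxies parameter on its own (the proxy addresses are placeholders taken from the example below and will almost certainly be dead by now):

import requests

proxies = {
    'http': 'http://120.25.253.234:812',    # placeholder HTTP proxy
    'https': 'http://163.125.222.244:8123'  # placeholder HTTPS proxy
}
response = requests.get('http://httpbin.org/ip', proxies=proxies)
print(response.text)  # httpbin reports the origin IP, i.e. the proxy's IP if it worked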
import requests
import re

def get_html(url):
    proxy = {
        'http': '120.25.253.234:812',
        'https': '163.125.222.244:8123'
    }
    heads = {}
    heads['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X METASR 1.0'
    req = requests.get(url, headers=heads, proxies=proxy)
    html = req.text
    return html

def get_ipport(html):
    # extract the IP, port, and type columns from the table
    regex = r'<td data-title="IP">(.+)</td>'
    iplist = re.findall(regex, html)
    regex2 = r'<td data-title="PORT">(.+)</td>'
    portlist = re.findall(regex2, html)
    regex3 = r'<td data-title="type">(.+)</td>'
    typelist = re.findall(regex3, html)
    sumray = []
    for i, p, t in zip(iplist, portlist, typelist):
        sumray.append(t + ',' + i + ':' + p)
    print('High anonymity proxies')
    print(sumray)

if __name__ == '__main__':
    url = 'http://www.kuaidaili.com/free/'
    get_ipport(get_html(url))
Results:
Basic POST Request:
import requests

data = {'name': 'Tom', 'age': '22'}
response = requests.post('http://httpbin.org/post', data=data)
print(response.text)
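To see what the server received, httpbin.org/post echoes the submitted form fields back; this small sketch is only for illustration:

import requests

data = {'name': 'Tom', 'age': '22'}
response = requests.post('http://httpbin.org/post', data=data)
print(response.json()['form'])  # httpbin echoes the form data, e.g. {'age': '22', 'name': 'Tom'}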
Get cookies
# get cookies
import requests

response = requests.get('http://www.baidu.com')
print(response.cookies)
print(type(response.cookies))
for k, v in response.cookies.items():
    print(k + ':' + v)
Results:
Session Maintenance
import requests

session = requests.Session()
session.get('http://httpbin.org/cookies/set/number/12345')
response = session.get('http://httpbin.org/cookies')
print(response.text)
Results:
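For contrast, a sketch (added here for illustration) of the same two requests without a Session; the cookie set by the first request is not carried over to the second because each call uses its own cookie jar:

import requests

requests.get('http://httpbin.org/cookies/set/number/12345')
response = requests.get('http://httpbin.org/cookies')
print(response.text)  # {"cookies": {}} - the cookie is lost between independent requests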
Certificate Validation Settings
import requests
from requests.packages import urllib3

urllib3.disable_warnings()  # suppress the urllib3 warning
response = requests.get('https://www.12306.cn', verify=False)  # certificate verification set to False
print(response.status_code)
Printed result: 200
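Instead of disabling verification, the same verify parameter can also take the path to a trusted CA bundle (the path below is only a placeholder):

import requests

# point verify at a CA bundle file instead of setting it to False
response = requests.get('https://www.12306.cn', verify='/path/to/ca-bundle.crt')  # placeholder path
print(response.status_code)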
Timeout exception capture
import requests
from requests.exceptions import ReadTimeout

try:
    res = requests.get('http://httpbin.org', timeout=0.1)
    print(res.status_code)
except ReadTimeout:
    print('timeout')
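The timeout parameter also accepts a (connect, read) tuple, which is part of the documented API; a small sketch with arbitrary values:

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout

try:
    # allow 1 second to connect and 5 seconds to read the response
    res = requests.get('http://httpbin.org', timeout=(1, 5))
    print(res.status_code)
except (ConnectTimeout, ReadTimeout):
    print('timeout')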
Exception handling
When you are not sure what errors may occur, use try...except to catch them.
All requests exceptions are defined in requests.exceptions:
import requests
from requests.exceptions import ReadTimeout, HTTPError, RequestException

try:
    response = requests.get('http://www.baidu.com', timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('timeout')
except HTTPError:
    print('httperror')
except RequestException:
    print('reqerror')
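Note that requests does not raise HTTPError by itself for 4xx/5xx responses; calling response.raise_for_status(), a documented method, triggers it, as in this sketch:

import requests
from requests.exceptions import HTTPError

try:
    response = requests.get('http://httpbin.org/status/404')
    response.raise_for_status()  # raises HTTPError because the status is 404
except HTTPError:
    print('httperror')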