Python Crawler---Requests library usage


Requests is a simple, easy-to-use HTTP library implemented in Python, and it is much simpler to use than urllib.

Because it is a third-party library, it needs to be installed from the command line before use:

pip install requests

Once the installation is complete, import it; if the import succeeds, you can start using it.
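
A quick way to confirm the installation is to import the library and print its version, for example:

import requests

# If this import raises ImportError, re-run "pip install requests"
print(requests.__version__)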

Basic usage:

requests.get() is used to request the target site and returns a Response object:

import requests

response = requests.get('http://www.baidu.com')
print(response.status_code)  # print the status code
print(response.url)          # print the request URL
print(response.headers)      # print the header information
print(response.cookies)      # print the cookie information
print(response.text)         # print the page source as text
print(response.content)      # print the page content as bytes

Output:

Status code: 200

URL: http://www.baidu.com/

Header information, cookies, and the page source follow.

Various request methods:

import requests

requests.get('http://httpbin.org/get')
requests.post('http://httpbin.org/post')
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')

Basic GET Request

import requests

response = requests.get('http://httpbin.org/get')
print(response.text)

Results

GET request with parameters:

The first approach places the parameters directly in the URL:

import requests

response = requests.get('http://httpbin.org/get?name=gemey&age=22')
print(response.text)

Results

The second approach puts the parameters in a dict and passes it as the params argument when the request is made:

import requests

data = {
    'name': 'tom',
    'age': 22
}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)

The result is the same as above.

Parsing JSON

import requests

response = requests.get('http://httpbin.org/get')
print(response.text)
print(response.json())  # response.json() is equivalent to json.loads(response.text)
print(type(response.json()))

Results
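
If the response body is not valid JSON, response.json() raises an error, so it can be worth guarding the call; a minimal sketch:

import requests

response = requests.get('http://httpbin.org/get')
try:
    data = response.json()
    print(data['url'])
except ValueError:
    # the body was not valid JSON
    print('response is not JSON')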

Saving a binary file

The binary content of a response is in response.content:

import requests

response = requests.get('http://img.ivsky.com/img/tupian/pre/201708/30/kekeersitao-002.jpg')
b = response.content
with open('f://fengjing.jpg', 'wb') as f:
    f.write(b)

Add header information to your request

import requests

headers = {}
headers['User-Agent'] = 'Mozilla/5.0 ' \
                        '(Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 ' \
                        '(KHTML, like Gecko) Version/5.1 Safari/534.50'
response = requests.get('http://www.baidu.com', headers=headers)

Using proxies

As with headers, the proxies parameter is also a dict.

The example below uses the requests library to crawl the IPs, ports, and proxy types listed on a free proxy site.

Because the proxies are free, the addresses tend to become invalid quickly.

import requests
import re

def get_html(url):
    proxy = {
        'http': '120.25.253.234:812',
        'https': '163.125.222.244:8123'
    }
    heads = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X METASR 1.0'
    }
    req = requests.get(url, headers=heads, proxies=proxy)
    html = req.text
    return html

def get_ipport(html):
    regex = r'<td data-title="IP">(.+)</td>'
    iplist = re.findall(regex, html)
    regex2 = r'<td data-title="PORT">(.+)</td>'
    portlist = re.findall(regex2, html)
    regex3 = r'<td data-title="type">(.+)</td>'
    typelist = re.findall(regex3, html)
    sumray = []
    # pair each IP with its port and type
    for i, p, t in zip(iplist, portlist, typelist):
        sumray.append(t + ',' + i + ':' + p)
    print('High-anonymity proxies')
    print(sumray)

if __name__ == '__main__':
    url = 'http://www.kuaidaili.com/free/'
    get_ipport(get_html(url))

Results:
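
If a proxy requires authentication, requests also accepts credentials embedded in the proxy URL; a minimal sketch with placeholder values:

import requests

# the username, password, host, and port below are placeholders, not working values
proxies = {
    'http': 'http://user:password@10.10.1.10:3128',
    'https': 'http://user:password@10.10.1.10:1080'
}
response = requests.get('http://httpbin.org/ip', proxies=proxies)
print(response.text)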

Basic POST Request:

import requests

data = {'name': 'tom', 'age': 22}
response = requests.post('http://httpbin.org/post', data=data)
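
Besides form data, a JSON body can be sent directly through the json parameter, which serializes the dict and sets the Content-Type header; a minimal sketch:

import requests

payload = {'name': 'tom', 'age': 22}
response = requests.post('http://httpbin.org/post', json=payload)  # sent as application/json
print(response.json()['json'])  # httpbin echoes the parsed JSON body back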

Get cookies

# get cookies
import requests

response = requests.get('http://www.baidu.com')
print(response.cookies)
print(type(response.cookies))
for k, v in response.cookies.items():
    print(k + ':' + v)

Results:
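
Cookies can also be sent with a request by passing a dict to the cookies parameter; a minimal sketch:

import requests

cookies = {'number': '12345'}
response = requests.get('http://httpbin.org/cookies', cookies=cookies)
print(response.text)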

Session Maintenance

import requests

session = requests.Session()
session.get('http://httpbin.org/cookies/set/number/12345')
response = session.get('http://httpbin.org/cookies')
print(response.text)

Results:
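
A Session can also carry default headers that are sent with every request made through it; a minimal sketch:

import requests

session = requests.Session()
session.headers.update({'User-Agent': 'my-crawler/0.1'})  # 'my-crawler/0.1' is just an example value
response = session.get('http://httpbin.org/headers')
print(response.text)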

Certificate Validation Settings

import requests
from requests.packages import urllib3

urllib3.disable_warnings()  # suppress the urllib3 warning
response = requests.get('https://www.12306.cn', verify=False)  # disable certificate verification
print(response.status_code)

Printed result: 200
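
Instead of disabling verification, verify can also point to a CA bundle file, and cert can supply a client certificate; a minimal sketch with placeholder paths:

import requests

# the certificate paths below are placeholders
response = requests.get('https://example.com',
                        verify='/path/to/ca_bundle.pem',
                        cert=('/path/to/client.cert', '/path/to/client.key'))
print(response.status_code)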

Timeout exception capture

import requests
from requests.exceptions import ReadTimeout

try:
    res = requests.get('http://httpbin.org', timeout=0.1)
    print(res.status_code)
except ReadTimeout:
    print('timeout')
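
The timeout can also be given as a (connect, read) tuple to limit the connection and read phases separately; a minimal sketch:

import requests
from requests.exceptions import ConnectTimeout, ReadTimeout

try:
    # 3.05 seconds to connect, 10 seconds to read the response
    res = requests.get('http://httpbin.org/get', timeout=(3.05, 10))
    print(res.status_code)
except (ConnectTimeout, ReadTimeout):
    print('timeout')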

Exception handling

When you are not sure what errors may occur, use try...except to catch exceptions.

All requests exceptions:

import requests
from requests.exceptions import ReadTimeout, HTTPError, RequestException

try:
    response = requests.get('http://www.baidu.com', timeout=0.5)
    print(response.status_code)
except ReadTimeout:
    print('Timeout')
except HTTPError:
    print('HTTPError')
except RequestException:
    print('ReqError')

