Python Requests: Installation and Simple Applications


Requests is an HTTP client library for Python, similar in purpose to urllib and urllib2. Why use Requests instead of urllib2? The official documentation puts it this way:

Python's standard urllib2 module provides most of the HTTP capabilities you need, but the API is badly outdated; even a simple task requires a great deal of code.

I've also read through the Requests documentation: it really is simple, and well suited to lazy people like me. Below are some basic pointers.

A quick piece of good news: I just saw that Requests now has a Chinese translation of its documentation. If your English isn't strong, I recommend reading it; the content there is much better than this post. The link is http://cn.python-requests.org/en/latest/ (note that it documents v1.1.0; also, apologies, I had pasted the wrong link earlier).

1. Installation

Installation is simple. I'm on Windows, so I downloaded the installation package (via the zipball link on the project page) and then ran: python setup.py install.

Of course, if you have easy_install or pip, you can simply run easy_install requests or pip install requests.
Linux users can find other installation methods on that same page.

To test it: type import requests in IDLE. If no error appears, the installation succeeded!

2. A First Taste

>>> import requests
>>> r = requests.get('http://www.zhidaow.com')  # send a request
>>> r.status_code  # status code
200
>>> r.headers['content-type']  # response headers
'text/html; charset=utf-8'
>>> r.encoding  # encoding
'utf-8'
>>> r.text  # the page content (PS: r.content is recommended)
u'<!DOCTYPE html>\n...

Easy, isn't it? So much simpler and more intuitive than urllib2 and urllib! For more, see the quick guide below.

3. Quick Guide

3.1 Sending a Request

Sending a request is simple. First import the requests module:

>>> import requests

Next, let's get a webpage, such as the homepage of my personal blog:

>>> r = requests.get('http://www.zhidaow.com')

After that, we can use the various properties and methods of the response object r.

In addition, there are several other kinds of HTTP requests, such as POST, PUT, DELETE, HEAD, and OPTIONS. They can all be sent in the same way:

>>> r = requests.post("http://httpbin.org/post")
>>> r = requests.put("http://httpbin.org/put")
>>> r = requests.delete("http://httpbin.org/delete")
>>> r = requests.head("http://httpbin.org/get")
>>> r = requests.options("http://httpbin.org/get")

I haven't needed these myself yet, so I haven't dug into them in depth.
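Still, for reference, here's a minimal sketch of a POST that actually carries form data via the data parameter (the field names here are made up for illustration), again against httpbin.org:

>>> payload = {'key1': 'value1', 'key2': 'value2'}  # hypothetical form fields
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> r.status_code
200

httpbin.org echoes the submitted form back in its JSON response, which makes it handy for checking what was actually sent.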

3.2 Passing Parameters in URLs

Sometimes we need to pass parameters in the URL. For example, when scraping Baidu search results we use the wd parameter (the search term) and the rn parameter (the number of results). You could assemble the URL by hand, but Requests provides a rather slick-looking way:

>>> payload = {'wd': u'张亚楠', 'rn': '100'}  # wd is the search term ("Zhang Yanan"), rn the result count
>>> r = requests.get("http://www.baidu.com/s", params=payload)
>>> r.url
u'http://www.baidu.com/s?rn=100&wd=%E5%BC%A0%E4%BA%9A%E6%A5%A0'

The gibberish after wd= above is the URL-encoded form of "张亚楠" (Zhang Yanan). (It also seems the parameters get sorted alphabetically by name.)
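For comparison, here's roughly what Requests does for you behind the scenes. A sketch using Python 2's urllib.urlencode, with made-up parameter values; a list of tuples keeps the order deterministic:

>>> import urllib
>>> urllib.urlencode([('wd', 'seo'), ('rn', '100')])
'wd=seo&rn=100'
>>> 'http://www.baidu.com/s?' + urllib.urlencode([('wd', 'seo'), ('rn', '100')])
'http://www.baidu.com/s?wd=seo&rn=100'

With params you get the same quoting and joining without any string concatenation.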

3.3 Getting the Response Content

You can use r.text to get the page content.

>>> r = requests.get('https://www.zhidaow.com')
>>> r.text
u'<!DOCTYPE html>\n...

The documentation says Requests automatically decodes the content, and most Unicode text decodes seamlessly. However, when I use it under Cygwin I always hit a UnicodeEncodeError, which is frustrating; in the Python IDLE it works perfectly fine.
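If you run into the same UnicodeEncodeError, note that it usually comes from printing (the console can't encode the Unicode text), not from Requests itself. One possible workaround is to encode explicitly before printing; a minimal sketch, assuming a UTF-8-capable terminal:

>>> r = requests.get('http://www.zhidaow.com')
>>> print r.text.encode('utf-8')  # hand the console plain UTF-8 bytes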
You can also use r.content to get the page content.

>>> r = requests.get('https://www.zhidaow.com')
>>> r.content
b'<!DOCTYPE html>\n...

The documentation shows r.content as bytes, which is why it starts with a b prefix in IDLE. I didn't get that prefix under Cygwin, but it downloads pages just fine, so it serves as a replacement for urllib2's urllib2.urlopen(url).read(). (That's basically one of the most commonly used urllib2 functions.)

3.4 Getting the Page Encoding

You can use r.encoding to get the page encoding.

>>> r = requests.get('http://www.zhidaow.com')
>>> r.encoding
'utf-8'

When you send a request, Requests guesses the page encoding from the HTTP headers, and that encoding is used when you access r.text. You can also change the encoding Requests uses:

>>> r = requests.get('http://www.zhidaow.com')
>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

As the example shows, once you change r.encoding, the new encoding is used when the page content is fetched via r.text.
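To make that concrete, here's a hypothetical sketch (the URL and the server's mislabeling are invented for illustration): a server mislabels a UTF-8 page, so Requests guesses ISO-8859-1 and r.text comes out garbled until you override the guess.

>>> r = requests.get('http://example.com/mislabeled')  # hypothetical mislabeled page
>>> r.encoding
'ISO-8859-1'
>>> r.encoding = 'utf-8'  # override the wrong guess
>>> text = r.text         # now decoded as UTF-8, no garbled characters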

3.5 JSON

With urllib and urllib2, handling JSON means pulling in yet another module such as json or simplejson. Requests, however, has a built-in r.json() method. Take an IP lookup API as an example:

>>> r = requests.get('http://ip.taobao.com/service/getIpInfo.php?ip=122.88.60.28')
>>> r.json()['data']['country']
'China'
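One caveat worth knowing: if the body isn't valid JSON, r.json() raises an error (a ValueError in this era of Requests), so in scripts it can be worth guarding the call. A minimal sketch:

>>> try:
...     data = r.json()
... except ValueError:
...     data = None  # response body was not JSON
...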

3.6 Page Status Codes

We can use r.status_code to check a page's status code.

>>> r = requests.get('http://www.mengtiankong.com')
>>> r.status_code
200
>>> r = requests.get('http://www.mengtiankong.com/123123/')
>>> r.status_code
404
>>> r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN')
>>> r.url
u'http://www.zhidaow.com/'
>>> r.status_code
200

The first two examples are as expected: the page that opens normally returns 200, and the one that doesn't exist returns 404. The third one is a bit strange, though: it's a 302 redirect URL from Baidu's search results, yet the status code is 200. So I used a trick to make it show its true form:

>>> r.history
(<Response [302]>,)

So it does go through a 302 redirect. You might think you have to detect redirects with conditionals and regular expressions, but there's actually a simpler way:

>>> r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN', allow_redirects=False)
>>> r.status_code
302

By passing allow_redirects=False to disable redirects, the redirect status code shows up directly. Handy, isn't it? The small application in the last section uses exactly this trick to check page status codes.
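Putting r.history and allow_redirects together, here's a small helper sketch of my own that prints every hop of a redirect chain:

import requests

def show_redirects(url):
    r = requests.get(url)           # follow redirects as usual
    for hop in r.history:           # the intermediate responses, in order
        print hop.status_code, hop.url
    print r.status_code, r.url      # the final response

show_redirects('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN')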

3.7 Response Headers

You can use r.headers to get the response headers.

>>> r = requests.get('http://www.zhidaow.com')
>>> r.headers
{
    'content-encoding': 'gzip',
    'transfer-encoding': 'chunked',
    'content-type': 'text/html; charset=utf-8',
    ...
}

As you can see, everything comes back as a dictionary, so we can also pick out individual entries:

>>> r.headers['Content-Type']
'text/html; charset=utf-8'
>>> r.headers.get('content-type')
'text/html; charset=utf-8'

3.8 Setting a Timeout

We can use the timeout parameter to set a time limit. If no response has arrived within that time, an error is raised.

>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
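In a real script you would normally catch that exception rather than let it crash the program; a minimal sketch:

import requests

try:
    r = requests.get('http://github.com', timeout=0.001)
except requests.exceptions.Timeout:
    print 'timed out; retry or use a longer timeout'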

3.9 Proxies

Proxies are often used when scraping to keep your IP from being blocked. Requests has a matching proxies parameter.

import requests

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}
requests.get("http://www.zhidaow.com", proxies=proxies)

If the proxy needs a username and password, use:

proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
}

3.10 Request Headers

You can use r.request.headers to see the headers of the request you sent.

>>> r.request.headers
{'Accept-Encoding': 'identity, deflate, compress, gzip',
 'Accept': '*/*',
 'User-Agent': 'python-requests/1.2.3 CPython/2.7.3 Windows/XP'}

3.11 Custom Request Headers

Disguising the request headers is a common scraping technique. We can camouflage ours like this:

r = requests.get('http://www.zhidaow.com')
print r.request.headers['User-Agent']
# python-requests/1.2.3 CPython/2.7.3 Windows/XP

headers = {'User-Agent': 'alexkh'}
r = requests.get('http://www.zhidaow.com', headers=headers)
print r.request.headers['User-Agent']
# alexkh

3.12 Persistent Connections (keep-alive)

Requests' keep-alive is based on urllib3, and persistent connections within the same session are fully automatic: every request in the same session automatically reuses the appropriate connection.

In other words, you don't need to set anything; Requests implements keep-alive automatically, as long as your requests go through the same session (see the sketch below).
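A minimal sketch of what "the same session" means in practice, using requests.Session (the /about/ path is made up for illustration):

import requests

s = requests.Session()                         # one session = one connection pool
r1 = s.get('http://www.zhidaow.com')           # opens a connection
r2 = s.get('http://www.zhidaow.com/about/')    # hypothetical path; can reuse the connection
print r1.status_code, r2.status_code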

4. Simple Application

4.1 Getting a Page's Status Code

def get_status(url):
    r = requests.get(url, allow_redirects=False)
    return r.status_code

print get_status('http://www.zhidaow.com')
# 200
print get_status('http://www.zhidaow.com/hi404/')
# 404
print get_status('http://mengtiankong.com')
# 301
print get_status('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN')
# 302
print get_status('http://www.huiya56.com/com8.intre.asp?46981.html')
# 500
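One caveat about the helper above: requests.get raises exceptions for hosts it can't reach at all (DNS failure, refused connection, timeout), so a more defensive variant might look like this sketch:

def get_status_safe(url):
    try:
        r = requests.get(url, allow_redirects=False)
        return r.status_code
    except requests.exceptions.RequestException:
        return None  # network error: DNS failure, refused connection, timeout, etc.

print get_status_safe('http://does-not-exist.invalid/')  # None (hypothetical unreachable host)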

That's the whole introduction to installing and using Python Requests. I hope it helps!

