The documentation says that requests will automatically decode the content it receives. Most Unicode charsets are decoded seamlessly. But under Cygwin I kept running into UnicodeEncodeError, which was depressing; in Python's IDLE everything works fine.
In addition, you can also use r.content to get the page content.
>>> r = requests.get('https://www.zhidaow.com')
>>> r.content
b'<!DOCTYPE html>\n...

The documentation says that r.content returns the content as bytes, which is why it starts with a b in IDLE. But in Cygwin it downloaded pages just fine without any of that trouble, so for me it replaces urllib2's urllib2.urlopen(url).read() (basically the feature I use most).

3.4 Get page encoding
You can use r.encoding to get the page encoding.
>>> r = requests.get('http://www.zhidaow.com')
>>> r.encoding
'utf-8'
When you send a request, requests guesses the page encoding from the HTTP headers, and uses that encoding when you access r.text. Of course, you can also change the encoding yourself.
>>> r = requests.get('http://www.zhidaow.com')
>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'
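What changing r.encoding does to r.text can be illustrated offline with plain bytes (a sketch, not from the article; the sample string is made up): the same byte string yields different text depending on the charset used to decode it.

```python
# Sketch: the same bytes decoded under two charsets, mirroring the effect
# that setting r.encoding has on r.text. The sample string is hypothetical.
raw = 'Olá, 你好'.encode('utf-8')    # what the server "actually sent"

as_utf8 = raw.decode('utf-8')         # correct charset: readable text
as_latin1 = raw.decode('iso-8859-1')  # wrong charset: mojibake, but no error,
                                      # because ISO-8859-1 maps every byte

print(as_utf8)    # Olá, 你好
print(as_latin1)  # garbled text
```

This is why a wrong guess from the HTTP headers produces garbled output rather than an exception: the bytes still decode, just to the wrong characters.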
As in the example above, once you change r.encoding, that encoding is used to decode the page content from then on.

3.5 JSON
With urllib and urllib2, working with JSON means importing an extra module such as json or simplejson, but requests has a built-in r.json() method. Take an API for querying IP addresses as an example:
>>> r = requests.get('http://ip.taobao.com/service/getIpInfo.php?ip=122.88.60.28')
>>> r.json()['data']['country']
'China'
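Under the hood, r.json() simply parses the response body as JSON; with urllib2 you would do the same by hand. A sketch with a canned response body (the payload below is hypothetical, shaped like the getIpInfo.php API's output):

```python
import json

# Hypothetical response body, shaped like the getIpInfo.php API's output.
body = '{"code": 0, "data": {"country": "China", "ip": "122.88.60.28"}}'

# What r.json() does for you: parse the body text into Python objects.
data = json.loads(body)
country = data['data']['country']
print(country)  # China
```

So r.json() saves you an import and a json.loads() call, nothing more.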
3.6 Page Status Code

We can use r.status_code to check the status code of a page.
>>> r = requests.get('http://www.mengtiankong.com')
>>> r.status_code
200
>>> r = requests.get('http://www.mengtiankong.com/123123/')
>>> r.status_code
404
>>> r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN')
>>> r.url
u'http://www.zhidaow.com/'
>>> r.status_code
200
The first two examples are normal: a page that opens returns 200, and one that doesn't returns 404. But the third one is a bit strange: that URL is a 302 redirect address from Baidu's search results, yet the status code shown is 200. So I used a trick to make it show its true colours:
>>> r.history
(<Response [302]>,)
Here you can see that it went through a 302 redirect. You might think you could combine this with a regular expression to extract the redirect's status code, but there is a simpler way:
>>> r = requests.get('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN', allow_redirects=False)
>>> r.status_code
302
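The 200-after-redirect behaviour can also be reproduced offline. A sketch using only the standard library and a throwaway local server (the paths /jump and /target are made up): following redirects yields the final 200, while refusing to follow exposes the 302 directly.

```python
import http.server
import threading
import urllib.error
import urllib.request

class Handler(http.server.BaseHTTPRequestHandler):
    """Throwaway server: /jump 302-redirects to /target, which returns 200."""
    def do_GET(self):
        if self.path == '/jump':
            self.send_response(302)
            self.send_header('Location', '/target')
            self.end_headers()
        else:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'landed')
    def log_message(self, *args):   # silence per-request logging
        pass

server = http.server.HTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = 'http://127.0.0.1:%d' % server.server_address[1]

# Default behaviour (like requests'): the redirect is followed silently.
final_code = urllib.request.urlopen(base + '/jump').getcode()

# Refusing to follow (like allow_redirects=False): the 302 surfaces as an error.
class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None                 # tell urllib not to retry the new URL

opener = urllib.request.build_opener(NoRedirect)
try:
    opener.open(base + '/jump')
    redirect_code = None
except urllib.error.HTTPError as err:
    redirect_code = err.code

server.shutdown()
print(final_code, redirect_code)  # 200 302
```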
Just add the allow_redirects parameter to forbid redirects, and the redirect's status code appears directly. Handy, isn't it? I also used this in the last section to build a small app that checks page status codes; this is the principle behind it.

3.7 Response Header Content
You can get the response header content with r.headers.
>>> r = requests.get('http://www.zhidaow.com')
>>> r.headers
{'content-encoding': 'gzip', 'transfer-encoding': 'chunked', 'content-type': 'text/html; charset=utf-8', ...}
You can see that everything is returned as a dictionary, so we can also access individual entries.
>>> r.headers['content-type']
'text/html; charset=utf-8'
>>> r.headers.get('content-type')
'text/html; charset=utf-8'
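Since the headers come back as a dictionary, the usual dict idioms apply; the practical difference is that .get() avoids a KeyError when a header is absent. A plain-dict sketch (note that requests actually uses a case-insensitive dictionary, which this toy dict is not):

```python
# Toy stand-in for r.headers, with values copied from the example above.
headers = {'content-encoding': 'gzip',
           'content-type': 'text/html; charset=utf-8'}

content_type = headers['content-type']        # raises KeyError if missing
server = headers.get('server')                # returns None if missing
server_or_default = headers.get('server', 'unknown')

print(content_type)       # text/html; charset=utf-8
print(server)             # None
print(server_or_default)  # unknown
```

When scraping many sites, prefer .get() for optional headers so one missing header doesn't crash the whole run.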
3.8 Setting a Timeout

We can set a timeout through the timeout parameter; if no response arrives within that time, an error is raised.
>>> requests.get('http://github.com', timeout=0.001)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
requests.exceptions.Timeout: HTTPConnectionPool(host='github.com', port=80): Request timed out. (timeout=0.001)
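The same timeout behaviour can be observed offline with a deliberately slow local server (a sketch; the 0.5 s delay and 0.05 s timeout are arbitrary values chosen for the demo):

```python
import http.server
import socket
import threading
import time
import urllib.error
import urllib.request

class SlowHandler(http.server.BaseHTTPRequestHandler):
    """Throwaway server that answers slower than the client will wait."""
    def do_GET(self):
        time.sleep(0.5)             # deliberate delay, longer than the timeout
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'too slow')
        except ConnectionError:
            pass                    # the impatient client already hung up
    def log_message(self, *args):
        pass

server = http.server.HTTPServer(('127.0.0.1', 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_address[1]

timed_out = False
try:
    urllib.request.urlopen(url, timeout=0.05)
except (TimeoutError, socket.timeout, urllib.error.URLError):
    timed_out = True

server.shutdown()
print(timed_out)  # True
```

The point to remember is that a timeout raises an exception rather than returning a response object, so scraping loops should wrap the request in try/except.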
3.9 Proxy Access

Proxies are often used when scraping, to avoid getting your IP blocked. requests has a corresponding proxies parameter.
import requests

proxies = {
    "http": "http://10.10.1.10:3128",
    "https": "http://10.10.1.10:1080",
}
requests.get("http://www.zhidaow.com", proxies=proxies)
If the proxy requires a username and password, use this form:
proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
}
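If the username or password itself contains characters such as @ or :, the user:pass@host form breaks; percent-encoding the credentials first keeps the proxy URL parseable. A sketch with made-up credentials (the password below is hypothetical):

```python
from urllib.parse import quote

# Hypothetical credentials containing URL-special characters.
user = 'user'
password = 'p@ss:word'

# quote(..., safe='') percent-encodes every reserved character.
proxy_url = 'http://%s:%s@10.10.1.10:3128/' % (quote(user, safe=''),
                                               quote(password, safe=''))
proxies = {'http': proxy_url}
print(proxy_url)  # http://user:p%40ss%3Aword@10.10.1.10:3128/
```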
3.10 Request Header Content

The request header content can be obtained with r.request.headers.
>>> r.request.headers
{'accept-encoding': 'identity, deflate, compress, gzip', 'accept': '*/*', 'user-agent': 'python-requests/1.2.3 CPython/2.7.3 Windows/XP'}
3.11 Customizing the Request Header

A disguised request header is often needed when scraping; we can use this method to hide ourselves:
r = requests.get('http://www.zhidaow.com')
print r.request.headers['user-agent']
# python-requests/1.2.3 CPython/2.7.3 Windows/XP

headers = {'user-agent': 'alexkh'}
r = requests.get('http://www.zhidaow.com', headers=headers)
print r.request.headers['user-agent']
# alexkh
3.12 Persistent Connection Keep-alive
requests' keep-alive is based on urllib3: persistent connections within the same session are fully automatic, and all requests in the same session automatically reuse the appropriate connection.
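What connection reuse means can be observed with the standard library alone (a sketch with a throwaway local server, not requests itself): two requests sent over one http.client connection are served on a single TCP connection, which is exactly what keep-alive buys you.

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'   # HTTP/1.1 keeps the connection open
    timeout = 1                     # don't wait forever for the next request
    connections = 0                 # how many TCP connections were accepted

    def setup(self):
        Handler.connections += 1    # one handler instance per TCP connection
        super().setup()

    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass

server = http.server.HTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection('127.0.0.1', server.server_address[1])
for _ in range(2):                  # two requests over one connection
    conn.request('GET', '/')
    conn.getresponse().read()
conn.close()
server.shutdown()

print(Handler.connections)  # 1
```

A requests Session gives you the same reuse without any of this plumbing, which is the point of the paragraph above.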
That is, you do not need any settings; requests implements keep-alive automatically.

4. Simple Applications

4.1 Get the page return code
def get_status(url):
    r = requests.get(url, allow_redirects=False)
    return r.status_code

print get_status('http://www.zhidaow.com')  # 200
print get_status('http://www.zhidaow.com/hi404/')  # 404
print get_status('http://mengtiankong.com')  # 301
print get_status('http://www.baidu.com/link?url=QeTRFOS7TuUQRppa0wlTJJr6FfIYI1DJprJukx4Qy0XnsDO_s9baoO8u1wvjxgqN')  # 302
print get_status('http://www.huiya56.com/com8.intre.asp?46981.html')  # 500
Postscript

1. Official documentation
For the detailed installation process for requests, see: http://docs.python-requests.org/en/latest/user/install.html#install
The official requests quickstart guide: http://docs.python-requests.org/en/latest/user/quickstart.html
The official requests advanced guide: http://docs.python-requests.org/en/latest/user/advanced.html#advanced
2. Part of this article is translated from the official documentation, and part is my own summary.
3. Most of it uses the IDLE transcript format, which was exhausting; next time I'll format it directly in an editor, which suits my habits better.
4. As always: if you have questions, leave a comment or send me an email.
5. Picture note: the old turtle from the requests official documentation.
Reposted from: http://www.zhidaow.com/post/python-requests-install-and-brief-introduction
Installation and simple application of Python requests