Garbled Chinese text with the Python requests library

Source: Internet
Author: User

When using the requests library, Chinese text sometimes comes out garbled.

References:

Code analysis of the Chinese encoding problem in the Python requests library

A solution for garbled Chinese pages in the Python HTTP library Requests

Analysis

According to the two articles and the requests source code, text returns the body decoded to Unicode, while content returns the raw data as bytes. In other words, response.content is cheaper to use than response.text: content is returned as bytes, and text is that content decoded into a Unicode string. Note that text is a property, not a method. If no encoding has been determined from the headers (response.encoding is None), text falls back to detecting the character set with chardet, which is CPU-intensive.
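The relationship between content and text described above can be modeled with a few lines of standard-library Python. This is a simplified sketch, not the library's own code: the names decode_body and detect are hypothetical, and detect stands in for whatever character-set detector is available.

```python
from typing import Callable, Optional


def decode_body(content: bytes, encoding: Optional[str],
                detect: Callable[[bytes], str]) -> str:
    """Simplified model of deriving a text view from raw bytes."""
    if encoding is None:
        # No declared encoding: fall back to (expensive) detection.
        encoding = detect(content)
    # Undecodable bytes are replaced instead of raising an error.
    return content.decode(encoding, errors="replace")


raw = "中文".encode("gb2312")  # raw bytes, as content would hold them
# The lambda is a stand-in detector that always answers "gb2312".
decoded = decode_body(raw, None, detect=lambda body: "gb2312")
print(decoded)
```

Working with the bytes directly skips both the detection step and the decoding step, which is why content is the cheaper of the two.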

    import requests

    response = requests.get('http://www.dytt8.net/index.htm')
    print(response.text[200:300])

The Movie Paradise page is used for this test because it is not very standards-compliant.

The output is garbled.

response.encoding

As the second article shows, the response header specifies only the content type, not the encoding (nowadays the encoding is usually declared in the HTML page itself). This can be seen by inspecting the original page.

For comparison, look at a more standards-compliant site, such as a blog on Blog Park.

There, the Content-Type field of the response headers specifies the encoding.
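To check whether a Content-Type value carries a charset, the header can be parsed with the standard library. This is a minimal sketch; the function name charset_from_content_type and the sample header values are illustrative assumptions, not values captured from either site.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the charset in lower case, or None.
    return msg.get_content_charset()


print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

With requests, the value to pass in would be response.headers.get('Content-Type', '').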

Chapter 16 of HTTP: The Definitive Guide mentions that if the Content-Type field of an HTTP response does not specify a charset, the page defaults to 'ISO-8859-1' encoding. requests follows this rule for text content types, so response.encoding becomes 'ISO-8859-1'. This is no problem for English pages, but Chinese pages come out garbled.
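The effect of that default can be reproduced without any network access. The sketch below assumes a page encoded as GB2312 and uses the two-character string "中文" as sample data.

```python
original = "中文"
raw = original.encode("gb2312")     # bytes as the server would send them

garbled = raw.decode("iso-8859-1")  # wrong charset: every byte becomes one Latin-1 character
fixed = raw.decode("gb2312")        # right charset: the original text

# ISO-8859-1 maps every byte to a character, so the damage is reversible:
recovered = garbled.encode("iso-8859-1").decode("gb2312")

print(garbled)    # ÖÐÎÄ
print(recovered)  # 中文
```

Because decoding as ISO-8859-1 never fails, no exception is raised. The only symptom is the garbled output.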

Solution

If you use text and already know the character encoding of the site, you can set r.encoding = 'xxx'. Once the encoding is specified, requests decodes the body with that encoding when you access text.

The actual encoding can be obtained from apparent_encoding:

    >>> response.apparent_encoding
    'GB2312'

The program detects this itself by analyzing the content, so it is slow.
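Since detection is slow, one option is to run it only when the headers do not declare a charset. The sketch below assumes an object with the headers, encoding and apparent_encoding attributes of a requests Response. The helper name ensure_charset is hypothetical, and SimpleNamespace is used as a stand-in so the example runs offline.

```python
from types import SimpleNamespace


def ensure_charset(response):
    """Use apparent_encoding only if Content-Type declares no charset."""
    content_type = response.headers.get("Content-Type", "")
    if "charset=" not in content_type.lower():
        # Triggers content-based detection, so this is the slow path.
        response.encoding = response.apparent_encoding
    return response


# Stand-in for a real response; with requests this would be requests.get(url).
fake = SimpleNamespace(
    headers={"Content-Type": "text/html"},
    encoding="ISO-8859-1",
    apparent_encoding="GB2312",
)
ensure_charset(fake)
print(fake.encoding)
```

With a real response, text would then be decoded with the detected encoding, while pages that declare their charset pay no detection cost.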

You can also extract the encoding from the HTML meta tag:

    >>> requests.utils.get_encodings_from_content(response.text)
    ['gb2312']
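The same idea can be sketched with a regular expression that runs directly on the raw bytes, so the body does not have to be decoded first. This is an illustrative approximation, not the pattern requests uses. The name charset_from_meta and the 4096-byte scan limit are assumptions.

```python
import re
from typing import Optional

# Matches <meta charset="..."> and <meta ... content="...; charset=...">.
_META_CHARSET = re.compile(rb'<meta[^>]*charset=["\']?([\w-]+)', re.IGNORECASE)


def charset_from_meta(body: bytes, scan_limit: int = 4096) -> Optional[str]:
    """Return the charset declared in a meta tag near the top of the page."""
    match = _META_CHARSET.search(body[:scan_limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=gb2312"></head>'
print(charset_from_meta(page))
```

With requests, response.content would be passed in, and the result could then be assigned to response.encoding.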

Applying the fix

    # response.encoding = response.apparent_encoding
    response.encoding = 'gb2312'

With the encoding set, the output is now readable Chinese text.
