The evil character encoding in Python 2

Source: Internet
Author: User

In Python 2, if a source file contains Chinese characters, you must declare the encoding at the top of the file with # -*- coding: utf-8 -*- or # coding: utf8, otherwise you will get an error. Why is this?
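As a minimal sketch of what such a declaration looks like in a file, assuming the file itself is saved as UTF-8 (the variable names are only illustrative). The u"" prefix is what makes the literal a Unicode string in Python 2; the same code also runs under Python 3.

```python
# -*- coding: utf-8 -*-
# The declaration above has to sit on the first or second line of the file.

greeting = u"你好"          # a Unicode literal containing two Chinese characters
char_count = len(greeting)  # counts characters, not bytes
print(char_count)
```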

In the early days of computing, computers only needed to handle letters, digits and some basic symbols. Each character was stored in a single 8-bit byte, which allows at most 2^8 = 256 different values; ASCII itself defines 128 of them. ASCII is also the default source encoding that Python 2 assumes. Nobody at the time planned for other languages, and as computers spread around the world a single byte per character could no longer meet people's needs, so Unicode (the "universal code") was created.

As the name implies, Unicode aims to cover the characters of every language. Its early design gave each character 16 bits instead of 8, that is, 2^16 = 65536 different values, and the standard has since grown beyond that. A new problem followed, however. In ASCII the character 1 is stored in one byte as 00110001, while in a fixed two-byte Unicode form it is stored as 00000000 00110001. The data is still correct, but for mostly ASCII text half of the space is wasted.

UTF-8 was invented to solve this: it is a variable-length encoding of Unicode. ASCII characters (digits, letters, basic symbols) take 8 bits, most European letters outside ASCII take 16 bits (two bytes), and most Chinese characters take 24 bits (three bytes). GBK and GB2312 are not encodings of Unicode; they are separate Chinese national encodings, but they sit at the same level as UTF-8 in the sense that they are also byte-level encodings that use a different number of bytes for different characters, and text in them can be mapped to and from Unicode.
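The byte counts described above can be checked directly. The following is a small sketch written for Python 3; the helper name encoded_length and the choice of sample characters are assumptions made for illustration, and "utf-16-be" stands in for a fixed two-byte form.

```python
def encoded_length(ch, encoding):
    """Return how many bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))

digit = "1"          # an ASCII character
accented = "\u00e9"  # e with acute accent, a European letter outside ASCII
chinese = "\u4e2d"   # the Chinese character "zhong"

# Bit pattern of the ASCII character 1.
digit_bits = format(ord(digit), "08b")
print(digit_bits)

# Compare the size of each character in UTF-8 and in a two-byte form.
for ch in (digit, accented, chinese):
    print(repr(ch), encoded_length(ch, "utf-8"), encoded_length(ch, "utf-16-be"))

# GBK stores the same Chinese character in two bytes.
print(encoded_length(chinese, "gbk"))
```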

This is why a Python 2 source file has to declare its encoding at the top: Python 2 assumes ASCII by default, so without a declaration it reports an error as soon as it meets a non-ASCII byte. It also explains how conversion between encodings works. Text in UTF-8 cannot be converted to GBK directly. It first has to be decoded from UTF-8 with the decode method, which produces Unicode, and the Unicode text is then encoded with the encode method to produce GBK. The reason character encoding in Python 2 is such a headache is that every conversion between sibling encodings, such as UTF-8 and GBK, has to go through Unicode as the intermediate form.
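The decode-then-encode route described above can be sketched as follows. The function name transcode and its parameters are hypothetical; only the decode and encode methods come from Python itself. The sketch is written for Python 3, where the input is a bytes object; in Python 2 the same two method calls apply to a byte string.

```python
def transcode(data, src="utf-8", dst="gbk"):
    """Convert encoded bytes from one encoding to another via Unicode."""
    unicode_text = data.decode(src)   # step 1: source bytes -> Unicode text
    return unicode_text.encode(dst)   # step 2: Unicode text -> target bytes

utf8_bytes = "中文".encode("utf-8")    # 2 characters, 3 bytes each in UTF-8
gbk_bytes = transcode(utf8_bytes)      # same 2 characters, 2 bytes each in GBK

print(len(utf8_bytes), len(gbk_bytes))
print(gbk_bytes.decode("gbk"))
```

Both byte strings represent the same text; only the intermediate Unicode form lets Python map one to the other.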

  

Python 3 treats source files as UTF-8 by default, so output prints correctly even when no encoding declaration is given. String literals in Python 3 are already Unicode, so they do not need to be decoded, but encoding is still required whenever you need bytes in a particular encoding.
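A short Python 3 sketch of this behavior, with illustrative variable names: the literal is Unicode text from the start, and an explicit encode call is only needed to obtain bytes.

```python
import sys

text = "中文"                        # already Unicode text, no decode needed
utf8_bytes = text.encode("utf-8")    # encoding is still required to get bytes
gbk_bytes = text.encode("gbk")       # the same text in a different encoding
round_trip = utf8_bytes.decode("utf-8")

print(sys.getdefaultencoding())
print(type(text).__name__, len(text))
print(len(utf8_bytes), len(gbk_bytes))
```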

Summary: To convert between different character encodings in Python 2, you have to go through Unicode; you cannot convert directly from one encoding to another. Python 3 improves on this: strings are Unicode and the default encoding is UTF-8, which removes most of the character encoding headaches.
