A detailed description of common character encoding problems in Python

Last Update: 2016-01-15 Source: Internet

Author: User


A common encoding error appears when Python processes strings or reads a file with the open() function:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

This error occurs when bytes are decoded with a codec that does not match the encoding they were written in. Here Python decodes with the ASCII codec, which only understands byte values 0 to 127, but the data begins with the byte 0xe6 (for example, the first byte of a Chinese character encoded in UTF-8). To handle such problems you need a clear understanding of character encodings. There is no need to study the internals of codecs in depth; it is enough to understand the relevant encoding rules, so that the next time you meet this kind of error you can fix it yourself.
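As a minimal sketch, assuming Python 3 (the article itself dates from the Python 2 era), the error can be reproduced by writing the UTF-8 bytes of a Chinese word to a temporary file and then reading that file with the ASCII codec. The word and the temporary file are arbitrary choices for illustration.

```python
import os
import tempfile

# Save the UTF-8 bytes of a Chinese word; "我" starts with the byte 0xe6.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("我们".encode("utf-8"))
    path = tmp.name

error_message = ""
try:
    # Decoding with a codec that does not match the file reproduces the error.
    with open(path, encoding="ascii") as f:
        f.read()
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# Opening the file with the encoding it was actually written in works.
with open(path, encoding="utf-8") as f:
    content = f.read()
print(content)

os.remove(path)
```

In Python 3, passing encoding explicitly to open() plays the role of calling decode() on the raw bytes yourself.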

First, it must be clear that a computer stores everything internally as 0s and 1s, not in the form of the characters we see, such as English letters or Chinese characters. The computer does not know what a Chinese character or a letter is; it only knows 0 and 1. So how do these characters appear? We assign each character a specific code (encoding it), and when the data is decoded, the terminal or file displays the familiar letters or Chinese characters instead of 0s and 1s.
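To make this concrete, here is a small Python 3 sketch that shows the bits actually stored for a piece of text and how decoding turns them back into characters. The helper names to_bits and from_bits are hypothetical, and UTF-8 is assumed as the encoding.

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Encode text and show each stored byte as 8 binary digits."""
    return " ".join(f"{byte:08b}" for byte in text.encode(encoding))


def from_bits(bits: str, encoding: str = "utf-8") -> str:
    """Rebuild the bytes from the binary digits and decode them to text."""
    return bytes(int(chunk, 2) for chunk in bits.split()).decode(encoding)


bits = to_bits("Hi")
print(bits)             # 01001000 01101001
print(from_bits(bits))  # Hi
print(to_bits("中"))    # 11100100 10111000 10101101
```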

1. Common character encodings: ASCII, Unicode, and UTF-8

1.1 ASCII encoding

ASCII is an encoding developed in the United States for English text. It defines 128 characters in total: digits, uppercase and lowercase letters, punctuation, and arithmetic and logical operator symbols, plus 33 non-printable control characters such as tab and line feed (the space character counts as printable). Its designers noticed that one byte is enough to represent 128 characters: a byte has 8 bits, each of which is 0 or 1, so it can take 2^8 = 256 distinct states, more than enough to give every character its own state. ASCII therefore uses the lower 7 bits to represent the characters and sets the highest bit to 0. For English, ASCII was sufficient and made exchanging information between computers much easier.
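The following Python 3 sketch checks the claims above: every ASCII code fits in 7 bits, 95 codes are printable and 33 are control codes, and a character outside ASCII cannot be encoded with it. Using str.isprintable() as the printable test is an assumption that happens to match the ASCII split.

```python
# Every ASCII code (0-127) fits in 7 bits, so the highest bit of the byte is 0.
top_bit_clear = all(code & 0x80 == 0 for code in range(128))

# str.isprintable() treats space as printable and control codes as not.
printable = [code for code in range(128) if chr(code).isprintable()]
control = [code for code in range(128) if not chr(code).isprintable()]
print(top_bit_clear, len(printable), len(control))  # True 95 33

# English text encodes to exactly one byte per character.
print("Hello".encode("ascii"))  # b'Hello'

# A Chinese character has no ASCII code, so encoding it fails.
reason = ""
try:
    "中".encode("ascii")
except UnicodeEncodeError as exc:
    reason = exc.reason
print(reason)  # ordinal not in range(128)
```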

1.2 Unicode encoding

ASCII only covers English characters, but the world has many languages that ASCII cannot handle. Chinese, for example, cannot be encoded in ASCII, so China first adopted GB2312 and later a series of further standards. Because so many languages exist, people wanted a single unified encoding that gives every character in every writing system a unique code. That is how Unicode was born: a standard for encoding all of the world's text.
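A short Python 3 sketch shows why separate national standards caused trouble: the same character gets different bytes under GB2312 and under UTF-8 (a Unicode encoding), and bytes produced by one are not valid in the other. The choice of "中" as the sample character is arbitrary.

```python
word = "中"

gb2312_bytes = word.encode("gb2312")  # China's national standard
utf8_bytes = word.encode("utf-8")     # a Unicode encoding
print(gb2312_bytes)  # b'\xd6\xd0'
print(utf8_bytes)    # b'\xe4\xb8\xad'

# Plain English letters map to the same bytes in both encodings.
same_for_ascii = "abc".encode("gb2312") == "abc".encode("utf-8")
print(same_for_ascii)  # True

# Bytes written under one standard are not valid under the other.
decoded_ok = True
try:
    gb2312_bytes.decode("utf-8")
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)  # False
```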

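Unicode assigns every character a unique number called a code point, written as U+ followed by hexadecimal digits (for example, U+4E2D for 中). Fixed-width encoding forms of Unicode store every character in the same number of bytes: UCS-2 uses 2 bytes per character, and UCS-4 (essentially the same as UTF-32) uses 4 bytes.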
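In Python 3, ord() returns a character's code point, and the built-in "utf-32-be" codec illustrates a fixed-width form. This is a sketch; Python has no codec named UCS-2, so UTF-32 stands in for the fixed-width idea here.

```python
# ord() returns a character's code point; format it in the usual U+XXXX form.
code_points = [f"U+{ord(ch):04X}" for ch in "A中"]
print(code_points)  # ['U+0041', 'U+4E2D']

# UTF-32 is fixed width: every code point takes exactly 4 bytes.
# The "-be" variant uses big-endian order and writes no byte order mark.
encoded = "A中".encode("utf-32-be")
print(len(encoded))   # 8
print(encoded.hex())  # 0000004100004e2d
```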

1.3 UTF-8 Encoding

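UTF-8 stands for Unicode Transformation Format, 8-bit. As the name suggests, UTF-8 encodes Unicode using the 8-bit byte as its unit, and each character takes 1 to 4 bytes depending on its code point. Note that UTF-8 and Unicode are not the same thing: Unicode assigns each character a number (its code point), while UTF-8 is one way of storing that number as bytes.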

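The following table shows how each range of Unicode code points is laid out as a UTF-8 byte sequence (each x is one bit of the code point):

Unicode code point (hexadecimal)	UTF-8 byte sequence (binary)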
000000–00007f	0xxxxxxx
000080–0007ff	110xxxxx 10xxxxxx
000800–00ffff	1110xxxx 10xxxxxx 10xxxxxx
010000–10ffff	11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
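The table translates almost line for line into code. Below is a minimal Python 3 sketch of a single-code-point UTF-8 encoder, checked against the built-in str.encode("utf-8"). The function name utf8_encode is hypothetical, and the sketch deliberately skips one rule a real encoder must enforce (rejecting surrogate code points U+D800 to U+DFFF).

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point by filling the x bits of the table's patterns.

    Simplified sketch: surrogate code points (U+D800 to U+DFFF) are not
    rejected here, although a real UTF-8 encoder must refuse them.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if cp <= 0x7F:    # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:   # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in "Aé中😀":
    mine = utf8_encode(ord(ch))
    print(ch, len(mine), mine.hex(), mine == ch.encode("utf-8"))
# A 1 41 True
# é 2 c3a9 True
# 中 3 e4b8ad True
# 😀 4 f09f9880 True
```

The four sample characters were picked so that each row of the table is exercised once.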

2. Python Encoding Processing

Python's string handling can be compared to a pool with an inlet and an outlet, and inside the pool strings are processed as Unicode. At the inlet, incoming data must be decoded with the decode() function, whose argument is the encoding the data (for example, the file) was actually written in. The text can then be processed with Unicode-aware library functions. At the outlet, the data must be encoded into the format we want to store it in, using the encode() function with the target encoding as its argument. Characters that have the same representation in Unicode and in the target encoding, such as digits and English letters in an ASCII-compatible encoding like UTF-8, come out as the same bytes, so they can effectively be stored as-is.
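The inlet, pool, and outlet steps above can be sketched in Python 3 as one small function. The name transcode, the GBK source encoding, and the strip() processing step are all illustrative assumptions, not part of the original article.

```python
def transcode(raw: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Decode at the inlet, process as Unicode, encode at the outlet."""
    # Inlet: turn bytes into str using the encoding they were written in.
    text = raw.decode(source_encoding)
    # In the pool: work on Unicode text (here, trim surrounding spaces).
    text = text.strip()
    # Outlet: turn str back into bytes in the encoding we want to store.
    return text.encode(target_encoding)


gbk_bytes = " 中文 abc ".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(utf8_bytes)  # b'\xe4\xb8\xad\xe6\x96\x87 abc'

# The English letters are the same bytes before and after transcoding.
print(b"abc" in gbk_bytes, b"abc" in utf8_bytes)  # True True
```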

3. The encoding of Python source files

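By default, a Python 2 source file (xxx.py) is assumed to be ASCII-encoded; Python 3 assumes UTF-8 instead. The sys module's getdefaultencoding() function returns the interpreter's default encoding ('ascii' on Python 2, 'utf-8' on Python 3), which is used for implicit conversions between byte strings and Unicode strings; it does not describe the source file itself. To use a different encoding for a source file, put an encoding declaration on the first or second line of the file, for example:

# -*- coding: utf-8 -*-

Python 2 also has sys.setdefaultencoding(), but it only changes the default encoding for implicit string conversions, not the encoding of the source file. It is hidden at startup (it only becomes available after reload(sys)), its use is widely discouraged, and it no longer exists in Python 3.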
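As a Python 3 sketch, the snippet below builds a tiny source file in memory as GBK bytes with a coding declaration on line 1, then uses the standard tokenize.detect_encoding() to read that declaration and compile() to run the source. The GBK choice and the variable name word are illustrative assumptions.

```python
import io
import sys
import tokenize

# Python 3's default encoding for implicit str/bytes conversions.
print(sys.getdefaultencoding())  # utf-8

# Source code saved as GBK bytes, with a coding declaration on line 1.
source = "# -*- coding: gbk -*-\nword = '中文'\n".encode("gbk")

# tokenize.detect_encoding() reads the declaration the way the interpreter does.
declared, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
print(declared)  # gbk

# compile() accepts bytes and decodes them using the declared encoding.
namespace = {}
exec(compile(source, "<gbk-source>", "exec"), namespace)
print(namespace["word"])  # 中文

# Without a declaration, Python 3 assumes UTF-8.
plain, _ = tokenize.detect_encoding(io.BytesIO(b"x = 1\n").readline)
print(plain)  # utf-8
```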


This article is an English version of an article originally published in Chinese on aliyun.com.
