Little white Python path day1 character encoding

Source: Internet
Author: User

character encoding

The Python interpreter encodes the content when it loads the code in the. py file (default Ascill)

ASCII (American Standard Code for Information interchange, United States Standards Information Interchange Code) is a set of computer coding systems based on the Latin alphabet, mainly used to display modern English and other Western European languages, which can be used up to 8 Bit to represent (one byte), that is: 2**8 = 256-1, so the ASCII code can only represent a maximum of 255 symbols.

About Chinese

To deal with Chinese characters, programmers designed GB2312 for Simplified Chinese and big5 for traditional Chinese.

GB2312 (1980) included a total of 7,445 characters, including 6,763 Kanji and 682 other symbols. The inner code range of the Chinese character area is high byte from B0-f7, low byte from A1-fe, occupy code bit is 72*94=6768. 5 of these seats are d7fa-d7fe.

GB2312 supports too few Chinese characters. The 1995 Chinese character extension Specification GBK1.0 contains 21,886 symbols, which are divided into Chinese characters and graphic symbol areas. The Chinese character area consists of 21,003 characters. The 2000 GB18030 is the official national standard for replacing GBK1.0. The standard contains 27,484 Chinese characters, as well as Tibetan, Mongolian, Uyghur and other major minority characters. Now the PC platform must support GB18030, the embedded products are not required. So mobile phones, MP3 generally only support GB2312.

From ASCII, GB2312, GBK to GB18030, these coding methods are backwards compatible, meaning that the same character always has the same encoding in these scenarios, and the latter standard supports more characters. In these codes, English and Chinese can be handled in a unified manner. The method of distinguishing Chinese encoding is that the highest bit of high byte is not 0. According to the programmer, GB2312, GBK, and GB18030 belong to the double-byte character set (DBCS).

Some Chinese Windows default internal code or GBK, you can upgrade to GB18030 through the GB18030 upgrade package. But GB18030 relative GBK increases the character, the ordinary person is difficult to use, usually we still use the GBK to refer to the Chinese Windows inside code.

It is clear that the ASCII code cannot represent all the words and symbols in the world, so it is necessary to create a new encoding that can represent all the characters and symbols, namely:Unicode

Unicode (Uniform Code, universal Code, single code) is a character encoding used on a computer. Unicode is created to address the limitations of the traditional character encoding scheme, which sets a uniform and unique binary encoding for each character in each language, which specifies that characters and symbols are represented by at least 16 bits (2 bytes), that is: 2 **16 = 65536,
Note: Here is a minimum of 2 bytes, possibly more

UTF-8, which is compression and optimization of Unicode encoding, does not use a minimum of 2 bytes, but instead classifies all characters and symbols:The contents of the ASCII code are saved with 1 bytes, the characters in Europe are saved in 2 bytes, and the characters in East Asia are saved in 3 bytes ...

ASCII 255 1betys

-->1980 gb2312 7K Multi

-->1995 GBK1.0 2W Multi

-->2000 GB18030 27K Multi

-->unicode 2bytes

--->utf-8 en:1bytes, zh:3bytes

Therefore, when the code in the. py file is loaded by the Python interpreter 2.7, the content is encoded (default Ascill), if the following code is the case:

Error: ASCII code cannot be expressed in Chinese

123 #!/usr/bin/env python print"你好,世界"

Correction: It should be shown to tell the Python interpreter what code to use to execute the source code, i.e.:

1234 #!/usr/bin/env python# -*- coding: utf-8 -*- print"你好,世界"
now Python3.x has supported Chinese Notes

When line comment: # is commented content

Multiline Comment: "" "Annotated Content" ""

Little white Python path day1 character encoding

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.