On the encoding rules in Python

Source: Internet
Author: User

Note: I am learning with Python 3.4, so the notes below apply only to Python 3.4.

Earlier I read Golden Horn King Alex's explanation of encodings and learned a lot from it. Here I write down my own understanding of encodings as a beginner.

Broadly, the encodings I know fall into two groups: encodings that support Chinese and encodings that support English. Encodings for other countries' languages are not discussed here.

Common encodings: ASCII; Unicode; UTF-8; Big5; GB2312; GBK; GB18030

Next, I classify the above encodings:

Supports only English and special characters: ASCII

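ASCII is a character encoding based on the Latin alphabet, mainly used for modern English (8-bit extensions of it, such as ISO 8859-1, add characters for other Western European languages). ASCII defines 128 characters with 7-bit codes, and each character is stored in one 8-bit byte, so an English character needs one byte, that is, an 8-bit binary number.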
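A quick way to check this is to encode some English text as ASCII and look at the byte count and the code values. This is a minimal sketch; the sample strings "Hello" and "你" are arbitrary choices for illustration, and format() is used instead of f-strings so it also runs on Python 3.4.

```python
# Every ASCII character has a code below 128, so it fits in 7 bits
# and is stored in a single 8-bit byte.
text = "Hello"
data = text.encode("ascii")

print(data)                    # b'Hello'
print(len(data))               # 5: one byte per character
print(format(data[0], "08b"))  # 01001000: 'H' (code 72) as 8 binary digits

# Characters outside ASCII cannot be encoded with it.
try:
    "你".encode("ascii")
except UnicodeEncodeError as exc:
    print("not ASCII:", exc.reason)
```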

Designed for Chinese and special characters (they still store ASCII characters as single bytes): Big5; GB2312; GBK; GB18030

Big5 is used to store Traditional Chinese, while GB2312, GBK and GB18030 are used to store Simplified Chinese.

GB2312, GBK and GB18030 each support more Chinese characters than the one before. They use 2 bytes for a common Chinese character or special symbol, that is, a Chinese character in these three encodings needs a 16-bit binary number. (GB18030 additionally uses 4-byte sequences for characters that GBK lacks.)
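The sketch below, assuming the standard codecs that ship with CPython, shows both points: a common character takes 2 bytes in all three GB encodings, and GB18030 can store a character the other two cannot. The emoji U+1F600 is just an illustrative example of such a character.

```python
# "你" is a common character, so all three GB encodings store it in 2 bytes.
for codec in ("gb2312", "gbk", "gb18030"):
    data = "你".encode(codec)
    print(codec, data, len(data))   # b'\xc4\xe3' and 2 for each codec

# GB18030 covers characters missing from GB2312 and GBK, such as this emoji,
# which it stores as a 4-byte sequence.
emoji = "\U0001F600"
for codec in ("gb2312", "gbk", "gb18030"):
    try:
        print(codec, len(emoji.encode(codec)), "bytes")
    except UnicodeEncodeError:
        print(codec, "cannot encode this character")
```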

Note that the Chinese edition of Windows uses GBK as its default encoding.
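To see which encoding your own system prefers, you can ask the locale module. This is a sketch; the printed value depends on the machine. On a Simplified Chinese edition of Windows it is typically 'cp936', Microsoft's code page number for GBK, which Python treats as an alias of its gbk codec.

```python
import locale

# The encoding the OS locale prefers for text files and console I/O.
# Typically 'cp936' on Simplified Chinese Windows; often 'UTF-8' on Linux/macOS.
print(locale.getpreferredencoding())

# Python's 'cp936' codec name is an alias for 'gbk', so both give the same bytes.
print("你".encode("cp936"))   # b'\xc4\xe3'
```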

Supports Chinese, English and special characters: Unicode; UTF-8

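Unicode can represent both Chinese and English. It is commonly described as using 2 bytes per character, that is, a 16-bit binary number per character. (Strictly speaking, Unicode assigns each character a number called a code point; the 2-byte storage form is UTF-16, which needs 4 bytes for characters above U+FFFF.)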
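The 2-byte description can be checked with UTF-16. This sketch uses the big-endian variant "utf-16-be" so that Python does not prepend a byte order mark; the sample characters are illustrative.

```python
ch = "你"
print(hex(ord(ch)))                  # 0x4f60: the Unicode code point of 你
print(ch.encode("utf-16-be"))        # b'O`', i.e. the two bytes 0x4F 0x60
print(len("A".encode("utf-16-be")))  # 2: English letters also take 2 bytes

# Code points above U+FFFF do not fit in 16 bits; UTF-16 then uses
# 4 bytes (a surrogate pair).
print(len("\U0001F600".encode("utf-16-be")))  # 4
```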

UTF-8 can be loosely understood as an upgraded version of Unicode; more precisely, it is a variable-length way of storing Unicode characters. An English character takes 1 byte, an 8-bit binary number, while a common Chinese character takes 3 bytes, a 24-bit binary number.

With this basic picture of the common encodings, we can convert between them in Python 3.4. Note that Python 3.4 reports UTF-8 as its default encoding (so why do sources online say that it is Unicode?).

The following code checks this:

    # Author: lucas
    import sys

    # print the interpreter's default string encoding
    print(sys.getdefaultencoding())

Program output:

    utf-8

    Process finished with exit code 0

When converting encodings in Python, if the data is not in Unicode, we first decode it to Unicode with decode(), and then encode it into the desired encoding with encode().
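The two-step rule can be wrapped in a small function. This is a sketch; transcode is a hypothetical helper name, not part of Python's standard library, and the sample text "你好" is illustrative.

```python
def transcode(data, src, dst):
    """Hypothetical helper: convert bytes in encoding `src` to bytes in `dst`."""
    text = data.decode(src)   # step 1: bytes in `src` -> str (Unicode)
    return text.encode(dst)   # step 2: str (Unicode) -> bytes in `dst`

gbk_bytes = "你好".encode("gbk")
print(gbk_bytes)                               # b'\xc4\xe3\xba\xc3'
print(transcode(gbk_bytes, "gbk", "utf-8"))    # b'\xe4\xbd\xa0\xe5\xa5\xbd'
```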

Note that when calling decode(), we must tell the computer which encoding the data is currently in, so that it can be converted to Unicode. For example, to convert UTF-8 encoded data to Unicode:

    .decode("utf-8")
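Naming the wrong source encoding makes decoding fail (or produce garbage). In this sketch the GBK bytes of "你" are decoded once with the correct codec and once, deliberately, with UTF-8.

```python
gbk_bytes = b"\xc4\xe3"              # "你" encoded as GBK

print(gbk_bytes.decode("gbk"))       # 你: correct source encoding

try:
    gbk_bytes.decode("utf-8")        # wrong source encoding
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```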

Next, let's verify the conversion:

    # Author: lucas
    # try to decode the string to Unicode
    import sys
    print(sys.getdefaultencoding())
    s = "你"
    s_to_unicode = s.decode("utf-8")

Awkwardly, the program fails with an error:

        s_to_unicode = s.decode("utf-8")
    AttributeError: 'str' object has no attribute 'decode'

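My guess is that when sources online say Python's default encoding is Unicode, they mean that the interpreter holds text as Unicode while it runs the program, whereas UTF-8 is the format the program file is saved in. This also explains the error: in Python 3 a str is already Unicode text and has no decode() method; only bytes objects can be decoded.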
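The str/bytes split behind the error can be checked directly. This sketch only inspects types and methods. It also fits the guess about file format: since PEP 3120, Python 3 reads source files as UTF-8 by default.

```python
s = "你"
print(type(s))                 # <class 'str'>: already Unicode text
print(hasattr(s, "decode"))    # False: str has no decode() in Python 3

b = s.encode("utf-8")          # str -> bytes
print(type(b))                 # <class 'bytes'>
print(b.decode("utf-8") == s)  # True: bytes -> str via decode()
```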

Next, let's convert the string in test to GBK:

    # Author: lucas
    # convert test to GBK encoding
    import sys
    print(sys.getdefaultencoding())
    test = "你"
    test_to_gbk = test.encode("gbk")
    print(test_to_gbk)

The output is:

    utf-8
    b'\xc4\xe3'

    Process finished with exit code 0

As you can see, while Python 3 runs the program, the string really is held as Unicode: we converted test to GBK directly, without decoding it first. In GBK, test takes 2 bytes, a 16-bit binary number: C4 and E3.
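To see the 16 bits themselves, each byte can be printed in hex and in binary. A small sketch using format(), so it also runs on Python 3.4.

```python
data = "你".encode("gbk")

hex_view = " ".join(format(b, "02X") for b in data)
bits = " ".join(format(b, "08b") for b in data)

print(hex_view)             # C4 E3
print(bits)                 # 11000100 11100011
print(len(data) * 8, "bits")  # 16 bits
```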

Next we convert the GBK bytes into UTF-8, which takes a decode step followed by an encode step:

    # Author: lucas
    # convert test to GBK, then from GBK to UTF-8
    import sys
    print(sys.getdefaultencoding())
    test = "你"
    test_to_gbk = test.encode("gbk")
    print(test_to_gbk)
    # decode the GBK bytes to Unicode first, then encode as UTF-8
    test_to_gbk_utf8 = test_to_gbk.decode("gbk").encode("utf-8")
    print(test_to_gbk_utf8)

The output is:

    utf-8
    b'\xc4\xe3'
    b'\xe4\xbd\xa0'

    Process finished with exit code 0

The text has now been converted to UTF-8, where test takes 3 bytes, a 24-bit binary number: E4, BD and A0.
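Where do E4, BD and A0 come from? UTF-8 spreads the bits of the code point (U+4F60 for "你") over a fixed byte pattern. The hand-rolled encoder below is a sketch for illustration only: utf8_encode is a hypothetical name, and unlike Python's built-in codec it does not reject surrogate code points.

```python
def utf8_encode(cp):
    """Hypothetical helper: UTF-8 bytes for one code point (no validation)."""
    if cp < 0x80:                          # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                         # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                       # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),       # 4 bytes: 11110xxx + three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

cp = ord("你")                                   # 0x4F60
print(utf8_encode(cp))                          # b'\xe4\xbd\xa0'
print(utf8_encode(cp) == "你".encode("utf-8"))  # True
```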

At this point we are familiar with the basic rules for converting encodings in Python; conversions between other encodings follow the same process.
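For example, once the GBK bytes have been decoded, the same Unicode text can be encoded into any of the other encodings discussed above. A sketch with the illustrative sample "你好"; each result is decoded again to check the round trip.

```python
gbk_bytes = "你好".encode("gbk")
text = gbk_bytes.decode("gbk")        # always go through Unicode first

for target in ("utf-8", "gb18030", "big5", "utf-16-be"):
    converted = text.encode(target)
    # Decoding with the same codec must give back the original text.
    assert converted.decode(target) == text
    print(target, converted)
```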

Summary:

When converting encodings, first know how the current Python interpreter represents text while it runs the program. On that basis, remember that Unicode is the intermediate encoding: to convert between any two other encodings, first decode to Unicode, then encode into the target encoding. Keep Unicode in mind and encoding conversion becomes straightforward.
