Encoding rules in Python

Source: Internet
Author: User


Note: I am learning with Python 3.4, so the notes below apply only to Python 3.4.

I read Alex kingdom's answer about encoding and learned a lot from it, for which I am grateful. What follows is my understanding of encoding as a beginner.

The encodings I know of fall roughly into two groups: those that support Chinese characters and those that support English characters. Character sets for other languages are not discussed here.

Common encodings: ASCII; Unicode; UTF-8; Big5; GB2312; GBK; GB18030

Next, I will classify these encodings:

Only English and special characters are supported: ASCII

ASCII is a character encoding based on the Latin alphabet, used mainly for modern English. It defines 128 characters using 7 bits, and each character is stored in 1 byte, that is, one English character takes an 8-bit binary number.
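As a small illustration of the one-byte-per-character rule, the sketch below encodes an English string with the built-in "ascii" codec. The variable names are chosen only for this example.

```python
# Every ASCII character maps to a value below 128, so it fits in one byte.
text = "Hello!"
encoded = text.encode("ascii")
print(encoded)                 # b'Hello!'
print(len(encoded))            # 6 characters -> 6 bytes
print(max(encoded) < 128)      # True

# A Chinese character is outside ASCII and cannot be encoded with it.
ascii_error = None
try:
    "你".encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc
    print("cannot encode:", exc.reason)
```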

Chinese characters and special characters are supported (alongside ASCII): Big5, GB2312, GBK, and GB18030.

Big5 is used primarily for Traditional Chinese, while GB2312, GBK, and GB18030 are used primarily for Simplified Chinese.

The number of Chinese characters supported by GB2312, GBK, and GB18030 increases in that order. GB2312 and GBK use 2 bytes to represent a Chinese character or a special symbol, that is, a 16-bit binary number. GB18030 uses the same 2 bytes for those characters and 4 bytes for the additional characters it covers.

Note that the Chinese version of Windows currently uses GBK as its default character set.
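The byte counts can be checked with Python's built-in codecs. This is only a sketch; the sample characters and variable names are chosen for illustration.

```python
# Encode one common Chinese character with each Chinese codec
# and record how many bytes it takes.
char = "你"
sizes = {}
for codec in ("gb2312", "gbk", "gb18030", "big5"):
    data = char.encode(codec)
    sizes[codec] = len(data)
    print(codec, data, len(data))

# A character outside GBK (U+20000, CJK Extension B) needs 4 bytes in GB18030.
rare = "\U00020000"
rare_bytes = rare.encode("gb18030")
print(len(rare_bytes))  # 4
```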

Chinese, English, and special characters are all supported: Unicode; UTF-8

Unicode is a character set that covers Chinese, English, and most other writing systems, and assigns each character a numeric code point. It is often described as using 2 bytes (a 16-bit binary number) per character. That holds for most common characters in the UTF-16 encoding, but characters outside the Basic Multilingual Plane need 4 bytes.

UTF-8 is a variable-length encoding of Unicode rather than an upgraded version of it. An English character takes 1 byte, that is, an 8-bit binary number. A Chinese character usually takes 3 bytes, that is, a 24-bit binary number. In general a character takes 1 to 4 bytes.

With this basic understanding of the common character sets, we can convert between them in Python 3.4. Note that Python 3.4 reports UTF-8 as its default encoding (so why do online sources say the default is Unicode?).
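The difference between a Unicode code point and its encoded bytes can be seen in a short sketch. The sample characters here are chosen for illustration.

```python
char = "你"

# The Unicode code point is a number, independent of any byte encoding.
code_point = ord(char)
print(hex(code_point))            # 0x4f60

# The same character takes a different number of bytes in each encoding.
utf8_ascii = "A".encode("utf-8")
utf8_cjk = char.encode("utf-8")
utf16_cjk = char.encode("utf-16-le")
utf8_emoji = "\U0001F600".encode("utf-8")

print(len(utf8_ascii))   # 1
print(len(utf8_cjk))     # 3
print(len(utf16_cjk))    # 2
print(len(utf8_emoji))   # 4
```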

The following program checks this:

    # Author: Lucas
    import sys
    print(sys.getdefaultencoding())

Program output:

    utf-8
    Process finished with exit code 0
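One way to see what this reported default affects, as a sketch: in Python 3, calling encode() with no argument uses UTF-8, so it gives the same bytes as naming UTF-8 explicitly.

```python
import sys

text = "你"
default_encoded = text.encode()          # no codec given
explicit_encoded = text.encode("utf-8")  # codec named explicitly

print(sys.getdefaultencoding())
print(default_encoded == explicit_encoded)  # True
```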

When converting between encodings in Python, data that is not already Unicode must first be decoded to Unicode with decode, and then encoded with encode to get the expected encoding.

Note that decode must be told which encoding the data is currently in, so that it can convert it to Unicode. For example, to convert UTF-8 encoded data to Unicode, write:

    decode("UTF-8")

Next we will verify the encoding conversion:
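A minimal sketch of why the codec passed to decode has to match the data: decode is called on a bytes object, and naming the wrong encoding either fails or produces the wrong text. The byte values are those of the character 你 in UTF-8 and GBK.

```python
utf8_bytes = b"\xe4\xbd\xa0"   # "你" encoded as UTF-8
gbk_bytes = b"\xc4\xe3"        # "你" encoded as GBK

# Naming the correct source encoding recovers the text.
print(utf8_bytes.decode("utf-8") == "你")  # True
print(gbk_bytes.decode("gbk") == "你")     # True

# Naming the wrong one fails: these GBK bytes are not valid UTF-8.
decode_error = None
try:
    gbk_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc
    print("cannot decode:", exc.reason)
```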

    # Author: Lucas
    # convert test to gbk encoding
    import sys
    print(sys.getdefaultencoding())
    s = "s_"
    s_to_unicode = s.decode("UTF-8")

This is embarrassing. The program reports an error:

    s_to_unicode=s.decode("utf-8")
    AttributeError: 'str' object has no attribute 'decode'
    utf-8

From this I infer that the "default Unicode encoding" mentioned online refers to how the Python interpreter represents strings while the program runs: a str is already Unicode, so it has nothing to decode. UTF-8, in turn, is the format in which the program file is saved.

Next, convert test to GBK based on this assumption:
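The error above can be explored with a small sketch: in Python 3, str (Unicode text) only has encode, and bytes (encoded data) only has decode. The variable names are illustrative.

```python
text = "你"                      # str: Unicode text
data = text.encode("utf-8")      # bytes: encoded data

print(hasattr(text, "decode"))   # False, which explains the AttributeError
print(hasattr(text, "encode"))   # True
print(hasattr(data, "decode"))   # True
print(hasattr(data, "encode"))   # False

# Decoding the bytes gives back the original text.
print(data.decode("utf-8") == text)  # True
```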

    # Author: Lucas
    # convert test to gbk encoding
    import sys
    print(sys.getdefaultencoding())
    test = "你"
    test_to_gbk = test.encode("gbk")
    print(test_to_gbk)

The output is as follows:

    utf-8
    b'\xc4\xe3'
    Process finished with exit code 0

This confirms that Python 3 strings are Unicode while the program runs: test could be encoded directly, with no decode step. We have successfully converted test to GBK, where it occupies 2 bytes (a 16-bit binary number): c4 and e3.

Next we will convert the GBK result to UTF-8, which requires decoding and then encoding. The program is as follows:

    # Author: Lucas
    # convert test to gbk encoding, then from gbk to UTF-8
    import sys
    print(sys.getdefaultencoding())
    test = "你"
    test_to_gbk = test.encode("gbk")
    print(test_to_gbk)
    # decode the gbk bytes to Unicode, then encode them as UTF-8
    test_to_gbk_utf8 = test_to_gbk.decode("gbk").encode("UTF-8")
    print(test_to_gbk_utf8)

The output is as follows:

    utf-8
    b'\xc4\xe3'
    b'\xe4\xbd\xa0'
    Process finished with exit code 0

We can see that test has been converted to UTF-8, where it occupies 3 bytes (a 24-bit binary number): e4, bd, and a0.

At this point we are familiar with the encoding and conversion rules in Python. Conversions between other encodings follow the same procedure.

Summary:

Before converting between encodings, we need to know how the current version of the Python interpreter represents strings while the program runs. From there, remember that Unicode is the intermediate form. To convert from one encoding to another, first decode the data to Unicode (decode), then encode it into the target encoding (encode). With Unicode kept in mind as the middle step, conversions become straightforward.
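The decode-then-encode rule can be wrapped in a small helper. This is a sketch only: transcode is a hypothetical name introduced for this example, not a standard library function.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another via Unicode (str)."""
    text = data.decode(source)   # bytes in the source encoding -> Unicode
    return text.encode(target)   # Unicode -> bytes in the target encoding


gbk_bytes = "你".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")
print(gbk_bytes)    # b'\xc4\xe3'
print(utf8_bytes)   # b'\xe4\xbd\xa0'

# Converting back gives the original GBK bytes.
round_trip = transcode(utf8_bytes, "utf-8", "gbk")
print(round_trip == gbk_bytes)  # True
```

The helper makes the intermediate Unicode step explicit, which is the point of the summary: no encoding is converted directly into another.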

