Understanding encoding in Python

Source: Internet
Author: User

Basic knowledge

A computer can process only the two digits 0 and 1, so all data (text, images, and so on) must be converted into sequences of 0s and 1s.

ASCII encoding

Computers were invented in the United States, so at first only 128 characters (codes 0 to 127) were defined: the Arabic numerals, the uppercase and lowercase letters, the symbols on the keyboard, and a set of control characters. This is known as ASCII encoding. For example, the ASCII code of A is 65; 65 is then converted to the binary 01000001, which is what the computer actually handles.
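As a small illustration of the chain from A to 65 to 01000001, the following sketch uses Python's built-in ord and format functions. The variable names are only illustrative.

```python
# Look up the numeric code of a character, then show it as 8 binary digits.
char = "A"
code = ord(char)                 # numeric code of the character
bits = format(code, "08b")       # zero-padded, 8-digit binary string
encoded = char.encode("ascii")   # the single byte that is stored for "A"

print(code, bits, encoded)
```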

Each country's own encoding

China developed the GB2312 encoding, which is compatible with ASCII, and other countries did the same for their own languages, for example Shift_JIS in Japan. The problem is that the same bytes mean different things in different encodings. Suppose, purely for illustration, that the values 61, 62, 63 stood for three Chinese characters in GB2312, for a, b, c in the ASCII table, and for ハロー in Shift_JIS. Chinese text encoded with GB2312 is stored in the computer as a string of 0s and 1s; if a Japanese user decodes those bits with Shift_JIS, the result is garbled Japanese text, or the bytes cannot be parsed at all.
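The mismatch described above can be reproduced with Python's real codecs. This is a minimal sketch: the sample text is an arbitrary choice, and errors="replace" is passed only so that bytes Shift_JIS cannot parse become replacement characters instead of raising an exception.

```python
# Encode Chinese text with GB2312, then decode the same bytes two ways.
text = "中国"
gb_bytes = text.encode("gb2312")

# GB2312 is ASCII-compatible: plain ASCII text yields identical bytes.
ascii_same = "abc".encode("gb2312") == "abc".encode("ascii")

right = gb_bytes.decode("gb2312")                       # matching codec
wrong = gb_bytes.decode("shift_jis", errors="replace")  # mismatched codec

print(ascii_same, right == text, wrong == text)
```

Only the decoder that matches the encoder recovers the original text; the mismatched one silently produces different characters.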

Unicode encoding

Later came Unicode, which assigns a unique number to the characters of almost all of the world's languages. Because everyone uses the same character set to encode and decode, everyone ends up with the correct text.
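To make the idea of one shared numbering concrete, the sketch below prints the Unicode code point of a Latin, a Chinese, and a Japanese character using the built-in ord, and maps a number back with chr. The sample characters are an arbitrary choice.

```python
# Every character has exactly one Unicode code point, whatever the language.
sample = "A中ハ"
code_points = [ord(ch) for ch in sample]

for cp in code_points:
    print(f"U+{cp:04X}")

# chr() is the inverse of ord(): it maps a code point back to its character.
round_trip = "".join(chr(cp) for cp in code_points)
```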

UTF-8 encoding

However, a fixed-width Unicode encoding that stores every character in 16 bits wastes space on text that is mostly ASCII. UTF-8 was introduced to address this: it is a variable-length encoding that stores ASCII characters in a single 8-bit byte and most Chinese characters in three bytes, which greatly reduces the wasted space.
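The space difference can be checked directly by counting bytes. In this sketch, byte_lengths is a hypothetical helper, and "utf-16-le" stands in for a 16-bit encoding (the little-endian variant is used so that no byte-order mark is added to the count).

```python
def byte_lengths(ch):
    """Return (bytes in UTF-8, bytes in UTF-16-LE) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in "a中":
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"U+{ord(ch):04X}: utf-8={utf8_len} bytes, utf-16={utf16_len} bytes")
```

An ASCII letter takes one byte in UTF-8, while this Chinese character takes three, so the saving applies mainly to text dominated by ASCII characters.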

Text storage in the computer

So text in Notepad (Chinese, for example) is held in memory as Unicode and is encoded as UTF-8 when it is saved to disk. When we open the file, it is decoded from UTF-8 back to Unicode, and from Unicode it can then be converted to a national encoding if one is required.
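The same save and open cycle can be sketched in Python by passing an explicit encoding to open. The file name note.txt and the temporary directory are assumptions made only for this example.

```python
import os
import tempfile

text = "中国a"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")  # hypothetical file name

    # Saving: the in-memory Unicode string is encoded as UTF-8 on disk.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # The raw bytes that were actually written to the file.
    with open(path, "rb") as f:
        raw = f.read()

    # Opening: the UTF-8 bytes are decoded back into a Unicode string.
    with open(path, "r", encoding="utf-8") as f:
        restored = f.read()

print(len(raw), restored == text)
```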

Transmission of text between networks

After the Unicode data on the server is read out, it is converted to UTF-8 (which saves bandwidth) and transmitted to the browser.
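A rough sketch of that flow, assuming a local socket pair in place of a real server and browser: send_text and receive_text are hypothetical helpers, and the receiving side is assumed to know that the payload is UTF-8.

```python
import socket

def send_text(sock, text):
    """Encode a Unicode string as UTF-8 and send the bytes."""
    sock.sendall(text.encode("utf-8"))

def receive_text(sock):
    """Read bytes until the sender closes, then decode them as UTF-8."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks).decode("utf-8")

# A connected pair of sockets stands in for the server and the browser.
server, browser = socket.socketpair()
send_text(server, "中国a")
server.close()                      # signals end of data to the receiver
received = receive_text(browser)
browser.close()

print(received == "中国a")
```

Decoding only after all chunks have been joined avoids splitting a multi-byte character across two reads.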

Python3

In Python 3, strings (str) are Unicode by default, so Python 3 handles text in multiple languages;

A str can be encoded into bytes in a specified encoding with encode().

When bytes are displayed, any byte that is not a printable ASCII character is shown as an escape of the form \x##. Such bytes are turned back into a str with b'\x##'.decode('corresponding encoding').
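The following sketch shows encode and decode on a short string, including what happens when the target encoding cannot represent a character. The sample text is arbitrary; errors="backslashreplace" is one of the standard error handlers and is used here only to show an alternative to raising an exception.

```python
text = "中a"

# ASCII cannot represent the Chinese character, so a strict encode fails.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# An error handler can substitute an escape sequence instead of raising.
escaped = text.encode("ascii", errors="backslashreplace")

# UTF-8 can represent every character; non-ASCII bytes display as \x##.
utf8_bytes = text.encode("utf-8")
restored = utf8_bytes.decode("utf-8")

print(ascii_failed, escaped, utf8_bytes, restored == text)
```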

>>> import chardet
>>> s = '中国a'
>>> # UTF-8 encoding
>>> s.encode('utf-8')
b'\xe4\xb8\xad\xe5\x9b\xbda'
>>> chardet.detect(s.encode('utf-8'))
{'encoding': 'utf-8', 'confidence': 0.7525, 'language': ''}
>>> s.encode('utf-8').decode('gbk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 2: illegal multibyte sequence
>>> s.encode('utf-8').decode('utf-8')
'中国a'
>>> # GB2312/GBK encoding
>>> s.encode('gb2312')
b'\xd6\xd0\xb9\xfaa'
>>> s.encode('gbk')
b'\xd6\xd0\xb9\xfaa'
>>> # The two long string literals below were Chinese sentences in the original session.
>>> chardet.detect('China I love you ah ah ah ah ah haha haha ah haha haha'.encode('gb2312'))
{'encoding': 'IBM855', 'confidence': 0.3697632002333717, 'language': 'Russian'}
>>> chardet.detect(s.encode('gbk'))
{'encoding': 'IBM855', 'confidence': 0.6143757788492946, 'language': 'Russian'}
>>> # chardet detects the encoding correctly only if the text has a certain length and complexity
>>> chardet.detect("China I love you ah ah ah ah oh haha haha i'm a little bird".encode('gb2312'))
{'encoding': 'GB2312', 'confidence': 0.7142857142857143, 'language': 'Chinese'}
>>> s.encode('gbk').decode('IBM855')
'Ол╣щa'
>>> s.encode('gbk').decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 0: invalid continuation byte
>>> s.encode('gbk').decode('gbk')
'中国a'
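Since detection can be unreliable on short input, one alternative is to try a short list of likely encodings in order. The sketch below uses only the standard library and is not part of chardet; decode_with_fallback and its default candidate list are assumptions made for illustration.

```python
def decode_with_fallback(data, candidates=("utf-8", "gbk")):
    """Try each candidate encoding in order.

    Returns (decoded text, name of the encoding that worked).
    Raises ValueError if no candidate can decode the data.
    """
    for name in candidates:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes; try the next one
    raise ValueError("none of the candidate encodings could decode the data")

from_gbk = decode_with_fallback("中国a".encode("gbk"))
from_utf8 = decode_with_fallback("中国a".encode("utf-8"))
print(from_gbk[1], from_utf8[1])
```

The order of the candidates matters: a permissive codec placed first may accept bytes that were written in a different encoding and return the wrong text without raising an error.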
