About Python character encodings: encode and decode

Last Update:2017-09-24 Source: Internet

Author: User

(Note: Part of this article draws on material from the Internet. The author's knowledge is limited, so there are bound to be shortcomings; comments and corrections are welcome.)

A few days ago, a colleague in my department asked me why the Chinese text her Python program printed came out garbled. I walked over, thought about it for a moment, and fixed it quickly; it turned out to be a character encoding conversion problem. She looked impressed, which got me thinking that encoding is something every developer ought to have a handle on. Plenty of people fall into the same trap, so although I am no expert, I have written down my understanding here, combined with information found online.

Note: In Python 3, strings (str) are Unicode and the default source encoding is UTF-8, while Python 2 defaults to ASCII. On Windows with a Simplified Chinese locale, the default encoding is GBK.

Common causes of encoding errors:

1. Default encoding of the Python interpreter

2. Python source file encoding

3. Encoding used by the terminal

4. Language settings for the operating system
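The four causes above can be checked from inside the interpreter. The following is a minimal sketch for Python 3; the helper name `encoding_report` is made up for illustration, and the values it returns depend on the machine it runs on. The source file encoding (cause 2) is a property of each file and is not covered here.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings that commonly disagree when output is garbled."""
    return {
        # Default codec for str.encode() and bytes.decode() when none is given.
        "interpreter_default": sys.getdefaultencoding(),
        # Codec used when text is written to the terminal (may be None if redirected).
        "stdout": getattr(sys.stdout, "encoding", None),
        # Codec used to convert file names.
        "filesystem": sys.getfilesystemencoding(),
        # Codec suggested by the operating system's language settings.
        "locale_preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

If the terminal's encoding and the encoding of the bytes being printed differ, garbled output is the likely result.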

First, the types of encoding

I, ASCII: 1 byte per character; only English letters, digits, and symbols are supported.

II, GB2312: 2 bytes per Chinese character; supports 6,700+ Chinese characters.

III, GBK: an extension of GB2312; supports 21,000+ Chinese characters, each taking 2 bytes.

IV, Unicode: a universal character set that already includes 136,690 characters (Unicode 10.0). How many bytes a character takes depends on the encoding form used, such as UTF-8 or UTF-16 below.

V, UTF-8: uses 1, 2, 3, or 4 bytes to represent all characters, preferring 1 byte and adding a byte whenever that is not enough, up to 4 bytes. English letters take 1 byte, most European-language characters take 2, East Asian characters take 3 (so a Chinese character takes 3 bytes), and other special characters take 4.

VI, UTF-16: uses 2 or 4 bytes to represent all characters; 2 bytes are preferred, and 4 bytes are used otherwise.
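The byte counts listed above can be checked directly in Python 3. This is a small sketch; the sample characters and the helper names `encoded_length` and `size_table` are chosen for illustration. `utf-16-le` is used instead of `utf-16` so that the 2-byte byte order mark is not counted.

```python
SAMPLES = ["A", "é", "中", "😀"]
CODECS = ["ascii", "gb2312", "gbk", "utf-8", "utf-16-le"]


def encoded_length(char, codec):
    """Return how many bytes char takes in codec, or None if the codec cannot represent it."""
    try:
        return len(char.encode(codec))
    except UnicodeEncodeError:
        return None


def size_table():
    """Map each sample character to its byte length in each codec."""
    return {
        char: {codec: encoded_length(char, codec) for codec in CODECS}
        for char in SAMPLES
    }


if __name__ == "__main__":
    for char, sizes in size_table().items():
        print(repr(char), sizes)
```

A value of None means the character does not exist in that encoding, which is exactly the situation that raises UnicodeEncodeError in real programs.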

Second, how Python 3 executes code

1. The interpreter finds the source file, decodes it using the encoding declared in the file header (UTF-8 if none is declared), and loads it into memory as Unicode.

2. It parses the code string according to the grammar rules.

3. All string variables are held as Unicode.

Python 3 automatically converts the file's encoding to Unicode when it loads the code into memory. Python 2 does not, so manual transcoding is required.
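The loading step can be imitated in Python 3 with an in-memory "file". In this sketch the source bytes, the variable name `text`, and both helper functions are hypothetical; `tokenize.detect_encoding` reads the header declaration, and `compile` decodes the bytes accordingly.

```python
import io
import tokenize

# Hypothetical source file stored as GBK bytes, with a coding declaration in its header.
SOURCE_BYTES = "# -*- coding: gbk -*-\ntext = '编码'\n".encode("gbk")


def declared_encoding(source_bytes):
    """Return the encoding named in the file header (UTF-8 is assumed if none is found)."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


def run_source(source_bytes):
    """Compile and execute the bytes; string literals end up as Unicode str objects."""
    namespace = {}
    exec(compile(source_bytes, "<demo>", "exec"), namespace)
    return namespace


if __name__ == "__main__":
    print(declared_encoding(SOURCE_BYTES))
    print(type(run_source(SOURCE_BYTES)["text"]))
```

Although the file is stored as GBK, the string it defines is an ordinary Unicode str once loaded, which is why Python 3 code rarely needs manual transcoding of its own literals.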

Third, manual transcoding rules

UTF-8 -- decode --> Unicode

Unicode -- encode --> GBK/UTF-8

Use type() to check which form a value is in. In Python 2, Unicode text is 'unicode' and GBK or UTF-8 data is 'str'. In Python 3, Unicode text is 'str' and GBK or UTF-8 data is 'bytes'.
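The two rules combine into one pattern: decode to Unicode first, then encode to the target. Below is a minimal Python 3 sketch; the function name `transcode` is an illustrative helper, not a standard library call.

```python
def transcode(data, source_encoding, target_encoding):
    """Convert bytes from one encoding to another by way of a Unicode str."""
    text = data.decode(source_encoding)   # bytes -> Unicode
    return text.encode(target_encoding)   # Unicode -> bytes


gbk_bytes = "编码".encode("gbk")
utf8_bytes = transcode(gbk_bytes, "gbk", "utf-8")

if __name__ == "__main__":
    # The intermediate value is str; the result is bytes.
    print(type(gbk_bytes.decode("gbk")).__name__, type(utf8_bytes).__name__)
```

Going directly from one byte encoding to another is not possible; Unicode is always the intermediate step.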

Examples:

Python 2, where ASCII is the default

    # coding=utf-8
    # ASCII is the default in Python 2, so a UTF-8 declaration is usually added
    a = '编码'                              # a is a UTF-8 byte string (str)
    b = a.decode('utf-8')                   # b is unicode
    c = b.encode('gbk')                     # c is a GBK byte string (str)
    d = c.decode('gbk').encode('utf-8')     # decode c to unicode first, then encode to UTF-8
    print a, b, c, d
    print type(a), type(b), type(c), type(d)

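Output results: the types printed are <type 'str'>, <type 'unicode'>, <type 'str'>, and <type 'str'>; how the four values display depends on the terminal's encoding.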

Python 3, where Unicode is the default

    a = "编码"                           # a is a Unicode string (str)
    b = a.encode("utf-8")                # b is UTF-8 bytes
    c = a.encode("gbk")                  # c is GBK bytes
    print(a, b, c)
    print(type(a), type(b), type(c))     # in Python 3, str is Unicode by default

Output results: the types printed are <class 'str'>, <class 'bytes'>, and <class 'bytes'>.

Windows, where GBK is the default (Python 2 interactive session)

    >>> a = '编码'
    >>> b = a.decode('gbk')      # Windows defaults to GBK, so decode to unicode first
    >>> c = b.encode('utf-8')    # convert unicode to UTF-8
    >>> a
    '\xb1\xe0\xc2\xeb'
    >>> b
    u'\u7f16\u7801'
    >>> c
    '\xe7\xbc\x96\xe7\xa0\x81'
    >>> print(a, b, c)
    ('\xb1\xe0\xc2\xeb', u'\u7f16\u7801', '\xe7\xbc\x96\xe7\xa0\x81')
    >>> type(a)
    <type 'str'>
    >>> type(b)
    <type 'unicode'>
    >>> type(c)
    <type 'str'>
    >>>
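Garbled output like the kind described at the start of this article usually means that bytes were decoded with a codec other than the one that produced them. The following Python 3 sketch shows both ways this can go wrong; the variable names are illustrative, and `errors="replace"` is used only so the first case cannot raise.

```python
text = "编码"
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# UTF-8 bytes read as GBK: the bytes map to different characters, so the text is garbled.
garbled = utf8_bytes.decode("gbk", errors="replace")

# GBK bytes read as UTF-8: the bytes are not valid UTF-8, so strict decoding fails.
try:
    gbk_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

# Decoding with the codec that produced the bytes restores the original text.
restored = gbk_bytes.decode("gbk")

if __name__ == "__main__":
    print(repr(garbled), strict_decode_failed, restored == text)
```

The fix in practice is to find out which encoding the bytes really are in and decode with that one, rather than trying codecs until the output looks right.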

That's all for now.

We all once had a dream, but the year we graduated, Dream moved to another city to do grunt work. Dream is my classmate.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
