About Python character encodings: encode and decode

Source: Internet
Author: User

(Note: Part of this article draws on material from the Internet. The author's knowledge is limited, so there may be shortcomings; comments and corrections are welcome.)

A few days ago, a young woman in my department asked me why the Chinese text her Python program printed came out garbled. I walked over, thought for a moment, and fixed it for her on the spot; it was simply a character encoding conversion problem. I noticed she looked rather impressed. So I thought: if you can't even sort out an encoding problem, how are you going to impress the girls? Some people may share the same misunderstanding about encodings, so, at my own beginner's level, I have written down my understanding combined with material from the Internet.

Note: In Python 3, strings are Unicode and the default source file encoding is UTF-8, while Python 2 defaults to ASCII. A Chinese-language Windows environment uses GBK by default.

Common causes of encoding errors:

1. Default encoding of the Python interpreter

2. Python source file encoding

3. Encoding used by the terminal

4. Language settings for the operating system
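
Most of the layers listed above can be inspected from inside Python 3. The following is a minimal sketch, not a full diagnostic tool; the function name encoding_report and the dictionary keys are made up for illustration. The source file encoding (cause 2) is not reported at run time, since it comes from the declaration in the file header.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python currently sees for each layer."""
    return {
        # Codec used by str.encode() / bytes.decode() when none is named
        "interpreter default": sys.getdefaultencoding(),
        # Encoding of the terminal; None if stdout has no encoding attribute
        "terminal (stdout)": getattr(sys.stdout, "encoding", None),
        # Encoding suggested by the operating system's locale settings
        "OS locale": locale.getpreferredencoding(False),
        # Encoding used to convert file names
        "file system": sys.getfilesystemencoding(),
    }


for layer, name in encoding_report().items():
    print(f"{layer}: {name}")
```

If the terminal and the OS locale report different encodings, that mismatch is a likely place to start looking when printed Chinese text comes out garbled.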

One, the types of encoding

I, ASCII: 1 byte per character, supports English only

II, GB2312: 2 bytes per Chinese character, supports 6,700+ Chinese characters

III, GBK: an extension of GB2312, supports 21,000+ Chinese characters, 2 bytes per Chinese character

IV, Unicode: a universal character set, usually stored as 2 to 4 bytes per character; it included 136,690 characters as of Unicode 10.0

V, UTF-8: uses 1, 2, 3, or 4 bytes to represent all characters. It tries 1 byte first and adds a byte whenever that is not enough, up to 4 bytes. English takes 1 byte, most European letters take 2, most East Asian characters take 3, and other special characters take 4. A Chinese character takes 3 bytes.

VI, UTF-16: uses 2 or 4 bytes to represent all characters. 2 bytes is preferred; otherwise 4 bytes are used.
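
The byte counts in this list can be checked directly in Python 3. This is a small sketch; the helper name byte_length and the sample characters are chosen here for illustration only.

```python
SAMPLES = {
    "English letter": "A",
    "European letter": "é",
    "Chinese character": "编",
    "emoji": "\U0001F600",
}


def byte_length(char, encoding):
    """Return how many bytes one character occupies in the given encoding."""
    return len(char.encode(encoding))


for label, char in SAMPLES.items():
    utf8 = byte_length(char, "utf-8")
    # "utf-16-le" is used so that the 2-byte byte-order mark is not counted
    utf16 = byte_length(char, "utf-16-le")
    print(f"{label}: UTF-8 = {utf8} bytes, UTF-16 = {utf16} bytes")

# GBK and GB2312 both store a Chinese character in 2 bytes
print("GBK:", byte_length("编", "gbk"), "bytes")
print("GB2312:", byte_length("编", "gb2312"), "bytes")
```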

Two, the execution process of Python 3

1. The interpreter finds the code file, decodes it using the encoding declared in the file header (UTF-8 if none is declared), and loads the code string into memory as Unicode

2. It parses the code string according to the grammar rules

3. All string variables are stored as Unicode

Python 3 automatically converts the file contents to Unicode in memory. Python 2 does not: its string literals stay as bytes in the file's encoding, so manual transcoding is required.
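
Step 1 can be imitated with the standard library's tokenize.detect_encoding, which reads the header declaration in the same way the interpreter does. The sketch below uses in-memory bytes instead of real files, and the helper name declared_encoding is hypothetical.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python would use to decode these source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# A source file saved as GBK, with a matching declaration in its header
gbk_source = "# -*- coding: gbk -*-\ns = '编码'\n".encode("gbk")
# A source file with no declaration: Python 3 assumes UTF-8
plain_source = "s = '编码'\n".encode("utf-8")

print(declared_encoding(gbk_source))
print(declared_encoding(plain_source))

# Raw bytes from disk become a Unicode string in memory
text = gbk_source.decode(declared_encoding(gbk_source))
print(type(text).__name__, "编码" in text)
```

When the declaration and the encoding the file was actually saved in disagree, this decoding step is where things go wrong, which is why the file header matters.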

Three, manual transcoding rules

UTF-8 --decode--> Unicode

Unicode --encode--> GBK/UTF-8

Use type() to check which form a value is in: Unicode text is 'unicode' in Python 2 and 'str' in Python 3, while GBK and UTF-8 data are 'str' in Python 2 and 'bytes' in Python 3.
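
These two rules combine into one pattern: to move from one encoding to another, always pass through Unicode. Here is a minimal Python 3 sketch; transcode is a hypothetical helper name, not a standard library function.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode bytes by passing through Unicode."""
    text = data.decode(source_encoding)   # bytes -> Unicode (str)
    return text.encode(target_encoding)   # Unicode (str) -> bytes


utf8_bytes = "编码".encode("utf-8")
gbk_bytes = transcode(utf8_bytes, "utf-8", "gbk")

print(type(utf8_bytes.decode("utf-8")))   # the Unicode form is str
print(type(gbk_bytes))                    # encoded data is bytes
```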

Examples:

Python 2 defaults to ASCII

    # coding=utf-8
    # Python 2 only. ASCII is the default there, so a UTF-8 declaration is usually added.
    a = '编码'                             # a is a UTF-8 byte string
    b = a.decode('utf-8')                  # b is Unicode
    c = b.encode('gbk')                    # c is a GBK byte string
    d = c.decode('gbk').encode('utf-8')    # convert c to Unicode first, then to UTF-8
    print a, b, c, d
    print type(a), type(b), type(c), type(d)

Output results

Python 3 defaults to Unicode

    a = "编码"                          # a is a Unicode str
    b = a.encode("utf-8")               # b is UTF-8 bytes
    c = a.encode("gbk")                 # c is GBK bytes
    print(a, b, c)
    print(type(a), type(b), type(c))    # in Python 3, str is Unicode by default

Output results

Windows defaults to GBK (Python 2 session)

    >>> a = '编码'
    >>> b = a.decode('gbk')      # Windows defaults to GBK; decode to Unicode first
    >>> c = b.encode('utf-8')    # convert Unicode to UTF-8
    >>> a
    '\xb1\xe0\xc2\xeb'
    >>> b
    u'\u7f16\u7801'
    >>> c
    '\xe7\xbc\x96\xe7\xa0\x81'
    >>> print(a, b, c)
    ('\xb1\xe0\xc2\xeb', u'\u7f16\u7801', '\xe7\xbc\x96\xe7\xa0\x81')
    >>> type(a)
    <type 'str'>
    >>> type(b)
    <type 'unicode'>
    >>> type(c)
    <type 'str'>
    >>>
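
The garbled output from the opening story usually means bytes were decoded with a different codec than the one that produced them. The Python 3 sketch below illustrates this with the same two characters as the session above; the variable names are chosen here for illustration.

```python
text = "编码"
utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

# Wrong codec: UTF-8 bytes read as GBK come out as different characters
garbled = utf8_bytes.decode("gbk", errors="replace")

# Right codec: decode with the encoding that produced the bytes
recovered = gbk_bytes.decode("gbk")

print(gbk_bytes)     # the same bytes as `a` in the session above
print(utf8_bytes)    # the same bytes as `c` in the session above
print(garbled == text, recovered == text)
```

The fix in practice is to find out which encoding the bytes are really in (terminal, file, or OS default) and decode with that one before encoding for the destination.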

That's all for now.

Hey, we all had a Dream once, but the year we graduated, Dream moved to another city to haul bricks. Dream is my classmate.
