(Note: Part of this article's content comes from the Internet. The author's knowledge is limited, so please point out any shortcomings in the comments.)
A few days ago, a young woman in my department asked me why the Chinese text her Python program printed came out garbled. I walked over, thought about it for a moment, and fixed it for her in no time: it was just a character-encoding conversion problem. At that moment I noticed a gleam of admiration in her eyes. So I thought: if you can't even handle encoding problems, how are you going to impress the girls? Some people may be just as confused about this. My own level is modest, so I have combined my understanding with material found online and written it down.
Note: In Python 3, strings are Unicode by default (and source files are read as UTF-8 unless they declare otherwise), while Python 2 defaults to ASCII. On Chinese-language Windows, the default encoding is GBK.
Common causes of encoding errors:
1. Default encoding of the Python interpreter
2. Python source file encoding
3. Encoding used by the terminal
4. Language settings for the operating system
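Three of the four causes above can be inspected at runtime. The sketch below, for Python 3, reads `sys.getdefaultencoding()`, `sys.stdout.encoding` and `locale.getpreferredencoding(False)`; the helper name `encoding_report` is hypothetical. Cause 2, the source file encoding, is fixed by the coding declaration in the file header and is not queried here.

```python
import locale
import sys


def encoding_report():
    """Return the encodings behind causes 1, 3 and 4 (hypothetical helper)."""
    return {
        # 1. Interpreter default encoding ('utf-8' on Python 3, 'ascii' on Python 2)
        "interpreter": sys.getdefaultencoding(),
        # 3. Encoding of the terminal/console that print() writes to;
        #    None if sys.stdout was replaced by an object without one
        "terminal": getattr(sys.stdout, "encoding", None),
        # 4. Encoding implied by the operating system's language/locale settings
        #    (typically 'cp936', i.e. GBK, on Simplified Chinese Windows)
        "locale": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for cause, encoding in encoding_report().items():
        print(f"{cause:12} {encoding}")
```

When the terminal and locale values disagree with the encoding of the bytes being printed, garbled output is the usual result.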
One: Types of character encodings
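I. ASCII: 1 byte per character; supports only English.
II. GB2312: 2 bytes per Chinese character; supports 6,700+ Chinese characters.
III. GBK: an upgraded version of GB2312 that supports 21,000+ Chinese characters; each Chinese character takes 2 bytes.
IV. Unicode: 2-4 bytes per character, depending on the encoding form; at the time of writing it included 136,690 characters.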
V. UTF-8: represents every character with 1, 2, 3 or 4 bytes. It uses 1 byte when possible and adds a byte only when that is not enough, up to 4 bytes. English letters take 1 byte, European letters take 2, East Asian characters take 3, and other special characters take 4. Chinese characters therefore take 3 bytes.
VI. UTF-16: represents every character with 2 or 4 bytes; 2 bytes are used when possible, otherwise 4.
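The byte counts above can be checked by encoding the same character with each codec and measuring the result. This Python 3 sketch uses the standard `str.encode`; the sample characters and the helper `byte_lengths` are illustrative choices, not from the original. It uses `utf-16-le` rather than `utf-16` because the latter prepends a 2-byte byte-order mark.

```python
ENCODINGS = ["ascii", "gb2312", "gbk", "utf-8", "utf-16-le"]


def byte_lengths(ch):
    """Map each encoding to the number of bytes `ch` needs, or None if unsupported."""
    lengths = {}
    for enc in ENCODINGS:
        try:
            lengths[enc] = len(ch.encode(enc))
        except UnicodeEncodeError:
            # The character is outside this encoding's repertoire
            lengths[enc] = None
    return lengths


if __name__ == "__main__":
    # English letter, European letter, Chinese character, emoji
    for ch in ["A", "é", "编", "😀"]:
        print(repr(ch), byte_lengths(ch))
```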
Two: How Python 3 processes a source file
1. The interpreter finds the source file, loads its text into memory using the encoding declared in the file header, and converts it to Unicode.
2. It parses the code string according to the grammar rules.
3. All string variables are created as Unicode strings.
Python 3 converts the file's encoding to Unicode automatically. Python 2 does not convert the file's encoding to Unicode when storing strings in memory, so manual transcoding is required.
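Step 1 can be observed directly: `compile()` accepts source as bytes and, like the interpreter reading a file, decodes it using the PEP 263 coding declaration in the header (UTF-8 if there is none). In this Python 3 sketch the GBK-encoded "file" is simulated in memory, and the helper `load_source` is hypothetical.

```python
def load_source(source_bytes):
    """Compile and run source bytes, returning the variable `s` they define."""
    namespace = {}
    # compile() decodes the bytes using the coding declaration, then parses them
    exec(compile(source_bytes, "<in-memory>", "exec"), namespace)
    return namespace["s"]


# A "source file" saved as GBK that declares its encoding in the header
gbk_source = '# -*- coding: gbk -*-\ns = "编码"\n'.encode("gbk")
s = load_source(gbk_source)
print(type(s), s)  # the string literal is now a Unicode str

# The same GBK bytes without the declaration are decoded as UTF-8 and rejected
try:
    load_source('s = "编码"\n'.encode("gbk"))
    missing_header_failed = False
except (SyntaxError, UnicodeDecodeError):
    missing_header_failed = True
print("decoding failed without header:", missing_header_failed)
```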
Three: Rules for manual transcoding
UTF-8 --decode--> Unicode
Unicode --encode--> GBK / UTF-8
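Use type() to see which form a value is in. In Python 2, Unicode strings show as 'unicode' and GBK or UTF-8 data show as 'str'; in Python 3, Unicode strings show as 'str' and encoded data show as 'bytes'.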
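In Python 3 the two rules map onto `bytes.decode()` and `str.encode()`. The sketch below walks the round trip and also shows a common cause of garbled Chinese output: decoding bytes with the wrong codec. The variable names are illustrative.

```python
# Start from UTF-8 bytes, e.g. as read from a file or a network socket
utf8_bytes = "编码".encode("utf-8")

# Rule 1: UTF-8 --decode--> Unicode (a str object)
text = utf8_bytes.decode("utf-8")

# Rule 2: Unicode --encode--> GBK (a bytes object)
gbk_bytes = text.encode("gbk")

print(type(utf8_bytes), type(text), type(gbk_bytes))

# Decoding with the wrong codec does not restore the text: here the GBK
# bytes are read as UTF-8, and errors="replace" substitutes U+FFFD for
# byte sequences that are not valid UTF-8
garbled = gbk_bytes.decode("utf-8", errors="replace")
print(garbled)

# Decoding with the matching codec recovers the original characters
print(gbk_bytes.decode("gbk"))
```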
Examples:
Python 2 (ASCII by default):
# -*- coding: utf-8 -*-
# Python 2 defaults to ASCII, so a utf-8 coding declaration is usually added
a = '编码'  # a is a str holding UTF-8 bytes
b = a.decode('utf-8')  # b is unicode
c = b.encode('gbk')  # c is a str holding GBK bytes
d = c.decode('gbk').encode('utf-8')  # convert c to Unicode first, then to UTF-8
print a, b, c, d
print type(a), type(b), type(c), type(d)
Python 3 (Unicode by default):
a = "编码"  # a is a Unicode str
b = a.encode("utf-8")  # b is UTF-8 bytes
c = a.encode("gbk")  # c is GBK bytes
print(a, b, c)
print(type(a), type(b), type(c))  # Python 3 strings are Unicode (str) by default
GBK is the default on Chinese Windows (Python 2 interactive session):
>>> a = '编码'
>>> b = a.decode('gbk')  # Windows defaults to GBK, so decode to Unicode first
>>> c = b.encode('utf-8')  # convert Unicode to UTF-8
>>> a
'\xb1\xe0\xc2\xeb'
>>> b
u'\u7f16\u7801'
>>> c
'\xe7\xbc\x96\xe7\xa0\x81'
>>> print(a, b, c)
('\xb1\xe0\xc2\xeb', u'\u7f16\u7801', '\xe7\xbc\x96\xe7\xa0\x81')
>>> type(a)
<type 'str'>
>>> type(b)
<type 'unicode'>
>>> type(c)
<type 'str'>
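For comparison, the same Windows conversion in Python 3 operates on `bytes` instead of Python 2's `str`. This sketch reproduces the byte values shown in the session above; the GBK bytes are written as a literal to stand in for data read from a GBK-encoded console or file, which is an assumption about where they come from.

```python
# GBK bytes for "编码", as a Chinese-locale Windows program might produce them
a = b"\xb1\xe0\xc2\xeb"

b = a.decode("gbk")    # decode the GBK bytes to a Unicode str first
c = b.encode("utf-8")  # then encode the Unicode str as UTF-8 bytes

print(a, b, c)
print(type(a), type(b), type(c))
```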
That's all for now.
Sigh. We once had a Dream, but the year we graduated, Dream went off to another city to haul bricks. Dream is my classmate.
About Python character encodings: encode and decode