Python has been able to handle Unicode characters in a general way since Python 1.6. The examples in this article use Python 2 syntax.
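1. Several common encoding formats
1.1. ASCII: each character is expressed in 1 byte.
1.2. UTF-8: each character is expressed in 1 to 4 bytes (1 to 3 for characters in the Basic Multilingual Plane, which covers most Chinese and Japanese text). ASCII characters occupy only 1 byte, so ASCII is a subset of UTF-8.
1.3. UTF-16: most characters are expressed in 2 bytes (characters outside the Basic Multilingual Plane take 4). In this article, "Unicode" in Python means the interpreter's internal unicode string, which narrow Python 2 builds (such as the standard Windows builds) store as 16-bit UTF-16 units.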
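The byte counts listed above can be checked directly. The following is a minimal sketch in Python 3 syntax (not the Python 2 used elsewhere in this article); the helper name `encoded_length` and the sample characters are chosen only for illustration.

```python
def encoded_length(ch, encoding):
    """Return how many bytes one character occupies in the given encoding."""
    return len(ch.encode(encoding))

# One ASCII letter, one accented Latin letter, one CJK character, one emoji.
samples = ["A", "é", "中", "😀"]

for ch in samples:
    # "utf-16-le" is used instead of "utf-16" so that no BOM is counted.
    print(repr(ch), encoded_length(ch, "utf-8"), encoded_length(ch, "utf-16-le"))

# ASCII text produces exactly the same bytes in ASCII and in UTF-8.
same_bytes = "A".encode("ascii") == "A".encode("utf-8")
print(same_bytes)
```

The emoji is the case that needs 4 bytes in both UTF-8 and UTF-16, which is why "2 bytes" for UTF-16 is only true for the common characters.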
2. Encoding and decoding of Python source files. A Python program passes through the following stages from writing to execution:
Editor----> Source code----> Interpreter----> Output results
2.1. The editor determines the encoding format of the source code (set in the editor)
2.2. The interpreter also needs to know the encoding format of the source code (unfortunately, it is difficult to infer the encoding of a source file from the encoded bytes alone).
2.3. Additional note: on Windows, when a source file is saved as UTF-8 in UltraEdit, a BOM mark is written at the start of the file (the details do not matter here), so the ActivePython interpreter automatically recognizes the file as UTF-8. If you edit the source file with Eclipse, however, no BOM is written even though the editor's file encoding is set to UTF-8, so you must add #coding=utf-8 at the beginning of the source file. Interestingly, the encoding declaration that the interpreter reads is written as a comment.
2.4. Example: suppose we want to print "I am Chinese." to the terminal.
The code is as follows:
#coding=utf-8
# The line above tells the Python interpreter that this file is UTF-8 encoded (I am using Eclipse + PyDev).
print "我是中国人"  # "I am Chinese"; the source file itself must also be saved as UTF-8
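The two ways the interpreter learns a source file's encoding, described in 2.3 (a BOM or a #coding comment), can be observed with the standard library. This is a hedged sketch in Python 3 syntax using `tokenize.detect_encoding`; the helper name `source_encoding` is made up for illustration. Note one difference from the Python 2 setup in this article: when neither a BOM nor a declaration is present, Python 3 assumes UTF-8, whereas Python 2 assumed ASCII.

```python
import codecs
import io
import tokenize

def source_encoding(data):
    """Return the encoding Python 3 would assume for these source bytes."""
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(data).readline)
    return encoding

# A file saved with a UTF-8 BOM and no declaration comment.
with_bom = codecs.BOM_UTF8 + b"x = 1\n"
# A file without a BOM that declares its encoding in a comment.
with_comment = b"#coding=utf-8\nx = 1\n"
# A declaration naming a different encoding.
with_gb2312 = b"# coding: gb2312\nx = 1\n"

print(source_encoding(with_bom))      # 'utf-8-sig' signals that a BOM was found
print(source_encoding(with_comment))
print(source_encoding(with_gb2312))
```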
3. Converting between encodings. To convert text from one encoding to another, use Unicode (UTF-16 in this article's terms) as the intermediate form.
Example: suppose there is a text file jap.txt whose content is "私は中国人ですので。" and whose encoding is the Japanese encoding Shift_JIS.
There is also a text file chn.txt whose content is "中华人民共和国" ("People's Republic of China") and whose encoding is the Chinese encoding GB2312.
How do we combine the contents of the two files and store them in utf.txt without producing garbled text? Convert the contents of both files to UTF-8, because UTF-8 can represent both Chinese and Japanese characters.
The code is as follows:
#coding=utf-8
try:
    jap = open("E:/jap.txt", "r")
    chn = open("E:/chn.txt", "r")
    utf = open("E:/utf.txt", "w")
    jap_text = jap.readline()
    chn_text = chn.readline()
    # First decode to Unicode (UTF-16), then encode to UTF-8
    jap_text_utf8 = jap_text.decode("shift_jis").encode("utf-8")  # original note: "also works without converting to UTF-8" (this depends on the default encoding)
    chn_text_utf8 = chn_text.decode("gb2312").encode("utf-8")  # codec names are case-insensitive; "UTF-8" works as well
    utf.write(jap_text_utf8)
    utf.write(chn_text_utf8)
    # Close the files so the output is flushed to disk
    jap.close()
    chn.close()
    utf.close()
except IOError, e:
    print "open file error", e
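The same decode-then-encode idea can be tried without any files on disk. The sketch below is in Python 3 syntax, where `bytes.decode` plays the role of Python 2's `str.decode`; the helper name `transcode` and the in-memory byte strings are stand-ins for the contents of jap.txt and chn.txt, not part of the original program.

```python
def transcode(data, src, dst="utf-8"):
    """Decode bytes from the source encoding to Unicode, then encode to the target."""
    return data.decode(src).encode(dst)

# Stand-ins for the raw bytes that would be read from the two files.
jap_bytes = "私は中国人ですので。".encode("shift_jis")
chn_bytes = "中华人民共和国".encode("gb2312")

# Both pieces pass through Unicode and come out as UTF-8, so they can be joined.
combined = transcode(jap_bytes, "shift_jis") + transcode(chn_bytes, "gb2312")

print(combined.decode("utf-8"))
```

Concatenating the raw Shift_JIS and GB2312 bytes directly would give a file that no single decoder can read correctly, which is why each piece is converted first.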
4. The Tk library supports ASCII, UTF-16, and UTF-8
The code is as follows:
#coding=utf-8
from Tkinter import *
str1 = ""  # fallback so the label can still be created if the file cannot be read
try:
    jap = open("E:/jap.txt", "r")
    str1 = jap.readline()
    jap.close()
except IOError, e:
    print "open file error", e
root = Tk()
label1 = Label(root, text=str1.decode("shift_jis"))  # without decode, garbled text is displayed
label1.grid()
root.mainloop()
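Why the label shows garbled text without decode can be illustrated without opening a window. In this Python 3 sketch, interpreting the Shift_JIS bytes as Latin-1 is only an assumed stand-in for "some wrong encoding"; which encoding Tk actually guesses is not stated in this article.

```python
text = "私は中国人ですので。"
raw = text.encode("shift_jis")        # the bytes as they would sit in jap.txt

# Interpreting the bytes with the wrong codec yields one junk character per byte.
garbled = raw.decode("latin-1")
# Interpreting them with the codec they were written in restores the text.
correct = raw.decode("shift_jis")

print(repr(garbled))
print(correct)
```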
That covers the basics of handling character encodings in Python. We hope it helps.