In short, Python has been able to handle Unicode characters since version 1.6.
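One, several common encoding formats.
1.1. ASCII: each character is represented by 1 byte.
1.2. UTF-8: each character takes 1 to 4 bytes (1 to 3 for characters in the Basic Multilingual Plane). ASCII characters take only 1 byte, so ASCII is a subset of UTF-8.
1.3. UTF-16: most characters take 2 bytes (characters outside the Basic Multilingual Plane take 4). In this article, "Unicode" in Python means UTF-16, which is how narrow builds of Python 2 store unicode objects internally.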
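The byte counts listed above can be checked directly. The following is a minimal sketch in Python 3 syntax (the article's own code is Python 2); the sample characters and the helper name `byte_lengths` are chosen here for illustration only.

```python
# Python 3 sketch: count how many bytes sample characters need in each encoding.
SAMPLES = {
    "ASCII letter A": "A",           # U+0041
    "e with acute": "\u00e9",        # U+00E9
    "CJK character": "\u4e2d",       # U+4E2D
    "emoji": "\U0001F600",           # U+1F600, outside the Basic Multilingual Plane
}


def byte_lengths(ch):
    """Return (UTF-8 length, UTF-16 length) in bytes for one character."""
    # "utf-16-le" is used so that the 2-byte BOM is not included in the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for name, ch in SAMPLES.items():
    utf8_len, utf16_len = byte_lengths(ch)
    print("%s: UTF-8 = %d bytes, UTF-16 = %d bytes" % (name, utf8_len, utf16_len))

# Pure ASCII text yields identical bytes under ASCII and UTF-8,
# which is what "ASCII is a subset of UTF-8" means in practice.
assert "Python".encode("ascii") == "Python".encode("utf-8")
```

The loop makes the difference visible: UTF-8 grows from 1 to 4 bytes as the code point increases, while UTF-16 stays at 2 bytes until a character falls outside the Basic Multilingual Plane.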
Two, encoding and decoding of Python source files. A Python program goes through the following stages from writing to execution:
Editor ----> Source code ----> Interpreter ----> Output
2.1. The editor determines the encoding of the source code (this is set in the editor).
2.2. The interpreter must also know the encoding of the source code (unfortunately, it is very hard to infer the encoding from the encoded bytes of the source file alone).
2.3. Note: on Windows, when UltraEdit saves source code as UTF-8, it writes a BOM (byte order mark) at the start of the file (the details are not important here), so the ActivePython interpreter automatically recognizes the source file as UTF-8. If you edit the source file with Eclipse, however, no BOM is written even when the editor's file encoding is set to UTF-8, so you must add #coding=utf-8 at the beginning of the source file. Interestingly, this uses a comment to tell the interpreter the source file's encoding.
2.4. For example, suppose we want to print "I am Chinese" to the terminal.
The code is as follows:
# coding=utf-8
# The line above tells the Python interpreter that this file is UTF-8 encoded; I used Eclipse + PyDev.
print "I am Chinese"  # Python 2 print statement; the source file itself must also be saved as UTF-8
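To see how a BOM or a coding comment is picked up, one option is to ask the standard library. The sketch below assumes Python 3, whose `tokenize.detect_encoding` function applies the same kind of rule to the first two lines of a source file; the helper name `declared_encoding` and the sample byte strings are hypothetical. Note that Python 3 falls back to UTF-8 when nothing is declared, whereas Python 2 falls back to ASCII.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python 3 would use to read these source bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# Hypothetical sample sources, given as raw bytes.
with_comment = b"# coding=gb2312\nprint('hi')\n"   # coding comment on line 1
with_bom = b"\xef\xbb\xbfprint('hi')\n"            # UTF-8 BOM, no comment
plain = b"print('hi')\n"                           # neither BOM nor comment
spaced = b"#coding =gb2312\nprint('hi')\n"         # space before "=": not matched

for label, source in [("comment", with_comment), ("BOM", with_bom),
                      ("plain", plain), ("spaced", spaced)]:
    print(label, "->", declared_encoding(source))
```

The last sample suggests writing the declaration without a space before the equals sign, since the detection pattern expects `coding` to be followed immediately by `=` or `:`.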
Three, encoding conversion. To convert between two encodings, use Unicode (UTF-16) as the intermediate step: decode from the source encoding, then encode to the target encoding.
For example, suppose there is a text file jap.txt whose content is "私は中国人です。" ("I am Chinese."), encoded in the Japanese encoding Shift_JIS.
There is also a text file chn.txt whose content is "People's Republic of China", encoded in the Chinese encoding GB2312.
How do we merge the contents of the two files into utf.txt without producing garbled text? Convert the contents of both files to UTF-8, because UTF-8 can represent both Chinese and Japanese characters.
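The code is as follows:
# coding=utf-8
# Python 2: open(..., "r") returns byte strings, so decode() is needed.
try:
    jap = open("E:/jap.txt", "r")
    chn = open("E:/chn.txt", "r")
    utf = open("E:/utf.txt", "w")
    jap_text = jap.readline()
    chn_text = chn.readline()
    # First decode to Unicode (UTF-16), then encode to UTF-8.
    jap_text_utf8 = jap_text.decode("shift_jis").encode("UTF-8")  # author's note: it also works without converting to UTF-8
    chn_text_utf8 = chn_text.decode("GB2312").encode("UTF-8")  # encoding names are case-insensitive; "utf-8" works as well
    utf.write(jap_text_utf8)
    utf.write(chn_text_utf8)
    # Close the files so the output is flushed to disk.
    jap.close()
    chn.close()
    utf.close()
except IOError as e:
    print "Open File Error", e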
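The decode-then-encode step can also be tried without any files. This is a minimal sketch in Python 3 syntax, where `bytes.decode` and `str.encode` play the roles of the Python 2 calls above; the helper name `transcode` and the in-memory sample strings are assumptions made for illustration, not part of the original program.

```python
def transcode(data, source_encoding, target_encoding="utf-8"):
    """Convert bytes from one encoding to another via a Unicode string."""
    text = data.decode(source_encoding)   # bytes -> Unicode text
    return text.encode(target_encoding)   # Unicode text -> bytes


# Stand-ins for the file contents: Japanese text stored as Shift_JIS bytes
# and Chinese text stored as GB2312 bytes (sample strings are assumptions).
jap_bytes = "私は中国人です。".encode("shift_jis")
chn_bytes = "中华人民共和国".encode("gb2312")

# Both pieces become UTF-8, so they can be concatenated safely.
merged = transcode(jap_bytes, "shift_jis") + b"\n" + transcode(chn_bytes, "gb2312")
print(merged.decode("utf-8"))
```

Routing every conversion through Unicode text means each encoding only needs a decoder and an encoder, rather than a direct converter for every pair of encodings.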
Four, the Tk library supports ASCII, UTF-16, and UTF-8.
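The code is as follows:
# coding=utf-8
# Python 2: the Tk module is named Tkinter.
from Tkinter import *
str1 = ""  # fallback so the label can still be created if the file cannot be opened
try:
    jap = open("E:/jap.txt", "r")
    str1 = jap.readline()
    jap.close()
except IOError as e:
    print "Open File Error", e
root = Tk()
label1 = Label(root, text=str1.decode("shift_jis"))  # without decode, the label shows garbled text
label1.grid()
root.mainloop()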
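The decode step before handing text to a Tk widget can also be folded into the file read. The sketch below assumes Python 3 (`io.open` with an `encoding` argument also exists in Python 2.6 and later); the helper name `read_first_line` and the temporary demo file are hypothetical, and no window is created, so it runs without a display.

```python
import io
import os
import tempfile


def read_first_line(path, encoding):
    """Read the first line of a text file and return it as Unicode text."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.readline()


# Demo: write a Shift_JIS file to a temporary location, then read it back.
sample = "私は中国人です。\n"
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as handle:
    handle.write(sample.encode("shift_jis"))

label_text = read_first_line(demo_path, "shift_jis")
os.remove(demo_path)

# label_text is already decoded, so it could be passed as a widget's text.
print(label_text)
```

Because the file is decoded as it is read, the rest of the program only ever sees Unicode text, and there is no separate decode call to forget.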
That covers the basics of how Python handles encodings; I hope it helps.