Solving Chinese text problems in Python (a summary of earlier experience; beginners must read)

Source: Internet
Author: User
Python is self-documenting, so you can use the help function to look up the usage notes for each built-in function. In general, the key usage points and caveats are stated clearly in this built-in documentation. I searched the Internet for a Chinese version of these function explanations but found none, so I decided to learn from the English explanations that ship with the system.
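
As a small illustration of the self-documenting point, the text that help draws on is also available as an ordinary string. A minimal sketch, assuming the standard inspect module; the helper name get_info is hypothetical:

```python
import inspect

def get_info(obj):
    """Return the documentation text attached to obj, or '' if there is none."""
    return inspect.getdoc(obj) or ""

# Look up a built-in function and show the first line of its documentation.
info = get_info(len)
print(info.splitlines()[0])
```

Holding the text in a variable like this is what makes it possible to annotate it and save it later.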

If you want to program with Tkinter or wxPython and want an overview of the common widgets and their properties, but your English is not strong, I recommend the book "Python and Tkinter Programming". Appendix B and Appendix C, on pages 392 to 538, cover a selection of common functions and nearly all the properties, and are not to be missed.

The small lookup tool I wanted was soon ready. It looks up functions that have not been queried before and saves the keyword (key) together with the query result (info), so that next time the entry can be read straight from the list; if a keyword turns out not to have been looked up yet, it is added to the list manually. It is just that simple a gadget. Everything seemed to be going well, but then a problem appeared. When I opened the English info and met words I did not know, I looked them up and wrote their meaning into info, so that after saving I could read the annotated text directly from the hard drive. But once Chinese had been typed into the English info, saving failed with an encoding problem: when the Chinese part was reached, the following error popped up:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u6211' in position 61: ordinal not in range(128)

The position 61 is not fixed: it is simply the place in info where the Chinese was added. The error came up essentially every time I tried to write the modified info to the file:

fp = open('Tt.txt', 'w')
fp.write(info.encode("UTF-8"))  # error here
fp.close()


These three lines look correct in themselves, yet the error was raised on the middle line. Was encode being used the wrong way? I tried many encodings, such as ANSI, UTF-8, Shift_JIS, GB2312 and GBK, and none of them worked. So I was confused.
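
The 'ascii' in the error message is worth noting, since no ASCII codec was ever named in the code. A minimal sketch that triggers the same kind of error by asking for ASCII explicitly (in Python 2 the ascii codec is applied implicitly when byte strings and unicode text are mixed); the sample text is made up:

```python
# -*- coding: utf-8 -*-
# 61 ASCII characters followed by one Chinese character (U+6211).
info = u"x" * 61 + u"\u6211"

try:
    info.encode("ascii")           # ASCII only covers code points 0..127
    failed_at = None
except UnicodeEncodeError as exc:
    failed_at = exc.start          # index of the first character that cannot be encoded
    print(exc)

utf8_bytes = info.encode("utf-8")  # UTF-8 can represent the same text
```

The reported position is the index of the first non-ASCII character, which is why it moves with the place where the Chinese was typed.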

Now I know what was wrong. The problem lies in the string variable holding the modified info. Its data is the string I obtained from the system through the help function (the original, purely English info) plus the Chinese I typed in by hand. When I first queried the system documentation, I saved the original info like this:

fp = open('Tt.txt', 'w')
fp.write(info)
fp.close()


Note that writing the original info straight to the file like this is the mistake. What encoding does this code produce? Open Tt.txt and check its encoding, and you will see that it was saved as ANSI. That is how the error arises: when I look up the keyword key, the ANSI-encoded string info is read into the control for display, and then I manually add Chinese characters in UTF-8. Joined together, they form a string info that mixes several encodings, and the system, which can write only one encoding at a time, cannot write this mixed string back to Tt.txt.

So the conclusion is: while you work in memory, you can largely ignore the encoding, and the system will decide according to the circumstances. But if you use Chinese characters and also store data or strings temporarily in a file, be sure to write the file in UTF-8 from the very first write, like this:

fp = open('Tt.txt', 'w')
fp.write(info.encode("UTF-8"))
fp.close()


This ensures that the next time you read the file you can print and display the text without any conversion, even as the text of a control. Be sure to pay attention to this.
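
One way to follow this advice without calling encode by hand is to let the file object do the conversion. A minimal sketch, assuming the standard codecs module and a temporary directory in place of the real location of Tt.txt; the function names are hypothetical:

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

def save_info(path, info):
    """Write text to path as UTF-8, whatever characters it contains."""
    fp = codecs.open(path, "w", encoding="utf-8")
    try:
        fp.write(info)
    finally:
        fp.close()

def load_info(path):
    """Read the file back, decoding the UTF-8 bytes into text."""
    fp = codecs.open(path, "r", encoding="utf-8")
    try:
        return fp.read()
    finally:
        fp.close()

# English help text plus a manually added Chinese note (U+6211).
info = u"len(object) -> integer " + u"\u6211"
path = os.path.join(tempfile.mkdtemp(), "Tt.txt")
save_info(path, info)
restored = load_info(path)
```

Because both directions name the same encoding, the text read back compares equal to the text that was saved.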

With the problem found, some related points are discussed below.

Some people say that adding # -*- coding: utf-8 -*- is all it takes. In fact, it is not.

In my tests (I use IDLE, the Python 2.5.4 GUI): [1] Whether or not I start the file with # -*- coding: utf-8 -*-, and whether or not the software is set to use UTF-8 as its default encoding, using Chinese between the control and the file causes no problem. [2] An assignment such as info = 'Chinese' also works, and the text can be read back in the normal way. I think the reason is that the tool has been upgraded and now handles the display and use of Chinese, so the early situation in which Chinese could not be used no longer exists.
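
For context, the coding line only tells the interpreter how to decode the bytes of the source file itself; it does not choose the encoding of data written to or read from other files. A minimal sketch of why the file encoding still matters, showing that the same two characters become different bytes under different codecs (the sample string is an assumption standing in for 'Chinese' above):

```python
# -*- coding: utf-8 -*-
text = u"\u4e2d\u6587"            # the two characters of the word "Chinese"

as_utf8 = text.encode("utf-8")    # 3 bytes per character here
as_gbk = text.encode("gbk")       # 2 bytes per character here

print(len(as_utf8), len(as_gbk))

# Decoding with the codec that produced the bytes restores the text.
same = as_gbk.decode("gbk") == as_utf8.decode("utf-8") == text
```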
The code is as follows:

# coding=utf-8
try:
    jap = open("Jap.txt", "r")
    chn = open("Chn.txt", "r")
    utf = open("Utf.txt", "w")

    jap_text = jap.readline()
    chn_text = chn.readline()
    # first decode to unicode, then encode into UTF-8
    jap_text_utf8 = jap_text.decode("Shift_JIS").encode("UTF-8")
    # leaving this one unconverted to UTF-8 also works
    chn_text_utf8 = chn_text.decode("GB2312").encode("UTF-8")
    # codec names are case-insensitive: "UTF-8" and "utf-8" are the same
    utf.write(jap_text_utf8)
    utf.write(chn_text_utf8)
    utf.close()
except IOError, e:
    print "Open File Error", e


This code is taken from the article on processing Python encoding at http://www.jb51.net/article/26542.htm. To explain: jap_text_utf8 and chn_text_utf8 above must both be in the machine's default encoding or both in UTF-8; the most important thing is to stay consistent. Once everything is unified as UTF-8, you can write it to a file and read it back for use without problems. When reading, use the normal method:

filen = open('Tt.txt')
info = filen.read()
print info
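
The conversion pattern in the quoted code (decode from the source encoding, then encode to UTF-8) can be wrapped in one small helper. A minimal sketch; the function name and the in-memory sample strings are hypothetical:

```python
# -*- coding: utf-8 -*-
def to_utf8(raw_bytes, source_encoding):
    """Decode bytes from source_encoding to text, then encode the text as UTF-8."""
    return raw_bytes.decode(source_encoding).encode("utf-8")

# Sample inputs built in memory so the sketch needs no files.
jap_raw = u"\u65e5\u672c\u8a9e".encode("shift_jis")   # Japanese text as Shift_JIS bytes
chn_raw = u"\u4e2d\u6587".encode("gb2312")            # Chinese text as GB2312 bytes

combined = to_utf8(jap_raw, "shift_jis") + to_utf8(chn_raw, "gb2312")
print(combined.decode("utf-8") == u"\u65e5\u672c\u8a9e\u4e2d\u6587")
```

After conversion both pieces share one encoding, so they can be concatenated and written to a single file safely.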


In addition, some people use the following method to convert encodings:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

def convertCN(s):
    return s.encode('GB18030')

def printFile(filename):
    f = file(filename, 'r')
    for f_line in f.readlines():
        print convertCN(f_line)
    f.close()

if __name__ == "__main__":
    printFile('1.txt')
    print convertCN("\n****** Press any key to exit! ******")
    print sys.stdin.readline()


In my tests, this approach did not work. If the second line is removed, the setdefaultencoding call on the third line is unavailable; if the second line is kept, the third line and the code after it are not executed (although no error is reported). If you think it can work, please try it yourself.
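
An alternative that avoids changing the interpreter-wide default is to decode each line explicitly and encode it for the target, leaving sys untouched. A minimal sketch, assuming the input file is UTF-8 (the original does not say which encoding 1.txt uses); convert_lines is a hypothetical name:

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

def convert_lines(filename, source_encoding="utf-8", target_encoding="gb18030"):
    """Return each line of the file re-encoded as target_encoding bytes."""
    f = codecs.open(filename, "r", encoding=source_encoding)
    try:
        return [line.encode(target_encoding) for line in f.readlines()]
    finally:
        f.close()

# Build a small UTF-8 sample file standing in for 1.txt.
path = os.path.join(tempfile.mkdtemp(), "1.txt")
sample = codecs.open(path, "w", encoding="utf-8")
sample.write(u"\u4e2d\u6587\nabc\n")
sample.close()

converted = convert_lines(path)
```

Naming both encodings at the point of use keeps the conversion visible, so nothing depends on a hidden global setting.
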
In addition, the article "Python Chinese garbled problem in-depth analysis" (http://www.jb51.net/article/26543.htm) discusses at length how text is encoded, and it was an eye-opener for me. The principle of text encoding is this: a suitable annotation is added at the beginning of the text to state its internal encoding, so the interpreter follows the corresponding rules, translating the bytes either in fixed-size steps or in a variable-length way, to recover the original text; the step length and the rules are exactly those that the declaration at the beginning describes. So if your text uses a single-byte encoding, you can add the appropriate declaration at the top of your code to tell others how to translate your encoded text. The material at the end of that article on BOM_UTF8, and likewise BOM_UTF16 and so on, is also very interesting: different encodings use different marker bytes, which is worth paying attention to.
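
A byte order mark, when present, is a short byte sequence at the very start of a file, and the standard codecs module exposes these sequences as constants. A minimal sketch of detecting one; the helper name sniff_bom and the table BOMS are hypothetical:

```python
# -*- coding: utf-8 -*-
import codecs

# Longest marks first, because the UTF-32 LE mark begins with the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return (encoding, payload without the mark), or (None, data) if no mark is found."""
    for mark, name in BOMS:
        if data.startswith(mark):
            return name, data[len(mark):]
    return None, data

encoding, payload = sniff_bom(codecs.BOM_UTF8 + u"\u4e2d\u6587".encode("utf-8"))
print(encoding)
```

A file without a mark returns None here, in which case the encoding has to come from somewhere else, such as a declaration or a convention.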