Before digging into Python's Chinese-text problem, a word of background: I have had a strong interest in Python for a long time. Who would have thought that this old friend would once again surprise me with a problem involving Chinese? In code, Python's handling of Chinese is a constant source of trouble.
It is no wonder: computers were not invented by the Chinese. Otherwise every computer in the world would support, and would have to support, GBK, and this article would be written not by me but by some blond programmer on the other side of the ocean, under the title "Studying the English Problem in Python".
Let's face the real problem. Compared with Java, the Chinese problem shows up more "fiercely" in Python. "Fierce" does not mean more serious or harder to solve. It only means that Python uses the strict policy by default for decode and encode errors, so an error is raised immediately, whereas Java uses replace to handle them, which is why so many "??" appear.
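The difference between the two policies can be sketched as follows. This is a minimal illustration written for Python 3 (where `str` is the Unicode type), not the code from this article; the helper `encode_ascii` and the sample string are assumptions made for the example.

```python
# Python 3 sketch: "strict" (the default) versus "replace" error handling.
text = "中文a我爱你"  # sample string, assumed for illustration


def encode_ascii(s, errors="strict"):
    """Hypothetical helper: encode s to ASCII using the given error policy."""
    return s.encode("ascii", errors=errors)


try:
    encode_ascii(text)  # strict: the first unencodable character raises
    strict_failed = False
except UnicodeEncodeError as exc:
    strict_failed = True
    print("strict:", exc)

# replace: every unencodable character becomes "?", and no error is raised
replaced = encode_ascii(text, errors="replace")
print("replace:", replaced)  # replace: b'??a???'
```

With strict the failure is loud and immediate; with replace the program keeps running but the data is silently damaged, which is the trade-off described above.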
In addition, Python's default encoding is ASCII, while Java's default encoding matches that of the operating system. On this point I think Java is more reasonable: it is friendlier to programmers and spares newcomers some early frustration, which helps a language spread.
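To see which defaults apply on a given machine, something like the following can be run. It is only a sketch: on the Python 2 interpreters this article discusses, `sys.getdefaultencoding()` reports `ascii`, while on Python 3 it reports `utf-8`; the variable names are chosen for the example.

```python
import locale
import sys

# Encoding Python falls back on when none is given (e.g. str.encode() with no argument).
python_default = sys.getdefaultencoding()

# Encoding preferred by the operating system's locale settings.
os_preferred = locale.getpreferredencoding(False)

print("Python default encoding:", python_default)
print("OS preferred encoding:", os_preferred)
```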
However, Python has its reasons too. After all, ASCII is the only character set supported on every platform in the world, and since the problem will surface sooner or later, it is better to face it early than to dodge it. Now let's look at the symptoms of the Chinese problem in Python. Before that, we should understand that Python has two kinds of strings: ordinary strings (each character is represented by 8 bits) and Unicode strings (each character is represented by one or more bytes).
The two can be converted into each other. More complete descriptions exist elsewhere, so I will not go into them here. Let's look at the following code:
- # -*- coding: gb2312 -*-  # must be on the first or second line
- print "-------------code 1----------------"
- a = "中文a我爱你"  # "Chinese" + "a" + "I love you"
- print a
- print a.find("我")  # "I"
- b = a.replace("爱", "喜欢")  # "love" -> "like"
- print b
- print "--------------code 2----------------"
- x = "中文a我爱你"
- y = unicode(x, "gb2312")
- print y.encode("gb2312")
- print y.find(u"我")
- z = y.replace(u"爱", u"喜欢")
- print z.encode("gb2312")
- print "---------------code 3----------------"
- print y
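Without the declaration on the first line, Python complains about a non-ASCII character in the source file and refers us to PEP 0263. PEP 0263 (a Python Enhancement Proposal) makes it very clear that Python is aware of the internationalization problem and proposes a solution: declaring the encoding of the source file. With the declaration in place as required there, running the code above gives the following output: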
- -------------code 1----------------
- 中文a我爱你
- 5
- 中文a我喜欢你
- --------------code 2----------------
- 中文a我爱你
- 3
- 中文a我喜欢你
- ---------------code 3----------------
- Traceback (most recent call last):
-   File "G:\Downloads\eclipse\workspace\p\src\hello.py", line 16, in <module>
-     print y
- UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
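The traceback at the end of this output can be reproduced in isolation. The sketch below is written for Python 3 and simulates a console whose encoding is ASCII with `io.TextIOWrapper`; the stream objects and the sample string are assumptions for the demonstration, not part of the original program.

```python
import io

text = "中文a我爱你"  # sample string, assumed for illustration

# Stand-in for a console that only accepts ASCII (an assumption for this demo).
ascii_console = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

try:
    ascii_console.write(text)  # implicit encoding with the stream's codec
    ascii_console.flush()
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Encoding explicitly and writing bytes sidesteps the implicit ASCII step.
byte_console = io.BytesIO()
byte_console.write(text.encode("gb2312"))
written = byte_console.getvalue()

print(error_message)
print(len(written), "bytes written")  # 11 bytes written
```

Writing a Unicode string to a text stream always involves an encoding step; when that step falls back to ASCII, any Chinese character triggers the error, whereas bytes that were encoded explicitly pass through untouched.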
We can see that once the encoding declaration is added, Chinese can be used normally, and in code 1 and code 2 the console prints Chinese correctly. However, the code above clearly also exposes several problems:
1. Code 1 and code 2 print in different ways: code 1 prints the string directly, while code 2 encodes it before printing.
2. Code 1 and code 2 search the same string for the same character "我" ("I") but get different results (5 and 3 respectively).
3. Printing the Unicode string y directly in code 3 raises an error (which is why code 2 has to encode before printing).
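The different results of 5 and 3 come from counting in bytes versus counting in characters. A minimal Python 3 sketch of this, where `bytes` plays the role of the ordinary string and `str` the role of the Unicode string (the variable names and the sample string are assumptions for the example):

```python
text = "中文a我爱你"            # Unicode string: one item per character
raw = text.encode("gb2312")     # byte string: two bytes per Chinese character

# Searching the byte string yields an offset counted in bytes.
byte_index = raw.find("我".encode("gb2312"))

# Searching the Unicode string yields an offset counted in characters.
char_index = text.find("我")

# The two forms convert into each other without loss.
round_trip = raw.decode("gb2312")

# Replacement on the Unicode form operates on whole characters.
replaced = text.replace("爱", "喜欢")

print(len(raw), len(text))      # 11 6
print(byte_index, char_index)   # 5 3
```

The two Chinese characters and the ASCII letter in front of "我" occupy 2 + 2 + 1 = 5 bytes in GB2312 but only 3 character positions, which accounts for both numbers.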