A Summary of Solutions to Python Encoding Problems

Source: Internet
Author: User

This article summarizes several situations that lead to encoding problems and explains them one by one.


Case one: Chinese output appears garbled

    # Python version: 2.7.6
    >>> string1 = "我爱鱼C工作室"
    >>> string1
    '\xe6\x88\x91\xe7\x88\xb1\xe9\xb1\xbcC\xe5\xb7\xa5\xe4\xbd\x9c\xe5\xae\xa4'
    >>> print string1
    我爱鱼C工作室
    >>> string2 = "I love FishC"
    >>> string2
    'I love FishC'


Q: Why can't I display Chinese strings directly?

Analysis:

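In Python 2.x an ordinary string is a sequence of bytes, and the default encoding is ASCII, which uses a single byte per character and defines only 128 of them. There are far too many Chinese characters to fit in one byte each, so every Chinese character is stored as several bytes. When string1 is evaluated at the prompt, Python shows its repr, in which each non-ASCII byte held in memory appears as a \x escape; print, by contrast, sends the raw bytes to the terminal, which renders them as Chinese. The escaped output is not an error.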
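The byte-level view described above can be reproduced in Python 3. The following is only a minimal sketch: it assumes the escaped bytes in the session are the UTF-8 encoding of the Chinese string, and the variable names are illustrative.

```python
# Python 3 sketch: str holds Unicode characters, bytes holds raw data.
text = "我爱鱼C工作室"

# Encoding to UTF-8 produces the kind of byte sequence that the
# Python 2 prompt echoed: several bytes for each Chinese character.
raw = text.encode("utf-8")

char_count = len(text)   # number of characters
byte_count = len(raw)    # number of bytes after encoding

print(repr(raw))
print(char_count, byte_count)

# ASCII defines only code points 0-127, so it cannot represent this text.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```

Comparing the two counts shows why a one-byte-per-character encoding cannot hold this string.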

Solution:

Use Python 3: its strings are Unicode and its default encoding is UTF-8, so the string is displayed as written.

Extended Knowledge:

1. You can obtain the current default encoding as follows:

    >>> import sys
    >>> sys.getdefaultencoding()
    'ascii'


2. Character sets and character encodings explained in detail
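For comparison with the Python 2 session in item 1, here is a hedged sketch of the same check under Python 3. The second lookup, the locale's preferred encoding, is not part of the original session and is included only as an illustration; its value depends on the system.

```python
import locale
import sys

# Default encoding used by str.encode() and bytes.decode() when no
# argument is given. Python 3 reports 'utf-8' here.
default_encoding = sys.getdefaultencoding()

# Locale-dependent encoding; the value varies from system to system.
preferred_encoding = locale.getpreferredencoding(False)

print(default_encoding)
print(preferred_encoding)
```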


Case two: Concatenating an ordinary string and a Unicode string raises UnicodeDecodeError

    >>> string = "我爱" + u"FishC"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)


Analysis:

The + operator joins an ordinary (byte) string on the left with a Unicode string on the right. When the two types are mixed, Python 2 automatically converts the byte string on the left to Unicode using the default ASCII codec and only then concatenates. Here, however, the Chinese string on the left is stored as the bytes '\xe6\x88\x91\xe7\x88\xb1' (its UTF-8 encoding), and the first byte, hexadecimal \xe6, has the value 230. ASCII and Unicode agree for values 0 to 127, so such bytes convert without trouble, but the ASCII codec cannot decode any byte greater than 127. Python therefore raises UnicodeDecodeError.
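The implicit conversion described above can be made explicit. This minimal Python 3 sketch decodes the same bytes with the ASCII codec by hand; the variable names are illustrative, and the Python 2 interpreter performs the equivalent step internally rather than through this exact call.

```python
# The bytes quoted in the analysis above.
raw = b"\xe6\x88\x91\xe7\x88\xb1"

first_byte = raw[0]   # 0xe6 == 230, outside the ASCII range 0-127

# Decoding with the ASCII codec fails on the very first byte.
try:
    raw.decode("ascii")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(first_byte)
print(error_message)
```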

Solution:

1. Using Python3

2. Specify the decoding method to convert to Unicode:

    1. >>> string = "I Love". Decode (' utf-8 ') + u "FISHC"
    2. >>> Print String
    3. I love FISHC.


3. Encode the Unicode string part:

    1. >>> string = "I love" + U "FISHC". Encode ("Utf-8")
    2. >>> Print String
    3. I love FISHC.
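For readers on Python 3 (solution 1), the following sketch shows roughly how solutions 2 and 3 translate. It assumes the byte string is UTF-8, as in the session above; the names raw, suffix, as_text and as_bytes are illustrative.

```python
raw = b"\xe6\x88\x91\xe7\x88\xb1"   # UTF-8 bytes of the Chinese prefix
suffix = "FishC"                     # a Python 3 str is already Unicode

# Python 3 refuses to mix bytes and str instead of guessing an encoding.
try:
    raw + suffix
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Solution 2 in Python 3 terms: decode the bytes, then join two str values.
as_text = raw.decode("utf-8") + suffix

# Solution 3 in Python 3 terms: encode the str, then join two bytes values.
as_bytes = raw + suffix.encode("utf-8")

print(mixed_ok)
print(as_text == as_bytes.decode("utf-8"))
```

Either direction works as long as both operands have the same type before they are joined.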


Extended Knowledge:

Unicode was invented to unify the character encodings of different countries, which is why it is also called the universal code. It assigns a unique code to every character of every language, so a corresponding code can be found in Unicode no matter which language the text is in. Unicode can therefore serve as an "intermediary" when converting between different encoding systems.

Converting from another encoding system to Unicode is called decoding (decode), and converting from Unicode to another encoding system is called encoding (encode). For example, to convert data from encoding A to encoding B, the process is:

A-encoded data -> decode(A) -> Unicode -> encode(B) -> B-encoded data
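The conversion chain above can be sketched in Python 3 as follows. The helper name transcode and the choice of GB2312 and UTF-8 as encodings A and B are assumptions made for illustration.

```python
def transcode(data, source_encoding, target_encoding):
    """Convert bytes from one encoding to another via Unicode."""
    text = data.decode(source_encoding)    # A-encoded bytes -> Unicode
    return text.encode(target_encoding)    # Unicode -> B-encoded bytes


# Two Chinese characters stored as GB2312 bytes (encoding A).
gb_bytes = "我爱".encode("gb2312")

# The same characters re-encoded as UTF-8 bytes (encoding B).
utf8_bytes = transcode(gb_bytes, "gb2312", "utf-8")

print(repr(gb_bytes))
print(repr(utf8_bytes))
```

The two byte sequences differ, yet both decode to the same Unicode text, which is what makes Unicode usable as the intermediary.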


Case three: The file encoding differs from the encoding Python uses

The content of test.txt is as follows, saved in GB2312 encoding:

    我爱鱼C工作室，真的！


The content of test.py is as follows:

    f1 = open("test.txt")
    print(f1.read())
    f1.close()


Running the code raises an error:

    >>>
    Traceback (most recent call last):
      File "/users/fishc/documents/python/test.py", line 4, in <module>
        print(f1.read())
      File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)


Analysis:

If you understood the material above, an encoding problem like this one should no longer stump you.

The encoding that open uses for a text file depends on the system (it can be obtained with locale.getpreferredencoding()). Reading the error message carefully shows that the system tried to decode the file as ASCII and hit a byte it could not decode. Since we know the file is saved as GB2312, setting encoding="gb2312" when opening it solves the problem:

    f1 = open("test.txt", encoding="gb2312")
    print(f1.read())
    f1.close()
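The fix can also be written with a with statement, which closes the file automatically. The sketch below is self-contained and hedged: it creates its own GB2312 file in a temporary directory instead of relying on an existing test.txt, and the sample text and variable names are assumptions for illustration.

```python
import os
import tempfile

content = "我爱鱼C工作室"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.txt")

    # Write a GB2312-encoded file to read back.
    with open(path, "w", encoding="gb2312") as f:
        f.write(content)

    # Decoding with the wrong codec fails on the first non-ASCII byte.
    try:
        with open(path, encoding="ascii") as f:
            f.read()
        wrong_codec_ok = True
    except UnicodeDecodeError:
        wrong_codec_ok = False

    # Passing the file's real encoding decodes it correctly; the with
    # statement closes the file even if reading raises.
    with open(path, encoding="gb2312") as f:
        read_back = f.read()

print(wrong_codec_ok)
print(read_back == content)
```

Passing encoding explicitly also makes the script behave the same on systems whose default encoding differs.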
