Python encoding issues when working with Chinese files, especially UTF-8 and GBK

Source: Internet
Author: User

Encoding of Python code files

In Python 2, a .py file is treated as ASCII by default. If the file contains Chinese characters and has no encoding declaration, the interpreter fails with SyntaxError: Non-ASCII character. To avoid this, add an encoding declaration on the first or second line of the source file:

    1. # coding=utf-8   # store Chinese characters as UTF-8
    2. print '中文'   A string literal entered directly like this is handled according to the source file's encoding. To get a unicode object instead, there are two ways:
      1. s1 = u'中文'   # the u prefix stores the text as unicode
      2. s2 = unicode('中文', 'gbk')

unicode is a built-in function. Its second parameter gives the encoding of the source string, so it must match the encoding the bytes are actually stored in ('gbk' here assumes the file was saved as GBK).

decode is a method of every str; it converts the string to unicode, and its parameter gives the encoding of the source string.

encode is also a string method; it converts a unicode string into the encoding named by its parameter.
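The round trip between text and encoded bytes can be sketched as follows. This is a minimal illustration rather than the author's code, and the variable names are hypothetical. It is written to run on both Python 2 and Python 3; note that in Python 3 the article's unicode type is called str and the article's str type is called bytes.

```python
# -*- coding: utf-8 -*-
# Sketch: encode text to bytes, then decode the bytes back to text.

text = u'中文'                      # text (unicode) object

gbk_bytes = text.encode('gbk')      # text -> GBK-encoded bytes
utf8_bytes = text.encode('utf-8')   # text -> UTF-8-encoded bytes

# decode() takes the encoding of the *source* bytes
from_gbk = gbk_bytes.decode('gbk')
from_utf8 = utf8_bytes.decode('utf-8')

print(repr(gbk_bytes))
print(repr(utf8_bytes))
print(from_gbk == from_utf8 == text)
```

The same text yields different byte sequences under GBK and UTF-8, which is why decode must be told which encoding the bytes are in.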

Encoding of Python strings

A literal written as u'汉字' is of type unicode; without the u prefix, the literal is of type str.

The encoding of a str depends on where it came from: literals follow the source file's encoding, while strings from the system environment are generally in the encoding returned by sys.getfilesystemencoding().
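To see what the current environment reports, a short check like the one below can help. It is only a sketch; the printed values vary by platform and Python version, so no particular output is assumed.

```python
import sys

# Encoding used for file names on this system (value varies by platform).
fs_encoding = sys.getfilesystemencoding()

# Encoding used for implicit str/unicode conversions:
# 'ascii' by default on Python 2, 'utf-8' on Python 3.
default_encoding = sys.getdefaultencoding()

print(fs_encoding)
print(default_encoding)
```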

So to convert unicode to str, use the encode method.

To convert str to unicode, use decode.

For example:

    # coding=utf-8             # the default encoding of this file is utf-8
    s = u'中文'                # unicode text
    print s.encode('utf-8')
    print s                    # same effect as above; it appears to be converted directly to the specified encoding by default

My summary:

    u = u'中文'                             # unicode text
    g = u.encode('gbk')                     # convert to GBK
    print g                                 # garbled: the current environment is UTF-8, so GBK bytes display incorrectly
    s = g.decode('gbk').encode('utf-8')     # read g as GBK (because it is GBK encoded) and convert to UTF-8
    print s                                 # displays the Chinese text correctly
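The same decode-then-encode step applies to whole files. The sketch below is an illustration under stated assumptions, not the author's method: convert_file is a hypothetical helper, the file names are throwaway temporary paths, and io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def convert_file(src_path, dst_path, src_encoding='gbk', dst_encoding='utf-8'):
    """Read src_path as src_encoding and write it to dst_path as dst_encoding."""
    with io.open(src_path, 'r', encoding=src_encoding) as src:
        text = src.read()               # bytes are decoded to text on read
    with io.open(dst_path, 'w', encoding=dst_encoding) as dst:
        dst.write(text)                 # text is encoded to bytes on write


# Usage with a throwaway GBK-encoded file
tmp_dir = tempfile.mkdtemp()
gbk_path = os.path.join(tmp_dir, 'gbk.txt')
utf8_path = os.path.join(tmp_dir, 'utf8.txt')

with io.open(gbk_path, 'wb') as f:
    f.write(u'中文'.encode('gbk'))

convert_file(gbk_path, utf8_path)

with io.open(utf8_path, 'rb') as f:
    result = f.read()
print(repr(result))
```

Passing the source encoding explicitly avoids relying on the environment's default, which is the usual cause of the garbled output shown above.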

Safe method:

The decode method has the prototype decode([encoding], [errors='strict']), so the second parameter controls the error-handling policy. The default, strict, raises an exception when an illegal character is encountered;

If set to ignore, illegal characters are skipped;
If set to replace, illegal characters are replaced with a substitute character (U+FFFD when decoding);
If set to xmlcharrefreplace, XML character references are used (this handler applies to encode only).
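The effect of each policy can be seen on a small input. This is a minimal sketch with hypothetical variable names; the stray byte 0xff is chosen only because it can never appear in valid UTF-8.

```python
# -*- coding: utf-8 -*-
# UTF-8 bytes for u'中文' followed by one byte that is invalid in UTF-8
bad = u'中文'.encode('utf-8') + b'\xff'

try:
    bad.decode('utf-8')                     # errors='strict' is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = bad.decode('utf-8', 'ignore')     # drops the invalid byte
replaced = bad.decode('utf-8', 'replace')   # substitutes U+FFFD

# xmlcharrefreplace is an encode-side handler
xml_refs = u'中文'.encode('ascii', 'xmlcharrefreplace')

print(strict_failed)
print(repr(ignored))
print(repr(replaced))
print(repr(xml_refs))
```

Note that ignore silently loses data, so replace is often the safer choice when the damaged positions need to stay visible.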
