Encoding of Python code files
In Python 2, a .py file is read as ASCII by default. If the file contains Chinese characters and has no encoding declaration, the interpreter stops with the error SyntaxError: Non-ASCII character. To avoid this, add an encoding declaration on the first or second line of the code file:
# coding=utf-8  # store Chinese characters using UTF-8
- A string typed directly, as in print '中文', is a byte string in the encoding of the code file. To get a Unicode string instead, there are 2 ways:
- s1 = u'中文'  # the u prefix stores the text as Unicode
- s2 = unicode('中文', 'gbk')
unicode is a built-in function; its second parameter gives the encoding of the source string, so it must match the encoding the file is actually saved in.
decode is a method available on any string. It converts the string to Unicode, and its parameter gives the encoding of the source string.
encode is also a method of any string. It converts the string to the encoding specified by its parameter.
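The relationship between encode and decode can be sketched as a round trip. This is a minimal illustration, not taken from the original post: the variable names are made up, and the text is written with \u escapes (the two characters 中文) so the snippet does not depend on how the file itself is saved. It should run unchanged on Python 2.6+ and Python 3.

```python
from __future__ import print_function

text = u'\u4e2d\u6587'             # Unicode text: the two characters for "Chinese"

gbk_bytes = text.encode('gbk')     # Unicode -> GBK bytes (2 bytes per character here)
utf8_bytes = text.encode('utf-8')  # Unicode -> UTF-8 bytes (3 bytes per character here)

print(len(gbk_bytes), len(utf8_bytes))

# decode reverses encode, provided the codec matches the bytes
assert gbk_bytes.decode('gbk') == text
assert utf8_bytes.decode('utf-8') == text
```

The same Unicode text produces different byte sequences under each encoding, which is why the argument passed to decode has to name the encoding the bytes were really written in.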
Encoding of Python strings
A literal written as u'汉字' creates a unicode object; without the u prefix, the literal creates a str object.
The encoding of a str depends on the system environment; it is generally the value returned by sys.getfilesystemencoding().
So to go from unicode to str, use the encode method.
To go from str to unicode, use decode.
For example:
# coding=utf-8  # default encoding is UTF-8
s = u'中文'  # Unicode text
print s.encode('utf-8')
print s  # same effect as above; it appears to be converted directly to the specified encoding by default
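To see which encodings the environment actually reports, including the sys.getfilesystemencoding() value mentioned earlier, something like the following can be run. It is only a sketch: the variable names are illustrative, and checking sys.stdout.encoding is an addition here, on the assumption that the terminal encoding is what decides how printed text looks.

```python
from __future__ import print_function
import sys

fs_encoding = sys.getfilesystemencoding()    # encoding used for file names
default_encoding = sys.getdefaultencoding()  # encoding used for implicit str/unicode conversions
# sys.stdout.encoding may be None (or missing) when output is piped or redirected
stdout_encoding = getattr(sys.stdout, 'encoding', None)

print('filesystem:', fs_encoding)
print('default:   ', default_encoding)
print('stdout:    ', stdout_encoding)
```

The values differ between machines, so no particular output should be expected.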
My summary:
u = u'中文'  # Unicode text
g = u.encode('gbk')  # convert to GBK
print g  # this is garbled: the current environment is UTF-8, so GBK-encoded text displays incorrectly
text = g.decode('gbk').encode('utf-8')  # read g as GBK (because it is GBK-encoded), then convert to UTF-8 for output
print text  # Chinese displays normally
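The summary above can also be checked without relying on what the terminal displays. The sketch below is an illustration with assumed variable names: it encodes 中文 (written as \u escapes) to GBK, shows that reading those bytes as UTF-8 fails under the default strict policy, and then converts them properly. It should run on Python 2.6+ and Python 3.

```python
text = u'\u4e2d\u6587'          # Unicode text: the two characters for "Chinese"
gbk_bytes = text.encode('gbk')  # the GBK-encoded form

# Reading GBK bytes as if they were UTF-8 is the mistake behind garbled output.
# For these particular bytes, strict decoding raises an error.
try:
    gbk_bytes.decode('utf-8')
    misread_ok = True
except UnicodeDecodeError:
    misread_ok = False

# Correct conversion: decode with the real encoding, then encode to the target one
utf8_bytes = gbk_bytes.decode('gbk').encode('utf-8')
```

Going through Unicode in the middle is the key step: bytes are never converted from one encoding to another directly.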
A safer approach:
The prototype of decode is decode([encoding], [errors='strict']), so you can use the second parameter to control the error-handling policy. The default value is strict, which raises an exception when an illegal character is encountered;
If set to ignore, illegal characters are skipped;
If set to replace, illegal characters are replaced with a marker (U+FFFD when decoding, ? when encoding);
If set to xmlcharrefreplace, an XML character reference is used instead (this policy is available for encode only).
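The error policies listed above can be compared side by side. This is a minimal sketch with made-up sample data: bad holds the UTF-8 bytes of one Chinese character followed by a stray 0xFF byte, which is never valid in UTF-8. It should run on Python 2.6+ and Python 3.

```python
bad = b'\xe4\xb8\xad\xff'  # valid UTF-8 for one character, then an invalid byte

# strict (the default): raises on the invalid byte
try:
    bad.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = bad.decode('utf-8', 'ignore')    # invalid byte dropped
replaced = bad.decode('utf-8', 'replace')  # invalid byte becomes U+FFFD

# xmlcharrefreplace works in the other direction: unencodable characters
# become XML character references when encoding
xml_bytes = u'\u4e2d'.encode('ascii', 'xmlcharrefreplace')
```

ignore silently loses data, so replace is often the better choice when the output will be inspected by a person.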
Python encoding issues when working with Chinese files, especially UTF-8 and GBK