When processing Chinese text in Python you often run into string-encoding problems, so I generally convert strings to Unicode before handling them.
To define a Unicode string in Python, prefix the literal with u:
s = u"helloworld"
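A small sketch of why the Unicode form is convenient, written for Python 3 (an assumption: the rest of this article uses Python 2). In Python 3 every str is already Unicode and the u prefix is accepted but optional, so len() counts characters, while the encoded bytes of the same text can be longer.

```python
# Python 3: str literals are Unicode; the u prefix is accepted but redundant.
s = u"helloworld"
text = u"中文"

# len() on a Unicode string counts characters...
char_count = len(text)
# ...while the UTF-8 encoding of these two characters takes 3 bytes each.
byte_count = len(text.encode("utf-8"))

print(char_count, byte_count)  # 2 6
```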
To define a raw string, in which backslashes are not treated as escape characters, prefix the literal with r:
path = r"c:\programfile\test"
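The difference matters for Windows paths like the one above: without the r prefix, "\t" inside the literal would become a TAB character. A Python 3 sketch comparing the two forms:

```python
# A raw string keeps backslashes literally; a normal string needs them doubled.
path = r"c:\programfile\test"
assert path == "c:\\programfile\\test"

# Without r, "\t" is one TAB character, so "\test" is TAB + "est".
raw_len = len(r"\test")    # backslash, t, e, s, t
normal_len = len("\test")  # TAB, e, s, t
print(raw_len, normal_len)  # 5 4
```

One caveat: a raw string literal cannot end with a single backslash, so r"c:\dir\" is a syntax error; write "c:\\dir\\" instead.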
Decoding converts a byte string in some other encoding into Unicode:
ret = data.decode("gb2312")   # data is a byte string
ret = data.decode("ascii")
ret = data.decode("utf-8")
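A Python 3 sketch of decoding (in Python 3 the encoded form is the bytes type rather than str). It simulates GB2312 input by encoding a string first, then shows that the right codec recovers the text while a wrong one raises UnicodeDecodeError instead of silently producing garbage.

```python
# Bytes as they might arrive from a GB2312-encoded file or web page.
raw = "中文".encode("gb2312")

# Decoding with the correct codec gives back the Unicode text.
text = raw.decode("gb2312")
print(text)  # 中文

# Decoding with a wrong codec fails loudly in the default "strict" mode.
failed = []
for wrong in ("utf-8", "ascii"):
    try:
        raw.decode(wrong)
    except UnicodeDecodeError as exc:
        failed.append(wrong)
        print(wrong, "failed:", exc.reason)
```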
Encoding converts a Unicode string into a byte string in some other encoding:
ret = text.encode("gb2312")   # text is a Unicode string
ret = text.encode("ascii")
ret = text.encode("utf-8")
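The same text produces different bytes in different encodings, and some encodings cannot represent it at all. A Python 3 sketch:

```python
text = "中文"

utf8_bytes = text.encode("utf-8")    # 3 bytes per character for these characters
gb_bytes = text.encode("gb2312")     # 2 bytes per character
print(len(utf8_bytes), len(gb_bytes))  # 6 4

# ASCII cannot represent Chinese characters: strict mode raises,
# while errors="replace" substitutes "?" for each unencodable character.
try:
    text.encode("ascii")
except UnicodeEncodeError:
    print("ascii: cannot encode")
replaced = text.encode("ascii", errors="replace")
print(replaced)  # b'??'
```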
The chardet module (a third-party package, installed with pip install chardet) can guess which encoding a byte string uses:
import chardet
result = chardet.detect(data)   # data is a byte string
print result['encoding']
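chardet's answer is a statistical guess, and the package is not part of the standard library. Where it is unavailable, a cruder stand-in (a different technique, not chardet) is to try a short list of candidate encodings in order and keep the first that decodes without error. The helper name guess_decode and its candidate list are hypothetical; UTF-8 goes first because it is strict, so bytes in other encodings usually fail it. Written for Python 3:

```python
def guess_decode(data, candidates=("utf-8", "gb2312", "gbk")):
    """Hypothetical helper: return (text, encoding) for the first candidate
    that decodes `data` without error; raise ValueError if none does."""
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    raise ValueError("none of the candidate encodings matched")


print(guess_decode("中文".encode("utf-8")))   # ('中文', 'utf-8')
print(guess_decode("中文".encode("gb2312")))  # ('中文', 'gb2312')
```

A successful decode does not prove the guess is right, since many byte sequences are valid in several encodings; the order of the candidate list decides ties.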
String formatting with %s:
print "test for %s, value is %d" % ("format", 123)
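The same % formatting in Python 3, where print is a function. The left operand is a template and each conversion is filled from the tuple in order:

```python
# %s inserts str(value); %d inserts an integer.
msg = "test for %s, value is %d" % ("format", 123)
print(msg)  # test for format, value is 123

# Unicode text works the same way.
name = "%s has %d characters" % ("中文", len("中文"))
print(name)  # 中文 has 2 characters

# A single value can be given without a tuple; a literal percent sign is written %%.
ratio = "%d%%" % 50
print(ratio)  # 50%
```

Passing a single value without a tuple is only safe when that value is not itself a tuple; wrapping it as (value,) always works.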
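It is common to put a coding declaration such as #encoding=utf-8 (or # -*- coding: utf-8 -*-) on the first or second line of a .py file so that Chinese characters in the source are read correctly; Python 2 rejects non-ASCII source bytes with a SyntaxError when no encoding is declared.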
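This declaration is defined by PEP 263. In Python 3 the default source encoding is already UTF-8, so the line mainly matters for Python 2 or for files saved in another encoding such as GB2312. The standard library function tokenize.detect_encoding applies the same rule, so it can be used to check what a file declares. This Python 3 sketch feeds it in-memory bytes instead of a real file; the wrapper name declared_encoding is hypothetical.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Hypothetical wrapper: return the encoding Python would use for this source."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


print(declared_encoding(b"#encoding=utf-8\nprint('hi')\n"))     # utf-8
print(declared_encoding(b"# -*- coding: gb2312 -*-\nx = 1\n"))  # gb2312
print(declared_encoding(b"x = 1\n"))  # utf-8 (the Python 3 default)
```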
The most important thing when dealing with string problems is to know the encoding of your input, convert the input to Unicode as soon as it arrives, and handle it consistently as Unicode from then on.
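A minimal Python 3 sketch of that workflow: decode once at the input boundary, do all processing on Unicode text, and encode once on output. The function name transcode_and_clean, the GB2312 input and UTF-8 output defaults, and the whitespace cleanup step are all hypothetical placeholders.

```python
def transcode_and_clean(raw, in_encoding="gb2312", out_encoding="utf-8"):
    """Hypothetical helper: decode input bytes, process as Unicode, encode output."""
    # 1. Input boundary: convert bytes to Unicode exactly once.
    text = raw.decode(in_encoding)
    # 2. Processing: work only with Unicode here (example: trim and collapse spaces).
    text = " ".join(text.split())
    # 3. Output boundary: convert back to bytes exactly once.
    return text.encode(out_encoding)


data = "  中文   测试 ".encode("gb2312")
result = transcode_and_clean(data)
print(result.decode("utf-8"))  # 中文 测试
```

Keeping the decode and encode calls at the edges means the middle of the program never has to guess what kind of string it is holding.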