Python string encoding in Windows
Python 2 actually has three string types: str (a byte string), unicode, and the abstract base class basestring. Both str and unicode inherit from basestring, which cannot be instantiated.
On Windows, enter the following code in the CPython interpreter:
>>> st1 = '中文'
>>> st1
'\xd6\xd0\xce\xc4'
>>> type(st1)
<type 'str'>
>>> st2 = st1.decode('gbk')
>>> st2
u'\u4e2d\u6587'
>>> type(st2)
<type 'unicode'>
>>> st3 = st2.encode('utf-8')
>>> st3
'\xe4\xb8\xad\xe6\x96\x87'
>>> type(st3)
<type 'str'>
>>> st4 = st2.encode('gbk')
>>> st4
'\xd6\xd0\xce\xc4'
>>> type(st4)
<type 'str'>
If you add the following statement:
>>> st5 = st1.decode('utf-8')
a UnicodeDecodeError is raised, because st1 holds gbk-encoded bytes and they are not a valid utf-8 sequence.
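The failure can be reproduced without a Chinese-locale console by starting from the bytes directly. This is a minimal sketch, assuming the hard-coded bytes below are the gbk encoding of 中文; the variable names are illustrative, and the syntax is chosen so that it runs on Python 2.7 as well as Python 3.

```python
# gbk bytes for the two characters U+4E2D U+6587, hard-coded so the
# example does not depend on the console code page.
gbk_bytes = b'\xd6\xd0\xce\xc4'

error_name = None
error_start = None
try:
    gbk_bytes.decode('utf-8')
except UnicodeDecodeError as exc:
    # 0xD6 announces a two-byte utf-8 sequence, but 0xD0 is not a valid
    # continuation byte, so decoding fails on the very first byte.
    error_name = type(exc).__name__
    error_start = exc.start

print(error_name)
```

Decoding the same bytes with 'gbk' succeeds, which is why st2 in the session above could be created.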
From the code and output above, we can draw the following conclusions:
1. On a Chinese-locale Windows system, the default encoding for Chinese text entered on the command line is gbk, and the resulting string is of type str.
2. The decode method converts a Chinese string of type str to type unicode.
3. The encode method converts a Chinese string of type unicode to type str.
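The three conclusions can be sketched as a round trip. This is a minimal sketch under stated assumptions: transcode is a hypothetical helper introduced only for illustration, and the text is written with \u escapes so the example does not depend on the console or source file encoding. It runs on Python 2.7 and on Python 3.3 or later, where the byte type is called bytes and the text type is called str.

```python
def transcode(data, source_encoding, target_encoding):
    """Hypothetical helper: re-encode bytes by going through unicode text."""
    return data.decode(source_encoding).encode(target_encoding)


text = u'\u4e2d\u6587'              # unicode text for the two characters

gbk_bytes = text.encode('gbk')      # unicode -> bytes (str in Python 2)
utf8_bytes = text.encode('utf-8')   # same text, different byte sequence

roundtrip = gbk_bytes.decode('gbk')  # bytes -> unicode, with the matching codec

# Converting between two byte encodings always passes through unicode.
converted = transcode(gbk_bytes, 'gbk', 'utf-8')

print(repr(gbk_bytes))
print(repr(utf8_bytes))
```

The codec passed to decode has to match the one that produced the bytes; the codec passed to encode can be any that is able to represent the text.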
Therefore, when a Python script contains Chinese characters, add the following encoding declaration on the first or second line of the script, so that the interpreter knows how the source file is encoded:
#-*- coding:utf-8 -*-
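As an illustration of what the declaration enables, here is a small sketch of a script that contains a Chinese literal. It assumes the file is actually saved as utf-8, matching the declaration; the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Assumes this file is saved as utf-8, matching the declaration above.

text = u'中文'   # unicode literal, decoded using the declared encoding

char_count = len(text)                    # number of characters
utf8_length = len(text.encode('utf-8'))   # number of bytes in utf-8
gbk_length = len(text.encode('gbk'))      # number of bytes in gbk

print('%d %d %d' % (char_count, utf8_length, gbk_length))
```

Without the declaration, Python 2.7 refuses to compile a source file that contains non-ASCII bytes.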
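You can also import the sys module and set the default encoding. In Python 2.7, sys must be reloaded first, because setdefaultencoding is removed from the module at startup:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')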
PS: Python version: Python 2.7