>>> Teststr = 'my eclipse cannot correctly decode gbk! '
>>> Teststr
'\ Xe6 \ x88 \ x91 \ xe7 \ x9a \ x84eclipse \ xe4 \ xb8 \ x8d \ xe8 \ x83 \ xbd \ xe6 \ xad \ xa3 \ xe7 \ xa1 \ xae \ xe7 \ x9a \ x84 \ xe8 \ xa7 \ xa3 \ xe7 \ xa0 \ x81gbk \ xe7 \ xa0 \ x81 \ xef \ xbc \ x81'
>>> Tests2 = U' my eclipse cannot correctly decode gbk! '
>>> Test3 = tests2.encode ('gb2312 ')
>>> Test3
'\ Xce \ xd2 \ xb5 \ xc4eclipse \ xb2 \ xbb \ xc4 \ xdc \ xd5 \ xfd \ xc8 \ xb7 \ xb5 \ xc4 \ xbd \ xe2 \ xc2 \ xebgbk \ xc2 \ xeb \ xa3 \ xa1'
>>> Test3
'\ Xce \ xd2 \ xb5 \ xc4eclipse \ xb2 \ xbb \ xc4 \ xdc \ xd5 \ xfd \ xc8 \ xb7 \ xb5 \ xc4 \ xbd \ xe2 \ xc2 \ xebgbk \ xc2 \ xeb \ xa3 \ xa1'
>>> Teststr
'\ Xe6 \ x88 \ x91 \ xe7 \ x9a \ x84eclipse \ xe4 \ xb8 \ x8d \ xe8 \ x83 \ xbd \ xe6 \ xad \ xa3 \ xe7 \ xa1 \ xae \ xe7 \ x9a \ x84 \ xe8 \ xa7 \ xa3 \ xe7 \ xa0 \ x81gbk \ xe7 \ xa0 \ x81 \ xef \ xbc \ x81'
>>> Test3.decode ('gb2312'). encode ('utf-8 ')
'\ Xe6 \ x88 \ x91 \ xe7 \ x9a \ x84eclipse \ xe4 \ xb8 \ x8d \ xe8 \ x83 \ xbd \ xe6 \ xad \ xa3 \ xe7 \ xa1 \ xae \ xe7 \ x9a \ x84 \ xe8 \ xa7 \ xa3 \ xe7 \ xa0 \ x81gbk \ xe7 \ xa0 \ x81 \ xef \ xbc \ x81'
>>> Test3.decode ('gb2312'). encode ('utf-8') = teststr
True
As shown in the preceding figure, the test3 variable (gb2312 encoding) is decoded (converted to a unicode string) and then UTF-8 encoded to form a string with the same value as teststr.
Through the above example, we also found that the unicode string is a bridge between the gb2312 string (which is used in windows) and the UTF-8 string (used in python itself.