The same program runs fine on Windows, but on Linux, reading some files fails with: "UnicodeDecodeError: 'gbk' codec can't decode bytes in position 30664-30665: illegal multibyte sequence"
This is caused by illegal characters. Programs written in C/C++ in particular often encode the full-width space in different, non-standard ways, such as \xa3\xa0 or \xa4\x57. These byte pairs are meant to be full-width spaces, but they are not "valid" ones (the real GBK full-width space is \xa1\xa1), so an exception is raised during transcoding.
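The failure can be reproduced in a few lines. This is a minimal sketch assuming Python 3, where undecoded file contents are bytes; the sample text around the \xa4\x57 pair is made up for illustration.

```python
# Minimal sketch (Python 3): the standard GBK full-width space decodes,
# while a non-standard variant makes the default (strict) decoder raise.
standard_space = b"\xa1\xa1"  # the "real" GBK full-width space
variant_space = b"\xa4\x57"   # non-standard variant mentioned above

print(repr(standard_space.decode("gbk")))  # '\u3000' (IDEOGRAPHIC SPACE)

# Hypothetical data: ordinary GBK text with the variant embedded in it.
data = "前".encode("gbk") + variant_space + "后".encode("gbk")

try:
    data.decode("gbk")  # errors="strict" is the default
except UnicodeDecodeError as exc:
    # exc.start / exc.end locate the offending bytes inside data,
    # like "position 30664-30665" in the error message above.
    print(exc.encoding, exc.reason, exc.start, exc.end)
```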
The solution is as follows:
s.decode('gbk', 'ignore').encode('utf-8')
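The line above is Python 2 code, where s is a byte string. In Python 3 the same object is bytes, and the fix can be wrapped in a small function; gbk_to_utf8 is a hypothetical name used only for this sketch.

```python
def gbk_to_utf8(raw: bytes) -> bytes:
    """Re-encode GBK bytes as UTF-8, silently dropping undecodable bytes.

    Python 3 counterpart of s.decode('gbk', 'ignore').encode('utf-8');
    gbk_to_utf8 is a hypothetical helper name.
    """
    text = raw.decode("gbk", "ignore")  # str; invalid byte sequences are skipped
    return text.encode("utf-8")         # UTF-8 can encode any text the GBK decoder yields


sample = "中文".encode("gbk")
print(gbk_to_utf8(sample).decode("utf-8"))  # 中文
```

One caveat: since Python 3.3 the CJK decoders resynchronize after only the first byte of an illegal sequence, so with 'ignore' the second byte of a pair such as \xa4\x57 may survive in the output as the ASCII character "W".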
The signature of decode is decode([encoding], [errors='strict']), so the second parameter controls the error-handling policy. The default, 'strict', raises an exception (UnicodeDecodeError) when an invalid character is encountered;
If it is set to 'ignore', invalid characters are silently dropped;
If it is set to 'replace', invalid characters are replaced with a placeholder (U+FFFD when decoding, ? when encoding);
If it is set to 'xmlcharrefreplace', characters are replaced with XML character references, but this handler only works when encoding (for example with encode), not with decode.
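A small sketch comparing the handlers on one invalid input, assuming Python 3. The input bytes are hypothetical: 0xFF is simply a byte that GBK never uses as a lead byte. The last two lines show handlers on the encoding side, since xmlcharrefreplace only applies there.

```python
# Sketch (Python 3): the same invalid GBK input under different error handlers.
data = b"abc\xff\xffdef"  # hypothetical input; 0xFF is never a valid GBK lead byte

for handler in ("strict", "ignore", "replace"):
    try:
        print(handler, repr(data.decode("gbk", handler)))
    except UnicodeDecodeError as exc:
        # only "strict" ends up here
        print(handler, "->", exc.reason)

# xmlcharrefreplace is an encoding handler: characters the target codec
# cannot represent become XML character references.
print("a\u3000b".encode("ascii", "xmlcharrefreplace"))  # b'a&#12288;b'
print("a\u3000b".encode("ascii", "replace"))            # b'a?b'
```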
My solution is simply to ignore invalid characters:
s.decode('gbk', 'ignore')
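When the data comes straight from a file, the same policy can be passed to open() through its errors parameter instead of decoding by hand. This is a sketch assuming Python 3; read_gbk_text is a hypothetical helper name, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile


def read_gbk_text(path: str) -> str:
    """Read a GBK-encoded text file, silently skipping invalid byte sequences.

    read_gbk_text is a hypothetical helper; errors="ignore" is handed to the
    decoder just like the second argument of bytes.decode().
    """
    with open(path, encoding="gbk", errors="ignore") as f:
        return f.read()


# Usage: write a small file containing invalid bytes, then read it back.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("你好".encode("gbk") + b"\xff\xff" + "世界".encode("gbk"))
    path = tmp.name

try:
    print(read_gbk_text(path))  # 你好世界
finally:
    os.remove(path)
```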