UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 52: invalid continuation byte
Code:
import pandas as pd
df_w = pd.read_table(r'C:\Users\lab\Desktop\web_list_n.txt', sep=',', header=None)
This error occurs when I use pandas' read_table function to read a local file:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 52: invalid continuation byte
After some searching, I found that these are the two most common and stubborn encoding errors in Python:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
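These are an encoding problem and a decoding problem, respectively. My error means that the 'utf-8' codec cannot decode the byte at position 52 (0xce). In UTF-8, 0xce starts a two-byte sequence, but the byte that follows it in the file is not a valid continuation byte, so the file is not actually UTF-8 encoded. For more information about encoding and decoding, see https://segmentfault.com/a/1190000004625718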
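To see why the decoder complains, here is a minimal sketch that reproduces the same kind of failure on a made-up byte string. The sample bytes and the variable names (raw, error, latin1_text) are illustrative assumptions, not the contents of the actual file.

```python
# Made-up sample: 0xce is followed by 0x41 ("A"), which is not a
# UTF-8 continuation byte (those must lie in the range 0x80-0xBF).
raw = b"abc\xce\x41"

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # exc.start is the offset of the offending byte, exc.reason the cause
    error = exc
    print(exc)

# ISO-8859-1 assigns a character to every byte value, so decoding cannot fail
latin1_text = raw.decode("iso-8859-1")
print(repr(latin1_text))
```

The exception object carries the same details as the traceback message (offset and reason), which can help locate the offending bytes in a real file.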
Solution:
df_w = pd.read_table(r'C:\Users\lab\Desktop\web_list_n.txt', encoding='ISO-8859-1', sep=',', header=None)
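That is, when reading the data, pass the encoding explicitly with encoding='ISO-8859-1'. Because ISO-8859-1 maps every possible byte to a character, it never raises a decode error, but non-ASCII text will come out garbled if the file was really saved in a different encoding. In that case, try other encodings.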
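One way to "try other encodings" is to test a short list of candidates against the raw bytes before handing the file to pandas. The following is only a sketch: the helper name first_working_encoding, the candidate list, and the GBK sample file are all assumptions made for illustration, and nothing in the original post says which encoding the file really uses.

```python
import os
import tempfile


def first_working_encoding(path, candidates=("utf-8", "gbk", "iso-8859-1")):
    """Return the first candidate that decodes the whole file without error.

    Hypothetical helper. ISO-8859-1 is listed last because it accepts
    any byte sequence and would otherwise hide every other candidate.
    """
    with open(path, "rb") as f:
        data = f.read()
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Demo on a temporary file written as GBK (an assumed encoding)
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "wb") as f:
        f.write("中文,1\n".encode("gbk"))
    detected = first_working_encoding(sample)

print(detected)
# The result could then be passed on, for example:
# pd.read_table(path, encoding=detected, sep=',', header=None)
```

A successful decode only shows that the bytes are valid in that encoding, not that the encoding is the right one, so it is still worth inspecting a few rows of the resulting DataFrame.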