One. How to remove the BOM header from a TXT file saved in a format such as UTF-8
When a TXT file is saved as UTF-8 on Windows (for example with Notepad), three invisible bytes (0xEF 0xBB 0xBF) are often inserted at the beginning of the file. They are called the BOM header and are already defined as a constant in Python's codecs library (codecs.BOM_UTF8).
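As a quick check of the constant mentioned above, the following sketch compares codecs.BOM_UTF8 with the three bytes and shows what they turn into after decoding. The sample bytes are made up for illustration and do not come from a real file.

```python
import codecs

# The standard-library constant is exactly the three BOM bytes.
bom_is_three_bytes = codecs.BOM_UTF8 == b"\xef\xbb\xbf"

# Hypothetical sample: a BOM followed by the UTF-8 bytes of "中文".
raw = codecs.BOM_UTF8 + "中文".encode("utf-8")

# Decoded with plain 'utf-8', the BOM survives as one character, U+FEFF.
text = raw.decode("utf-8")
first_char_is_bom = text[0] == "\ufeff"

print(bom_is_three_bytes, len(raw), first_char_is_bom, len(text))
```

This is why the BOM counts as three bytes at the byte level but only one character once the file has been decoded as text.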
Method One: Save the file as Utf8temp.txt, choosing UTF-8 as the encoding, then strip the BOM bytes in code
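import codecs

# Read the file as text; with encoding='utf-8' the BOM is kept as the first character
with open("Utf8temp.txt", 'r', encoding='utf-8') as f:
    data = f.read()
data = data.encode(encoding='utf-8')  # convert back to raw bytes
print(data)
# print("中文".encode(encoding='utf-8'))
print(len(data))
if data[:3] == codecs.BOM_UTF8:  # the first three bytes are the BOM
    data = data[3:]
print(data.decode(encoding='utf-8'))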
The output is as follows:
b'\xef\xbb\xbf\xe4\xb8\xad\xe6\x96\x87'
9
中文
Method Two:
with open("./temp.txt", "r", encoding='utf-8') as f:
    flag = 1
    for line in f:
        if flag == 1:
            # A TXT file saved as UTF-8 may have three invisible bytes
            # (0xEF 0xBB 0xBF) at the start of its first line; decoded as
            # UTF-8 they become the single character '\ufeff'
            if line.startswith('\ufeff'):
                line = line[1:]
            flag = 0
        print(line)
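A variant of Method Two, not part of the original approach, is to let Python skip the BOM by opening the file with the standard 'utf-8-sig' codec. The sketch below writes its own throwaway sample file first; the path and contents are assumptions made only for the demonstration.

```python
import codecs
import os
import tempfile

# Create a hypothetical sample file that starts with a BOM.
path = os.path.join(tempfile.mkdtemp(), "temp.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + "中文\nsecond line\n".encode("utf-8"))

# 'utf-8-sig' skips a leading BOM if one is present and otherwise
# behaves like plain 'utf-8', so no flag variable is needed.
with open(path, "r", encoding="utf-8-sig") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
```

Because the codec also accepts files without a BOM, the same call works whether or not the file was saved with one.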
Method Three: Remove the BOM header directly with third-party software: open the TXT file in Notepad++, select the "... without BOM" option in the Encoding menu, and click Save.
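If no editor is at hand, the same conversion can be approximated in a script. The following is a minimal sketch, not what Notepad++ does internally; remove_bom_in_place and the demo file are hypothetical names introduced here for illustration.

```python
import codecs
import os
import tempfile


def remove_bom_in_place(path):
    """Rewrite the file without a leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False otherwise.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # no BOM, leave the file untouched
    with open(path, "wb") as f:
        f.write(data[len(codecs.BOM_UTF8):])
    return True


# Demo on a throwaway file that starts with a BOM.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "wb") as f:
    f.write(codecs.BOM_UTF8 + "中文".encode("utf-8"))

first_call = remove_bom_in_place(demo_path)   # BOM present, gets removed
second_call = remove_bom_in_place(demo_path)  # already clean, nothing to do
with open(demo_path, "rb") as f:
    remaining = f.read()
print(first_call, second_call, remaining)
```

The function reads the whole file into memory, which is fine for ordinary text files; very large files would call for a streaming copy instead.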
Python Basics: Character encoding issues caused by three invisible characters (0xEF 0xBB 0xBF, i.e. the BOM)