Today, while bulk-generating ID numbers to create data, I ran into a problem: an error reported that a value could not be converted to int. After some searching, the cause turned out to be a UTF-8 BOM at the start of the file.
What is a BOM?
In a UTF-8 encoded file, the BOM (byte order mark) sits at the very start of the file and occupies three bytes. It signals that the file is UTF-8 encoded. In fact, the BOM has no function for UTF-8 itself: it exists to support UTF-16 and UTF-32, where it indicates byte order. In UTF-8 it serves only as a signature that tells an editor which encoding the file uses. Editors do not display the BOM, but it is still part of the file content and shows up in whatever reads the file, much like an invisible blank line.
Software such as Notepad, which ships with Windows, inserts three invisible bytes (0xEF 0xBB 0xBF, the BOM) at the start of a file when saving it as UTF-8. This hidden prefix lets editors such as Notepad recognize that the file is encoded in UTF-8.
When reading a TXT file, the BOM ends up at the front of the first piece of data read, and converting that data to int raises an error.
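To make the three bytes concrete, here is a minimal sketch that checks whether raw bytes start with the UTF-8 BOM. The helper name has_utf8_bom and the sample values are made up for illustration; codecs.BOM_UTF8 is the standard-library constant for these bytes.

```python
import codecs


def has_utf8_bom(raw: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM."""
    return raw.startswith(codecs.BOM_UTF8)


# Made-up sample: the BOM followed by the text itself, as a BOM-writing
# editor would save it, and the same text without the BOM.
with_bom = b"\xef\xbb\xbf" + "12345".encode("utf-8")
without_bom = "12345".encode("utf-8")

# The standard-library constant is exactly the three bytes 0xEF 0xBB 0xBF.
print(codecs.BOM_UTF8 == b"\xef\xbb\xbf")
print(has_utf8_bom(with_bom))
print(has_utf8_bom(without_bom))
```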
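The failure can be reproduced without any file at all. The sketch below assumes the data is a list of numeric IDs, one per line; the sample bytes and the helper try_parse are hypothetical. Once decoded as plain UTF-8, the BOM becomes the character U+FEFF at the front of the first line, which int() does not accept.

```python
def try_parse(value: str):
    """Return int(value), or None if the string is not a valid integer."""
    try:
        return int(value)
    except ValueError:
        return None


# Made-up sample data: a BOM followed by two numeric IDs.
raw = b"\xef\xbb\xbf110101\n110102\n"
text = raw.decode("utf-8")  # plain utf-8 keeps the BOM as U+FEFF
first_line = text.split("\n")[0]

print(repr(first_line))                         # shows the hidden U+FEFF prefix
print(try_parse(first_line))                    # conversion fails
print(try_parse(first_line.lstrip("\ufeff")))   # succeeds once the BOM is stripped
```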
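The fix is as follows:
import codecs

# "ids.txt" is a placeholder file name.
with open("ids.txt", "r", encoding="utf-8") as file:
    data = file.read()

# After decoding, the BOM appears as the single character U+FEFF; remove it.
data = data.replace(codecs.BOM_UTF8.decode("utf-8"), "")
lines = data.split("\n")
print(file.closed)  # the with block has already closed the file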
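As an alternative to stripping the BOM by hand, Python's built-in utf-8-sig codec drops a leading BOM while decoding and leaves files without one unchanged. The sketch below is one way to use it, not necessarily the approach used above; the function name read_lines_without_bom and the temporary sample file are made up for illustration.

```python
import os
import tempfile


def read_lines_without_bom(path):
    """Read a UTF-8 text file, skipping a leading BOM if one is present."""
    with open(path, "r", encoding="utf-8-sig") as f:
        return f.read().splitlines()


# Create a made-up sample file that starts with a BOM.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"\xef\xbb\xbf110101\n110102\n")
    sample_path = tmp.name

try:
    # Every line now converts cleanly, including the first one.
    ids = [int(line) for line in read_lines_without_bom(sample_path)]
finally:
    os.remove(sample_path)

print(ids)
```

Because the codec handles both cases, the same call works whether or not the editor that saved the file wrote a BOM.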
How Python removes the BOM header