This problem came up in a project. The program uses ADODB.Stream to generate a UTF-8 CSV file, which is then imported into a MySQL database with the LOAD DATA INFILE command. After the import, the first column of the first row of the imported table always contains a question mark, although the file shows nothing unusual when opened in a text editor before the import. Investigation showed that the file starts with three invisible characters. In UltraEdit's hex mode they appear as the three bytes EF BB BF. (Note that only newer versions of UltraEdit display a UTF-8 file correctly. At first I used UE 10.2 and saw the two bytes FF FE, which confused me. UE 14.2 shows the correct bytes.) After some reading, I found that this is the BOM (byte order mark) of the UTF-8 file. UTF-8 files come in two kinds, with a BOM and without one. A file generated by ADODB.Stream has a BOM, and during the database import these three bytes are inserted into the table as ordinary characters, which produces the unwanted result.
So I tried to solve this problem. Because a UTF-8 file is stored in the same byte order as its encoding, the only difference between the BOM and BOM-free formats is the three extra bytes EF BB BF at the start. If we read the file in binary mode, we can skip the first three bytes to remove the BOM. Binary-mode file operations can only be done with the ADODB.Stream object, so I wrote a simple test example as follows:
<textarea cols="50" rows="15" name="code" class="vb:nogutter">
' Strip the 3-byte UTF-8 BOM by copying everything after it to a new file.
Set stm = CreateObject("ADODB.Stream")
stm.Type = 1                ' 1 = binary mode (adTypeBinary)
stm.Open
stm.LoadFromFile "E:/tmp/apachelog.txt"
stm.Position = 3            ' skip the BOM bytes EF BB BF
bytearr = stm.Read(-1)      ' -1 = read to the end of the stream
stm.Close
stm.Open                    ' reopen as an empty binary stream
stm.Write bytearr
stm.SaveToFile "E:/tmp/apachelog22.txt", 2   ' 2 = overwrite if the file exists
stm.Flush
stm.Close
</textarea>
Http://blog.csdn.net/zmxj/archive/2009/02/27/3943742.aspx
When working with binary files, note that the Type property must be set to 1, which selects binary mode. Otherwise, the Read method fails. Setting the Position property to 3 means the first three bytes are not read. The Read method returns a byte array. To verify the content of the first three bytes, leave the Position property unset and set a breakpoint, then inspect the contents of the array. The first three bytes, EF BB BF, are 239, 187, 191 in decimal. In this example, apachelog22.txt is the processed file. You can open both files in hex mode to check that the result is correct.
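The check on the first three bytes can also be done in code rather than with a breakpoint. The following is a minimal sketch, not part of the original test: `HasUtf8Bom` is a hypothetical helper name, and the final `WScript.Echo` line assumes the script runs under Windows Script Host. It reads only the first three bytes in binary mode and compares them with 239, 187, 191.

```vbscript
' Sketch: report whether a file starts with the UTF-8 BOM (EF BB BF).
' HasUtf8Bom is a hypothetical helper name, not part of ADODB.
Function HasUtf8Bom(path)
    Dim stm, head
    HasUtf8Bom = False
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = 1                 ' 1 = binary mode (adTypeBinary)
    stm.Open
    stm.LoadFromFile path        ' Position is 0 after loading
    If stm.Size >= 3 Then
        head = stm.Read(3)       ' first three bytes as a byte array
        ' AscB(MidB(...)) extracts one byte; positions are 1-based.
        If AscB(MidB(head, 1, 1)) = 239 And _
           AscB(MidB(head, 2, 1)) = 187 And _
           AscB(MidB(head, 3, 1)) = 191 Then
            HasUtf8Bom = True
        End If
    End If
    stm.Close
End Function

' Assumes Windows Script Host (cscript or wscript).
WScript.Echo HasUtf8Bom("E:/tmp/apachelog.txt")
```

Running a check like this before setting Position to 3 avoids cutting three bytes of real data from a file that was saved without a BOM.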
For more information about the UTF-8 BOM, see:
Http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html
Http://www.paulsadowski.com/WSH/getremotebinaryfile.htm