Fixing the UTF-8 BOM displayed on the page after PHP reads a CSV file

Source: Internet
Author: User
The following is a detailed analysis of how to fix the UTF-8 BOM that appears on the page after PHP reads a CSV file. Readers who need it can use it as a reference.

date.csv:
"ID" "NAME" "EMAIL"
"1" "James" "xm@163.com"
"2" "Xiaodong" "xd@sina.com"
"3" "xiao shao" "shaozi@hotmai.com"

Read this csv file

The code is as follows:


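<?php
// Open the tab-delimited CSV file for reading.
$handle = fopen('date.csv', 'r');
// Read one row at a time; fgetcsv() returns false at the end of the file.
while (($data = fgetcsv($handle, 10000, "\t")) !== false)
{
    echo $data[0] . ' ' . $data[1] . ' ' . $data[2] . "<br>";
}
fclose($handle);
?>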


When it is displayed on the page after reading, it becomes as follows:
"ID" NAME EMAIL
1 James xm@163.com
2 Xiaodong xd@sina.com
3 xiao shao shaozi@hotmai.com

The default field enclosure character of the fgetcsv function is the double quotation mark.
So why are the other fields read correctly, while ID is still enclosed in double quotation marks?

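I looked it up online. It turns out that PHP does not recognize the UTF-8 BOM. The BOM bytes sit in front of the opening quotation mark of the first field, so fgetcsv does not treat that field as enclosed and keeps its quotation marks.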
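The symptom can be reproduced without a file. The following is a minimal sketch, assuming PHP 7 or later; the helper name firstRow is hypothetical and the header row is built in memory rather than read from date.csv. It parses the same header row with and without a leading BOM and prints the first field as hex.

```php
<?php
// Hypothetical helper: parse the first row of tab-delimited CSV text held in memory.
function firstRow(string $csvText): array
{
    $handle = fopen('php://memory', 'r+');
    fwrite($handle, $csvText);
    rewind($handle);
    // Enclosure and escape are passed explicitly; they are the usual defaults.
    $row = fgetcsv($handle, 10000, "\t", '"', '\\');
    fclose($handle);
    return $row;
}

$line    = "\"ID\"\t\"NAME\"\t\"EMAIL\"\n";
$withBom = "\xEF\xBB\xBF" . $line; // same row with a UTF-8 BOM in front

$clean = firstRow($line);
$dirty = firstRow($withBom);

echo bin2hex($clean[0]), "\n"; // 4944 (ID)
echo bin2hex($dirty[0]), "\n"; // efbbbf22494422 (BOM, then "ID" with its quotes)
```

Only the first field is affected, which matches the output shown above: NAME and EMAIL lose their quotation marks, ID does not.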
I found the following information:
The Unicode specification has the concept of a BOM (Byte Order Mark), a mark that indicates byte order. The following description of the BOM was found online:
There is a character named "zero width no-break space" in UCS, and its code is FEFF. FFFE is not a character in UCS, so it should not appear in actual transmission. The UCS specification recommends transmitting the character "zero width no-break space" before transmitting the byte stream. That way, if the receiver receives FEFF, the byte stream is Big-Endian; if it receives FFFE, the byte stream is Little-Endian. For this reason the character "zero width no-break space" is also called the BOM.

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to indicate the encoding. The UTF-8 encoding of the character "zero width no-break space" is EF BB BF. So if the receiver receives a byte stream starting with EF BB BF, it knows that the stream is UTF-8 encoded.
Windows uses BOM to mark the encoding of text files.
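The byte patterns described above can be checked in a few lines of PHP. This is only a sketch: the function name detectBom and its return labels are hypothetical, and it looks at the UTF-8 and 16-bit marks only, ignoring other encodings such as UTF-32.

```php
<?php
// Hypothetical helper: name the encoding signalled by a leading BOM, or null if there is none.
function detectBom(string $bytes): ?string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8';      // EF BB BF
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';   // FEFF received: Big-Endian
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE';   // FFFE received: Little-Endian
    }
    return null;
}

// PHP's \u{...} escape stores the character as UTF-8 bytes,
// so U+FEFF becomes the three bytes EF BB BF.
echo bin2hex("\u{FEFF}"), "\n"; // efbbbf
```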

In addition, the FAQ-BOM page on the official website describes the BOM in detail. As the official site it is naturally authoritative, but it is only available in English and is hard going.
The BOM occupies three bytes at the start of a UTF-8 encoded file. If you use Notepad to save a text file as UTF-8, then open the file with UE (UltraEdit) and switch to hexadecimal editing mode, you can see the BOM at the beginning of the file (EF BB BF in UTF-8). This is a good way to identify a UTF-8 encoded file: software uses the BOM to tell whether a file is UTF-8, and many programs even require the files they read to carry a BOM. However, a lot of software still cannot recognize the BOM. When I was studying Firefox, I learned that earlier versions of Firefox did not support a BOM in extensions, but versions from Firefox 1.5 onward began to support it. PHP, for now, does not support the BOM.

PHP was not designed with the BOM in mind; that is, it does not skip the three BOM bytes at the beginning of a UTF-8 encoded file. The BOM therefore has to be removed from the file itself: either use the editor's convert menu to convert UTF-8 to ASCII, or select ASCII encoding when saving. If the file uses DOS-format line endings, you can open it in Notepad, click Save As, and select ASCII encoding. If it contains Chinese characters, you can use the Save As function of UE and select "UTF-8 without BOM".

[Picture: Save As dialog in UE]


According to the Bo-Blog wiki, in Editplus the file needs to be saved as GB first and then saved as UTF-8. Be careful when doing this: all characters not included in the GBK encoding will be lost. If the file contains some non-Chinese characters, do not use this method. (Judging from this small point, UE, that is UltraEdit-32, is indeed much better than Editplus; Editplus is too lightweight.)
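When the CSV file cannot be re-saved in an editor, another option is to skip the BOM in PHP before calling fgetcsv. The following is a minimal sketch under that assumption, not part of the original solution; the function name readCsvSkippingBom is hypothetical, and the tab delimiter matches the date.csv example.

```php
<?php
// Hypothetical helper: read a delimited file into an array of rows,
// skipping a leading UTF-8 BOM if one is present.
function readCsvSkippingBom(string $path, string $delimiter = "\t"): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    // Peek at the first three bytes; if they are not the BOM, go back to the start.
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle);
    }
    $rows = [];
    while (($data = fgetcsv($handle, 10000, $delimiter, '"', '\\')) !== false) {
        $rows[] = $data;
    }
    fclose($handle);
    return $rows;
}
```

Calling readCsvSkippingBom('date.csv') and echoing each row as before should then print ID without its quotation marks, and files that have no BOM are read unchanged.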

In addition, I found a way to do this with the file editor built into WordPress. This method has no such restrictions and does not require downloading a dedicated editor; after all, everyone here is using WordPress. First give the file to be edited write permission over FTP, then go to the WordPress admin area -> Management -> File Editor, enter the path of the file to be edited, and click Edit file. On the editing page you cannot see the three characters at the beginning, but that does not matter. Place the cursor before the first character of the entire file and press the Backspace key. Click "update file" and refresh the file listing in FTP: the file is now 3 bytes smaller.
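The same three-byte removal can also be scripted. This is a sketch only, assuming the file fits in memory and is writable by PHP; the function name removeUtf8Bom is hypothetical.

```php
<?php
// Hypothetical helper: rewrite a file without its leading UTF-8 BOM.
// Returns true if a BOM was found and removed, false if the file had none.
function removeUtf8Bom(string $path): bool
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (strncmp($contents, "\xEF\xBB\xBF", 3) !== 0) {
        return false; // no BOM, leave the file untouched
    }
    // Write back everything after the first three bytes.
    file_put_contents($path, substr($contents, 3));
    return true;
}
```

After a successful call the file should be exactly 3 bytes smaller, the same check as refreshing the file in FTP.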

Finally, this matters a great deal. Anyone who writes their own plug-ins or edits other people's plug-ins (which probably covers everyone who needs this article) would do well to understand the above, so as not to be at a loss when problems arise.
