Fixing the stray characters a UTF-8 BOM causes on the page after PHP reads a CSV file (PHP tips)

Source: Internet
Author: User

date.csv (fields are separated by tabs):
"ID" "NAME" "EMAIL"
"1" "Xiaoming" "xm@163.com"
"2" "Small East" "Xd@sina.com"
"3" "Little Less" "shaozi@hotmai.com"

Read this CSV file with the following code:

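<?php
// Open the tab-separated file for reading.
$handle = fopen('date.csv', 'r');
// Read one row at a time; "\t" is the field delimiter.
while ($data = fgetcsv($handle, 10000, "\t"))
{
    echo "$data[0] $data[1] $data[2]<br>";
}
fclose($handle);
?>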

When the result is displayed on the page, it looks like this:
"ID" NAME EMAIL
1 Xiaoming xm@163.com
2 Small East Xd@sina.com
3 Little Less shaozi@hotmai.com
The default field enclosure character for fgetcsv() is the double quote.
So why do all the other fields come out clean, while ID is still wrapped in double quotes?

A search online turned up the cause: PHP does not recognize the BOM of a UTF-8 file.
Here is the information I found:
The Unicode specification defines the concept of a BOM, which stands for byte order mark.
The following description of the BOM comes from another source:
In UCS there is a character called ZERO WIDTH NO-BREAK SPACE, and its encoding is FEFF. FFFE is not a character in UCS, so it should never appear in an actual transmission. The UCS specification recommends transmitting the character ZERO WIDTH NO-BREAK SPACE before transmitting the byte stream itself. If the receiver gets FE FF, the byte stream is big-endian; if it gets FF FE, the byte stream is little-endian. For this reason, the character ZERO WIDTH NO-BREAK SPACE is also called the BOM.
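The rule above can be sketched in PHP. This is only an illustration of the FE FF / FF FE check described in the text; the function name `detectByteOrder` is hypothetical and not part of any library.

```php
<?php
// Hypothetical helper: guess the byte order of a two-byte-per-unit stream
// from its first two bytes, following the rule described above.
function detectByteOrder(string $bytes): string
{
    $prefix = substr($bytes, 0, 2);
    if ($prefix === "\xFE\xFF") {
        return 'big-endian';
    }
    if ($prefix === "\xFF\xFE") {
        return 'little-endian';
    }
    return 'unknown'; // no BOM at the start of the stream
}

echo detectByteOrder("\xFE\xFF" . "\x00" . "A"), "\n"; // big-endian
echo detectByteOrder("\xFF\xFE" . "A" . "\x00"), "\n"; // little-endian
```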

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to signal which encoding is in use. The UTF-8 encoding of the character ZERO WIDTH NO-BREAK SPACE is EF BB BF. So if the receiver gets a byte stream that begins with EF BB BF, it knows the stream is UTF-8 encoded.
Windows uses the BOM to mark the encoding of text files.
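A quick way to check whether a file carries this marker is to read its first three bytes. The sketch below assumes a hypothetical helper named `hasUtf8Bom` and uses a temporary file so that it runs on its own.

```php
<?php
// Hypothetical helper: report whether a file starts with the UTF-8 BOM (EF BB BF).
function hasUtf8Bom(string $path): bool
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $prefix = fread($handle, 3);
    fclose($handle);
    return $prefix === "\xEF\xBB\xBF";
}

// Demo on a temporary file that starts with a BOM.
$path = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($path, "\xEF\xBB\xBF\"ID\"\t\"NAME\"\n");
var_dump(hasUtf8Bom($path)); // bool(true)
echo bin2hex(substr(file_get_contents($path), 0, 3)), "\n"; // efbbbf
unlink($path);
```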

In addition, the BOM FAQ on the Unicode website covers the BOM in detail. It is the authoritative source, but it is in English and takes more effort to read.
In a UTF-8 encoded file, the BOM takes up three bytes. If you save a text file as UTF-8 with Notepad, then open it in UltraEdit (UE) and switch to hex editing mode, you can see the BOM at the very beginning. On disk these bytes are EF BB BF; UE may show FF FE instead if it has converted the file to UTF-16 for editing. The BOM is a convenient way to identify UTF-8 files: software can use it to tell whether a file is UTF-8 encoded, and some programs even require a file to carry a BOM before they will read it. However, a lot of software still cannot handle the BOM. When I was studying Firefox, I learned that early versions did not support a BOM in extension files, although Firefox 1.5 began to support it. It now turns out that PHP does not handle the BOM either.

PHP was not designed with the BOM in mind: it does not skip the three BOM bytes at the beginning of a UTF-8 encoded file. The file therefore has to be saved without a BOM. One option is to convert it from UTF-8 to ASCII, or to choose ASCII encoding in the Save As dialog. If the file uses DOS-style line endings, you can open it in Notepad, click Save As, and select the ASCII encoding. If the file contains Chinese characters, use UE's Save As function and select "UTF-8 no BOM".

[Picture: the UE Save As dialog]
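When re-saving the file is not practical, an alternative (not from the original post) is to skip the BOM in code before calling fgetcsv(). The sketch below assumes a hypothetical helper named `openSkippingBom` and builds a small tab-separated sample file so that it runs on its own.

```php
<?php
// Hypothetical helper: open a file and move past a leading UTF-8 BOM, if any.
function openSkippingBom(string $path)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    // If the first three bytes are not the BOM, go back to the start.
    if (fread($handle, 3) !== "\xEF\xBB\xBF") {
        rewind($handle);
    }
    return $handle;
}

// Demo: a temporary tab-separated file that starts with a BOM.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents(
    $path,
    "\xEF\xBB\xBF\"ID\"\t\"NAME\"\t\"EMAIL\"\n\"1\"\t\"Xiaoming\"\t\"xm@163.com\"\n"
);

$rows = [];
$handle = openSkippingBom($path);
while (($data = fgetcsv($handle, 10000, "\t", '"', '\\')) !== false) {
    $rows[] = $data;
}
fclose($handle);
unlink($path);

echo implode(' ', $rows[0]), "\n"; // ID NAME EMAIL
```

Because the BOM has already been consumed, the first field begins with the double quote and fgetcsv() treats it as an enclosed field like the others.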


According to the Bo-blog wiki, in EditPlus you need to save the file as GB first and then save it again as UTF-8. Be careful, however: any character that is not included in the GBK code set will be lost. If the file contains some non-Chinese characters, it is better not to use this method. (From this point of view, UE, that is UltraEdit-32, really is much better than EditPlus; EditPlus is too lightweight.)

In addition, I found a way to do this with the file editor that WordPress provides. This approach has few restrictions and does not require downloading a special editor; after all, we are already using WordPress. First, use FTP to grant write permission on the file you want to edit. Then go to the WordPress admin panel -> Manage -> File Editor, enter the path of the file, and click Edit File. In the editing interface you cannot see the three characters at the beginning, but that does not matter: place the cursor before the first character of the file and press the Backspace key. Then click Update File and refresh the FTP listing. You will see that the file is 3 bytes smaller, and you are done.
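The same three-byte removal can also be done with a short PHP script. This is a minimal sketch, assuming a hypothetical helper named `stripUtf8Bom` and a temporary file for the demo; back up real files before rewriting them in place.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM from a file in place.
// Returns true if a BOM was found and removed, false otherwise.
function stripUtf8Bom(string $path): bool
{
    $contents = file_get_contents($path);
    if ($contents === false || strncmp($contents, "\xEF\xBB\xBF", 3) !== 0) {
        return false;
    }
    return file_put_contents($path, substr($contents, 3)) !== false;
}

// Demo on a temporary file that starts with a BOM.
$path = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($path, "\xEF\xBB\xBF" . "hello");
$before = filesize($path);
stripUtf8Bom($path);
clearstatcache(); // filesize() results are cached per path
$after = filesize($path);
echo $before - $after, "\n"; // 3
unlink($path);
```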

Finally, this is a significant issue. Anyone who writes their own plug-ins, adapts other people's plug-ins for their own use, or modifies templates (which is probably everyone) would do well to understand the points above, so as not to be caught off guard when a problem appears.
