How to remove the BOM header from UTF-8 encoded files

Source: Internet
Author: User


The Unicode specification defines the BOM (Byte Order Mark), a marker that indicates byte order. Here is a description of the BOM:

UCS contains a character named "zero width no-break space", whose code is FEFF. FFFE, by contrast, is not a character in UCS, so it should never appear in an actual transmission. The UCS specification recommends transmitting the character "zero width no-break space" before the byte stream itself. That way, if the receiver sees FEFF first, the byte stream is Big-Endian; if it sees FFFE, the byte stream is Little-Endian. For this reason the character "zero width no-break space" is also called the BOM.
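
As a minimal sketch of the receiver-side check described above, the following PHP inspects the first two bytes of a UTF-16 stream. The function name detect_utf16_byte_order() is invented for this illustration and is not part of PHP.

```php
<?php
// Hypothetical helper for illustration: infer UTF-16 byte order from a leading BOM.
function detect_utf16_byte_order(string $bytes): ?string
{
    $head = substr($bytes, 0, 2);
    if ($head === "\xFE\xFF") {
        return 'big-endian';    // FE FF: U+FEFF stored high byte first
    }
    if ($head === "\xFF\xFE") {
        return 'little-endian'; // FF FE: the same character stored low byte first
    }
    return null;                // no BOM, so the byte order is unknown
}

// The letter "A" (U+0041) in both byte orders, each preceded by a BOM.
echo detect_utf16_byte_order("\xFE\xFF" . "\x00\x41"), "\n"; // big-endian
echo detect_utf16_byte_order("\xFF\xFE" . "\x41\x00"), "\n"; // little-endian
```

The sketch only looks at two bytes, so it cannot tell a UTF-16 Little-Endian BOM apart from the longer UTF-32 one that begins with the same two bytes.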

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to indicate the encoding. The UTF-8 encoding of the character "zero width no-break space" is EF BB BF, so a receiver that gets a byte stream starting with EF BB BF knows that the stream is UTF-8 encoded.
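
The three bytes can be verified directly in PHP. This is a small sketch that assumes PHP 7 or later (for the "\u{...}" escape); has_utf8_bom() is a name made up for the example.

```php
<?php
// "\u{FEFF}" is the PHP 7+ escape for a Unicode code point; PHP stores it as UTF-8 bytes.
$bom = "\u{FEFF}";
echo strtoupper(bin2hex($bom)), "\n"; // EFBBBF

// Hypothetical helper for illustration: does the byte string start with the UTF-8 BOM?
function has_utf8_bom(string $bytes): bool
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

var_dump(has_utf8_bom($bom . "hello")); // bool(true)
var_dump(has_utf8_bom("hello"));        // bool(false)
```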

Windows programs such as Notepad use the BOM to mark the encoding of text files.

In addition, the BOM FAQ on the Unicode website describes the BOM in detail. It is the authoritative source, but it is available only in English and is not easy reading.

In a UTF-8 encoded file, the BOM occupies three bytes. If you save a text file as UTF-8 in Notepad, open it in UltraEdit (UE), and switch to hex editing mode, you can see the BOM at the start of the file. On disk the bytes are EF BB BF; UltraEdit may display FF FE instead, because it can convert the file to UTF-16 internally. The BOM is a convenient way to identify a UTF-8 file: software can use it to recognize the encoding, and some programs even require that the files they read carry a BOM. However, a lot of software still cannot handle the BOM. When I was studying Firefox, I learned that earlier versions did not accept a BOM in extension files, and that support began with Firefox 1.5. PHP, for now, does not handle the BOM.
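
If no hex editor is at hand, the leading bytes of a file can be printed from PHP instead. The following is a sketch only: leading_bytes_hex() is a hypothetical helper, and the demo writes its own temporary file rather than touching a real one.

```php
<?php
// Hypothetical helper for illustration: return the first $count bytes of a file as hex pairs.
function leading_bytes_hex(string $filename, int $count = 3): string
{
    // Read only the first $count bytes instead of the whole file.
    $head = file_get_contents($filename, false, null, 0, $count);
    return strtoupper(implode(' ', str_split(bin2hex($head), 2)));
}

// Demo on a temporary file that starts with a UTF-8 BOM.
$tmp = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($tmp, "\xEF\xBB\xBFhello");
echo leading_bytes_hex($tmp), "\n"; // EF BB BF
unlink($tmp);
```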

PHP was not designed with the BOM in mind, so it does not skip the three BOM bytes at the start of a UTF-8 encoded file. Because only the code that follows <? or <?php is executed as PHP, those three bytes are sent directly to the output. If the script later calls header(), starts a session, or sets a cookie, this early output can cause errors, garbled characters, or a blank white page.
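
One way to see this behaviour without a web server is to ask PHP's tokenizer how it classifies the bytes in front of the opening tag. The sketch below assumes the tokenizer extension is available (it is enabled in default builds) and uses a made-up source string.

```php
<?php
// A PHP source string saved with a UTF-8 BOM in front of the opening tag.
$source = "\xEF\xBB\xBF<?php echo 'hi';";

// The tokenizer reports everything before the opening tag as inline HTML,
// which PHP sends to the output as-is.
$tokens = token_get_all($source);
[$id, $text] = $tokens[0];

echo token_name($id), "\n";            // T_INLINE_HTML
echo strtoupper(bin2hex($text)), "\n"; // EFBBBF
```

Since the BOM is treated as literal output, it reaches the client before any PHP code in the file has run.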


How to remove the BOM header

The easiest way to remove the BOM header is to use an editor such as EditPlus or UltraEdit. The details are as follows:

1. Use EditPlus to remove the BOM header

When the editor is set to the UTF-8 encoding format, it adds a hidden character (the BOM) at the start of each saved file, which the editor uses to identify the file as UTF-8 encoded.
Run EditPlus, click Tools, select Preferences, select Files, and set the UTF-8 signature option to "Always remove signature". PHP files that you edit and save afterwards will have no BOM.


2. Use UltraEdit to remove the BOM header

After opening the file, choose Save As, select the UTF-8 without BOM encoding format, and click OK.


3. Remove the BOM header from all files in batches using PHP

The code is as follows:

<?php

// Set to 1 to strip any BOM that is found; set to 0 to only report it.
$auto = 1;

// Directory to scan; replace with your own project path.
checkdir('C:/project/weibo');

// Recursively walk $basedir and check every file for a UTF-8 BOM.
function checkdir($basedir) {
    if ($dh = opendir($basedir)) {
        while (($file = readdir($dh)) !== false) {
            // Skip ".", ".." and hidden entries whose names start with a dot.
            if ($file[0] == '.') {
                continue;
            }
            if (!is_dir($basedir . "/" . $file)) {
                echo "filename: $basedir/$file " . checkBOM("$basedir/$file") . "<br>";
            } else {
                $dirname = $basedir . "/" . $file;
                checkdir($dirname);
            }
        }
        closedir($dh);
    }
}

// Report whether $filename starts with the UTF-8 BOM (EF BB BF, i.e. 239 187 191)
// and, when $auto is 1, rewrite the file without it.
function checkBOM($filename) {
    global $auto;
    $contents = file_get_contents($filename);
    $charset[1] = substr($contents, 0, 1);
    $charset[2] = substr($contents, 1, 1);
    $charset[3] = substr($contents, 2, 1);
    if (ord($charset[1]) == 239 && ord($charset[2]) == 187 && ord($charset[3]) == 191) {
        if ($auto == 1) {
            $rest = substr($contents, 3);
            rewrite($filename, $rest);
            return ("<font color=red>BOM found, automatically removed.</font>");
        } else {
            return ("<font color=red>BOM found.</font>");
        }
    } else {
        return ("BOM Not Found.");
    }
}

// Overwrite $filename with $data while holding an exclusive lock.
function rewrite($filename, $data) {
    $filenum = fopen($filename, "w");
    flock($filenum, LOCK_EX);
    fwrite($filenum, $data);
    fclose($filenum);
}
?>
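
Before running a batch script over a whole project, it can help to try the per-file step in isolation. The following is a minimal sketch under stated assumptions, not the script's own code: strip_utf8_bom() is a hypothetical helper, it uses file_put_contents() with LOCK_EX instead of fopen() and flock(), and the demo works on a temporary file so that no real file is modified.

```php
<?php
// Hypothetical helper for illustration: strip a leading UTF-8 BOM from one file.
// Returns true if a BOM was found and removed, false otherwise.
function strip_utf8_bom(string $filename): bool
{
    $contents = file_get_contents($filename);
    if ($contents === false || strncmp($contents, "\xEF\xBB\xBF", 3) !== 0) {
        return false;
    }
    // LOCK_EX asks file_put_contents() to hold an exclusive lock while writing.
    return file_put_contents($filename, substr($contents, 3), LOCK_EX) !== false;
}

// Demo on a temporary file so no real project file is touched.
$tmp = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($tmp, "\xEF\xBB\xBFhello");
var_dump(strip_utf8_bom($tmp));     // bool(true)
echo file_get_contents($tmp), "\n"; // hello
var_dump(strip_utf8_bom($tmp));     // bool(false), the BOM is already gone
unlink($tmp);
```

Whichever variant is used, back up the project or commit it to version control first, since the files are rewritten in place.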
