The Unicode signature (BOM, byte order mark) problem in UTF-8 files

Source: Internet
Author: User
Tags: ultraedit, zen cart

From http://blog.csdn.net/thimin/archive/2007/08/03/1724393.aspx

 

Recently, while testing a UTF-8 encoded Chinese Zen Cart website, I ran into a strange problem. The text on the rendered page looked normal, but when I viewed the page source from IE (which opens it in Notepad), I saw garbled characters. Switching to Firefox did not fix it. After searching online and testing repeatedly, I solved it: the cause turned out to be the Unicode signature, or BOM (byte order mark), in UTF-8 files.

The BOM (byte order mark) is the standard marker that the UTF encoding schemes use to identify the encoding. In UTF-16 it is FE FF for big-endian and FF FE for little-endian; in UTF-8 it becomes EF BB BF. In UTF-8 the mark is optional because UTF-8 has no byte-order ambiguity. It serves only as a signature that lets software detect that a byte stream is UTF-8 encoded. Microsoft software performs this detection, but some other software does not and treats the three bytes as ordinary characters.
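The byte values above can be checked with a short Python sketch. It is only an illustration, not part of the original workflow; the variable names are made up here, and it relies on Python's standard `codecs` constants and the `utf-8-sig` codec.

```python
import codecs

# BOM byte sequences as defined in the standard library, shown as hex.
print(codecs.BOM_UTF8.hex())      # efbbbf
print(codecs.BOM_UTF16_LE.hex())  # fffe
print(codecs.BOM_UTF16_BE.hex())  # feff

# "utf-8-sig" writes the signature when encoding; plain "utf-8" does not.
with_bom = "abc".encode("utf-8-sig")
without_bom = "abc".encode("utf-8")

# Software that does not detect the signature sees an extra character
# (U+FEFF) at the start of the text; "utf-8-sig" strips it on decode.
naive_text = with_bom.decode("utf-8")
clean_text = with_bom.decode("utf-8-sig")
print(repr(naive_text))  # '\ufeffabc'
print(repr(clean_text))  # 'abc'
```

The stray U+FEFF in `naive_text` is the kind of "ordinary character" that software without BOM detection ends up displaying or passing along.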

Microsoft writes the three bytes EF BB BF at the start of the text files it saves as UTF-8. Programs such as Notepad on Windows rely on those three bytes to decide whether a text file is ASCII or UTF-8. This is only a Microsoft convention, though; other platforms do not put such a mark on UTF-8 text files.

In other words, a UTF-8 file may or may not have a BOM. How do you tell the difference? There are three methods:

1. Open the file in UltraEdit-32, switch to hex editing mode, and check whether the file starts with EF BB BF.
2. Open the file in Dreamweaver and look in the page properties to see whether "Include Unicode Signature (BOM)" is checked.
3. Open the file in Windows Notepad, choose "Save As", and see whether the default encoding offered is UTF-8 or ANSI. If it is ANSI, the file has no BOM.
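The hex-editor check can also be scripted. The following is a minimal sketch in Python, assuming the file is readable from local disk; `has_utf8_bom` is a hypothetical helper name, not something from the original article or from any of the tools mentioned.

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file at `path` starts with the bytes EF BB BF."""
    with open(path, "rb") as f:
        # Read only the first three bytes; the rest of the file is irrelevant.
        return f.read(3) == codecs.BOM_UTF8
```

For example, `has_utf8_bom("html_header.php")` would report whether that template file carries the signature, assuming the file sits in the current directory.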

I checked html_header.php in the Zen Cart template files and found that it had no BOM. I used UltraEdit-32's Save As to add a BOM and uploaded html_header.php again, after which everything displayed normally.

Note that ConvertZ, by default, does not write a BOM when it converts a GB2312 file to UTF-8. Without a BOM, garbled characters may occur. With a BOM, however, you must be careful about include files in PHP: EF BB BF ends up at the front of the PHP byte stream and is sent as output before the script runs, and this early output can cause program errors (for example, headers can no longer be sent). One solution is to save all included files as ANSI and keep only the main file as UTF-8. To remove the BOM from a file, open it in UltraEdit, switch to hex editing mode, replace the first three bytes (the troublesome EF BB BF) with 20, and save (make sure the automatic backup function is disabled when saving). Then switch back to the default editing mode and delete the three leading spaces.
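Removing the BOM does not have to be done by hand in a hex editor. Here is a minimal sketch in Python that does the same thing in one step; `strip_utf8_bom` is a hypothetical helper name, and the function rewrites the file in place, so it is worth trying on a copy first.

```python
import codecs

def strip_utf8_bom(path):
    """Remove a leading EF BB BF from the file at `path`, in place.

    Returns True if a BOM was found and removed, False otherwise.
    """
    with open(path, "rb") as f:
        data = f.read()
    if not data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do; leave the file untouched
    with open(path, "wb") as f:
        # Write back everything after the three signature bytes.
        f.write(data[len(codecs.BOM_UTF8):])
    return True
```

Run over every PHP include file, a helper like this would avoid the early-output problem without converting the files to ANSI.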

Along the way I also learned a little about encodings. What Windows calls a "Unicode" file is actually UTF-16, whose values happen to match the Unicode code points for most characters. Conceptually, though, Unicode and UTF are two different things: Unicode is a character set that assigns each character a number (its code point), while the UTFs are schemes for storing and transmitting those code points. UTF-16 comes in two variants, little-endian (LE) and big-endian (BE). The official UTF encodings also include UTF-32, which likewise can be LE or BE. There is also UTF-7, which is not an official Unicode encoding and is used mainly for mail transfer. The single-byte part of UTF-8 is compatible with ASCII (the first 128 characters of ISO-8859-1). UTF-8 exists largely to accommodate older systems and library functions that cannot handle UTF-16 properly, and for English text it also saves storage space (at the cost of using more space for some non-English characters). ASCII characters take one byte in both UTF-8 and ISO-8859-1; for all other characters, UTF-8 uses two to four bytes.
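The size trade-off can be seen with a few sample characters. This Python sketch is only an illustration; `encoded_sizes` is a hypothetical helper, and the sample characters were chosen here rather than taken from the original article.

```python
def encoded_sizes(ch):
    """Return the number of bytes `ch` occupies in several UTF encodings."""
    # The "-le" codecs are used so that no BOM is counted in the size.
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }

# An ASCII letter, an accented Latin letter, a Chinese character, an emoji.
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]
for ch in samples:
    print("U+%04X" % ord(ch), encoded_sizes(ch))
```

The ASCII letter takes one byte in UTF-8 and two in UTF-16, while the Chinese character takes three bytes in UTF-8 and two in UTF-16, which is the space cost for non-English text mentioned above.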

 
