The Unicode Signature (BOM, Byte Order Mark) Problem in UTF-8 Files

Source: Internet
Author: User
Recently I was testing a Chinese Zen Cart website encoded in UTF-8 and ran into a strange problem. The text displayed on the web page was normal, but when I viewed the page source in IE (which opens it in Notepad), the characters were garbled. Firefox did not have this problem. After checking many sources online and running multiple tests, I solved it: the cause was the Unicode signature, the BOM (byte order mark).
The BOM (byte order mark) is the standard marker that UTF encoding schemes use to identify the encoding. It is the character U+FEFF: in UTF-16 it appears as FF FE (little-endian) or FE FF (big-endian), and in UTF-8 it becomes the three bytes EF BB BF. In UTF-8 the mark is optional, because UTF-8 has no byte-order ambiguity; it can, however, be used to detect whether a byte stream is UTF-8 encoded. Microsoft's software performs this detection, but some other software does not, and treats the BOM as ordinary characters.
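A minimal Python sketch of the point above, using only the standard `codecs` module: it prints the BOM byte sequences and shows that a decoder which does not look for a BOM keeps U+FEFF as an ordinary leading character, whereas Python's BOM-aware `utf-8-sig` codec strips it. The sample content `<html>` is only an illustration.

```python
import codecs

# U+FEFF encoded as a signature in different UTF schemes.
print("UTF-8:    ", codecs.BOM_UTF8.hex(" "))      # ef bb bf
print("UTF-16 LE:", codecs.BOM_UTF16_LE.hex(" "))  # ff fe
print("UTF-16 BE:", codecs.BOM_UTF16_BE.hex(" "))  # fe ff

# A UTF-8 byte stream that starts with a BOM (illustrative content).
data = codecs.BOM_UTF8 + "<html>".encode("utf-8")

# Software that does not check for a BOM sees U+FEFF as a normal character.
plain = data.decode("utf-8")
print(repr(plain))

# A BOM-aware decoder removes the signature before returning the text.
aware = data.decode("utf-8-sig")
print(repr(aware))
```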
Microsoft adds the three bytes EF BB BF to the beginning of the UTF-8 text files it writes, and Windows Notepad relies on these three bytes to decide whether a text file is ASCII (ANSI) or UTF-8. This is only a Microsoft convention, however; other platforms do not add such a mark to UTF-8 text files.
In other words, a UTF-8 file may or may not have a BOM. How can you tell which? There are three methods:
1. Open the file in UltraEdit-32, switch to hexadecimal editing mode, and check whether the file begins with EF BB BF.
2. Open it in Dreamweaver and check the page properties to see whether "Include Unicode Signature (BOM)" is ticked.
3. Open it in Windows Notepad, choose "Save As", and see whether the default encoding offered is UTF-8 or ANSI; if it is ANSI, the file does not contain a BOM.
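The hex-view check in method 1 can also be scripted. A minimal Python sketch, assuming only the UTF-8 signature matters (a UTF-16 file would start with FF FE or FE FF instead); `starts_with_utf8_bom` and `has_utf8_bom` are hypothetical helper names, not part of any tool mentioned in this article.

```python
import codecs

def starts_with_utf8_bom(data: bytes) -> bool:
    """True if the byte string begins with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

def has_utf8_bom(path) -> bool:
    """Read only the first three bytes of a file and check them for the BOM."""
    with open(path, "rb") as f:
        return starts_with_utf8_bom(f.read(len(codecs.BOM_UTF8)))
```

For example, `has_utf8_bom("html_header.php")` would answer the question for a single template file without opening an editor.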
I opened html_header.php in the Zen Cart template and found that the file did not carry a BOM. I re-saved it in UltraEdit-32 with a BOM added, uploaded html_header.php again, and everything worked normally.
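Re-saving with a BOM in UltraEdit-32 amounts to prepending EF BB BF when it is missing. A minimal Python sketch of that step, assuming the file is already valid UTF-8; `add_utf8_bom` and `add_utf8_bom_to_file` are hypothetical names. The helper deliberately does nothing if a BOM is already present, so running it twice does not produce a doubled signature.

```python
import codecs
from pathlib import Path

def add_utf8_bom(data: bytes) -> bytes:
    """Prepend EF BB BF unless the data already starts with it."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data

def add_utf8_bom_to_file(path) -> None:
    """Rewrite a UTF-8 file in place so that it begins with a BOM."""
    p = Path(path)
    p.write_bytes(add_utf8_bom(p.read_bytes()))
```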
Note that convertz does not add a BOM by default when converting a GB2312 file to UTF-8, and without a BOM the error described above may occur.
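The same conversion can be sketched in Python instead of convertz. This is an illustration, not convertz's own behaviour: the hypothetical `gb2312_to_utf8` decodes GB2312 bytes and re-encodes them, and the `with_bom` flag chooses between Python's `utf-8-sig` codec (which writes EF BB BF first) and plain `utf-8` (which writes no signature).

```python
def gb2312_to_utf8(data: bytes, with_bom: bool = True) -> bytes:
    """Convert GB2312-encoded bytes to UTF-8, optionally with a BOM."""
    text = data.decode("gb2312")
    # "utf-8-sig" prefixes the output with EF BB BF; "utf-8" does not.
    return text.encode("utf-8-sig" if with_bom else "utf-8")

source = "中文".encode("gb2312")  # sample text: the word "Chinese"
print(gb2312_to_utf8(source).hex(" "))                  # ef bb bf e4 b8 ad e6 96 87
print(gb2312_to_utf8(source, with_bom=False).hex(" "))  # e4 b8 ad e6 96 87
```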
Also be careful with BOMs in PHP include files: if an included file starts with a BOM, the bytes EF BB BF are inserted into PHP's output stream ahead of everything else, and this premature output can cause program errors (for example, a later header() or setcookie() call fails because output has already been sent). One solution is to save all included files as ANSI, while the main file can be UTF-8. To remove a BOM, open the file in UltraEdit, switch to hexadecimal editing mode, replace the first three bytes (that is, the damn EF BB BF) with 20 (a space), and save (make sure the automatic backup function is disabled when saving). Then switch back to the normal editing mode and delete the three leading spaces.
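As an alternative to the hex-editor procedure, the BOM can be removed programmatically, which is convenient when many PHP include files are affected. A minimal Python sketch; `strip_utf8_bom` and `strip_bom_from_file` are hypothetical helpers. Unlike the UltraEdit steps, it drops the three bytes outright instead of turning them into spaces, so there is nothing left to delete afterwards.

```python
import codecs
from pathlib import Path

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove one leading EF BB BF signature, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def strip_bom_from_file(path) -> bool:
    """Rewrite the file without its BOM; return True if anything changed."""
    p = Path(path)
    original = p.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned == original:
        return False
    p.write_bytes(cleaned)
    return True
```

For example, looping over `Path(".").rglob("*.php")` and calling `strip_bom_from_file` on each result would clean every PHP file under the current directory; since the rewrite happens in place, keep a backup first.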
In addition, I also picked up a bit of encoding knowledge. A so-called "Unicode" file (as Windows Notepad calls it) is actually UTF-16; for most characters its code values happen to coincide with the Unicode code points, but conceptually Unicode and UTF are two different things: Unicode is the character set that assigns each character a number (its code point), while a UTF is a scheme for storing and transmitting those code points. UTF-16 comes in two byte orders: low byte first (little-endian, LE) and high byte first (big-endian, BE). The official UTFs also include UTF-32, which likewise comes in LE and BE forms. Outside the Unicode standard proper there is also UTF-7, mainly used for email transmission. UTF-8 came about largely because some old systems and library functions could not handle UTF-16 properly; its single-byte range is compatible with ASCII, and for English text it also saves storage space (at the cost of non-English characters taking more space). Characters in the ASCII range take one byte in UTF-8, just as in ISO-8859-1, while other characters take two or three bytes (four for characters outside the Basic Multilingual Plane).
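The byte counts above can be checked with Python's built-in codecs. In the sketch below the sample characters are arbitrary choices: it prints each character's UTF-8 length, shows that é, a single byte in ISO-8859-1, needs two bytes in UTF-8 (only the ASCII range is shared), and shows the two UTF-16 byte orders for 中 (U+4E2D).

```python
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]  # A, é, 中, an emoji

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) in UTF-8 -> {encoded.hex(' ')}")

# é is one byte in ISO-8859-1 but two in UTF-8.
print("\u00e9".encode("iso-8859-1").hex(" "))  # e9
print("\u00e9".encode("utf-8").hex(" "))       # c3 a9

# UTF-16 stores the same 16-bit code unit in either byte order.
print("\u4e2d".encode("utf-16-le").hex(" "))   # 2d 4e
print("\u4e2d".encode("utf-16-be").hex(" "))   # 4e 2d
```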
