Difference between UTF-8 encoding and UTF-8 without BOM encoding (including Java files)

Source: Internet
Author: User
BOM stands for byte order mark: a marker that indicates the byte order of the data that follows.

UCS contains a character called ZERO WIDTH NO-BREAK SPACE, whose code point is U+FEFF. The byte-swapped value, U+FFFE, is not a character in UCS, so it should never appear in actual transmission. The UCS specification therefore recommends sending the character ZERO WIDTH NO-BREAK SPACE before the byte stream itself. If the recipient receives FE FF, the byte stream is big-endian; if it receives FF FE, the byte stream is little-endian. For this reason, the character ZERO WIDTH NO-BREAK SPACE is also called the BOM.
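
The byte order rule above can be sketched in Python. This is only an illustration: the function name `detect_utf16_byte_order` is made up for this example, and it assumes the stream is UTF-16 (two bytes per code unit).

```python
import codecs

BOM = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE


def detect_utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of a UTF-16 stream from its first two bytes."""
    if data[:2] == codecs.BOM_UTF16_BE:  # FE FF
        return "big-endian"
    if data[:2] == codecs.BOM_UTF16_LE:  # FF FE
        return "little-endian"
    return "unknown"


# Encode the BOM followed by the letter "A" in both byte orders.
big = (BOM + "A").encode("utf-16-be")
little = (BOM + "A").encode("utf-16-le")

print(big.hex(" "), detect_utf16_byte_order(big))        # fe ff 00 41 big-endian
print(little.hex(" "), detect_utf16_byte_order(little))  # ff fe 41 00 little-endian
```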

UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to indicate the encoding itself. The UTF-8 encoding of the character ZERO WIDTH NO-BREAK SPACE is EF BB BF. So if the receiver gets a byte stream that begins with EF BB BF, it knows the stream is UTF-8 encoded.
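
A minimal Python sketch of that check follows. The helper name `has_utf8_bom` is hypothetical; `codecs.BOM_UTF8` is the standard library constant for the three bytes EF BB BF.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte stream starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


# Encoding U+FEFF as UTF-8 yields exactly the three BOM bytes.
encoded_bom = "\ufeff".encode("utf-8")
print(encoded_bom.hex(" "))  # ef bb bf

print(has_utf8_bom(b"\xef\xbb\xbfhello"))  # True
print(has_utf8_bom(b"hello"))              # False
```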

In a UTF-8 encoded file, the BOM occupies three bytes. If you save a text file as UTF-8 in Notepad and then open it in UltraEdit (UE) and switch to hex editing mode, you can see the BOM at the start of the file. The file on disk begins with EF BB BF; UltraEdit may display FF FE instead, because it can convert the file to UTF-16 internally for editing. The BOM is a convenient way to identify UTF-8 files: software can use it to recognize that a file is UTF-8 encoded, and some software even requires the BOM to be present in the files it reads. However, a lot of software still does not recognize the BOM.
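
The three-byte overhead can be observed without a hex editor. The sketch below assumes Python's `utf-8-sig` codec as a stand-in for an editor that saves UTF-8 with a BOM; the helper names and file names are made up for the example.

```python
import tempfile
from pathlib import Path


def first_bytes_hex(path: Path, count: int = 3) -> str:
    """Return the first `count` bytes of a file as upper-case hex, like a hex view."""
    with path.open("rb") as f:
        return f.read(count).hex(" ").upper()


def write_sample(directory: Path, name: str, encoding: str) -> Path:
    """Write the text 'hello' to a file using the given encoding."""
    path = directory / name
    path.write_text("hello", encoding=encoding)
    return path


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    with_bom = write_sample(tmp_dir, "with_bom.txt", "utf-8-sig")  # writes EF BB BF first
    without_bom = write_sample(tmp_dir, "no_bom.txt", "utf-8")     # no BOM

    with_bom_head = first_bytes_hex(with_bom)
    without_bom_head = first_bytes_hex(without_bom)
    size_difference = with_bom.stat().st_size - without_bom.stat().st_size

print(with_bom_head)     # EF BB BF
print(without_bom_head)  # 68 65 6C
print(size_difference)   # 3
```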

Early versions of Firefox did not support a BOM in extension files, but support was added starting with Firefox 1.5. PHP, however, turns out not to support the BOM. The BOM was not considered when PHP was designed, so PHP does not skip the three BOM bytes at the beginning of a UTF-8 encoded file.
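
The effect of not skipping the BOM can be reproduced in Python, used here only as an analogy and not as a description of PHP internals. A reader that decodes with plain `utf-8` keeps U+FEFF as the first character, so the content no longer starts with `<?php`; the `utf-8-sig` codec drops it.

```python
# A PHP-like source file saved as UTF-8 with a BOM (sample content for illustration).
raw = b"\xef\xbb\xbf<?php echo 'hi';"

naive = raw.decode("utf-8")      # BOM is kept as the character U+FEFF
aware = raw.decode("utf-8-sig")  # BOM is recognized and removed

print(naive.startswith("<?php"))  # False
print(aware.startswith("<?php"))  # True
print(repr(naive[0]))             # '\ufeff'
```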

According to the Bo-Blog wiki, Bo-Blog, which is also written in PHP, is troubled by the BOM as well. One of the problems it mentions: "Because of how cookies are delivered, a file that starts with a BOM cannot send cookies (PHP has already sent the response headers before the cookie is sent), so login and logout fail. All functionality that relies on cookies and sessions stops working." This is probably also the cause of blank pages in the WordPress admin area: if any executed file contains a BOM, those three bytes are sent as output, and the features that rely on cookies and sessions fail.
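
The ordering problem in that quote can be illustrated with a toy model. Everything below is hypothetical: `ToyResponse` and `run_script` are invented for this sketch, they are not PHP's implementation, and the model ignores output buffering. It only shows why bytes emitted before the script starts prevent a cookie header from being added.

```python
class ToyResponse:
    """Toy HTTP response: headers can only be added before any body bytes are sent."""

    def __init__(self) -> None:
        self.headers = []
        self.body = b""
        self.headers_sent = False

    def write(self, data: bytes) -> None:
        # The first non-empty write flushes the headers; they are frozen afterwards.
        if data:
            self.headers_sent = True
            self.body += data

    def set_cookie(self, name: str, value: str) -> bool:
        if self.headers_sent:
            return False  # too late: the headers have already gone out
        self.headers.append(("Set-Cookie", f"{name}={value}"))
        return True


def run_script(source: bytes) -> bool:
    """Emit everything before '<?php' as literal output, then try to set a cookie."""
    response = ToyResponse()
    literal_prefix = source.split(b"<?php", 1)[0]
    response.write(literal_prefix)
    return response.set_cookie("session", "abc123")


cookie_ok_without_bom = run_script(b"<?php setcookie(...); ?>")
cookie_ok_with_bom = run_script(b"\xef\xbb\xbf<?php setcookie(...); ?>")

print(cookie_ok_without_bom)  # True
print(cookie_ok_with_bom)     # False
```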

The solution depends on the file's content. If the file contains only English (ASCII) characters, save it as ASCII: in UltraEdit or a similar editor, choose File -> Convert -> UTF-8 to ASCII, or select ASCII encoding in the Save As dialog. If the file uses DOS-style line endings, you can also open it in Notepad, choose Save As, and select the ASCII encoding (listed as ANSI in Notepad). If the file contains Chinese characters, use UltraEdit's Save As function and select "UTF-8 no BOM".
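
For files that cannot easily be re-saved in an editor, the same fix can be scripted. The following is a minimal sketch; the function names `strip_utf8_bom` and `is_pure_ascii` are made up for this example, and it works on bytes already read from a file.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) if present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def is_pure_ascii(data: bytes) -> bool:
    """Return True if the content, ignoring any BOM, could be saved as plain ASCII."""
    return strip_utf8_bom(data).isascii()


english = "hello".encode("utf-8-sig")  # ASCII-only text saved with a BOM
chinese = "中文".encode("utf-8-sig")    # Chinese text saved with a BOM

print(strip_utf8_bom(english))  # b'hello'
print(is_pure_ascii(english))   # True
print(is_pure_ascii(chinese))   # False
```

To convert a file on disk, read it in binary mode, pass the bytes through `strip_utf8_bom`, and write the result back in binary mode. The output is then "UTF-8 no BOM", which for ASCII-only content is byte-for-byte identical to an ASCII file.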
