Methods for detecting and removing the UTF-8 BOM that causes blank lines on a page

Last Update:2013-12-31 Source: Internet

Author: User



We often find blank lines on a page for no apparent reason, even though nothing is visible in the editor. The cause is the UTF-8 BOM. Below are several methods for detecting and deleting the UTF-8 BOM.

The figure below shows the HTML code as seen in Firebug when this happens.

Figure 1

There is a blank line in it, but we don't see it in the source code.

Method 1: Use PHP to strip the BOM (the method I use most often)

BOM: the Unicode file signature (Byte Order Mark, U+FEFF)

The BOM can indicate which Unicode encoding a file uses, but when a received file has to be parsed and written into a database, a leading BOM gets in the way.

The two functions below, found under utf8_encode, can be used to test writing and removing a BOM.

Write a file with the BOM prepended to its content

The code is as follows:
<?php
// Write $content to $filename, preceded by the UTF-8 BOM (EF BB BF).
function writeUTF8File($filename, $content) {
    $f = fopen($filename, 'w');
    fwrite($f, pack("CCC", 0xef, 0xbb, 0xbf));
    fwrite($f, $content);
    fclose($f);
}
?>

Function that removes the BOM

The code is as follows:
<?php
// Return $str without a leading UTF-8 BOM, if one is present.
function removeBOM($str = '') {
    if (substr($str, 0, 3) === pack("CCC", 0xef, 0xbb, 0xbf)) {
        $str = substr($str, 3);
    }
    return $str;
}
?>

The BOM is therefore pack("CCC", 0xef, 0xbb, 0xbf), that is, the three bytes EF BB BF. To remove it, use the removeBOM function above or one of the following:

■ str_replace("\xef\xbb\xbf", '', $bom_content);
■ preg_replace("/^\xef\xbb\xbf/", '', $bom_content);

Note that str_replace removes the sequence wherever it occurs, while the preg_replace pattern is anchored to the start of the string.

There is also a function that checks whether a string is UTF-8:

The code is as follows:
function isUTF8($string) {
    return (utf8_encode(utf8_decode($string)) == $string);
}
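
A comparable validity check can be run from the shell on a whole file. The following is only a sketch: it assumes an iconv command is installed, and is_utf8 is a hypothetical helper name, not something from the PHP code above.

```shell
#!/bin/bash
# Succeed if the file decodes cleanly as UTF-8.
# Assumption: an iconv command is available on this system.
is_utf8() { iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1; }

good=$(mktemp); bad=$(mktemp)
printf 'caf\303\251\n' > "$good"   # valid: U+00E9 encoded as C3 A9
printf 'caf\351\n' > "$bad"        # invalid: a lone E9 byte, as in Latin-1
is_utf8 "$good" && echo "good: valid UTF-8"
is_utf8 "$bad"  || echo "bad: not valid UTF-8"
rm -f "$good" "$bad"
```

A file that starts with a BOM is still valid UTF-8, so a check like this does not detect the BOM itself; it only tells you whether the bytes decode as UTF-8.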

Method 2: Use the shell on Linux

Before looking at BOM detection and removal in UTF-8 files in detail, let's warm up with an example:

The code is as follows:
shell> curl -s http://www.bKjia.c0m/ | head -1 | sed -n l
\357\273\277<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">$

As shown above, the first three bytes are 357, 273, and 277, which is the BOM written in octal.
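
The same inspection can be reproduced locally without relying on a remote page. This is a minimal sketch, assuming a throwaway file created with mktemp; the helper names first3_octal and first3_hex are illustrative only.

```shell
#!/bin/bash
# Print the first three bytes of a file in octal and in hexadecimal.
first3_octal() { head -c 3 "$1" | od -An -to1 | tr -d ' \n'; }
first3_hex()   { head -c 3 "$1" | od -An -tx1 | tr -d ' \n'; }

sample=$(mktemp)                                  # throwaway demo file
printf '\357\273\277<!DOCTYPE html>\n' > "$sample"
echo "octal: $(first3_octal "$sample")"           # octal: 357273277
echo "hex:   $(first3_hex "$sample")"             # hex:   efbbbf
rm -f "$sample"
```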

In hexadecimal, those same three bytes are EF, BB, and BF.

Note: the example relies on a third-party web page, so it may not always be available. In real projects you may face hundreds of thousands of text files, and a handful with a BOM mixed in are hard to notice. If you have no UTF-8 text file with a BOM at hand, you can create a few with vi. The relevant commands are as follows:

Set UTF-8 encoding:

The code is as follows:
:set fileencoding=utf-8

Add BOM:

The code is as follows:
:set bomb

Delete BOM:

The code is as follows:
:set nobomb

Query BOM:

The code is as follows:
:set bomb?
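
When many test files are needed, typing vi commands by hand gets tedious. As a rough non-interactive counterpart to adding a BOM, the sketch below prepends one unless the file already has it. The function name add_bom and the demo file are assumptions made for illustration.

```shell
#!/bin/bash
# Prepend a UTF-8 BOM (EF BB BF) to a file unless it already starts with one.
add_bom() {
    local file="$1" tmp
    if [ "$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        return 0                       # already has a BOM; nothing to do
    fi
    tmp=$(mktemp) || return 1
    # Build BOM + original content, then write it back into the same file.
    { printf '\357\273\277'; cat "$file"; } > "$tmp" && cat "$tmp" > "$file"
    local rc=$?
    rm -f "$tmp"
    return $rc
}

demo=$(mktemp)                         # throwaway demo file
printf 'hello\n' > "$demo"
add_bom "$demo"
head -c 3 "$demo" | od -An -tx1        # should list the bytes ef bb bf
rm -f "$demo"
```

Writing back with cat rather than mv keeps the original file's permissions and inode.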

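How do you check for a BOM in UTF-8 files?

The code is as follows:
shell> grep -r -I -l $'^\xEF\xBB\xBF' /path

How do you delete the BOM from UTF-8 files?

shell> grep -r -I -l $'^\xEF\xBB\xBF' /path | xargs sed -i '1s/^\xEF\xBB\xBF//'

The substitution is limited to line 1 of each file. The \x escapes and the -i option assume GNU sed.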
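
Where GNU sed is not available, the removal can also be done with head, od, and tail alone. The following is a sketch under that assumption; strip_bom is a hypothetical helper name, and it handles one file per call.

```shell
#!/bin/bash
# Remove a leading UTF-8 BOM from one file, leaving every other byte untouched.
strip_bom() {
    local file="$1" tmp
    # Do nothing unless the first three bytes are EF BB BF.
    [ "$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ] || return 0
    tmp=$(mktemp) || return 1
    # tail -c +4 copies everything from the fourth byte onward.
    tail -c +4 "$file" > "$tmp" && cat "$tmp" > "$file"
    local rc=$?
    rm -f "$tmp"
    return $rc
}

demo=$(mktemp)                         # throwaway demo file
printf '\357\273\277line1\nline2\n' > "$demo"
strip_bom "$demo"
cat "$demo"                            # prints line1 and line2
rm -f "$demo"
```

It can be combined with the grep command above, for example by looping over the file names that grep prints.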

Recommendation: if you use SVN, add a check to the pre-commit hook so that files with a BOM cannot be committed.

The code is as follows:
#!/bin/bash

REPOS="$1"
TXN="$2"

SVNLOOK=/usr/bin/svnlook

# Check every added (A) or updated (U) file in the transaction.
for FILE in $($SVNLOOK changed -t "$TXN" "$REPOS" | awk '/^[AU]/ {print $NF}'); do
    if $SVNLOOK cat -t "$TXN" "$REPOS" "$FILE" | grep -q $'^\xEF\xBB\xBF'; then
        echo "Byte Order Mark found in $FILE" 1>&2
        exit 1
    fi
done
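
The same rejection logic can be tried on ordinary files before installing the hook, with no repository involved. This is only a sketch: has_bom and check_no_bom are hypothetical helper names, and the files are passed as arguments rather than read from svnlook.

```shell
#!/bin/bash
# Succeed if the file starts with the UTF-8 BOM (EF BB BF).
has_bom() { [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; }

# Report every file that has a BOM; return non-zero if any was found.
check_no_bom() {
    local status=0 file
    for file in "$@"; do
        if has_bom "$file"; then
            echo "Byte Order Mark found in $file" 1>&2
            status=1
        fi
    done
    return $status
}

clean=$(mktemp); dirty=$(mktemp)       # throwaway demo files
printf 'ok\n' > "$clean"
printf '\357\273\277bad\n' > "$dirty"
check_no_bom "$clean" "$dirty" || echo "commit would be rejected"
rm -f "$clean" "$dirty"
```

Unlike the hook, which stops at the first offending file, this variant lists all of them before failing.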

Many shell commands are used in this article.

Method 3: Use the UltraEdit editor to modify the document directly

Just re-save the document that shows the blank line in an encoding without a BOM.

The encoding formats offered when UltraEdit saves a document are shown below:

Figure 2

Select UTF8-No BOM there to solve the problem.

