Detecting and Deleting the UTF-8 BOM That Causes Blank Lines in Web Pages


Last Update:2016-07-13 Source: Internet

Author: User



We often find a few blank lines at the top of a page that do not show up when the file is opened in an editor. As many developers know, they are caused by the UTF-8 BOM. Below are several methods for detecting and removing it.

The figure below shows the HTML as seen in Firebug when this happens:

Figure 1

An extra blank line appears in the rendered page, yet it cannot be seen in the source code.
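The stray output comes from the BOM bytes themselves: PHP sends anything that precedes the opening `<?php` tag straight to the output, and a BOM is three invisible bytes sitting exactly there. The following is a small sketch to confirm this, assuming default PHP settings (in particular `zend.multibyte` off); the temporary file and its contents are made up for the demonstration.

```php
<?php
// Write a throwaway PHP file whose first bytes are a UTF-8 BOM
$path = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($path, "\xEF\xBB\xBF<?php echo 'content';");

// Capture everything the included file sends to the output
ob_start();
include $path;
$output = ob_get_clean();
unlink($path);

// The BOM sits before the opening tag, so it is echoed as-is
echo bin2hex(substr($output, 0, 3)), "\n"; // efbbbf
```

The same thing happens when such a file is included from a template, which is why the stray bytes end up in the page rather than in the editor view.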

Method One, use PHP to remove the BOM (the approach I use most often)

BOM: the Unicode file signature, or Byte Order Mark (U+FEFF)

The BOM indicates which Unicode encoding a file uses, but when you receive a file, parse it and write the data to a database, a leftover BOM is a nuisance.
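To make the definition concrete, here is a small PHP sketch (assuming PHP 7.0 or later for the `\u{...}` escape) showing that U+FEFF encoded as UTF-8 is exactly the three bytes EF BB BF used throughout this article. The variable names are illustrative only.

```php
<?php
// U+FEFF written with PHP's Unicode escape syntax (PHP 7.0+)
$bomFromCodePoint = "\u{FEFF}";

// The same BOM built byte by byte, as the functions below do
$bomFromBytes = pack("CCC", 0xEF, 0xBB, 0xBF);

echo bin2hex($bomFromCodePoint), "\n";          // efbbbf
echo strlen($bomFromCodePoint), "\n";           // 3
var_dump($bomFromCodePoint === $bomFromBytes);  // bool(true)
```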

In the discussion of utf8_encode I came across two functions that can be used to write and remove a BOM.

Write a file with the BOM prepended to its content:

<?php
function writeUtf8File($filename, $content) {
    $f = fopen($filename, 'w');
    // Write the three-byte UTF-8 BOM (EF BB BF) before the content
    fwrite($f, pack("CCC", 0xEF, 0xBB, 0xBF));
    fwrite($f, $content);
    fclose($f);
}
?>
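For comparison, here is a shorter sketch of the same idea using `file_put_contents`, followed by a check that the file really starts with the BOM. `writeUtf8FileWithBom` is a hypothetical name, and the temporary file exists only for the demonstration.

```php
<?php
// Hypothetical helper: prepend the UTF-8 BOM and write everything in one call
function writeUtf8FileWithBom(string $filename, string $content): void
{
    file_put_contents($filename, "\xEF\xBB\xBF" . $content);
}

$path = tempnam(sys_get_temp_dir(), 'bom');
writeUtf8FileWithBom($path, "hello");

// Read back only the first three bytes (offset 0, length 3)
$head = file_get_contents($path, false, null, 0, 3);
echo bin2hex($head), "\n"; // efbbbf
```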

A function that removes the BOM:

<?php
function removeBom($str = '') {
    // Strip the UTF-8 BOM (EF BB BF) only if the string starts with it
    if (substr($str, 0, 3) == pack("CCC", 0xEF, 0xBB, 0xBF)) {
        $str = substr($str, 3);
    }
    return $str;
}
?>
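The same check can be applied to a whole file rather than a string. This is a minimal sketch under the assumption that the file fits in memory; `stripBomFromFile` is a hypothetical helper, not part of PHP.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM from a file in place.
// Returns true only if a BOM was found and stripped.
function stripBomFromFile(string $path): bool
{
    $data = file_get_contents($path);
    if ($data === false || substr($data, 0, 3) !== "\xEF\xBB\xBF") {
        return false; // unreadable, or no BOM: leave the file untouched
    }
    file_put_contents($path, substr($data, 3));
    return true;
}

$path = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($path, "\xEF\xBB\xBF<html>\n");
var_dump(stripBomFromFile($path)); // bool(true)
var_dump(stripBomFromFile($path)); // bool(false), already clean
```

Running it a second time is harmless, which makes it safe to apply to a batch of files where only some have a BOM.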

Since the BOM is pack("CCC", 0xEF, 0xBB, 0xBF), you can remove it with the removeBom function above or with either of the following:

$content = str_replace("\xEF\xBB\xBF", '', $bom_content);     // removes every BOM in the string
$content = preg_replace("/^\xEF\xBB\xBF/", '', $bom_content); // removes only a leading BOM
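The approaches are not identical: removeBom and the anchored preg_replace only touch a BOM at the very start, while str_replace removes every occurrence, including one in the middle of the text (for example, after two BOM-prefixed files have been concatenated). A minimal sketch of the difference, with illustrative variable names:

```php
<?php
$bom = "\xEF\xBB\xBF";
// Two BOM-prefixed fragments glued together, e.g. concatenated files
$text = $bom . "abc" . $bom . "def";

// Anchored pattern: strips only the leading BOM
$viaRegex = preg_replace('/^\xEF\xBB\xBF/', '', $text);

// str_replace: strips every BOM, wherever it appears
$viaReplace = str_replace($bom, '', $text);

echo bin2hex($viaRegex), "\n";   // 616263efbbbf646566
echo bin2hex($viaReplace), "\n"; // 616263646566
```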
I also came across a function that checks whether a string is UTF-8:

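function isUtf8($string) {
    // Caveat: only characters in ISO-8859-1 survive this round trip, so valid
    // UTF-8 such as Chinese text returns false; both functions are deprecated in PHP 8.2
    return utf8_encode(utf8_decode($string)) == $string;
}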
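A commonly used alternative that accepts any valid UTF-8, not just text within ISO-8859-1, relies on PCRE rejecting invalid UTF-8 when the `u` modifier is set. This technique is swapped in here rather than taken from the original, and `isValidUtf8` is a hypothetical name.

```php
<?php
// Hypothetical helper: with the /u modifier, preg_match() returns false
// when the subject is not valid UTF-8; otherwise the empty pattern matches (1).
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("\u{4E2D}\u{6587}")); // bool(true), "中文"
var_dump(isValidUtf8("\xEF\xBB\xBFabc"));  // bool(true), a BOM is valid UTF-8
var_dump(isValidUtf8("\xC3\x28"));         // bool(false), broken sequence
```

Note that a BOM is itself valid UTF-8, so a validity check alone does not tell you whether a BOM is present; combine it with the removeBom check above.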

Method Two, use the shell on a Linux system

Before discussing how to detect and remove the BOM from UTF-8 files, let's warm up with an example:

shell> curl -s http://www.bkjia.com/ | head -1 | sed -n l
\357\273\277<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">$

As shown above, the first three bytes are \357, \273 and \277, which is the BOM in octal.
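If sed or a hex viewer is not at hand, the same inspection can be done in PHP. This sketch formats the first three bytes in octal, like the `sed -n l` output above, and in hexadecimal; `describeHead` is a hypothetical helper, and the sample string stands in for the downloaded page.

```php
<?php
// Hypothetical helper: first three bytes as octal escapes and as hex
function describeHead(string $data): array
{
    $octal = '';
    $hex = [];
    $n = min(3, strlen($data));
    for ($i = 0; $i < $n; $i++) {
        $byte = ord($data[$i]);
        $octal .= sprintf('\\%03o', $byte); // e.g. \357
        $hex[] = sprintf('%02X', $byte);    // e.g. EF
    }
    return [$octal, implode(' ', $hex)];
}

[$oct, $hex] = describeHead("\xEF\xBB\xBF<!DOCTYPE html>");
echo $oct, "\n"; // \357\273\277
echo $hex, "\n"; // EF BB BF
```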


In hexadecimal, the same three bytes are EF, BB and BF, which is the BOM in hexadecimal.

Note: because these examples use a third-party web page, they are not guaranteed to keep working.

In real project development you may face hundreds of text files, and if a few of them contain a BOM, they are very hard to spot. If you have no UTF-8 files with a BOM to experiment with, you can create some with vi using the following commands:

Set UTF-8 encoding:

:set fileencoding=utf-8

To add a BOM:

:set bomb

To delete a BOM:

:set nobomb

Check whether a BOM is set:

:set bomb?
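As an alternative to vi, sample files can also be generated from PHP. This sketch assumes a scratch directory under the system temp folder; the directory and file names are made up for the example.

```php
<?php
// Create a scratch directory holding one UTF-8 file with a BOM and one without
$dir = sys_get_temp_dir() . '/bom-demo-' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir, 0777, true);
}

$content = "<html>\n<body>hello</body>\n</html>\n";
file_put_contents("$dir/with-bom.html", "\xEF\xBB\xBF" . $content);
file_put_contents("$dir/without-bom.html", $content);

// Report which of the two files starts with the BOM
foreach (['with-bom.html', 'without-bom.html'] as $name) {
    $head = file_get_contents("$dir/$name", false, null, 0, 3);
    printf("%-18s %s\n", $name, $head === "\xEF\xBB\xBF" ? 'BOM' : 'no BOM');
}
```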

How do you detect a BOM in UTF-8 files?


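shell> grep -r -I -l $'^\xEF\xBB\xBF' /path

Here -r searches recursively, -I skips binary files and -l prints only the names of matching files.

How do you remove a BOM from UTF-8 files?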

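shell> grep -r -I -l $'^\xEF\xBB\xBF' /path | xargs sed -i '1s/^\xEF\xBB\xBF//'

The 1 address restricts the substitution to the first line of each file, which is where a BOM appears.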
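Where grep and sed are unavailable, or to run the same check from PHP, a recursive scan with SPL iterators can find and fix the files. This is a sketch: `findFilesWithBom` is a hypothetical name, it reads only the first three bytes of each file, and the fix strips only a leading BOM, as the sed command above intends.

```php
<?php
// Hypothetical helper: list regular files under $dir whose first
// three bytes are the UTF-8 BOM (similar in spirit to grep -r -l).
function findFilesWithBom(string $dir): array
{
    $found = [];
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $file) {
        if (!$file->isFile()) {
            continue;
        }
        // Read only the first three bytes of each file
        $head = file_get_contents($file->getPathname(), false, null, 0, 3);
        if ($head === "\xEF\xBB\xBF") {
            $found[] = $file->getPathname();
        }
    }
    sort($found);
    return $found;
}

// Demo on a scratch directory; the directory and file names are made up
$dir = sys_get_temp_dir() . '/bom-scan-' . getmypid();
if (!is_dir("$dir/sub")) {
    mkdir("$dir/sub", 0777, true);
}
file_put_contents("$dir/a.php", "\xEF\xBB\xBF<?php echo 1;");
file_put_contents("$dir/sub/b.html", "<p>no bom</p>");

$fixed = findFilesWithBom($dir);
foreach ($fixed as $path) {
    // Strip only the leading BOM, leaving the rest of the file intact
    file_put_contents($path, substr(file_get_contents($path), 3));
    echo "fixed: $path\n";
}
```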

Recommendation: if you use SVN, you can add a check like the following to the pre-commit hook so that commits containing a BOM are rejected.


#!/bin/bash
# SVN pre-commit hook: reject the commit if an added or updated file has a line starting with a BOM

REPOS="$1"
TXN="$2"

SVNLOOK=/usr/bin/svnlook

for FILE in $($SVNLOOK changed -t "$TXN" "$REPOS" | awk '/^[AU]/ {print $NF}'); do
    if $SVNLOOK cat -t "$TXN" "$REPOS" "$FILE" | grep -q $'^\xEF\xBB\xBF'; then
        echo "Byte Order Mark is found in $FILE" 1>&2
        exit 1
    fi
done

exit 0

This method relies heavily on shell commands.

Method Three, fix the file directly in the UltraEdit editor

Re-save the file that produces the blank line in a format without a BOM.

The figure below shows the encoding options UltraEdit offers when saving a document:

Figure 2

Choose UTF-8 without BOM, and the problem is solved.




This article is an English version of an article originally written in Chinese on aliyun.com.
