Detecting and Removing the UTF-8 BOM That Causes Blank Lines in Pages (PHP Tutorial)

Source: Internet
Author: User
Tags: ultraedit
We often find a few extra blank lines at the top of a rendered page, yet they are nowhere to be seen when the file is opened in an editor. This is caused by the UTF-8 BOM. Below, we share several ways to detect and remove the UTF-8 BOM.

Figure 1 shows the HTML code as seen in Firebug when this happens.

Figure 1

An extra blank line has appeared in the rendered page, although it is not visible in the source code.


Method One, strip the BOM with PHP (the approach I use most often)

BOM: the Unicode file signature (Byte Order Mark, U+FEFF)

The BOM indicates which Unicode encoding a file uses. However, when a received file has to be parsed and then written to a database, a leading BOM is a nuisance.


Two functions found under utf8_encode can be used to test writing and removing a BOM.

Prepend a BOM to the file content being written:

The code is as follows:

<?php
function writeUTF8File($filename, $content)
{
    $f = fopen($filename, 'w');
    // Write the three-byte UTF-8 BOM first, then the content.
    fwrite($f, pack("CCC", 0xEF, 0xBB, 0xBF));
    fwrite($f, $content);
    fclose($f);
}
?>

A function to remove the BOM:

The code is as follows:

<?php
function removeBOM($str = '')
{
    // Drop the first three bytes only when they are the UTF-8 BOM.
    if (substr($str, 0, 3) === pack("CCC", 0xEF, 0xBB, 0xBF)) {
        $str = substr($str, 3);
    }
    return $str;
}
?>

As shown above, BOM = pack("CCC", 0xEF, 0xBB, 0xBF), so the BOM can be removed either with the removeBOM function above or with one of the following:

// Removes every occurrence of the BOM bytes in the string.
str_replace("\xEF\xBB\xBF", '', $bom_content);
// Removes the BOM only at the start of the string.
preg_replace("/^\xEF\xBB\xBF/", '', $bom_content);
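There is also a function for checking whether a string is UTF-8:

The code is as follows:

function isUTF8($string)
{
    // Round-trips through ISO-8859-1, so this is only reliable for text
    // limited to Latin-1 characters. utf8_encode and utf8_decode are
    // deprecated as of PHP 8.2; mb_check_encoding($string, 'UTF-8') is a
    // more general check.
    return (utf8_encode(utf8_decode($string)) == $string);
}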

Method Two, use the shell on a Linux system

Before discussing how to detect and remove the BOM in UTF-8 files, let's warm up with an example:

The code is as follows:
shell> curl -s http://www.bKjia.c0m/ | head -1 | sed -n l
\357\273\277<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\r$

As shown above, the first three bytes are 357, 273, 277, which is the BOM written in octal.
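To see that the octal form and the hexadecimal form describe the same three bytes, the short sketch below emits the BOM from octal escapes and dumps it in hex. The function name bom_hex is a hypothetical helper introduced only for this illustration.

```shell
#!/bin/bash
# bom_hex is a hypothetical helper name used only for this illustration.
# It writes the BOM using octal escapes and prints the same bytes in hex.
bom_hex() {
    printf '\357\273\277' | od -An -tx1 | tr -d ' \n'
}

bom_hex; echo   # prints efbbbf
```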

The code is as follows:
shell> curl -s http://www.111cn.Net/ | head -1 | hexdump -C
00000000  ef bb bf 3c 21 44 4f 43  54 59 50 45 20 68 74 6d  |...<!DOCTYPE htm|
00000010  6c 20 50 55 42 4c 49 43  20 22 2d 2f 2f 57 33 43  |l PUBLIC "-//W3C|
00000020  2f 2f 44 54 44 20 58 48  54 4d 4c 20 31 2e 30 20  |//DTD XHTML 1.0 |
00000030  54 72 61 6e 73 69 74 69  6f 6e 61 6c 2f 2f 45 4e  |Transitional//EN|
00000040  22 20 22 68 74 74 70 3a  2f 2f 77 77 77 2e 77 33  |" "http://www.w3|
00000050  2e 6f 72 67 2f 54 52 2f  78 68 74 6d 6c 31 2f 44  |.org/TR/xhtml1/D|
00000060  54 44 2f 78 68 74 6d 6c  31 2d 74 72 61 6e 73 69  |TD/xhtml1-transi|
00000070  74 69 6f 6e 61 6c 2e 64  74 64 22 3e 0d 0a        |tional.dtd">..|

As shown above, the first three bytes are EF, BB, BF, which is the BOM written in hexadecimal. Note: these examples rely on third-party web pages, so there is no guarantee they will always be available.

In real project development you may face hundreds of text files. If a few of them contain a BOM, they are very hard to spot. If you have no UTF-8 text files with a BOM at hand, you can fabricate a few with vi. The relevant commands are as follows:

Set UTF-8 encoding:

:set fileencoding=utf-8

To add a BOM:

:set bomb

To remove a BOM:

:set nobomb

To query the BOM setting:

:set bomb?
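When the test files need to come from a script rather than an interactive vi session, printf can fabricate them as well. This is a minimal sketch, assuming a throwaway directory created with mktemp. The file names with_bom.txt and without_bom.txt and the helper first3_hex are made up for illustration.

```shell
#!/bin/bash
# Fabricate one file with a BOM and one without (file names are hypothetical).
workdir=$(mktemp -d)
printf '\357\273\277hello\n' > "$workdir/with_bom.txt"
printf 'hello\n' > "$workdir/without_bom.txt"

# first3_hex: print the first three bytes of a file as hex digits.
first3_hex() {
    head -c 3 "$1" | od -An -tx1 | tr -d ' \n'
}

first3_hex "$workdir/with_bom.txt"; echo      # efbbbf
first3_hex "$workdir/without_bom.txt"; echo   # 68656c ("hel")
```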

How do you detect a BOM in UTF-8 files?

The code is as follows:

shell> grep -r -I -l $'^\xEF\xBB\xBF' /path

How do you remove a BOM from UTF-8 files?

shell> grep -r -I -l $'^\xEF\xBB\xBF' /path | xargs sed -i '1s/^\xEF\xBB\xBF//'

The 1 address restricts the substitution to the first line of each file, which is the only place a BOM can appear.
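A more defensive variant of the two commands above can be sketched as a pair of functions. This is only a sketch, assuming standard head, tail, od and mktemp. The names has_bom and strip_bom are hypothetical. It compares raw bytes through od rather than relying on escape sequences inside grep or sed patterns, and it rewrites the file in place so that its permissions are kept.

```shell
#!/bin/bash
# has_bom FILE: succeed if FILE starts with the bytes EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# strip_bom FILE: rewrite FILE without its leading BOM, if it has one.
strip_bom() {
    has_bom "$1" || return 0
    tmp=$(mktemp) || return 1
    # tail -c +4 copies everything from the fourth byte onward.
    tail -c +4 "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# Demonstration on a throwaway file.
demo=$(mktemp)
printf '\357\273\277line one\nline two\n' > "$demo"
strip_bom "$demo"
cat "$demo"
```

Calling strip_bom on a file that has no BOM leaves it untouched, so it is safe to run over every file that grep reports.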

Recommendation: if you use SVN, you can add the relevant check to the pre-commit hook to keep BOMs out of the repository.

The code is as follows:

#!/bin/bash

# A pre-commit hook receives the repository path and the transaction name.
REPOS="$1"
TXN="$2"

SVNLOOK=/usr/bin/svnlook

# Check every added (A) or updated (U) file in the transaction.
for FILE in $($SVNLOOK changed -t "$TXN" "$REPOS" | awk '/^[AU]/ {print $NF}'); do
    if $SVNLOOK cat -t "$TXN" "$REPOS" "$FILE" | grep -q $'^\xEF\xBB\xBF'; then
        echo "Byte Order Mark is found in $FILE" 1>&2
        exit 1
    fi
done
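The awk filter in the hook can be tried without a Subversion repository. The sketch below feeds it sample text in the layout printed by svnlook changed, where the status letters come first and the path follows. The function sample_changed and all paths in it are made up for illustration.

```shell
#!/bin/bash
# Sample text in the layout printed by `svnlook changed` (paths are made up).
sample_changed() {
    printf 'A   trunk/new.php\n'
    printf 'U   trunk/edited.php\n'
    printf 'D   trunk/removed.php\n'
    printf '_U  trunk/props-only.php\n'
}

# Same filter as the hook: keep paths whose content was added (A) or updated (U).
sample_changed | awk '/^[AU]/ {print $NF}'
```

Only trunk/new.php and trunk/edited.php are printed. Because the filter takes the last whitespace-separated field, a path containing spaces would be cut short, which is worth keeping in mind before relying on the hook.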

This article uses a lot of shell commands

Method Three, modify the document directly using the UltraEdit editor

Re-save the document that shows the blank line in a format without a BOM.

Figure 2 shows the encoding formats UltraEdit offers when saving a document:

Figure 2

Select the UTF-8 without BOM option, and the problem is solved.

