We often find a few unexplained blank lines at the top of a rendered page, yet they are nowhere to be seen when the file is opened in an editor. This is caused by the UTF-8 BOM. Below are several ways to detect and remove the UTF-8 BOM.
The following is the HTML shown by Firebug when this happens.
Figure 1
An extra blank line has appeared, but it is not visible in the source code.
Method One, replacing the BOM with PHP (the approach I use most often)
BOM: the Unicode file signature (Byte Order Mark, U+FEFF).
The BOM indicates which Unicode encoding a file uses, but when a received file has to be parsed and written to a database, a leading BOM gets in the way.
Two functions found on the utf8_encode page can be used to test writing and removing a BOM.
Prepend a BOM to the content written to a file:
The code is as follows:
    <?php
    // Write $content to $filename, prefixed with the UTF-8 BOM (EF BB BF).
    function writeUTF8File($filename, $content) {
        $f = fopen($filename, 'w');
        fwrite($f, pack("CCC", 0xEF, 0xBB, 0xBF));
        fwrite($f, $content);
        fclose($f);
    }
    ?>
A function that removes the BOM:
The code is as follows:
    <?php
    // Strip a leading UTF-8 BOM (EF BB BF) from $str, if present.
    function removeBOM($str = '') {
        if (substr($str, 0, 3) === pack("CCC", 0xEF, 0xBB, 0xBF)) {
            $str = substr($str, 3);
        }
        return $str;
    }
    ?>
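Since the BOM equals pack("CCC", 0xEF, 0xBB, 0xBF), that is, the byte string "\xEF\xBB\xBF", it can be removed either with the removeBOM function above or with one of the following:
    $content = str_replace("\xEF\xBB\xBF", '', $bom_content);
    $content = preg_replace("/^\xEF\xBB\xBF/", '', $bom_content);
Note that str_replace removes the sequence wherever it occurs, whereas the preg_replace pattern is anchored to the start of the string.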
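The same page also has a function that checks whether a string is UTF-8:
The code is as follows:
    // Returns true when $string survives a UTF-8 -> ISO-8859-1 -> UTF-8 round trip.
    // utf8_encode() and utf8_decode() are deprecated as of PHP 8.2.
    function isUTF8($string) {
        return (utf8_encode(utf8_decode($string)) == $string);
    }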
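For files rather than in-memory strings, a similar check can be sketched in the shell. This is a different technique from the PHP round trip above: it asks iconv to validate the bytes strictly. It assumes iconv is installed, and the function name is_utf8 is made up for this illustration.

```shell
# Hypothetical helper: succeeds (exit status 0) when the file is valid UTF-8.
is_utf8() {
    # Converting UTF-8 to UTF-8 changes nothing, but iconv exits non-zero
    # as soon as it meets a byte sequence that is not valid UTF-8.
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}
```

A call such as `is_utf8 page.php && echo "valid UTF-8"` then works on any file; a file that starts with a BOM still counts as valid UTF-8, so this check does not replace BOM detection.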
Method Two, using the shell on a Linux system
Before discussing how to detect and remove the BOM in UTF-8 encoded files, it helps to warm up with an example:
The code is as follows:
    shell> curl -s http://www.bKjia.c0m/ | head -1 | sed -n l
    \357\273\277<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional\
    //EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\r$
As shown above, the first three bytes are 357, 273, 277, which is the BOM written in octal.
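The same bytes can be inspected without fetching a remote page. The sketch below builds a small sample locally (the function name sample and its content are made up for illustration) and prints its first three bytes in octal and in hex with od.

```shell
# Hypothetical sample: a BOM followed by the start of an HTML page.
# The BOM is written with octal escapes, which every POSIX printf understands.
sample() {
    printf '\357\273\277<!DOCTYPE html>\r\n'
}

# First three bytes in octal.
sample | head -c 3 | od -An -to1

# The same three bytes in hexadecimal.
sample | head -c 3 | od -An -tx1
```

Octal 357 273 277 and hexadecimal EF BB BF are two spellings of the same three bytes.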
The code is as follows:
    shell> curl -s http://www.111cn.Net/ | head -1 | hexdump -C
    00000000  ef bb bf 3c 21 44 4f 43  54 59 50 45 20 68 74 6d  |...<!DOCTYPE htm|
    00000010  6c 20 50 55 42 4c 49 43  20 22 2d 2f 2f 57 33 43  |l PUBLIC "-//W3C|
    00000020  2f 2f 44 54 44 20 58 48  54 4d 4c 20 31 2e 30 20  |//DTD XHTML 1.0 |
    00000030  54 72 61 6e 73 69 74 69  6f 6e 61 6c 2f 2f 45 4e  |Transitional//EN|
    00000040  22 20 22 68 74 74 70 3a  2f 2f 77 77 77 2e 77 33  |" "http://www.w3|
    00000050  2e 6f 72 67 2f 54 52 2f  78 68 74 6d 6c 31 2f 44  |.org/TR/xhtml1/D|
    00000060  54 44 2f 78 68 74 6d 6c  31 2d 74 72 61 6e 73 69  |TD/xhtml1-transi|
    00000070  74 69 6f 6e 61 6c 2e 64  74 64 22 3e 0d 0a        |tional.dtd">..|
    0000007e
As shown above, the first three bytes are EF, BB, BF, which is the BOM written in hexadecimal.
Note: these examples use third-party web pages, so there is no guarantee they will always be available.
In real project development you may face hundreds of text files; if a few of them carry a BOM, they are very hard to spot. If you have no UTF-8 text files with a BOM at hand, you can create some with vi. The relevant commands are as follows:
Set UTF-8 encoding:
    :set fileencoding=utf-8
To add a BOM:
    :set bomb
To delete a BOM:
    :set nobomb
To query the BOM setting:
    :set bomb?
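Test files can also be produced without opening an editor. The following sketch writes one file with a BOM and one without into a scratch directory; the directory comes from mktemp and the file names are made up for illustration.

```shell
# Scratch directory for the test files (file names below are hypothetical).
dir=$(mktemp -d)

# \357\273\277 is the UTF-8 BOM written as portable octal escapes.
printf '\357\273\277<?php echo "with bom";\n' > "$dir/with_bom.php"
printf '<?php echo "no bom";\n' > "$dir/no_bom.php"

# Print the first three bytes of each file in hex to confirm the difference.
for f in "$dir"/*.php; do
    printf '%s: ' "${f##*/}"
    head -c 3 "$f" | od -An -tx1
done
```

The two files differ only by the three leading bytes, which makes them convenient input for trying out detection and removal commands.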
How do I detect a BOM in UTF-8 encoded files?
The code is as follows:
    shell> grep -r -I -l $'^\xEF\xBB\xBF' /path
How do I remove a BOM from UTF-8 encoded files?
The code is as follows:
    shell> grep -r -I -l $'^\xEF\xBB\xBF' /path | xargs sed -i '1s/^\xEF\xBB\xBF//'
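The commands above depend on bash's $'...' quoting and on GNU sed. For a single file, the same idea can be sketched with more portable tools. The helper names has_bom and strip_bom are made up for this illustration, and head -c is assumed to be available (it is on GNU, BSD, and BusyBox systems).

```shell
# Hypothetical helper: succeeds when the file starts with the UTF-8 BOM.
has_bom() {
    # Render the first three bytes as hex and compare with EF BB BF.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Hypothetical helper: remove a leading BOM, leaving other files untouched.
strip_bom() {
    if has_bom "$1"; then
        # tail -c +4 outputs everything from the fourth byte onward.
        # Writing back with cat keeps the original file's permissions.
        tail -c +4 "$1" > "$1.nobom" &&
            cat "$1.nobom" > "$1" &&
            rm -f "$1.nobom"
    fi
}
```

Unlike the grep pattern, which matches a BOM at the start of any line, has_bom only looks at the first three bytes of the file, so only a true leading BOM is reported and removed.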
Recommendation: if you use SVN, you can add code like the following to the pre-commit hook to keep BOMs out of the repository.
The code is as follows:
    #!/bin/bash

    # Subversion passes the repository path and the transaction name to the hook.
    REPOS="$1"
    TXN="$2"

    SVNLOOK=/usr/bin/svnlook

    # Check every added (A) or updated (U) file in the transaction.
    for FILE in $($SVNLOOK changed -t "$TXN" "$REPOS" | awk '/^[AU]/ {print $NF}'); do
        if $SVNLOOK cat -t "$TXN" "$REPOS" "$FILE" | grep -q $'^\xEF\xBB\xBF'; then
            echo "Byte Order Mark is found in $FILE" 1>&2
            exit 1
        fi
    done
This article uses quite a few shell commands.
Method Three, modifying the document directly with the UltraEdit editor
Save the document that shows the blank lines in a format without a BOM.
The following is the encoding format option shown when UltraEdit saves a document:
Figure 2
Select the UTF-8 without BOM option and the problem is solved.