A page sometimes shows a blank line for no apparent reason, even though nothing looks wrong in the editor. This is usually caused by a UTF-8 BOM. Below are several methods for detecting and deleting the UTF-8 BOM.
The following is the HTML that Firebug shows when this happens.
Figure 1
There is a blank line in it, although none appears in the source code.
Method 1: Use PHP to strip the BOM (the method I use most often)
BOM: the Unicode file signature (Byte Order Mark, U+FEFF)
The BOM can indicate which Unicode encoding a file uses, but when a received file has to be parsed and written into a database, a leading BOM gets in the way.
Under utf8_encode you can find two functions for testing how to write and remove the BOM.
Write a file with the BOM prepended to its content:
    <?php
    function writeUTF8File($filename, $content) {
        $f = fopen($filename, 'w');
        // Write the three-byte UTF-8 BOM first, then the content.
        fwrite($f, pack("CCC", 0xef, 0xbb, 0xbf));
        fwrite($f, $content);
        fclose($f);
    }
    ?>
Function to remove the BOM:
    <?php
    function removeBOM($str = '') {
        // Strip the first three bytes only when they are the BOM.
        if (substr($str, 0, 3) === pack("CCC", 0xef, 0xbb, 0xbf)) {
            $str = substr($str, 3);
        }
        return $str;
    }
    ?>
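As shown above, the BOM is pack("CCC", 0xef, 0xbb, 0xbf), that is, the byte string "\xef\xbb\xbf". To remove it you can use the removeBOM function above or one of the following:
■ str_replace("\xef\xbb\xbf", '', $bom_content);
■ preg_replace("/^\xef\xbb\xbf/", '', $bom_content);
Note that str_replace removes the sequence wherever it occurs in the string, while the anchored preg_replace removes it only at the start.
There is also a function for checking whether a string is UTF-8: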
    // Note: utf8_encode() and utf8_decode() are deprecated as of PHP 8.2;
    // mb_check_encoding($string, 'UTF-8') is an alternative.
    function isUTF8($string) {
        return (utf8_encode(utf8_decode($string)) == $string);
    }
Method 2: Use the shell on Linux
Before going into the details of detecting and deleting the UTF-8 BOM, let's warm up with an example:
    shell> curl -s http://www.bKjia.c0m/ | head -1 | sed -n l
    \357\273\277<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">$
As shown above, the first three bytes are 357, 273, and 277, which is the BOM written in octal.
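The octal form and the hexadecimal form EF BB BF describe the same three bytes. A minimal sketch that shows this without any network access (the variable names bom_oct and bom_hex are only illustrative):

```shell
#!/bin/sh
# Emit the BOM with octal escapes, then dump the same bytes in octal and hex.
bom_oct=$(printf '\357\273\277' | od -An -to1 | tr -d ' \n')
bom_hex=$(printf '\357\273\277' | od -An -tx1 | tr -d ' \n')
echo "octal: $bom_oct"   # prints octal: 357273277
echo "hex:   $bom_hex"   # prints hex:   efbbbf
```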
    shell> curl -s http://www.111cn.Net/ | head -1 | hexdump -C
    00000000  ef bb bf 3c 21 44 4f 43  54 59 50 45 20 68 74 6d  |...<!DOCTYPE htm|
    00000010  6c 20 50 55 42 4c 49 43  20 22 2d 2f 2f 57 33 43  |l PUBLIC "-//W3C|
    00000020  2f 2f 44 54 44 20 58 48  54 4d 4c 20 31 2e 30 20  |//DTD XHTML 1.0 |
    00000030  54 72 61 6e 73 69 74 69  6f 6e 61 6c 2f 2f 45 4e  |Transitional//EN|
    00000040  22 20 22 68 74 74 70 3a  2f 2f 77 77 77 2e 77 33  |" "http://www.w3|
    00000050  2e 6f 72 67 2f 54 52 2f  78 68 74 6d 6c 31 2f 44  |.org/TR/xhtml1/D|
    00000060  54 44 2f 78 68 74 6d 6c  31 2d 74 72 61 6e 73 69  |TD/xhtml1-transi|
    00000070  74 69 6f 6e 61 6c 2e 64  74 64 22 3e 0d 0a        |tional.dtd">..|
As shown above, the first three bytes are EF, BB, and BF, which is the BOM written in hexadecimal.
Note: the examples use a third-party website, so they may not always be available. In a real project you may face hundreds of thousands of text files, and a few files with a BOM mixed in among them are hard to notice. If you have no UTF-8 text file with a BOM at hand, you can create a few with vi. The relevant commands are as follows:
Set UTF-8 encoding:
    :set fileencoding=utf-8
Add BOM:
    :set bomb
Delete BOM:
    :set nobomb
Query BOM:
    :set bomb?
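How do you detect a BOM in UTF-8 files?
    shell> grep -r -I -l $'^\xEF\xBB\xBF' /path
How do you delete the BOM from UTF-8 files?
    shell> grep -r -I -l $'^\xEF\xBB\xBF' /path | xargs sed -i $'1s/^\xEF\xBB\xBF//'
The substitution is limited to the first line of each file (the 1 address), so the rest of the file is left untouched.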
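Before running the detection and deletion commands on a real tree, it may help to try them on throwaway files. The following is a minimal sketch, assuming GNU grep and GNU sed; the directory demo_dir and the file names are hypothetical, and the BOM is built with printf so the script does not depend on the $'...' quoting of bash:

```shell
#!/bin/sh
# Sketch only: demo_dir and the file names below are hypothetical.
bom=$(printf '\357\273\277')     # the three BOM bytes
demo_dir=$(mktemp -d)
printf '\357\273\277line one\nline two\n' > "$demo_dir/with_bom.txt"
printf 'line one\nline two\n' > "$demo_dir/no_bom.txt"

# Detect: list text files that have a line starting with the BOM.
before=$(grep -r -I -l "^$bom" "$demo_dir")

# Delete: strip the BOM from line 1 only, editing in place (GNU sed -i).
grep -r -I -l "^$bom" "$demo_dir" | xargs sed -i "1s/^$bom//"

# Re-check: no file should be listed, and both lines must still be there.
after=$(grep -r -I -l "^$bom" "$demo_dir" || true)
first_bytes=$(head -c 3 "$demo_dir/with_bom.txt")
line_count=$(wc -l < "$demo_dir/with_bom.txt")
echo "before: $before"
echo "after:  ${after:-none}"
rm -rf "$demo_dir"
```

Only with_bom.txt should be listed before the cleanup and nothing afterwards, with the second line of the file still in place.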
Recommendation: if you use SVN, you can add a check to the pre-commit hook to keep files with a BOM out of the repository.
    #!/bin/bash

    REPOS="$1"
    TXN="$2"

    SVNLOOK=/usr/bin/svnlook

    # Check every file added (A) or updated (U) in this transaction.
    for FILE in $($SVNLOOK changed -t "$TXN" "$REPOS" | awk '/^[AU]/ {print $NF}'); do
        # Reject the commit if the file content has a line starting with the BOM.
        if $SVNLOOK cat -t "$TXN" "$REPOS" "$FILE" | grep -q $'^\xEF\xBB\xBF'; then
            echo "Byte Order Mark found in $FILE" 1>&2
            exit 1
        fi
    done
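The grep test in the hook matches the BOM bytes at the start of any line, not only at the start of the file. A stricter variant could inspect just the first three bytes of the stream. The helper has_bom below is a hypothetical name, sketched under the assumption that head, od, and tr are available on the server:

```shell
#!/bin/sh
# has_bom is a hypothetical helper: it succeeds only when standard input
# begins with the bytes EF BB BF.
has_bom() {
    [ "$(head -c 3 | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Example: one stream with a BOM, one without.
if printf '\357\273\277<?php' | has_bom; then echo "BOM found"; fi
if ! printf '<?php' | has_bom; then echo "no BOM"; fi
```

In the hook, piping the output of $SVNLOOK cat -t "$TXN" "$REPOS" "$FILE" into has_bom could take the place of the grep test.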
Many shell commands are used in this article.
Method 3: Use the UltraEdit editor to modify the document directly
Simply re-save the document that produces the blank line in a format without a BOM.
The following are the encoding formats offered when UltraEdit saves a document:
Figure 2
Select UTF-8 - No BOM there, and the problem is solved.