Find files with a UTF-8 BOM header on Linux and remove the BOM (FEFF)

Source: Internet
Author: User

UTF-8 encoded files come in two forms: with a BOM and without one.

A file or byte stream that starts with a BOM must be in a Unicode encoding. Which one it is (UTF-8, UTF-16, or UTF-32) can be determined from those leading bytes.
In a UTF-8 file the BOM sits at the very start, occupies three bytes, and marks the file as UTF-8 encoded.
The UTF-8 BOM is EF BB BF. UE (UltraEdit) converts a UTF-8 file to UTF-16 when loading it, so EF BB BF shows up there as FEFF, the UTF-16 form of the BOM.
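To make "judged by the head" concrete, here is a minimal shell sketch that reads the first bytes of a file and reports which BOM, if any, it starts with. The function name bom_type is made up for illustration; the byte signatures are the standard BOM values (EF BB BF for UTF-8, FF FE or FE FF for UTF-16, FF FE 00 00 or 00 00 FE FF for UTF-32). It assumes head -c and od are available, as they are on typical Linux systems.

```shell
# Hypothetical helper: print the Unicode encoding implied by a file's BOM.
bom_type() {
    # Dump the first four bytes as lowercase hex without separators, e.g. "efbbbf7b".
    head_hex=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$head_hex" in
        efbbbf*)  echo "UTF-8" ;;
        fffe0000) echo "UTF-32LE" ;;  # must be tested before the UTF-16LE pattern
        0000feff) echo "UTF-32BE" ;;
        fffe*)    echo "UTF-16LE" ;;
        feff*)    echo "UTF-16BE" ;;
        *)        echo "none" ;;
    esac
}
```

For example, bom_type filename.php prints UTF-8 for a file that was saved with a UTF-8 BOM and none for a file without any BOM.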

1. Removing the BOM header in EditPlus

Once the editor is switched to the UTF-8 encoding format, every saved file is prefixed with a string of hidden characters (the BOM), which the editor uses to recognize that the file is UTF-8 encoded.

Run EditPlus, click Tools, select Preferences, select Files, and set the UTF-8 signature option to "Always remove signature".

PHP files edited and saved after that no longer carry a BOM.

2. Removing the BOM header in UltraEdit

After opening the file, choose Save As, select "UTF-8 - no BOM" as the encoding format, and click OK.

Today we received user feedback: when users submitted a report, the page showed that the submission had failed.
The feedback page lives on the feedback platform and calls the project's back-end interface. The service's interfaces had not been touched for a long time, so why would they fail?
I picked a production machine to debug on. The interface returned what looked like normal content, a readable JSON string. I copied the string and found that it parsed fine in JS.
But the project uses json_decode, and every decode attempt failed. After half a day of debugging it still seemed bizarre and I had no lead. Worried that I was stuck in one way of thinking, I went off to read some Golang for a while and then came back to the problem.
However I looked at it, everything still seemed normal; there should have been no problem.

I copied the string returned by the interface once more, and this time noticed a mark at the very start of it.

[Image: feff-utf8_bom_remove_linux_vim]
I looked it up and found that this FEFF is the BOM. Reference: Byte order mark
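A hidden BOM like this is easier to spot in a byte dump than on screen. The sketch below uses a made-up response body (not the author's real payload) to show the three leading bytes and one way to drop them before the text reaches a strict JSON parser such as the json_decode call described above.

```shell
# Made-up response body: a UTF-8 BOM followed by ordinary JSON.
response=$(printf '\357\273\277{"ok":true}')

# Printed normally this looks like plain JSON; the dump shows ef bb bf in front.
printf '%s' "$response" | od -An -tx1 -N 3

# Remove the BOM only if it is the prefix of the string.
bom=$(printf '\357\273\277')
clean=${response#"$bom"}
printf '%s\n' "$clean"
```

The prefix removal leaves strings without a BOM untouched, so it is safe to apply to every response.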

With the problem found, it was time to fix it.
Windows has a variety of editors that can deal with the BOM header, but how do you handle it on Linux?
A Google search later, the problem was solved.
1. Find which files contain BOM headers.

Example
grep -rl $'\xEF\xBB\xBF' .
The BOM header can then be seen with hexdump.
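Note that the grep command above reports any file containing the three bytes anywhere, not only at the start. As a possible refinement, the following sketch lists only files whose first three bytes are EF BB BF. The function name find_bom_files is hypothetical, and the loop assumes file names do not contain newlines.

```shell
# Hypothetical helper: list files under a directory that begin with a UTF-8 BOM.
find_bom_files() {
    find "${1:-.}" -type f | while IFS= read -r f; do
        # Compare the first three bytes, dumped as hex, against the BOM signature.
        if [ "$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Called as find_bom_files . it walks the current directory; with no argument it defaults to the current directory as well.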

2. Delete BOM header information.

Example


sed -i '1s/^\xEF\xBB\xBF//' filename.php

After the deletion you will no longer see the BOM header.
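The \x byte escapes in the sed expression rely on GNU sed. Where that is not available, one alternative (a swapped-in technique, not the author's method) is to copy everything from the fourth byte onward with tail. The helper name strip_bom and the .nobom scratch suffix are made up for this sketch.

```shell
# Hypothetical helper: strip a leading UTF-8 BOM from one file, in place.
strip_bom() {
    if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        # tail -c +4 outputs the file starting at its fourth byte, i.e. without the BOM.
        # Writing back with cat keeps the original file's permissions.
        tail -c +4 "$1" > "$1.nobom" && cat "$1.nobom" > "$1" && rm "$1.nobom"
    fi
}
```

Because the file is only rewritten when the signature is present, running strip_bom filename.php a second time changes nothing.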

You can also find the files and delete their BOM header information with a single command:

Example


find . -type f -exec sed -i.bak '1s/^\xEF\xBB\xBF//' {} \; -exec rm {}.bak \;
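The one-liner runs sed over every file and deletes the backup immediately. A more cautious variant, sketched below under the assumption that keeping backups is acceptable, rewrites only files that actually start with a BOM and leaves a .bak copy next to each one. The function name strip_bom_tree is hypothetical, and file names are assumed not to contain newlines.

```shell
# Hypothetical helper: strip the UTF-8 BOM from files under a directory, keeping backups.
strip_bom_tree() {
    scratch=$(mktemp) || return 1
    # Skip *.bak so the backups created below are never processed themselves.
    find "${1:-.}" -type f ! -name '*.bak' | while IFS= read -r f; do
        [ "$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ] || continue
        cp -p "$f" "$f.bak" &&
            tail -c +4 "$f" > "$scratch" &&
            cat "$scratch" > "$f" &&
            printf 'stripped %s\n' "$f"
    done
    rm -f "$scratch"
}
```

After checking the results, the backups can be removed with a separate find . -name '*.bak' -delete, provided no unrelated .bak files live in the tree.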

If you are on Windows, you can refer to the following method.

Windows BOM header solution

The code is as follows:
// Requires "using System.IO;". Both methods belong inside a class.

/// <summary>
/// Clear the BOM header of a UTF-8 file.
/// </summary>
/// <param name="filePath">Path of the file to process</param>
/// <returns>True on success</returns>
private static bool ClearBom(string filePath)
{
    if (!CheckBom(filePath))
        return true;

    string fileTemp = filePath + ".temp";

    using (FileStream fsRead = new FileStream(filePath, FileMode.Open))
    {
        // Skip the first three bytes (the BOM)
        fsRead.Seek(3, SeekOrigin.Begin);
        int bufferSize = 1024;
        byte[] buffer = new byte[bufferSize];

        // FileMode.Create truncates a stale temp file instead of appending to it
        using (FileStream fsWrite = new FileStream(fileTemp, FileMode.Create, FileAccess.Write))
        {
            int bytesRead;
            while ((bytesRead = fsRead.Read(buffer, 0, bufferSize)) > 0)
            {
                // Write only the bytes actually read, not the whole buffer
                fsWrite.Write(buffer, 0, bytesRead);
            }
        }
    }

    // Replace the original file with the BOM-free copy
    try
    {
        File.Delete(filePath);
        File.Move(fileTemp, filePath);
    }
    catch
    {
        return false;
    }
    return true;
}

/// <summary>
/// Check whether the file has a BOM header.
/// A UTF-8 file with a BOM starts with the 3 bytes "EF BB BF" (the BOM, byte order mark).
/// </summary>
/// <param name="filePath">Path of the file to check</param>
/// <returns>True if the file starts with a UTF-8 BOM</returns>
private static bool CheckBom(string filePath)
{
    bool isBom = false;
    using (FileStream fsRead = new FileStream(filePath, FileMode.Open))
    {
        // Read the first three bytes; a shorter file leaves the rest zeroed and cannot match
        byte[] buffer = new byte[3];
        fsRead.Read(buffer, 0, 3);
        if (0xEF == buffer[0] && 0xBB == buffer[1] && 0xBF == buffer[2])
            isBom = true;
    }
    return isBom;
}

Summary:

In a project maintained by many people, it is quite likely that someone will upload unsuitable files or code. Some routine monitoring is needed so that such problems are detected and resolved in time, before they affect production.
