Writing Unix-compatible UTF-8 files on Microsoft platforms without a BOM header

Last Update:2015-09-09 Source: Internet

Author: User

I ran into a problem: when HTML generated by a .NET back end is used on Linux, a line of garbled text appears at the top of the page and the styling breaks. The cause is that .NET, running on the Windows platform, automatically adds a BOM header to the UTF-8 files it generates.

The key code for removing the BOM is:

System.Text.UTF8Encoding utf8 = new System.Text.UTF8Encoding(false);
StreamWriter sw = new StreamWriter(nFile, utf8);

In a hex view of two generated files, one with the BOM removed and one without, the leading bytes EF BB BF are the BOM header.

private bool FileStreamWriteFile(Model.RecommendHtml model)
{
    try
    {
        string writeUrl = ConfigurationManager.AppSettings["unix21"];
        string htmlurl = writeUrl + @"\html\" + model.ID + ".html";
        FileStream nFile = new FileStream(htmlurl, FileMode.OpenOrCreate, FileAccess.ReadWrite);
        // Truncate any existing content before writing.
        nFile.Seek(0, SeekOrigin.Begin);
        nFile.SetLength(0);
        // Passing false tells UTF8Encoding not to emit the EF BB BF signature.
        System.Text.UTF8Encoding utf8 = new System.Text.UTF8Encoding(false);
        StreamWriter sw = new StreamWriter(nFile, utf8);
        sw.Write(model.RecommendContent);
        sw.Close();
        nFile.Close();
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}
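The same write can be sketched more compactly with `using` blocks, so that the stream is closed even if `Write` throws. This is a minimal sketch, not the author's code: the class name `BomFreeWriter` and the method name `WriteUtf8NoBom` are hypothetical, and `FileMode.Create` is used in place of the original's `OpenOrCreate` plus `SetLength(0)`.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper, named here for illustration only.
public static class BomFreeWriter
{
    // Writes content as UTF-8 without the EF BB BF signature.
    // FileMode.Create truncates an existing file, so no Seek/SetLength is needed.
    public static void WriteUtf8NoBom(string path, string content)
    {
        var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (var writer = new StreamWriter(file, utf8NoBom))
        {
            writer.Write(content);
        }
    }
}
```

Calling `BomFreeWriter.WriteUtf8NoBom(htmlurl, model.RecommendContent)` would stand in for the body of the `try` block above.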

References for UTF-8 and BOM headers:

UTF-8 does not require a BOM, although the Unicode standard allows one to be used in UTF-8.
UTF-8 without a BOM is therefore the standard form, and placing a BOM in a UTF-8 file is a Microsoft habit. (By the way, calling little-endian UTF-16 with a BOM simply "Unicode", without further qualification, is also a Microsoft habit.)
The BOM (byte order mark) was designed for UTF-16 and UTF-32, where it marks the byte order. Microsoft uses a BOM in UTF-8 because it clearly distinguishes UTF-8 from ASCII and other encodings, but such files cause problems on operating systems other than Windows.

In fact, the BOM is not a bad habit. It is part of the Unicode standard and has its own area of applicability. A BOM is usually used to mark a plain Unicode text stream, so that a text-processing program reading, say, a .txt file can tell which Unicode encoding it uses (UTF-8, UTF-16BE, or UTF-16LE). Windows copes with the BOM better because Unicode detection is built into its text-handling APIs and applications: when they open a text file, the BOM is identified and removed automatically. There is a historical reason for this. Windows originated in a multi-code-page (ANSI) environment, and when Unicode was introduced its designers wanted Unicode and non-Unicode (multi-byte) text files to coexist without users having to think about it, so this small trick was the only option.

In contrast, Unix-like systems such as Linux spent relatively little time in multi-locale environments. The community also had enough momentum to move forward without carrying legacy baggage (an aside: Microsoft is almost paranoid about compatibility and forbids anything that would break it, so it often ends up tying its own hands), so it simply moved to UTF-8 in one step. Of course, there was a transition period. For example, from the release of GTK+ 2.0, the first fully UTF-8 version, until practically no GTK developers were still using the multi-locale GTK+ 1.2, at least three to four years passed, as I remember it.
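The "identify and remove on open" behavior described above can be observed in .NET itself. The following is a small sketch under the assumption that .NET is representative of Windows-side text handling; the class and method names (`BomReadDemo`, `ReadWithStreamReader`, `ReadRaw`) are hypothetical. `StreamReader` detects a leading BOM by default, while a direct `Encoding.UTF8.GetString` call keeps it as the character U+FEFF.

```csharp
using System.IO;
using System.Text;

// Hypothetical demo class, named here for illustration only.
public static class BomReadDemo
{
    // Decodes bytes through StreamReader, which by default detects a leading
    // byte order mark and does not return it as part of the text.
    public static string ReadWithStreamReader(byte[] data)
    {
        using (var reader = new StreamReader(new MemoryStream(data)))
        {
            return reader.ReadToEnd();
        }
    }

    // Decodes the same bytes directly; GetString keeps the BOM as the
    // character U+FEFF at the start of the result.
    public static string ReadRaw(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }
}
```

A program that decodes the raw bytes without this kind of detection sees the extra character, which is what shows up as garbled text on other platforms.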

The BOM is unpopular in Unix environments because many Unix programs simply ignore it. The main problem is the #! marker on the first line of every Unix script, which is only recognized when it is the very first thing in the file. For compatibility reasons, neither the program loader nor many shells check for a BOM. When a BOM is added, it is therefore treated as ordinary input characters and breaks the #! marker, which is troublesome. Many modern scripting languages, such as Python, can handle a BOM in their own interpreters, but the shell is stuck here and ends up as the innocent victim. The shell cannot be blamed for this, because the BOM itself violates a common Unix design principle: the data in a document must be visible. A BOM cannot be edited as a visible character in a text editor, which many Unix developers find unsatisfactory.
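For a script that has already been saved with a BOM, one way to restore the #! marker is to drop the three leading bytes before the file reaches the Unix machine. This is a minimal sketch; the class name `BomStripper` and the method name `StripUtf8Bom` are hypothetical and not part of any framework.

```csharp
using System;

// Hypothetical helper, named here for illustration only.
public static class BomStripper
{
    // Returns the input without a leading EF BB BF,
    // or the same array unchanged if there is no BOM.
    public static byte[] StripUtf8Bom(byte[] data)
    {
        bool hasBom = data.Length >= 3
            && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
        if (!hasBom)
        {
            return data;
        }
        byte[] result = new byte[data.Length - 3];
        Array.Copy(data, 3, result, 0, result.Length);
        return result;
    }
}
```

It could be applied to a file with `File.WriteAllBytes(path, BomStripper.StripUtf8Bom(File.ReadAllBytes(path)))`.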

Reference: http://www.cnblogs.com/findumars/p/3620078.html

===============================================================

Q: What is a BOM?

A: UTF-8 files come in two forms: without a BOM and with a BOM.

What is a BOM? The three bytes "EF BB BF" are called the BOM, which stands for "Byte Order Mark". In UTF-8 files, the BOM is often used to indicate that the file is UTF-8, whereas its real purpose, in UTF-16, is to indicate the order of the high and low bytes.

A BOM in front of the byte stream indicates the byte order in use (for example little-endian, with the low byte first). UTF-8 does not need to consider byte order, so the BOM is optional there.
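The byte order point can be made concrete by encoding one character in each form. The sketch below uses a hypothetical helper class `ByteOrderDemo`; the .NET names `Encoding.Unicode` and `Encoding.BigEndianUnicode` denote little-endian and big-endian UTF-16 respectively.

```csharp
using System;
using System.Text;

// Hypothetical demo class, named here for illustration only.
public static class ByteOrderDemo
{
    // Formats bytes as space-separated uppercase hex.
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    // GetBytes encodes the text only; it never adds a BOM.
    public static string Utf16Le(string s) { return Hex(Encoding.Unicode.GetBytes(s)); }
    public static string Utf16Be(string s) { return Hex(Encoding.BigEndianUnicode.GetBytes(s)); }
    public static string Utf8(string s) { return Hex(Encoding.UTF8.GetBytes(s)); }
}
```

For the character U+4E2D, `Utf16Le("\u4E2D")` gives "2D 4E" and `Utf16Be("\u4E2D")` gives "4E 2D", so a reader needs to know which order was used. `Utf8("\u4E2D")` gives "E4 B8 AD" regardless of platform, which is why UTF-8 has no byte order to mark.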

The BOM signature can be removed as follows:

System.Text.UTF8Encoding utf8 = new System.Text.UTF8Encoding(false);
StreamWriter stream = new StreamWriter(Server.MapPath("normren.html"), false, utf8);
stream.Write("Content");
stream.Close();

// Some people used to rewrite the UTF-8 encoding class so that it would not emit the signature.
// That is unnecessary: the framework already provides this option.
StreamWriter dout = new StreamWriter("1.html", false, new UTF8Encoding(false));
dout.Write("sdsdsd");
dout.Close();

Reference: http://blog.163.com/yanfeng_0/blog/static/6200414520096303911545/

===============================================================

The BOM (Byte Order Mark) is the standard marker used in the UTF encoding schemes to identify the encoding. In UTF-16 it is FF FE (for little-endian; FE FF for big-endian), and in UTF-8 it becomes EF BB BF. The marker is optional in UTF-8 because UTF-8 has no byte order issue, so it serves only as a way to detect whether a byte stream is UTF-8 encoded. Microsoft software performs this detection, but some other software does not and treats the marker as ordinary characters.
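These marker bytes can be read directly from the .NET encoding objects through `Encoding.GetPreamble()`. The sketch below wraps that call in a hypothetical helper (`PreambleDemo.PreambleHex`) that returns the bytes as hex text.

```csharp
using System;
using System.Text;

// Hypothetical demo class, named here for illustration only.
public static class PreambleDemo
{
    // Returns the preamble (BOM) of an encoding as space-separated hex,
    // or an empty string if the encoding emits none.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble()).Replace("-", " ");
    }
}
```

With this helper, `PreambleHex(Encoding.Unicode)` is "FF FE", `PreambleHex(Encoding.BigEndianUnicode)` is "FE FF", `PreambleHex(new UTF8Encoding(true))` is "EF BB BF", and `PreambleHex(new UTF8Encoding(false))` is the empty string.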

Microsoft adds the three bytes EF BB BF at the start of its own UTF-8 text files. Notepad and other Windows programs use these three bytes to decide whether a text file is ASCII or UTF-8. This is only a Microsoft convention, however; other platforms do not mark UTF-8 text files this way.

In other words, a UTF-8 file may or may not have a BOM. How can you tell which? There are three methods:
1. Open the file in UltraEdit-32, switch to hexadecimal editing mode, and check whether the file starts with EF BB BF.
2. Open it in Dreamweaver and check in the page properties whether "Include Unicode Signature (BOM)" is ticked.
3. Open it in Windows Notepad, choose "Save As", and see whether the default encoding offered is UTF-8 or ANSI. If it is ANSI, the file has no BOM.
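The hex-editor check from the first method can also be done in code by reading the first three bytes of the file. This is a minimal sketch; the class name `BomDetector` and the method name `HasUtf8Bom` are hypothetical.

```csharp
using System.IO;

// Hypothetical helper, named here for illustration only.
public static class BomDetector
{
    // Returns true when the file starts with the bytes EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        byte[] head = new byte[3];
        using (var file = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read = file.Read(head, 0, head.Length);
            return read == 3
                && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
        }
    }
}
```

A file shorter than three bytes is reported as having no BOM.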

Reference: http://blog.163.com/result_2205/blog/static/13981945020102954023564/

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
