First, what is a BOM?
BOM stands for byte order mark. As the name suggests, it exists to mark byte order. Byte order, or endianness (see http://en.wikipedia.org/wiki/Endianness), determines the order in which the bytes of multi-byte binary data are actually stored in memory. When two machines exchange data, they must agree on the byte order; otherwise the data is misinterpreted. For example, Intel CPUs are little-endian, while binary data transmitted over the network is conventionally big-endian. So when C# code receives data from the network, it must handle the byte order. In practice, many programs do not handle byte order at all, because most common programs run on the Wintel architecture and both ends of the connection are little-endian. The data goes onto the wire in the wrong byte order, the receiving end reads it back with the same wrong byte order, and the two errors cancel out. That is why the problem usually goes unnoticed.
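As an illustration of handling byte order explicitly, here is a minimal sketch that converts a 32-bit integer to and from network (big-endian) order. The class and method names (ByteOrderDemo, ToNetworkBytes, FromNetworkBytes) are made up for this example; IPAddress.HostToNetworkOrder, IPAddress.NetworkToHostOrder and BitConverter are standard .NET APIs.

```csharp
using System;
using System.Net;

// Hypothetical helper, for illustration only.
public static class ByteOrderDemo
{
    // Returns the 4 bytes to put on the wire, most significant byte first.
    public static byte[] ToNetworkBytes(int value)
    {
        // Swaps the bytes on a little-endian host; does nothing on a big-endian host.
        int networkOrder = IPAddress.HostToNetworkOrder(value);
        // BitConverter uses the host's own byte order, so the result is big-endian either way.
        return BitConverter.GetBytes(networkOrder);
    }

    // Reverses ToNetworkBytes: reads 4 big-endian bytes back into a host integer.
    public static int FromNetworkBytes(byte[] bytes)
    {
        int networkOrder = BitConverter.ToInt32(bytes, 0);
        return IPAddress.NetworkToHostOrder(networkOrder);
    }
}
```

With this helper, ByteOrderDemo.ToNetworkBytes(0x01020304) yields the bytes 01 02 03 04 regardless of the host's endianness, and FromNetworkBytes turns them back into 0x01020304.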
Back to the BOM. Wikipedia has a detailed explanation at http://en.wikipedia.org/wiki/Byte_order_mark. In summary, UTF-8, UTF-16 and UTF-32 can all insert a BOM at the start of the data stream to mark the byte order of the content that follows. The intention was good, but it causes two problems. First, a text file that starts with a BOM is no longer an ASCII-compatible text file. There are many text-processing tools out there, and a good number of them break when they encounter a BOM. Second, many XML parsers do not support the BOM: they treat XML that starts with a BOM as invalid, and parsing fails.
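To make the idea of a marker at the start of the stream concrete, the following sketch inspects the first bytes of a buffer and reports which BOM, if any, is present. BomSniffer and its methods are hypothetical names chosen for this example; the byte sequences are the standard BOM encodings for each format.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class BomSniffer
{
    // Returns the encoding suggested by a leading BOM, or "none".
    public static string Detect(byte[] data)
    {
        if (StartsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // UTF-32 must be tested before UTF-16, because FF FE 00 00 also
        // begins with the UTF-16 little-endian BOM FF FE. (The sequence is
        // ambiguous: it could be UTF-16 LE text starting with U+0000.)
        if (StartsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32 LE";
        if (StartsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32 BE";
        if (StartsWith(data, 0xFF, 0xFE)) return "UTF-16 LE";
        if (StartsWith(data, 0xFE, 0xFF)) return "UTF-16 BE";
        return "none";
    }

    private static bool StartsWith(byte[] data, params byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Note that the three UTF-8 BOM bytes are all above 0x7F, which is why a tool expecting plain ASCII trips over them.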
How to control BOM output
The BOM is produced by the encoding. Accordingly, Microsoft provides constructor overloads on its UTF-8, UTF-16 and UTF-32 encoding classes (UTF8Encoding, UnicodeEncoding and UTF32Encoding) that specify whether the BOM is added to the data stream. The following uses XML as an example:
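A minimal sketch of such an example, under stated assumptions: the body of CreateXmlDoc, the element names, and the choice of XmlTextWriter with Encoding.UTF8 (which emits a BOM) are illustrative guesses rather than the original listing, and SaveWithDefaultUtf8 is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class BomOutputDemo
{
    // Stand-in for the CreateXmlDoc helper; the content is made up.
    public static XmlDocument CreateXmlDoc()
    {
        var doc = new XmlDocument();
        XmlElement root = doc.CreateElement("root");
        root.InnerText = "hello";
        doc.AppendChild(root);
        return doc;
    }

    // Writes the document to an in-memory stream using Encoding.UTF8,
    // whose preamble is the UTF-8 BOM, and returns the raw bytes.
    public static byte[] SaveWithDefaultUtf8(XmlDocument doc)
    {
        using (var stream = new MemoryStream())
        {
            var writer = new XmlTextWriter(stream, Encoding.UTF8);
            doc.Save(writer);
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Formats the first three bytes of the output as hex.
    public static string FirstThreeBytes()
    {
        byte[] bytes = SaveWithDefaultUtf8(CreateXmlDoc());
        return BitConverter.ToString(bytes, 0, 3);
    }
}
```

In this sketch, BomOutputDemo.FirstThreeBytes() returns "EF-BB-BF".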
The code above calls CreateXmlDoc to create an XmlDocument object, adds some content, and writes the XmlDocument to a stream. The check at the end of the code (lines 17-19) shows that the first three bytes of the output are 0xEF, 0xBB and 0xBF. These three bytes are exactly the UTF-8 encoding of the BOM.
To solve this problem, I wrote the following code:
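What follows is a sketch of one way to write such an overload, not necessarily the original listing. The class name XmlDocumentExtensions and the use of XmlTextWriter are assumptions; the withBom parameter and the UTF8Encoding constructor call follow the description in the text.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlDocumentExtensions
{
    // Extra Save overload: writes the document to the stream as UTF-8,
    // with or without a leading BOM depending on withBom.
    public static void Save(this XmlDocument doc, Stream stream, bool withBom)
    {
        // The bool argument tells UTF8Encoding whether to emit the BOM as its preamble.
        var encoding = new UTF8Encoding(withBom);
        // The writer is flushed but not disposed, so the caller's stream stays open.
        var writer = new XmlTextWriter(stream, encoding);
        doc.Save(writer);
        writer.Flush();
    }
}
```

A call such as doc.Save(stream, false) then writes the document without the three BOM bytes, while doc.Save(stream, true) keeps them.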
The code above defines an extension method on XmlDocument that adds a new overload of Save. The overload takes an extra parameter, withBom, which specifies whether the BOM should be output. The key part of this code is the line (line 4 in the listing) where the withBom parameter is passed to the UTF8Encoding constructor.
In the following test, we check the start of the output data stream. This time the output starts with <? instead of 0xEF, 0xBB and 0xBF.
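A sketch of such a test, under the same assumptions as before. The Save overload is repeated in compact form so the example compiles on its own; XmlSaveExtensions, SaveWithoutBomTest and FirstTwoChars are hypothetical names, and the sample document is made up.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class XmlSaveExtensions
{
    // Compact copy of the Save overload described in the text.
    public static void Save(this XmlDocument doc, Stream stream, bool withBom)
    {
        var writer = new XmlTextWriter(stream, new UTF8Encoding(withBom));
        doc.Save(writer);
        writer.Flush();
    }
}

public static class SaveWithoutBomTest
{
    // Saves a small document without a BOM and returns the first two
    // characters of the output.
    public static string FirstTwoChars()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root>hello</root>");
        using (var stream = new MemoryStream())
        {
            doc.Save(stream, false);
            byte[] bytes = stream.ToArray();
            // The XML declaration is pure ASCII, so ASCII decoding is safe here.
            return Encoding.ASCII.GetString(bytes, 0, 2);
        }
    }
}
```

Here SaveWithoutBomTest.FirstTwoChars() returns "<?", the start of the XML declaration that the writer emits.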