Understanding XML Encoding and Garbled Characters

Source: Internet
Author: User

File Encoding
File encoding, also known as character encoding, specifies how characters are represented as bytes when text is processed. Whether one encoding is better than another depends mainly on which languages' characters it can or cannot represent, but Unicode is usually preferred. Reading or writing a file with the wrong encoding can raise exceptions or produce incorrect results.

Encoding Types

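Unicode is the preferred choice for processing files. It is a universal character encoding standard that assigns a unique code point to every character used in modern computing, including technical symbols and the special characters used in typesetting. Code points range from U+0000 to U+10FFFF, so they do not all fit in 16 bits. In a file, Unicode text is stored using an encoding form such as UTF-8, UTF-16, or UTF-32.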
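
To make the difference between a code point and its byte encoding concrete, here is a small Python sketch (standard library only). It shows one character encoded three ways, a character beyond 16 bits, and what happens when bytes are decoded with the wrong codec. The sample strings are arbitrary choices for illustration.

```python
# One Unicode code point, several byte encodings.
text = "é"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
print(hex(ord(text)))            # code point: 0xe9
print(text.encode("utf-8"))      # 2 bytes: C3 A9
print(text.encode("utf-16-le"))  # one 16-bit unit: E9 00
print(text.encode("utf-32-le"))  # 4 bytes: E9 00 00 00

# Code points above U+FFFF do not fit in 16 bits; UTF-16 stores them
# as a surrogate pair (two 16-bit units, 4 bytes).
emoji = "\U0001F600"
print(hex(ord(emoji)))                 # 0x1f600
print(len(emoji.encode("utf-16-le")))  # 4

# Decoding UTF-8 bytes with the wrong codec gives garbled text ("mojibake")...
utf8_bytes = "café".encode("utf-8")
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # cafÃ©

# ...or, with a stricter codec, an exception.
try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    print("decode error:", exc.reason)  # ordinal not in range(128)
```

The "cafÃ©" output is the typical look of garbled XML text. The two UTF-8 bytes of "é" were each read as a separate ISO-8859-1 character.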


Relationship between the encoding attribute and the file format
I used to think that the encoding named by the encoding attribute had to match the file's storage format. In other words, if the declaration is <?xml version="1.0" encoding="UTF-8"?>, the file must be saved as UTF-8. I also believed that the value of encoding had to agree with the file's BOM (byte order mark), and that otherwise garbled characters would appear during XML parsing. That is not exactly how it works.

The W3C XML specification gives three rules that an XML parser follows to determine the encoding of an XML file:
1. If the text begins with a BOM (byte order mark), the BOM identifies the encoding that was chosen when the file was saved. Files saved in a Unicode format usually carry a BOM (UTF-16 always does, UTF-8 only optionally), whereas files saved as "ANSI" have none.
2. If there is no BOM, the parser uses the encoding named by the encoding attribute of the XML declaration.
3. If neither is present, the parser assumes the file is encoded in UTF-8.
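
The three rules above can be sketched in Python as follows. This is a simplified, hypothetical helper (guess_xml_encoding is not a library function). It only recognizes the UTF-8 and UTF-16 BOMs, and it reads the declaration with a regular expression rather than a real parser.

```python
import codecs
import re

# Rule 1 candidates: BOM bytes and the encoding each one implies.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

# Rule 2: encoding="..." or encoding='...' inside a leading <?xml ... ?> declaration.
_DECL = re.compile(rb"""<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def guess_xml_encoding(data: bytes) -> str:
    """Hypothetical helper applying the three rules in order."""
    # Rule 1: a BOM, if present, decides the encoding.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Rule 2: no BOM, so use the encoding declaration if there is one.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Rule 3: neither is present, so assume UTF-8.
    return "utf-8"


print(guess_xml_encoding(b'<?xml version="1.0" encoding="GBK"?><a/>'))                    # gbk
print(guess_xml_encoding(codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="GBK"?><a/>'))  # utf-8
print(guess_xml_encoding(b"<a/>"))                                                        # utf-8
```

Real parsers do more than this. They also recognize BOM-less UTF-16 and EBCDIC from the first bytes (Appendix F of the XML specification), and they may honor an external charset such as an HTTP Content-Type header.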

With these three rules in mind, the behavior becomes much clearer.

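The XML parser first decodes the file according to its BOM, which reflects the format the file was stored in. If there is no BOM, it uses the encoding specified by the encoding attribute. If no encoding is declared either, it parses the document as UTF-8. It follows that when both a BOM and an encoding declaration are present, the BOM takes precedence. Strictly speaking, the XML specification treats a document whose actual encoding differs from its declaration as an error, so some parsers reject such a file instead of silently following the BOM.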
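
Rules 2 and 3, and the garbled text they can produce, are easy to reproduce with Python's standard xml.etree.ElementTree, which is built on the expat parser. This sketch uses ISO-8859-1 because expat supports it natively. It deliberately leaves out the BOM-versus-declaration conflict, since parsers differ on that case. The parse_text helper is just an illustrative name.

```python
import xml.etree.ElementTree as ET


def parse_text(data: bytes) -> str:
    """Parse a one-element document and return its text content."""
    return ET.fromstring(data).text


latin1_body = "café".encode("iso-8859-1")  # caf + E9
utf8_body = "café".encode("utf-8")         # caf + C3 A9
decl = b'<?xml version="1.0" encoding="ISO-8859-1"?>'

# Rule 2: the declaration matches the bytes, so the text decodes correctly.
print(parse_text(decl + b"<a>" + latin1_body + b"</a>"))  # café

# Rule 3: no BOM and no declaration; UTF-8 is assumed, which matches here.
print(parse_text(b"<a>" + utf8_body + b"</a>"))           # café

# The declaration says ISO-8859-1 but the bytes are UTF-8: no error, just garbled text.
print(parse_text(decl + b"<a>" + utf8_body + b"</a>"))    # cafÃ©

# No declaration, but the bytes are ISO-8859-1: invalid UTF-8, so parsing fails.
try:
    parse_text(b"<a>" + latin1_body + b"</a>")
except ET.ParseError as exc:
    print("parse error:", exc)
```

The third case is the classic garbled-XML symptom. The parser faithfully follows the declaration, and the declaration does not describe the bytes.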


Conclusion
In short, the encoding attribute should name the encoding that was actually used when the document was saved.
My best advice for avoiding errors is:
1. Use an encoding-aware editor, such as EditPlus.
2. Check which encoding the editor saves with (most editors let you view and change it).
3. Set the encoding attribute in your XML declaration to that same encoding.
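
The same advice applies when XML is generated by a program rather than an editor: derive the declaration from the encoding used to write the bytes, so the two cannot disagree. Here is a minimal sketch with Python's xml.etree.ElementTree. It assumes Python 3.8 or later for the xml_declaration argument of tostring. The to_xml_bytes helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def to_xml_bytes(root: ET.Element, encoding: str) -> bytes:
    """Serialize root with an XML declaration naming the same encoding
    that is used to produce the bytes."""
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


root = ET.Element("a")
root.text = "café"

for enc in ("UTF-8", "ISO-8859-1"):
    data = to_xml_bytes(root, enc)
    print(data)
    # Round trip: the parser reads the declaration and recovers the original text.
    assert ET.fromstring(data).text == "café"
```

To write a file instead, ET.ElementTree(root).write(path, encoding=enc, xml_declaration=True) follows the same principle.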
