The other day a colleague and I discussed the relationship between the encoding attribute in XML and the file format, and we finally figured it out.
Previously, I believed that the encoding declared in XML must match the file format. That is, if the XML declaration is <?xml encoding="UTF-8" ..?>, then the file must be a UTF-8 file, meaning the first two bytes of the file must be the UTF-8 header FF FE. (Later I figured out that FF FE is not the UTF-8 BOM at all: it is the UTF-16 little-endian BOM, and the UTF-8 BOM is EF BB BF. In other words, my misunderstanding had lasted for a long time.)
Next, let's take a rough look at the stages of the discussion.
At the beginning of the discussion, I told him with certainty that the value of encoding must match the file format, that is, the BOM (short for byte order mark). Otherwise, parsing the XML could fail: for example, if the document contains Unicode characters and the encoding attribute does not match the BOM, an error will occur. (That is what I meant at the time.) He replied that this did not seem to be the case: he had created an XML file with Delphi, with no BOM, with Chinese content, and with encoding set to UTF-8, and IE opened it normally.
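His observation is easy to reproduce. The following is only a minimal sketch in Python, not the Delphi and IE setup from our discussion; the element name msg and the sample text are made up for illustration. It builds a UTF-8 document with no BOM, Chinese content and an encoding declaration, then parses it with the standard library.

```python
import xml.etree.ElementTree as ET

# A UTF-8 XML document with NO BOM; the element name and text are illustrative.
doc = '<?xml version="1.0" encoding="UTF-8"?><msg>中文</msg>'.encode("utf-8")

# The file starts with "<?xml", not with the UTF-8 BOM (EF BB BF).
has_bom = doc.startswith(b"\xef\xbb\xbf")

# The parser relies on the encoding declaration and decodes the text correctly.
root = ET.fromstring(doc)
text = root.text
print(has_bom, text)
```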
While he was checking that the XML file he had created really had no BOM, we ran into something interesting: when UE opens such a file containing Unicode characters, it automatically adds FF FE to the front of the file so that the content displays normally. As a result, even a file without a BOM appears to have one when you view it in hexadecimal mode in UE. This behavior can be turned off in the UE options; if you are curious, you can look for the setting yourself.
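Since an editor may display bytes that are not actually on disk, it can be safer to read the raw bytes yourself. Here is a small sketch, assuming Python; the names leading_bom and describe_file are hypothetical, the BOM constants come from the standard codecs module, and UTF-32 BOMs are left out for brevity.

```python
import codecs

# BOMs from the standard library (UTF-32 is omitted in this sketch).
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
]

def leading_bom(data: bytes):
    """Return the name of the BOM that starts `data`, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

def describe_file(path):
    """Return the first four bytes of a file as hex text, plus the BOM name."""
    with open(path, "rb") as f:  # binary mode: no decoding, no added bytes
        head = f.read(4)
    return " ".join(f"{b:02X}" for b in head), leading_bom(head)
```

Calling describe_file on the XML file in question shows what is really stored, regardless of what an editor displays.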
By then my head was spinning a little. How could this be? While I was still thinking it over, he suddenly sent me a message with the following content:
W3C defines three rules that let an XML parser correctly determine the encoding of an XML file:
1. If the file has a BOM (byte order mark; files saved in a Unicode format generally contain one, while ANSI files do not), the BOM determines the file encoding.
2. If there is no BOM, look at the encoding attribute in the XML declaration.
3. If neither is present, assume that the XML file is encoded in UTF-8.
With these three rules, it is much clearer.
That is, the XML parser first decides how to read the file based on its BOM. If no BOM is found, it uses the encoding specified by the encoding attribute in the XML declaration. If the encoding attribute is not specified either, it parses the document as UTF-8 by default. It also follows that if both a BOM and an encoding attribute are present, the BOM prevails.
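The three rules can be sketched in a few lines of Python. This only illustrates the rules as stated here and is not a conforming XML parser: the function name detect_xml_encoding is hypothetical, UTF-32 BOMs are ignored, and the check for rule 2 assumes the declaration is stored in an ASCII-compatible encoding.

```python
import codecs
import re

BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]

# Matches encoding="..." or encoding='...' inside a leading XML declaration.
# Works only when the declaration itself is ASCII-compatible (an assumption).
DECL_RE = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

def detect_xml_encoding(data: bytes) -> str:
    # Rule 1: a BOM, if present, decides the encoding.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Rule 2: no BOM, so use the encoding attribute of the XML declaration.
    match = DECL_RE.match(data)
    if match:
        return match.group(1).decode("ascii")
    # Rule 3: neither is present, so assume UTF-8.
    return "UTF-8"
```

Because the BOM check runs first, a file that starts with FF FE is reported as UTF-16LE even if its declaration says UTF-8, which matches the conclusion that the BOM prevails.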
Ah! It suddenly struck me how good it is to have a standards document, even though the rules seem so natural once you read them.
At this point, we had finally understood the relationship between the encoding attribute in XML and the file format. Although this write-up is only a few hundred words long, the discussion itself took us about two hours.