Full Explanation of Whitespace in XML

Source: Internet
Author: User

This article is intended for readers who already understand the structure of XML documents; it is not suitable for beginners. Please read it in order from top to bottom.


Whitespace is usually insignificant in HTML documents. XML, however, preserves whitespace nodes by default (whitespace nodes are explained below).


According to the XML specification, whitespace is any sequence made up of one or more of the following four characters:
-----------------------
Space, character value #x20
Carriage return, character value #xD
Line feed (newline), character value #xA
Tab, character value #x9
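As a quick check of this definition, here is a minimal Python sketch; the function name `is_xml_whitespace` is illustrative, not part of any library. It tests a string against the XML 1.0 production `S ::= (#x20 | #x9 | #xD | #xA)+`, which is narrower than Python's own `str.isspace()`.

```python
# The four XML whitespace characters, from the XML 1.0 production
#   S ::= (#x20 | #x9 | #xD | #xA)+
XML_WHITESPACE = frozenset("\x20\x09\x0d\x0a")


def is_xml_whitespace(s: str) -> bool:
    """Return True if s is a non-empty run of XML whitespace characters."""
    return len(s) > 0 and all(ch in XML_WHITESPACE for ch in s)


print(is_xml_whitespace(" \t\r\n"))  # True
print(is_xml_whitespace("\u00a0"))   # False: no-break space is not XML whitespace
print("\u00a0".isspace())            # True: Python's notion of whitespace is broader
```

The contrast with `str.isspace()` matters when post-processing XML text in Python: characters such as the no-break space count as whitespace to Python but as ordinary text to XML.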

Whitespace in an XML document also forms nodes, called whitespace nodes. A whitespace node is a kind of text node.
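To see whitespace nodes directly, the Python sketch below uses the standard library's `xml.dom.minidom`, which keeps whitespace, to list the children of an element. The line breaks and indentation between the tags appear as text nodes alongside the element nodes; `describe_children` is a hypothetical helper written for this illustration.

```python
from xml.dom import minidom


def describe_children(xml_text):
    """Return (kind, value) pairs for the children of the document element."""
    root = minidom.parseString(xml_text).documentElement
    result = []
    for child in root.childNodes:
        if child.nodeType == child.TEXT_NODE:
            result.append(("text", child.data))  # whitespace ends up here
        elif child.nodeType == child.ELEMENT_NODE:
            result.append(("element", child.tagName))
    return result


for kind, value in describe_children("<list>\n  <item>A</item>\n  <item>B</item>\n</list>"):
    print(kind, repr(value))
```

The two `<item>` elements are interleaved with three text nodes containing nothing but a line feed and indentation; to the parser they are text like any other.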

For XML and XSLT, whitespace nodes raise two questions:
-----------------------
1. Which whitespace in the XML input document is significant, that is, which whitespace nodes the XSLT processor gets to see. The deciding factors are the xml:space attribute in the input document and the stylesheet's xsl:strip-space and xsl:preserve-space declarations.
2. Which whitespace in the XSL stylesheet itself is significant and should be copied into the result tree. Here the deciding factors are the xsl:text element and, again, the xml:space attribute.


Significant and insignificant whitespace nodes
-----------------------
If an element's content may consist only of child elements (element content), the whitespace nodes inside it are insignificant.
If an element's content is of type #PCDATA, the whitespace nodes inside it should be considered significant.
If an element has mixed content (text and child elements together), significance cannot be decided in general; it depends on the semantics of the element and its content.
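When a DTD is available, these three cases can be read off an element's content model. The Python function below is a simplified, hypothetical classifier written for this article, not a parser API: it takes the content model from an `<!ELEMENT ...>` declaration and reports how whitespace inside that element should be treated. (A validating XML parser is required by the XML specification to tell the application which whitespace appears in element content, which is the first case.)

```python
import re


def whitespace_significance(content_model: str) -> str:
    """Classify whitespace inside an element from its DTD content model.

    content_model is the part of <!ELEMENT name ...> after the element name,
    for example "(title, para+)" or "(#PCDATA)".
    """
    model = re.sub(r"\s+", "", content_model)  # ignore spacing in the declaration
    if model in ("(#PCDATA)", "(#PCDATA)*"):
        return "significant"                   # text-only content
    if model.startswith("(#PCDATA|") or model == "ANY":
        return "depends on semantics"          # mixed content: text and elements
    if model == "EMPTY":
        return "no content"                    # nothing, not even whitespace
    return "insignificant"                     # element content: child elements only


print(whitespace_significance("(title, para+)"))             # insignificant
print(whitespace_significance("(#PCDATA)"))                  # significant
print(whitespace_significance("(#PCDATA | em | strong)*"))  # depends on semantics
```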


Before the XSLT processor sees an XML input document, the document has already been parsed by an XML parser
-----------------------
(1) The xml:space attribute, whose value is either default or preserve and which is inherited by descendant elements, tells the applications that later process the document how to treat whitespace nodes. The XSLT processor, for example, is affected by the xml:space attribute.
(2) Every line ending in the XML document, whether a carriage return plus line feed (#xD #xA) or a lone carriage return (#xD), is replaced with a single line feed (#xA). Operating systems end lines of text differently: Windows uses a carriage return followed by a line feed, while Unix uses a line feed alone. By converting every line ending to a single line feed as soon as it reads the document, the XML parser removes these platform differences and simplifies the work of downstream XML applications. This process is called normalization.
(3) Before an attribute value is passed to the XML application, the XML parser must also normalize it, as follows:
a. Each line ending is normalized to a single line feed (#xA).
b. Each whitespace character (#x20, #xD, #xA, #x9) is replaced with a space character (#x20).
c. A character reference is replaced with the character it refers to; for example, &#xA; becomes an actual line feed (#xA), which is not converted to a space.
d. An entity reference is replaced with its replacement text.
e. Any other character is placed in the normalized attribute value unchanged.
f. Finally, if the attribute type is not CDATA, the XML parser also removes leading and trailing space characters from the value and replaces each sequence of spaces inside the value with a single space character.
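The line-ending and attribute normalization steps can be sketched in Python as follows. This is a simplified illustration, not a conforming parser: it assumes the input is the raw text between the attribute's quotes, handles only numeric character references such as `&#xA;` or `&#10;`, and does not expand entity references (step d). The function names are hypothetical.

```python
import re

XML_WS = "\x20\x09\x0d\x0a"
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def normalize_line_ends(text: str) -> str:
    """Translate CR LF and any lone CR into a single LF (#xA)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def normalize_attribute(raw: str, is_cdata: bool = True) -> str:
    """Sketch of attribute-value normalization (character references only)."""
    text = normalize_line_ends(raw)  # step a
    out = []
    pos = 0
    for m in CHAR_REF.finditer(text):
        # Literal whitespace becomes a space (step b); other chars are copied (step e).
        out.extend(" " if ch in XML_WS else ch for ch in text[pos:m.start()])
        ref = m.group(1)
        code = int(ref[1:], 16) if ref.startswith("x") else int(ref)
        out.append(chr(code))  # step c: the referenced character is kept as-is
        pos = m.end()
    out.extend(" " if ch in XML_WS else ch for ch in text[pos:])
    value = "".join(out)
    if not is_cdata:
        # Step f: trim spaces at both ends and collapse inner runs of spaces.
        value = " ".join(part for part in value.split(" ") if part)
    return value


print(repr(normalize_attribute("a\r\nb\tc")))                  # 'a b c'
print(repr(normalize_attribute("line1&#xA;line2")))            # 'line1\nline2'
print(repr(normalize_attribute("  x   y  ", is_cdata=False)))  # 'x y'
```

The second call shows why character references matter: a literal line break in an attribute turns into a space, while `&#xA;` survives as a real line feed.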


After the XSLT processor has built the trees for the XML input document and the XSL stylesheet, it merges adjacent text nodes within each element into a single text node and then strips certain whitespace-only text nodes. A text node is preserved, however, if it meets any of the following conditions:
-----------------------
(1) The parent element of the text node is in the set of whitespace-preserving element names.
(2) The text node contains at least one non-whitespace character.
(3) An ancestor element of the text node has an xml:space attribute with the value preserve, and no closer ancestor element has an xml:space attribute with the value default.
Any text node that meets none of these conditions is stripped.
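The three rules can be sketched in Python with `xml.dom.minidom`. This is an illustrative approximation of the stripping step, not an XSLT processor's actual code: the set of whitespace-preserving element names, which an XSLT processor derives from `xsl:strip-space` and `xsl:preserve-space`, is simply passed in as `preserve_names`, and all function names are hypothetical.

```python
from xml.dom import minidom

XML_WS = " \t\r\n"


def is_whitespace_only(text):
    return all(ch in XML_WS for ch in text)


def preserved_by_xml_space(element):
    """Rule (3): the nearest xml:space on this element or its ancestors decides."""
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        value = node.getAttribute("xml:space")
        if value == "preserve":
            return True
        if value == "default":
            return False
        node = node.parentNode
    return False


def strip_whitespace(element, preserve_names):
    """Remove whitespace-only text nodes that none of the three rules protects."""
    for child in list(element.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            strip_whitespace(child, preserve_names)
        elif child.nodeType == child.TEXT_NODE:
            keep = (
                element.tagName in preserve_names        # rule (1)
                or not is_whitespace_only(child.data)    # rule (2)
                or preserved_by_xml_space(element)       # rule (3)
            )
            if not keep:
                element.removeChild(child)


doc = minidom.parseString(
    "<doc>\n"
    "  <p> </p>\n"
    "  <pre> </pre>\n"
    '  <code xml:space="preserve"> <b> </b></code>\n'
    "</doc>"
)
doc.normalize()  # merge adjacent text nodes first
strip_whitespace(doc.documentElement, preserve_names={"pre"})
print(doc.documentElement.toxml())
```

The indentation between elements and the space inside `<p>` are stripped, the space inside `<pre>` survives by rule (1), and both spaces under `<code>` survive by rule (3), so the result is `<doc><p/><pre> </pre><code xml:space="preserve"> <b> </b></code></doc>`.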


For an XSL stylesheet, the set of whitespace-preserving element names contains only xsl:text. Whitespace nodes in the stylesheet are therefore stripped, except those that appear inside an xsl:text element.
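For a stylesheet the rule reduces to a single exception. The Python sketch below, again using `xml.dom.minidom`, removes whitespace-only text nodes from a small stylesheet unless their parent is `xsl:text`; for brevity it assumes the stylesheet contains no `xml:space` attributes, and the function name is hypothetical.

```python
from xml.dom import minidom

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"


def strip_stylesheet_whitespace(element):
    """Drop whitespace-only text nodes unless their parent is xsl:text.

    Simplification: xml:space attributes in the stylesheet are ignored.
    """
    for child in list(element.childNodes):
        if child.nodeType == child.ELEMENT_NODE:
            strip_stylesheet_whitespace(child)
        elif child.nodeType == child.TEXT_NODE and not child.data.strip(" \t\r\n"):
            in_xsl_text = element.namespaceURI == XSLT_NS and element.localName == "text"
            if not in_xsl_text:
                element.removeChild(child)


STYLESHEET = """<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:value-of select="name/first"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="name/last"/>
  </xsl:template>
</xsl:stylesheet>"""

doc = minidom.parseString(STYLESHEET)
strip_stylesheet_whitespace(doc.documentElement)
template = doc.getElementsByTagNameNS(XSLT_NS, "template")[0]
print([child.nodeName for child in template.childNodes])
```

Only the single space inside `xsl:text` remains, so the first and last names would be separated by exactly one space in the output, while the stylesheet's own indentation has no effect on the result.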


