Whitespace in XML

Source: Internet
Author: User
Tags xslt xslt processor

Tip: I extracted the core section on whitespace handling from "XSLT from Getting Started to Proficient" in the hope that you will join the discussion. Let's talk about how you understand whitespace.

This material is intended for readers who already have some knowledge of XML file structure and is not suitable for beginners. Read it in order from top to bottom.

Whitespace is largely insignificant in HTML files. In XML files, however, the default is to preserve whitespace nodes (whitespace nodes are described below).

According to the XML specification, whitespace is any sequence of one or more of the following four characters:
-----------------------
Space, with the character value #x20
Carriage return, with the character value #xD
Newline (line feed), with the character value #xA
Tab, with the character value #x9
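
The definition above can be checked mechanically. The following is a minimal Python sketch; the helper name is_xml_whitespace is hypothetical and is not part of any library.

```python
# The four characters that the XML specification treats as whitespace.
XML_WHITESPACE = "\x20\x0D\x0A\x09"  # space, carriage return, newline, tab


def is_xml_whitespace(text: str) -> bool:
    """Return True if text is a non-empty run of XML whitespace characters."""
    return len(text) > 0 and all(ch in XML_WHITESPACE for ch in text)


print(is_xml_whitespace(" \r\n\t"))  # True
print(is_xml_whitespace("\u00a0"))   # False: a no-break space is not XML whitespace
print(is_xml_whitespace(" a "))      # False: contains a non-whitespace character
```

Note that other Unicode space characters, such as the no-break space, do not count as whitespace in this sense.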

Whitespace in an XML file also forms nodes, called whitespace nodes. A whitespace node is a kind of text node.
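
As a small illustration, Python's standard xml.dom.minidom module keeps such nodes in the tree it builds. The sample document and the variable names below are made up for this sketch.

```python
from xml.dom import minidom

# Indentation and line breaks between tags become whitespace-only text nodes.
doc = minidom.parseString(
    "<list>\n  <item>one</item>\n  <item>two</item>\n</list>"
)
children = doc.documentElement.childNodes

for node in children:
    if node.nodeType == node.TEXT_NODE:
        print("text node:", repr(node.data))
    else:
        print("element:  ", node.tagName)

# Collect the text nodes that contain nothing but XML whitespace.
whitespace_nodes = [
    n for n in children
    if n.nodeType == n.TEXT_NODE and n.data.strip(" \r\n\t") == ""
]
print(len(whitespace_nodes), "whitespace nodes")
```

The two item elements are surrounded by text nodes that hold only line breaks and indentation, which is exactly what the term whitespace node refers to.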

For XML and XSLT, whitespace nodes involve two topics:
-----------------------
1. In the XML input file, deciding which whitespace is significant, because the XSLT processor must see those whitespace nodes. The keys are the xml:space attribute and the xsl:strip-space and xsl:preserve-space elements, which name the source elements whose whitespace nodes are stripped or preserved.
2. In the XSL stylesheet, deciding which whitespace is significant, because the XSLT processor copies it to the result tree. The key is the xsl:text element.

Significant and insignificant whitespace nodes
-----------------------
If an element's content can consist only of child elements, the whitespace nodes in that element are insignificant.
If an element's content is of the #PCDATA type, the whitespace nodes in that element should be regarded as significant.
When an element has mixed content (text together with child elements), no general rule applies; it depends on the markup language that defines the element and its content.

Before the XSLT processor sees the XML input file, the XML parser processes it
-------------------
(1) The xml:space attribute can change how an XML application handles whitespace nodes. For example, the XSLT processor is affected by the xml:space attribute.
(2) Every line ending in the XML file's content is replaced with a single newline character (#xA). This is necessary because operating systems mark the end of a line of text differently. For example, on Windows a carriage return followed by a newline forms the line ending, whereas UNIX systems use only the newline character. After reading the XML file, the XML parser first replaces every line ending with a single newline character, which both removes the differences between systems and simplifies later processing in the XML application. This process is called "normalization".
(3) Before an attribute value is handed to the XML application, the XML parser should normalize it according to the following rules:
A. Line endings must already have been normalized to a single newline character (#xA).
B. Each literal whitespace character (#x20, #xD, #xA, #x9) is replaced with a space character (#x20).
C. A character reference in the attribute value is replaced with the character it refers to. For example, &#xA; is replaced with a newline character (#xA).
D. An entity reference in the attribute value is replaced with its replacement text.
E. Any other character is placed directly into the normalized attribute value.
F. Finally, if the attribute type is not CDATA, the XML parser should also delete the leading and trailing space characters from the attribute value and replace each sequence of space characters inside it with a single space character.
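
The effect of these rules can be observed with Python's standard xml.etree.ElementTree module, which is built on the expat parser. This is only a sketch: the sample documents are made up, and collapse_non_cdata is a hypothetical helper that imitates rule F on an already normalized value rather than a parser API.

```python
import xml.etree.ElementTree as ET

# Line endings: CR LF and a lone CR in content reach the application as LF.
content_text = ET.fromstring("<p>first\r\nsecond\rthird</p>").text
print(repr(content_text))

# Rule B: literal tab and newline characters in an attribute value become spaces.
literal_ws = ET.fromstring('<p note="a\tb\nc"/>').get("note")
print(repr(literal_ws))

# Rule C: a character reference is replaced by the character it refers to,
# so the application sees a real newline here, not a space.
char_ref = ET.fromstring('<p note="a&#xA;b"/>').get("note")
print(repr(char_ref))


def collapse_non_cdata(value: str) -> str:
    """Imitate rule F: trim the value and collapse runs of #x20 to one space."""
    return " ".join(part for part in value.split(" ") if part)


print(repr(collapse_non_cdata("  id1   id2  ")))
```

The difference between the second and third cases is the practical point: whitespace typed directly into an attribute is flattened to spaces, while whitespace written as a character reference survives normalization.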

After the XSLT processor has built the trees for the XML input file and the XSL stylesheet, it merges adjacent text nodes within an element into a single text node and then strips some of the text nodes. A text node is preserved, however, if it meets any one of the following conditions:
-----------------------
(1) The parent element of the text node is a member of the set of whitespace-preserving element names.
(2) The text node contains at least one non-whitespace character.
(3) An ancestor element of the text node has an xml:space attribute with the value preserve, and no closer ancestor element has an xml:space attribute with the value default.
Otherwise, the text node is stripped.
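
The three conditions above can be sketched in Python on a minidom tree. This is an illustration under stated assumptions, not how any particular XSLT processor is implemented: is_preserved and strip_whitespace_nodes are hypothetical names, and the set of whitespace-preserving element names is passed in by hand (an empty set roughly corresponds to stripping every element).

```python
from xml.dom import minidom

XML_WHITESPACE = " \r\n\t"


def is_preserved(text_node, preserving_names):
    """Return True if the text node meets one of the three conditions."""
    # Condition (2): at least one non-whitespace character.
    if text_node.data.strip(XML_WHITESPACE) != "":
        return True
    parent = text_node.parentNode
    # Condition (1): the parent is a whitespace-preserving element name.
    if parent.nodeType == parent.ELEMENT_NODE and parent.tagName in preserving_names:
        return True
    # Condition (3): the nearest ancestor that carries xml:space decides.
    node = parent
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        value = node.getAttribute("xml:space")
        if value == "preserve":
            return True
        if value == "default":
            return False
        node = node.parentNode
    return False


def strip_whitespace_nodes(element, preserving_names=frozenset()):
    """Remove every text node under element that is not preserved."""
    for child in list(element.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if not is_preserved(child, preserving_names):
                element.removeChild(child)
        elif child.nodeType == child.ELEMENT_NODE:
            strip_whitespace_nodes(child, preserving_names)


doc = minidom.parseString(
    "<doc>\n"
    "  <a> </a>\n"
    '  <b xml:space="preserve"> <c xml:space="default"> </c></b>\n'
    "  <pre> </pre>\n"
    "</doc>"
)
strip_whitespace_nodes(doc.documentElement, {"pre"})
print(doc.documentElement.toxml())
```

In this sample, the space inside b is kept because of xml:space="preserve", the space inside c is stripped because the closer xml:space="default" overrides it, and the space inside pre is kept because pre was placed in the preserving set.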

For an XSL stylesheet, the set of whitespace-preserving element names contains only the xsl:text element. All whitespace nodes in the stylesheet file are therefore stripped, except those that appear inside an xsl:text element, which are preserved.
