XML special character processing and CDATA

Source: Internet
Author: User
Tags: xml, parser

When working with XML data, special characters need special handling so that they are not confused with markup characters.

All text in an XML document is normally parsed by the parser.

Only text inside CDATA sections is left unparsed: the parser passes it through as literal character data.

PCDATA

PCDATA stands for parsed character data.

The XML parser normally parses all the text in an XML document.

When an XML element is parsed, the text between its tags is also parsed:

<message> This text will also be parsed </message>

The parser does this because an XML element can contain other elements, as in this example, where the <name> element contains two other elements (first and last):

<name><first>Bill</first><last>Gates</last></name>

The parser breaks it down into sub-elements like this:

<name>
  <first>Bill</first>
  <last>Gates</last>
</name>
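
As a rough illustration of this breakdown, the following Python sketch parses the same <name> element with the standard library's xml.etree.ElementTree module and lists its child elements. The variable names are illustrative and not part of the original article.

```python
import xml.etree.ElementTree as ET

# The same one-line document used in the example above.
doc = "<name><first>Bill</first><last>Gates</last></name>"

root = ET.fromstring(doc)

# Each child element becomes its own node with a tag and a text value.
children = [(child.tag, child.text) for child in root]

print(root.tag)   # name
print(children)   # [('first', 'Bill'), ('last', 'Gates')]
```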

Escape characters

Illegal XML characters must be replaced with entity references.

If you place a character like "<" inside the text of an XML element, the parser reports an error because it interprets the character as the start of a new element. So you cannot write this:

<message>if salary < 1000 then</message>

To avoid this error, replace the character "<" with an entity reference, like this:

<message>if salary &lt; 1000 then</message>
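
The difference can be checked with a short Python sketch, assuming the standard library's xml.etree.ElementTree parser. The helper name try_parse and the message text are illustrative. An unescaped "<" makes the document not well-formed, while the &lt; reference is turned back into "<" in the parsed text.

```python
import xml.etree.ElementTree as ET

def try_parse(doc):
    """Return the root element's text, or None if the document is not well-formed."""
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None

# A raw "<" in element text is rejected by the parser.
bad = try_parse("<message>if salary < 1000 then</message>")

# The entity reference is accepted and decoded back to "<".
good = try_parse("<message>if salary &lt; 1000 then</message>")

print(bad)   # None
print(good)  # if salary < 1000 then
```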

XML has five predefined entity references:

&lt;     <    less than
&gt;     >    greater than
&amp;    &    ampersand
&apos;   '    apostrophe (single quotation mark)
&quot;   "    quotation mark (double quote)

Note: strictly speaking, only the characters "<" and "&" are illegal in XML element content. Apostrophes, quotation marks, and greater-than signs are legal, but it is good practice to replace them with entity references as well.
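
In practice the replacement is usually done by a library rather than by hand. The sketch below assumes Python's xml.sax.saxutils module, whose escape() function always replaces "&", "<" and ">" and accepts an optional dictionary for extra characters such as quotes. The sample string is made up for illustration.

```python
from xml.sax.saxutils import escape, unescape

raw = 'x < y && s == "it\'s"'

# Minimal escaping: only &, < and > are replaced.
minimal = escape(raw)

# Full escaping: also replace both kinds of quotes via the extra mapping.
full = escape(raw, {'"': "&quot;", "'": "&apos;"})

# unescape() reverses the process when given the inverse mapping.
restored = unescape(full, {"&quot;": '"', "&apos;": "'"})

print(minimal)  # x &lt; y &amp;&amp; s == "it's"
print(full)     # x &lt; y &amp;&amp; s == &quot;it&apos;s&quot;
```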

CDATA

The term CDATA (character data) refers to text that should not be parsed by the XML parser.

Inside XML elements, the characters "<" and "&" are illegal.

"<" generates an error because the parser interprets it as the start of a new element.

"&" also generates an error because the parser interprets it as the start of an entity reference.

Some text, such as JavaScript code, contains a large number of "<" or "&" characters. To avoid errors, you can define the script code as CDATA.

Everything inside a CDATA section is ignored by the parser as markup and passed through as plain character data.

A CDATA section starts with "<![CDATA[" and ends with "]]>":

<script>
<![CDATA[
function matchwo(a, b) {
    if (a < b && a < 0) {
        return 1;
    } else {
        return 0;
    }
}
]]>
</script>

In the example above, the parser treats everything inside the CDATA section as literal text, so the "<" and "&" characters cause no errors.
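
A small Python sketch, again assuming xml.etree.ElementTree, shows what an application sees after parsing a CDATA section: the raw text comes back unchanged. ElementTree does not keep the CDATA wrapper, so serializing the element again falls back to entity references. The shortened script body is illustrative.

```python
import xml.etree.ElementTree as ET

doc = """<script><![CDATA[
if (a < b && a < 0) { return 1; }
]]></script>"""

root = ET.fromstring(doc)

# The CDATA content arrives as ordinary text, with "<" and "&" intact.
code = root.text.strip()
print(code)  # if (a < b && a < 0) { return 1; }

# On output, ElementTree escapes the text instead of re-creating CDATA.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```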

Notes on CDATA sections:

A CDATA section cannot contain the string "]]>". Nested CDATA sections are also not allowed.

The "]]>" that marks the end of a CDATA section cannot contain spaces or line breaks.

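When the text itself contains "]]>", a common workaround (not described in the original article) is to split the sequence across two adjacent CDATA sections, so that neither section contains it in full. The sketch below assumes Python's xml.etree.ElementTree for the round-trip check. The helper name to_cdata and the payload are hypothetical.

```python
import xml.etree.ElementTree as ET

def to_cdata(text):
    """Wrap text in CDATA, splitting every "]]>" across two sections."""
    # "]]" closes the first section's content; ">" opens the next one.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

payload = "if (a[b[0]]> c) { ok(); }"
doc = "<script>" + to_cdata(payload) + "</script>"

root = ET.fromstring(doc)

# The parser joins the adjacent sections back into the original text.
print(root.text == payload)  # True
```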