FAQs about getting started with XML (3)

Last Update:2018-12-08 Source: Internet

Author: User

Author: angelgavin Source: csdn

How can I load documents with foreign and special characters?

A document can contain foreign characters, for example:

Foreign characters (úóí ?)

The parser must be told how such characters are encoded. Foreign characters can either be UTF-8 encoded or be stored in a different encoding that is named in the XML declaration at the start of the document, as shown below:

Foreign characters (bytes)

The XML now loads correctly.
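
As a rough illustration of the encoding rule, here is a minimal Python sketch that uses the standard library parser rather than MSXML. The element name "test" and the choice of ISO-8859-1 are assumptions made for the example, not details taken from the original sample.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample document; the element name "test" is an assumption.
body = "<test>Foreign characters (úóí)</test>"

# Store the document as ISO-8859-1 bytes and name that encoding in the XML declaration.
declared = ('<?xml version="1.0" encoding="iso-8859-1"?>' + body).encode("iso-8859-1")
doc = minidom.parseString(declared)
loaded_text = doc.documentElement.firstChild.nodeValue

# The same bytes without a declaration are read as UTF-8 and rejected.
try:
    minidom.parseString(body.encode("iso-8859-1"))
    undeclared_loaded = True
except ExpatError:
    undeclared_loaded = False

print(loaded_text)
print(undeclared_loaded)
```

The second parse fails because a document without an encoding declaration is assumed to be UTF-8, and the ISO-8859-1 bytes for the accented letters are not valid UTF-8.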

Other characters are reserved in XML and must be handled differently. The following XML:

This & that
produces the following error:
Whitespace is not allowed at this location.
Line 0000001: This & that
Pos 0000012: ----------^

Here the ampersand (&) is part of the syntactic structure of XML. If it is simply placed inside an XML data source, it is not interpreted as a literal ampersand. You must replace it with a special character sequence called an "entity":

This &amp; that
The following characters need corresponding entities:

<	&lt;
&	&amp;
>	&gt;
"	&quot;
'	&apos;
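
The entity substitution can be sketched in Python with the standard library's xml.sax.saxutils.escape, which replaces &, < and > by default and takes an optional mapping for the two quote characters. The element name "foo" is a hypothetical wrapper added only so that the text can be parsed.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape

raw = "This & that"

# escape() replaces &, < and > by default.
escaped = escape(raw)

# The quote characters are only replaced when an extra mapping is supplied.
quote_entities = {'"': "&quot;", "'": "&apos;"}
escaped_all = escape("<a> & \"b\" 'c'", quote_entities)

# The raw ampersand is rejected by the parser, the escaped form is accepted.
try:
    minidom.parseString("<foo>" + raw + "</foo>")
    raw_parsed = True
except ExpatError:
    raw_parsed = False

round_trip = minidom.parseString("<foo>" + escaped + "</foo>").documentElement.firstChild.nodeValue

print(escaped)
print(escaped_all)
print(raw_parsed, round_trip)
```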

Quotation mark characters are used as delimiters for attribute values, so the delimiter in use cannot appear unescaped within the attribute value. For example, the attribute value John's stuff returns an error when it is delimited by single quotes, because the single quote is then used both as the attribute delimiter and inside the attribute value itself. To solve this problem, you can change the attribute delimiters to double quotation marks, or you can escape the single quote as the entity &apos;.

Both methods return the attribute value John's stuff through the getAttribute method in the XML object model. Similarly, for double quotes, you can use the entity &quot;.
You can also handle special characters in element content by placing the text inside a CDATA section. The following content is correct:

<xml><![CDATA[This & that is just "text" content.]]></xml>

In this example, the XML object model exposes the CDATA node as a child node of the xml node, and it returns the string

This & that is just "text" content.
as its nodeValue.
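
The quoting rules and the CDATA alternative can be checked with a short Python sketch. It uses the standard library's minidom instead of MSXML, and the element names "foo" and "xml" and the attribute name "a" are assumptions chosen for illustration.

```python
from xml.dom import minidom, Node
from xml.parsers.expat import ExpatError


def root_of(xml_text):
    """Parse xml_text and return its root element."""
    return minidom.parseString(xml_text).documentElement


# Double quotes as delimiters allow a literal single quote in the value.
double_quoted = root_of('<foo a="John\'s stuff"/>').getAttribute("a")

# Single quotes as delimiters need the single quote escaped as &apos;.
escaped_quote = root_of("<foo a='John&apos;s stuff'/>").getAttribute("a")

# An unescaped single quote inside single-quote delimiters is rejected.
try:
    root_of("<foo a='John's stuff'/>")
    unescaped_parsed = True
except ExpatError:
    unescaped_parsed = False

# Inside a CDATA section, & and " need no escaping at all.
cdata = root_of('<xml><![CDATA[This & that is just "text" content.]]></xml>').firstChild

print(double_quoted, escaped_quote, unescaped_parsed)
print(cdata.nodeType == Node.CDATA_SECTION_NODE, cdata.nodeValue)
```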

In Visual C++ 6.0, how do I use the MSXML COM component?

In Visual C++ 6.0, the easiest way to use the MSXML COM component is the #import directive:

#import "MSXML.dll" named_guids no_namespace
This defines all IXML* interfaces and interface IDs (IIDs). You can also obtain the MSXML library, the header files (in English), and uuid.lib, which contains the IIDs.

How do I use HTML entities in XML?

XML that contains an HTML entity such as &copy; produces the following error:

Reference to undefined entity 'copy'.
Line: 1, Position: 23, ErrorCode: 0xC00CE002
Copyright &copy; 2000 ,...
----------------------^

This is because XML has only five built-in entities. For more information about built-in entities, see "How can I load documents with foreign and special characters?" above.

To use HTML entities, you must define them in a DTD. For more information about DTDs, see the W3C XML Recommendation. To use such a DTD, include it directly in the DOCTYPE declaration.

To load the document, you need to turn off the validateOnParse property of the IXMLDOMDocument interface. To try it, paste the document into the validator test page, turn off DTD validation, and then click Validate. The document loads, and the copyright character appears in the DOM tree at the end of the validator page.

If you already validate against a DTD, you must include the HTML entities in your existing DTD as a parameter entity, as shown below:

%HTMLENT;

This defines all HTML entities so that they can be used in XML documents.
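
The effect of declaring an entity can be sketched in Python with the standard library's ElementTree. This example declares only the single entity copy in an internal DTD subset rather than loading a full HTML entity set, and the element name "doc" is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Without a declaration, &copy; is an undefined entity and parsing fails.
undefined = "<doc>Copyright &copy; 2000</doc>"
try:
    ET.fromstring(undefined)
    undefined_parsed = True
except ET.ParseError:
    undefined_parsed = False

# Declaring the entity in the internal DTD subset makes the reference valid.
# &#169; is the character reference for the copyright sign.
defined = (
    '<!DOCTYPE doc [<!ENTITY copy "&#169;">]>'
    "<doc>Copyright &copy; 2000</doc>"
)
text = ET.fromstring(defined).text

print(undefined_parsed)
print(text)
```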

How do I handle white spaces in element content?

The XML DOM has three ways to access the text of an element:

Property	Behavior
nodeValue	Returns the original text (including white space characters) on TEXT, CDATA, COMMENT, and PI nodes, as specified in the original XML source. Returns null for ELEMENT nodes and for the DOCUMENT itself.
data	Same as nodeValue.
text	Recursively concatenates the TEXT and CDATA nodes in the subtree and returns the combined result.

Note: White space characters include newlines, tabs, and spaces.

The nodeValue property always returns the content of the original document, regardless of how the document was loaded and of the current xml:space scope.

The text property concatenates all text in the subtree and expands entities. Its result depends on how the document was loaded, on the current state of the preserveWhiteSpace switch, and on the current xml:space scope, as shown below:
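
The difference between the three access paths can be sketched with Python's minidom, which has nodeValue and data but no text property. The helper text_of below is a hypothetical stand-in that only concatenates TEXT and CDATA nodes; it does not reproduce the MSXML white space handling, and the element names are assumptions.

```python
from xml.dom import minidom, Node


def text_of(node):
    """Concatenate every TEXT and CDATA node below node (hypothetical helper)."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


doc = minidom.parseString(
    "<name><first>Jane</first> <![CDATA[&]]> <last>Smith</last></name>"
)
root = doc.documentElement
first_text = root.firstChild.firstChild  # TEXT node inside <first>

element_value = root.nodeValue      # element nodes have no nodeValue
text_value = first_text.nodeValue   # text nodes return their content
text_data = first_text.data         # data is the same as nodeValue
combined = text_of(root)            # all text in the subtree, joined

print(element_value, text_value, text_data)
print(combined)
```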

preserveWhiteSpace = true when the document is loaded

preserveWhiteSpace = true	preserveWhiteSpace = true	preserveWhiteSpace = false	preserveWhiteSpace = false
xml:space = preserve	xml:space = default	xml:space = preserve	xml:space = default
Retained	Retained	Retained	Retained and truncated

preserveWhiteSpace = false when the document is loaded

preserveWhiteSpace = true	preserveWhiteSpace = true	preserveWhiteSpace = false	preserveWhiteSpace = false
xml:space = preserve	xml:space = default	xml:space = preserve	xml:space = default
Semi-retained	Semi-retained and truncated	Semi-retained	Semi-retained and truncated

Retained means that the text is exactly the same as in the original XML document. Truncated means that leading and trailing white space has been removed. Semi-retained means that significant white space is retained and insignificant white space is normalized. Significant white space is white space inside text content. Insignificant white space is white space between tags, as shown below:

\n
\t Jane\n
\tSmith\n

In this example, the newlines and tabs between tags are insignificant white space and can be ignored, while the white space inside the text content is significant: because it is part of the text, it cannot be ignored. In this example, the text property returns the following results:

State	Return value
Retained	"\n\t Jane\n\tSmith\n"
Retained and truncated	"Jane\n\tSmith"
Semi-retained	"Jane Smith"
Semi-retained and truncated	"Jane Smith"

Note that "semi-retained" normalizes insignificant white space: for example, newlines and tabs collapse into a single space. If you change the xml:space attribute and the preserveWhiteSpace switch, the text property returns different values.
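
The states in the table can be approximated in Python. Python's minidom has no preserveWhiteSpace switch, so this is only a sketch: the element names "name", "first" and "last" are assumptions, text_of is a hypothetical helper, and collapsing every run of white space is a rough stand-in for semi-retention, not the MSXML algorithm.

```python
from xml.dom import minidom, Node


def text_of(node):
    """Concatenate every TEXT and CDATA node below node (hypothetical helper)."""
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    return "".join(text_of(child) for child in node.childNodes)


source = "<name>\n\t<first> Jane</first>\n\t<last>Smith</last>\n</name>"
root = minidom.parseString(source).documentElement

# Retained: the text exactly as it appears in the source.
retained = text_of(root)

# Retained and truncated: leading and trailing white space removed.
retained_and_truncated = retained.strip()

# Rough approximation of semi-retained and truncated:
# every run of white space collapses to one space.
approx_semi_retained_truncated = " ".join(retained.split())

print(repr(retained))
print(repr(retained_and_truncated))
print(repr(approx_semi_retained_truncated))
```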

CDATA and xml:space="preserve" subtree boundaries
The content of a CDATA node or of a "retained" node (one within xml:space="preserve") is concatenated unchanged, because it does not take part in the normalization of insignificant white space. For example:

\n
\t Jane\n
\t<![CDATA[ Smith]]>\n

In this case, the white space inside the CDATA node is no longer "merged" with the insignificant white space and is not truncated. Therefore, in the "semi-retained and truncated" state, the following content is returned:

"Jane Smith"

Here, the insignificant white space between the tags is still included, independently of the content of the CDATA node. The same result is returned if the CDATA section is replaced with an element that has xml:space="preserve":

Smith
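
Entities are special

Entities are loaded and parsed as part of the DTD and appear under the DOCTYPE node. They do not necessarily have any xml:space scope. For example:

<!DOCTYPE foo [
<!ENTITY Jane "<employee>\n
\t<name>Jane</name>\n
\t<title>Software Design Engineer</title>\n
</employee>">
]>
<foo xml:space="preserve">&Jane;</foo>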

Assuming that preserveWhiteSpace = false (within the scope of the DOCTYPE tag), insignificant white space is lost while the entity is parsed. The entity does not have any white space nodes. The tree looks like this:

Doctype: foo
    Entity: Jane
        Element: employee
            Element: name
                Text: Jane
            Element: title
                Text: Software Design Engineer
Element: foo
    Attribute: xml:space="preserve"
    Entityref: Jane

Note that the DOM tree exposed under the entity node inside the DOCTYPE does not contain any white space nodes. This means that the child nodes of the ENTITYREF node have no white space nodes either, even though the entity reference is within the scope of xml:space="preserve".

Each entity instance referenced in a given document usually has the same tree.

If an entity must retain white space, it must specify its own xml:space attribute internally, or the document's preserveWhiteSpace switch must be set to true.

How do I handle white spaces in attributes?

You can access attribute values in several ways. The IXMLDOMAttribute interface has the nodeValue property, which is equivalent to the value property, as well as the nodeTypedValue and text properties, which are Microsoft extensions. These properties return the following text:

Property	Returned text
attrNode.nodeValue, attrNode.value, getAttribute("name")	The same content as in the original document (with entities expanded).
attrNode.nodeTypedValue	null
attrNode.text	The same as nodeValue, except that leading and trailing white space is removed.

The XML specification defines the following behavior for an XML application:

Attribute type	Text returned
CDATA	Semi-normalized
ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, enumeration	Fully normalized

In semi-normalization, newlines and tabs are converted to spaces, but multiple spaces are not collapsed into one space. In full normalization, leading and trailing spaces are additionally removed and runs of spaces are collapsed into a single space.
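
Attribute value normalization can be observed with Python's standard library parser. This is a sketch under assumptions: the element and attribute names are hypothetical, and the second parse declares the attribute as IDREFS only so that the two results can be compared. Its result is printed rather than stated here because it depends on how the underlying parser applies the declaration.

```python
import xml.etree.ElementTree as ET

# Undeclared attributes are treated as CDATA: the tab and the newline
# become spaces, but runs of spaces and the outer spaces are kept.
cdata_value = ET.fromstring('<foo a=" x\ty\nz  w "/>').get("a")

# Hypothetical comparison: the same kind of value on an attribute
# that the internal DTD subset declares as a tokenized type.
tokenized_doc = (
    "<!DOCTYPE foo [<!ATTLIST foo b IDREFS #IMPLIED>]>"
    '<foo b="  x   y  "/>'
)
tokenized_value = ET.fromstring(tokenized_doc).get("b")

print(repr(cdata_value))
print(repr(tokenized_value))
```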

