FAQ for Getting Started with XML (iii)


Author: Angelgavin Origin: CSDN

How do I load documents with foreign and special characters?

Documents can contain foreign characters, such as:


<test>Foreign characters (úóí¿)</test>


For this example to load, the foreign characters must be preceded by an encoding declaration. Foreign characters must either be UTF-8 encoded, or the document must declare the encoding it actually uses, for example ISO-8859-1:


<?xml version="1.0" encoding="ISO-8859-1"?>
<test>Foreign characters (úóí¿)</test>


Now the XML loads correctly.
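The other option, storing the characters as UTF-8, can be illustrated with a small conversion routine. This is a minimal C++ sketch that assumes the input bytes are ISO-8859-1 (Latin-1). The function name latin1_to_utf8 is hypothetical and is not part of MSXML. The sketch relies on the fact that every Latin-1 byte has the same value as its Unicode code point.

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Each Latin-1 byte equals its Unicode code point, so bytes below 0x80
// are copied unchanged and bytes 0x80..0xFF become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size() * 2);
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));   // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F))); // continuation byte
        }
    }
    return out;
}
```

For example, latin1_to_utf8("\xFA") yields the two bytes C3 BA, which are the UTF-8 encoding of ú.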

Other characters are reserved in XML and must be handled differently. The following XML:


<test>This & that</test>


produces the following error:

Whitespace is not allowed at this location.
Line 0000001: <test>this & that</test>
Pos  0000012: -----------^


Here the & is part of the XML syntax. When it stands alone inside XML data, it is not interpreted as a literal ampersand. You need to replace it with a special character sequence called an entity:

<test>This &amp; that</test>

The following characters require the corresponding entities:


Character   Entity
<           &lt;
&           &amp;
>           &gt;
"           &quot;
'           &apos;
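The substitutions in the table above can be applied mechanically. The following is a minimal C++ sketch. escape_xml is a hypothetical helper and not an MSXML API. Because it examines each input character exactly once, an & that it has just written out is never escaped a second time.

```cpp
#include <string>

// Hypothetical helper: replace the five reserved characters with their
// predefined entities. The result is safe in element content and in
// attribute values delimited by either quote character.
std::string escape_xml(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '<':  out += "&lt;";   break;
            case '&':  out += "&amp;";  break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}
```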


Quotation marks are used as delimiters for attribute values in markup, so they are usually not used inside attribute values. For example, the following returns an error:

<test name='John's Stuff'/>

Here the single quote is used both as the attribute delimiter and inside the attribute value itself. To correct this problem, you can change the attribute delimiter to double quotes:

<test name="John's Stuff"/>

Or you can escape the single quote as the entity &apos;:

<test name='John&apos;s Stuff'/>

Both methods return the attribute value John's Stuff through the getAttribute method of the XML object model. Similarly, for double quotes you can use the entity &quot;.
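The two fixes can be combined in one helper. The following is a sketch that assumes the caller wants the quoted value including its delimiters. quote_attribute is a hypothetical name. The helper prefers double quotes and switches to single quotes when only double quotes appear in the value. When both kinds of quote appear, it falls back to the &quot; entity.

```cpp
#include <string>

// Hypothetical helper: return an attribute value wrapped in delimiters.
// '&' and '<' are always escaped because they may not appear literally
// in attribute values; the chosen delimiter is escaped only if it occurs.
std::string quote_attribute(const std::string& value) {
    const bool has_dq = value.find('"') != std::string::npos;
    const bool has_sq = value.find('\'') != std::string::npos;
    // Use single quotes only when the value has double quotes but no single quotes.
    const char delim = (has_dq && !has_sq) ? '\'' : '"';
    std::string out(1, delim);
    for (char c : value) {
        if (c == '&') out += "&amp;";
        else if (c == '<') out += "&lt;";
        else if (c == delim) out += (delim == '"') ? "&quot;" : "&apos;";
        else out += c;
    }
    out += delim;
    return out;
}
```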
You can also handle special characters in element content by placing the text in a CDATA section. The following is correct:

<test><![CDATA[This & that is just "text" content.]]></test>

In this example, the XML object model exposes the CDATA section as a child node of the test element. That node returns the string

This & that is just "text" content.

as its nodeValue.
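A CDATA section ends at the first ]]>. This means text that itself contains that sequence cannot be wrapped in a single section. A common workaround is to end the section between the ]] and the > and then open a new section. The following C++ sketch shows this approach using wrap_cdata, which is a hypothetical helper:

```cpp
#include <string>

// Hypothetical helper: wrap text in CDATA. Because a CDATA section cannot
// contain "]]>", each occurrence is split across two sections: the first
// section ends after "]]" and the next one starts with ">".
std::string wrap_cdata(const std::string& text) {
    std::string out = "<![CDATA[";
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = text.find("]]>", start)) != std::string::npos) {
        out.append(text, start, pos + 2 - start); // copy up to and including "]]"
        out += "]]><![CDATA[";                    // close this section, open the next
        start = pos + 2;                          // resume at the ">"
    }
    out.append(text, start, std::string::npos);
    out += "]]>";
    return out;
}
```

In the DOM, a split value like this appears as two adjacent CDATA nodes whose combined content is the original text.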

How do I use the MSXML COM component in Visual C++ 6.0?

The easiest way to use the MSXML COM component in Visual C++ 6.0 is the #import directive:

#import "msxml.dll" named_guids no_namespace

This directive defines all the IXML* interfaces and interface IDs so that the application can use them. You can also get the MSXML type library and header files from the InetSDK. The InetSDK also provides uuid.lib, which contains the IIDs.

How do I use HTML entities in XML?

The following XML contains an HTML entity:


<test>Copyright &copy; 2000, Microsoft Inc, All rights reserved.</test>


It produces the following error:


Reference to undefined entity 'copy'.
Line: 1, Position: 23, Error code: 0xC00CE002
<test>Copyright &copy; 2000, ...
----------------------^


This happens because XML has only five built-in entities. For more information about the built-in entities, see "How do I load documents with foreign and special characters?" above.

To use HTML entities, you need to define them in a DTD. For more information about DTDs, see the World Wide Web Consortium (W3C) XML Recommendation. To use a DTD, include it directly in the DOCTYPE declaration, as follows:


<!DOCTYPE test [
  <!ENTITY copy "&#169;">
]>
<test>Copyright &copy; 2000, Microsoft Inc, All rights reserved.</test>

To load this document, you need to turn off the validateOnParse property of the IXMLDOMDocument interface. This is required because the DTD declares only the entity and not the test element, so the document is not valid. Try pasting the document into the Validator test page, turn off DTD validation, and then click Validate. The document loads, and the copyright character appears in the DOM tree at the bottom of the validator page.

If you are using DTD validation, you must include the HTML entities as a parameter entity in your existing DTD, as follows:


<!ENTITY % htmlent SYSTEM "...">
%htmlent;


Here the SYSTEM identifier points to a file that declares the HTML entities. Referencing %htmlent; defines all of those entities so that they can be used in XML documents.
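If many documents need the same entity declarations, you can generate the internal subset shown earlier. The following C++ sketch is for illustration only. doctype_with_entities is a hypothetical helper, and the caller supplies the entity names and code points. For example, copy is U+00A9, which is 169 in decimal.

```cpp
#include <map>
#include <string>

// Hypothetical helper: build a DOCTYPE whose internal subset declares each
// entity as a numeric character reference, e.g. <!ENTITY copy "&#169;">.
std::string doctype_with_entities(const std::string& root,
                                  const std::map<std::string, unsigned>& entities) {
    std::string out = "<!DOCTYPE " + root + " [\n";
    for (const auto& entry : entities) {
        out += "  <!ENTITY " + entry.first + " \"&#" +
               std::to_string(entry.second) + ";\">\n";
    }
    out += "]>\n";
    return out;
}
```

A DOCTYPE generated this way still declares only entities, so the validateOnParse caveat above also applies here.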

How do I handle whitespace characters in element content?

The XML DOM provides three properties for accessing the text content of an element:

Property    Behavior

nodeValue   Returns the original text content, including whitespace, of text, CDATA, comment, and processing-instruction (PI) nodes, exactly as it appears in the original XML source. Returns null for element nodes and for the document itself.

data        Same as nodeValue.

text        Recursively concatenates all the text and CDATA nodes in the specified subtree and returns the combined result.

Note: Whitespace characters include newlines, tabs, and spaces.
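A toy tree illustrates the difference between nodeValue and text. The Node type below is a hypothetical stand-in and not the MSXML DOM. The subtree_text function ignores the whitespace rules described next. It only shows the recursive concatenation of text and CDATA nodes that the text property performs.

```cpp
#include <string>
#include <vector>

// Hypothetical minimal node type (not the MSXML DOM).
struct Node {
    enum class Kind { Element, Text, CData, Comment } kind;
    std::string value;           // nodeValue for Text, CData, and Comment nodes
    std::vector<Node> children;  // child nodes, used by Element nodes
};

// Concatenate the values of all Text and CDATA nodes in the subtree,
// in document order. Comment nodes contribute nothing.
std::string subtree_text(const Node& n) {
    if (n.kind == Node::Kind::Text || n.kind == Node::Kind::CData) return n.value;
    std::string out;
    for (const Node& child : n.children) out += subtree_text(child);
    return out;
}
```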

The nodeValue property typically returns the content of the original document, regardless of how the document was loaded and of the current xml:space scope.

The text property concatenates all the text in the specified subtree and expands entities. Its result depends on the current state of the document: how the document was loaded, the current preserveWhiteSpace switch, and the current xml:space scope. The following tables show the possible combinations:

When the document is loaded with preserveWhiteSpace = true:

                      Current preserveWhiteSpace=true   Current preserveWhiteSpace=false
xml:space="preserve"  Preserved                         Preserved
xml:space="default"   Preserved                         Preserved and trimmed

When the document is loaded with preserveWhiteSpace = false:

                      Current preserveWhiteSpace=true   Current preserveWhiteSpace=false
xml:space="preserve"  Semi-preserved                    Semi-preserved
xml:space="default"   Semi-preserved and trimmed        Semi-preserved and trimmed

Here, "preserved" means the text is identical to the original text content of the XML document. "Trimmed" means that leading and trailing whitespace has been removed. "Semi-preserved" means that significant whitespace is kept while insignificant whitespace is normalized. Significant whitespace is whitespace inside text content. Insignificant whitespace is whitespace between tags, as in the following content:

\n
\t Jane\n
\tSmith \n


In this example, whitespace between tags is insignificant and can be ignored. Whitespace inside the text content is significant: it is part of the text and therefore cannot be ignored. In this case, the text property returns the following results:

State                       Return value
Preserved                   "\n\t Jane\n\tSmith \n"
Preserved and trimmed       "Jane\n\tSmith"
Semi-preserved              "Jane Smith"
Semi-preserved and trimmed  "Jane Smith"

Note that semi-preserving normalizes insignificant whitespace. For example, newline and tab characters are collapsed into a single space. If you change the xml:space attribute or the preserveWhiteSpace switch, the text property returns correspondingly different values.
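The trimmed and normalized states in the table above can be approximated on a plain string. This C++ sketch is a simplification that treats all whitespace alike. Unlike MSXML, it cannot tell significant from insignificant whitespace, because a flat string no longer records where the tags were. trim_ws and collapse_ws are hypothetical helpers.

```cpp
#include <string>

// Whitespace as defined above (space, tab, newline), plus carriage return.
bool is_xml_ws(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

// "Trimmed": remove leading and trailing whitespace.
std::string trim_ws(const std::string& s) {
    std::string::size_type b = 0, e = s.size();
    while (b < e && is_xml_ws(s[b])) ++b;
    while (e > b && is_xml_ws(s[e - 1])) --e;
    return s.substr(b, e - b);
}

// Simplified normalization: collapse every run of whitespace into one space.
std::string collapse_ws(const std::string& s) {
    std::string out;
    bool in_ws = false;
    for (char c : s) {
        if (is_xml_ws(c)) {
            if (!in_ws) out += ' ';
            in_ws = true;
        } else {
            out += c;
            in_ws = false;
        }
    }
    return out;
}
```

Applied to "\n\t Jane\n\tSmith \n", trim_ws gives "Jane\n\tSmith" and trim_ws(collapse_ws(...)) gives "Jane Smith". These results match the "preserved and trimmed" and "semi-preserved and trimmed" rows.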

CDATA and xml:space="preserve" subtree boundaries
In the following example, the content of a CDATA node or of an xml:space="preserve" subtree is concatenated as-is, because it does not take part in the normalization of insignificant whitespace. For example:


\n
\t Jane \n
\t<![CDATA[ Smith]]>\n


In this case, the whitespace inside the CDATA node is not merged with the insignificant whitespace and is not trimmed. Therefore, the "semi-preserved and trimmed" case returns the following:

"Jane Smith"

Here, the insignificant whitespace between the tags is handled independently of the content of the CDATA node. You get the same result if you replace the CDATA section with the same text inside an element that carries xml:space="preserve".
Entities are special

Entities are loaded and parsed as part of the DTD and appear under the DOCTYPE node. They are not necessarily within any xml:space scope. For example:


<!DOCTYPE foo [
<!ENTITY Jane "<employee>\n
\t<name>Jane</name>\n
\t<title>Software Design Engineer</title>\n
</employee>">
]>
<foo xml:space="preserve">&Jane;</foo>


Assume that preserveWhiteSpace=false and that no xml:space scope applies inside the DOCTYPE. Under these conditions, the insignificant whitespace is lost when the entity is parsed, so the entity has no whitespace text nodes. The tree looks like this:


DOCTYPE foo
  ENTITY: Jane
    ELEMENT: employee
      ELEMENT: name
        TEXT: Jane
      ELEMENT: title
        TEXT: Software Design Engineer
ELEMENT: foo
  ATTRIBUTE: xml:space="preserve"
  ENTITYREF: Jane


Note that the DOM tree exposed under the internal ENTITY node of the DOCTYPE does not contain any whitespace nodes. As a result, the child nodes of the ENTITYREF node have no whitespace nodes either, even though the entity reference is within the scope of xml:space="preserve".

Every reference to a given entity in a document typically has the same tree.

If an entity must preserve all of its whitespace, it must specify its own xml:space attribute internally, or the document's preserveWhiteSpace switch must be set to true.

How do I handle whitespace in attributes?

There are several ways to access attribute values. The IXMLDOMAttribute interface has a value property, which is equivalent to nodeValue. It also has a text property, which is a Microsoft extension. These properties return the following:


Property                                                   Text returned
attrNode.nodeValue, attrNode.value, getAttribute("name")   Exactly the same content as the original document, with entities expanded.
attrNode.nodeTypedValue                                    null
attrNode.text                                              Same as nodeValue, except that leading and trailing whitespace is trimmed.


The XML Language Specification defines the following behavior for XML applications:

Attribute type                                                Text returned
CDATA                                                         Semi-normalized
ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, enumerations   Fully normalized

The semi-normalized form converts newline and tab characters to spaces, but it does not collapse multiple spaces into a single space. The fully normalized form also removes leading and trailing spaces and collapses each run of spaces into a single space.
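Both normalization forms can be sketched as follows. This is a minimal C++ illustration of the rules in section 3.3.3 of the XML 1.0 specification. It assumes that entities in the value have already been expanded and that line endings have already been normalized. The function names are hypothetical. The sketch simplifies one detail: the specification keeps whitespace written as a character reference (such as &#10;), whereas this sketch converts that whitespace like any other.

```cpp
#include <string>

// Semi-normalized (CDATA attributes): each tab, newline, or carriage
// return becomes a space; runs of spaces are left as they are.
std::string normalize_cdata_attr(const std::string& v) {
    std::string out = v;
    for (char& c : out)
        if (c == '\t' || c == '\n' || c == '\r') c = ' ';
    return out;
}

// Fully normalized (ID, IDREF, enumerations, ...): also drop leading and
// trailing spaces and collapse each run of spaces into a single space.
std::string normalize_tokenized_attr(const std::string& v) {
    const std::string semi = normalize_cdata_attr(v);
    std::string out;
    for (char c : semi) {
        if (c == ' ' && (out.empty() || out.back() == ' ')) continue; // skip leading/repeated spaces
        out += c;
    }
    if (!out.empty() && out.back() == ' ') out.pop_back(); // drop a trailing space
    return out;
}
```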


