How an XML parser can implement DTD validation: element-level validation


How to validate against a DTD

A DTD constrains the structure of an XML document in two main ways: it defines which child nodes each element may contain, and it defines each element's attributes. To validate a document against a DTD, an XML parser must check both kinds of definition. This article starts with the child node types.

DTD declarations define the child nodes of an element in the following ways:
<!ELEMENT A ANY>
Node A can contain nodes of any type. This is the simplest case.
<!ELEMENT A (#PCDATA)>
Node A can contain only text.
<!ELEMENT A (B, C)>
Node A must contain exactly one B node and one C node, and B must come before C.
<!ELEMENT A (B*, C)>
Node A contains any number of B nodes (possibly none), followed by exactly one C node.
<!ELEMENT A (B?, C+)*>
This case is more complicated. Ignoring the outer * for now, B? means there can be at most one B node, and C+ means it is followed by at least one C node.
Now consider the outer *: the whole group may repeat 0, 1, 2, ... times. Enumerating a few possibilities gives:
(empty), C, CC, CCC, ..., BC, BCC, ..., CBC, BCBC, BCCBC, BCBCBCBC, ...
Two B nodes can never be adjacent, because every B must be followed by at least one C. A sequence such as BBC is therefore not valid.

Validating child node types clearly becomes complicated once *, ?, and + are involved. Notice, however, that this declaration syntax is itself a regular expression. Can regular expression matching be used to validate the child node types of a DTD? The answer is yes.
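To see that the content model really does behave like a regular expression, a few child sequences can be tested directly. The following is only a minimal sketch: it uses std::regex in place of whatever engine the parser actually uses, the function name matchesContentModel is made up for illustration, and it assumes every child element is written as a single letter.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true when the whole child sequence matches
// the content model (B?, C+)*. Each child element is written as one letter,
// so "BCC" stands for <B/><C/><C/>.
bool matchesContentModel(const std::string& children) {
    static const std::regex model("(B?C+)*");
    // regex_match succeeds only if the entire string matches, which is what
    // validation needs: no unexpected child may be left over.
    return std::regex_match(children, model);
}
```

With this helper, an empty sequence, "C", "BC", and "BCBC" are accepted, while a sequence with two adjacent B nodes such as "BBC", or one that ends in B such as "CB", is rejected.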

Let's look at an example. For <!ELEMENT A (B?, C+)*>, the XML parser builds the following data structure in memory.

Two classes are involved:
Dtdelementdecl
Dtdelementdeclnode

Dtdelementdeclnode represents one declared child node, while Dtdelementdecl represents a complete element declaration.
The declaration above produces the following structure:
Dtdelementdeclnode B;
Dtdelementdeclnode C;

B.setname("B");
B.setcounttype(enumoneorzero);    // B?

C.setname("C");
C.setcounttype(enumoneormore);    // C+

Dtdelementdeclnode D;             // the unnamed group (B?, C+)
D.setcounttype(enumzeroormore);   // ( ... )*
D.addchild(B);
D.addchild(C);

Dtdelementdecl A;                 // the declaration of element A
A.addchild(D);
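The article does not show the class definitions themselves, so the following is only one possible sketch of what they might look like. The method and enumerator names are taken from the calls above. The getters, the extra enumerator enumone (for "exactly once"), and the function buildExampleDecl are assumptions added for illustration.

```cpp
#include <string>
#include <vector>

// Occurrence indicators. enumone is an assumed name for "exactly once".
enum CountType { enumone, enumoneorzero, enumoneormore, enumzeroormore };

// One item in a content model: a named child element, or an unnamed group
// that holds further items.
class Dtdelementdeclnode {
public:
    void setname(const std::string& name) { name_ = name; }
    void setcounttype(CountType type) { type_ = type; }
    void addchild(const Dtdelementdeclnode& child) { children_.push_back(child); }

    const std::string& name() const { return name_; }
    CountType counttype() const { return type_; }
    const std::vector<Dtdelementdeclnode>& children() const { return children_; }

private:
    std::string name_;                          // empty for a group
    CountType type_ = enumone;
    std::vector<Dtdelementdeclnode> children_;  // used only by groups
};

// A complete element declaration such as <!ELEMENT A (B?, C+)*>.
class Dtdelementdecl {
public:
    void addchild(const Dtdelementdeclnode& child) { children_.push_back(child); }
    const std::vector<Dtdelementdeclnode>& children() const { return children_; }

private:
    std::vector<Dtdelementdeclnode> children_;
};

// Builds the structure for <!ELEMENT A (B?, C+)*>, mirroring the calls above.
Dtdelementdecl buildExampleDecl() {
    Dtdelementdeclnode b, c, d;
    b.setname("B");
    b.setcounttype(enumoneorzero);
    c.setname("C");
    c.setcounttype(enumoneormore);
    d.setcounttype(enumzeroormore);
    d.addchild(b);
    d.addchild(c);

    Dtdelementdecl a;
    a.addchild(d);
    return a;
}
```

Storing children by value keeps the sketch short. A real parser might well prefer pointers or an arena so that large content models are not copied.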

Now we can see that Dtdelementdecl has one child node, which can be of any kind. That child in turn contains two child nodes of its own, and the nesting can continue to any depth.
To validate with a regular expression, this hierarchy first has to be translated into one. The structure above, for example, becomes (B?C+)*, which is very simple. A regular expression engine (such as Boost.Regex) can then be used to do the matching.
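The translation from the declaration tree to a regular expression can be sketched as a short recursive function. This is an illustration under stated assumptions, not the parser's actual code: DeclNode is a simplified stand-in for the declaration node class, suffixFor and toRegex are hypothetical names, and only sequences (the comma operator) are handled, not choices written with |.

```cpp
#include <string>
#include <vector>

// Occurrence indicators. enumone is an assumed name for "exactly once".
enum CountType { enumone, enumoneorzero, enumoneormore, enumzeroormore };

// Simplified stand-in for a declaration node (hypothetical).
struct DeclNode {
    std::string name;                // empty for a group
    CountType counttype;
    std::vector<DeclNode> children;  // used only by groups
};

// Maps an occurrence indicator to its regular expression suffix.
std::string suffixFor(CountType type) {
    switch (type) {
        case enumoneorzero:  return "?";
        case enumoneormore:  return "+";
        case enumzeroormore: return "*";
        default:             return "";
    }
}

// Recursively turns a declaration node into a regular expression string.
std::string toRegex(const DeclNode& node) {
    std::string body;
    if (node.children.empty()) {
        // A single-letter name can take a suffix directly. A longer name is
        // wrapped in a non-capturing group so the suffix applies to all of it.
        body = node.name.size() == 1 ? node.name : "(?:" + node.name + ")";
    } else {
        for (const DeclNode& child : node.children) {
            body += toRegex(child);  // a sequence is plain concatenation
        }
        body = "(" + body + ")";
    }
    return body + suffixFor(node.counttype);
}
```

For the group D built from B? and C+ with the outer *, toRegex returns the string (B?C+)*.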

With the regular expression in hand, the XML document can be validated. However, a regular expression engine only matches strings, so the child nodes of the element being checked also have to be converted into a corresponding string. For example:
<A>
<B/>
<C/>
<C/>
<C/>
<B/>
<C/>
</A>
From the previous analysis, we can see that this document conforms to <!ELEMENT A (B?, C+)*>. The children of A can be written as the string:
BCCCBC
The final job is to match this string against (B?C+)*.
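The last two steps, flattening the children into a string and matching it, might look like the sketch below. It uses std::regex rather than Boost.Regex so that it is self-contained. The function names flattenChildren and validateChildren are hypothetical, and the child names are assumed to have been collected from the parsed document already.

```cpp
#include <regex>
#include <string>
#include <vector>

// Joins the child element names in document order.
// Assumes one-letter names, as in the article's example.
std::string flattenChildren(const std::vector<std::string>& childNames) {
    std::string flat;
    for (const std::string& name : childNames) {
        flat += name;
    }
    return flat;
}

// Returns true when the flattened children match the whole content model.
bool validateChildren(const std::vector<std::string>& childNames,
                      const std::string& modelRegex) {
    const std::regex model(modelRegex);
    return std::regex_match(flattenChildren(childNames), model);
}
```

For the document above, the names B, C, C, C, B, C flatten to BCCCBC, which validateChildren accepts against (B?C+)*. Real element names are usually longer than one letter. In that case plain concatenation becomes ambiguous, so each name would need a separator (for example a trailing space) both in the flattened string and in the generated expression.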

This is how my XML parser implements structure validation for DTD elements. Attribute validation will be covered in the next blog post.
If you are interested in seeing the source code, please leave your email address.
