Understanding the XML DOM tree structure


Huang Hongfa, Department of Computer Information System, Beijing Institute of Information Engineering

2002-4-29 14:00:24

I. Introduction
XML is the abbreviation of Extensible Markup Language. It is an extensible markup language that lets you create your own tags to identify the content you represent. DOM stands for Document Object Model; it defines a set of platform- and language-independent interfaces so that programs and scripts can dynamically access and modify the content, structure, and style of XML documents. XML defines the tags, and the DOM tells a program how to operate on and display them.
II. DOM tree structure
XML organizes data into a tree. A DOM parser reads the XML document and builds a logical tree model of it, in which every node of the tree is an object. You can then operate on the XML document by operating on this tree and its objects, which gives a complete conceptual framework for processing every aspect of the document.
See the following XML document:
<line id="1">the <bold>first</bold> line</line>
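The DOM structure is as follows: a Document node contains the line element; the line element carries the id attribute and has three child nodes, namely the text "the ", the bold element (which itself contains the text "first"), and the text " line".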


Because in the DOM "everything is a node" (everything-is-a-node), every document, element, text, attribute (Attr), and comment in the XML tree is a DOM node.
The preceding example shows that the DOM is essentially a collection of nodes. Because a document may contain different kinds of information, several node types are defined, such as Document, Element, Text, Attr, CDATASection, ProcessingInstruction, Notation, EntityReference, Entity, DocumentType, and DocumentFragment.
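The node structure of the one-line document above can be inspected with a few lines of Java. The following is a minimal sketch rather than part of the original program: the class name LineTree and the helper methods outline and outlineOf are hypothetical names chosen for illustration, and the document is parsed from a string instead of a file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LineTree
{
    // Appends one line per node of the subtree rooted at node, indented by depth.
    static void outline(Node node, String indent, StringBuilder out)
    {
        out.append(indent).append(node.getNodeName());
        if (node.getNodeType() == Node.TEXT_NODE)
            out.append(" \"").append(node.getNodeValue()).append('"');
        out.append('\n');
        // Attributes are nodes too, but they are reached through getAttributes(),
        // which returns null for every node type except Element.
        NamedNodeMap attrs = node.getAttributes();
        if (attrs != null)
            for (int i = 0; i < attrs.getLength(); i++)
                out.append(indent).append("  @").append(attrs.item(i).getNodeName())
                   .append('=').append(attrs.item(i).getNodeValue()).append('\n');
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
            outline(child, indent + "  ", out);
    }

    // Parses the XML text and returns the outline of the whole document.
    static String outlineOf(String xml)
    {
        try
        {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Node doc = builder.parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            outline(doc, "", out);
            return out.toString();
        }
        catch (Exception e)
        {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args)
    {
        System.out.print(outlineOf("<line id=\"1\">the <bold>first</bold> line</line>"));
    }
}
```

The output lists #document first, then line with its attribute id=1, then the three children of line: the text "the ", the bold element (whose only child is the text "first"), and the text " line".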
Suppose we create the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<students>
  <!-- This is an example -->
  <student>
    <name>
      <first-name>Mike</first-name>
      <last-name>Silver</last-name>
    </name>
    <sex>male</sex>
    <class studentid="15">98211</class>
    <birthday>
      <day>3</day>
      <month>3</month>
      <year>1979</year>
    </birthday>
  </student>
  <student>
    <name>
      <first-name>Ben</first-name>
      <last-name>Silver</last-name>
    </name>
    <sex>male</sex>
    <class studentid="16">98211</class>
    <birthday>
      <day>3</day>
      <month>3</month>
      <year>1980</year>
    </birthday>
  </student>
</students>
We can naturally imagine a structure that simply mirrors this nesting of elements, but that is only a description of the data, not the structure of the DOM tree.


We can use the following code to obtain the root node of the XML document above and the number of child nodes under it.
import javax.xml.parsers.*;
import org.w3c.dom.*;
import java.io.File;

public class XML
{
    public static void main(String args[])
    {
        try
        {
            // links.xml holds the students document shown above
            File file = new File("links.xml");
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(file);
            doc.normalize();
            // The root element is <students>
            Element theRoot = doc.getDocumentElement();
            NodeList theList = theRoot.getChildNodes();
            System.out.println("the students root has " + theList.getLength() + " children");
        }
        catch (Exception e)
        {
            // Parsing fails if the file is missing or the XML is malformed
            e.printStackTrace();
        }
    }
}
The result is displayed as follows:

the students root has 7 children
As the XML document above shows, students appears to have only three child nodes (including the comment), but the program reports seven. Why? Because nodes and elements are not the same thing in the DOM: the seven nodes are the two student elements, the comment, and the text nodes around them. These text nodes consist of carriage returns, line feeds, spaces, or tabs. If we deleted this whitespace from the document, the text nodes would not be created when the DOM is built, and there would be only three child nodes. A precise description of the DOM tree therefore includes these whitespace text nodes.
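The difference between the seven nodes and the three "visible" ones can be checked directly. The sketch below is illustrative only: the class ChildKinds and its methods parseRoot and countChildren are hypothetical names, and a shortened students document is built as a string so that the example does not depend on links.xml.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ChildKinds
{
    // Parses the XML text and returns its root element.
    static Element parseRoot(String xml)
    {
        try
        {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        }
        catch (Exception e)
        {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Counts the direct children of root, optionally skipping whitespace-only text nodes.
    static int countChildren(Element root, boolean skipBlankText)
    {
        int count = 0;
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++)
        {
            Node child = children.item(i);
            boolean blankText = child.getNodeType() == Node.TEXT_NODE
                && child.getNodeValue().trim().isEmpty();
            if (!(skipBlankText && blankText))
                count++;
        }
        return count;
    }

    public static void main(String[] args)
    {
        // Same layout as the students file: one comment and two student elements,
        // each on its own indented line.
        String xml = "<students>\n"
            + "  <!-- This is an example -->\n"
            + "  <student><sex>male</sex></student>\n"
            + "  <student><sex>male</sex></student>\n"
            + "</students>";
        Element root = parseRoot(xml);
        System.out.println("all children: " + countChildren(root, false));
        System.out.println("without blank text: " + countChildren(root, true));
    }
}
```

With this input the first count is 7 and the second is 3: four of the seven children are the whitespace text nodes that sit between the tags.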


III. Common basic node types: document, element, attribute, text, and comment
The DOM defines 12 node types in total, of which the five most common are:
Element: an element is the basic building block of XML and describes its basic information.
Attribute: an attribute node carries information about an element node. It is usually contained in an element and describes that element's properties.
Text: holds text content, which may be substantial or may be only whitespace.
Document: the document node is the root of the tree and the ancestor of all other nodes in the document.
Comment: a comment node holds explanatory notes about the document.
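The five node types listed above correspond to constants on the org.w3c.dom.Node interface, which getNodeType() returns. The following sketch maps those constants to readable names; the class NodeKinds and the methods kindOf and parseRoot are hypothetical, and the comment text inside the sample element is invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeKinds
{
    // Maps the node type constant to the name used in the list above.
    static String kindOf(Node node)
    {
        switch (node.getNodeType())
        {
            case Node.DOCUMENT_NODE:  return "Document";
            case Node.ELEMENT_NODE:   return "Element";
            case Node.ATTRIBUTE_NODE: return "Attribute";
            case Node.TEXT_NODE:      return "Text";
            case Node.COMMENT_NODE:   return "Comment";
            default:                  return "Other";
        }
    }

    // Parses the XML text and returns its root element.
    static Element parseRoot(String xml)
    {
        try
        {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        }
        catch (Exception e)
        {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args)
    {
        Element cls = parseRoot("<class studentid=\"15\"><!-- class number -->98211</class>");
        System.out.println(kindOf(cls.getOwnerDocument()));
        System.out.println(kindOf(cls));
        System.out.println(kindOf(cls.getAttributeNode("studentid")));
        System.out.println(kindOf(cls.getFirstChild()));
        System.out.println(kindOf(cls.getLastChild()));
    }
}
```

Note that the attribute is obtained through getAttributeNode() and not through the child list: the class element here has only two children, the comment and the text.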
IV. Common basic methods:
After the JAXP package has parsed the XML document, the basic operations on DOM node objects include:
appendChild(Node newChild): adds a new child to the end of the current node's child list.
getAttributes(): gets the attribute list of the current node. The return type is NamedNodeMap.
getChildNodes(): gets the child list of the current node. The return type is NodeList.
getFirstChild() and getLastChild(): get the first and last child nodes.
getNextSibling() and getPreviousSibling(): get the next and previous sibling nodes of the current node.
getNodeName(), getNodeType(), and getNodeValue(): get the name, type, and value of the current node.
getParentNode(): gets the parent node of the current node.
insertBefore(Node newChild, Node refChild): inserts a new node before the refChild child of the current node.
removeChild(Node oldChild): removes the child node oldChild.
These are some of the common basic methods for operating on a DOM tree. There are many others; refer to the relevant specifications.
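A short sketch can show several of these methods working together. It is not taken from the original program: the class BasicMethods and its methods buildStudents and idOf are hypothetical, and it additionally relies on Document.createElement and Element.setAttribute, which are standard DOM calls not covered in the list above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class BasicMethods
{
    // Builds a <students> element with two <student> children in memory.
    static Element buildStudents()
    {
        Document doc;
        try
        {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        }
        catch (ParserConfigurationException e)
        {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element students = doc.createElement("students");
        doc.appendChild(students);

        Element ben = doc.createElement("student");
        ben.setAttribute("studentid", "16");
        students.appendChild(ben);            // ben becomes the last child

        Element mike = doc.createElement("student");
        mike.setAttribute("studentid", "15");
        students.insertBefore(mike, ben);     // mike is placed in front of ben

        Element extra = doc.createElement("student");
        students.appendChild(extra);
        students.removeChild(extra);          // extra is detached again
        return students;
    }

    // Reads the studentid attribute through getAttributes(), which returns a NamedNodeMap.
    static String idOf(Node student)
    {
        return student.getAttributes().getNamedItem("studentid").getNodeValue();
    }

    public static void main(String[] args)
    {
        Element students = buildStudents();
        Node first = students.getFirstChild();
        Node last = students.getLastChild();
        System.out.println("children: " + students.getChildNodes().getLength());
        System.out.println("first id: " + idOf(first));
        System.out.println("last id: " + idOf(last));
        System.out.println("last follows first: " + first.getNextSibling().isSameNode(last));
        System.out.println("parent of first: " + first.getParentNode().getNodeName());
    }
}
```

Because the tree is built in memory and not parsed from indented text, no whitespace text nodes appear here, so students ends up with exactly two children.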
V. Recursively traversing the DOM tree
The DOM tree is traversed much like a binary tree, except that a node may have any number of children, not just two. The set of child nodes of an element forms the branches of that element. An attribute node, however, is not a child node of its element; it only describes properties of the element node and is part of the element node's structure. The following is a Java method that traverses the DOM:
// Requires: import org.w3c.dom.*;
public static void recurDOM(NodeList nodeList)
{
    Node node;
    int i;
    if (nodeList.getLength() == 0)
    {
        // This node has no child nodes, so return
        return;
    }
    for (i = 0; i < nodeList.getLength(); i++)
    {
        node = nodeList.item(i);
        // Only elements can have further children worth visiting
        if (node.getNodeType() == Node.ELEMENT_NODE)
            recurDOM(node.getChildNodes()); // recursive call
    }
}
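The traversal above visits the elements but does nothing with them. The following sketch is one possible runnable variant, under stated assumptions: the class RecurDomDemo, the extra indent and visited parameters, and the helper elementsOf are hypothetical additions, and a shortened students document is supplied as a string.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RecurDomDemo
{
    // Visits every element below nodeList and records its name, indented by depth.
    static void recurDOM(NodeList nodeList, String indent, List<String> visited)
    {
        for (int i = 0; i < nodeList.getLength(); i++)
        {
            Node node = nodeList.item(i);
            // Text and comment nodes are skipped; attributes never appear in a child list
            if (node.getNodeType() == Node.ELEMENT_NODE)
            {
                visited.add(indent + node.getNodeName());
                recurDOM(node.getChildNodes(), indent + "  ", visited); // recursive call
            }
        }
    }

    // Parses the XML text and returns the element names in document order.
    static List<String> elementsOf(String xml)
    {
        try
        {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> visited = new ArrayList<String>();
            recurDOM(doc.getChildNodes(), "", visited);
            return visited;
        }
        catch (Exception e)
        {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args)
    {
        String xml = "<students>\n"
            + "  <student>\n"
            + "    <name><first-name>Mike</first-name><last-name>Silver</last-name></name>\n"
            + "    <class studentid=\"15\">98211</class>\n"
            + "  </student>\n"
            + "</students>";
        for (String line : elementsOf(xml))
            System.out.println(line);
    }
}
```

The program prints the element names students, student, name, first-name, last-name, and class, each indented by its depth. The studentid attribute does not show up, which matches the point that attributes are not child nodes.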
VI. Summary:
The DOM is the programming model used in browsers and is also the main programming interface to XML. It is independent of language and platform. It is a tree-based API: it loads all of the data into memory and builds a tree from the parent-child hierarchy of nodes. These nodes can be elements, text, attributes, comments, or other types. The DOM lets developers read, create, delete, and edit XML data. This again underlines that in the DOM "everything is a node" (everything-is-a-node).
The programs in this article are written in Java and use the JAXP package to parse the XML document. This article is intended for developers who understand the basic concepts of XML and are ready to use the DOM to write applications that manipulate XML documents.
