Simple XML Tutorial: Document Object Model (DOM) and SAX (2)

Source: Internet
Author: User


DOM Overview:

DOM is a typical parsing technology based on the tree structure of XML documents. Conceptually, DOM parsing is very easy to understand: the parser first loads the entire XML document and stores its tree structure in memory, where the application can then process it.
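As a minimal sketch of this load-then-process model, the Java program below uses the standard JAXP classes `DocumentBuilderFactory` and `DocumentBuilder` (with the JDK's built-in parser) to turn a small XML string into an in-memory tree. The class name `DomLoadExample`, the helper `load()`, and the sample `<library>` document are hypothetical and serve only as an illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomLoadExample {
    // Hypothetical helper: parses an XML string into a DOM tree held in memory.
    public static Document load(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // parse() reads the whole document and returns only after the tree is built.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document.
        String xml = "<library><book id=\"1\">XML Basics</book>"
                   + "<book id=\"2\">DOM in Depth</book></library>";
        Document doc = load(xml);
        Element root = doc.getDocumentElement();
        System.out.println("Root element: " + root.getTagName());
        System.out.println("Books: " + root.getElementsByTagName("book").getLength());
    }
}
```

Running `main` prints `Root element: library` followed by `Books: 2`; by the time those lines execute, the whole document is already available as a tree.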

1. DOM and Tree-Based XML Parsing

1.1 The Tree-Based XML Parsing Model

The document type declaration, elements, attributes, processing instructions, comments, and text content of an XML document can all be regarded as nodes of a tree. Although the DOM notion of a node differs slightly from that of the XML specification itself and of XPath, an XML document can be viewed as a tree of nodes arranged in a hierarchy.
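To make the node-tree view concrete, here is a small recursive walk over a parsed document that prints one line per node, indented by depth. It is a sketch that assumes the JDK's default JAXP parser; the class `DomTreeWalk`, its methods, and the sample XML are hypothetical. Note that attributes do not show up, because in DOM they are not children of their element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomTreeWalk {
    // Appends one line per node, indented two spaces per level of depth.
    static void walk(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.getNodeName()).append('\n');
        // Attributes are not children in DOM, so this loop never visits them.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, depth + 1, out);
        }
    }

    public static String describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        walk(doc, 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample with a comment, elements, text and a processing instruction.
        String xml = "<!-- note --><root a=\"1\"><child>text</child><?pi data?></root>";
        System.out.print(describe(xml));
    }
}
```

For the sample, the output is `#document`, then `#comment` and `root` one level down, then `child` and `pi` under `root`, and finally `#text` under `child`.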

Once an XML document has been loaded into memory as a tree, an application can access it just as it accesses any other data object and can process the document conveniently. DOM (Document Object Model) is a typical example of this kind of parsing technology.

DOM processes XML in a completely different way from SAX. A SAX parser reads and analyzes the nodes of an XML document one at a time, in document order. A DOM parser, by contrast, loads the entire XML file before handing it to the application. If the file contains syntax errors, the DOM parser detects and reports them during the loading phase. As a result, an application using DOM does not need to keep track of processing state the way a SAX handler must, and the processing logic is simpler. (In fact, many DOM parsers build the DOM tree internally from a SAX-style stream of parsing events.)
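The following sketch shows errors surfacing during the loading phase: `DocumentBuilder.parse()` throws a `SAXException` for malformed input before the application ever sees a tree. It assumes the JDK's built-in parser; a custom `ErrorHandler` is installed only so that the default handler does not also print the error to stderr. The class `DomLoadErrors`, the method `check()`, and the two sample strings are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class DomLoadErrors {
    // Returns null if the XML is well-formed, otherwise the parser's error message.
    public static String check(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Replace the default handler (which prints to stderr) with one that simply rethrows.
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) throws SAXException { throw e; }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null; // the whole document loaded without errors
        } catch (SAXException e) {
            return e.getMessage(); // the error surfaced while loading, before any processing
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("well-formed: " + check("<a><b/></a>"));
        System.out.println("mismatched tags: " + check("<a><b></a>"));
    }
}
```

The exact wording of the error message depends on the parser implementation, so the sketch only distinguishes "no error" (`null`) from "some error".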

Compared with SAX, DOM has a major advantage when it comes to processing document content. Because the document is stored as a tree, the entire XML document is available to the application at any time, and it is easy for the application to modify the document dynamically.
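A minimal sketch of dynamic modification follows: because the whole tree is in memory, the program can remove an existing element, append a new one, and then write the result back out with an identity `Transformer` from `javax.xml.transform`. The class `DomModifyExample`, its methods, and the sample `<library>` document are hypothetical; the JDK's built-in JAXP implementation is assumed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomModifyExample {
    public static String edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        // The whole tree is in memory, so any node can be reached and removed directly.
        Node first = root.getElementsByTagName("book").item(0);
        root.removeChild(first);

        // New nodes are created by the Document and then attached anywhere in the tree.
        Element added = doc.createElement("book");
        added.setAttribute("id", "3");
        added.setTextContent("SAX Explained");
        root.appendChild(added);

        return serialize(doc);
    }

    // Writes the DOM tree back out as an XML string via an identity Transformer.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book id=\"1\">XML Basics</book>"
                   + "<book id=\"2\">DOM in Depth</book></library>";
        System.out.println(edit(xml));
    }
}
```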

DOM also has clear disadvantages compared with SAX, and for the same reason: because the whole XML document is held in memory, the in-memory tree built by a DOM parser typically consumes several times as much memory as the original XML file. When large XML documents are processed, this memory consumption can severely hurt program performance. DOM therefore scales poorly and is usually best suited to parsing and processing relatively small XML documents.
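For contrast, a streaming SAX handler keeps only the state it needs instead of a whole tree. The sketch below counts elements with a given name using `SAXParserFactory` and a `DefaultHandler`; the only state retained between events is a single counter. The class `SaxCountExample`, the method `count()`, and the sample `<log>` document are hypothetical, and the JDK's built-in SAX parser is assumed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCountExample {
    // Counts elements with the given name while streaming; no tree is built.
    public static int count(String xml, final String name) throws Exception {
        final int[] total = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Only this counter is kept between events.
                if (qName.equals(name)) {
                    total[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return total[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("<log><entry/><entry/><entry/></log>", "entry"));
    }
}
```

The trade-off is the one described above: memory use stays flat regardless of document size, but the handler can see only the current event, not the rest of the document.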

1.2 Java DOM Programming Interface

DOM is composed of a set of abstract interfaces. These interfaces define the DOM programming model, and different DOM parsers implement them in different ways. Node is the most basic data type in DOM: it represents an abstract node, and every concrete node type in the DOM tree of an XML document is derived from Node. The DOM interfaces in JAXP (package org.w3c.dom) can be represented by a class diagram.


1. Node

As mentioned above, Node represents an abstract node in the DOM tree and does not itself have a definite node type; each concrete node in the DOM tree is derived from Node. The type of a node can be determined by testing its object type (for example with instanceof) or by calling its getNodeType() method. Every node also exposes its name and value through getNodeName() and getNodeValue(). The following table summarizes the basic rules for the node names, node values, and node type values of the subinterfaces of Node.
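The rules can also be observed directly. This sketch parses a small document and prints `getNodeType()`, `getNodeName()`, and `getNodeValue()` for the document, the root element, one attribute, and each child of the root. The numeric types are the `Node` constants (for example `Node.ELEMENT_NODE` is 1 and `Node.TEXT_NODE` is 3). The class `NodeInfoExample` and the sample XML are hypothetical, and the JDK's default parser settings are assumed (in particular, CDATA sections are not coalesced into text).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NodeInfoExample {
    // Formats a node as "<type value> <name> = <value>".
    static String info(Node n) {
        return n.getNodeType() + " " + n.getNodeName() + " = " + n.getNodeValue();
    }

    public static List<String> describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> lines = new ArrayList<String>();
        lines.add(info(doc));
        Node root = doc.getDocumentElement();
        lines.add(info(root));
        // Testing the object type works too: every element node implements Element.
        if (root instanceof Element) {
            lines.add(info(((Element) root).getAttributeNode("lang")));
        }
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            lines.add(info(c));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<note lang=\"en\">Hi<!--c--><?target data?><![CDATA[x<y]]></note>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
    }
}
```

For the sample, the document and the element report a `null` value, the attribute reports `lang = en`, and the text, comment, processing-instruction, and CDATA nodes report the names `#text`, `#comment`, `target`, and `#cdata-section` respectively.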


2. Document

The Document node represents the entire XML document. After DOM parses an XML document, the resulting node tree is exposed to the application as a Document object. Because elements, processing instructions, comments, and other nodes all exist within the scope of an XML document, the Document interface also provides the methods used to create elements, attributes, comments, processing instructions, and other node types.
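As a sketch of those factory methods, the program below starts from an empty `Document` (obtained with `DocumentBuilder.newDocument()`), creates a comment, a processing instruction, an element, an attribute, and a text node through it, and serializes the result. The class `DocumentFactoryExample`, the `greeting` element, and the `xml-stylesheet` data are hypothetical examples; the JDK's built-in JAXP implementation is assumed.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DocumentFactoryExample {
    public static String build() throws Exception {
        // An empty Document; every other node must be created through it.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

        doc.appendChild(doc.createComment(" generated "));
        doc.appendChild(doc.createProcessingInstruction(
                "xml-stylesheet", "type=\"text/css\" href=\"style.css\""));

        Element root = doc.createElement("greeting");
        Attr lang = doc.createAttribute("lang");
        lang.setValue("en");
        root.setAttributeNode(lang);
        root.appendChild(doc.createTextNode("Hello"));
        doc.appendChild(root);

        // Serialize the tree with an identity Transformer.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build());
    }
}
```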

Node's normalize() method provides a way to normalize an XML document. After normalize() is called on a Document object, adjacent Text nodes are merged and empty Text nodes are removed, so that text is separated only by structure such as elements, processing instructions, comments, and CDATA sections.
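A small sketch of `normalize()` in action: three Text nodes are appended side by side to one element, one of them empty, and the child count is compared before and after the call. The class `NormalizeExample` and the `<p>` element are hypothetical; the JDK's built-in DOM implementation is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizeExample {
    // Returns "<children before> -> <children after>: <text>" for a small hand-built tree.
    public static String run() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element p = doc.createElement("p");
        doc.appendChild(p);

        // Three Text nodes side by side, one of them empty.
        p.appendChild(doc.createTextNode("Hello, "));
        p.appendChild(doc.createTextNode(""));
        p.appendChild(doc.createTextNode("world"));
        int before = p.getChildNodes().getLength();

        doc.normalize(); // merges adjacent Text nodes and drops empty ones
        int after = p.getChildNodes().getLength();

        return before + " -> " + after + ": " + p.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints "3 -> 1: Hello, world"
    }
}
```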

3. Element and Attr

An Attr object represents an attribute of an element. DOM does not treat attribute nodes as part of the document tree itself. Therefore, from the DOM perspective, the parent node and sibling nodes of an attribute node are all null. An attribute node can use the getOwnerElement() method to obtain the element to which it is attached.
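The sketch below checks these properties on a parsed attribute: its parent and next sibling are `null`, `getOwnerElement()` leads back to the element, and the element itself reports no child nodes because attributes are not children. The class `AttrExample` and the sample `<item>` element are hypothetical; the JDK's default parser is assumed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttrExample {
    public static String inspect() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader("<item id=\"42\" status=\"new\"/>")));
        Element item = doc.getDocumentElement();
        Attr id = item.getAttributeNode("id");
        // Attributes sit outside the tree: no parent, no siblings, but a known owner element.
        return "parent=" + id.getParentNode()
             + ", next=" + id.getNextSibling()
             + ", owner=" + id.getOwnerElement().getTagName()
             + ", value=" + id.getValue()
             + ", element children=" + item.getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(inspect());
    }
}
```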

An Element object represents an element node in the DOM tree. In addition to the general methods defined in Node, Element defines a set of methods for retrieving an element's attributes and for dynamically adding or removing attributes.
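Here is a brief sketch of those Element methods (`setAttribute`, `removeAttribute`, `getAttribute`, `hasAttribute`, and `getAttributes`) applied to a freshly created element. The class `ElementAttrExample` and the `user`/`name`/`role` names are hypothetical; the JDK's built-in DOM implementation is assumed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ElementAttrExample {
    public static String edit() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element user = doc.createElement("user");

        user.setAttribute("name", "alice");   // add an attribute
        user.setAttribute("role", "admin");
        user.setAttribute("role", "editor");  // setting an existing name replaces its value
        user.removeAttribute("name");         // remove an attribute

        // getAttribute() returns an empty string, not null, when the attribute is absent.
        return user.getAttributes().getLength() + " "
             + user.getAttribute("role") + " "
             + user.hasAttribute("name") + " ["
             + user.getAttribute("name") + "]";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(edit()); // prints "1 editor false []"
    }
}
```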

4. Entity and EntityReference

Entity represents an entity itself, not an entity declaration. In DOM Level 2, Entity nodes have the following characteristics: they cannot be modified (an Entity node and all of its descendants are read-only), an Entity node has no parent node, and namespace prefixes inside an entity cannot be resolved.

EntityReference represents an entity reference. Like Entity nodes, EntityReference nodes and their descendants are read-only. Apart from the methods inherited from Node, EntityReference defines no methods of its own. Unlike an entity reference, an Entity node can return its public and system identifiers through getPublicId() and getSystemId(); an EntityReference node cannot.
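The sketch below, which assumes the JDK's built-in JAXP parser, keeps entity references in the tree by calling `setExpandEntityReferences(false)`. It then inspects the resulting EntityReference node, tries to append a child to it (expecting a `DOMException` with code `NO_MODIFICATION_ALLOWED_ERR`, since the node is read-only), and looks up the matching Entity through `DocumentType.getEntities()`. The class `EntityRefExample` and the sample DTD with entity `co` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Entity;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class EntityRefExample {
    public static String inspect() throws Exception {
        // Hypothetical document declaring an internal entity "co" in its DTD.
        String xml = "<!DOCTYPE company [<!ENTITY co \"Example Co.\">]>"
                   + "<company>&co;</company>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Keep EntityReference nodes instead of replacing them with the entity's text.
        factory.setExpandEntityReferences(false);
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Node ref = doc.getDocumentElement().getFirstChild();
        Entity entity = (Entity) doc.getDoctype().getEntities().getNamedItem("co");

        // EntityReference nodes are read-only, so modifying one should be rejected.
        String status;
        try {
            ref.appendChild(doc.createTextNode("!"));
            status = "modifiable";
        } catch (DOMException e) {
            status = (e.code == DOMException.NO_MODIFICATION_ALLOWED_ERR) ? "read-only" : "error " + e.code;
        }

        // An internal entity has no system identifier, so getSystemId() returns null here.
        return "type=" + ref.getNodeType()
             + ", name=" + ref.getNodeName()
             + ", text=" + ref.getTextContent()
             + ", " + status
             + ", systemId=" + entity.getSystemId();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(inspect());
    }
}
```

Only the Entity node is queried for `getSystemId()`; the EntityReference interface has no such method, which matches the distinction drawn above.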

5. Text and CDATASection

A Text node represents the text content of an element or attribute in the XML document.

CDATASection represents the content of a CDATA section in the XML document. Inside a CDATA section the parser recognizes only the closing marker "]]>", which it uses to end the CDATASection node; characters such as < and & are treated as ordinary text. Adjacent CDATASection nodes are not merged by Node's normalize() method.
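A sketch of both points, assuming the JDK's built-in parser with `setCoalescing(false)` (the default), so that CDATA sections stay as separate CDATASection nodes: the `<` and `&` inside the section come back as plain text, and after a second CDATA section is appended next to the first, `normalize()` leaves both in place. The class `CDataExample` and the sample content are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CDATASection;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CDataExample {
    public static String inspect() throws Exception {
        // Inside the CDATA section, "<" and "&" are plain text; only "]]>" ends it.
        String xml = "<code><![CDATA[if (a < b && c) { }]]></code>";
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setCoalescing(false); // the default: keep CDATA sections as separate nodes
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Element code = doc.getDocumentElement();
        CDATASection first = (CDATASection) code.getFirstChild();

        // Add a second, adjacent CDATA section; normalize() leaves both in place.
        code.appendChild(doc.createCDATASection("second"));
        doc.normalize();

        return first.getData() + " | children after normalize: " + code.getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(inspect());
    }
}
```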

