Java XML parsing learning notes (1) -- DOM

Source: Internet
Author: User
Tags cdata

The following notes use simple examples to show the four ways of parsing XML in Java and the differences between them. The content comes from the internet; I have only organized it.

First, an introduction to XML.

XML document node types

• Document

• Element

• Attribute

• Text (PCDATA -- parsed character data)

• Comment

• DOCTYPE: declares the rules used to validate the document content

• Entity

• CDATA (character data)

XML syntax
1. Declaration: an optional XML declaration, for example <?xml version="1.0" encoding="UTF-8"?>, goes on the first line.
2. Root node: exactly one root element is required.
3. Tags: every tag must be closed, tag names are case-sensitive, and tags must be properly nested.
4. Attributes: the value must be enclosed in quotation marks.
5. Whitespace: XML preserves whitespace, whereas HTML collapses a run of spaces into a single space.
6. Naming rules: names should be descriptive.

A) A name can contain letters, digits, and other characters.

B) A name cannot start with a digit or a punctuation character.

C) A name cannot start with the letters "xml" (or XML, Xml, and so on).

7. A name cannot contain spaces.
8. Avoid ":" in XML element names, because it is reserved for namespaces.
9. Prefer elements over attributes.

10. XML namespaces provide a way to avoid element name conflicts.

11. CDATA: character data. Text inside a CDATA section is not parsed as markup, so it does not need to be escaped.

12. Entities: referenced as &entity;
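Points 11 and 12 can be compared side by side. The following is a minimal sketch, not part of the original notes: the class name CdataVersusEntities, the helper rootText, and the element name expr are made up for illustration. It parses the same text once from a CDATA section and once with entity references, and checks that the parser delivers identical character data in both cases.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataVersusEntities {

    // Parses a small XML string and returns the text content of its root element.
    // (Hypothetical helper, used only for this illustration.)
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Inside CDATA, '<' and '&' are written literally.
        String viaCdata = rootText("<expr><![CDATA[a < b && c]]></expr>");
        // Outside CDATA, the same characters must be written as entity references.
        String viaEntities = rootText("<expr>a &lt; b &amp;&amp; c</expr>");

        System.out.println(viaCdata);                      // a < b && c
        System.out.println(viaCdata.equals(viaEntities));  // true
    }
}
```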

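Put a simple XML file in the src directory.

first.xml

[The XML markup of this file was not preserved. It holds two records, each with two text values:]

Lu B1234 / Qingdao, Shandong Province
Lu A1234 / Jinan City, Shandong Province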

1) DOM (JAXP Crimson parser)
DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a hierarchically organized collection of nodes, or pieces of information. This hierarchy lets developers search the tree for specific information. Analyzing the structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is based on a hierarchy of information, DOM is described as tree-based or object-based. DOM, and tree-based processing in general, has several advantages. First, because the tree persists in memory, the application can modify it to change both data and structure. Second, you can navigate up and down the tree at any time, instead of the one-pass processing that SAX offers. DOM is also much simpler to use.

    package Test;

    import java.io.File;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class MyXmlReader {
        public static void main(String[] args) {
            try {
                File f = new File("src/first.xml");
                // 1. Create the DocumentBuilder factory
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                // 2. Create the DocumentBuilder
                DocumentBuilder builder = factory.newDocumentBuilder();
                // 3. Parse the XML file into a Document object representing the DOM tree
                Document doc = builder.parse(f);
                NodeList nl = doc.getElementsByTagName("value");
                for (int i = 0; i < nl.getLength(); i++) {
                    // The original loop body is cut off in the source. Printing each
                    // node's text content is an assumption, not the author's code.
                    System.out.println(nl.item(i).getTextContent());
                }
            } catch (Exception e) {
                // Assumption: the original error handling is also missing from the source.
                e.printStackTrace();
            }
        }
    }
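The two advantages named above, free navigation and in-memory modification, can be illustrated with a small sketch. It is only an illustration under stated assumptions: the XML is an inline string rather than first.xml, the tag names result, no, and addr are hypothetical, and the class and method names are made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomNavigateAndModify {

    // Parses an XML string into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Walks from <no> up to its parent and across to its sibling, then edits the sibling.
    static String navigateAndEdit(Document doc) {
        Node no = doc.getElementsByTagName("no").item(0);
        Node parent = no.getParentNode();  // up: the enclosing <value>
        Node addr = no.getNextSibling();   // sideways: <addr> (the input has no whitespace between tags)
        addr.setTextContent("Qingdao");    // change the data held in memory
        return parent.getNodeName() + "/" + addr.getNodeName() + "=" + addr.getTextContent();
    }

    public static void main(String[] args) {
        // Hypothetical document; the tag names are assumptions for this sketch.
        Document doc = parse("<result><value><no>Lu A1234</no><addr>Jinan</addr></value></result>");
        System.out.println(navigateAndEdit(doc)); // value/addr=Qingdao
    }
}
```

Nothing is written back to a file here; the change exists only in the tree held in memory.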
 
  

DOM has five basic objects: Document, Node, NodeList, Element, and Attr. Here we will introduce them one by one:

1.1 Document object

The Document object represents the entire XML document. All other nodes are contained in the Document object in a definite order and arranged into a tree. Programmers can traverse this tree to get all the content of the XML document, so the Document is also the starting point for operating on an XML document. We always obtain a Document object by parsing the XML source file and then perform subsequent operations on it. In addition, Document contains methods for creating other nodes, such as createAttribute() for creating an Attr object. Its main methods are:

createAttribute(String): Creates an Attr object with the given attribute name. The Attr can then be placed on an Element object with the setAttributeNode method.

createElement(String): Creates an Element object with the given tag name, representing an element in the XML document. You can then add attributes to this Element object or perform other operations on it.

createTextNode(String): Creates a Text object with the given string. A Text object represents the plain text contained in an element or attribute. If an element contains no other elements, the Text object holding its text is the only child of that Element object.

getElementsByTagName(String): Returns a NodeList object containing all elements with the given tag name.

getDocumentElement(): Returns an Element object representing the root node of the DOM tree, that is, the root element of the XML document.
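The factory methods listed above can be combined to build a small document from scratch. This is a sketch under assumptions: the element names result and value, the attribute id, and the class name DocumentFactoryMethods are invented for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DocumentFactoryMethods {

    // Builds <result><value id="1">Lu A1234</value></result> in memory.
    static Document buildSample() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }

        Element root = doc.createElement("result");   // createElement(String)
        doc.appendChild(root);                        // becomes the document element

        Element value = doc.createElement("value");
        root.appendChild(value);

        Attr id = doc.createAttribute("id");          // createAttribute(String)
        id.setValue("1");
        value.setAttributeNode(id);                   // attach the Attr to its Element

        value.appendChild(doc.createTextNode("Lu A1234")); // createTextNode(String)
        return doc;
    }

    public static void main(String[] args) {
        Document doc = buildSample();
        Element root = doc.getDocumentElement();                           // getDocumentElement()
        Element value = (Element) doc.getElementsByTagName("value").item(0); // getElementsByTagName(String)
        System.out.println(root.getTagName());
        System.out.println(value.getAttribute("id") + " " + value.getFirstChild().getNodeValue());
    }
}
```

Nodes created by a Document belong to it but are not in the tree until they are attached with appendChild or setAttributeNode.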

1.2 Node object

Node is the most basic object in the DOM structure and represents an abstract node in the document tree. In practice, Node itself is rarely used directly; documents are manipulated through its subtypes such as Element, Attr, and Text. Node provides an abstract, common root for these objects. Although Node defines the methods for accessing child nodes, note that some Node subtypes, such as Text, never have children. The main methods of Node are:

appendChild(org.w3c.dom.Node): Adds a child node to this node, after all existing child nodes. If the node being added is already in the tree, it is first removed from its current position and then appended.

getFirstChild(): Returns the first child node, if one exists. Correspondingly, getLastChild() returns the last child node.

getNextSibling(): Returns the next sibling of this node in the DOM tree. Correspondingly, getPreviousSibling() returns the previous sibling.

getNodeName(): Returns the name of the node, which depends on the node type.

getNodeType(): Returns the type of the node.

getNodeValue(): Returns the value of the node.

hasChildNodes(): Determines whether the node has any child nodes.

hasAttributes(): Determines whether the node has any attributes.

getOwnerDocument(): Returns the Document object that the node belongs to.

insertBefore(org.w3c.dom.Node newChild, org.w3c.dom.Node refChild): Inserts a child node before the given existing child node.

removeChild(org.w3c.dom.Node): Removes the given child node.

replaceChild(org.w3c.dom.Node newChild, org.w3c.dom.Node oldChild): Replaces the given child node with a new Node object.
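The editing and traversal methods above can be followed step by step in a short sketch. The element names (list, a, b, c, d, z) and the class name NodeEditingSketch are hypothetical; the comments track the child order after each call.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeEditingSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Lists the child node names in order, using getFirstChild()/getNextSibling().
    static String childNames(Node parent) {
        StringBuilder names = new StringBuilder();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (names.length() > 0) {
                names.append(',');
            }
            names.append(child.getNodeName());
        }
        return names.toString();
    }

    static String editChildren() {
        Document doc = parse("<list><a/><c/></list>");
        Node list = doc.getDocumentElement();
        Node a = list.getFirstChild();
        Node c = list.getLastChild();

        list.insertBefore(doc.createElement("b"), c);  // a,b,c
        list.appendChild(doc.createElement("d"));      // a,b,c,d
        list.replaceChild(doc.createElement("z"), a);  // z,b,c,d
        list.removeChild(c);                           // z,b,d
        // Appending a node that is already in the tree moves it to the end.
        list.appendChild(list.getFirstChild());        // b,d,z
        return childNames(list);
    }

    public static void main(String[] args) {
        System.out.println(editChildren()); // b,d,z
    }
}
```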

1.3 NodeList object

As the name implies, a NodeList represents a list of zero or more Node objects. It can simply be treated as an array of Node, whose elements are obtained through the following methods:

getLength(): Returns the length of the list.

item(int): Returns the Node object at the specified position.
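Because NodeList offers only getLength() and item(int), it is usually walked with an index loop. A minimal sketch follows, assuming an inline document whose root is named result and whose records are value elements; the class name NodeListIteration and the helper textOf are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListIteration {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Collects the text of every element with the given tag name.
    static List<String> textOf(Document doc, String tagName) {
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) {
        Document doc = parse("<result><value>Lu B1234</value><value>Lu A1234</value></result>");
        System.out.println(textOf(doc, "value"));   // [Lu B1234, Lu A1234]
        System.out.println(textOf(doc, "missing")); // [] because the list is empty, not null
    }
}
```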

1.4 Element object

Element represents an element (tag) in the XML document. It inherits from Node and is the most important Node subtype. Because elements can carry attributes, Element has methods for accessing them, and any method defined on Node can also be used on an Element.

getElementsByTagName(String): Returns a NodeList object containing the descendant elements of this element that have the given tag name.

getTagName(): Returns a string containing the tag name.

getAttribute(String): Returns the value of the attribute with the given name. Note that XML allows attribute values to contain entity references, and this method is not suitable for such attributes. In that case, use getAttributeNode() to obtain an Attr object for further processing.

getAttributeNode(String): Returns an Attr object representing the attribute with the given name.
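The difference between getAttribute and getAttributeNode is easiest to see when the attribute is absent. The sketch below uses a hypothetical car element with a plate attribute; the class name ElementAttributeAccess is also made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ElementAttributeAccess {

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element car = parseRoot("<car plate=\"Lu A1234\"><addr>Jinan</addr></car>");

        System.out.println(car.getTagName());                 // car
        System.out.println(car.getAttribute("plate"));        // Lu A1234

        // A missing attribute yields an empty string from getAttribute ...
        System.out.println(car.getAttribute("color").isEmpty());     // true
        // ... but null from getAttributeNode.
        System.out.println(car.getAttributeNode("color") == null);   // true

        Attr plate = car.getAttributeNode("plate");
        System.out.println(plate.getName() + "=" + plate.getValue()); // plate=Lu A1234

        // Element.getElementsByTagName searches only below this element.
        System.out.println(car.getElementsByTagName("addr").getLength()); // 1
    }
}
```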

1.5 Attr object

Attr represents an attribute of an element. Attr inherits from Node, but because an attribute is contained in its Element, it is not considered a child of that Element and is therefore not part of the DOM tree. As a result, the Node methods getParentNode(), getPreviousSibling(), and getNextSibling() all return null for an Attr. In other words, an Attr is treated as part of its Element object rather than as a separate node in the DOM tree. This distinguishes it from the other Node subtypes.
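This behavior can be checked directly. The following sketch again assumes a hypothetical car element with a plate attribute, and the class and helper names are invented. It also uses Attr.getOwnerElement(), a standard DOM method that these notes do not cover, to show how an Attr still refers back to its element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttrOutsideTree {

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // True when the tree-navigation methods inherited from Node all return null.
    static boolean hasNoTreeLinks(Attr attr) {
        return attr.getParentNode() == null
                && attr.getPreviousSibling() == null
                && attr.getNextSibling() == null;
    }

    public static void main(String[] args) {
        Element car = parseRoot("<car plate=\"Lu A1234\"/>");
        Attr plate = car.getAttributeNode("plate");

        System.out.println(hasNoTreeLinks(plate));            // true
        System.out.println(plate.getOwnerElement() == car);   // true: linked to its element, not as a child
        System.out.println(car.hasChildNodes());              // false: the attribute is not a child node
        System.out.println(car.hasAttributes());              // true
    }
}
```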

