Java XML parsing learning notes (1) -- DOM

Last Update:2014-03-27 Source: Internet

Author: User

Tags cdata


The following notes give simple examples of the four ways to parse XML and the differences between them. The content comes from the internet; I just want to organize it.

First, an introduction to XML.

XML document node types

- document
- element
- attribute
- text (PCDATA: parsed character data)
- comment
- DOCTYPE: used to verify that the document content is correct
- entity
- CDATA (character data)
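To see these node types the way a DOM parser reports them, here is a minimal sketch that parses a small document containing one of each kind and lists the type and name of every node. The car, id, and city names are made up for illustration. It uses the default JAXP DocumentBuilderFactory settings, under which comments and CDATA sections are kept as separate nodes and the entity reference &city; is expanded into ordinary text. Entity nodes are not children in the tree, so the sketch reads them from the DocumentType's getEntities() map.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeTypeDemo {

    // Parses an XML string with default JAXP settings, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Maps the org.w3c.dom.Node type constants to the names used in the list above.
    static String typeName(short type) {
        switch (type) {
            case Node.DOCUMENT_NODE:      return "document";
            case Node.ELEMENT_NODE:       return "element";
            case Node.ATTRIBUTE_NODE:     return "attribute";
            case Node.TEXT_NODE:          return "text";
            case Node.COMMENT_NODE:       return "comment";
            case Node.DOCUMENT_TYPE_NODE: return "DOCTYPE";
            case Node.ENTITY_NODE:        return "entity";
            case Node.CDATA_SECTION_NODE: return "CDATA";
            default:                      return "other(" + type + ")";
        }
    }

    // Adds "type:name" for every node in a NamedNodeMap (attributes or DTD entities).
    static void addAll(NamedNodeMap map, List<String> out) {
        for (int i = 0; map != null && i < map.getLength(); i++) {
            Node n = map.item(i);
            out.add(typeName(n.getNodeType()) + ":" + n.getNodeName());
        }
    }

    // Depth-first walk recording each node, its attributes, and any declared entities.
    static void collect(Node node, List<String> out) {
        out.add(typeName(node.getNodeType()) + ":" + node.getNodeName());
        addAll(node.getAttributes(), out); // getAttributes() is null for non-element nodes
        if (node instanceof DocumentType) {
            addAll(((DocumentType) node).getEntities(), out);
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE car [<!ENTITY city \"Qingdao\">]>"
                + "<car id=\"1\"><!-- a comment -->Lu B1234, &city;<![CDATA[<raw>]]></car>";
        List<String> out = new ArrayList<>();
        collect(parse(xml), out);
        out.forEach(System.out::println);
    }
}
```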

XML syntax

1. Declaration: an optional XML declaration at the very start of the document, such as <?xml version="1.0" encoding="UTF-8"?>.

2. Root element: every document must have exactly one root element.

3. Tags: every tag must be closed, tag names are case-sensitive, and tags must be properly nested.

4. Attributes: attribute values must be enclosed in quotation marks.

5. Whitespace: whitespace in XML is preserved, whereas HTML collapses a run of whitespace into a single space.

6. Naming rules: names should be descriptive, and:

A) Names can contain letters, digits, and other characters.

B) Names must start with a letter or an underscore, not with a digit or other punctuation.

C) Names cannot start with the letters "xml" in any combination of case (xml, XML, Xml, and so on).

7. Names cannot contain spaces.

8. Avoid ":" in element names, because the colon is reserved for namespace prefixes.

9. Prefer child elements to attributes.

10. Namespaces: XML namespaces provide a way to avoid element naming conflicts.

11. CDATA: character data. Text inside a CDATA section is not parsed, so characters such as < and & do not need to be escaped there.

12. Entities: an entity reference has the form &name;, for example &lt; for < and &amp; for &.
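Several of these rules can be checked directly with the JDK parser: a string that breaks rule 2, 3, 4, or 6 B) is rejected with a SAXException, while entity references and CDATA sections come back as plain text. The sketch below uses only the standard javax.xml.parsers API; the sample strings are invented. Rules such as 8 and 9 are conventions that a non-namespace-aware parser does not enforce, so they are not tested here.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SyntaxRulesDemo {

    // Parses an XML string; a SAXException means the text is not well-formed XML.
    static Document parse(String xml) throws SAXException {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without also printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Text of the root element after entity references and CDATA are turned into plain text.
    static String rootText(String xml) {
        try {
            return parse(xml).getDocumentElement().getTextContent();
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<car><no>Lu A1234</no></car>",   // well-formed
            "<car><no>Lu A1234</car></no>",   // rule 3: tags not properly nested
            "<car></Car>",                    // rule 3: names are case-sensitive
            "<car plate=LuA1234/>",           // rule 4: attribute value not quoted
            "<car/><car/>",                   // rule 2: two root elements
            "<1car/>",                        // rule 6 B): name starts with a digit
        };
        for (String s : samples) {
            System.out.println(isWellFormed(s) + "  " + s);
        }
        System.out.println(rootText("<a>1 &lt; 2 &amp; 3</a>"));      // rule 12: entity references
        System.out.println(rootText("<a><![CDATA[1 < 2 & 3]]></a>")); // rule 11: no escaping in CDATA
    }
}
```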

Put a simple XML file in the src directory.

first.xml

Lu B1234
Qingdao, Shandong Province

Lu A1234
Jinan City, Shandong Province
1) DOM (JAXP Crimson parser)
DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy, and this hierarchy lets developers search the tree for specific information. Analyzing the structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is based on a hierarchy of information, DOM is considered tree-based or object-based. DOM, and tree-based processing in general, has several advantages. First, because the tree persists in memory, it can be modified, so an application can change both the data and the structure. Second, the application can navigate up and down the tree at any time, instead of processing the document in a single pass as SAX does. DOM is also much simpler to use.

```java
package test;

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class MyXmlReader {
    public static void main(String[] args) {
        try {
            File f = new File("src/first.xml");
            // 1. Create the DocumentBuilder factory
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // 2. Create a DocumentBuilder
            DocumentBuilder builder = factory.newDocumentBuilder();
            // 3. Parse the XML file into a Document object that represents the DOM tree
            Document doc = builder.parse(f);
            NodeList nl = doc.getElementsByTagName("value");
            for (int i = 0; i < nl.getLength(); i++) {
                // Assumed loop body: print the text content of each <value> element
                System.out.println(nl.item(i).getTextContent());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
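The program above needs src/first.xml on disk. As a self-contained sketch of the same three steps (factory, builder, parse), the class below parses an in-memory string instead. Only the value element name and the plate and address strings come from the sample file; the result, no, and addr element names are hypothetical, chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InMemoryXmlReader {

    // Hypothetical layout: <result>, <no> and <addr> are assumed names;
    // <value> and the text values come from the sample file.
    static final String SAMPLE = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<result>"
            + "<value><no>Lu B1234</no><addr>Qingdao, Shandong Province</addr></value>"
            + "<value><no>Lu A1234</no><addr>Jinan City, Shandong Province</addr></value>"
            + "</result>";

    // Parses the XML and returns one "plate | address" string per <value> element.
    static List<String> readRecords(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); // 1. factory
            DocumentBuilder builder = factory.newDocumentBuilder();                // 2. builder
            Document doc = builder.parse(new InputSource(new StringReader(xml)));  // 3. DOM tree
            NodeList values = doc.getElementsByTagName("value");
            List<String> records = new ArrayList<>();
            for (int i = 0; i < values.getLength(); i++) {
                Element value = (Element) values.item(i);
                // Search inside this <value> only, so each plate stays with its own address.
                String no = value.getElementsByTagName("no").item(0).getTextContent();
                String addr = value.getElementsByTagName("addr").item(0).getTextContent();
                records.add(no + " | " + addr);
            }
            return records;
        } catch (Exception e) {
            throw new IllegalStateException("could not read XML", e);
        }
    }

    public static void main(String[] args) {
        readRecords(SAMPLE).forEach(System.out::println);
    }
}
```

Calling getElementsByTagName on each value element, rather than indexing document-wide lists in parallel, keeps every plate number paired with the address from the same record.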
 
DOM has five basic object types: Document, Node, NodeList, Element, and Attr. Each is introduced below.
1.1 Document object
Represents the entire XML document. All other nodes are contained in the Document object in a defined order and arranged in a tree structure; by traversing this tree, a program can reach all the content of the XML document, so the Document is the starting point for any operation on it. A Document object is normally obtained by parsing an XML source file, and all further work starts from it. Document also provides methods for creating other nodes; for example, createAttribute() creates an Attr object. Its main methods are:
createAttribute(String): creates an Attr object with the given attribute name, which can then be attached to an Element object with the setAttributeNode method.
createElement(String): creates an Element object with the given tag name, representing a tag in the XML document. You can then add attributes to this Element or perform other operations on it.
createTextNode(String): creates a Text object containing the given string. A Text object represents the plain text contained in a tag or attribute. If a tag contains no other tags, the Text object holding the tag's text is the only child of that Element.
getElementsByTagName(String): returns a NodeList containing all elements with the given tag name, in document order.
getDocumentElement(): returns the Element object at the root of the DOM tree, that is, the root element of the XML document.
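The creation methods above can be combined to build a small document from scratch. This is a minimal sketch: the cars, car, and plate names are hypothetical, and DocumentBuilder.newDocument() is used to obtain an empty Document to start from.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class DocumentBuildDemo {

    // Builds <cars><car plate="Lu A1234">Jinan</car></cars> with Document's factory methods.
    static Document build() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("cars");  // createElement(String)
        doc.appendChild(root);                      // becomes the document (root) element

        Element car = doc.createElement("car");
        Attr plate = doc.createAttribute("plate");  // createAttribute(String)
        plate.setValue("Lu A1234");
        car.setAttributeNode(plate);                // attach the Attr to the element

        Text city = doc.createTextNode("Jinan");    // createTextNode(String)
        car.appendChild(city);                      // the Text is car's only child
        root.appendChild(car);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = build();
        Element root = doc.getDocumentElement();                       // getDocumentElement()
        System.out.println(root.getTagName());
        System.out.println(doc.getElementsByTagName("car").getLength()); // getElementsByTagName(String)
    }
}
```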
1.2 Node object
Node is the most basic object in the DOM structure and represents an abstract node in the document tree. In practice, Node itself is rarely used directly; instead, its subtypes such as Element, Attr, and Text are used to manipulate the document. Node provides an abstract, common root for all of these. Although Node defines methods for accessing child nodes, note that some Node subtypes, such as Text, have no child nodes. The main methods of Node are:
appendChild(org.w3c.dom.Node): adds a child node to this node, after all existing children. If the node being added is already in the tree, it is first removed from its current position and then added.
getFirstChild(): returns the first child node if there is one (otherwise null); correspondingly, getLastChild() returns the last child node.
getNextSibling(): returns the next sibling of this node in the DOM tree; correspondingly, getPreviousSibling() returns the previous sibling.
getNodeName(): returns the name of the node, which depends on the node type (for example, the tag name for an Element and "#text" for a Text node).
getNodeType(): returns the type of the node.
getNodeValue(): returns the value of the node, which also depends on the node type (for example, null for an Element and the text itself for a Text node).
hasChildNodes(): returns whether the node has any child nodes.
hasAttributes(): returns whether the node has any attributes.
getOwnerDocument(): returns the Document object that the node belongs to.
insertBefore(org.w3c.dom.Node newChild, org.w3c.dom.Node refChild): inserts newChild before the existing child refChild.
removeChild(org.w3c.dom.Node): removes the given child node.
replaceChild(org.w3c.dom.Node newChild, org.w3c.dom.Node oldChild): replaces the child oldChild with newChild.
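A minimal sketch of the navigation and editing methods above. It parses an invented <list> document with no whitespace between tags, so every child is an element, and the comments track the child order after each call.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeEditDemo {

    // Parses an XML string with default JAXP settings, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Walks the children with getFirstChild()/getNextSibling() and joins their names.
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(n.getNodeName());
        }
        return sb.toString();
    }

    // Applies insertBefore, replaceChild, removeChild and appendChild in turn.
    static String edit() {
        Document doc = parse("<list><a/><b/><c/></list>"); // no whitespace, so no text nodes
        Element list = doc.getDocumentElement();
        Node a = list.getFirstChild();
        Node c = list.getLastChild();
        list.insertBefore(doc.createElement("x"), a); // x,a,b,c
        list.replaceChild(doc.createElement("y"), c); // x,a,b,y
        list.removeChild(a.getNextSibling());         // removes b: x,a,y
        list.appendChild(a);                          // a is already a child, so it moves: x,y,a
        return childNames(list);
    }

    public static void main(String[] args) {
        System.out.println(edit());

        Document doc = parse("<p>hi</p>");
        Element p = doc.getDocumentElement();
        Node text = p.getFirstChild();
        System.out.println(p.getNodeName() + " " + p.getNodeValue());       // an Element's value is null
        System.out.println(text.getNodeName() + " " + text.getNodeValue()); // a Text node's value is its text
        System.out.println(p.hasChildNodes() + " " + text.hasChildNodes());
        System.out.println(text.getOwnerDocument() == doc);
    }
}
```

In a typical hand-indented file, the whitespace between tags becomes Text nodes, which getFirstChild() and getNextSibling() also return, so real traversal code usually checks getNodeType() before casting to Element.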
1.3 NodeList object
As the name suggests, a NodeList is an ordered list of nodes; it can be thought of simply as an array of Node objects. Its items are accessed with two methods:
getLength(): returns the number of nodes in the list.
item(int): returns the Node at the given index, or null if the index is out of range.
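The usual way to walk a NodeList is a loop bounded by getLength(). One detail worth knowing is that a NodeList returned by getElementsByTagName() is live: it reflects later changes to the document. The sketch below, using an invented <list>/<item> document, shows both, and removes nodes back to front so that the shrinking list does not cause items to be skipped.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListDemo {

    // Parses an XML string with default JAXP settings, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Standard NodeList loop: getLength() for the bound, item(i) for each node.
    static String texts(NodeList list) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.getLength(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(list.item(i).getTextContent());
        }
        return sb.toString();
    }

    // Removes every element with the given tag. The live list shrinks as nodes are
    // removed, so walking it backwards keeps the remaining indexes valid.
    static int removeAll(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        int removed = 0;
        for (int i = nodes.getLength() - 1; i >= 0; i--) {
            Node n = nodes.item(i);
            n.getParentNode().removeChild(n);
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse("<list><item>1</item><item>2</item><item>3</item></list>");
        NodeList items = doc.getElementsByTagName("item");
        System.out.println(texts(items) + " (length " + items.getLength() + ")");
        System.out.println(items.item(10));    // out of range: null, not an exception
        removeAll(doc, "item");
        System.out.println(items.getLength()); // the same live list now reports 0
    }
}
```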
1.4 Element object
Element represents a tag (element) in the XML document. It inherits from Node and is the most important subtype of Node. Because tags can carry attributes, Element has methods for accessing them, and any method defined on Node can also be used on an Element. Its main methods are:
getElementsByTagName(String): returns a NodeList of all descendant elements with the given tag name.
getTagName(): returns the tag name as a string.
getAttribute(String): returns the value of the attribute with the given name. Note that XML allows attribute values to contain entity references, and this method is not suitable for such attributes; in that case, use getAttributeNode() to obtain an Attr object and work with it further.
getAttributeNode(String): returns the Attr object for the attribute with the given name.
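The following sketch exercises the Element methods above on an invented document; the cars, car, plate, and city names are hypothetical. It also shows that getAttribute() returns an empty string for a missing attribute while getAttributeNode() returns null, and that Element.getElementsByTagName() searches only that element's descendants.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ElementDemo {

    static final String XML = "<cars>"
            + "<car plate=\"Lu B1234\"><city>Qingdao</city></car>"
            + "<car plate=\"Lu A1234\"><city>Jinan</city></car>"
            + "<city>outside any car</city>"
            + "</cars>";

    // Parses an XML string with default JAXP settings, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        Element first = (Element) doc.getElementsByTagName("car").item(0);

        System.out.println(first.getTagName());                      // car
        System.out.println(first.getAttribute("plate"));             // Lu B1234
        System.out.println("[" + first.getAttribute("color") + "]"); // missing attribute: []
        Attr plate = first.getAttributeNode("plate");                // the attribute as a node
        System.out.println(plate.getName() + "=" + plate.getValue());
        System.out.println(first.getAttributeNode("color"));         // missing attribute: null

        // Element.getElementsByTagName looks only at this element's descendants...
        System.out.println(first.getElementsByTagName("city").getLength()); // 1
        // ...whereas Document.getElementsByTagName searches the whole document.
        System.out.println(doc.getElementsByTagName("city").getLength());   // 3
    }
}
```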
1.5 Attr object
Attr represents an attribute of a tag. Attr inherits from Node, but because an attribute is contained in an Element rather than being one of its children, Attr is not part of the DOM tree, and getParentNode(), getPreviousSibling(), and getNextSibling() all return null for it. In other words, an Attr is treated as part of the Element that contains it rather than as a separate node in the DOM tree, which sets it apart from the other Node subtypes.
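A short sketch makes this Attr behaviour concrete: an attribute node has no parent or siblings, but it can still reach its element through getOwnerElement(), and an element's attributes are listed by getAttributes() rather than getChildNodes(). The car, plate, and city names are invented for the example.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttrDemo {

    static final String XML = "<car plate=\"Lu A1234\" city=\"Jinan\">text</car>";

    // Parses an XML string with default JAXP settings, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element car = parse(XML).getDocumentElement();
        Attr plate = car.getAttributeNode("plate");

        // An Attr is not a child of its element, so tree navigation returns null.
        System.out.println(plate.getParentNode());          // null
        System.out.println(plate.getPreviousSibling());     // null
        System.out.println(plate.getNextSibling());         // null
        // The way back to the element is getOwnerElement().
        System.out.println(plate.getOwnerElement() == car); // true
        // Attributes are not among the child nodes; only the text node is.
        System.out.println(car.getChildNodes().getLength()); // 1
        // They are reached through getAttributes() instead.
        NamedNodeMap attrs = car.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            System.out.println(a.getNodeName() + "=" + a.getNodeValue());
        }
    }
}
```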

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
