Java and XML Joint Programming: DOM

Source: Internet
Author: User
DOM Preliminary

DOM stands for Document Object Model. As mentioned earlier, XML organizes data into a tree, so the DOM is an object description of that tree. In plain terms, parsing an XML document builds a logical tree model of it, and the nodes of the tree are objects. By accessing these objects, we can access the contents of the XML document.

Let's look at a simple example of how to manipulate an XML document through the DOM.

Here is the XML document we want to manipulate:

<?xml version="1.0" encoding="UTF-8"?>
<messages>
  <message>good-bye serialization, Hello Java!</message>
</messages>


Next, we need to parse the contents of this document into Java objects for the program to use. With JAXP this takes only a few lines of code. First, we create a parser factory, which we then use to obtain a specific parser object:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();


The purpose of using DocumentBuilderFactory here is to write a program that is not tied to a specific parser. When the static method newInstance() of the DocumentBuilderFactory class is invoked, it determines which parser to use based on a system property. Because all parsers conform to the interfaces defined by JAXP, the code is the same regardless of which parser is used. To switch between parsers, you only need to change the value of the system property, without changing any code. This is the benefit the factory brings. The specific implementation of this factory pattern can be reviewed in the class diagram below.

DocumentBuilder db = dbf.newDocumentBuilder();


Once we have the factory object, we call its newDocumentBuilder() method to obtain a DocumentBuilder object that represents the specific DOM parser. Which vendor's parser it is, Microsoft's or IBM's, does not matter to the program.
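To see which implementation the factory lookup actually picked, a minimal sketch such as the following can help. The class name ParserLookup and its helper methods are made up for illustration; the printed class names depend on the JDK and classpath in use.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class: reports which concrete JAXP classes are in use.
class ParserLookup {
    /** Returns the name of the concrete factory class that newInstance() selected. */
    static String factoryClassName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Returns the name of the concrete parser class produced by that factory. */
    static String builderClassName() throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        // The application code only ever names the abstract JAXP types;
        // the concrete classes printed here vary by environment.
        System.out.println("factory: " + factoryClassName());
        System.out.println("builder: " + builderClassName());
    }
}
```

The system property consulted by newInstance() is named javax.xml.parsers.DocumentBuilderFactory, so a different factory can be selected at launch with a -D option without touching this code.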

We can then use this parser to parse the XML document:

Document doc = db.parse("C:/xml/message.xml");


DocumentBuilder's parse() method takes the location of an XML document as its input parameter and returns a Document object that represents the tree model of that XML document. From here on, operations on the XML document no longer involve the parser; you work directly on the Document object. The specific way to manipulate the document is defined by the DOM.


JAXP supports the DOM Level 2 recommendation from the W3C. If you are familiar with the DOM, what follows is simple: you just make method calls according to the DOM specification. If you are not, there is no need to worry; a detailed introduction follows later. What you need to remember here is that the DOM is a model for describing the data in an XML document, and the whole reason for introducing it is to use this model to manipulate that data. The DOM specification defines nodes (that is, objects), attributes, and methods, and we reach the XML data by accessing these nodes.

Starting with the Document object obtained above, we can begin our DOM tour. Calling the Document object's getElementsByTagName() method returns a NodeList object. A Node object represents a tag element in the XML document, and a NodeList, as its name suggests, represents a list of Node objects:

NodeList nl = doc.getElementsByTagName("message");


What we get through such a statement is a list of node objects that correspond to all <message> tags in an XML document. We can then use the item () method of the NodeList object to get each node object in the list:

Node my_node = nl.item(0);


When a Node object is created, the data stored in the XML document is extracted and encapsulated in the node. In this example, to extract the content of the message tag, we use the getNodeValue() method of the Node object:

String message = my_node.getFirstChild().getNodeValue();

Note that getFirstChild() is also used here to get the first child node under message. Although the message tag has no child tags or attributes, the call to getFirstChild() is still required, and the reason lies in how the W3C defines the DOM. The W3C defines the text inside a tag as a node of its own, so we must first get the node that represents the text before we can use getNodeValue() to read its content.
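Put together, the steps above can be sketched as one small program. To keep it self-contained, this sketch parses the XML from an in-memory string through an InputStream rather than from C:/xml/message.xml; the class name DomQuickStart and the method firstMessage are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class combining the parsing steps described in the text.
class DomQuickStart {
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<messages><message>good-bye serialization, Hello Java!</message></messages>";

    /** Parses the given XML and returns the text of the first <message> element. */
    static String firstMessage(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        // Assumption: the document comes from a string instead of a file on disk.
        Document doc = db.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nl = doc.getElementsByTagName("message");
        Node messageNode = nl.item(0);
        // The text is a child Text node of <message>, not the element's own value.
        return messageNode.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstMessage(XML));
    }
}
```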

Now that we've been able to extract the data from the XML file, we can use the data in the right place to build the application.

Below, we take a closer look at the DOM and describe it in more detail, so that it becomes easier to use.

DOM in Detail
1. Basic DOM Objects

There are five basic DOM objects: Document, Node, NodeList, Element, and Attr. The following is a general introduction to what these objects do and the methods they provide.

The Document object represents the entire XML file. All other nodes are contained in the Document object in a certain order and arranged in a tree structure, which the programmer can traverse to reach all of the content of the XML document. It is also the starting point for operating on an XML document: we always obtain a Document object by parsing the XML source file before doing anything else. In addition, Document contains methods for creating other nodes, such as createAttribute(), which creates an Attr object. Its main methods are:

createAttribute(String): Creates an Attr object with the given attribute name, which can then be attached to an Element object with the setAttributeNode() method.

createElement(String): Creates an Element object with the given tag name, representing a tag in the XML document. Attributes can then be added to the Element object, or other operations performed on it.

createTextNode(String): Creates a Text object with the given string. A Text object represents the plain text string contained in a tag or attribute. If a tag contains no other tags, the Text object for its text is the only child of the Element object.

getElementsByTagName(String): Returns a NodeList object that contains all elements with the given tag name.

getDocumentElement(): Returns an Element object that represents the root node of the DOM tree, that is, the root element of the XML document.
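As a rough illustration of these Document methods, the sketch below builds a tiny tree from scratch. It starts from an empty document obtained with DocumentBuilder.newDocument(), which the text above does not cover; the class name DocumentMethodsDemo and the lang attribute are made up for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class exercising the Document factory methods.
class DocumentMethodsDemo {
    /** Builds <messages><message lang="en">hello</message></messages> in memory. */
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();

        Element root = doc.createElement("messages");
        doc.appendChild(root);                       // becomes the document element

        Element message = doc.createElement("message");
        message.appendChild(doc.createTextNode("hello"));

        Attr lang = doc.createAttribute("lang");     // illustrative attribute name
        lang.setValue("en");
        message.setAttributeNode(lang);

        root.appendChild(message);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getElementsByTagName("message").getLength());
    }
}
```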

The Node object is the most basic object in the DOM structure and represents an abstract node in the document tree. In practice, Node itself is rarely used directly; documents are manipulated through its sub-objects such as Element, Attr, and Text. Node provides a common abstract base for these objects. Note that although the methods for accessing child nodes are defined on Node, some of its sub-objects, such as Text, never have child nodes. The main methods of the Node object are:

appendChild(org.w3c.dom.Node): Adds a child node to this node, placing it after all existing child nodes. If the child node is already in the tree, it is first removed from its current position and then added.

getFirstChild(): If the node has child nodes, returns the first one. Similarly, the getLastChild() method returns the last child node.

getNextSibling(): Returns the next sibling of this node in the DOM tree. Similarly, the getPreviousSibling() method returns its previous sibling.

getNodeName(): Returns the name of the node, which depends on the type of node.

getNodeType(): Returns the type of the node.

getNodeValue(): Returns the value of the node.

hasChildNodes(): Determines whether the node has child nodes.

hasAttributes(): Determines whether the node has attributes.

getOwnerDocument(): Returns the Document object that the node belongs to.

insertBefore(org.w3c.dom.Node newChild, org.w3c.dom.Node refChild): Inserts a child node before the given existing child node.

removeChild(org.w3c.dom.Node): Removes the given child node.

replaceChild(org.w3c.dom.Node newChild, org.w3c.dom.Node oldChild): Replaces the given child node with a new node.
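The following sketch walks through several of these Node methods on a small in-memory tree. The class NodeMethodsDemo, its helper childNames, and the element names a, b, c, d are all made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical demo class for child-list manipulation through the Node interface.
class NodeMethodsDemo {
    /** Joins the names of all children, walking with getFirstChild/getNextSibling. */
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    static String demo() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);

        Element a = doc.createElement("a");
        Element c = doc.createElement("c");
        root.appendChild(a);
        root.appendChild(c);                          // children: a,c

        root.insertBefore(doc.createElement("b"), c); // children: a,b,c
        root.replaceChild(doc.createElement("d"), a); // children: d,b,c
        root.removeChild(c);                          // children: d,b

        // Appending a node that is already a child moves it to the end.
        root.appendChild(root.getFirstChild());       // children: b,d
        return childNames(root);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // prints "b,d"
    }
}
```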

The NodeList object, as its name suggests, represents a list of zero or more Node objects. You can simply think of it as an array of nodes, whose elements are obtained with the following methods:

getLength(): Returns the length of the list.

item(int): Returns the Node object at the specified position.
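A short sketch of iterating over a NodeList with these two methods follows. The class name NodeListDemo and the three-element sample document are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical demo class: index-based iteration over a NodeList.
class NodeListDemo {
    static NodeList linkNodes() throws Exception {
        String xml = "<links><link/><link/><link/></links>"; // sample data
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName("link");
    }

    /** Counts the element nodes in a list using getLength() and item(int). */
    static int countElements(NodeList list) {
        int count = 0;
        for (int i = 0; i < list.getLength(); i++) {
            if (list.item(i).getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        NodeList links = linkNodes();
        System.out.println(countElements(links));  // prints 3
        // An index past the end of the list yields null rather than an exception.
        System.out.println(links.item(3) == null); // prints true
    }
}
```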

An Element object represents a tag element in the XML document. It inherits from Node and is the most important sub-object of Node. Because a tag can carry attributes, the Element object has methods for accessing them, and any method defined in Node can also be used on an Element object.

getElementsByTagName(String): Returns a NodeList object that contains the elements with the given tag name among the descendants of this element.

getTagName(): Returns a String with the name of the tag.

getAttribute(String): Returns the value of the attribute with the given name. Note that XML allows entity references inside attribute values, and this method is not suited to such attributes. In that case, use the getAttributeNode() method to obtain an Attr object for further processing.

getAttributeNode(String): Returns the Attr object that represents the attribute with the given name.
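The difference between getAttribute() and getAttributeNode() can be sketched as follows. The class name ElementMethodsDemo is made up for illustration, and the one-element sample document is an assumption, not a file from the article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: reading a tag name and attributes from an Element.
class ElementMethodsDemo {
    static Element urlElement() throws Exception {
        String xml = "<url newwindow=\"No\">http://java.sun.com</url>"; // sample data
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element url = urlElement();
        System.out.println(url.getTagName());                // prints "url"
        System.out.println(url.getAttribute("newwindow"));   // prints "No"

        // A missing attribute gives an empty string from getAttribute(),
        // but null from getAttributeNode().
        System.out.println(url.getAttribute("missing").isEmpty());   // prints true
        System.out.println(url.getAttributeNode("missing") == null); // prints true

        Attr attr = url.getAttributeNode("newwindow");
        System.out.println(attr.getName() + "=" + attr.getValue());  // prints "newwindow=No"
    }
}
```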

The Attr object represents an attribute of a tag. Attr inherits from Node, but because an attribute is contained in an Element rather than being one of its children, Attr is not part of the DOM tree. As a result, the Node methods getParentNode(), getPreviousSibling(), and getNextSibling() all return null for an Attr. In other words, an Attr is treated as part of the Element object that contains it and does not appear as a separate node in the DOM tree. Keep this difference from the other Node sub-objects in mind when using it.
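This behaviour is easy to check with a small sketch. The class name AttrDemo and the sample element are made up for illustration; getOwnerElement() is the DOM Level 2 method on Attr that leads back to the containing element and is not otherwise covered in this article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;

// Hypothetical demo class: an Attr has no parent or siblings in the tree.
class AttrDemo {
    static Attr sampleAttr() throws Exception {
        String xml = "<url newwindow=\"No\">http://java.sun.com</url>"; // sample data
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getAttributeNode("newwindow");
    }

    public static void main(String[] args) throws Exception {
        Attr attr = sampleAttr();
        System.out.println(attr.getParentNode() == null);      // prints true
        System.out.println(attr.getPreviousSibling() == null); // prints true
        System.out.println(attr.getNextSibling() == null);     // prints true
        // The containing element is reached through getOwnerElement() instead.
        System.out.println(attr.getOwnerElement().getTagName()); // prints "url"
    }
}
```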

Note that the DOM objects described above are all defined as interfaces in the DOM, written in IDL, which is independent of any particular programming language. As a result, the DOM can be implemented in any object-oriented language, as long as the implementation provides the interfaces and functionality the DOM defines. Also, some members are not defined as methods in the DOM but as IDL attributes, which are mapped to corresponding methods in a specific language binding. For example, the IDL attribute nodeName is exposed in Java as the getNodeName() method.

2. DOM Example

After the introduction above, you should have a better understanding of the DOM. The following example will make you more familiar with it.

Let's first describe what this example does. We want to store some URLs in a file named links.xml. With a simple program we can read and display these URLs through the DOM, or, in the other direction, write a newly added URL into the XML file. It is simple but practical, and it is enough to illustrate most uses of the DOM.

The XML file itself is not complex, and no DTD is given for it. links.xml:


<?xml version="1.0" standalone="yes"?>
<links>
  <link>
    <text>JSP Insider</text>
    <url newwindow="No">http://www.jspinsider.com</url>
    <author>JSP Insider</author>
    <date>
      <day>2</day>
      <month>1</month>
      <year>2001</year>
    </date>
    <description>A JSP information site.</description>
  </link>
  <link>
    <text>The makers of Java</text>
    <url newwindow="No">http://java.sun.com</url>
    <author>Sun Microsystems</author>
    <date>
      <day>3</day>
      <month>1</month>
      <year>2001</year>
    </date>
    <description>Sun Microsystem's website.</description>
  </link>
  <link>
    <text>The standard JSP container</text>
    <url newwindow="No">http://jakarta.apache.org</url>
    <author>Apache Group</author>
    <date>
      <day>4</day>
      <month>1</month>
      <year>2001</year>
    </date>
    <description>Some great software.</description>
  </link>
</links>

We call the first program Xmldisplay.java; the full listing can be found in the attachment. Its main function is to read the contents of each node in this XML file and print them, formatted, to System.out. Let's take a look at the program:

import javax.xml.parsers.*;
import org.w3c.dom.*;

These lines import the necessary classes. Because the program uses the XML parser provided by Sun, it imports the javax.xml.parsers package, which provides the factory and parser classes for DOM and SAX parsing. The org.w3c.dom package defines the DOM interfaces established by the W3C.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("links.xml");
doc.normalize();

Beyond the steps already covered, there is one small trick here: calling normalize() on the Document object. It puts the text nodes of the DOM tree into a normal form by merging adjacent text nodes and removing empty ones. Without it you may get a DOM tree that is not what you expect, which matters especially when producing output. Note, however, that normalize() does not remove the whitespace-only text nodes that line breaks and indentation between tags create; those remain in the tree.
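The effect of normalize() can be observed with a minimal sketch like the one below, which counts child nodes before and after the call. The class name NormalizeDemo, its helper methods, and the sample data are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: what normalize() does and does not change.
class NormalizeDemo {
    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    /** Child counts {before, after} for an element holding two adjacent text nodes. */
    static int[] adjacentTextCounts() throws Exception {
        Document doc = newBuilder().newDocument();
        Element e = doc.createElement("text");
        doc.appendChild(e);
        e.appendChild(doc.createTextNode("JSP "));
        e.appendChild(doc.createTextNode("Insider"));
        int before = e.getChildNodes().getLength();
        doc.normalize();                    // merges the two text nodes into one
        int after = e.getChildNodes().getLength();
        return new int[] {before, after};
    }

    /** Child counts {before, after} for a root whose children are separated by indentation. */
    static int[] whitespaceCounts() throws Exception {
        String xml = "<links>\n  <link/>\n</links>";
        Document doc = newBuilder().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        int before = root.getChildNodes().getLength();
        doc.normalize();                    // whitespace-only text nodes survive
        int after = root.getChildNodes().getLength();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws Exception {
        int[] a = adjacentTextCounts();
        int[] w = whitespaceCounts();
        System.out.println(a[0] + " -> " + a[1]); // prints "2 -> 1"
        System.out.println(w[0] + " -> " + w[1]); // prints "3 -> 3"
    }
}
```

The second result is why the article's programs reach child elements through getElementsByTagName() instead of relying on child positions.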

NodeList links = doc.getElementsByTagName("link");

As just mentioned, whitespace in an XML document is also mapped to objects in the DOM tree. As a result, calling Node's getChildNodes() method directly can be problematic and sometimes does not return the NodeList you expect. The solution is to use Element's getElementsByTagName(String), which returns the expected NodeList. You can then use the item() method to extract the desired element.

for (int i = 0; i < links.getLength(); i++) {
    Element link = (Element) links.item(i);
    System.out.print("Content:");
    System.out.println(link.getElementsByTagName("text").item(0).getFirstChild().getNodeValue());
    System.out.print("URL:");
    System.out.println(link.getElementsByTagName("url").item(0).getFirstChild().getNodeValue());
    System.out.print("Author:");
    System.out.println(link.getElementsByTagName("author").item(0).getFirstChild().getNodeValue());
    System.out.print("Date:");
    Element linkdate = (Element) link.getElementsByTagName("date").item(0);
    String day = linkdate.getElementsByTagName("day").item(0).getFirstChild().getNodeValue();
    String month = linkdate.getElementsByTagName("month").item(0).getFirstChild().getNodeValue();
    String year = linkdate.getElementsByTagName("year").item(0).getFirstChild().getNodeValue();
    System.out.println(day + "-" + month + "-" + year);
    System.out.print("Description:");
    System.out.println(link.getElementsByTagName("description").item(0).getFirstChild().getNodeValue());
    System.out.println();
}

The code above completes the formatted output of the XML document's content. As long as you pay attention to a few details, such as the use of the getFirstChild() and getElementsByTagName() methods, this is relatively easy.
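The repeated getElementsByTagName(...).item(0).getFirstChild().getNodeValue() chain could be wrapped in a small helper that also tolerates a missing or empty element. This is only a sketch of one possible refactoring, not part of the original program; the class LinkReader and the methods childText and describe are hypothetical, and the sample XML is a shortened version of one link entry.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, not part of the article's Xmldisplay.java.
class LinkReader {
    /** Text of the first descendant element named tag, or "" if it is absent or empty. */
    static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        if (list.getLength() == 0) {
            return "";
        }
        Node first = list.item(0).getFirstChild();
        return first == null ? "" : first.getNodeValue();
    }

    /** Formats the first <link> of the given XML as "text | url | day-month-year". */
    static String describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element link = (Element) doc.getElementsByTagName("link").item(0);
        Element date = (Element) link.getElementsByTagName("date").item(0);
        return childText(link, "text") + " | " + childText(link, "url") + " | "
            + childText(date, "day") + "-" + childText(date, "month") + "-"
            + childText(date, "year");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<links><link><text>JSP Insider</text>"
            + "<url newwindow=\"No\">http://www.jspinsider.com</url>"
            + "<date><day>2</day><month>1</month><year>2001</year></date>"
            + "<description/></link></links>";
        System.out.println(describe(xml));
    }
}
```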

Next comes the question of writing the DOM tree back to the XML document after modifying it. This program is named Xmlwrite.java. JAXP 1.0 has no classes or methods that directly handle writing XML documents, so helper classes from other packages are needed. JAXP 1.1 introduced support for XSLT, which transforms an XML document into a new document structure. With this addition, we can easily write a newly created or modified DOM tree back to an XML file. Let's look at the code, whose main function is to add a new link node to the links.xml file:

import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;

Several classes from the newly introduced javax.xml.transform package are used to handle XSLT transformations.

We want to add a new link node to the XML file above, so we start by reading the links.xml file and building a DOM tree, then modify the DOM tree (by adding nodes), and finally write the modified DOM back into the links.xml file:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("links.xml");
doc.normalize();
//---Get variable----
String text = "Hanzhong's homepage";
String url = "www.hzliu.com";
String author = "Hzliu Liu";
String discription = "A site from Hanzhong Liu, give u lots of suprise!!!";

To keep the focus on the main point and simplify the program, we hard-code the content to be added as String objects. In practice, a user interface is often used to collect user input, or the content is fetched from a database through JDBC.

Text textseg;
Element link = doc.createElement("link");

The first thing to understand is that every type of node, whether Text, Attr, or Element, is created through a createXXX() method of the Document object (where XXX is the type to create). So to add a link item to the XML document, we first create a link element and then create and attach its child elements:

Element linktext = doc.createElement("text");
textseg = doc.createTextNode(text);
linktext.appendChild(textseg);
link.appendChild(linktext);

Element linkurl = doc.createElement("url");
textseg = doc.createTextNode(url);
linkurl.appendChild(textseg);
link.appendChild(linkurl);

Element linkauthor = doc.createElement("author");
textseg = doc.createTextNode(author);
linkauthor.appendChild(textseg);
link.appendChild(linkauthor);

java.util.Calendar rightnow = java.util.Calendar.getInstance();
String day = Integer.toString(rightnow.get(java.util.Calendar.DAY_OF_MONTH));
// Calendar.MONTH is zero-based (January is 0), so add 1 for a calendar month number
String month = Integer.toString(rightnow.get(java.util.Calendar.MONTH) + 1);
String year = Integer.toString(rightnow.get(java.util.Calendar.YEAR));

Element linkdate = doc.createElement("date");
Element linkdateday = doc.createElement("day");
textseg = doc.createTextNode(day);
linkdateday.appendChild(textseg);
Element linkdatemonth = doc.createElement("month");
textseg = doc.createTextNode(month);
linkdatemonth.appendChild(textseg);
Element linkdateyear = doc.createElement("year");
textseg = doc.createTextNode(year);
linkdateyear.appendChild(textseg);
linkdate.appendChild(linkdateday);
linkdate.appendChild(linkdatemonth);
linkdate.appendChild(linkdateyear);
link.appendChild(linkdate);

Element linkdiscription = doc.createElement("description");
textseg = doc.createTextNode(discription);
linkdiscription.appendChild(textseg);
link.appendChild(linkdiscription);

The process of creating nodes is somewhat repetitive, but note one thing about the text contained in an element (in the DOM, text is also a node, so a corresponding node must be created for it): you cannot set this text directly with the setNodeValue() method of the Element object. Instead, you set the text on the Text object you created and then add that Text node to the element, so that the element and its text content both end up in the DOM tree. Look at the previous code and you will understand this better:

doc.getDocumentElement().appendChild(link);

Finally, don't forget to add the created node to the DOM tree. The getDocumentElement() method of the Document interface returns the Element object that represents the root node of the document. In an XML document, the root element must be unique.
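The point about setNodeValue() can be checked with a small sketch: calling it on an Element leaves the element without any text, while the Text node carries the value. The class name SetTextDemo and the sample strings are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Hypothetical demo class: element text lives in a child Text node.
class SetTextDemo {
    /** Returns {element still empty after setNodeValue?, first text, updated text}. */
    static String[] demo() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element author = doc.createElement("author");
        doc.appendChild(author);

        // For an Element the node value is defined as null; setting it has no effect.
        author.setNodeValue("ignored");
        String stillEmpty = String.valueOf(!author.hasChildNodes());

        // The text must be created as its own node and appended.
        Text text = doc.createTextNode("Hzliu Liu");
        author.appendChild(text);
        String first = author.getFirstChild().getNodeValue();

        // Later changes go through the Text node's setNodeValue().
        text.setNodeValue("Hanzhong Liu");
        String updated = author.getFirstChild().getNodeValue();

        return new String[] {stillEmpty, first, updated};
    }

    public static void main(String[] args) throws Exception {
        for (String s : demo()) {
            System.out.println(s);
        }
    }
}
```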

TransformerFactory tfactory = TransformerFactory.newInstance();
Transformer transformer = tfactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new java.io.File("links.xml"));
transformer.transform(source, result);

The DOM tree is then written out using XSLT. TransformerFactory also applies the factory pattern, so the code is independent of the specific transformer implementation. It works the same way as DocumentBuilderFactory, so we will not repeat the details here. The transform() method of the Transformer class takes two parameters: a data source (Source) and an output target (Result). Here DOMSource and StreamResult are used respectively, so the contents of the DOM can be written to an output stream; when the output stream is a file, the contents of the DOM are written to that file.
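Because StreamResult also accepts a Writer, the same transform() call can serialize the DOM to a string, which is handy for inspecting the result without overwriting a file. The following is a minimal sketch under that assumption; the class DomSerializer and its methods are made up for illustration, and the XML declaration is omitted through an output property only to keep the output short.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: serializing a DOM tree to a string.
class DomSerializer {
    /** Serializes the document with an identity transformation. */
    static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Builds a small sample tree: <links><link>x</link></links>. */
    static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element links = doc.createElement("links");
        doc.appendChild(links);
        Element link = doc.createElement("link");
        link.appendChild(doc.createTextNode("x"));
        links.appendChild(link);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(sample()));
    }
}
```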

