Java and XML Joint Programming: DOM

Source: Internet
Author: User
Tags xml parser xslt

Android applications often call web service (WS) interfaces, and these interfaces generally support the XML format. Being able to parse XML in Java is therefore very important, so here is a more detailed article on the subject:

---- Article ----
Source: http://www0.ccidnet.com/tech/guide/2001/10/08/58_3393.html
DOM preliminaries

DOM is short for Document Object Model. As mentioned above, XML organizes data into a tree, and DOM is an object description of that tree. Put simply, parsing an XML document builds a logical tree model of it, and every node of the tree is an object. By accessing these objects we can access the content of the XML document.

Let's look at a simple example of how to work with an XML document through DOM.

This is the XML document we want to work with:

<?xml version="1.0" encoding="UTF-8"?>
<messages>
<message>good-bye serialization, hello Java!</message>
</messages>

Next, we need to parse the content of this document into Java objects that the program can use. With JAXP this takes only a few lines of code. First, we create a parser factory, which we then use to obtain a concrete parser object:

DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

The purpose of using DocumentBuilderFactory here is to keep the program independent of any specific parser. When the static method newInstance() of the DocumentBuilderFactory class is called, it decides which parser to use based on a system property. Because all parsers implement the interfaces defined by JAXP, the code is the same no matter which parser is used. To switch between parsers you only change the value of the system property, not the code. This is the benefit of the factory. For the specific implementation of this factory pattern, see the class diagram below.

DocumentBuilder db = dbf.newDocumentBuilder();

Once you have a factory object, you call its newDocumentBuilder() method (an instance method, not a static one) to obtain a DocumentBuilder object, which represents a concrete DOM parser. Which parser it is, Microsoft's or IBM's, does not matter to the program.
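As a rough illustration of the factory lookup, the sketch below prints which concrete parser class JAXP handed out. The class name FactoryLookup and its helper method are hypothetical; the system property javax.xml.parsers.DocumentBuilderFactory is the standard one that newInstance() consults first.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

class FactoryLookup {
    /** Name of the system property JAXP checks first when choosing a factory. */
    static final String PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";

    /** Returns the name of the concrete parser class the default factory hands out. */
    static String builderClassName() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder(); // called on the factory instance
            return db.getClass().getName();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable DOM parser", e);
        }
    }

    public static void main(String[] args) {
        // Prints null unless the property was set, for example with -D on the command line.
        System.out.println(PROPERTY + " = " + System.getProperty(PROPERTY));
        System.out.println("parser in use: " + builderClassName());
    }
}
```

The program itself only ever refers to the JAXP types; the implementation class shows up only because the sketch asks for its name.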

Then we can use this parser to parse the XML document:

Document doc = db.parse("C:/XML/message.xml");

The parse() method of DocumentBuilder accepts the location of an XML document as its input parameter and returns a Document object, which represents the tree model of the XML document. From this point on, all operations on the XML document are independent of the parser. You work directly on the Document object, and the available operations are defined by DOM.
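The three steps so far (factory, builder, parse) can be put together in one small program. This is only a sketch: to stay runnable without a file on disk it parses an in-memory string through an InputSource, which is an assumption of the example rather than what the article does, and the class name ParseSketch is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParseSketch {
    // The sample document from the text, held in memory for the demo.
    static final String XML =
            "<messages><message>good-bye serialization, hello Java!</message></messages>";

    /** Runs the three JAXP steps: obtain a factory, obtain a builder, parse. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // ParserConfigurationException, SAXException or IOException
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // The Document is the entry point to the tree; print the root element name.
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```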


JAXP supports DOM Level 2 as recommended by the W3C. If you are familiar with DOM, what follows is very simple: you only need to call the methods as the DOM specification describes. If you are not yet clear about DOM, don't worry, we will introduce it in detail later. For now, the thing to know and remember is that DOM is a model for describing the data in an XML document, and the whole reason for introducing DOM is to use this model to work with that data. The DOM specification defines nodes (objects), attributes, and methods, and we use these nodes to access the XML data.

Starting from the Document object obtained above, we can begin our DOM journey. The getElementsByTagName() method of the Document object returns a NodeList object. A Node object here represents a tag element in the XML document, and a NodeList, as its name suggests, represents a list of Node objects:

NodeList nl = doc.getElementsByTagName("message");

This statement returns a list of the Node objects that correspond to all <message> tags in the XML document. We can then use the item() method of the NodeList object to get each Node object in the list:

Node my_node = nl.item(0);

Once a Node object has been created, the data stored in the XML file has been extracted and encapsulated in that node. In this example, to extract the content of the message tag we use the getNodeValue() method of the Node object:

String message = my_node.getFirstChild().getNodeValue();

Note that getFirstChild() is used here to obtain the first child node of message. Although the message tag contains nothing but text, with no child tags or attributes, we still have to call getFirstChild(). This follows from the W3C definition of DOM, which treats the text inside a tag as a node of its own. We therefore need to get the node that represents the text before we can call getNodeValue() to read the text content.
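The difference between asking the element and asking its Text child for a value can be checked with a short sketch. The class TextNodeSketch, its helper methods, and the shortened sample string are hypothetical; the DOM calls are the ones named above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextNodeSketch {
    static final String XML = "<messages><message>hello Java</message></messages>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Text of the first <message>, read through its Text child node. */
    static String firstMessage(Document doc) {
        Node message = doc.getElementsByTagName("message").item(0);
        return message.getFirstChild().getNodeValue();
    }

    /** getNodeValue() called on the element itself; DOM defines this as null. */
    static String elementValue(Document doc) {
        return doc.getElementsByTagName("message").item(0).getNodeValue();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("element value: " + elementValue(doc));
        System.out.println("text child value: " + firstMessage(doc));
    }
}
```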

Now that we can extract data from an XML file, we can use that data to build an application.

In what follows we look more closely at DOM and analyze it in more detail, so that we can use it with more confidence.

DOM in detail

1. Basic DOM objects

DOM has five basic objects: Document, Node, NodeList, Element, and Attr. The following describes what these objects do and the methods they offer.

The Document object represents the entire XML document. All other nodes are contained in the Document object in a definite order and arranged into a tree structure. Programmers can traverse this tree to reach all the content of the XML document, so the Document object is also the starting point for operations on the document. We always obtain a Document object by parsing the XML source file and then carry out further operations on it. In addition, Document contains methods for creating other nodes, such as createAttribute() for creating an Attr object. Its main methods include:

createAttribute(String): creates an Attr object with the given attribute name, which can then be placed on an Element object with the setAttributeNode() method.

createElement(String): creates an Element object with the given tag name, representing a tag in the XML document. You can then add attributes to this Element object or perform other operations on it.

createTextNode(String): creates a Text object with the given string. A Text object represents the plain text contained in a tag or attribute. If a tag contains no other tags, the Text object for its text is the only child of the Element object.

getElementsByTagName(String): returns a NodeList object that contains all the elements with the given tag name.

getDocumentElement(): returns the Element object that is the root node of the DOM tree, that is, the object representing the root element of the XML document.
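The creation methods listed above can be tried on an empty document. The sketch below is illustrative only: the class DocumentMethodsSketch is hypothetical, and starting from DocumentBuilder.newDocument() instead of a parsed file is an assumption made so the example runs on its own.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DocumentMethodsSketch {
    /** Builds <links><url newwindow="no">http://java.sun.com</url></links> in memory. */
    static Document build() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();

            Element root = doc.createElement("links");
            doc.appendChild(root);                      // becomes the document element

            Element url = doc.createElement("url");
            Attr newWindow = doc.createAttribute("newwindow");
            newWindow.setValue("no");
            url.setAttributeNode(newWindow);            // attach the Attr to its element
            url.appendChild(doc.createTextNode("http://java.sun.com"));
            root.appendChild(url);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable DOM parser", e);
        }
    }

    public static void main(String[] args) {
        Document doc = build();
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(doc.getElementsByTagName("url").getLength());
    }
}
```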

The Node object is the most basic object in the DOM structure and represents an abstract node in the document tree. In practice the Node object itself is rarely used directly; instead, its sub-objects such as Element, Attr, and Text are used to work with the document. Node provides an abstract, common root for these objects. Although Node defines the methods for accessing child nodes, note that some Node sub-objects, such as Text, cannot have child nodes. The main methods of the Node object include:

appendChild(org.w3c.dom.Node): adds a child node to this node, placing it after all existing child nodes. If the child node is already in the tree, it is first removed from its current position and then added.

getFirstChild(): returns the first child node if the node has children. The counterpart getLastChild() returns the last child node.

getNextSibling(): returns the next sibling of this node in the DOM tree. The counterpart getPreviousSibling() returns the previous sibling.

getNodeName(): returns the name of the node, which depends on the node type.

getNodeType(): returns the type of the node.

getNodeValue(): returns the value of the node.

hasChildNodes(): determines whether the node has any child nodes.

hasAttributes(): determines whether the node has any attributes.

getOwnerDocument(): returns the Document object that the node belongs to.

insertBefore(org.w3c.dom.Node new, org.w3c.dom.Node ref): inserts a new child node before the given child node.

removeChild(org.w3c.dom.Node): removes the given child node.

replaceChild(org.w3c.dom.Node new, org.w3c.dom.Node old): replaces the given child node with a new Node object.
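A short sketch can make the tree-editing methods above concrete. The class NodeMethodsSketch and the element names a, b, c, d are hypothetical; each comment shows the child order after that call.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeMethodsSketch {
    /** Walks the children with getFirstChild()/getNextSibling() and joins their names. */
    static String childNames(Node parent) {
        StringBuilder names = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (names.length() > 0) {
                names.append(',');
            }
            names.append(n.getNodeName());
        }
        return names.toString();
    }

    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);

            Element a = doc.createElement("a");
            Element c = doc.createElement("c");
            root.appendChild(a);                            // a
            root.appendChild(c);                            // a,c
            root.insertBefore(doc.createElement("b"), c);   // a,b,c
            root.replaceChild(doc.createElement("d"), a);   // d,b,c
            root.removeChild(c);                            // d,b
            root.appendChild(root.getFirstChild());         // existing child is moved: b,d
            return childNames(root);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable DOM parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The last call shows the rule stated for appendChild(): a node that is already in the tree is moved, not duplicated.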

The NodeList object, as its name implies, represents a list of zero or more nodes. We can simply think of it as an array of nodes. The elements of the list are accessed through the following methods:

getLength(): returns the length of the list.

item(int): returns the Node object at the specified position.
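Since NodeList offers only getLength() and item(int), it is usually walked with an index loop. A minimal sketch, in which the class NodeListSketch, its helpers, and the sample tags r and t are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NodeListSketch {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Collects the text of every element with the given tag name. */
    static List<String> texts(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // Assumes each matched element has a Text node as its first child.
            result.add(nodes.item(i).getFirstChild().getNodeValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse("<r><t>a</t><t>b</t></r>");
        System.out.println(texts(doc, "t"));
        // A tag name with no matches gives an empty list, not null.
        System.out.println(doc.getElementsByTagName("missing").getLength());
    }
}
```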

The Element object represents a tag element in the XML document. It inherits from Node and is the most important sub-object of Node. Tags can contain attributes, so the Element object has methods for accessing them, and every method defined in Node can also be used on an Element object.

getElementsByTagName(String): returns a NodeList object that contains the elements with the given tag name among the descendants of this element.

getTagName(): returns a string that represents the tag name.

getAttribute(String): returns the value of the attribute with the given name. Note that XML allows entity references in attribute values, and this method is not suitable for such attributes. In that case, use the getAttributeNode() method to obtain an Attr object for further processing.

getAttributeNode(String): returns the Attr object that represents the attribute with the given name.
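The attribute accessors behave slightly differently when an attribute is missing, which a small sketch can show. The class ElementAttrSketch and the attribute name target are hypothetical; the url element with its newwindow attribute mirrors the sample data used later in the article.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ElementAttrSketch {
    static final String XML =
            "<link><url newwindow=\"no\">http://java.sun.com</url></link>";

    /** Parses the sample and returns its first <url> element. */
    static Element firstUrl() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return (Element) doc.getElementsByTagName("url").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element url = firstUrl();
        System.out.println(url.getTagName());
        System.out.println(url.getAttribute("newwindow"));
        // A missing attribute gives an empty string from getAttribute() ...
        System.out.println(url.getAttribute("target").isEmpty());
        // ... and null from getAttributeNode().
        System.out.println(url.getAttributeNode("target") == null);
        System.out.println(url.getAttributeNode("newwindow").getValue());
    }
}
```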

The Attr object represents an attribute of a tag. Attr inherits from Node, but because an attribute is really contained in its Element, it is not considered a child of the Element, and so in DOM an Attr is not part of the DOM tree. As a result, getParentNode(), getPreviousSibling(), and getNextSibling() all return null for an Attr. In other words, an Attr is regarded as part of its Element object and does not appear as a separate node in the DOM tree. This distinguishes it from the other Node sub-objects.
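That an Attr sits outside the tree can be observed directly. In this sketch the class AttrSketch is hypothetical, and getOwnerElement(), a DOM Level 2 method not mentioned in the article, is used to show how the owning element can still be reached.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AttrSketch {
    /** Parses a one-element document and returns its newwindow attribute node. */
    static Attr newWindowAttr() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader("<url newwindow=\"no\">x</url>")));
            return doc.getDocumentElement().getAttributeNode("newwindow");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Attr attr = newWindowAttr();
        // Tree navigation from an Attr leads nowhere.
        System.out.println(attr.getParentNode());
        System.out.println(attr.getNextSibling());
        // The element that carries the attribute is available separately.
        System.out.println(attr.getOwnerElement().getTagName());
    }
}
```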

Note that the DOM objects described above are defined in DOM as interfaces, using IDL, which is independent of any specific programming language. DOM can therefore be implemented in any object-oriented language, as long as the implementation provides the interfaces and functions that DOM defines. Also, some of the methods above are not defined as methods in DOM but are expressed as IDL attributes. When mapped to a specific language, these attributes become the corresponding methods.

2. A DOM example

With the introduction above, you should now have a better understanding of DOM. The following example will make you more familiar with it.

First, what the example does. We want to build a small project named Link, in which some URLs are saved in an XML file. A simple program reads and displays these URLs through DOM, and can also write newly added URLs back to the XML file. It is very simple but very practical, and it is enough to demonstrate most uses of DOM.

The XML file is not complex, so its DTD is not given. links.xml:

<?xml version="1.0" standalone="yes"?>
<links>
  <link>
    <text>JSP insider</text>
    <url newwindow="no">http://www.jspinsider.com</url>
    <author>JSP insider</author>
    <date>
      <day>2</day>
      <month>1</month>
      <year>2001</year>
    </date>
    <description>a JSP information site.</description>
  </link>
  <link>
    <text>the makers of Java</text>
    <url newwindow="no">http://java.sun.com</url>
    <author>Sun Microsystems</author>
    <date>
      <day>3</day>
      <month>1</month>
      <year>2001</year>
    </date>
    <description>Sun Microsystem's website.</description>
  </link>
  <link>
    <text>the standard JSP Container</text>
    <url newwindow="no">http://jakarta.apache.org</url>
    <author>Apache group</author>
    <date>
      <day>4</day>
      <month>1</month>
      <year>2001</year>
    </date>
    <description>some great software.</description>
  </link>
</links>

The first program is called xmldisplay.java; the full listing can be found in the attachment. Its main job is to read the content of each node in the XML file and print it, formatted, to System.out. Let's take a look at the program:

import javax.xml.parsers.*;
import org.w3c.dom.*;

These lines import the necessary classes. Because we use the XML parser provided by Sun here, we need the javax.xml.parsers package, which provides the classes for obtaining the DOM parser and the SAX parser. The org.w3c.dom package defines the DOM interfaces developed by the W3C.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("links.xml");
doc.normalize();

Apart from the steps already described, there is one small trick here: calling normalize() on the Document object. It merges adjacent Text nodes and removes empty ones, so that the text in the DOM tree is in a predictable form. Note that it does not remove the whitespace-only Text nodes that come from the indentation and line breaks used to format the XML document. Those stay in the DOM tree, so the tree you get may still not be what you expect. Calling normalize() is especially useful before output.
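A small sketch of what normalize() does at the Text-node level: it merges adjacent Text nodes into one. The class NormalizeSketch is hypothetical, and the document is built by hand (an assumption of the example) because a freshly parsed document rarely contains adjacent Text nodes.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NormalizeSketch {
    /** Returns the child count of <message> before and after normalize(). */
    static int[] childCountsBeforeAndAfter() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element message = doc.createElement("message");
            doc.appendChild(message);

            // Two separate Text nodes side by side, as can happen after edits.
            message.appendChild(doc.createTextNode("hello, "));
            message.appendChild(doc.createTextNode("Java"));
            int before = message.getChildNodes().getLength();

            doc.normalize(); // merges the adjacent Text nodes
            int after = message.getChildNodes().getLength();
            return new int[] {before, after};
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable DOM parser", e);
        }
    }

    public static void main(String[] args) {
        int[] counts = childCountsBeforeAndAfter();
        System.out.println("before: " + counts[0] + ", after: " + counts[1]);
    }
}
```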

NodeList links = doc.getElementsByTagName("link");

As mentioned earlier, the whitespace in the XML document is also mapped into the DOM tree as objects. Calling Node's getChildNodes() method directly can therefore cause problems, because the NodeList it returns may not be what you expect. The solution is to use getElementsByTagName(String) of Element, which returns the expected NodeList. You can then use the item() method to pick out the desired element.
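The whitespace effect is easy to reproduce. In the sketch below, the class WhitespaceSketch and the two-link sample are hypothetical; the sample is indented with line breaks so that the parser has whitespace to turn into Text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WhitespaceSketch {
    // Indented on purpose: the line breaks and spaces become Text nodes.
    static final String XML = "<links>\n  <link>a</link>\n  <link>b</link>\n</links>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Counts every child of the root, whitespace Text nodes included. */
    static int childNodeCount(Document doc) {
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    /** Counts only the <link> elements under the root. */
    static int linkCount(Document doc) {
        return doc.getDocumentElement().getElementsByTagName("link").getLength();
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println("getChildNodes: " + childNodeCount(doc));
        System.out.println("getElementsByTagName: " + linkCount(doc));
    }
}
```

With this input getChildNodes() reports five nodes (three whitespace Text nodes around two elements), while getElementsByTagName() reports the two elements.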

for (int i = 0; i < links.getLength(); i++) {
    Element link = (Element) links.item(i);
    System.out.print("Content: ");
    System.out.println(link.getElementsByTagName("text").item(0).getFirstChild().getNodeValue());
    System.out.print("URL: ");
    System.out.println(link.getElementsByTagName("url").item(0).getFirstChild().getNodeValue());
    System.out.print("Author: ");
    System.out.println(link.getElementsByTagName("author").item(0).getFirstChild().getNodeValue());
    System.out.print("Date: ");
    Element linkdate = (Element) link.getElementsByTagName("date").item(0);
    String day = linkdate.getElementsByTagName("day").item(0).getFirstChild().getNodeValue();
    String month = linkdate.getElementsByTagName("month").item(0).getFirstChild().getNodeValue();
    String year = linkdate.getElementsByTagName("year").item(0).getFirstChild().getNodeValue();
    System.out.println(day + "-" + month + "-" + year);
    System.out.print("Description: ");
    System.out.println(link.getElementsByTagName("description").item(0).getFirstChild().getNodeValue());
    System.out.println();
}

The code above formats and prints the content of the XML document. You only need to pay attention to a few details, such as the use of the getFirstChild() and getElementsByTagName() methods.
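The repeated getElementsByTagName(...).item(0).getFirstChild().getNodeValue() chain could be factored into a helper. This is one possible sketch, not the article's code: the class LinkTextSketch and the methods childText() and describe() are hypothetical, and returning an empty string for a missing tag is a design choice of the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LinkTextSketch {
    static final String XML =
            "<links><link><text>the makers of Java</text>"
            + "<url newwindow=\"no\">http://java.sun.com</url>"
            + "<date><day>3</day><month>1</month><year>2001</year></date>"
            + "</link></links>";

    /** Text of the first descendant with this tag, or "" if the tag or its text is missing. */
    static String childText(Element parent, String tag) {
        Node first = parent.getElementsByTagName(tag).item(0);
        if (first == null || first.getFirstChild() == null) {
            return "";
        }
        return first.getFirstChild().getNodeValue();
    }

    /** One-line summary of a <link>; assumes the link has a <date> child. */
    static String describe(Element link) {
        Element date = (Element) link.getElementsByTagName("date").item(0);
        return childText(link, "text") + " | " + childText(link, "url") + " | "
                + childText(date, "day") + "-" + childText(date, "month")
                + "-" + childText(date, "year");
    }

    static Element firstLink() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            return (Element) doc.getElementsByTagName("link").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(firstLink()));
    }
}
```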

The next topic is writing the DOM tree back to an XML document after it has been modified. The program is called xmlwrite.java. JAXP 1.0 has no classes or methods that directly handle writing XML documents, so you need helper classes from other packages. JAXP 1.1 introduced support for XSLT, which transforms an XML document into a new document structure. With this feature we can easily write a newly built or modified DOM tree back to an XML file. Let's look at the code, whose main job is to add a new link node to the links.xml file:

import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;

Several classes from the newly imported javax.xml.transform packages are used to process XSLT transformations.

We want to add a new link node to the XML file above. So we first read the links.xml file and build a DOM tree, then modify the DOM tree (add nodes), and finally write the modified DOM back to the links.xml file:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("links.xml");
doc.normalize();
// ---- set up the values for the new link ----
String text = "Hanzhong's Homepage";
String url = "www.hzliu.com";
String author = "hzliu Liu";
String discription = "A site from Hanzhong Liu, give u lots of Suprise!!!";

To keep the focus on the key points and simplify the program, we hard-code the content to be added as String objects in memory. In a real program, the content would usually come from user input through some interface, or from a database through JDBC.

Text textseg;
Element link = doc.createElement("link");

First, be clear that nodes of every type, whether Text, Attr, or Element, are created through the createXXX() methods of the Document object (where XXX is the type to create). So to add a link item to the XML document, we first create a link object:

Element linktext = doc.createElement("text");
textseg = doc.createTextNode(text);
linktext.appendChild(textseg);
link.appendChild(linktext);
Element linkurl = doc.createElement("url");
textseg = doc.createTextNode(url);
linkurl.appendChild(textseg);
link.appendChild(linkurl);
Element linkauthor = doc.createElement("author");
textseg = doc.createTextNode(author);
linkauthor.appendChild(textseg);
link.appendChild(linkauthor);
java.util.Calendar rightnow = java.util.Calendar.getInstance();
String day = Integer.toString(rightnow.get(java.util.Calendar.DAY_OF_MONTH));
// Calendar.MONTH is zero-based (January is 0), so add 1 for the calendar month
String month = Integer.toString(rightnow.get(java.util.Calendar.MONTH) + 1);
String year = Integer.toString(rightnow.get(java.util.Calendar.YEAR));
Element linkdate = doc.createElement("date");
Element linkdateday = doc.createElement("day");
textseg = doc.createTextNode(day);
linkdateday.appendChild(textseg);
Element linkdatemonth = doc.createElement("month");
textseg = doc.createTextNode(month);
linkdatemonth.appendChild(textseg);
Element linkdateyear = doc.createElement("year");
textseg = doc.createTextNode(year);
linkdateyear.appendChild(textseg);
linkdate.appendChild(linkdateday);
linkdate.appendChild(linkdatemonth);
linkdate.appendChild(linkdateyear);
link.appendChild(linkdate);
Element linkdiscription = doc.createElement("description");
textseg = doc.createTextNode(discription);
linkdiscription.appendChild(textseg);
link.appendChild(linkdiscription);

The node-creation code is somewhat repetitive, but note one thing about the text contained in an Element. In DOM this text is also a node, so you must create the corresponding Text node for it. You cannot set the text by calling setNodeValue() on the Element object. Instead, you set it through the Text object (when creating it with createTextNode(), or later with the Text object's setNodeValue()) and then add the Element together with its text to the DOM tree. Looking back at the code above makes this clearer:
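The point about setNodeValue() can be verified with a few lines. The class SetTextSketch and the strings used are hypothetical; the sketch relies on the DOM rule that setting the node value of an Element has no effect.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class SetTextSketch {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable DOM parser", e);
        }
    }

    /** True if setNodeValue() on an Element leaves it without any text child. */
    static boolean elementStaysEmpty() {
        Element text = newDocument().createElement("text");
        text.setNodeValue("ignored"); // no effect on an Element
        return !text.hasChildNodes() && text.getNodeValue() == null;
    }

    /** Sets the text through a Text node and then changes it through that node. */
    static String textThroughTextNode() {
        Document doc = newDocument();
        Element text = doc.createElement("text");
        Text segment = doc.createTextNode("first");
        text.appendChild(segment);
        segment.setNodeValue("second"); // works: the value lives in the Text node
        return text.getFirstChild().getNodeValue();
    }

    public static void main(String[] args) {
        System.out.println(elementStaysEmpty());
        System.out.println(textThroughTextNode());
    }
}
```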

doc.getDocumentElement().appendChild(link);

Finally, do not forget to add the created nodes to the DOM tree. The getDocumentElement() method of the Document class returns the Element object that represents the root node of the document. In an XML document the root node must be unique.

TransformerFactory tfactory = TransformerFactory.newInstance();
Transformer transformer = tfactory.newTransformer();
DOMSource source = new DOMSource(doc);
StreamResult result = new StreamResult(new java.io.File("links.xml"));
transformer.transform(source, result);

Finally, the DOM tree is written out through XSLT. TransformerFactory also applies the factory pattern, so that the code is independent of the concrete transformer; it works the same way as DocumentBuilderFactory. The transform() method of the Transformer class takes two parameters: a data source (Source) and an output target (Result). Here we use DOMSource and StreamResult respectively, so the DOM content is written to an output stream. When the output stream is a file, the DOM content is written to that file.
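The same transform() call can target a string instead of a file, which is handy for checking the output. A minimal sketch under stated assumptions: the class TransformSketch and its sample document are hypothetical, and the OMIT_XML_DECLARATION output property is an addition of this example, not part of the article's code.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class TransformSketch {
    /** Builds <links><link>x</link></links> in memory. */
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element links = doc.createElement("links");
            Element link = doc.createElement("link");
            link.appendChild(doc.createTextNode("x"));
            links.appendChild(link);
            doc.appendChild(links);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no usable DOM parser", e);
        }
    }

    /** Serializes a DOM tree with the identity transformer. */
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Leave out the <?xml ...?> line so only the elements are returned.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize DOM", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sample()));
    }
}
```

Swapping the StringWriter for a java.io.File in the StreamResult gives the file-writing form used in the article.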
