XML in Java: Java document model usage

In the first article in this series, I looked at the performance of some of the main XML document models written in Java. However, performance is only part of the picture when you're choosing this type of technology. Ease of use is at least as important, and it has been a major argument for using Java-specific models rather than the language-independent DOM.

To understand which models really work best, you also need to know how they rank in terms of usability. In this article I try to do that, starting with sample code that shows how common types of operations are coded in each model. I summarize the results at the end of the article and discuss some other factors that make one representation easier to use than another.

See the previous article (in Resources, or use the convenient link under Contents) for background on each model used in this comparison, including the specific version numbers. You can also check the Resources section for the source code download, links to the home pages of the models, and other related information.

Code comparison
In these comparisons of usage techniques for different document representations, I'll show you how to implement three basic operations in each model:

Build a document from an input stream
Walk the elements and content, making some changes:
  Remove leading and trailing whitespace from text content.
  If the resulting text content is empty, delete it.
  Otherwise, wrap it in a new element named "text" in the namespace of the parent element.
Write the modified document to an output stream

The code for these examples is simplified from the benchmark programs I used in the previous article. The benchmark programs focused on getting the best performance from each model; for this article, I try instead to show the easiest way to implement the operations in each model.

I've split the example for each model into two separate pieces of code. The first piece reads the document, calls the modify code, and writes the modified document. The second piece is a recursive method that actually walks the document representation and performs the modifications. To avoid distractions, I've left exception handling out of the code.

You can get the complete code for all the samples from the download page linked in the Resources section at the bottom of this page. The downloadable version of the samples includes a test driver, plus some added code that checks the operation of the different models by counting elements, deletions, and additions.

Even if you don't plan to use a DOM implementation, it's worth skimming the description of DOM usage. Because the DOM example comes first, I use it to explain some of the issues and the structure of the examples in more detail than for the models that follow. If you skip straight to one of the other models, you may miss some of these details.

DOM
The DOM specification covers all types of operations on the document representation, but it does not address issues such as parsing a document and generating text output. The two DOM implementations included in the performance tests, Xerces and Crimson, use different techniques for these operations. Listing 1 shows one form of the top-level code for Xerces.

Listing 1. Xerces DOM top-level code
 1  // parse the document from input stream ("in")
 2  DOMParser parser = new DOMParser();
 3  parser.setFeature("http://xml.org/sax/features/namespaces", true);
 4  parser.parse(new InputSource(in));
 5  Document doc = parser.getDocument();

 6  // recursively walk and modify document
 7  modifyElement(doc.getDocumentElement());

 8  // write the document to output stream ("out")
 9  OutputFormat format = new OutputFormat(doc);
10  XMLSerializer serializer = new XMLSerializer(out, format);
11  serializer.serialize(doc.getDocumentElement());



As the comments indicate, the first block of code in Listing 1 (lines 1-5) handles parsing the input stream to construct the document representation. Xerces defines the DOMParser class to build a document from the output of the Xerces parser. The InputSource class is part of the SAX specification and adapts any of several forms of input for use by a SAX parser. The actual parsing and document construction are done with a single call; if this completes successfully, the application can then retrieve and use the constructed document.
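For readers on a current JDK, the same idea can be sketched with the JAXP classes bundled in javax.xml.parsers instead of Xerces's own DOMParser (a substitution, not the article's code). The class InputSourceDemo and its methods are hypothetical; the sketch only shows that one parse call accepts either a character stream or a byte stream once it is wrapped in an InputSource.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class: parses the same XML from two different input forms.
class InputSourceDemo {

    // Parses any InputSource with a namespace-aware JAXP DocumentBuilder.
    static Document parse(InputSource source) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(source);
        } catch (Exception e) {
            // Parser and I/O exceptions are rethrown unchecked to keep callers short.
            throw new IllegalStateException(e);
        }
    }

    // Character input: the text is already decoded, so no encoding detection is needed.
    static Document fromText(String xml) {
        return parse(new InputSource(new StringReader(xml)));
    }

    // Byte input: the parser works out the encoding from the bytes themselves.
    static Document fromBytes(byte[] xml) {
        return parse(new InputSource(new ByteArrayInputStream(xml)));
    }
}
```

When handed bytes, the parser detects the encoding (from a byte order mark or the XML declaration, defaulting to UTF-8), which is usually the better choice for files and network streams.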

The second block of code (lines 6-7) just passes the root element of the document to the recursive modify method, which I'll discuss shortly. This code is essentially the same for all the document models in this article, so I won't discuss it again for the remaining examples.

The third block of code (lines 8-11) handles writing the document as text to the output stream. Here, the OutputFormat class wraps the document and provides several options for formatting the generated text. The XMLSerializer class handles the actual generation of the output text.

The modify method for Xerces uses only standard DOM interfaces, so it also works with any other DOM implementation. Listing 2 shows the code.

Listing 2. DOM modify method
 1  protected void modifyElement(Element element) {

 2      // loop through child nodes
 3      Node child;
 4      Node next = (Node)element.getFirstChild();
 5      while ((child = next) != null) {

 6          // set next before we change anything
 7          next = child.getNextSibling();

 8          // handle child by node type
 9          if (child.getNodeType() == Node.TEXT_NODE) {

10              // trim whitespace from content text
11              String trimmed = child.getNodeValue().trim();
12              if (trimmed.length() == 0) {

13                  // delete child if nothing but whitespace
14                  element.removeChild(child);

15              } else {

16                  // create a "text" element matching parent namespace
17                  Document doc = element.getOwnerDocument();
18                  String prefix = element.getPrefix();
19                  String name = (prefix == null) ? "text" : (prefix + ":text");
20                  Element text =
21                      doc.createElementNS(element.getNamespaceURI(), name);

22                  // wrap the trimmed content with new element
23                  text.appendChild(doc.createTextNode(trimmed));
24                  element.replaceChild(text, child);

25              }
26          } else if (child.getNodeType() == Node.ELEMENT_NODE) {

27              // handle child elements with recursive call
28              modifyElement((Element)child);

29          }
30      }
31  }



The method in Listing 2 uses the same basic approach as the modify methods for all the document representations. Called with an element, it iterates through the children of that element in turn. When it finds a text child, it either deletes the text (if it consists only of whitespace) or wraps the trimmed text in a new element named "text" in the same namespace as the containing element (if there are any non-whitespace characters). When it finds a child element, the method calls itself recursively with that child.

For the DOM implementation, I use a pair of references, child and next, to track my position in the ordered list of children. The reference to the next child node is loaded (line 7) before any other processing is done on the current child. This lets me delete or replace the current child without losing my place in the list.
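Why load next first? In DOM, the NodeList returned by getChildNodes() is live, so a loop that walks it by index and removes nodes skips whichever node slides into the removed slot. The sketch below uses the DOM implementation bundled with the JDK to contrast that pitfall with the child/next pattern from Listing 2; the class name, method names, and sample tree are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class SiblingTracking {

    // True for a text node that contains only whitespace.
    static boolean isBlankText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty();
    }

    // Pitfall: getChildNodes() is live, so removing item i shifts the following
    // child into slot i, and the i++ then steps past it without checking it.
    static void removeBlankByIndex(Element element) {
        NodeList kids = element.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (isBlankText(n)) {
                element.removeChild(n);
            }
        }
    }

    // Listing 2's pattern: remember the next sibling before touching the current child.
    static void removeBlankWithNext(Element element) {
        Node next = element.getFirstChild();
        Node child;
        while ((child = next) != null) {
            next = child.getNextSibling();
            if (isBlankText(child)) {
                element.removeChild(child);
            }
        }
    }

    // Hypothetical fixture: <r> holding two adjacent blank text nodes, then <e/>.
    static Element sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element r = doc.createElement("r");
            doc.appendChild(r);
            r.appendChild(doc.createTextNode(" "));
            r.appendChild(doc.createTextNode("\n"));
            r.appendChild(doc.createElement("e"));
            return r;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

On the sample tree, the index loop leaves one blank text node behind (two children remain), while the next-pointer loop removes both, leaving only the e element.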

When I create a new element to wrap the text content (lines 16-24), the DOM interface starts to get a little messy. The methods used to create elements belong to the document as a whole, so I need to retrieve the owner document of the element I'm currently working on (line 17). I want to put the new element in the same namespace as the existing parent element, and in the DOM this means I need to construct a qualified name for the element. That works differently depending on whether or not the namespace has a prefix (lines 18-19). With the qualified name for the new element and the namespace URI from the existing element, I can create the new element (lines 20-21).

Once the new element is created, I just create and append a text node to wrap the content String, then replace the original text node with the newly created element (lines 22-24).
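Putting the pieces together, here is a minimal sketch of Listing 2 running against the DOM implementation bundled with the JDK instead of Xerces. The modify logic follows Listing 2 step by step; the class name DomModify and the parse helper are hypothetical additions so the code compiles on its own, and checked exceptions are wrapped to keep the example short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomModify {

    // Hypothetical helper: namespace-aware parse of an XML string.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Trims text children, deletes whitespace-only ones, wraps the rest in "text"
    // elements in the parent's namespace, and recurses into child elements.
    static void modifyElement(Element element) {
        Node next = element.getFirstChild();
        Node child;
        while ((child = next) != null) {
            // Capture the next sibling before the current child is removed or replaced.
            next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE) {
                String trimmed = child.getNodeValue().trim();
                if (trimmed.isEmpty()) {
                    element.removeChild(child);
                } else {
                    // Element creation goes through the owning document.
                    Document doc = element.getOwnerDocument();
                    String prefix = element.getPrefix();
                    String name = (prefix == null) ? "text" : prefix + ":text";
                    Element text = doc.createElementNS(element.getNamespaceURI(), name);
                    text.appendChild(doc.createTextNode(trimmed));
                    element.replaceChild(text, child);
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                modifyElement((Element) child);
            }
        }
    }
}
```

For example, calling modifyElement on the root of <p:a xmlns:p="urn:x"> hi <p:b>  </p:b></p:a> leaves a p:text element containing "hi", followed by a p:b element with no children.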

Listing 3. Crimson DOM top-level code
 1  // parse the document from input stream
 2  System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
 3      "org.apache.crimson.jaxp.DocumentBuilderFactoryImpl");
 4  DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
 5  dbf.setNamespaceAware(true);
 6  DocumentBuilder builder = dbf.newDocumentBuilder();
 7  Document doc = builder.parse(in);

 8  // recursively walk and modify document
 9  modifyElement(doc.getDocumentElement());

10  // write the document to output stream
11  ((XmlDocument)doc).write(out);



The Crimson DOM sample code in Listing 3 uses the JAXP interface for parsing. JAXP provides a standardized interface for parsing and transforming XML documents. The parsing code in this example could also be used with Xerces (with an appropriate change to the property setting for the document builder factory class name) in place of the Xerces-specific code given earlier.

In this example, I first set a system property (lines 2-3) to select the builder factory class used to construct the DOM representation (JAXP only supports building DOM representations directly; it does not support building any of the other representations discussed in this article). This step is needed only if you want to select a particular DOM for JAXP to use; otherwise, it uses the default implementation. For completeness I set the property in the code, but it's more commonly set as a JVM command-line parameter.

I then create an instance of the builder factory, enable namespace support for builders constructed by that factory, and create a document builder from the factory (lines 4-6). Finally (line 7), I use the document builder to parse the input stream and construct the document representation.
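The namespace setting on line 5 matters more than it may look. JAXP document builder factories are not namespace aware unless you ask, and without that setting the resulting DOM has no namespace URIs or local names at all, so the modify method from Listing 2 would create its "text" wrapper elements without any namespace information. A small sketch using the JAXP implementation bundled with the JDK (the class name is hypothetical):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceAwareness {

    // Parses xml with namespace awareness set explicitly and returns the root element.
    static Element root(String xml, boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // JAXP factories default to NOT namespace aware.
            dbf.setNamespaceAware(namespaceAware);
            return dbf.newDocumentBuilder()
                      .parse(new InputSource(new StringReader(xml)))
                      .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With awareness on, the root of <p:doc xmlns:p="urn:example"/> reports namespace URI urn:example and local name doc; with it off, getNamespaceURI() returns null.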

To write the document, I use a method built into Crimson's implementation classes. This method is not guaranteed to be supported in future versions of Crimson, but the alternative, using the JAXP transform code to output the document as text, requires an XSL processor such as Xalan. That's beyond the scope of this article, but you can check Sun's JAXP tutorial for details.
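In current JDKs (since J2SE 1.4), the JAXP transformation API and an XSLT processor are bundled, so the transform route no longer needs a separate download. The sketch below assumes only the JDK: TransformerFactory.newTransformer() with no stylesheet returns an identity transformer that copies its source unchanged, which serializes a DOM Document to text. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IdentityWrite {

    // Hypothetical helper: namespace-aware parse of an XML string.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes a DOM Document to a String with a JAXP identity transform.
    static String write(Document doc) {
        try {
            // newTransformer() with no stylesheet copies its source unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

To write to a byte stream instead, pass the OutputStream to StreamResult; the transformer then encodes according to its encoding output property (UTF-8 by default).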

JDOM
The top-level code for JDOM is a little simpler than the code for the DOM implementations. To build the document representation (lines 1-3), I use a SAXBuilder with validation disabled by the constructor parameter. Writing the modified document to the output stream is just as easy, using the supplied XMLOutputter class (lines 6-8).

Listing 4. JDOM top-level code
1  // parse the document from input stream
2  SAXBuilder builder = new SAXBuilder(false);
3  Document doc = builder.build(in);

4  // recursively walk and modify document
5  modifyElement(doc.getRootElement());

6  // write the document to output stream
7  XMLOutputter outer = new XMLOutputter();
8  outer.output(doc, out);



The JDOM modify method in Listing 5 is also simpler than the DOM version. I get the list of all the content of the element and scan through it, checking for text (as String objects) and elements. The list is "live," so I can make changes to it directly rather than calling methods on the parent element.
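The index bookkeeping in Listing 5 is ordinary java.util.List behavior, so it can be shown without JDOM at all. In this hypothetical sketch, a plain List<Object> stands in for the element's live content list: Strings play the role of text, and a made-up Wrapper class plays the role of the new "text" element.

```java
import java.util.List;

// Hypothetical stand-in for a JDOM-style live content list.
class LiveListEdit {

    // Plays the role of the new "text" element.
    static final class Wrapper {
        final String text;
        Wrapper(String text) { this.text = text; }
    }

    // Same index-adjusting loop as Listing 5, on a plain java.util.List.
    static void modify(List<Object> children) {
        for (int i = 0; i < children.size(); i++) {
            Object child = children.get(i);
            if (child instanceof String) {
                String trimmed = ((String) child).trim();
                if (trimmed.isEmpty()) {
                    // remove(i--) deletes the item and steps back, so the item
                    // that shifts into slot i is examined on the next pass.
                    children.remove(i--);
                } else {
                    // set(i, ...) replaces in place; the list size is unchanged.
                    children.set(i, new Wrapper(trimmed));
                }
            }
        }
    }
}
```

A ListIterator with remove() and set() would avoid the manual i-- adjustment; the index form is kept here because it matches Listing 5.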

Listing 5. JDOM modify method
 1  protected void modifyElement(Element element) {

 2      // loop through child nodes
 3      List children = element.getContent();
 4      for (int i = 0; i < children.size(); i++) {

 5          // handle child by node type
 6          Object child = children.get(i);
 7          if (child instanceof String) {

 8              // trim whitespace from content text
 9              String trimmed = child.toString().trim();
10              if (trimmed.length() == 0) {

11                  // delete child if only whitespace (adjusting index)
12                  children.remove(i--);

13              } else {

14                  // wrap the trimmed content with new element
15                  Element text = new Element("text", element.getNamespace());
16                  text.setText(trimmed);
17                  children.set(i, text);

18              }
19          } else if (child instanceof Element) {

20              // handle child elements with recursive call
21              modifyElement((Element)child);

22          }
23      }
24  }



The technique for creating the new element (lines 14-17) is very simple and, unlike the DOM version, does not require access to the parent document.

dom4j
The top-level code for dom4j is slightly more complex than that for JDOM, but the lines of code are very similar. The main differences are that I save the DocumentFactory used to build the dom4j document (line 5) and flush the writer after writing out the modified document text (line 10).

Listing 6. dom4j top-level code
 1  // parse the document from input stream
 2  SAXReader reader = new SAXReader(false);
 3  Document doc = reader.read(in);

 4  // recursively walk and modify document
 5  m_factory = reader.getDocumentFactory();
 6  modifyElement(doc.getRootElement());

 7  // write the document to output stream
 8  XMLWriter writer = new XMLWriter(out);
 9  writer.write(doc);
10  writer.flush();



As Listing 6 suggests, dom4j uses a factory approach to construct the objects contained in the document representation built by parsing. Each component object is defined by an interface, so any type of object that implements one of these interfaces can be included in the representation. This contrasts with JDOM, which uses concrete classes: these classes can be subclassed in some cases, but any class used in the document representation must be based on the original JDOM class. By using different factories for dom4j document building, you can get documents constructed from different families of components.
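To illustrate why interfaces plus a factory matter, here is a deliberately tiny, hypothetical model; none of these types are dom4j's actual API. Code written only against the XElement and XNodeFactory interfaces works unchanged whichever factory supplies the components, which is the property the dom4j design relies on.

```java
// Hypothetical component interface (not dom4j's Element).
interface XElement {
    String name();
}

// Hypothetical factory interface (not dom4j's DocumentFactory).
interface XNodeFactory {
    XElement createElement(String name);
}

// Default component implementation.
class BasicXElement implements XElement {
    private final String name;
    BasicXElement(String name) { this.name = name; }
    public String name() { return name; }
}

// An alternative implementation, e.g. a subclass that could carry application data.
class TaggedXElement extends BasicXElement {
    TaggedXElement(String name) { super(name); }
}

class DefaultXFactory implements XNodeFactory {
    public XElement createElement(String name) { return new BasicXElement(name); }
}

class TaggingXFactory implements XNodeFactory {
    public XElement createElement(String name) { return new TaggedXElement(name); }
}

// Client code sees only the interfaces, much as Listing 7 reuses the saved factory.
class WrapperBuilder {
    private final XNodeFactory factory;
    WrapperBuilder(XNodeFactory factory) { this.factory = factory; }
    XElement wrap(String name) { return factory.createElement(name); }
}
```

Swapping DefaultXFactory for TaggingXFactory changes which class is instantiated without touching WrapperBuilder.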

In the sample code (line 5), I retrieve the (default) document factory used to build the document and store it in an instance variable (m_factory) for use by the modify method. This step isn't strictly required: you can use components from different factories together in one document, or you can bypass the factory and create component instances directly. In this case, though, I want to create the same types of components used in the rest of the document, and using the same factory ensures that.

Listing 7. dom4j modify method
 1  protected void modifyElement(Element element) {

 2      // loop through child nodes
 3      List children = element.content();
 4      for (int i = 0; i < children.size(); i++) {

 5          // handle child by node type
 6          Node child = (Node)children.get(i);
 7          if (child.getNodeType() == Node.TEXT_NODE) {

 8              // trim whitespace from content text
 9              String trimmed = child.getText().trim();
10              if (trimmed.length() == 0) {

11                  // delete child if only whitespace (adjusting index)
12                  children.remove(i--);

13              } else {

14                  // wrap the trimmed content with new element
15                  Element text = m_factory.createElement
16                      (QName.get("text", element.getNamespace()));
17                  text.addText(trimmed);
18                  children.set(i, text);

19              }
20          } else if (child.getNodeType() == Node.ELEMENT_NODE) {

21              // handle child elements with recursive call
22              modifyElement((Element)child);

23          }
24      }
25  }



The dom4j modify method in Listing 7 is very similar to the one used for JDOM. Instead of checking the type of each content item with the instanceof operator, I get a type code through the Node interface method getNodeType() (I could also use instanceof, but the type-code approach looks clearer). Creating the new element (lines 15-16) differs in that it uses a QName object to represent the element name and builds the element by calling a method of the saved factory.
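The type-code approach also exists in the standard DOM, where getNodeType() returns short constants such as Node.ELEMENT_NODE that can drive a switch statement. A minimal sketch using the DOM bundled with the JDK (the class name and parse helper are hypothetical) counts elements, text nodes, and comments this way:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TypeCodeDispatch {

    // Hypothetical helper: parse an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Adds to counts[0] (elements), counts[1] (text), counts[2] (comments)
    // for all descendants of node, dispatching on the DOM type code.
    static int[] count(Node node, int[] counts) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            switch (c.getNodeType()) {
                case Node.ELEMENT_NODE:
                    counts[0]++;
                    count(c, counts);
                    break;
                case Node.TEXT_NODE:
                    counts[1]++;
                    break;
                case Node.COMMENT_NODE:
                    counts[2]++;
                    break;
                default:
                    // Other node types (CDATA, processing instructions, ...) are ignored.
                    break;
            }
        }
        return counts;
    }
}
```

For <r>a<e/><!--c--><e>b</e></r>, the counts come out as 3 elements, 2 text nodes, and 1 comment.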

Electric XML
The top-level code for Electric XML (EXML) in Listing 8 is the simplest of all these examples, with a single method call each for reading and writing the document.

Listing 8. EXML top-level code
1  // parse the document from input stream
2  Document doc = new Document(in);

3  // recursively walk and modify document
4  modifyElement(doc.getRoot());

5  // write the document to output stream
6  doc.write(out);



The EXML modify method in Listing 9 needs instanceof checks, as in JDOM, but is otherwise most similar to the DOM method. EXML doesn't let you create an element with a namespace-qualified name directly, so instead I create a new element and then set its name to get the same effect.
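DOM Level 3 offers a comparable create-then-name step: Document.renameNode() can change an element's namespace URI and qualified name after the element has been created. The sketch below uses the DOM bundled with the JDK to mirror the order of operations in Listing 9; it is an analogy, not EXML code, and the class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class CreateThenName {

    // Creates a placeholder element, fills and attaches it, then renames it
    // into the parent's namespace (the order used in Listing 9).
    static Element wrapInParentNamespace(Element parent, String content) {
        Document doc = parent.getOwnerDocument();
        Element text = doc.createElementNS(null, "text");
        text.appendChild(doc.createTextNode(content));
        parent.appendChild(text);
        String prefix = parent.getPrefix();
        String qname = (prefix == null) ? "text" : prefix + ":text";
        // renameNode may return a different node if it cannot rename in place,
        // so the returned node is the one to use from here on.
        Node renamed = doc.renameNode(text, parent.getNamespaceURI(), qname);
        return (Element) renamed;
    }

    // Hypothetical fixture: a new document whose root is in namespace ns.
    static Element newRoot(String ns, String qname) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElementNS(ns, qname);
            doc.appendChild(root);
            return root;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```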

Listing 9. EXML modify method
 1  protected void modifyElement(Element element) {

 2      // loop through child nodes
 3      Child child;
 4      Child next = element.getChildren().
 5      while ((child = next) != null) {

 6          // set next before we change anything
 7          next = child.getNextSibling();

 8          // handle child by node type
 9          if (child instanceof Text) {

10              // trim whitespace from content text
11              String trimmed = ((Text)child).getString().trim();
12              if (trimmed.length() == 0) {

13                  // delete child if only whitespace
14                  child.remove();

15              } else {

16                  // wrap the trimmed content with new element
17                  Element text = new Element();
18                  text.addText(trimmed);
19                  child.replaceWith(text);
20                  text.setName(element.getPrefix(), "text");

21              }
22          } else if (child instanceof Element) {

23              // handle child elements with recursive call
24              modifyElement((Element)child);

25          }
26      }
27  }



XPP
The top-level code for XPP (Listing 10) is the longest of all the examples, requiring considerably more setup than the other models.

Listing 10. XPP top-level code
 1  // parse the document from input stream
 2  m_parserFactory = XmlPullParserFactory.newInstance();
 3  m_parserFactory.setNamespaceAware(true);
 4  XmlPullParser parser = m_parserFactory.newPullParser();
 5  parser.setInput(new BufferedReader(new InputStreamReader(in)));
 6  parser.next();
 7  XmlNode doc = m_parserFactory.newNode();
 8  parser.readNode(doc);

 9  // recursively walk and modify document
10  modifyElement(doc);

11  // write the document to output stream
12  XmlRecorder recorder = m_parserFactory.newRecorder();
13  Writer writer = new OutputStreamWriter(out);
14  recorder.setOutput(writer);
15  recorder.writeNode(doc);
16  writer.close();



Because XPP uses a factory-based approach similar to JAXP's, I must first create an instance of the parser factory and enable namespace processing before creating the parser instance (lines 2-4). Once I have the parser instance, I can set its input and actually build the document representation (lines 5-8), but this takes more steps than for the other models.
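XPP itself is not part of the JDK, but the standard StAX API (javax.xml.stream) is a pull parser in the same spirit, and it follows the same factory, configure, create-parser sequence. The sketch below swaps in StAX to illustrate the pull model; it is not XPP code, and the class name is hypothetical. The application calls next() to advance and inspects the current event, instead of receiving SAX-style callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class PullDemo {

    // Returns "{namespaceURI}localName" for every start tag, in document order.
    static List<String> startTags(String xml) {
        try {
            // Factory first, then configuration, then the parser instance.
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            List<String> names = new ArrayList<>();
            while (reader.hasNext()) {
                // The caller pulls each event rather than receiving callbacks.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String ns = reader.getNamespaceURI();
                    names.add("{" + (ns == null ? "" : ns) + "}" + reader.getLocalName());
                }
            }
            reader.close();
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For <p:a xmlns:p="urn:x"><b/></p:a>, this yields {urn:x}a followed by {}b.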

Output processing (lines 11-16) also takes more steps than for the other models, mainly because XPP requires a Writer rather than accepting a stream directly as the output target.
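Because the XPP recorder wants a Writer, the byte stream has to be wrapped, as Listing 10 does with OutputStreamWriter. One caution, shown in the sketch below using only java.io: new OutputStreamWriter(out) encodes with the platform default charset, which may not match what an XML reader assumes (UTF-8 when there is no encoding declaration). Naming the charset avoids that, and flushing (or closing) the writer is what pushes buffered characters into the underlying stream. The helper name is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterAdapter {

    // Writes XML text to a byte stream through a Writer with an explicit charset.
    static void writeUtf8(String xmlText, OutputStream out) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write(xmlText);
            // Without flush() (or close()), characters may stay buffered in the writer.
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch flushes rather than closes so the caller keeps control of the underlying stream; Listing 10 closes its writer, which also closes the output stream.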

The XPP modify method in Listing 11 is most similar to the JDOM method, although it needs more code to create a new element (lines 13-21). Namespace handling is a bit of a nuisance here: I must first construct the qualified name for the element (lines 15-16), then create the element, and finally set its name and namespace URI (lines 18-21).

Listing 11. XPP modify method
 1  protected void modifyElement(XmlNode element) throws Exception {

 2      // loop through child nodes
 3      for (int i = 0; i < element.getChildrenCount(); i++) {

 4          // handle child by node type
 5          Object child = element.getChildAt(i);
 6          if (child instanceof String) {

 7              // trim whitespace from content text
 8              String trimmed = child.toString().trim();
 9              if (trimmed.length() == 0) {

10                  // delete child if only whitespace (adjusting index)
11                  element.removeChildAt(i--);

12              } else {

13                  // construct qualified name for wrapper element
14                  String prefix = element.getPrefix();
15                  String name = (prefix == null) ?
16                      "text" : (prefix + ":text");

17                  // wrap the trimmed content with new element
18                  XmlNode text = m_parserFactory.newNode();
19                  text.appendChild(trimmed);
20                  element.replaceChildAt(i, text);
21                  text.modifyTag(element.getNamespaceUri(), "text", name);

22              }
23          } else if (child instanceof XmlNode) {

24              // handle child elements with recursive call
25              modifyElement((XmlNode)child);

26          }
27      }
28  }



Conclusion
JDOM, dom4j, and Electric XML all come out about equally easy to use in these code samples, with EXML probably the simplest and dom4j made slightly more difficult by a few minor complications. DOM offers the very real benefit of language independence, but if you're only working in Java code it looks somewhat cumbersome in comparison with the Java-specific models. I think this shows that the Java-specific models generally succeed in their goal of making XML documents easier to work with in Java code.

Beyond the basics: real-world usability
The code samples show that JDOM and EXML provide simple, clean interfaces for basic document operations (working with elements, attributes, and text). In my experience, though, their approaches don't work as well for programming tasks that involve the entire document representation. For these types of tasks, the component approach used by DOM and dom4j, in which all document components from attributes to namespaces implement some common interfaces, works better.
A case in point is the XML Streaming (XMLS) encoding I recently implemented for JDOM and dom4j. This code walks the entire document and encodes each component. The JDOM implementation is much more complex than the dom4j one, mainly because JDOM represents each component with its own class, and these classes lack a common interface.

Because JDOM lacks common interfaces, the code that handles Document objects has to be different from the code that handles Element objects, even though both can have some of the same types of components as children. Special methods are also needed to retrieve Namespace components, as opposed to other types of child components. Even when dealing with the component types that count as content, you need multiple if statements with instanceof checks on the component type, rather than a cleaner and faster switch statement.

Ironically, one of JDOM's original goals was to take advantage of the Java Collections classes, which are themselves largely based on interfaces. Using interfaces in a library adds a lot of flexibility at the cost of some added complexity, and this is usually a good tradeoff for code designed for reuse. It may also be a large part of why dom4j reached a mature and stable state much sooner than JDOM.

Still, DOM remains a very good choice for developers who work in multiple languages. DOM implementations are widely available for a variety of programming languages. DOM is also the basis for many other XML-related standards, so even if you use a Java-specific model, you're likely to need some familiarity with DOM anyway. Because it is an official World Wide Web Consortium (W3C) Recommendation (as opposed to the nonstandard Java models), it may also be required for some types of projects.

Among the three main contenders for ease of use, JDOM, dom4j, and Electric XML, dom4j differs from the other two in using an interface-based approach with several levels of inheritance. This can make the API Javadocs harder to follow. For example, a method you're looking for (such as content(), used in line 3 of the dom4j modify method example) may be part of the Branch interface that Element extends, rather than the Element interface itself. However, this interface-based design adds a lot of flexibility (see the sidebar Beyond the basics: real-world usability). Given dom4j's advantages in performance, stability, and feature set, you should consider it a strong candidate for most projects.

Of all the Java-specific document models, JDOM probably has the widest user base, and it is indeed one of the simplest models to use. As a choice for project development, however, it still suffers from API instability and changes from one version to the next, and it also did poorly in the performance comparisons. Based on the current implementations, I'd recommend dom4j over JDOM for anyone starting a new project.

Except for XPP, EXML has a much smaller footprint than any of the other models, and given its ease of use, you should definitely consider it for applications where jar file size matters. However, EXML's limited XML support and restrictive license, along with its relatively poor performance on larger files, will rule it out for many applications.

XPP requires more steps for parsing and writing text documents, and more steps for working with namespaces. If XPP added some convenience methods for these common cases, it might compare better. As it stands, the performance leader from the last article has become the usability loser in this one. Still, because of its performance advantages, XPP is worth considering as an alternative to EXML for applications that need a small jar file size.

Next time ...
The two articles I've written so far cover the performance and usability of XML document models written in Java. In the next two articles of this series, I'll discuss approaches to XML data binding with Java technology. These approaches have many similarities to the document models, but they go further, mapping XML documents into actual application data structures. We'll see how well they do in terms of both ease of use and performance.

Check back on developerWorks for a look at XML data binding for Java code. In the meantime, you can post comments and questions about this article in the forum, using the link below.

Resources

Participate in the discussion forum for this article by using the discussion link at the top or bottom of the article.
If you need background information, try the developerWorks XML Java programming tutorial, the Understanding SAX tutorial, and the Understanding DOM tutorial.
Download the test programs and document model libraries used in this article from the download page.
Find out about the Java API for XML Processing (JAXP), or read the JAXP tutorial.
Get the details of the author's work on XML streaming, an alternative to Java serialization for passing XML documents between programs.
Review the author's previous article, "XML in Java: Document models, Part 1."
See Tony Darugar's recommendations for effective DOM use with Java, based on his team's analysis of several large XML projects.
Java XML document models:
Xerces Java
Crimson
JDOM
dom4j
Electric XML
XML Pull Parser (XPP)
Check out IBM's WebSphere Studio Application Developer, an integrated visual programming environment for Java, XML, and Web services development that implements the DOM.
Want to improve your skills? Check the XML certification page, part of IBM's Professional Education program.



