Repost: Parsing XML in Java

Source: Internet
Author: User

In everyday work, XML is inevitably used as a data storage format. Which of the many parsing solutions suits us best? In this article I give an admittedly incomplete evaluation of four mainstream solutions. I tested only XML traversal, because traversal is the operation used most often at work (at least in my opinion).

Preparation

  Test environment:

AMD Duron 1.4 GHz overclocked to 1.5 GHz, 256 MB DDR333, Windows 2000 Server SP4, Sun JDK 1.4.1 + Eclipse 2.1 + Resin 2.1.8, tested in debug mode.

The XML file format is as follows:

<?xml version="1.0" encoding="gb2312"?>
<Result>
  <Value>
    <NO>a1234</NO>
    <ADDR>No. XX, section X, XX Road, XX Town, XX County, Sichuan Province</ADDR>
  </Value>
  <Value>
    <NO>b1234</NO>
    <ADDR>XX group, XX village, XX Township, XXX City, Sichuan Province</ADDR>
  </Value>
</Result>

  Test method:

 
A JSP page calls the bean for each solution (for why a JSP is used for the call, see http://blog.csdn.net/rosen/archive/2004/10/15/138324.aspx). Each solution parses XML files of 10 KB, 100 KB, 1000 KB, and 10000 KB, and the elapsed time is recorded in milliseconds.
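
The article does not include its data files, so reproducing the setup needs test documents of roughly the stated sizes. The following is only a sketch of how such a file might be generated. The class name `TestDataSketch`, the method `buildXml`, the placeholder addresses, and the use of UTF-8 instead of gb2312 are all assumptions, and the code targets a modern JDK rather than the JDK 1.4.1 used in the test.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: builds a test document in the article's format.
final class TestDataSketch {

    /** Returns an XML string of at least targetBytes bytes (ASCII only, so chars == bytes). */
    static String buildXml(int targetBytes) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Result>\n");
        int i = 0;
        // Keep appending <Value> records until the target size is reached.
        while (sb.length() < targetBytes) {
            sb.append("<Value>\n<NO>a").append(i)
              .append("</NO>\n<ADDR>placeholder address ").append(i)
              .append("</ADDR>\n</Value>\n");
            i++;
        }
        sb.append("</Result>\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Writes a file of roughly 10 KB; the name mirrors the article's data_10k.xml.
        Files.write(Paths.get("data_10k.xml"),
                buildXml(10 * 1024).getBytes(StandardCharsets.UTF_8));
    }
}
```

Calling `buildXml` with 100, 1000, and 10000 times 1024 would give the larger inputs. Real addresses in gb2312 would change the byte counts, so the sizes are approximate.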

JSP file:

<%@ page contentType="text/html; charset=gb2312" %>
<%@ page import="com.test.*" %>

<html>
<body>
<%
String args[] = {""};
MyXMLReader.main(args);
%>
</body>
</html>
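
Each result in the test section lists several runs per file size. As a rough illustration of that measurement loop outside of JSP, the sketch below times a task repeatedly and collects the elapsed milliseconds. `TimingSketch` and `timeMillis` are hypothetical names, the workload in `main` is a placeholder, and the article's own numbers came from the JSP setup above, not from this harness.

```java
import java.util.Arrays;

// Hypothetical timing harness, not the article's measurement code.
final class TimingSketch {

    /** Runs task the given number of times and returns each elapsed time in ms. */
    static long[] timeMillis(Runnable task, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsed[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        long[] times = timeMillis(() -> {
            // Placeholder workload; a real run would call one of the parsers here.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        }, 4);
        System.out.println(Arrays.toString(times));
    }
}
```

The first run usually includes class loading and JIT warm-up, which may explain why the first figure in a row is often the largest.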

Test

First up is DOM (JAXP Crimson parser)

 
DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets developers search the tree for specific information. Analyzing the structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is based on a hierarchy of information, DOM is considered tree-based or object-based. Tree-based processing in general has several advantages. First, because the tree persists in memory, it can be modified, so an application can change both data and structure. It also allows navigation up and down the tree at any time, rather than the one-pass processing of SAX. DOM is also much simpler to use.

On the other hand, parsing and loading a very large document can be slow and resource-intensive, so other approaches, such as the event-based SAX model, are better suited to such data.
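
To make the points about navigation and in-memory modification concrete, here is a minimal sketch using the standard JAXP DOM API. The class `DomNavigationSketch`, the method `demo`, and the inline sample document are assumptions made for illustration; it uses `setTextContent`, which is available on newer JDKs than the one used in the test.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the article's benchmark.
final class DomNavigationSketch {

    static final String SAMPLE =
        "<Result><Value><NO>a1234</NO><ADDR>addr one</ADDR></Value></Result>";

    /** Parses xml, rewrites the first NO, and reports what navigation finds. */
    static String demo(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // Down: locate the first <NO> element anywhere in the tree.
            Element no = (Element) doc.getElementsByTagName("NO").item(0);
            // Up: its parent is the enclosing <Value> element.
            Node parent = no.getParentNode();
            // Sideways: in SAMPLE the next sibling of <NO> is <ADDR>
            // (with whitespace between the tags it would be a text node).
            Node sibling = no.getNextSibling();
            // Modify: the tree stays in memory, so it can be changed in place.
            no.setTextContent("z9999");

            return parent.getNodeName() + "/" + sibling.getNodeName()
                + "/" + no.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(SAMPLE)); // prints Value/ADDR/z9999
    }
}
```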

Bean file:

package com.test;

import java.io.*;
import java.util.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;

public class MyXMLReader {

    public static void main(String args[]) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            // Build the whole document tree in memory before traversing it.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(f);
            // Element names are case-sensitive and must match the XML file.
            NodeList nl = doc.getElementsByTagName("Value");
            for (int i = 0; i < nl.getLength(); i++) {
                // Each lookup searches the whole document again on every iteration.
                System.out.print("license plate number:" + doc.getElementsByTagName("NO").item(i).getFirstChild().getNodeValue());
                System.out.println("owner address:" + doc.getElementsByTagName("ADDR").item(i).getFirstChild().getNodeValue());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time:" + (System.currentTimeMillis() - lasting) + " milliseconds");
    }
}

10 KB elapsed time (ms): 265 203 219 172
100 KB elapsed time (ms): 9172 9016 8891 9000
1000 KB elapsed time (ms): 691719 675407 708375 739656
10000 KB elapsed time: OutOfMemoryError
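
The DOM bean above looks up NO and ADDR across the whole document on each loop iteration. A variant that scopes each lookup to the current Value element might look like the sketch below. It was not part of the original benchmark, so the timings above say nothing about it. The class `DomScopedTraversalSketch`, the method `readRecords`, and the sample data are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical variant of the traversal, not the benchmarked code.
final class DomScopedTraversalSketch {

    /** Returns "NO=ADDR" for every <Value> element in xml. */
    static List<String> readRecords(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> records = new ArrayList<>();
            NodeList values = doc.getElementsByTagName("Value");
            for (int i = 0; i < values.getLength(); i++) {
                Element value = (Element) values.item(i);
                // Search only inside this <Value>, not the whole document.
                String no = value.getElementsByTagName("NO").item(0).getTextContent();
                String addr = value.getElementsByTagName("ADDR").item(0).getTextContent();
                records.add(no + "=" + addr);
            }
            return records;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Result><Value><NO>a1234</NO><ADDR>addr one</ADDR></Value></Result>";
        System.out.println(readRecords(xml));
    }
}
```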

Next is SAX

 
The advantages of this kind of processing are very similar to those of streaming media. Analysis can begin immediately instead of waiting for all of the data to be processed. Moreover, because the application only examines the data as it is read, the data does not need to be stored in memory. This is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing as soon as some condition is met. In general, SAX is also much faster than its alternative, DOM.
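
One common way to stop a SAX parse once a condition is met is to throw an exception from a callback and catch it around the `parse` call. The sketch below does this after the first NO element has been read. `SaxEarlyStopSketch`, `firstNo`, and the `StopParsing` marker exception are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the article's benchmark.
final class SaxEarlyStopSketch {

    /** Marker exception used only to abort parsing on purpose. */
    static final class StopParsing extends SAXException {
        StopParsing() {
            super("stop");
        }
    }

    /** Returns the text of the first <NO> element, then stops parsing. */
    static String firstNo(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inNo;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                inNo = "NO".equals(qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inNo) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName)
                    throws SAXException {
                if ("NO".equals(qName)) {
                    throw new StopParsing(); // condition met, abort the parse
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Parsing was aborted deliberately; later elements were not processed.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<Result><Value><NO>a1234</NO></Value>"
            + "<Value><NO>b1234</NO></Value></Result>";
        System.out.println(firstNo(xml)); // prints a1234
    }
}
```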

DOM or SAX?

For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision.

DOM accesses an XML document through a tree structure, while SAX uses an event model.

A DOM parser converts an XML document into a tree containing its content, which can then be traversed. The advantage of the DOM model is ease of programming: developers only need to invoke the tree-building call and then use the navigation APIs to reach the tree nodes they need. Elements in the tree can be added and modified easily. However, because a DOM parser has to process the entire XML document, its performance and memory requirements are relatively high, especially for large XML files. Thanks to its traversal capability, the DOM parser is often used in services where XML documents change frequently.

A SAX parser uses an event-based model: while parsing an XML document it fires a series of events, and when a given tag is found it invokes a callback method to report that the tag has been encountered. SAX usually needs less memory because it lets developers decide which tags to process. This scalability shows especially when only part of the data in the document is needed. However, coding against a SAX parser is harder, and it is difficult to access several different pieces of data in the same document at the same time.
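
The bookkeeping that makes SAX harder to code against can be seen when two related values have to be combined. In the sketch below the handler must remember the NO text until the matching ADDR arrives. Like the article's bean, it keeps a stack of open tags, but it also pops them in `endElement`. `SaxRecordCollectorSketch` and `collect` are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not the benchmarked code.
final class SaxRecordCollectorSketch extends DefaultHandler {

    private final Deque<String> open = new ArrayDeque<>();   // currently open tags
    private final StringBuilder text = new StringBuilder();  // text of the current element
    private final List<String> records = new ArrayList<>();
    private String no; // remembered until the matching ADDR arrives

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        open.push(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String tag = open.pop();
        if ("NO".equals(tag)) {
            no = text.toString().trim();
        } else if ("ADDR".equals(tag)) {
            records.add(no + "=" + text.toString().trim());
        }
        text.setLength(0);
    }

    /** Parses xml and returns one "NO=ADDR" entry per record. */
    static List<String> collect(String xml) {
        SaxRecordCollectorSketch handler = new SaxRecordCollectorSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.records;
    }

    public static void main(String[] args) {
        String xml = "<Result><Value><NO>a1234</NO><ADDR>addr one</ADDR></Value></Result>";
        System.out.println(collect(xml)); // prints [a1234=addr one]
    }
}
```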

Bean file:

package com.test;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import javax.xml.parsers.*;

public class MyXMLReader extends DefaultHandler {

    // Names of the elements opened so far; the top is the most recent one.
    java.util.Stack tags = new java.util.Stack();

    public MyXMLReader() {
        super();
    }

    public static void main(String args[]) {
        long lasting = System.currentTimeMillis();
        try {
            SAXParserFactory sf = SAXParserFactory.newInstance();
            SAXParser sp = sf.newSAXParser();
            MyXMLReader reader = new MyXMLReader();
            sp.parse(new InputSource("data_10k.xml"), reader);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time:" + (System.currentTimeMillis() - lasting) + " milliseconds");
    }

    // Tags are pushed but never popped (there is no endElement override), so
    // text after a closing tag, such as the whitespace between elements, is
    // still attributed to that tag.
    public void characters(char ch[], int start, int length) throws SAXException {
        String tag = (String) tags.peek();
        if (tag.equals("NO")) {
            System.out.print("license plate number:" + new String(ch, start, length));
        }
        if (tag.equals("ADDR")) {
            System.out.println("Address:" + new String(ch, start, length));
        }
    }

    public void startElement(
            String uri,
            String localName,
            String qName,
            Attributes attrs) {
        tags.push(qName);
    }
}

10 KB elapsed time (ms): 110 47 109 78
100 KB elapsed time (ms): 344 406 375 422
1000 KB elapsed time (ms): 3234 3281 3688 3312
10000 KB elapsed time (ms): 32578 34313 31797 31890 30328

Then JDOM (http://www.jdom.org/)

 
JDOM aims to be a Java-specific document model that simplifies interaction with XML and is faster than DOM. As the first Java-specific model, JDOM has been heavily promoted, and it is being considered for eventual use as a "Java standard extension" through Java Specification Request JSR-102. JDOM development began in early 2000.

JDOM differs from DOM in two main ways. First, JDOM uses only concrete classes rather than interfaces. This simplifies the API in some respects but also limits flexibility. Second, the API makes extensive use of the Collections classes, which makes it easier for Java developers who already know those classes.

The JDOM documentation states its purpose as "solving 80% (or more) of Java/XML problems with 20% (or less) of the effort" (with the 20% presumably measured by the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find the API much easier to understand than DOM. JDOM also includes fairly extensive checks on program behavior to prevent users from doing anything that makes no sense in XML. However, it still requires a thorough understanding of XML to do more than the basics (or even to understand the errors in some cases). That is perhaps more worthwhile than learning either the DOM or the JDOM interfaces.

JDOM itself does not contain a parser. It usually uses a SAX2 parser to parse and validate input XML documents (although it can also take a previously constructed DOM representation as input). It includes converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.

Bean file:

package com.test;

import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.*;

public class MyXMLReader {

    public static void main(String args[]) {
        long lasting = System.currentTimeMillis();
        try {
            // JDOM delegates the actual parsing to a SAX parser.
            SAXBuilder builder = new SAXBuilder();
            Document doc = builder.build(new File("data_10k.xml"));
            Element foo = doc.getRootElement();
            // Children come back as a standard java.util.List.
            List allChildren = foo.getChildren();
            for (int i = 0; i < allChildren.size(); i++) {
                System.out.print("license plate number:" + ((Element) allChildren.get(i)).getChild("NO").getText());
                System.out.println("owner address:" + ((Element) allChildren.get(i)).getChild("ADDR").getText());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time:" + (System.currentTimeMillis() - lasting) + " milliseconds");
    }
}

10 KB elapsed time (ms): 125 62 187 94
100 KB elapsed time (ms): 704 625 640 766
1000 KB elapsed time (ms): 27984 30750 27859 30656
10000 KB elapsed time: OutOfMemoryError

Finally, dom4j (http://dom4j.sourceforge.net/)

 
Although dom4j is now a completely independent development effort, it began as an intelligent fork of JDOM. It incorporates many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streamed documents. It also provides options for building document representations that can be accessed in parallel through the dom4j API and the standard DOM interfaces. It has been under development since the second half of 2000.
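
For readers unfamiliar with XPath, the sketch below shows the kind of query that integrated XPath support makes possible. To stay dependency-free it uses the JDK's own `javax.xml.xpath` API rather than dom4j, so it illustrates the idea and not dom4j's API. `XPathQuerySketch`, `evaluate`, and the sample document are assumptions.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class using the JDK XPath API, not dom4j.
final class XPathQuerySketch {

    static final String SAMPLE =
        "<Result>"
        + "<Value><NO>a1234</NO><ADDR>addr one</ADDR></Value>"
        + "<Value><NO>b1234</NO><ADDR>addr two</ADDR></Value>"
        + "</Result>";

    /** Evaluates an XPath expression against xml and returns its string value. */
    static String evaluate(String xml, String expression) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Select the address of the record whose NO is b1234.
        System.out.println(evaluate(SAMPLE, "/Result/Value[NO='b1234']/ADDR")); // prints addr two
    }
}
```

A single expression replaces the explicit loop and comparison that a plain traversal would need.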

To support all these features, dom4j uses an approach based on interfaces and abstract base classes. dom4j makes heavy use of the Collections classes in its API, but in many cases it also offers alternatives that allow better performance or a more direct coding style. The direct benefit is that, although dom4j pays the price of a more complex API, it offers much greater flexibility than JDOM.

dom4j shares JDOM's goals of ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, with the goal of handling essentially all Java/XML problems. In pursuing that goal, it places less emphasis than JDOM on preventing incorrect application behavior.

dom4j is a very good Java XML API, with excellent performance, powerful features, and great ease of use, and it is also open source software. More and more Java software now uses dom4j to read and write XML; notably, even Sun's JAXM uses dom4j.

Bean file:

package com.test;

import java.io.*;
import java.util.*;
import org.dom4j.*;
import org.dom4j.io.*;

public class MyXMLReader {

    public static void main(String args[]) {
        long lasting = System.currentTimeMillis();
        try {
            File f = new File("data_10k.xml");
            SAXReader reader = new SAXReader();
            Document doc = reader.read(f);
            Element root = doc.getRootElement();
            Element foo;
            // Iterate only over the child elements named "Value".
            for (Iterator i = root.elementIterator("Value"); i.hasNext();) {
                foo = (Element) i.next();
                System.out.print("license plate number:" + foo.elementText("NO"));
                System.out.println("owner address:" + foo.elementText("ADDR"));
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("Run time:" + (System.currentTimeMillis() - lasting) + " milliseconds");
    }
}

10 KB elapsed time (ms): 109 78 109 31
100 KB elapsed time (ms): 297 359 172 312
1000 KB elapsed time (ms): 2281 2359 2344 2469
10000 KB elapsed time (ms): 20938 19922 20031 21078

 
JDOM and DOM performed poorly in the performance tests and ran out of memory on the 10 MB document. For small documents, however, DOM and JDOM are still worth considering. Although the JDOM developers have said they intend to focus on performance before the official release, from a performance standpoint it is hard to recommend. DOM, on the other hand, remains a very good choice. DOM implementations are widely available across many programming languages. It is also the basis for many other XML-related standards, and because it is an official W3C recommendation (unlike the non-standard Java models), it may also be required in certain types of projects (such as using DOM in JavaScript).

SAX performs well, which comes from its particular parsing approach. A SAX parser inspects the incoming XML stream without loading it into memory (although parts of the document are of course held in memory temporarily while the stream is read).

Without a doubt, dom4j is the winner of this test. Many open source projects already use dom4j heavily; the well-known Hibernate, for example, uses dom4j to read its XML configuration files. If portability is not a concern, use dom4j!

Source: http://www.it.com.cn/f/edu/053/27/93819.htm
