Incomplete test of four XML parsing techniques in Java

Last Update:2014-05-05 Source: Internet

Author: User



Test environment:

AMD Duron 1.4 GHz overclocked to 1.5 GHz, 256 MB DDR333, Windows 2000 Server SP4, Sun JDK 1.4.1 + Eclipse 2.1 + Resin 2.1.8, tested in Debug mode.

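The XML file format is as follows. Each record is a VALUE element containing a NO element (license plate number) and an ADDR element (owner address); these are the element names the parsing code looks up:

    <VALUE>
        <NO>A1234</NO>
        <ADDR>No. XX, Section X, XX Road, XX town, XX county, Sichuan province</ADDR>
    </VALUE>
    <VALUE>
        <NO>B1234</NO>
        <ADDR>XX group, XX village, XX Township, XXX city, Sichuan province</ADDR>
    </VALUE>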

Test method:

Use a JSP page to have each approach parse XML files of 10 K, 100 K, 1000 K, and 10000 K, and record the elapsed time for each run (unit: milliseconds).
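The listings in this article time a single run with System.currentTimeMillis(). A minimal sketch of repeating that measurement over several runs might look like the following. The class name ParseTimer and the placeholder workload are assumptions for illustration, and the lambda syntax needs a much newer JDK than the 1.4.1 used in the test.

```java
import java.util.Arrays;

public class ParseTimer {

    /** Runs the task the given number of times and returns each elapsed time in milliseconds. */
    public static long[] timeMillis(Runnable task, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.currentTimeMillis();
            task.run();
            elapsed[i] = System.currentTimeMillis() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Placeholder workload (hypothetical); swap in a call to the parser being measured.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        System.out.println("Elapsed (ms): " + Arrays.toString(timeMillis(task, 4)));
    }
}
```

Recording every run rather than a single total makes warm-up effects visible, which is probably why the results in this article list several numbers per file size.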

JSP file:

    <%@ page contentType="text/html; charset=gb2312" %>
    <%@ page import="com.test.*" %>
    <% String args[] = {""}; MyXMLReader.main(args); %>

Test

First up is DOM (JAXP Crimson parser)

DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets developers search the tree for specific information. Analyzing the structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is based on a hierarchy of information, DOM is considered tree-based or object-based. DOM, and tree-based processing in general, has several advantages. First, because the tree persists in memory, it can be modified, so an application can change both the data and the structure. Second, the tree can be navigated at any time, rather than being processed in a single pass as with SAX. DOM is also much simpler to use.

On the other hand, parsing and loading an entire document can be slow and resource-intensive when the document is very large, so such data is better processed by other means, such as event-based models like SAX.
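To make the "persistent, modifiable tree" point concrete, here is a small sketch that loads a document, walks the VALUE records, and changes one address in memory. The RESULT root element, the sample data, and the names DomEditSketch and updateAddress are assumptions for illustration; getTextContent and setTextContent come from DOM Level 3 and are not available in the JDK 1.4.1 used for the test.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomEditSketch {

    // Sample data; the RESULT root element is an assumption for illustration.
    public static final String XML =
        "<RESULT>"
        + "<VALUE><NO>A1234</NO><ADDR>old address</ADDR></VALUE>"
        + "<VALUE><NO>B1234</NO><ADDR>other address</ADDR></VALUE>"
        + "</RESULT>";

    /** Parses the XML, sets the ADDR of the record whose NO matches, and returns the tree. */
    public static Document updateAddress(String xml, String plate, String newAddr) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // The whole document is loaded into memory before anything else happens.
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList values = doc.getElementsByTagName("VALUE");
        for (int i = 0; i < values.getLength(); i++) {
            Element value = (Element) values.item(i);
            Element no = (Element) value.getElementsByTagName("NO").item(0);
            if (plate.equals(no.getTextContent())) {
                // The tree stays in memory, so it can be edited in place.
                value.getElementsByTagName("ADDR").item(0).setTextContent(newAddr);
            }
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = updateAddress(XML, "B1234", "new address");
        NodeList addrs = doc.getElementsByTagName("ADDR");
        for (int i = 0; i < addrs.getLength(); i++) {
            System.out.println(addrs.item(i).getTextContent());
        }
    }
}
```

The same tree can be revisited as often as needed after parsing, which is the navigation freedom that a single-pass model does not offer.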

Bean file:

    package com.test;

    import java.io.*;
    import java.util.*;
    import org.w3c.dom.*;
    import javax.xml.parsers.*;

    public class MyXMLReader {

        public static void main(String args[]) {
            long lasting = System.currentTimeMillis();
            try {
                File f = new File("data_10k.xml");
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                DocumentBuilder builder = factory.newDocumentBuilder();
                // Parse the whole file into an in-memory tree.
                Document doc = builder.parse(f);
                NodeList nl = doc.getElementsByTagName("VALUE");
                // One iteration per VALUE record; NO and ADDR are looked up again on every pass.
                for (int i = 0; i < nl.getLength(); i++) {
                    System.out.print("license plate number:" + doc.getElementsByTagName("NO").item(i).getFirstChild().getNodeValue());
                    System.out.println("owner address:" + doc.getElementsByTagName("ADDR").item(i).getFirstChild().getNodeValue());
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("Run time:" + (System.currentTimeMillis() - lasting) + "millisecond");
        }
    }

10 K elapsed time: 265 203 219 172

100 K elapsed time: 9172 9016 8891 9000

1000 K elapsed time: 691719 675407 708375 739656

10000 K elapsed time: OutOfMemoryError

Next is SAX

The advantages of this kind of event-based processing are much like those of streaming media. Analysis can begin immediately instead of waiting for all of the data to be processed. In addition, because the application only examines the data as it is read, the data does not need to be held in memory, which is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing once a condition is met. In general, SAX is also much faster than its alternative, DOM.
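Stopping the parse once a condition is met is commonly done by throwing an exception from a callback. The following is a sketch of that idiom, not something taken from the test code: the class SaxEarlyStop, the marker exception StopParsing, and the RESULT root element in the sample data are all hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEarlyStop {

    /** Marker exception (hypothetical name) used to abort parsing on purpose. */
    static class StopParsing extends SAXException {
        StopParsing() {
            super("stop");
        }
    }

    /** Returns the text of the first NO element and ignores the rest of the document. */
    public static String firstPlate(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inNo = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("NO".equals(qName)) {
                    inNo = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inNo) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if ("NO".equals(qName)) {
                    throw new StopParsing(); // condition met: abandon the rest of the input
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing expected) {
            // Parsing was cut short deliberately.
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<RESULT><VALUE><NO>A1234</NO><ADDR>a</ADDR></VALUE>"
                   + "<VALUE><NO>B1234</NO><ADDR>b</ADDR></VALUE></RESULT>";
        System.out.println(firstPlate(xml));
    }
}
```

Using a dedicated exception type keeps the deliberate stop distinguishable from a genuine parse error, which still propagates to the caller.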

DOM or SAX?

For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision.

DOM accesses an XML document through a tree structure, while SAX uses an event model.

The DOM parser converts an XML document into a tree containing its content, and that tree can then be traversed. The advantage of the DOM model is that programming is easy: developers only need to call the instruction that builds the tree and then use the navigation APIs to reach the tree nodes they need. Elements in the tree can easily be added and modified. However, because the DOM parser has to process the entire XML document, its performance and memory requirements are relatively high, especially for large XML files. Because of its traversal capability, the DOM parser is often used in services where XML documents change frequently.

The SAX parser uses an event-based model. It fires a series of events while parsing an XML document: when a given tag is found, it invokes a callback method to tell it that the tag has been found. SAX usually has relatively low memory requirements, because it lets developers decide for themselves which tags to process. Its scalability shows best when developers only need to process part of the data contained in the document. However, coding against a SAX parser is more difficult, and it is hard to access several different pieces of data in the same document at the same time.
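The difficulty with reaching several pieces of data at once comes from the handler having to keep its own state between callbacks. As a sketch under assumed names (SaxRecordCollector and the RESULT root element are not from the article), pairing each NO with its ADDR could be done like this:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxRecordCollector extends DefaultHandler {

    private final List<String[]> records = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private String no;    // plate number of the record being read
    private String addr;  // address of the record being read

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("NO".equals(qName)) {
            no = text.toString();
        } else if ("ADDR".equals(qName)) {
            addr = text.toString();
        } else if ("VALUE".equals(qName)) {
            records.add(new String[] { no, addr }); // both fields are known only now
            no = null;
            addr = null;
        }
        text.setLength(0);
    }

    /** Parses the XML and returns one {NO, ADDR} pair per VALUE record. */
    public static List<String[]> parse(String xml) throws Exception {
        SaxRecordCollector handler = new SaxRecordCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.records;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<RESULT><VALUE><NO>A1234</NO><ADDR>first</ADDR></VALUE>"
                   + "<VALUE><NO>B1234</NO><ADDR>second</ADDR></VALUE></RESULT>";
        for (String[] r : parse(xml)) {
            System.out.println(r[0] + " -> " + r[1]);
        }
    }
}
```

With DOM the two values could simply be read from the same VALUE node; here the handler has to remember the first until the second arrives.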

Bean file:

    package com.test;

    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import javax.xml.parsers.*;

    public class MyXMLReader extends DefaultHandler {

        // Names of the elements that are currently open, innermost on top.
        java.util.Stack tags = new java.util.Stack();

        public MyXMLReader() {
            super();
        }

        public static void main(String args[]) {
            long lasting = System.currentTimeMillis();
            try {
                SAXParserFactory sf = SAXParserFactory.newInstance();
                SAXParser sp = sf.newSAXParser();
                MyXMLReader reader = new MyXMLReader();
                sp.parse(new InputSource("data_10k.xml"), reader);
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("Run time:" + (System.currentTimeMillis() - lasting) + "millisecond");
        }

        // Called for text content; print it when the innermost open element is NO or ADDR.
        public void characters(char ch[], int start, int length) throws SAXException {
            String tag = (String) tags.peek();
            if (tag.equals("NO")) {
                System.out.print("license plate number:" + new String(ch, start, length));
            }
            if (tag.equals("ADDR")) {
                System.out.println("address:" + new String(ch, start, length));
            }
        }

        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            tags.push(qName);
        }

        // Pop on close so that whitespace after </ADDR> is not printed as another address.
        public void endElement(String uri, String localName, String qName) {
            tags.pop();
        }
    }

10 K elapsed time: 110 47 109 78

100 K elapsed time: 344 406 375 422

1000 K elapsed time: 3234 3281 3688 3312

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
