Four XML parsing methods

Source: Internet
Author: User

This article combines information I found online with some of my own summaries.

Data Source:

Http://www.cnblogs.com/allenzheng/archive/2012/12/01/2797196.html

Http://www.cnblogs.com/lanxuezaipiao/archive/2013/05/17/3082949.html

I. Introduction and analysis of advantages and disadvantages

1. DOM (Document Object Model)

DOM is the official W3C standard for representing XML documents in a platform- and language-neutral way. A DOM is a collection of nodes, or pieces of information, organized in a hierarchy. This hierarchy lets developers search the tree for specific information. Analyzing the structure usually requires loading the entire document and building the hierarchy before any work can be done. Because it is based on this hierarchy of information, DOM is considered tree-based or object-based.

Advantages
① Applications can change both the data and the structure of the document.
② Access is bidirectional: the tree can be navigated up and down at any time to read and manipulate any part of the data.
Disadvantages
① The entire XML document usually has to be loaded to build the hierarchy, which consumes resources.
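
As a rough illustration of the two advantages above, the following sketch parses a small document, walks up from a <name> node to its parent, changes the text, and appends a new child. The class name DomEditSketch, the method editAndDescribe, and the sample XML in main are assumptions made for this example; they are not part of the article's code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, used only for this illustration.
class DomEditSketch {

    /** Parses the XML, edits the tree in memory, and returns "id|name|lastChild=text". */
    static String editAndDescribe(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = db.parse(new InputSource(new StringReader(xml)));

            // Navigate down to the first <name>, then back up to its parent <user>.
            Element name = (Element) doc.getElementsByTagName("name").item(0);
            Element user = (Element) name.getParentNode();

            // Change data: replace the text of <name>.
            name.setTextContent("renamed");

            // Change structure: append a new <age> child to <user>.
            Element age = doc.createElement("age");
            age.setTextContent("30");
            user.appendChild(age);

            // Navigate down again to confirm the new node is part of the tree.
            Node last = user.getLastChild();
            return user.getAttribute("id") + "|" + name.getTextContent()
                    + "|" + last.getNodeName() + "=" + last.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed sample input, shaped like the users document in this article.
        String xml = "<users><user id=\"0\"><name>yaobo</name></user></users>";
        System.out.println(editAndDescribe(xml));
    }
}
```

All of the edits happen on the in-memory tree, which is why the whole document has to be loaded first.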

XML:

<?xml version="1.0" encoding="UTF-8"?>
<users>
    <user id="0">
        <name>yaobo</name>
        <age>A</age>
        <sex>male</sex>
    </user>
    <user id="1">
        <name>Students</name>
        <age>-</age>
        <sex>Male</sex>
    </user>
    <user id="2">
        <name>wjm</name>
        <age>at</age>
        <sex>male</sex>
    </user>
    <user id="3">
        <name>wh</name>
        <age>-</age>
        <sex>Male</sex>
    </user>
    <user>
        <name>yaobo</name>
        <age>A</age>
    </user>
</users>

DOM parsing:

package test1;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class Dom {
    public void parser(String url) {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(url);
            NodeList users = doc.getChildNodes();
            for (int i = 0; i < users.getLength(); i++) {
                // At this point we have the <users> node.
                Node user = users.item(i);
                // System.out.println(user.getNodeName() + users.getLength());
                NodeList userList = user.getChildNodes();
                for (int j = 0; j < userList.getLength(); j++) {
                    // At this point we have a <user> node (or a whitespace text node).
                    Node info = userList.item(j);
                    // System.out.println(info.getNodeName());
                    NodeList attribute = info.getChildNodes();
                    for (int k = 0; k < attribute.getLength(); k++) {
                        // Skip whitespace text nodes. Strings are compared with
                        // equals(), not with != as in the original listing.
                        if (!"#text".equals(attribute.item(k).getNodeName())) {
                            // At this point we have a child element of <user>.
                            System.out.println(attribute.getLength()
                                    + attribute.item(k).getNodeName() + ":"
                                    + attribute.item(k).getTextContent());
                        }
                    }
                    System.out.println();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

package test1;

public class Test {
    public static void main(String[] args) {
        String url = "src/test/jixie.xml";
        // Sax sax = new Sax();
        // sax.parse(url);
        Dom dom = new Dom();
        dom.parser(url);
    }
}

2. SAX (Simple API for XML)

The advantages of SAX processing are very similar to those of streaming media. Analysis can begin immediately instead of waiting for all the data to be processed. Also, because the application examines data only as it is read, it does not need to keep the data in memory. This is a huge advantage for large documents. In fact, the application does not even have to parse the entire document; it can stop parsing once a condition is met. In general, SAX is much faster than its alternative, DOM.

DOM or SAX? For developers who need to write their own code to process XML documents, choosing between the DOM and SAX parsing models is a very important design decision. DOM accesses the XML document as a tree structure, whereas SAX uses an event model.

A DOM parser converts an XML document into a tree that contains the document's content and can be traversed. The advantage of the DOM model is that programming is easy: the developer only needs to call the instructions that build the tree and then use the navigation APIs to reach the desired nodes. Adding and modifying elements in the tree is also easy. However, because a DOM parser has to process the entire XML document, its performance and memory demands are high, especially for large XML files. Because of its traversal capabilities, a DOM parser is often used in services where XML documents change frequently.

A SAX parser uses an event-based model. While parsing an XML document it triggers a series of events, and when a given tag is found it invokes a callback method to report that the tag has been found. SAX usually needs little memory because it lets the developer decide which tags to process. SAX scales better in particular when the developer only needs part of the data in the document. However, coding with a SAX parser is harder, and it is difficult to access several different pieces of data in the same document at the same time.

Advantages
① Analysis can begin immediately, without waiting for all the data to be processed.
② Data is examined only as it is read and does not need to be stored in memory.
③ Parsing can stop as soon as a condition is met, without parsing the entire document.
④ It is efficient and fast, and it can parse documents larger than system memory.
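
Advantage ③ can be sketched as follows. SAX has no dedicated "stop" call, so a common approach is to throw an exception from a callback and catch it around parse(). The class SaxEarlyStopSketch, the marker exception StopParsing, and the method firstName are hypothetical names chosen for this example, and the sample XML in main is assumed.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, used only for this illustration.
class SaxEarlyStopSketch {

    /** Marker exception (an assumption of this sketch) thrown to abort parsing. */
    static class StopParsing extends SAXException {
        StopParsing() {
            super("stop requested by handler");
        }
    }

    /** Returns the text of the first <name> element, or null if there is none. */
    static String firstName(String xml) {
        final StringBuilder text = new StringBuilder();
        final boolean[] inName = {false};

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("name".equals(qName)) {
                    inName[0] = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inName[0]) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if ("name".equals(qName)) {
                    // The condition is met: abort instead of reading the rest.
                    throw new StopParsing();
                }
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (StopParsing done) {
            return text.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return null; // the document ended without any <name> element
    }

    public static void main(String[] args) {
        String xml = "<users><user><name>yaobo</name></user>"
                + "<user><name>wjm</name></user></users>";
        System.out.println(firstName(xml));
    }
}
```

The second <user> element is never reported to the handler, because parsing is abandoned after the first </name>.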

Disadvantages
① The application itself is responsible for the tag-handling logic (for example, maintaining parent/child relationships), and the more complex the document, the more complex the program becomes.
② Navigation is one-way: the parser cannot locate a position in the document hierarchy, it is hard to access different parts of the same document at the same time, and XPath is not supported.
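
To make disadvantage ① concrete, here is a minimal sketch of the bookkeeping an application has to do itself: it keeps a stack of open elements so that each callback knows where it is in the hierarchy. The class SaxPathSketch and the method elementPaths are hypothetical names, and the sample XML in main is assumed.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, used only for this illustration.
class SaxPathSketch {

    /** Returns the slash-separated path of every element, in document order. */
    static List<String> elementPaths(String xml) {
        final List<String> paths = new ArrayList<>();
        // Stack of currently open elements, root first. SAX does not keep this for us.
        final Deque<String> open = new ArrayDeque<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                open.addLast(qName);
                paths.add("/" + String.join("/", open));
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.removeLast();
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return paths;
    }

    public static void main(String[] args) {
        String xml = "<users><user><name>a</name><age>1</age></user></users>";
        for (String path : elementPaths(xml)) {
            System.out.println(path);
        }
    }
}
```

With DOM the same information is available from getParentNode(); with SAX it exists only if the handler records it.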

The following figure explains SAX parsing in detail.


The SAX parser loads the XML file and parses it in document order. When it reads <?xml ...?>, it calls the startDocument() method. When it reads <books>, which is an element node, it calls the startElement(String uri, String localName, String qName, Attributes attributes) method. The second parameter is the node name. Note that, depending on the environment, the second parameter may be empty (for example, when the parser is not namespace-aware), in which case the third parameter can be used instead, so check which parameter is populated before relying on either. The fourth parameter holds the node's attributes. We do not need the <books> node here, so we start from the <book> node, which is position 1 in the figure. When it is read, startElement(...) is called; because the node has only one attribute, id, its value can be obtained with attributes.getValue(0). Then, at position 2 in the figure, the characters(char[] ch, int start, int length) method is called. This may look like nothing but blank space, but the SAX parser does not see it that way: it treats the whitespace as a text node. This whitespace is not the data we want, though; we want the text under the <name> node. So we need a field that records the name of the most recently started tag. In characters(...), we check whether the current node is name and, if so, read the value, Thinking in Java.
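
The walkthrough above can be sketched as a small handler. The books document in main is a reconstruction based on the description (a <books> root, a <book> with an id attribute, and a <name> child), not the article's original file. The class SaxBookNamesSketch and the method bookEntries are hypothetical names. The sketch uses qName and looks the attribute up by name rather than by index, which is one reasonable choice among several.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, used only for this illustration.
class SaxBookNamesSketch {

    /** Returns one "id: name" entry per <name> element found inside a <book>. */
    static List<String> bookEntries(String xml) {
        final List<String> entries = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            private String currentTag;  // name of the most recently started tag
            private String currentId;   // id attribute of the enclosing <book>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                currentTag = qName;
                text.setLength(0);
                if ("book".equals(qName)) {
                    currentId = attributes.getValue("id");
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Whitespace between tags also arrives here, so keep text only
                // while the current tag is <name>.
                if ("name".equals(currentTag)) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    entries.add(currentId + ": " + text.toString().trim());
                }
                currentTag = null; // text after an end tag belongs to no tracked tag
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return entries;
    }

    public static void main(String[] args) {
        // Assumed sample document, reconstructed from the description above.
        String xml = "<books>\n  <book id=\"1\">\n    <name>Thinking in Java</name>\n  </book>\n</books>";
        System.out.println(bookEntries(xml));
    }
}
```

Text is appended rather than assigned because a parser is allowed to deliver the content of one element in more than one characters() call.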

SAX parsing:

package test1;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Sax {
    public void parse(String url) {
        SAXParserFactory sf = SAXParserFactory.newInstance();
        try {
            SAXParser sp = sf.newSAXParser();
            // The element name must match the XML exactly, including case.
            MyHander mh = new MyHander("user");
            sp.parse(url, mh);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

class MyHander extends DefaultHandler {
    String nodeName;
    Map<String, String> map;
    // The parsed records are saved to this list.
    List<Map<String, String>> list;
    // Name of the node currently being parsed.
    String tagName;

    public MyHander(String nodeName) {
        this.nodeName = nodeName;
        list = new ArrayList<Map<String, String>>();
    }

    @Override
    public void startDocument() throws SAXException {
        System.out.println("parsing begins:");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {
        System.out.println("startElement:" + qName);
        // Check whether this is the wanted node. Strings are compared with
        // equals(), not with == as in the original listing.
        if (qName.equals(nodeName)) {
            map = new HashMap<String, String>();
        }
        // Save the id attribute of <user id="..."> to the map. The map != null
        // check guards against elements with attributes outside a record.
        if (map != null && attributes != null && attributes.getLength() > 0) {
            map.put(attributes.getQName(0), attributes.getValue(0));
        }
        tagName = qName;
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        System.out.println("characters:");
        if (map != null && tagName != null) {
            // Get the content of the current node.
            String content = new String(ch, start, length);
            // Ignore whitespace-only text between tags.
            if (!content.trim().isEmpty()) {
                System.out.println("Yaobo" + content);
                map.put(tagName, content);
                System.out.println(tagName + " " + content);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        System.out.println("endElement:" + tagName);
        if (qName.equals(nodeName)) {
            list.add(map);
            map = null;
        }
    }

    @Override
    public void endDocument() throws SAXException {
        System.out.println("Parse End");
        System.out.println(list);
    }
}
Analysis: Parsing XML with SAX is a top-down, event-driven approach. During parsing, the parser automatically calls startDocument(), startElement(), characters(), endElement(), endDocument(), and other related methods as appropriate. The results of running the code can be explained as follows:
    • The startDocument() method is called only when parsing of the document begins, and only once per parse.
    • The startElement() method is called each time the parser starts an element, that is, when it encounters the element's start tag.
    • The characters() method is called each time character data is encountered after a tag, even when that content is only whitespace or a line break. If elements are nested, characters() is called again before the parent element's end tag, which needs to be kept in mind.
    • The endElement() method is called each time the parser finishes an element, that is, when it encounters the element's end tag.
    • The endDocument() method is called only when parsing of the document ends, and only once per parse.
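
The call order described above can be observed with a handler that only records which callbacks fire. This is a minimal sketch: the class SaxEventOrderSketch, the method events, and the sample XML in main are assumptions made for this example. How many characters() calls a parser makes for a given run of text is up to the parser, so no exact output is claimed here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, used only for this illustration.
class SaxEventOrderSketch {

    /** Returns the names of the SAX callbacks in the order they were invoked. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                log.add("startDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("startElement:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Fires for real text and for whitespace between tags alike.
                log.add("characters");
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("endElement:" + qName);
            }

            @Override
            public void endDocument() {
                log.add("endDocument");
            }
        };

        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        // The line breaks inside <user> are there to show the extra characters() calls.
        String xml = "<user>\n<name>yaobo</name>\n</user>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```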
