Dark Horse Programmer: XML Parsing, Four Ways to Parse XML in Java

Last update: 2015-07-13. Source: Internet

Author: User

XML is a universal data interchange format. It is independent of platform, programming language and operating system, which makes data integration and exchange much easier. An XML document means the same thing in every environment; what differs is only how each language or platform implements the parsing.

There are four common ways to parse XML in Java: 1, DOM parsing; 2, SAX parsing; 3, JDOM parsing; 4, DOM4J parsing. The first two are the basic, officially specified methods and are platform-independent. The last two are extensions built on top of them and are available only on the Java platform.

Each of the four methods is described in detail below, using the following XML file (books.xml):

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
    <book id="1">
        <name>A Song of Ice and Fire</name>
        <author>George Martin</author>
        <year>2014</year>
        <price>89</price>
    </book>
    <book id="2">
        <name>Andersen's Fairy Tales</name>
        <year>2004</year>
        <price>77</price>
        <language>english</language>
    </book>
</bookstore>

First, DOM parsing

DOM stands for Document Object Model. A DOM-based XML parser converts an XML document into a collection of objects (usually called the DOM tree), and the application works with the XML data by operating on that object model. Through the DOM interface an application can reach any part of the document at any time, which is why this way of working is also called random access.

The DOM interface exposes the document through a hierarchical object model: a tree of nodes built from the structure of the XML document. Whatever the XML describes, whether a table of data, a list of items or a text document, DOM always represents it as a node tree; in other words, DOM forces you to access the document through a tree model. Because XML is itself hierarchical, this representation works well.
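To see the node tree concretely, here is a minimal sketch using the JDK's built-in DOM parser (`javax.xml.parsers`). The class name `DomTreeDump` and the inline XML string are illustrative assumptions. It prints every node with its depth and type, and shows that the line breaks and indentation between tags also become TEXT nodes in the tree (attributes, by contrast, are not child nodes). This is why the DOM listing in this section checks for `Node.ELEMENT_NODE`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomTreeDump {
    // Recursively append one line per node: indentation = depth, then type and name/value
    static void dump(Node node, int depth, StringBuilder out) {
        String kind;
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: kind = "DOCUMENT"; break;
            case Node.ELEMENT_NODE:  kind = "ELEMENT " + node.getNodeName(); break;
            case Node.TEXT_NODE:     kind = "TEXT \"" + node.getNodeValue().replace("\n", "\\n") + "\""; break;
            default:                 kind = "OTHER " + node.getNodeName();
        }
        out.append("  ".repeat(depth)).append(kind).append('\n');
        NodeList children = node.getChildNodes(); // attributes are not included here
        for (int i = 0; i < children.getLength(); i++) {
            dump(children.item(i), depth + 1, out);
        }
    }

    // Parse an XML string into a DOM tree and describe the whole tree
    public static String describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        StringBuilder out = new StringBuilder();
        dump(doc, 0, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe("<book id=\"1\">\n  <year>2014</year>\n</book>"));
    }
}
```

For this input, the `book` element has three children: a whitespace TEXT node, the `year` ELEMENT, and another whitespace TEXT node.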

The random access offered by the DOM tree gives applications a great deal of flexibility: they can read and change any part of the document, in any order. However, because the DOM parser builds the entire document as a tree in memory, memory requirements grow with the size and complexity of the document, and traversing a complex tree is time-consuming. DOM parsing therefore places higher demands on the machine and is not especially efficient. Still, because its tree model matches the structure of XML and random access is so convenient, DOM remains widely used.
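Random access means that once the tree is built, any node can be visited in any order without re-reading the file. A minimal sketch with the JDK DOM API; `DomRandomAccess`, `childText` and the inline two-book document (a trimmed copy of books.xml) are hypothetical names chosen for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomRandomAccess {
    // Trimmed, inline version of books.xml (assumption: only the fields used below)
    static final String XML = "<bookstore>"
            + "<book id=\"1\"><name>A Song of Ice and Fire</name><author>George Martin</author></book>"
            + "<book id=\"2\"><name>Andersen's Fairy Tales</name><language>english</language></book>"
            + "</bookstore>";

    // Parse once; the tree stays in memory for as long as the Document is referenced
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Text of the first <tag> element under parent, or null if there is none
    public static String childText(Element parent, String tag) {
        NodeList found = parent.getElementsByTagName(tag);
        return found.getLength() == 0 ? null : found.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        NodeList books = doc.getElementsByTagName("book");
        // Jump to the second book first, then back to the first: no second pass over the file
        Element second = (Element) books.item(1);
        Element first = (Element) books.item(0);
        System.out.println(childText(second, "language"));
        System.out.println(childText(first, "author"));
        System.out.println(childText(second, "author")); // this book has no <author>
    }
}
```

A sequential parser such as SAX would have to remember or re-read the first book to answer the second question after the first.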

Advantages:

1, It builds a tree structure, which is easy to understand and work with, and the code is easy to write.

2, The whole tree is kept in memory during processing, so the document is easy to modify.

Disadvantages:

1, The whole file is read in at once, so memory usage is relatively high.

2, For large XML files, parsing performance suffers and the parser may run out of memory.
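Advantage 2 can be illustrated directly: because the whole tree lives in memory, you can change it and write it back out. The sketch below uses a hypothetical `updateFirstBook` method that edits the first `<price>`, appends a `<language>` element to the same book, and serializes the result with the JDK's identity `Transformer` (`javax.xml.transform`). It is one way to do this, not the only one.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomModify {
    // Hypothetical helper: set the first book's price and add a <language> child
    public static String updateFirstBook(String xml, String newPrice) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Change an existing text value in place
        Element price = (Element) doc.getElementsByTagName("price").item(0);
        price.setTextContent(newPrice);

        // Add a new element to the same <book>
        Element book = (Element) price.getParentNode();
        Element language = doc.createElement("language");
        language.setTextContent("english");
        book.appendChild(language);

        // Serialize the modified in-memory tree back to text
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(updateFirstBook("<book id=\"1\"><price>89</price></book>", "99"));
    }
}
```

Passing a `StreamResult` built from a `File` instead of a `StringWriter` would write the changes back to disk.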

The following is the parsing code:

import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class DomTest {
    public static void main(String[] args) {
        // Create a DocumentBuilderFactory object
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        try {
            // Create a DocumentBuilder object
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Load books.xml (relative to the working directory) with DocumentBuilder.parse()
            Document document = db.parse("books.xml");
            // Get the collection of all book nodes
            NodeList bookList = document.getElementsByTagName("book");
            // NodeList.getLength() gives the number of book nodes
            System.out.println("There are " + bookList.getLength() + " books in total");
            // Traverse each book node
            for (int i = 0; i < bookList.getLength(); i++) {
                System.out.println("========== Start traversing the contents of book " + (i + 1) + " ==========");
                // item(i) returns one book node; NodeList indices start at 0
                Node book = bookList.item(i);
                // Get the collection of all attributes of the book node
                NamedNodeMap attrs = book.getAttributes();
                System.out.println("Book " + (i + 1) + " has " + attrs.getLength() + " attribute(s)");
                // Traverse the attributes
                for (int j = 0; j < attrs.getLength(); j++) {
                    // Get one attribute of the book node with item(index)
                    Node attr = attrs.item(j);
                    // Get the attribute name
                    System.out.print("Attribute name: " + attr.getNodeName());
                    // Get the attribute value
                    System.out.println("--attribute value: " + attr.getNodeValue());
                }
                // Parse the child nodes of the book node
                NodeList childNodes = book.getChildNodes();
                // The count includes the whitespace text nodes between elements
                System.out.println("Book " + (i + 1) + " has " + childNodes.getLength() + " child nodes");
                // Traverse childNodes to get the name and value of each element node
                for (int k = 0; k < childNodes.getLength(); k++) {
                    // Skip text nodes; only element nodes carry the book's fields
                    if (childNodes.item(k).getNodeType() == Node.ELEMENT_NODE) {
                        // Get the name of the element node (k counts text nodes too)
                        System.out.print("Node " + (k + 1) + " name: " + childNodes.item(k).getNodeName());
                        // The element's value is held by its first child, a text node
                        System.out.println("--node value: " + childNodes.item(k).getFirstChild().getNodeValue());
                        // Alternatively: childNodes.item(k).getTextContent()
                    }
                }
                System.out.println("========== Finished traversing the contents of book " + (i + 1) + " ==========");
            }
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Second, SAX parsing

SAX stands for Simple API for XML. Unlike DOM, SAX provides sequential access, which makes it a fast way to read XML data. While a SAX parser reads a document it fires a series of events (start of document, start of element, character data, end of element and so on) and calls the corresponding handler methods; the application accesses the document through these handlers. For this reason SAX is also called an event-driven interface.
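The event-driven model is easiest to understand by logging the events. Below is a minimal sketch (the class name `SaxEventLog` is hypothetical) that records the callbacks the JDK SAX parser makes on a `DefaultHandler` for a tiny document; whitespace-only character data is skipped to keep the log short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventLog {
    // Parse xml and return the sequence of callbacks the parser made
    public static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startDocument() { log.add("startDocument"); }
            @Override public void endDocument() { log.add("endDocument"); }
            @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("startElement " + qName);
            }
            @Override public void endElement(String uri, String localName, String qName) {
                log.add("endElement " + qName);
            }
            @Override public void characters(char[] ch, int start, int length) {
                String s = new String(ch, start, length).trim();
                if (!s.isEmpty()) log.add("characters " + s); // ignore pure whitespace
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        events("<book id=\"1\"><year>2014</year></book>").forEach(System.out::println);
    }
}
```

The parser never builds a tree: each event is handed to the handler and then forgotten, which is where the low memory cost comes from.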

Advantages:

1, It is event-driven, so memory usage is relatively small.

2, It suits applications that only need to read data from the XML file.

Disadvantages:

1, The code is more cumbersome to write.

2, It is hard to access several different parts of the XML file at the same time.

The following is the parsing code:

import java.io.IOException;
import java.util.ArrayList;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTest {
    public static void main(String[] args) {
        // Obtain a SAXParserFactory instance
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // Obtain a SAXParser instance from the factory
            SAXParser parser = factory.newSAXParser();
            // Create the handler that receives the parsing events
            SAXParserHandler handler = new SAXParserHandler();
            parser.parse("books.xml", handler);
            System.out.println("~!!! There are " + handler.getBookList().size() + " books in total");
            for (Book book : handler.getBookList()) {
                System.out.println(book.getId());
                System.out.println(book.getName());
                System.out.println(book.getAuthor());
                System.out.println(book.getYear());
                System.out.println(book.getPrice());
                System.out.println(book.getLanguage());
                System.out.println("----finish----");
            }
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

class SAXParserHandler extends DefaultHandler {
    String value = null;  // text most recently reported by characters()
    Book book = null;     // book currently being filled in
    private ArrayList<Book> bookList = new ArrayList<Book>();
    int bookIndex = 0;

    public ArrayList<Book> getBookList() {
        return bookList;
    }

    /** Marks the start of parsing. */
    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        System.out.println("SAX parsing begins");
    }

    /** Marks the end of parsing. */
    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        System.out.println("SAX parsing ends");
    }

    /** Called for each start tag. */
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        // Call the startElement method of DefaultHandler
        super.startElement(uri, localName, qName, attributes);
        if (qName.equals("book")) {
            bookIndex++;
            // Create a Book object for this <book> element
            book = new Book();
            System.out.println("========== Start traversing the contents of book " + bookIndex + " ==========");
            // Attribute names and count are not known in advance, so loop up to getLength()
            int num = attributes.getLength();
            for (int i = 0; i < num; i++) {
                System.out.print("Attribute " + (i + 1) + " of the book element is: " + attributes.getQName(i));
                System.out.println("---attribute value: " + attributes.getValue(i));
                if (attributes.getQName(i).equals("id")) {
                    book.setId(attributes.getValue(i));
                }
            }
        } else if (!qName.equals("name") && !qName.equals("bookstore")) {
            System.out.print("Node name: " + qName + "---");
        }
    }

    /** Called for each end tag: store the collected text in the matching field. */
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        // Call the endElement method of DefaultHandler
        super.endElement(uri, localName, qName);
        // Has one book been fully traversed?
        if (qName.equals("book")) {
            bookList.add(book);
            book = null;
            System.out.println("========== Finished traversing the contents of a book ==========");
        } else if (qName.equals("name")) {
            book.setName(value);
        } else if (qName.equals("author")) {
            book.setAuthor(value);
        } else if (qName.equals("year")) {
            book.setYear(value);
        } else if (qName.equals("price")) {
            book.setPrice(value);
        } else if (qName.equals("language")) {
            book.setLanguage(value);
        }
    }

    /** Called with character data between tags, including whitespace. */
    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        super.characters(ch, start, length);
        value = new String(ch, start, length);
        if (!value.trim().equals("")) {
            System.out.println("Node value: " + value);
        }
    }
}

// Minimal entity class; the article does not show Book, so String fields are an assumption
class Book {
    private String id, name, author, year, price, language;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }
    public String getYear() { return year; }
    public void setYear(String year) { this.year = year; }
    public String getPrice() { return price; }
    public void setPrice(String price) { this.price = price; }
    public String getLanguage() { return language; }
    public void setLanguage(String language) { this.language = language; }
}
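One caveat about the handler above: the SAX contract allows a parser to deliver the text of a single element through several `characters()` calls, so keeping only the last chunk in `value` can occasionally truncate a value. A common remedy is to accumulate text in a `StringBuilder` that is reset at each start tag and read at the matching end tag. The sketch below shows that pattern; the class `BufferedTextHandler` is hypothetical, and it collects each book into a `Map` rather than the article's `Book` class so that it stays self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();          // text of the current element
    private Map<String, String> current;                              // book being built, or null
    private final List<Map<String, String>> books = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start collecting text for this element afresh
        if (qName.equals("book")) {
            current = new LinkedHashMap<>();
            current.put("id", attributes.getValue("id"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times for one text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("book")) {
            books.add(current);
            current = null;
        } else if (current != null) {
            current.put(qName, text.toString().trim()); // whole text, however it was chunked
        }
        text.setLength(0);
    }

    public List<Map<String, String>> getBooks() {
        return books;
    }

    public static List<Map<String, String>> parse(String xml) throws Exception {
        BufferedTextHandler handler = new BufferedTextHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getBooks();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<bookstore><book id=\"1\"><name>A Song of Ice and Fire</name>"
                + "<price>89</price></book></bookstore>"));
    }
}
```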

Third, JDOM parsing

Characteristics:

1, It uses only concrete classes, not interfaces.

2, Its API makes heavy use of the Java Collections classes.
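Point 2 marks a real difference from the W3C DOM API: DOM's `NodeList` is not a `java.util.List`, and it also contains whitespace text nodes, so the DOM listing earlier needs an index loop plus a node-type check. The sketch below uses only the JDK (JDOM itself is not used). The helpers `childElements` and `parseRoot` are hypothetical; `childElements` copies the element children into a `List<Element>`, roughly the shape of result that JDOM's `getChildren()` returns directly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomChildren {
    // Hypothetical helper: element children of a DOM node as a java.util.List
    public static List<Element> childElements(Node parent) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) { // drop text/comment nodes
                result.add((Element) nodes.item(i));
            }
        }
        return result;
    }

    // Hypothetical helper: parse a string and return its root element
    public static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot("<bookstore>\n  <book id=\"1\"/>\n  <book id=\"2\"/>\n</bookstore>");
        // Plain for-each over a List, in the style JDOM encourages
        for (Element book : childElements(root)) {
            System.out.println(book.getAttribute("id"));
        }
    }
}
```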

The following is the parsing code:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Assumes JDOM 2.x (org.jdom2 packages); JDOM 1.x uses org.jdom instead
import org.jdom2.Attribute;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.JDOMException;
import org.jdom2.input.SAXBuilder;

public class JdomTest {
    private static ArrayList<Book> booksList = new ArrayList<Book>();

    public static void main(String[] args) {
        // Preparation for parsing books.xml with JDOM
        // 1. Create a SAXBuilder object
        SAXBuilder saxBuilder = new SAXBuilder();
        // 2. Open an input stream on the XML file (try-with-resources closes it afterwards)
        try (InputStream in = new FileInputStream("src/res/books.xml");
             InputStreamReader isr = new InputStreamReader(in, "UTF-8")) {
            // 3. Build a Document from the input stream with SAXBuilder.build()
            Document document = saxBuilder.build(isr);
            // 4. Get the root element of the XML file from the Document
            Element rootElement = document.getRootElement();
            // 5. Get the List of child elements under the root
            List<Element> bookList = rootElement.getChildren();
            // Continue parsing each book
            for (Element book : bookList) {
                Book bookEntity = new Book();
                System.out.println("====== Start parsing book " + (bookList.indexOf(book) + 1) + " ======");
                // Parse the attribute list of book
                List<Attribute> attrList = book.getAttributes();
                // When the attribute name is known, its value can be read directly
                System.out.println("id read directly: " + book.getAttributeValue("id"));
                // When the names and number of attributes are unknown, traverse attrList
                for (Attribute attr : attrList) {
                    // Get the attribute name
                    String attrName = attr.getName();
                    // Get the attribute value
                    String attrValue = attr.getValue();
                    System.out.println("Attribute name: " + attrName + "----attribute value: " + attrValue);
                    if (attrName.equals("id")) {
                        bookEntity.setId(attrValue);
                    }
                }
                // Traverse the names and values of the child elements of book
                List<Element> bookChilds = book.getChildren();
                for (Element child : bookChilds) {
                    System.out.println("Node name: " + child.getName() + "----node value: " + child.getValue());
                    if (child.getName().equals("name")) {
                        bookEntity.setName(child.getValue());
                    } else if (child.getName().equals("author")) {
                        bookEntity.setAuthor(child.getValue());
                    } else if (child.getName().equals("year")) {
                        bookEntity.setYear(child.getValue());
                    } else if (child.getName().equals("price")) {
                        bookEntity.setPrice(child.getValue());
                    } else if (child.getName().equals("language")) {
                        bookEntity.setLanguage(child.getValue());
                    }
                }
                System.out.println("====== Finished parsing book " + (bookList.indexOf(book) + 1) + " ======");
                booksList.add(bookEntity);
                bookEntity = null;
                System.out.println(booksList.size());
                System.out.println(booksList.get(0).getId());
                System.out.println(booksList.get(0).getName());
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (JDOMException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Fourth, DOM4J parsing

Characteristics:

1, It is an intelligent branch of JDOM and adds many features beyond basic XML document representation.

2, It uses interfaces and abstract base classes.

3, It offers excellent performance and flexibility, is powerful, and is very easy to use.

4, It is open-source software.

The following is the parsing code:

import java.io.File;
import java.util.Iterator;
import java.util.List;

// Assumes dom4j 2.x (generic return types); with dom4j 1.6 this compiles with unchecked warnings
import org.dom4j.Attribute;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;

public class Dom4jTest {
    public static void main(String[] args) {
        // Parse books.xml: create a SAXReader object
        SAXReader reader = new SAXReader();
        try {
            // Load books.xml with the reader's read() method to obtain a Document
            Document document = reader.read(new File("src/res/books.xml"));
            // Get the root element (bookstore) from the Document
            Element bookstore = document.getRootElement();
            // Get an iterator over the root's child elements with elementIterator()
            Iterator<Element> it = bookstore.elementIterator();
            // Traverse the iterator to get each book under the root
            while (it.hasNext()) {
                System.out.println("===== Start traversing a book =====");
                Element book = it.next();
                // Get the attribute names and values of book
                List<Attribute> bookAttrs = book.attributes();
                for (Attribute attr : bookAttrs) {
                    System.out.println("Attribute name: " + attr.getName() + "--attribute value: " + attr.getValue());
                }
                // Traverse the child elements of book
                Iterator<Element> itt = book.elementIterator();
                while (itt.hasNext()) {
                    Element bookChild = itt.next();
                    System.out.println("Node name: " + bookChild.getName() + "--node value: " + bookChild.getStringValue());
                }
                System.out.println("===== Finished traversing a book =====");
            }
        } catch (DocumentException e) {
            e.printStackTrace();
        }
    }
}

Final: Comparison Summary

DOM4J performs the best; even Sun's JAXM uses DOM4J. Many open-source projects use DOM4J, for example the well-known Hibernate, which uses it to read its XML configuration files. If portability is not a concern, use DOM4J.
JDOM and DOM performed poorly in performance tests, running out of memory on 10 MB documents. For small documents, DOM and JDOM are still worth considering. Although the JDOM developers have said they expect to focus on performance before the full release, from a performance standpoint it has nothing to recommend it. DOM, on the other hand, is still a very good choice. DOM implementations exist in many programming languages, and DOM is the basis of many other XML-related standards. Because it is a formal W3C recommendation (as opposed to a non-standard Java model), it may also be required in some kinds of projects (for example, using the DOM in JavaScript).
SAX performs better because of its event-driven parsing. SAX examines the incoming XML stream without loading the whole document into memory (although, of course, parts of the document are held in memory temporarily while the stream is being read).
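To illustrate the claim that SAX does not need the whole document in memory, the sketch below counts `<book>` elements in a document that is generated on the fly by a hypothetical `Reader` subclass (`GeneratedBooks`), so the full XML never exists as a string or file. It relies only on the JDK SAX parser; the document shape and class names are assumptions for illustration, and no timing or memory figures are claimed.

```java
import java.io.Reader;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStreamingCount {
    // Hypothetical generator: yields <bookstore><book id="0"/>...</bookstore> piece by piece
    static class GeneratedBooks extends Reader {
        private final int total;
        private int next = 0;            // id of the next <book> to emit
        private boolean closed = false;  // whether </bookstore> has been emitted
        private String chunk = "<bookstore>";
        private int pos = 0;             // read position inside chunk

        GeneratedBooks(int total) { this.total = total; }

        @Override
        public int read(char[] buf, int off, int len) {
            if (len == 0) return 0;
            if (pos == chunk.length()) {  // current chunk used up: produce the next one
                if (next < total) {
                    chunk = "<book id=\"" + next++ + "\"/>";
                } else if (!closed) {
                    chunk = "</bookstore>";
                    closed = true;
                } else {
                    return -1;            // end of stream
                }
                pos = 0;
            }
            int n = Math.min(len, chunk.length() - pos);
            chunk.getChars(pos, pos + n, buf, off);
            pos += n;
            return n;
        }

        @Override
        public void close() {}
    }

    // Count <book> start events; only a counter is kept, never a tree
    public static long countBooks(int total) throws Exception {
        long[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("book")) count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new GeneratedBooks(total)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countBooks(1_000_000));
    }
}
```

With DOM, the same count would first require a tree node for every book in the document.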

This article is an English version of an article originally written in Chinese on aliyun.com.
