Parsing XML with SAX Explained

Source: Internet
Author: User
Tags: xml document reader, xml parser, xml reader

Java offers two main ways to parse XML: DOM and SAX. DOM is a W3C standard and provides a standard way to parse documents, but its efficiency is often unsatisfactory: a DOM parser reads the entire document and builds a node tree that stays resident in memory, and only then can your code work on that tree through the standard DOM interfaces. In most cases, however, we are interested in only part of the document, so parsing the whole document up front is unnecessary, and locating a piece of data by walking down from the root of the node tree also takes time.
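For comparison, here is a minimal sketch of the DOM approach described above, using the JDK's built-in javax.xml.parsers API. The class name DomBookNames and the sample document are made up for illustration. The whole tree is built in memory before the first name can be read.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads every <name> element through the DOM tree.
class DomBookNames {
    static final String SAMPLE =
            "<books><book id=\"1\"><name>Thinking in JAVA</name></book>"
          + "<book id=\"2\"><name>Core JAVA2</name></book></books>";

    static List<String> bookNames(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse() returns only after the complete node tree has been built
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("name");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bookNames(SAMPLE));
    }
}
```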
SAX is an alternative way to parse XML. Compared with the Document Object Model (DOM), SAX is a faster and more lightweight way to read and process XML data. SAX lets you process a document while it is being read, so you do not have to wait until the entire document has been loaded before acting on it, and it avoids the overhead and conceptual leaps that DOM requires. As an event-based API, it is well suited to data streams, that is, to processing data sequentially as it arrives. The SAX API notifies you of each event as it occurs while your document is parsed. Once your handler has responded to an event, any data you did not save is discarded.
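The "save what you need, the rest is discarded" idea can be sketched as follows. This is an illustrative example, not part of the original listing: the class name SaxBookNames and the sample document are assumptions, and the handler extends the JDK's org.xml.sax.helpers.DefaultHandler for brevity. Only the text inside name elements is kept; everything else passes by without being stored.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: keeps only the text of <name> elements while streaming.
class SaxBookNames {
    static final String SAMPLE =
            "<books><book id=\"1\"><name>Thinking in JAVA</name></book>"
          + "<book id=\"2\"><name>Core JAVA2</name></book></books>";

    static List<String> bookNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <name> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("name".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one text node in several chunks, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(bookNames(SAMPLE));
    }
}
```

The ch array passed to characters belongs to the parser and is reused between calls, which is why the sketch copies the characters it wants into its own StringBuilder.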
The following example shows how to parse XML with SAX. It is a little long because every SAX event-handling method is annotated in detail. The SAX API has four main event-handling interfaces: ContentHandler, DTDHandler, EntityResolver, and ErrorHandler. In practice, extending the DefaultHandler class and overriding only the event-handling methods you need achieves the same effect as this example, but to get the full picture, let's look at all the main event-handling methods in the SAX API. (DefaultHandler implements all four of these interfaces and provides a default implementation of each of their methods.)
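As a minimal sketch of the DefaultHandler shortcut, the hypothetical class below (ElementCounter is an illustrative name) overrides a single callback and inherits the defaults for everything else. Because DefaultHandler implements all four interfaces, the same object can be registered both as content handler and as error handler.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts elements by name, overriding only startElement.
class ElementCounter extends DefaultHandler {
    final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        counts.merge(qName, 1, Integer::sum);
    }

    // True if the object can play all four SAX handler roles.
    static boolean implementsAllFour(Object handler) {
        return handler instanceof ContentHandler
                && handler instanceof DTDHandler
                && handler instanceof EntityResolver
                && handler instanceof ErrorHandler;
    }

    static Map<String, Integer> count(String xml) {
        ElementCounter handler = new ElementCounter();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler); // inherited default implementations are used
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.counts;
    }

    public static void main(String[] args) {
        System.out.println(implementsAllFour(new ElementCounter()));
        System.out.println(count("<books><book/><book/><book/></books>"));
    }
}
```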

1. The ContentHandler interface: the handler interface that receives notifications of the logical content of a document.

    import org.xml.sax.Attributes;
    import org.xml.sax.ContentHandler;
    import org.xml.sax.Locator;
    import org.xml.sax.SAXException;

    class MyContentHandler implements ContentHandler {
        StringBuffer jsonStringBuffer;
        int frontBlankCount = 0; // current nesting depth, used to indent the output

        public MyContentHandler() {
            jsonStringBuffer = new StringBuffer();
        }

        /*
         * Receives notification of character data.
         * In the DOM, ch[begin .. begin + length) corresponds to the node value
         * (nodeValue) of a text node.
         */
        @Override
        public void characters(char[] ch, int begin, int length) throws SAXException {
            System.out.println(this.toBlankString(this.frontBlankCount)
                    + ">>> characters (" + length + "): " + this.escape(ch, begin, length));
        }

        /*
         * Receives notification of the end of the document.
         */
        @Override
        public void endDocument() throws SAXException {
            System.out.println(this.toBlankString(--this.frontBlankCount)
                    + ">>> end document");
        }

        /*
         * Receives notification of the end of an element.
         * Parameters:
         * uri: the namespace URI of the element
         * localName: the local name of the element (without prefix)
         * qName: the qualified name of the element (with prefix)
         */
        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            System.out.println(this.toBlankString(--this.frontBlankCount)
                    + ">>> end element: " + qName + " (" + uri + ")");
        }

        /*
         * Ends the scope of a prefix-to-URI mapping.
         */
        @Override
        public void endPrefixMapping(String prefix) throws SAXException {
            System.out.println(this.toBlankString(--this.frontBlankCount)
                    + ">>> end prefix_mapping: " + prefix);
        }

        /*
         * Receives notification of ignorable whitespace in element content.
         * Parameters:
         * ch: the characters from the XML document
         * begin: the start position in the array
         * length: the number of characters to read from the array
         */
        @Override
        public void ignorableWhitespace(char[] ch, int begin, int length)
                throws SAXException {
            System.out.println(this.toBlankString(this.frontBlankCount)
                    + ">>> ignorable whitespace (" + length + "): " + this.escape(ch, begin, length));
        }

        /*
         * Receives notification of a processing instruction.
         * Parameters:
         * target: the processing instruction target
         * data: the processing instruction data, or null if none was supplied
         */
        @Override
        public void processingInstruction(String target, String data)
                throws SAXException {
            System.out.println(this.toBlankString(this.frontBlankCount)
                    + ">>> process instruction: (target=\"" + target
                    + "\", data=\"" + data + "\")");
        }

        /*
         * Receives an object for locating the origin of SAX document events.
         * Parameters:
         * locator: an object that can return the location of any SAX document event
         */
        @Override
        public void setDocumentLocator(Locator locator) {
            System.out.println(this.toBlankString(this.frontBlankCount)
                    + ">>> set document_locator: (lineNumber=" + locator.getLineNumber()
                    + ", columnNumber=" + locator.getColumnNumber()
                    + ", systemId=" + locator.getSystemId()
                    + ", publicId=" + locator.getPublicId() + ")");
        }

        /*
         * Receives notification of a skipped entity.
         * Parameters:
         * name: the name of the skipped entity. A parameter entity name begins with '%';
         *       the external DTD subset is reported as the string "[dtd]".
         */
        @Override
        public void skippedEntity(String name) throws SAXException {
            System.out.println(this.toBlankString(this.frontBlankCount)
                    + ">>> skipped_entity: " + name);
        }

        /*
         * Receives notification of the beginning of the document.
         */
        @Override
        public void startDocument() throws SAXException {
            System.out.println(this.toBlankString(this.frontBlankCount++)
                    + ">>> start document");
        }

        /*
         * Receives notification of the beginning of an element.
         * Parameters:
         * uri: the namespace URI of the element
         * localName: the local name of the element (without prefix)
         * qName: the qualified name of the element (with prefix)
         * atts: the attributes of the element
         */
        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) throws SAXException {
            System.out.println(this.toBlankString(this.frontBlankCount++)
                    + ">>> start element: " + qName + " (" + uri + ")");
        }

        /*
         * Begins the scope of a prefix-to-URI namespace mapping.
         * The information from this event is not needed for normal namespace processing:
         * when the http://xml.org/sax/features/namespaces feature is true (the default),
         * the SAX XML reader automatically replaces prefixes in element and attribute names.
         * Parameters:
         * prefix: the namespace prefix being declared
         * uri: the namespace URI the prefix is mapped to
         */
        @Override
        public void startPrefixMapping(String prefix, String uri)
                throws SAXException {
            System.out.println(this.toBlankString(this.frontBlankCount++)
                    + ">>> start prefix_mapping: xmlns:" + prefix + "="
                    + "\"" + uri + "\"");
        }

        /* Makes backslashes, line breaks, tabs and double quotes visible in the output. */
        private String escape(char[] ch, int begin, int length) {
            StringBuffer buffer = new StringBuffer();
            for (int i = begin; i < begin + length; i++) {
                switch (ch[i]) {
                    case '\\': buffer.append("\\\\"); break;
                    case '\r': buffer.append("\\r"); break;
                    case '\n': buffer.append("\\n"); break;
                    case '\t': buffer.append("\\t"); break;
                    case '\"': buffer.append("\\\""); break;
                    default: buffer.append(ch[i]);
                }
            }
            return buffer.toString();
        }

        /* Builds the indentation for the given nesting depth (two spaces per level). */
        private String toBlankString(int count) {
            StringBuffer buffer = new StringBuffer();
            for (int i = 0; i < count; i++)
                buffer.append("  ");
            return buffer.toString();
        }
    }
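The listing above receives the atts parameter in startElement but never reads it. As a hedged sketch of how the Attributes object can be walked, the hypothetical class below (AttributeDump is an illustrative name) records the namespace URI, local name, and attributes of every element. It assumes a namespace-aware parser obtained from SAXParserFactory, and it extends DefaultHandler only to keep the example short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: describes each element as {uri}localName[attr=value, ...].
class AttributeDump {
    static final String SAMPLE =
            "<books count=\"3\" xmlns=\"http://test.org/books\">"
          + "<book id=\"1\"/><book id=\"2\"/></books>";

    static List<String> describe(String xml) {
        final List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                StringBuilder sb = new StringBuilder("{" + uri + "}" + localName + "[");
                // Attributes are accessed by index; their order is not guaranteed.
                for (int i = 0; i < atts.getLength(); i++) {
                    if (i > 0) {
                        sb.append(", ");
                    }
                    sb.append(atts.getLocalName(i)).append('=').append(atts.getValue(i));
                }
                out.add(sb.append(']').toString());
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed for uri and localName to be filled in
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

A single attribute can also be looked up directly, for example with atts.getValue("id"), which returns null when the attribute is absent.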

2. The DTDHandler interface: the handler interface that receives notifications of DTD-related events.

    import org.xml.sax.DTDHandler;
    import org.xml.sax.SAXException;

    public class MyDTDHandler implements DTDHandler {

        /*
         * Receives notification of a notation declaration event.
         * Parameters:
         * name: the notation name
         * publicId: the public identifier of the notation, or null if none was supplied
         * systemId: the system identifier of the notation, or null if none was supplied
         */
        @Override
        public void notationDecl(String name, String publicId, String systemId)
                throws SAXException {
            System.out.println(">>> notation declare: (name=" + name
                    + ", publicId=" + publicId
                    + ", systemId=" + systemId + ")");
        }

        /*
         * Receives notification of an unparsed entity declaration event.
         * Parameters:
         * name: the name of the unparsed entity
         * publicId: the public identifier of the entity, or null if none was supplied
         * systemId: the system identifier of the entity
         * notationName: the name of the associated notation
         */
        @Override
        public void unparsedEntityDecl(String name, String publicId,
                String systemId, String notationName) throws SAXException {
            System.out.println(">>> unparsed entity declare: (name=" + name
                    + ", publicId=" + publicId
                    + ", systemId=" + systemId
                    + ", notationName=" + notationName + ")");
        }
    }
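The books.xml file used later has no DTD, so these two callbacks never fire for it. The sketch below shows a document that does trigger them. It is illustrative only: the class name DtdEvents, the notation gif, and the entity logo are invented for the example, and the declarations live in an internal DTD subset so that no external file is read.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: records the DTD events reported for a document.
class DtdEvents {
    static final String SAMPLE =
            "<!DOCTYPE doc [\n"
          + "  <!ELEMENT doc EMPTY>\n"
          + "  <!ATTLIST doc logo ENTITY #IMPLIED>\n"
          + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
          + "  <!ENTITY logo SYSTEM \"logo.gif\" NDATA gif>\n"
          + "]>\n"
          + "<doc logo=\"logo\"/>";

    static List<String> collect(String xml) {
        final List<String> events = new ArrayList<>();
        DTDHandler handler = new DTDHandler() {
            @Override
            public void notationDecl(String name, String publicId, String systemId) {
                events.add("notation " + name);
            }

            @Override
            public void unparsedEntityDecl(String name, String publicId,
                    String systemId, String notationName) {
                events.add("unparsed entity " + name + " uses notation " + notationName);
            }
        };
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setDTDHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : collect(SAMPLE)) {
            System.out.println(event);
        }
    }
}
```

The system identifiers are left out of the recorded strings on purpose, because a parser may resolve relative identifiers against a base location before reporting them.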

3. The EntityResolver interface: the basic interface for resolving entities.

    import java.io.IOException;

    import org.xml.sax.EntityResolver;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;

    public class MyEntityResolver implements EntityResolver {

        /*
         * Allows the application to resolve external entities.
         * The parser calls this method before opening any external entity
         * (except the top-level document entity).
         * Parameters:
         * publicId: the public identifier of the external entity being referenced,
         *           or null if none was supplied
         * systemId: the system identifier of the external entity being referenced
         * Returns:
         * an InputSource object describing the new input source, or null to request
         * that the parser open a regular URI connection to the system identifier.
         */
        @Override
        public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException {
            return null;
        }
    }
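The resolver above always returns null, which leaves resolution to the parser. A minimal sketch of the other case, where the application supplies the entity content itself, might look like this. The class name InMemoryResolver, the entity greeting, and the identifier greeting.xml are all hypothetical, and no such file needs to exist.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical resolver: serves one external entity from memory.
class InMemoryResolver implements EntityResolver {
    static final String SAMPLE =
            "<!DOCTYPE doc [<!ENTITY greeting SYSTEM \"greeting.xml\">]>"
          + "<doc>&greeting;</doc>";

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // The parser may have expanded the identifier to an absolute URI, so match the suffix.
        if (systemId != null && systemId.endsWith("greeting.xml")) {
            return new InputSource(new StringReader("Hello from the resolver"));
        }
        return null; // fall back to the parser's own lookup
    }

    // Parses the document and returns all character data it contains.
    static String text(String xml) {
        final StringBuilder text = new StringBuilder();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver(new InMemoryResolver());
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(text(SAMPLE));
    }
}
```

The same pattern is commonly used to serve a DTD from a local copy instead of fetching it over the network.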

4. The ErrorHandler interface: the basic interface for SAX error handlers.

    import org.xml.sax.ErrorHandler;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;

    public class MyErrorHandler implements ErrorHandler {

        /*
         * Receives notification of a recoverable error.
         */
        @Override
        public void error(SAXParseException e) throws SAXException {
            System.err.println("error (" + e.getLineNumber() + ","
                    + e.getColumnNumber() + "): " + e.getMessage());
        }

        /*
         * Receives notification of a non-recoverable error.
         */
        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            System.err.println("fatalError (" + e.getLineNumber() + ","
                    + e.getColumnNumber() + "): " + e.getMessage());
        }

        /*
         * Receives notification of a warning.
         */
        @Override
        public void warning(SAXParseException e) throws SAXException {
            System.err.println("warning (" + e.getLineNumber() + ","
                    + e.getColumnNumber() + "): " + e.getMessage());
        }
    }
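To illustrate the difference between the recoverable and non-recoverable callbacks, here is a small sketch under stated assumptions: the class name CollectingErrorHandler and both sample documents are invented, and the parser is the JDK default with validation switched on. A document that violates its DTD is reported through error and parsing continues, while a document that is not well formed is reported through fatalError and parsing stops.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical handler: records the severity and position of every reported problem.
class CollectingErrorHandler implements ErrorHandler {
    // Well formed, but <b> is not allowed by the DTD.
    static final String INVALID = "<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>";
    // Not well formed: <b> is never closed.
    static final String MALFORMED = "<a><b></a>";

    final List<String> messages = new ArrayList<>();

    private void record(String level, SAXParseException e) {
        messages.add(level + " (" + e.getLineNumber() + "," + e.getColumnNumber() + ")");
    }

    @Override
    public void warning(SAXParseException e) {
        record("warning", e);
    }

    @Override
    public void error(SAXParseException e) {
        record("error", e);
    }

    @Override
    public void fatalError(SAXParseException e) {
        record("fatalError", e);
    }

    static List<String> check(String xml) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Parsing stops after a fatal error; the handler has already recorded it.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(check(INVALID));
        System.out.println(check(MALFORMED));
    }
}
```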

The main method of the Test class prints the event information produced while parsing books.xml.

    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;

    import org.xml.sax.ContentHandler;
    import org.xml.sax.DTDHandler;
    import org.xml.sax.EntityResolver;
    import org.xml.sax.ErrorHandler;
    import org.xml.sax.InputSource;
    import org.xml.sax.SAXException;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;

    public class Test {

        public static void main(String[] args) throws SAXException,
                FileNotFoundException, IOException {
            // Create the handler for document content events
            ContentHandler contentHandler = new MyContentHandler();
            // Create the handler for error events
            ErrorHandler errorHandler = new MyErrorHandler();
            // Create the handler for DTD-related events
            DTDHandler dtdHandler = new MyDTDHandler();
            // Create the entity resolver
            EntityResolver entityResolver = new MyEntityResolver();

            // Create an XML parser (reads and parses XML the SAX way)
            XMLReader reader = XMLReaderFactory.createXMLReader();
            /*
             * Set parser features:
             * http://xml.org/sax/features/validation = true enables validation
             * http://xml.org/sax/features/namespaces = true enables namespace support
             */
            reader.setFeature("http://xml.org/sax/features/validation", true);
            reader.setFeature("http://xml.org/sax/features/namespaces", true);
            // Register the handler for document content events
            reader.setContentHandler(contentHandler);
            // Register the handler for parser error events
            reader.setErrorHandler(errorHandler);
            // Register the handler for DTD-related events
            reader.setDTDHandler(dtdHandler);
            // Register the entity resolver
            reader.setEntityResolver(entityResolver);
            // Parse the books.xml document
            reader.parse(new InputSource(new FileReader("books.xml")));
        }
    }
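XMLReaderFactory has been deprecated since Java 9 in favor of SAXParserFactory. As a hedged alternative, the sketch below obtains an XMLReader with the same two features enabled through the JAXP factory. The class name ReaderSetup and its helper methods are illustrative assumptions, not part of the original example.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper: creates a validating, namespace-aware XMLReader via JAXP.
class ReaderSetup {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";

    static XMLReader newValidatingReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // corresponds to the namespaces feature
            factory.setValidating(true);     // corresponds to the validation feature
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads a feature flag, converting the checked exception for convenience.
    static boolean feature(XMLReader reader, String name) {
        try {
            return reader.getFeature(name);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XMLReader reader = newValidatingReader();
        System.out.println("validation: " + feature(reader, VALIDATION));
        System.out.println("namespaces: " + feature(reader, NAMESPACES));
    }
}
```

The returned reader accepts the same setContentHandler, setErrorHandler, setDTDHandler, and setEntityResolver calls as the one created in the Test class.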

The content of the books.xml file is as follows:

<?xml version="1.0" encoding="GB2312"?>
<books count="3" xmlns="http://test.org/books">
	<!--books's comment-->
	<book id="1">
		<name>Thinking in JAVA</name>
	</book>
	<book id="2">
		<name>Core JAVA2</name>
	</book>
	<book id="3">
		<name>C++ primer</name>
	</book>
</books>

The console output is as follows:

 

>>> set document_locator: (lineNumber=1, columnNumber=1, systemId=null, publicId=null)
>>> start document
error (2,7): Document is invalid: no grammar found.
error (2,7): Document root element "books", must match DOCTYPE root "null".
  >>> start prefix_mapping: xmlns:="http://test.org/books"
    >>> start element: books (http://test.org/books)
      >>> characters (2): \n\t
      >>> characters (2): \n\t
      >>> start element: book (http://test.org/books)
        >>> characters (3): \n\t\t
        >>> start element: name (http://test.org/books)
          >>> characters (16): Thinking in JAVA
        >>> end element: name (http://test.org/books)
        >>> characters (2): \n\t
      >>> end element: book (http://test.org/books)
      >>> characters (2): \n\t
      >>> start element: book (http://test.org/books)
        >>> characters (3): \n\t\t
        >>> start element: name (http://test.org/books)
          >>> characters (10): Core JAVA2
        >>> end element: name (http://test.org/books)
        >>> characters (2): \n\t
      >>> end element: book (http://test.org/books)
      >>> characters (2): \n\t
      >>> start element: book (http://test.org/books)
        >>> characters (3): \n\t\t
        >>> start element: name (http://test.org/books)
          >>> characters (10): C++ primer
        >>> end element: name (http://test.org/books)
        >>> characters (2): \n\t
      >>> end element: book (http://test.org/books)
      >>> characters (1): \n
    >>> end element: books (http://test.org/books)
  >>> end prefix_mapping: 
>>> end document
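The two error lines in this output appear because validation is enabled but books.xml declares no DTD, so the parser has no grammar to validate against. For the same reason the line breaks and tabs between elements arrive through characters rather than ignorableWhitespace. The sketch below shows what changes once a grammar is available. It is an illustration under assumptions: the class name ValidatedBooks and the internal DTD are invented, and the namespace declaration is left out to keep the DTD short.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parses a document that carries its own DTD, with validation on.
class ValidatedBooks {
    static final String SAMPLE =
            "<!DOCTYPE books [\n"
          + "<!ELEMENT books (book*)>\n"
          + "<!ELEMENT book (name)>\n"
          + "<!ATTLIST book id CDATA #REQUIRED>\n"
          + "<!ELEMENT name (#PCDATA)>\n"
          + "]>\n"
          + "<books>\n\t<book id=\"1\">\n\t\t<name>Thinking in JAVA</name>\n\t</book>\n</books>";

    static String summarize(String xml) {
        final int[] errors = {0};
        final int[] ignorableChars = {0};
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void ignorableWhitespace(char[] ch, int start, int length) {
                ignorableChars[0] += length; // whitespace between child elements
            }

            @Override
            public void error(SAXParseException e) {
                errors[0]++; // validation errors are recoverable
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return "errors=" + errors[0] + ", ignorableChars=" + ignorableChars[0]
                + ", text=" + text;
    }

    public static void main(String[] args) {
        System.out.println(summarize(SAMPLE));
    }
}
```

With the DTD in place, the parser knows that books and book contain only child elements, so the whitespace between them no longer reaches characters and only the book title is collected as text.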

 
