Original: http://yangjunfeng.iteye.com/blog/401377
Java usually parses XML in one of two ways: DOM or SAX. DOM is the W3C standard and provides a standard way of parsing, but its efficiency is often unsatisfactory. When parsing XML with DOM, the parser reads the entire document and builds a tree structure (the node tree) that resides in memory, and only then can your code manipulate that tree through the standard DOM interfaces. Most of the time, however, we are interested in only part of the document. In that case there is no need to parse the whole document first, and navigating from the root of the node tree to the data we need is also time-consuming.
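To make the DOM cost concrete, here is a minimal sketch using the JDK's built-in `javax.xml.parsers` API. The class name `DomSketch` and the sample document are assumptions made for illustration; they do not come from the original post.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomSketch {
    /** Builds the full in-memory tree, then collects the text of every <name> element. */
    static List<String> names(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // parse() returns only after the whole document has been read into a node tree.
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("name");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books><book id=\"1\"><name>Thinking in JAVA</name></book>"
                + "<book id=\"2\"><name>Core JAVA2</name></book></books>";
        System.out.println(names(xml)); // prints [Thinking in JAVA, Core JAVA2]
    }
}
```

Even though only the `name` elements are wanted, every node of the document is materialized before the lookup can start.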
SAX is an alternative way of parsing XML. Compared with the Document Object Model (DOM), SAX is a faster, more lightweight way of reading and manipulating XML data. SAX lets you process a document as it is being read, so you do not have to wait for the entire document to be stored before acting on it. It does not involve the overhead and conceptual leap that DOM requires. The SAX API is an event-based API suited to data streams: it processes data sequentially as the data flows by. The SAX API notifies you when certain events occur while it parses your document. By the time you respond to an event, any data you have not saved has already been discarded.
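The streaming behavior described above can be sketched as follows. This is an illustrative example, not code from the original post: the class name `SaxStreamSketch` is hypothetical, and it relies on the JDK helper class `org.xml.sax.helpers.DefaultHandler` to avoid implementing every callback. Only the text inside `name` elements is saved; everything else is seen once and dropped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxStreamSketch {
    /** Collects the text of every <name> element while the document streams past. */
    static List<String> names(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <name> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("name".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text node in several chunks, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(names("<books><book><name>Core JAVA2</name></book></books>"));
    }
}
```

No tree is built at any point; memory use depends on what the handler chooses to keep, not on the size of the document.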
The following is an example of parsing XML with SAX (it is a bit long, because it spells out every SAX event-handling method). The SAX API has four main interfaces for handling events: ContentHandler, DTDHandler, EntityResolver, and ErrorHandler. The example may look verbose. In practice, extending the DefaultHandler class and overriding only the event methods you need achieves the same effect, but for the sake of a complete overview, let us look at all the main event-handling methods of the SAX API. (DefaultHandler implements the four handler interfaces above and provides a default implementation of each of their methods.)
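As a rough illustration of the DefaultHandler shortcut mentioned above, the sketch below overrides just two callbacks and registers the same object in all four handler roles. The names `DefaultHandlerSketch` and `CountingHandler` are hypothetical, and the reader is obtained through `SAXParserFactory`, which is an assumption of this sketch rather than the approach used in the rest of the post.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DefaultHandlerSketch {
    /** Overrides two callbacks and inherits the default behavior for all others. */
    static class CountingHandler extends DefaultHandler {
        int elements = 0;
        int warnings = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }

        @Override
        public void warning(SAXParseException e) {
            warnings++;
        }
    }

    static int countElements(String xml) throws Exception {
        CountingHandler handler = new CountingHandler();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // DefaultHandler implements all four interfaces, so one instance fills every role.
        reader.setContentHandler(handler);
        reader.setDTDHandler(handler);
        reader.setEntityResolver(handler);
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<books><book/><book/></books>")); // prints 3
    }
}
```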
1. ContentHandler interface: the handler interface that receives notification of the logical content of a document.

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;

class MyContentHandler implements ContentHandler {
    StringBuffer jsonStringBuffer;
    // Current nesting depth, used to indent the printed events.
    int frontBlankCount = 0;

    public MyContentHandler() {
        jsonStringBuffer = new StringBuffer();
    }

    /*
     * Receive notification of character data.
     * In DOM, ch[begin:end] corresponds to the node value (nodeValue) of a text node.
     */
    @Override
    public void characters(char[] ch, int begin, int length) throws SAXException {
        System.out.println(this.toBlankString(this.frontBlankCount)
                + ">>> characters (" + length + "): " + this.escape(ch, begin, length));
    }

    /*
     * Receive notification of the end of the document.
     */
    @Override
    public void endDocument() throws SAXException {
        System.out.println(this.toBlankString(--this.frontBlankCount) + ">>> end document");
    }

    /*
     * Receive notification of the end of an element.
     * The parameters are:
     * uri: the element's namespace
     * localName: the element's local name (without prefix)
     * qName: the element's qualified name (with prefix)
     */
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        System.out.println(this.toBlankString(--this.frontBlankCount)
                + ">>> end element: " + qName + " (" + uri + ")");
    }

    /*
     * End the scope of a prefix-URI mapping.
     */
    @Override
    public void endPrefixMapping(String prefix) throws SAXException {
        System.out.println(this.toBlankString(--this.frontBlankCount)
                + ">>> end prefix_mapping: " + prefix);
    }

    /*
     * Receive notification of ignorable whitespace in element content.
     * The parameters are:
     * ch: the characters from the XML document
     * begin: the start position in the array
     * length: the number of characters to read from the array
     */
    @Override
    public void ignorableWhitespace(char[] ch, int begin, int length) throws SAXException {
        System.out.println(this.toBlankString(this.frontBlankCount)
                + ">>> ignorable whitespace (" + length + "): " + this.escape(ch, begin, length));
    }

    /*
     * Receive notification of a processing instruction.
     * The parameters are:
     * target: the processing instruction target
     * data: the processing instruction data, or null if none was supplied
     */
    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        System.out.println(this.toBlankString(this.frontBlankCount)
                + ">>> process instruction: (target = \"" + target + "\", data = \"" + data + "\")");
    }

    /*
     * Receive an object for locating the origin of SAX document events.
     * The parameter is:
     * locator: an object that can return the location of any SAX document event
     */
    @Override
    public void setDocumentLocator(Locator locator) {
        System.out.println(this.toBlankString(this.frontBlankCount)
                + ">>> set document_locator: (lineNumber = " + locator.getLineNumber()
                + ", columnNumber = " + locator.getColumnNumber()
                + ", systemId = " + locator.getSystemId()
                + ", publicId = " + locator.getPublicId() + ")");
    }

    /*
     * Receive notification of a skipped entity.
     * The parameter is:
     * name: the name of the skipped entity. If it is a parameter entity, the name begins with '%';
     *       if it is the external DTD subset, it is the string "[dtd]"
     */
    @Override
    public void skippedEntity(String name) throws SAXException {
        System.out.println(this.toBlankString(this.frontBlankCount) + ">>> skipped_entity: " + name);
    }

    /*
     * Receive notification of the start of the document.
     */
    @Override
    public void startDocument() throws SAXException {
        System.out.println(this.toBlankString(this.frontBlankCount++) + ">>> start document");
    }

    /*
     * Receive notification of the start of an element.
     * The parameters are:
     * uri: the element's namespace
     * localName: the element's local name (without prefix)
     * qName: the element's qualified name (with prefix)
     * atts: the element's attributes
     */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        System.out.println(this.toBlankString(this.frontBlankCount++)
                + ">>> start element: " + qName + " (" + uri + ")");
    }

    /*
     * Begin the scope of a prefix-URI namespace mapping.
     * The information from this event is not needed for normal namespace processing:
     * when the http://xml.org/sax/features/namespaces feature is true (the default),
     * the SAX XML reader automatically replaces prefixes in element and attribute names.
     * The parameters are:
     * prefix: the prefix
     * uri: the namespace
     */
    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        System.out.println(this.toBlankString(this.frontBlankCount++)
                + ">>> start prefix_mapping: xmlns:" + prefix + " = \"" + uri + "\"");
    }

    // Makes backslashes, line breaks, tabs, and double quotes visible in the printed text.
    private String escape(char[] ch, int begin, int length) {
        StringBuffer buffer = new StringBuffer();
        for (int i = begin; i < begin + length; i++) {
            switch (ch[i]) {
                case '\\': buffer.append("\\\\"); break;
                case '\r': buffer.append("\\r"); break;
                case '\n': buffer.append("\\n"); break;
                case '\t': buffer.append("\\t"); break;
                case '\"': buffer.append("\\\""); break;
                default: buffer.append(ch[i]);
            }
        }
        return buffer.toString();
    }

    // Builds the indentation: one tab per nesting level.
    private String toBlankString(int count) {
        StringBuffer buffer = new StringBuffer();
        for (int i = 0; i < count; i++) {
            buffer.append("\t");
        }
        return buffer.toString();
    }
}
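The uri, localName, and prefix parameters are easiest to understand with a prefixed namespace. The sketch below is an illustration under assumptions: the class name `NamespaceSketch` and the `b:` prefix are hypothetical, and namespace processing is switched on through `SAXParserFactory.setNamespaceAware(true)`.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceSketch {
    /** Records prefix mappings and the namespace-qualified name of each element. */
    static List<String> events(String xml) throws Exception {
        List<String> events = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // without this, uri and localName may be empty
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                events.add("map " + prefix + " -> " + uri);
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                events.add("start {" + uri + "}" + localName);
            }
        });
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<b:books xmlns:b=\"http://test.org/books\"><b:book/></b:books>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```

The prefix mapping is reported before the start of the element that declares it, which is why the handler above sees the `map` event first.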
2. DTDHandler interface: the handler interface that receives notification of DTD-related events.

import org.xml.sax.DTDHandler;
import org.xml.sax.SAXException;

public class MyDTDHandler implements DTDHandler {
    /*
     * Receive notification of a notation declaration event.
     * The parameters are:
     * name: the notation name
     * publicId: the notation's public identifier, or null if none was given
     * systemId: the notation's system identifier, or null if none was given
     */
    @Override
    public void notationDecl(String name, String publicId, String systemId) throws SAXException {
        System.out.println(">>> notation declare: (name = " + name
                + ", publicId = " + publicId
                + ", systemId = " + systemId + ")");
    }

    /*
     * Receive notification of an unparsed entity declaration event.
     * The parameters are:
     * name: the name of the unparsed entity
     * publicId: the entity's public identifier, or null if none was given
     * systemId: the entity's system identifier
     * notationName: the name of the associated notation
     */
    @Override
    public void unparsedEntityDecl(String name, String publicId, String systemId,
            String notationName) throws SAXException {
        System.out.println(">>> unparsed entity declare: (name = " + name
                + ", publicId = " + publicId
                + ", systemId = " + systemId
                + ", notationName = " + notationName + ")");
    }
}
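Since books.xml has no DTD, the two DTDHandler callbacks never fire in this post's test run. The following sketch shows a document that would trigger them. The class name `DtdHandlerSketch`, the notation `gif`, and the entity `logo` are all made up for illustration, and the behavior shown assumes the JDK's default parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class DtdHandlerSketch {
    /** Records the notation and unparsed entity declarations found in the DTD. */
    static List<String> declarations(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setDTDHandler(new DTDHandler() {
            @Override
            public void notationDecl(String name, String publicId, String systemId) {
                seen.add("notation " + name);
            }

            @Override
            public void unparsedEntityDecl(String name, String publicId, String systemId,
                    String notationName) {
                seen.add("unparsed entity " + name + " (notation " + notationName + ")");
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // Internal DTD subset with one notation and one unparsed (NDATA) entity.
        String xml = "<!DOCTYPE doc [\n"
                + "  <!ELEMENT doc EMPTY>\n"
                + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
                + "  <!ENTITY logo SYSTEM \"logo.gif\" NDATA gif>\n"
                + "]>\n"
                + "<doc/>";
        for (String declaration : declarations(xml)) {
            System.out.println(declaration);
        }
    }
}
```

The unparsed entity is only declared, never loaded, so `logo.gif` does not need to exist.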
3. EntityResolver interface: the basic interface for resolving entities.

import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class MyEntityResolver implements EntityResolver {
    /*
     * Allow the application to resolve external entities.
     * The parser calls this method before opening any external entity
     * (except the top-level document entity).
     * The parameters are:
     * publicId: the public identifier of the referenced external entity, or null if none was supplied
     * systemId: the system identifier of the referenced external entity
     * Returns:
     * an InputSource object describing the new input source, or null to request
     * that the parser open a regular URI connection to the system identifier
     */
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        return null;
    }
}
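MyEntityResolver always returns null, which leaves resolution to the parser. As a contrast, here is a sketch of a resolver that supplies the entity content itself. The class name `EntityResolverSketch`, the URL `http://example.invalid/ext.txt`, and the replacement text are hypothetical, and the sketch assumes the JDK's default parser settings, under which external general entities are expanded.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EntityResolverSketch {
    /** Parses xml and returns its character data, serving external entities from memory. */
    static String textOf(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Hypothetical policy: answer every external entity request with an in-memory
        // source, so the parser never opens a connection to the system identifier.
        reader.setEntityResolver((publicId, systemId) ->
                new InputSource(new StringReader("resolved locally")));
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE doc [<!ENTITY ext SYSTEM \"http://example.invalid/ext.txt\">]>"
                + "<doc>&ext;</doc>";
        System.out.println(textOf(xml));
    }
}
```

Returning a non-null InputSource is also a common way to keep a parser from fetching remote resources named in untrusted documents.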
4. ErrorHandler interface: the basic interface for error handlers.

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class MyErrorHandler implements ErrorHandler {
    /*
     * Receive notification of a recoverable error.
     */
    @Override
    public void error(SAXParseException e) throws SAXException {
        System.err.println("Error (" + e.getLineNumber() + ","
                + e.getColumnNumber() + "): " + e.getMessage());
    }

    /*
     * Receive notification of a non-recoverable error.
     */
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        System.err.println("FatalError (" + e.getLineNumber() + ","
                + e.getColumnNumber() + "): " + e.getMessage());
    }

    /*
     * Receive notification of a warning.
     */
    @Override
    public void warning(SAXParseException e) throws SAXException {
        System.err.println("Warning (" + e.getLineNumber() + ","
                + e.getColumnNumber() + "): " + e.getMessage());
    }
}
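MyErrorHandler only prints, so parsing continues after a recoverable error. If an invalid document should stop the parse instead, the handler can rethrow the exception. The sketch below is one possible way to do that; the class name `StrictErrorHandlerSketch` and the helper `isValid` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class StrictErrorHandlerSketch {
    /** Returns true only if xml is well-formed and valid against its own DTD. */
    static boolean isValid(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setErrorHandler(new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                // Warnings do not stop the parse in this sketch.
            }

            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // treat validity errors as fatal
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXException {
                throw e;
            }
        });
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a/>"));
        System.out.println(isValid("<a/>")); // no DTD, so validation reports an error
    }
}
```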
The main method of the Test class prints the event information produced while parsing books.xml.

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import org.xml.sax.ContentHandler;
import org.xml.sax.DTDHandler;
import org.xml.sax.EntityResolver;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

public class Test {
    public static void main(String[] args)
            throws SAXException, FileNotFoundException, IOException {
        // Create the handler for events related to document content
        ContentHandler contentHandler = new MyContentHandler();
        // Create the handler for error events
        ErrorHandler errorHandler = new MyErrorHandler();
        // Create the handler for DTD-related events
        DTDHandler dtdHandler = new MyDTDHandler();
        // Create the entity resolver
        EntityResolver entityResolver = new MyEntityResolver();
        // Create an XML parser (reads and parses XML through SAX)
        XMLReader reader = XMLReaderFactory.createXMLReader();
        /*
         * Set the parser features:
         * http://xml.org/sax/features/validation = true turns on validation
         * http://xml.org/sax/features/namespaces = true turns on namespace processing
         */
        reader.setFeature("http://xml.org/sax/features/validation", true);
        reader.setFeature("http://xml.org/sax/features/namespaces", true);
        // Register the handler for events related to document content
        reader.setContentHandler(contentHandler);
        // Register the handler for error events
        reader.setErrorHandler(errorHandler);
        // Register the handler for DTD-related events
        reader.setDTDHandler(dtdHandler);
        // Register the entity resolver
        reader.setEntityResolver(entityResolver);
        // Parse the books.xml document
        reader.parse(new InputSource(new FileReader("books.xml")));
    }
}
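`XMLReaderFactory` has been deprecated since Java 9. On newer JDKs an equivalent reader can be obtained through `SAXParserFactory`; the sketch below shows one way to do it. The class name `ReaderFactorySketch` and the helper `newReader` are hypothetical and not part of the original post.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class ReaderFactorySketch {
    /** Creates a namespace-aware XMLReader, optionally with DTD validation turned on. */
    static XMLReader newReader(boolean validating) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);    // corresponds to the namespaces feature
        factory.setValidating(validating);  // corresponds to the validation feature
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader(true);
        System.out.println(reader.getFeature("http://xml.org/sax/features/validation"));
        System.out.println(reader.getFeature("http://xml.org/sax/features/namespaces"));
    }
}
```

The returned reader accepts the same `setContentHandler`, `setErrorHandler`, `setDTDHandler`, and `setEntityResolver` calls as the one used in the Test class.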
The contents of the books.xml file are as follows:

<?xml version="1.0" encoding="GB2312"?>
<books count="3" xmlns="http://test.org/books">
	<!--books's comment-->
	<book id="1">
		<name>Thinking in JAVA</name>
	</book>
	<book id="2">
		<name>Core JAVA2</name>
	</book>
	<book id="3">
		<name>C++ Primer</name>
	</book>
</books>
The console output is as follows (the leading indentation added by toBlankString is not shown). The two Error lines appear because validation is turned on but books.xml declares no DTD:

>>> set document_locator: (lineNumber = 1, columnNumber = 1, systemId = null, publicId = null)
>>> start document
Error (2,7): Document is invalid: no grammar found.
Error (2,7): Document root element "books", must match DOCTYPE root "null".
>>> start prefix_mapping: xmlns: = "http://test.org/books"
>>> start element: books (http://test.org/books)
>>> characters (2): \n\t
>>> characters (2): \n\t
>>> start element: book (http://test.org/books)
>>> characters (3): \n\t\t
>>> start element: name (http://test.org/books)
>>> characters (16): Thinking in JAVA
>>> end element: name (http://test.org/books)
>>> characters (2): \n\t
>>> end element: book (http://test.org/books)
>>> characters (2): \n\t
>>> start element: book (http://test.org/books)
>>> characters (3): \n\t\t
>>> start element: name (http://test.org/books)
>>> characters (10): Core JAVA2
>>> end element: name (http://test.org/books)
>>> characters (2): \n\t
>>> end element: book (http://test.org/books)
>>> characters (2): \n\t
>>> start element: book (http://test.org/books)
>>> characters (3): \n\t\t
>>> start element: name (http://test.org/books)
>>> characters (10): C++ Primer
>>> end element: name (http://test.org/books)
>>> characters (2): \n\t
>>> end element: book (http://test.org/books)
>>> characters (1): \n
>>> end element: books (http://test.org/books)
>>> end prefix_mapping:
>>> end document