XML (Extensible Markup Language)
How to parse XML documents: SAX and DOM
1. DOM (Document Object Model) parsing reads the entire XML document into a tree in memory. The tree is easy to work with, but performance is poor when the XML document is large.
2. SAX (Simple API for XML) scans the XML document sequentially. Parsing happens while scanning, and it can be stopped at any time.
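For comparison with SAX, here is a minimal sketch of the DOM approach using the JDK's built-in javax.xml.parsers API. The class name DomSketch, the method countElements, and the sample XML are made up for illustration. The whole document is loaded into memory before anything can be queried, which is why large documents are expensive to handle this way.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of the original article.
class DomSketch {
    // Builds the complete in-memory tree first, then queries it.
    static int countElements(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // The tree can be queried freely once parsing has finished.
            NodeList nodes = doc.getElementsByTagName(tagName);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book>A</book><book>B</book></books>";
        System.out.println(countElements(xml, "book"));
    }
}
```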
SAX principle:
The parser scans the document sequentially and, as it goes, notifies the event-handling functions, all of which are callback functions. Each event handler performs its action, and scanning continues until the end of the document.
Basic events:
1. A document event is triggered at the beginning and at the end of the document.
2. An element event is triggered before and after each XML element in the document is parsed.
3. Any metadata is usually delivered by a separate event.
4. A DTD or schema event is produced when the document's DTD or schema is processed.
5. An error event is generated to notify the host application of parsing errors.
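To make the event order concrete, the sketch below records each callback in a list. It assumes the JDK's default SAX parser and extends org.xml.sax.helpers.DefaultHandler, which supplies empty implementations of the handler interfaces. The class name EventLog and the sample XML are made up for illustration. A processing instruction stands in for "metadata" here, and a mismatched end tag is used to trigger the error event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: logs the SAX events it receives.
class EventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement:" + qName);
    }

    @Override
    public void processingInstruction(String target, String data) {
        events.add("processingInstruction:" + target);
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        events.add("fatalError");
        throw e; // stop parsing, as DefaultHandler does by default
    }

    static List<String> record(String xml) {
        EventLog log = new EventLog();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), log);
        } catch (SAXException e) {
            // Parsing stopped early; the error event is already in the log.
        } catch (Exception e) {
            throw new IllegalStateException("unexpected failure", e);
        }
        return log.events;
    }

    public static void main(String[] args) {
        System.out.println(record("<a><?target data?><b/></a>"));
        System.out.println(record("<a><b></a>")); // mismatched end tag
    }
}
```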
Procedure for parsing a document using SAX:
1. Create an event handler.
2. Create a SAX parser.
3. Assign the event handler to the parser.
4. Parse the document; the parser sends each event to the handler.
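The four steps above can be sketched as a small self-contained program. The names SaxSteps and CountingHandler are made up for illustration, and the XML is passed in as a string so that the example has no external dependencies. This is one possible arrangement of the steps, not the only one.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class that walks through the four steps.
class SaxSteps {
    // An event handler that only counts start-element events.
    static class CountingHandler extends DefaultHandler {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    static int countElements(String xml) {
        try {
            // Step 1: create an event handler.
            CountingHandler handler = new CountingHandler();
            // Step 2: create a SAX parser and get its XMLReader.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Step 3: assign the handler to the parser.
            reader.setContentHandler(handler);
            // Step 4: parse; every event is sent to the handler.
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.elements;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements("<a><b/><c/></a>"));
    }
}
```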
Common SAX interface: ContentHandler
void startDocument()
Receive notification of the beginning of a document.
void endDocument()
Receive notification of the end of a document.
void startElement(String uri, String localName, String qName, Attributes atts)
Receive notification of the beginning of an element.
void endElement(String uri, String localName, String qName)
Receive notification of the end of an element.
void characters(char[] ch, int start, int length)
Receive notification of character data.
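The following code shows how to use SAX:

import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// HttpDownloader is a helper class that is not shown here.
HttpDownloader hd = new HttpDownloader();
String resultStr = hd.download("http://XXX");
try {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    XMLReader reader = factory.newSAXParser().getXMLReader();
    // Register the handler, then parse the downloaded string.
    reader.setContentHandler(new MyContentHandler());
    reader.parse(new InputSource(new StringReader(resultStr)));
} catch (Exception e) {
    e.printStackTrace();
}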
MyContentHandler is a class that implements the ContentHandler interface.
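The article does not show MyContentHandler itself, so the following is only a guess at what such a class might look like. As an assumption, it extends DefaultHandler (which implements ContentHandler with empty methods) instead of implementing every ContentHandler method by hand, and it builds an indented outline of the element names. The outline logic and the helper method outlineOf are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: the original article does not show its body.
class MyContentHandler extends DefaultHandler {
    private final StringBuilder outline = new StringBuilder();
    private int depth;

    @Override
    public void startDocument() {
        // Reset state so the handler can be reused for another document.
        outline.setLength(0);
        depth = 0;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Indent by two spaces per nesting level, then write the element name.
        for (int i = 0; i < depth; i++) {
            outline.append("  ");
        }
        outline.append(qName).append('\n');
        depth++;
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
    }

    String outline() {
        return outline.toString();
    }

    // Convenience method: parses a string and returns the outline.
    static String outlineOf(String xml) {
        try {
            MyContentHandler handler = new MyContentHandler();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.outline();
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(outlineOf("<a><b><c/></b><d/></a>"));
    }
}
```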
The following describes the parameters of the callback functions:
startElement
uri
The namespace URI, or the empty string if the element has no namespace URI or if namespace processing is not being performed
localName
The local name (without prefix), or the empty string if namespace processing is not being performed
qName
The qualified name (with prefix), or the empty string if qualified names are not available
atts
The attributes attached to the element. If there are no attributes, it shall be an empty Attributes object. The value of this object after startElement returns is undefined
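The effect of namespace processing on these parameters can be seen with a small experiment. The sketch below assumes the JDK's default parser. The class StartElementParams, the method describeRoot, and the sample element are made up for illustration. Because the Attributes object is undefined after startElement returns, the attribute values are copied into a map inside the callback instead of keeping a reference to atts.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class that captures the startElement parameters.
class StartElementParams {
    static Map<String, String> describeRoot(String xml, boolean namespaceAware) {
        final Map<String, String> seen = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!seen.isEmpty()) {
                    return; // only the root element is recorded
                }
                seen.put("uri", uri);
                seen.put("localName", localName);
                seen.put("qName", qName);
                // Copy attribute values now; atts must not be used later.
                for (int i = 0; i < atts.getLength(); i++) {
                    seen.put("@" + atts.getQName(i), atts.getValue(i));
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Namespace processing is off by default in SAXParserFactory.
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String xml = "<p:item xmlns:p=\"urn:example\" id=\"7\"/>";
        System.out.println(describeRoot(xml, true));
        System.out.println(describeRoot(xml, false));
    }
}
```

With namespace processing turned on, uri and localName are filled in. With it turned off, only qName is reliable, so a handler that reads localName should make sure the factory was made namespace-aware.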
characters
ch
The characters from the XML document
start
The start position in the array
length
The number of characters to read from the array
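A common use of characters is to collect the text of an element. The sketch below reads only the slice of ch given by start and length, and appends to a buffer, because a parser may deliver one piece of text in several characters calls. The class TextCollector and the method textOf are made up for illustration, and the sketch assumes that the wanted element is not nested inside another element of the same name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects the text of every <wanted> element.
class TextCollector extends DefaultHandler {
    private final String wanted;
    private final List<String> texts = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a wanted element

    TextCollector(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(wanted)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only ch[start] .. ch[start + length - 1] belongs to this event.
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null && qName.equals(wanted)) {
            texts.add(current.toString());
            current = null;
        }
    }

    static List<String> textOf(String xml, String elementName) {
        TextCollector handler = new TextCollector(elementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return handler.texts;
    }

    public static void main(String[] args) {
        String xml = "<books><book>SAX &amp; DOM</book><book>XML</book></books>";
        System.out.println(textOf(xml, "book"));
    }
}
```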
The above covers the steps and the corresponding methods for parsing an XML file. Complete code for parsing an XML file will be added later.