DOM parsing and SAX parsing
There are four common ways to parse XML in Java: DOM, SAX, dom4j, and JDOM.
We mainly learned two of them: DOM and SAX.
DOM is suitable for relatively small, simple XML documents, while SAX is better suited to large or complex XML files. Each has its own merits.
Differences Between DOM and SAX:
1. DOM is memory-based. No matter how large the file is, the entire content is loaded into memory as a tree, which can consume a lot of memory. SAX, by contrast, is event-based: each time an event is triggered, only the corresponding part of the XML data is delivered. Therefore, no matter how large the XML file is, SAX itself occupies only a small amount of memory.
2. DOM can both read XML and insert data into an XML document, while SAX can only read XML and cannot modify it. This is a disadvantage of SAX.
3. Another disadvantage of SAX: DOM provides random access to whichever elements we specify, while SAX does not. SAX traverses the document from the beginning and can traverse it only once. In other words, we cannot access the XML file randomly; we can only read it once from start to end (although we can cut the traversal off partway through).
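The remark about cutting the traversal off partway can be illustrated with a small sketch. SAX has no dedicated "stop" call, so a common approach is to throw an exception from a callback and catch it around parse(). The class name SaxEarlyStop, the marker exception StopParsing, and the sample XML are all made up for this illustration; the handler class used here (DefaultHandler) is the one discussed in the SAX part of this article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEarlyStop {

    /** Hypothetical marker exception, used only to abort the traversal. */
    static class StopParsing extends SAXException {
        StopParsing() {
            super("stop requested");
        }
    }

    /** Returns the text of the first <star> element, then stops reading. */
    static String firstStar(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inStar = false;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("star".equals(qName)) {
                    inStar = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inStar) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) throws SAXException {
                if ("star".equals(qName)) {
                    throw new StopParsing(); // cut the traversal off here
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (StopParsing expected) {
            // Parsing was aborted on purpose; the rest of the document is not processed.
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<stars><star>Sirius</star><star>Vega</star></stars>";
        System.out.println(firstStar(xml));
    }
}
```

Because the traversal only moves forward, going back to an earlier element would mean parsing the document again from the start.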
Parsing XML files with DOM and SAX:
1. When using DOM to parse XML, we first obtain a factory through the DocumentBuilderFactory class, which lets the application get a parser that builds an object tree from an XML document. The static newInstance() method of this class returns a new factory instance. Next, we create a DocumentBuilder instance, which produces DOM Document instances from XML documents; it is obtained through the newDocumentBuilder() method of DocumentBuilderFactory. Finally, the parse(File f) method of DocumentBuilder (an overload taking an InputStream also exists) parses the content of the given file as an XML document and returns a new DOM Document object. The procedure is as follows:
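// Required imports:
// import java.io.File;
// import javax.xml.parsers.DocumentBuilder;
// import javax.xml.parsers.DocumentBuilderFactory;
// import org.w3c.dom.Document;
File file = new File("D:\\encoding\\xml\\domxml.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document dt = null; // declared outside the try block so it can be used afterwards
try {
    DocumentBuilder db = dbf.newDocumentBuilder();
    dt = db.parse(file); // parse the file into a DOM tree
} catch (Exception e) {
    // parsing or I/O failed; print the cause
    e.printStackTrace();
}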
At this point we can start working with the document. The NodeList and Node interfaces represent a list of nodes and a single node, respectively. Calling getElementsByTagName("element name") on the Document object returns a NodeList of the matching elements, and the item(int index) method of NodeList returns a single Node. Calling getFirstChild().getNodeValue() on an element Node returns the value of its first text node. For example:
NodeList nl = dt.getElementsByTagName("star"); // "star" is an element name
Node nd = nl.item(0); // obtain the first Node of the node list nl
In addition, we can use loops to retrieve all nodes in the node list.
When an element has child elements, we can obtain the list of child nodes from the parent node. The method is as follows:
NodeList nlF = dt.getElementsByTagName("star"); // "star" is an element name
for (int i = 0; i < nlF.getLength(); i++) {
    Node ndF = nlF.item(i); // each "star" element in turn
    NodeList nlC = ndF.getChildNodes(); // child nodes, including whitespace text nodes
    for (int j = 0; j < nlC.getLength(); j++) {
        Node ndC = nlC.item(j);
        // keep only element nodes, and skip empty elements that have no text child
        if (ndC.getNodeType() == Node.ELEMENT_NODE && ndC.getFirstChild() != null) {
            String str = ndC.getFirstChild().getNodeValue(); // text of the child element
            System.out.println(str); // print the text
        }
    }
}
Note: The child node list obtained from a parent node also contains the text nodes created by the whitespace between elements. Therefore, the check if (ndC.getNodeType() == Node.ELEMENT_NODE) is needed to select only element nodes (getNodeType() returns the type of a node, and Node.ELEMENT_NODE is the type constant for element nodes). Also note that the text of an element is itself a text node, which is a child of the element node.
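Putting the DOM steps above together, a minimal self-contained sketch might look like the following. It parses from an in-memory string instead of a file so that it runs anywhere; the class name DomChildText, the method childTexts, and the sample XML (with name and type child elements) are assumptions made for illustration, not part of the original example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class DomChildText {

    /** Collects the text of every child element of every <star> element. */
    static List<String> childTexts(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dt = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        List<String> result = new ArrayList<>();
        NodeList stars = dt.getElementsByTagName("star");
        for (int i = 0; i < stars.getLength(); i++) {
            NodeList children = stars.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Whitespace between elements shows up as text nodes; skip those,
                // and skip empty elements that have no text child.
                if (child.getNodeType() == Node.ELEMENT_NODE && child.getFirstChild() != null) {
                    result.add(child.getFirstChild().getNodeValue());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<stars>\n"
          + "  <star>\n"
          + "    <name>Sirius</name>\n"
          + "    <type>binary</type>\n"
          + "  </star>\n"
          + "</stars>";
        System.out.println(childTexts(xml));
    }
}
```

Removing the ELEMENT_NODE check is a quick way to see the whitespace text nodes mentioned in the note: the inner loop then also visits the line breaks and indentation between the child elements.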
2. SAX parsers are most commonly used through JAXP. JAXP is provided in the javax.xml packages (for example javax.xml.parsers) and covers the Java interfaces for SAX and DOM, together with the basic interfaces and classes that a Java XML parser must implement. By extending the DefaultHandler class and overriding the following methods, you can traverse the XML document:
void startDocument()
// Receives notification of the beginning of the document
void endDocument()
// Receives notification of the end of the document
void startElement(String uri, String localName, String qName, Attributes attributes)
// Receives notification of the start of an element
void endElement(String uri, String localName, String qName)
// Receives notification of the end of an element
void characters(char[] ch, int start, int length)
// Receives notification of character data inside an element
Obtaining a SAX parser is similar to obtaining a DOM parser:
SAXParserFactory spf = SAXParserFactory.newInstance(); // get a new instance of the factory
try {
    SAXParser sp = spf.newSAXParser(); // get an instance of the SAXParser class
    File file = new File("E:\\yuxin_document\\xml\\mytest.xml"); // the XML file
    SaxHandle handle = new SaxHandle(); // SaxHandle is a custom class that extends DefaultHandler and implements the XML traversal
    sp.parse(file, handle); // parse the XML document
} catch (ParserConfigurationException e) {
    // the parser could not be configured
    e.printStackTrace();
} catch (SAXException e) {
    // the document is not well-formed, or the handler reported an error
    e.printStackTrace();
} catch (IOException e) {
    // the file could not be read
    e.printStackTrace();
}
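The article names SaxHandle only as a custom class that extends the default handler; its body is not shown. The sketch below is one possible version, written as an assumption for illustration: it records each callback in a list so the order of events is visible. The wrapper class SaxHandleDemo, the events list, and the sample XML are likewise hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxHandleDemo {

    /** One possible SaxHandle; the real class in the article is not shown. */
    static class SaxHandle extends DefaultHandler {
        final List<String> events = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startDocument() {
            events.add("startDocument");
        }

        @Override
        public void endDocument() {
            events.add("endDocument");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            text.setLength(0);
            events.add("start:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called more than once for one run of text, so accumulate.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (!value.isEmpty()) {
                events.add("text:" + value);
            }
            text.setLength(0);
            events.add("end:" + qName);
        }
    }

    /** Parses the given XML string and returns the recorded events in order. */
    static List<String> parse(String xml) throws Exception {
        SAXParser sp = SAXParserFactory.newInstance().newSAXParser();
        SaxHandle handle = new SaxHandle();
        sp.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handle);
        return handle.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : parse("<stars><star>Sirius</star></stars>")) {
            System.out.println(event);
        }
    }
}
```

Accumulating the text in a buffer and reading it in endElement() is a deliberate choice here, since a parser is allowed to deliver one run of character data in several characters() calls.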
Note: To display the content of the XML clearly, we also show the parsed content in a GUI, typically in a table.
Combination of SAX and DOM
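We can use SAX to parse XML and then use a DOM-style API to write the parsed data into an XML file.
You can create an XML file as follows (the classes used below, such as OutputFormat and XMLWriter, come from the dom4j library rather than from the standard org.w3c.dom API):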
First, you need to get a Document Object written into the XML file:
Document doc = DocumentHelper.createDocument();
Then you can add content in this document, for example:
Element root = doc.addElement("root"); // add a root element named "root"
Element el = root.addElement("child"); // add a child element to the root element (element names must not contain spaces)
After writing all the information to the Document, you can write the Document into the XML file. The method is as follows:
FileWriter file = new FileWriter("D:\\doc.xml"); // get a file writer
doc.write(file); // write the document content to the file (doc is the Document object obtained above)
file.close(); // close the writer so the content is flushed to disk
This is a simple write method.
If you want the XML file to be formatted (pretty-printed), you can:
FileOutputStream fos = new FileOutputStream("D:\\doc.xml"); // get a file output stream
OutputFormat format = OutputFormat.createPrettyPrint(); // get a pretty-printing OutputFormat object
XMLWriter xw = new XMLWriter(fos, format); // create an XMLWriter on the output stream
xw.write(doc); // write the document as XML
xw.close(); // flush and close the writer first
fos.close(); // then close the output stream
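The writing code above depends on an external library. As an alternative that needs only the JDK, the same idea can be sketched with the standard org.w3c.dom API and a javax.xml.transform Transformer. This is a swapped-in technique, not the approach used above; the class name DomWriteDemo, the method buildXml, and the element names are assumptions for illustration. The sketch writes to a string so that it runs without touching the file system.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomWriteDemo {

    /** Builds <root><child>text</child></root> in memory and serializes it. */
    static String buildXml(String childText) throws Exception {
        // Create an empty DOM document and add the elements.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element child = doc.createElement("child");
        child.setTextContent(childText);
        root.appendChild(child);

        // An identity Transformer copies the DOM tree to the output as XML text.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildXml("Sirius"));
    }
}
```

To write to a file instead, a StreamResult constructed from a java.io.File can be passed to transform(). The exact indentation produced by OutputKeys.INDENT can differ between JDK versions.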
What is the difference between DOM and SAX in XML parsing?
SAX concepts
SAX is short for Simple API for XML. It is not a standard officially published by the W3C; it is a de facto standard that grew out of community discussion. Even so, SAX is used in XML processing no less than DOM, and almost every XML parser supports it.
Compared with DOM, SAX is a lightweight approach. When using DOM, we need to read the entire XML document and then build a DOM tree in memory, creating a Node object for each node of the tree. When the document is small this causes no problems, but once the document is large, processing it with DOM becomes quite time-consuming and laborious. In particular, its memory requirements grow accordingly, so using DOM in some applications (such as applets) is uneconomical. In such cases, SAX is a better alternative.
SAX is conceptually different from DOM. First of all, unlike DOM, which is document-driven, SAX is event-driven: it does not need to read in the entire document first, and reading the document is itself the SAX parsing process. Event-driven means a way of running a program that is based on a callback mechanism. (If you are familiar with the delegation event model in Java, this mechanism will be easy to understand.)
When an XMLReader receives an XML document, it parses the document as it reads it. In other words, reading and parsing happen at the same time, which is very different from DOM. Before parsing, you need to register a ContentHandler with the XMLReader; it acts as an event listener. ContentHandler defines many methods, such as startDocument(), which specifies what should be done when the start of the document is encountered during parsing. When the XMLReader reads the relevant content, it fires the corresponding event, delegates its handling to the ContentHandler, and calls the corresponding method in response.
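The registration step described above can be sketched as follows. The XMLReader is obtained here through JAXP, which is one way among several; the class name XmlReaderDemo, the method countElements, and the sample XML are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class XmlReaderDemo {

    /** Counts how often each element name occurs, using a registered ContentHandler. */
    static Map<String, Integer> countElements(String xml) throws Exception {
        Map<String, Integer> counts = new TreeMap<>();

        // DefaultHandler implements ContentHandler with empty methods,
        // so only the callbacks of interest need to be overridden.
        ContentHandler listener = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                counts.merge(qName, 1, Integer::sum);
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(listener); // register the listener before parsing
        reader.parse(new InputSource(new StringReader(xml))); // callbacks fire while reading
        return counts;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<stars><star/><star/></stars>"));
    }
}
```

No tree is built at any point: the only state kept is the map of counters, which is why the memory use stays small regardless of document size.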