XML Parsing: DOM Parsing and SAX Parsing

Source: Internet
Author: User

Currently, two methods are used for XML parsing:

1. DOM parsing: DOM (Document Object Model) is the W3C-recommended way to process XML.
When DOM is used to parse an XML document, the parser first loads the whole document into memory and creates a Document object for it. Each tag in the document is converted into an Element object, each piece of text into a Text object, and each attribute into an Attr object. The relationships among these objects mirror the relationships among the tags, text, and attributes in the XML document.

Disadvantage: DOM consumes a lot of memory, so it is not suitable for very large XML documents; parsing one may cause a memory overflow (OutOfMemoryError).
Advantage: DOM makes add, delete, modify, and query operations easy, because you can operate directly on the object that corresponds to each node.
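The mapping from tags, text, and attributes to objects can be seen with the JDK's built-in JAXP DOM API. This is a minimal sketch; the `<book>` sample and the class and method names are hypothetical. Note that in the org.w3c.dom API the attribute type is named `Attr`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

class DomMappingDemo {
    // Loads the whole XML string into memory as a DOM tree (checked exceptions wrapped).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Reports which DOM object each part of <tag id="...">text</tag> became.
    static String describe(String xml) {
        Document doc = parse(xml);
        Element root = doc.getDocumentElement();   // tag       -> Element
        Attr id = root.getAttributeNode("id");      // attribute -> Attr
        Text text = (Text) root.getFirstChild();    // text      -> Text
        return root.getTagName() + " " + id.getValue() + " " + text.getData();
    }

    public static void main(String[] args) {
        System.out.println(describe("<book id=\"b1\">Java</book>")); // book b1 Java
    }
}
```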

2. SAX parsing: SAX (Simple API for XML) is not an official W3C standard, but it is the de facto standard of the XML community, and almost all XML parsers support it.

When SAX is used to parse an XML document, the parser reads the document sequentially from top to bottom and parses each part as soon as it has been read, instead of loading the whole document first.

Advantage: because the document is processed piece by piece as it is read, SAX puts little pressure on memory, even for large documents.
Disadvantage: SAX is not suitable for add, delete, modify, and query operations. Because it parses the document in a single forward pass, it cannot go back to earlier content, so it is only suitable for reading XML documents.

========================================================== ========================================================== ======================================

Supplement:

XML parsing development kits: JAXP (Sun), JDOM, dom4j.

========================================================== ========================================================== ======================================

Adjust the JVM memory size:


When the XML file to be parsed is large and its node data must be manipulated, neither parsing method is convenient: SAX cannot modify the document, and DOM may run out of memory. In this case, you need to increase the memory available to the JVM.


By default, the maximum heap the JVM may use is 64 MB on some older JDKs; the default maximum differs between JDK versions and platforms.

To change the JVM's maximum heap size, use the -Xmx option in the form -Xmx<size><unit>, for example -Xmx200M.

In the Eclipse Package Explorer, right-click the Java program and choose Run As > Open Run Dialog... (Run Configurations... in newer Eclipse versions) to open the Run dialog. Select the Arguments tab. It contains two input boxes: the first is for program arguments and the second is for VM arguments. Enter -Xmx200M in the VM arguments box and click Run in the lower-right corner to run the program. As long as 200 MB is enough for the document being parsed, the program will no longer fail with an OutOfMemoryError.
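To check that the -Xmx setting actually reached the JVM, a program can print the heap limit it sees. This sketch uses the standard `Runtime.maxMemory()` method; the class name is hypothetical.

```java
class MaxHeapCheck {
    // Maximum heap the JVM will try to use, in MiB (reflects -Xmx when it is set).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Run with `java -Xmx200M MaxHeapCheck`, it should report a value close to 200; HotSpot often reports slightly less because part of the heap is not counted.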

========================================================== ========================================================== ======================================

XML parsing development kit:
1. JAXP: JAXP is part of J2SE. It consists of the javax.xml, org.w3c.dom, and org.xml.sax packages and their subpackages.
The javax.xml.parsers package defines several factory classes; by calling them, programmers obtain a DOM or SAX parser and use it to parse an XML document.

First, create a factory:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance(); // DocumentBuilderFactory is abstract, so it cannot be created with new; obtain an instance through its static newInstance() method.
Next, obtain the DOM parser:
DocumentBuilder builder = factory.newDocumentBuilder();
Then, load the XML document to obtain the Document object that represents it:
Document document = builder.parse("*.xml");
Once you have the Document object representing the XML document, you can operate on any node in it.
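The three steps can be put together into a complete program. This sketch reads the XML from a String through `InputSource` so it runs without a file (`builder.parse(new File(...))` works the same way for a file); the `<name>` query and the class name are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class JaxpDomSteps {
    // Follows the three steps above, then queries the text of every <name> element.
    static List<String> names(String xml) {
        try {
            // 1. DocumentBuilderFactory is abstract, so use its static factory method.
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // 2. Obtain the DOM parser.
            DocumentBuilder builder = factory.newDocumentBuilder();
            // 3. Load the XML into a Document (here from a String instead of a file).
            Document document = builder.parse(new InputSource(new StringReader(xml)));

            NodeList nodes = document.getElementsByTagName("name"); // document order
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<people><person><name>Alice</name></person>"
                + "<person><name>Bob</name></person></people>";
        System.out.println(names(xml)); // [Alice, Bob]
    }
}
```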

========================================================== ========================================================== ======================================

Supplement:
In DOM parsing, each component of an XML document is represented by an object: for example, a tag is represented by an Element and an attribute by an Attr. Whatever the specific type, every one of them is a subtype of Node, so during development any node you obtain can be treated as a Node.
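Treating every node as a Node usually means walking the tree and checking `getNodeType()` at runtime. A minimal sketch with the standard org.w3c.dom API; the output format and class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeWalk {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Visits every node through the common Node interface, branching on its type.
    static void walk(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                out.append("E:").append(node.getNodeName()).append(' ');
                NamedNodeMap attrs = node.getAttributes(); // attributes are not child nodes
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node attr = attrs.item(i);
                    out.append("A:").append(attr.getNodeName())
                       .append('=').append(attr.getNodeValue()).append(' ');
                }
                break;
            }
            case Node.TEXT_NODE:
                out.append("T:").append(node.getNodeValue()).append(' ');
                break;
            default:
                break; // comments, processing instructions, etc. are skipped here
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child, out);
        }
    }

    static String describe(String xml) {
        StringBuilder out = new StringBuilder();
        walk(parse(xml).getDocumentElement(), out);
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("<a x=\"1\"><b>hi</b></a>")); // E:a A:x=1 E:b T:hi
    }
}
```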

XML programming (CRUD): create, read, update, delete, that is, add, query, update, and delete.
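With DOM, each CRUD operation is a method call on the in-memory tree. This is a sketch using standard org.w3c.dom methods (`createElement`, `appendChild`, `setTextContent`, `removeChild`); the `<list>/<item>` sample and class name are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomCrud {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Runs create, read, update, and delete on the tree, then returns the item texts.
    static String run(String xml) {
        Document doc = parse(xml);
        Element root = doc.getDocumentElement();

        // Create: append a new <item> element.
        Element created = doc.createElement("item");
        created.setTextContent("c");
        root.appendChild(created);

        // Read + update: find the first <item> and change its text.
        Node first = doc.getElementsByTagName("item").item(0);
        first.setTextContent("a2");

        // Delete: remove the second <item> from its parent.
        Node second = doc.getElementsByTagName("item").item(1);
        second.getParentNode().removeChild(second);

        NodeList items = doc.getElementsByTagName("item");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            texts.add(items.item(i).getTextContent());
        }
        return String.join(",", texts);
    }

    public static void main(String[] args) {
        System.out.println(run("<list><item>a</item><item>b</item></list>")); // a2,c
    }
}
```

These changes exist only in memory; the XML file on disk is unchanged until the Document is written back.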

In addition to the two parsing methods, there are also other parsing methods...
========================================================== ========================================================== ======================================

When you add, modify, or delete nodes, the changes are made only to the in-memory Document object. To persist them, you must also write the updated Document object back to the XML file.

The Transformer class in the javax.xml.transform package converts the Document object representing an XML document into a given format and outputs it; for example, applying a style sheet can turn an XML document into an HTML document. The same class can also write a Document object back to an XML file. A transformation needs a source and a destination:
use the javax.xml.transform.dom.DOMSource class to wrap the Document object to be converted,
and use a javax.xml.transform.stream.StreamResult object to represent the destination of the data.
The Transformer object is obtained through a TransformerFactory.
The Transformer performs the conversion through its transform method, which receives the source and the destination:
transformer.transform(DOMSource source, StreamResult destination);
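Writing a modified Document back out can be sketched with an identity Transformer. The sketch writes to a `StringWriter` so it runs without a file; the `<config>` sample, the `version` attribute, and the class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class WriteBack {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // Serializes a Document with the identity Transformer: DOMSource in, StreamResult out.
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<config/>");
        doc.getDocumentElement().setAttribute("version", "2"); // change made in memory only
        System.out.println(serialize(doc));                     // output now carries version="2"
    }
}
```

To overwrite the original file instead, pass `new StreamResult(new File("*.xml"))` as the destination.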
========================================================== ========================================================== ======================================

SAX parsing:

SAX parses XML files in an event-driven way. Parsing an XML file with SAX involves two parts, the parser and the event handler:
The parser can be created with the JAXP API. Once a SAX parser has been created, it can be given an XML document to parse.
While parsing, each time the parser reaches a particular part of the document (such as the start of an element, a piece of text, or the end of an element), it calls the corresponding method of the event handler and passes the content it has just parsed as method arguments.
The event handler is written by the programmer, who obtains the data parsed by the SAX parser through the parameters of the handler's methods and decides how to process it.
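The callback flow can be observed with a handler that simply logs each event. This sketch uses the convenience method `SAXParser.parse(InputSource, DefaultHandler)` rather than the XMLReader steps listed below; the log format and class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxEventLog {
    // Records each callback the SAX parser makes while it streams through the document.
    static String events(String xml) {
        StringBuilder log = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.append("start:").append(qName).append(' ');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                log.append("text:").append(ch, start, length).append(' ');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.append("end:").append(qName).append(' ');
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return log.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(events("<a><b>hi</b></a>")); // start:a start:b text:hi end:b end:a
    }
}
```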

1. Create a parser factory;
SAXParserFactory fac = SAXParserFactory.newInstance();

2. Get the parser;
SAXParser sp = fac.newSAXParser();

3. Get the reader;
XMLReader re = sp.getXMLReader();

4. Set the content handler;
re.setContentHandler(new ContentHandler() { /* implement every method of the interface */ });
(Or: re.setContentHandler(new DefaultHandler() { /* override only the methods you need */ });)
Implementing ContentHandler directly means implementing all of its callback methods, while DefaultHandler already provides empty implementations, so a subclass only overrides the callbacks it cares about (for example, only the events for one particular tag). Both approaches parse the entire document.
A common variant is a content handler that extends DefaultHandler and packs the parsed content into bean objects.

5. Read the XML document;
re.parse("*.xml");
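The five steps, combined with a DefaultHandler subclass that packs parsed content into beans, can be sketched as follows. The `Book` bean, the `<books>` layout, and the class names are hypothetical; the XML is read from a String through `InputSource` so the example runs without a file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SaxBeanDemo {
    // Hypothetical bean filled from <book id="..."><title>...</title></book>.
    static class Book {
        String id;
        String title;
        @Override public String toString() { return id + ":" + title; }
    }

    // Builds one Book per <book> element as events arrive (factory is not namespace-aware, so qName is used).
    static class BookHandler extends DefaultHandler {
        final List<Book> books = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private Book current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0);
            if ("book".equals(qName)) {
                current = new Book();
                current.id = atts.getValue("id");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may be called several times for one text node
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName) && current != null) {
                current.title = text.toString().trim();
            } else if ("book".equals(qName)) {
                books.add(current);
                current = null;
            }
        }
    }

    static List<Book> parse(String xml) {
        try {
            SAXParserFactory fac = SAXParserFactory.newInstance();  // 1. factory
            SAXParser sp = fac.newSAXParser();                     // 2. parser
            XMLReader re = sp.getXMLReader();                      // 3. reader
            BookHandler handler = new BookHandler();
            re.setContentHandler(handler);                         // 4. content handler
            re.parse(new InputSource(new StringReader(xml)));      // 5. read the document
            return handler.books;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<books><book id=\"b1\"><title>Java</title></book>"
                + "<book id=\"b2\"><title>XML</title></book></books>")); // [b1:Java, b2:XML]
    }
}
```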

========================================================== ========================================================== ======================================

XML parsing development kit:

2. dom4j:

SAXReader saxReader = new SAXReader();
Document doc = saxReader.read(new File("*.xml"));

OutputFormat format = OutputFormat.createPrettyPrint(); // pretty-printed output; OutputFormat.createCompactFormat() produces compact output instead
format.setEncoding("UTF-8");

XMLWriter xmlWriter = new XMLWriter(new FileOutputStream("*.xml"), format);
xmlWriter.write(doc); // with a byte stream, XMLWriter first encodes the document into bytes using the encoding set in format, then hands the bytes to the stream
xmlWriter.close(); // release the resource

========================================================== ========================================================== ======================================

XPath:
You can use XPath to locate nodes quickly:
List<Node> list = document.selectNodes("//foo/bar"); // all bar elements that are children of a foo element

Node node = document.selectSingleNode("//foo/bar"); // the first such bar element

A path that starts with a single slash (/) is an absolute path from the root node;
A double slash (//) selects matching elements at any depth below the current node (below the document root when the path starts with //);

The asterisk (*) matches any element;
For example:
/aa/bb/* selects all child elements of /aa/bb;
/*/*/*/bbb selects all bbb elements that have exactly three ancestor elements;
//bb[@*] selects bb elements that have at least one attribute;
//bb[not(@*)] selects bb elements that have no attributes;
//bb[@id='b1'] selects bb elements whose id attribute is 'b1';
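These expressions can be tried without dom4j by using the JDK's own javax.xml.xpath API, which is a different API from dom4j's `selectNodes` but evaluates the same XPath 1.0 syntax. The sample document and class name are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    // Hypothetical sample: three <bb> children of <aa>, plus one <bb> nested under aa/cc/dd.
    static final String XML =
            "<aa><bb id=\"b1\"/><bb name=\"x\"/><bb/><cc><dd><bb/></dd></cc></aa>";

    // Evaluates an XPath expression against the document and counts the matching nodes.
    static int count(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "/aa/bb", "//bb", "/aa/*", "/*/*/*/bb", "//bb[@*]", "//bb[not(@*)]", "//bb[@id='b1']"
        };
        for (String expr : expressions) {
            System.out.println(expr + " -> " + count(XML, expr));
        }
    }
}
```

On the sample, `/aa/bb` matches 3 elements and `//bb` matches 4, which shows the difference between a single-slash absolute path and a double-slash search at any depth.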
