First, some background on SAX, DOM, JAXP, JDOM, and dom4j:
(Note: for what JAXP, JAXB, JAXM, JAXR, and JAX-RPC are, see http://gceclub.sun.com.cn/staticcontent/html/xml/faq/#jaxr)
1. SAX and DOM are two ways of parsing XML documents. They define interfaces only, with no concrete implementation.
So they are not parsers themselves: with SAX and DOM alone, you cannot process an XML document.
The SAX package is org.xml.sax.
The DOM package is org.w3c.dom.
The package names matter: they help you understand how these pieces relate to each other.
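A quick way to confirm that these packages hold only contracts is to ask the JDK's own copies of org.w3c.dom and org.xml.sax whether a few central types are interfaces. This is a minimal sketch; the class name InterfaceCheck and the choice of types to check are illustrative.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.ContentHandler;
import org.xml.sax.XMLReader;

// Illustrative check: the central DOM and SAX types are interfaces, not parsers.
class InterfaceCheck {
    static boolean allInterfaces() {
        Class<?>[] types = {
            Document.class, Element.class, Node.class, // from org.w3c.dom
            ContentHandler.class, XMLReader.class      // from org.xml.sax
        };
        for (Class<?> type : types) {
            if (!type.isInterface()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("Core SAX/DOM types are interfaces: " + allInterfaces());
    }
}
```

Something else has to supply classes that implement these interfaces; that is the parser's job.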
2. JAXP is an API that wraps the SAX and DOM interfaces, giving developers a simpler set of APIs built on top of SAX/DOM.
The JAXP package is javax.xml.parsers.
If you look at the JAXP source files, you will see that they import the SAX and DOM packages.
JAXP is not an implementation either, only an API. JAXP alone cannot parse anything.
(In fact, JAXP only wraps SAX and DOM, providing DocumentBuilderFactory/DocumentBuilder
and SAXParserFactory/SAXParser. This is the factory pattern from design patterns: its advantage is that the concrete object (the parser) is created by a subclass.)
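The factory arrangement can be observed with the JAXP classes that ship in the JDK. In this sketch (class and method names are illustrative), DocumentBuilderFactory, DocumentBuilder, SAXParserFactory, and SAXParser turn out to be abstract, and the factories hand back concrete subclasses supplied by whatever implementation is installed. The calling code never names those subclasses.

```java
import java.lang.reflect.Modifier;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

class JaxpFactories {
    // True if abstractType is abstract and impl is an instance of a non-abstract subclass of it.
    static boolean isConcreteSubclass(Object impl, Class<?> abstractType) {
        return Modifier.isAbstract(abstractType.getModifiers())
                && abstractType.isInstance(impl)
                && !Modifier.isAbstract(impl.getClass().getModifiers());
    }

    static boolean allConcrete() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            SAXParserFactory spf = SAXParserFactory.newInstance();
            SAXParser sp = spf.newSAXParser();
            return isConcreteSubclass(dbf, DocumentBuilderFactory.class)
                    && isConcreteSubclass(db, DocumentBuilder.class)
                    && isConcreteSubclass(spf, SAXParserFactory.class)
                    && isConcreteSubclass(sp, SAXParser.class);
        } catch (Exception e) {
            throw new IllegalStateException("JAXP configuration problem", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Print the implementation classes the factories actually returned.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        SAXParserFactory spf = SAXParserFactory.newInstance();
        System.out.println(dbf.getClass().getName());
        System.out.println(dbf.newDocumentBuilder().getClass().getName());
        System.out.println(spf.getClass().getName());
        System.out.println(spf.newSAXParser().getClass().getName());
        System.out.println("All concrete subclasses: " + allConcrete());
    }
}
```

On a typical OpenJDK build, the printed names come from the JDK's internal, repackaged copy of Xerces (for example com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl), which mirrors the Xerces ...Impl class names.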
3. The Xerces parser (often billed as the fastest XML parser around)
Xerces extends the SAXParser, SAXParserFactory, DocumentBuilder, and DocumentBuilderFactory classes defined in JAXP with SAXParserImpl, SAXParserFactoryImpl, DocumentBuilderImpl, and DocumentBuilderFactoryImpl respectively.
This is why your classpath only needs xerces.jar (which contains SAX, DOM, and JAXP) and xercesImpl.jar.
4. What about other parsers, such as Crimson?
Crimson is a parser just like Xerces. Switching is simple: replace xercesImpl.jar with crimson.jar.
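Swapping jars works without source changes because application code only touches javax.xml.parsers; JAXP picks the implementation at runtime. It checks, in order, the system property javax.xml.parsers.DocumentBuilderFactory, a jaxp.properties file in the JRE, service-provider entries (META-INF/services) in jars on the classpath, and finally the platform default. The sketch below exercises only the system property. The class name com.example.NoSuchFactory is deliberately hypothetical, standing in for a parser jar that is not on the classpath.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;

class ParserSelection {
    // The system property JAXP checks first when DocumentBuilderFactory.newInstance() looks for an implementation.
    static final String DBF_PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";

    // Returns the class name JAXP picks when the property is set to factoryClassName
    // (or left unset when null), or null if that class cannot be loaded.
    static String selectedImpl(String factoryClassName) {
        String previous = System.getProperty(DBF_PROPERTY);
        try {
            if (factoryClassName == null) {
                System.clearProperty(DBF_PROPERTY);
            } else {
                System.setProperty(DBF_PROPERTY, factoryClassName);
            }
            return DocumentBuilderFactory.newInstance().getClass().getName();
        } catch (FactoryConfigurationError e) {
            return null; // the configured implementation is missing from the classpath
        } finally {
            // Restore whatever was configured before.
            if (previous == null) {
                System.clearProperty(DBF_PROPERTY);
            } else {
                System.setProperty(DBF_PROPERTY, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Default: " + selectedImpl(null));
        // Hypothetical class name: no such parser is installed.
        System.out.println("Missing: " + selectedImpl("com.example.NoSuchFactory"));
    }
}
```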
5. JDOM and dom4j
The W3C standard DOM API is awkward to use, so some developers built a Java-specific XML API for ease of use; that is the origin of JDOM. Another group had ideas of their own and went on to develop dom4j, which is how we ended up with today's two APIs. As for performance, JDOM lost badly and dom4j came out ahead. I think of JDOM and dom4j as playing the same role as SAX/DOM + JAXP: you can still choose the specific parser underneath.
Second: the technical features of DOM, SAX, JDOM, and dom4j:
1: DOM
DOM is the official W3C standard for representing XML documents in a platform- and language-independent way. A DOM is a collection of nodes, or fragments of information, organized in a hierarchy. This hierarchy lets developers search the tree for specific information. Working with this structure usually requires loading the entire document and building the hierarchy before any other work can be done. Because it is built on this hierarchy of information, DOM is considered tree-based or object-based. DOM, and tree-based processing in general, has several advantages. First, because the tree persists in memory, it can be modified, so the application can change both the data and the structure. It can also navigate up and down the tree at any time, instead of making a single pass as SAX does. DOM is also much simpler to use.
On the other hand, parsing and loading a very large document can be slow and resource-intensive, so such data is better handled another way, for example with an event-based model such as SAX.
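To make the DOM workflow concrete (load the whole document, navigate freely, modify the in-memory tree, write it back out), here is a minimal sketch using the JAXP/DOM implementation bundled with the JDK. The sample document, element names, and the class name DomDemo are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomDemo {
    static final String SAMPLE =
        "<library><book id=\"1\">XML Basics</book><book id=\"2\">Java and XML</book></library>";

    // Parse the whole document into an in-memory tree before doing anything else.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Because the tree stays in memory, it can be changed in place.
    static void addBook(Document doc, String id, String title) {
        Element book = doc.createElement("book");
        book.setAttribute("id", id);
        book.setTextContent(title);
        doc.getDocumentElement().appendChild(book);
    }

    // Serialize the (possibly modified) tree back to text.
    static String toXml(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Parse, add a third book, and return the resulting XML text.
    static String addThirdBook() {
        try {
            Document doc = parse(SAMPLE);
            addBook(doc, "3", "DOM in Practice");
            return toXml(doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        // Navigate: the tree can be visited in any order, as often as needed.
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            System.out.println(book.getAttribute("id") + ": " + book.getTextContent());
        }
        System.out.println(addThirdBook());
    }
}
```

The convenience comes from holding every node in memory at once, which is exactly the cost that becomes a problem for very large documents.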
2: SAX
SAX processes a document as a stream of events, and the advantages of this are very similar to those of streaming media. Analysis can start immediately instead of waiting for all the data to arrive. And because the application examines the data as it is read, it does not need to keep the data in memory, which is a huge advantage for large documents. In fact, the application does not even have to parse the whole document; it can stop parsing once some condition is met. In general, SAX is much faster than its alternative, DOM.
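The "stop once a condition is met" point can be sketched with the JDK's bundled SAX parser. SAX defines no method for aborting a parse, so a common idiom, used here, is to throw a SAXException from a callback and treat it as a normal exit. The sample document, the stop condition (the book with id 2), and the class name are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxEarlyStop {
    static final String SAMPLE =
        "<library><book id=\"1\"/><book id=\"2\"/><book id=\"3\"/><book id=\"4\"/></library>";

    // Counts start tags and aborts as soon as it sees the book whose id is 2.
    static class StopAtBook extends DefaultHandler {
        int elementsSeen = 0;
        boolean found = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            elementsSeen++;
            if ("book".equals(qName) && "2".equals(atts.getValue("id"))) {
                found = true;
                // SAX has no "stop" call; throwing from a callback is the usual way out.
                throw new SAXException("target found, stopping early");
            }
        }
    }

    // Returns how many start tags were handled before parsing stopped.
    static int elementsSeenBeforeStop(String xml) {
        StopAtBook handler = new StopAtBook();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (!handler.found) {
                throw new IllegalStateException("real parse error", e);
            }
            // Otherwise this is our deliberate early exit.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.elementsSeen;
    }

    public static void main(String[] args) {
        // library, book 1, book 2: the remaining books are never reported to the handler.
        System.out.println(elementsSeenBeforeStop(SAMPLE)); // prints 3
    }
}
```

The handler's found flag distinguishes the deliberate exit from a genuine parse error, whether or not the parser wraps the exception on its way out.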
3: Choosing DOM or SAX
For developers who write their own code to process XML documents, choosing between the DOM and SAX parsing models is an important design decision.
DOM accesses an XML document as a tree, while SAX uses an event model.
A DOM parser converts an XML document into a tree holding its content, which can then be traversed. The advantage of the DOM model is that programming is easy: developers call the parser to build the tree, then use the navigation APIs to reach the tree nodes they need. Adding and modifying elements in the tree is also easy. However, because a DOM parser must process the entire XML file, its performance cost and memory requirements are high, especially for large XML files. Because of its traversal capability, the DOM parser is often used in applications where XML documents change frequently.
A SAX parser uses an event-based model: it fires a series of events while parsing an XML document. When a given tag is found, it can invoke a callback method to report that the tag has been found. SAX usually needs relatively little memory, because developers decide for themselves which tags to handle. Its scalability shows especially when developers only need part of the data in a document. However, coding against a SAX parser is harder, and it is difficult to access several different pieces of data in the same document at the same time.
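The callback style, and its cost, can be seen in a small SAX handler that extracts only the title text from a book list. The element names, sample document, and class name are made up for illustration. Note that the handler must track by hand whether it is currently inside a title element; relating each title to its price would need still more state, which is the difficulty just described.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTitles {
    static final String SAMPLE = "<library>"
        + "<book><title>XML Basics</title><price>30</price></book>"
        + "<book><title>Java and XML</title><price>45</price></book>"
        + "</library>";

    // Collects the text of every <title>; all other tags are ignored.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current; // non-null only while inside <title>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // A parser may split one text node across several calls, so accumulate.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    static List<String> titles(String xml) {
        TitleHandler handler = new TitleHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.titles;
    }

    public static void main(String[] args) {
        System.out.println(titles(SAMPLE)); // [XML Basics, Java and XML]
    }
}
```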
4: JDOM http://www.jdom.org
JDOM aims to be a Java-specific document model that simplifies interaction with XML and is faster than DOM. As the first Java-specific model, JDOM has been vigorously promoted, and the plan was to make it a standard Java extension through Java Specification Request JSR-102. JDOM development began in early 2000.
JDOM differs from DOM in two main ways. First, JDOM uses only concrete classes, not interfaces. This simplifies the API in some ways but also limits flexibility. Second, the API makes heavy use of the Java Collections classes, which simplifies things for Java developers who already know those classes.
JDOM's stated goal is to "solve 80% (or more) of Java/XML problems with 20% (or less) of the effort" (the 20% referring to the learning curve). JDOM is certainly useful for most Java/XML applications, and most developers find its API much easier to understand than DOM. JDOM also includes extensive checks on program behavior to keep users from doing anything meaningless in XML. Still, you need a thorough understanding of XML to do more than basic work (or, in some cases, even to understand the errors). That understanding may be worth more than learning either the DOM or the JDOM interfaces.
JDOM does not include a parser. It usually parses and validates input XML with a SAX2 parser (although it can also take a previously built DOM representation as input). It includes converters that output a JDOM representation as a SAX2 event stream, a DOM model, or an XML text document. JDOM is open source, released under a variant of the Apache license.
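JDOM itself is a third-party library, so this sketch stays inside the JDK and shows the DOM side of the contrast. DOM returns children as a NodeList, which is neither a java.util.List nor Iterable and which also contains text and comment nodes. The helper childElements is an illustrative name, not a DOM method; it does the filtering and copying that a collections-based API spares you (JDOM's Element.getChildren(), for instance, returns a java.util.List).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomCollections {
    static final String SAMPLE = "<root>\n  <a/>\n  <!-- note -->\n  <b/>\n</root>";

    // DOM's NodeList is neither a java.util.List nor Iterable: callers loop by index,
    // filter out text/comment nodes, and cast to Element themselves.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                result.add((Element) n);
            }
        }
        return result;
    }

    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static List<String> childNames(String xml) {
        List<String> names = new ArrayList<>();
        for (Element child : childElements(parseRoot(xml))) { // plain for-each once it is a List
            names.add(child.getTagName());
        }
        return names;
    }

    public static void main(String[] args) {
        Element root = parseRoot(SAMPLE);
        // Whitespace text nodes and the comment are children too.
        System.out.println("raw child nodes: " + root.getChildNodes().getLength());
        System.out.println("child elements: " + childNames(SAMPLE));
    }
}
```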
5: dom4j http://dom4j.sourceforge.net/
Although dom4j is the product of completely independent development, it began as an intelligent fork of JDOM. It adds many features beyond basic XML document representation, including integrated XPath support, XML Schema support, and event-based processing for large or streaming documents. It also offers options for building the document representation, and it provides parallel access through both the dom4j API and the standard DOM interfaces. It has been under development since the second half of 2000.
To support all these features, dom4j uses interfaces and abstract base classes. dom4j makes heavy use of the Collections classes in its API, but in many cases it also provides alternatives that allow better performance or more direct coding. The net result is that, although dom4j pays for it with a more complex API, it offers much greater flexibility than JDOM.
While adding flexibility, XPath integration, and support for large documents, dom4j shares JDOM's goals: ease of use and intuitive operation for Java developers. It also aims to be a more complete solution than JDOM, handling essentially all Java/XML problems. In pursuing this goal, it puts less emphasis than JDOM on preventing incorrect application behavior.
dom4j is an excellent Java XML API, with strong performance, powerful features, and great ease of use, and it is open source. More and more Java software now uses dom4j to read and write XML. Notably, Sun's JAXM also uses dom4j.
Finally, we recommend that you use dom4j.
JDOM and DOM performed poorly in performance tests, running out of memory when tested on a 10 MB document. For small documents, DOM and JDOM are still worth considering. Although the JDOM developers have said they intend to focus on performance before the official release, from a performance standpoint JDOM is hard to recommend. DOM, however, is still a good choice. DOM implementations exist for many programming languages, and DOM is the basis of many other XML-related standards. Because it is an official W3C recommendation (as opposed to a non-standard Java model), it may also be required in some kinds of projects (such as using DOM in JavaScript).
SAX performs well because of the way it parses. A SAX parser processes the incoming XML stream without loading it into memory (although, of course, parts of the document are temporarily buffered in memory as the stream is read).
Without question, dom4j is the best choice. It is widely used in open-source projects; for example, the well-known Hibernate uses dom4j to read its XML configuration files. If portability is not a concern, use dom4j!