The Java API for XML Processing (JAXP) lets you validate, parse, and transform XML using several different APIs. JAXP provides both ease of use and vendor neutrality. This two-part series introduces JAXP. This first article shows you how to use the API's parsing and validation features. Part 2 covers XSL transformations with JAXP.
Java technology and XML are arguably the two most important programming developments of the past five years. As a result, APIs for working with XML in the Java language have proliferated. The two most popular, the Document Object Model (DOM) and the Simple API for XML (SAX), have attracted enormous interest, and JDOM and data-binding APIs have followed (see References). Understanding even one or two of these APIs thoroughly is a considerable task; using all of them correctly makes you an expert. However, more and more Java developers find that they no longer need extensive knowledge of SAX and DOM, thanks largely to Sun Microsystems' JAXP toolkit. The Java API for XML Processing (JAXP) makes XML manageable even for beginning Java programmers while still offering a lot to advanced developers. That said, even advanced developers who use JAXP often have misconceptions about the very API they depend on.
This article assumes that you have some basic knowledge of SAX and DOM. If you are completely new to XML parsing, you may want to read up on SAX and DOM first through the online references, or skim my books (see References). You don't need to be fluent in writing callbacks or working with DOM Node objects, but you should at least understand that SAX and DOM are parsing APIs. It also helps to understand the differences between them. This article will make a lot more sense once you have picked up these basics.
JAXP: API or abstraction?
Strictly speaking, JAXP is an API, but it is more accurately called an abstraction layer. It doesn't provide a new way of parsing XML, it doesn't add to SAX or DOM, and it doesn't give new functionality to Java and XML handling. (If you find that hard to believe, you're reading the right article.) Instead, JAXP makes it easier to use DOM and SAX for some otherwise difficult tasks. It also lets you handle, in a vendor-neutral way, some of the vendor-specific tasks you may run into when using the DOM and SAX APIs.
Sidebar: Moving up. In earlier versions of the Java platform, JAXP was a separate download from the core platform. In Java 5.0, JAXP is a standard part of the Java platform. If you have a recent version of the JDK (see References), you already have JAXP.
Without SAX, DOM, or another XML parsing API, you cannot parse XML. I have seen many requests for a comparison of SAX, DOM, JDOM, and dom4j to JAXP, but such a comparison is impossible because the first four APIs serve a completely different purpose from JAXP. SAX, DOM, JDOM, and dom4j all parse XML. JAXP provides a means of getting to those parsers and the data they expose, but it doesn't offer a new way to parse an XML document. Understanding this distinction is critical to using JAXP correctly. It will also most likely put you well ahead of many of your fellow XML developers.
If you're still doubtful, make sure you have a JAXP distribution (see the sidebar above on obtaining JAXP). Start a web browser and load the JAXP API documentation. Navigate to the parsing portion of the API, located in the javax.xml.parsers package. Surprisingly, you'll find only six classes. How hard can this API be? All of these classes sit on top of an existing parser, and two of them are just for error handling. JAXP is a lot simpler than people think. So why all the confusion?
Sidebar: Sitting on top. Even JDOM and dom4j (see References), like JAXP, sit on top of other parsing APIs. Although both APIs provide a different model for accessing the data from SAX or DOM, they use SAX internally (with some tricks and modifications) to get at the data they present to the user.
Sun's JAXP and Sun's parser
Much of the parser/API confusion results from how Sun packages JAXP and the parser that JAXP uses by default. In earlier versions of JAXP, Sun included the JAXP API (with the six classes just mentioned and a few more used for transformations) and a parser called Crimson. Crimson was part of the com.sun.xml package. In newer versions of JAXP, including the one in the JDK, Sun has repackaged the Apache Xerces parser (see References). In both cases, the parser is part of the JAXP distribution but not part of the JAXP API.
Think about it this way: JDOM ships with the Apache Xerces parser. That parser isn't part of JDOM, but JDOM uses it, so it is included to ensure that JDOM works out of the box. The same principle applies to JAXP, but it isn't as clearly publicized: JAXP comes with a parser so that it can be used immediately. However, many people treat the classes included in Sun's parser as part of the JAXP API itself. For example, a frequent question on newsgroups is, "How do I use the XMLDocument class that comes with JAXP? What is its purpose?" The answer is somewhat complicated.
Sidebar: What's in a package name? When I first opened the source code in Java 1.5, I was surprised by what I saw, or rather, by what I didn't see. Instead of finding Xerces in its normal package, org.apache.xerces, I found that Sun had relocated the Xerces classes to com.sun.org.apache.xerces.internal. (I find this a bit odd, but nobody asked me.) In any case, if you're looking for Xerces in the JDK, that's where it is.
First, the com.sun.xml.tree.XMLDocument class is not part of JAXP. It is part of Sun's Crimson parser, which was packaged with earlier versions of JAXP. So the question is misleading from the start. Second, the whole point of JAXP is to provide vendor independence when dealing with parsers. With JAXP, the same code can work with Sun's XML parser, Apache's Xerces XML parser, and Oracle's XML parser. Using a Sun-specific class therefore defeats the purpose of using JAXP. Are you starting to see how this subject has become muddied? The API and the parser in the JAXP distribution have been lumped together, and some developers mistake classes and features of the parser for part of the API, and vice versa.
Now that you can see past the confusion, you're ready to look at some code and concepts in more depth.
Getting started with SAX
SAX is an event-driven methodology for processing XML. It consists of many callbacks. For example, the startElement() callback is invoked every time a SAX parser comes across an element's opening tag. The characters() callback is invoked for character data, and then endElement() is invoked for the element's end tag. Many more callbacks exist for document processing, errors, and other lexical structures. You get the idea. The SAX programmer implements one of the SAX interfaces that defines these callbacks. SAX also provides a class called DefaultHandler (in the org.xml.sax.helpers package) that implements all of these callbacks and provides default, empty implementations of all the callback methods. (You'll see that this matters in the later section on DOM.) The SAX developer needs only to extend this class and then implement the methods that require specific logic. So the key in SAX is to provide code for these various callbacks and then let a parser trigger each of them when appropriate. Here's the typical SAX routine:
- Create a SAXParser instance using a specific vendor's parser.
- Register the callback implementations (for example, by using a class that extends DefaultHandler).
- Start parsing, and let the parser invoke your callback implementations as it reads the document.
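The three steps above can be sketched as follows. This is a minimal illustration rather than the article's downloadable code: the class names SaxRoutineSketch and EventLogger are made up for this example, the parser comes from the JDK's default JAXP factory (introduced later in this article) instead of a vendor class, and the XML is parsed from an in-memory string.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of JAXP or SAX.
final class SaxRoutineSketch {

    /** Records each callback as a short string so the event order is visible. */
    static final class EventLogger extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            events.add("start:" + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Ignore whitespace-only text so the log stays readable
            String text = new String(ch, start, length).trim();
            if (!text.isEmpty()) {
                events.add("text:" + text);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end:" + qName);
        }
    }

    static List<String> parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        // Step 1: create a parser instance
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Step 2: create the callback implementation
        EventLogger handler = new EventLogger();
        // Step 3: start parsing; the parser fires the callbacks
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<greeting><to>World</to></greeting>"));
    }
}
```

Only the three overridden callbacks contain logic; every other SAX callback falls through to the empty implementations inherited from DefaultHandler.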
The SAX component of JAXP provides a simple means of doing all of this. Without JAXP, a SAX parser instance must either be instantiated directly from a vendor class (such as org.apache.xerces.parsers.SAXParser), or it must be obtained through a SAX helper class called XMLReaderFactory (also in the org.xml.sax.helpers package). The problem with the first approach is obvious: it isn't vendor neutral. The problem with the second is that the factory requires, as an argument, the String name of the parser class to use (that Apache class, org.apache.xerces.parsers.SAXParser, again). You can change the parser by passing in a different parser class name as a String. With this approach, changing the parser name doesn't require changing any import statements, but you still have to recompile the class. This is obviously not the best solution. It would be much easier to change parsers without recompiling the class.
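To make the second approach concrete, here is a small sketch of obtaining a parser through XMLReaderFactory with the class name passed as a String. The wrapper class HardCodedParserSketch is hypothetical, and the Xerces class name loads only if Apache Xerces is on the classpath. Note that recent JDKs mark XMLReaderFactory as deprecated in favor of SAXParserFactory, so expect a deprecation warning when compiling.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical example class; not part of JAXP or SAX.
final class HardCodedParserSketch {

    // The vendor class name is compiled into the application as a String.
    // Switching parsers means editing this constant and recompiling.
    static final String PARSER_CLASS = "org.apache.xerces.parsers.SAXParser";

    /** Asks the SAX helper factory for the named parser class. */
    static XMLReader createReader(String parserClassName) throws SAXException {
        return XMLReaderFactory.createXMLReader(parserClassName);
    }

    public static void main(String[] args) {
        try {
            XMLReader reader = createReader(PARSER_CLASS);
            System.out.println("Loaded " + reader.getClass().getName());
        } catch (SAXException e) {
            // Thrown when the named class cannot be found or instantiated
            System.err.println("Could not load parser: " + e.getMessage());
        }
    }
}
```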
JAXP offers that better alternative: it lets you supply the parser through a Java system property. Of course, when you download a distribution from Sun, you get a JAXP implementation that uses Sun's version of Xerces. Changing the parser (say, to Oracle's parser) requires only that you change a classpath setting, moving from one parser implementation to another; it does not require that you recompile your code. This abstraction is the magic that JAXP is all about.
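As a rough sketch of how that selection works, the example below reads and overrides the javax.xml.parsers.SAXParserFactory system property, which is the property JAXP consults when locating a SAX factory. The class ParserSelectionSketch and the class name com.example.NoSuchFactory are invented for illustration; in practice the property is more commonly set on the command line with -D than in code.

```java
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical example class; not part of JAXP.
final class ParserSelectionSketch {

    static final String FACTORY_PROPERTY = "javax.xml.parsers.SAXParserFactory";

    /** Returns the name of the factory implementation JAXP currently selects. */
    static String currentFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** Temporarily points JAXP at the named factory and reports whether it loads. */
    static boolean canLoad(String factoryClassName) {
        String previous = System.getProperty(FACTORY_PROPERTY);
        System.setProperty(FACTORY_PROPERTY, factoryClassName);
        try {
            SAXParserFactory.newInstance();
            return true;
        } catch (FactoryConfigurationError e) {
            // The named implementation is missing from the classpath
            return false;
        } finally {
            // Restore the original setting so other code is unaffected
            if (previous == null) {
                System.clearProperty(FACTORY_PROPERTY);
            } else {
                System.setProperty(FACTORY_PROPERTY, previous);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Default factory: " + currentFactoryClass());
        System.out.println("Made-up factory loads: " + canLoad("com.example.NoSuchFactory"));
    }
}
```

The application code that calls SAXParserFactory.newInstance() stays the same whichever implementation is selected; only the property value and the classpath change.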
Sidebar: Those crafty SAX developers. An aside: with clever coding, you can make a SAX application pick up the parser class to use from a system property or a properties file. However, JAXP gives you the same behavior without any extra work, so most people prefer the JAXP route.
Overview of the SAX parser factory
The JAXP SAXParserFactory class is the key to changing parser implementations easily. You must create a new instance of this class (which I'll cover in a moment). After the new instance is created, the factory provides a method for obtaining a SAX-capable parser. Behind the scenes, the JAXP implementation takes care of the vendor-dependent code, keeping your own code free of it. The factory also has some other useful features.
In addition to the basic job of creating SAX parser instances, the factory lets you set configuration options. These options affect all parser instances obtained through the factory. The two most commonly used options in JAXP 1.3 are setNamespaceAware(boolean awareness) and setValidating(boolean validating). Remember that once these options are set, they affect all instances obtained from the factory after the method is called.
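The timing of those settings can be seen in a short sketch. The class FactoryOptionsSketch is hypothetical; it changes the namespace option between two calls to newSAXParser() and reports what each resulting parser says about itself.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXException;

// Hypothetical example class; not part of JAXP.
final class FactoryOptionsSketch {

    /** Returns the namespace awareness of two parsers created around a factory change. */
    static boolean[] namespaceSettings()
            throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();

        factory.setNamespaceAware(true);
        SAXParser first = factory.newSAXParser();   // created with namespaces on

        factory.setNamespaceAware(false);
        SAXParser second = factory.newSAXParser();  // created with namespaces off

        // The first parser keeps the setting it was created with
        return new boolean[] { first.isNamespaceAware(), second.isNamespaceAware() };
    }

    public static void main(String[] args) throws Exception {
        boolean[] settings = namespaceSettings();
        System.out.println("first parser namespace aware:  " + settings[0]);
        System.out.println("second parser namespace aware: " + settings[1]);
    }
}
```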
Once you have set up the factory, calling newSAXParser() returns a ready-to-use instance of the JAXP SAXParser class. This class wraps an underlying SAX parser (an instance of org.xml.sax.XMLReader). It also protects you from using any vendor-specific additions to the parser class. (Remember the earlier discussion of the XMLDocument class?) This class lets you kick off the actual parsing. Listing 1 shows how to create, configure, and use a SAX factory:
Listing 1. Using SAXParserFactory

    import java.io.File;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    // JAXP
    import javax.xml.parsers.FactoryConfigurationError;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.SAXParserFactory;
    import javax.xml.parsers.SAXParser;

    // SAX
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class TestSAXParsing {

        public static void main(String[] args) {
            try {
                if (args.length != 1) {
                    System.err.println("Usage: java TestSAXParsing [filename]");
                    System.exit(1);
                }

                // Get SAX Parser Factory
                SAXParserFactory factory = SAXParserFactory.newInstance();

                // Turn on validation, and turn off namespaces
                factory.setValidating(true);
                factory.setNamespaceAware(false);

                SAXParser parser = factory.newSAXParser();
                parser.parse(new File(args[0]), new MyHandler());
            } catch (ParserConfigurationException e) {
                System.out.println("The underlying parser does not support " +
                                   "the requested features.");
            } catch (FactoryConfigurationError e) {
                System.out.println("Error occurred obtaining SAX Parser Factory.");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    class MyHandler extends DefaultHandler {
        // SAX callback implementations from ContentHandler, ErrorHandler, etc.
    }
In Listing 1, you can see that two JAXP-specific problems can occur when using the factory: the inability to obtain or configure a SAX factory, and the inability to configure a SAX parser. The first problem, represented by a FactoryConfigurationError, usually occurs when the parser specified in a JAXP implementation or system property cannot be obtained. The second problem, represented by a ParserConfigurationException, occurs when a requested feature is not available in the parser being used. Both are easy to deal with and shouldn't pose any difficulty when using JAXP. In fact, you might want to write code that attempts to set several features and gracefully handles situations where a certain feature isn't available.
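One way to handle an unavailable feature gracefully is sketched below, under some assumptions: the class FeatureProbeSketch and the helper trySetFeature are invented for this example, and the second feature URI is deliberately made up so that the parser rejects it. The helper relies on the factory's setFeature() method, which reports problems through ParserConfigurationException or the SAX "not recognized" and "not supported" exceptions.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical example class; not part of JAXP.
final class FeatureProbeSketch {

    /** Tries to set a feature and reports success instead of throwing. */
    static boolean trySetFeature(SAXParserFactory factory, String feature, boolean value) {
        try {
            factory.setFeature(feature, value);
            return true;
        } catch (ParserConfigurationException
                 | SAXNotRecognizedException
                 | SAXNotSupportedException e) {
            // The underlying parser cannot honor this request; carry on without it
            return false;
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        String[] wanted = {
            XMLConstants.FEATURE_SECURE_PROCESSING,
            "http://example.invalid/features/made-up"  // intentionally bogus
        };
        for (String feature : wanted) {
            System.out.println(feature + " -> " + trySetFeature(factory, feature, true));
        }
    }
}
```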
A SAXParser instance is obtained once you have the factory, have turned off namespace support, and have turned on validation; then parsing begins. The parser's parse() method takes an instance of the SAX DefaultHandler helper class, which your custom handler class extends. See the code distribution for the full Java listing of this class's implementation (see Download). You also pass in the File to parse. However, the SAXParser class contains much more than this single method.
Using the SAX parser
Once you have an instance of the SAXParser class, you can do a lot more than just pass it a File to parse. Because of the way components in large applications communicate, it's not always safe to assume that the creator of an object instance is also its user. One component might create the SAXParser instance, while another component (perhaps coded by another developer) might need to use that same instance. For this reason, JAXP provides methods to determine the parser's settings. For example, you can use isValidating() to determine whether the parser will perform validation, and isNamespaceAware() to see whether the parser can process namespaces in an XML document. These methods tell you what the parser can do, but a user who has only a SAXParser instance, and not the SAXParserFactory itself, cannot change these features. You must do that at the parser factory level.
You also have a variety of ways to request parsing of a document. Instead of accepting only a File and a SAX DefaultHandler instance, the SAXParser's parse() method can also accept a SAX InputSource, a Java InputStream, or a URL in String form, each along with a DefaultHandler instance. So you can still parse documents that are wrapped in various forms.
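As a brief illustration of two of these variants, the sketch below parses the same in-memory document once through an InputStream and once through a SAX InputSource. The class ParseSourcesSketch and its ElementCounter handler are hypothetical names used only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of JAXP or SAX.
final class ParseSourcesSketch {

    /** Counts element start tags; everything else uses DefaultHandler's no-ops. */
    static final class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attributes) {
            count++;
        }
    }

    static int countFromStream(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter counter = new ElementCounter();
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            parser.parse(in, counter);              // InputStream variant
        }
        return counter.count;
    }

    static int countFromInputSource(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ElementCounter counter = new ElementCounter();
        parser.parse(new InputSource(new StringReader(xml)), counter);  // InputSource variant
        return counter.count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><c/></a>";
        System.out.println("stream: " + countFromStream(xml));
        System.out.println("source: " + countFromInputSource(xml));
    }
}
```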
Finally, you can obtain the underlying SAX parser (an instance of org.xml.sax.XMLReader) and use it directly through the SAXParser's getXMLReader() method. Once you have this underlying instance, the usual SAX methods are available. Listing 2 shows examples of the various uses of the SAXParser class, the core class in JAXP for SAX parsing:
Listing 2. Using the JAXP SAXParser class

    // Get a SAX Parser instance
    SAXParser saxParser = saxFactory.newSAXParser();

    // Find out if validation is supported
    boolean isValidating = saxParser.isValidating();

    // Find out if namespaces are supported
    boolean isNamespaceAware = saxParser.isNamespaceAware();

    // Parse, in a variety of ways

    // Use a file and a SAX DefaultHandler instance
    saxParser.parse(new File(args[0]), myDefaultHandlerInstance);

    // Use a SAX InputSource and a SAX DefaultHandler instance
    saxParser.parse(mySaxInputSource, myDefaultHandlerInstance);

    // Use an InputStream and a SAX DefaultHandler instance
    saxParser.parse(myInputStream, myDefaultHandlerInstance);

    // Use a URI and a SAX DefaultHandler instance
    saxParser.parse("http://www.newInstance.com/xml/doc.xml", myDefaultHandlerInstance);

    // Get the underlying (wrapped) SAX parser
    org.xml.sax.XMLReader parser = saxParser.getXMLReader();

    // Use the underlying parser
    parser.setContentHandler(myContentHandlerInstance);
    parser.setErrorHandler(myErrorHandlerInstance);
    parser.parse(new org.xml.sax.InputSource(args[0]));
So far I've said a lot about SAX, but I haven't unveiled anything remarkable or surprising. JAXP's added functionality is fairly minor, especially where SAX is concerned. This minimal functionality makes your code more portable and lets other developers use it, freely or commercially, with any SAX-compliant XML parser. That's it. There's nothing more to using SAX with JAXP. If you already know SAX, you're about 98 percent of the way there. You just need to learn two new classes and a couple of Java exceptions, and you're ready to go. If you've never used SAX, it's easy enough to start now.
Working with DOM
If you think you need to take a break before meeting the challenge of DOM, you can skip the rest. Using DOM with JAXP is nearly identical to using SAX with JAXP. All you do is change a couple of class names and a return type, and you're pretty much there. If you understand how SAX works and what DOM is, you won't have any problem.
The primary difference between DOM and SAX is the structure of the APIs themselves. SAX consists of an event-based set of callbacks, while DOM has an in-memory tree structure. With SAX, there's never a data structure to work on (unless the developer creates one manually). SAX therefore doesn't give you the ability to modify an XML document. DOM does provide this functionality. The org.w3c.dom.Document interface represents an XML document and is made up of DOM nodes that represent the elements, attributes, and other XML constructs. So JAXP doesn't need to fire SAX callbacks; it's responsible only for returning a DOM Document object from parsing.
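To illustrate what the in-memory tree makes possible, here is a small sketch that parses a document into a DOM Document and then modifies it. The class DomEditSketch, its method names, and the item element are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; not part of JAXP or DOM.
final class DomEditSketch {

    /** Parses the XML, appends one <item> to the root, and returns the new item count. */
    static int itemCountAfterAdd(String xml, String newItemText)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // The whole document is in memory, so it can be changed after parsing
        Element item = doc.createElement("item");
        item.setTextContent(newItemText);
        doc.getDocumentElement().appendChild(item);

        return doc.getElementsByTagName("item").getLength();
    }

    public static void main(String[] args) throws Exception {
        int count = itemCountAfterAdd("<list><item>first</item></list>", "second");
        System.out.println("items after edit: " + count);
    }
}
```

A SAX handler could count the items as they stream past, but it would have no tree to append the new element to.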
Overview of the DOM parser factory
With this basic understanding of DOM and of the differences between DOM and SAX, you don't need to know much more. The code in Listing 3 looks remarkably similar to the SAX code in Listing 1. First, a DocumentBuilderFactory is obtained (in the same way that SAXParserFactory was in Listing 1). Then the factory is configured to handle validation and namespaces (in the same way as in SAX). Next, a DocumentBuilder instance, the analog to SAXParser, is retrieved from the factory (in the same way as in SAX). Parsing can then occur, and the resulting DOM Document object is handed off to a method that prints the DOM tree:
Listing 3. Using DocumentBuilderFactory

    import java.io.File;
    import java.io.IOException;
    import java.io.OutputStreamWriter;
    import java.io.Writer;

    // JAXP
    import javax.xml.parsers.FactoryConfigurationError;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.DocumentBuilder;

    // DOM
    import org.w3c.dom.Document;
    import org.w3c.dom.DocumentType;
    import org.w3c.dom.NamedNodeMap;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class TestDOMParsing {

        public static void main(String[] args) {
            try {
                if (args.length != 1) {
                    System.err.println("Usage: java TestDOMParsing " +
                                       "[filename]");
                    System.exit(1);
                }

                // Get Document Builder Factory
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

                // Turn on validation, and turn off namespaces
                factory.setValidating(true);
                factory.setNamespaceAware(false);

                DocumentBuilder builder = factory.newDocumentBuilder();
                Document doc = builder.parse(new File(args[0]));

                // Print the document from the DOM tree and
                // feed it an initial indentation of nothing
                printNode(doc, "");
            } catch (ParserConfigurationException e) {
                System.out.println("The underlying parser does not " +
                                   "support the requested features.");
            } catch (FactoryConfigurationError e) {
                System.out.println("Error occurred obtaining Document " +
                                   "Builder Factory.");
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        private static void printNode(Node node, String indent) {
            // print the DOM tree
        }
    }
Two problems can arise with this code (as with SAX in JAXP): a FactoryConfigurationError and a ParserConfigurationException. The cause of each is the same as it is with SAX. Either a problem is present in the implementation classes (resulting in a FactoryConfigurationError), or the parser provided doesn't support the requested features (resulting in a ParserConfigurationException). The only difference between DOM and SAX in this respect is that with DOM you substitute DocumentBuilderFactory for SAXParserFactory, and DocumentBuilder for SAXParser. That's all there is to it. (You can view the complete code listing, which includes the method used to print out the DOM tree; see Download.)
Using the DOM parser
Once you have a DOM factory, you can obtain a DocumentBuilder instance. The methods available to a DocumentBuilder instance are very similar to those available to its SAX counterpart. The major difference is that the variations of the parse() method do not take an instance of the SAX DefaultHandler class. Instead, they return a DOM Document instance representing the XML document that was parsed. The only other difference is that two methods are provided for SAX-like functionality:
- setErrorHandler(), which takes a SAX ErrorHandler implementation to handle problems that might arise during parsing
- setEntityResolver(), which takes a SAX EntityResolver implementation to handle entity resolution
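A short sketch of both hooks follows. The class BuilderHooksSketch, the CollectingErrorHandler, and the policy of answering every external entity with empty content are assumptions made for illustration; a real application would typically resolve entities to local copies of its DTDs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical example class; not part of JAXP, SAX, or DOM.
final class BuilderHooksSketch {

    /** Collects parser problems instead of letting them go to the default handler. */
    static final class CollectingErrorHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            problems.add("warning: " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) {
            problems.add("error: " + e.getMessage());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXParseException {
            problems.add("fatal: " + e.getMessage());
            throw e;  // a fatal error ends the parse
        }
    }

    static DocumentBuilder newBuilder(ErrorHandler handler)
            throws ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(handler);
        // Illustrative policy: supply empty content for any external entity,
        // so an external DTD reference is never fetched
        builder.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));
        return builder;
    }

    static String rootName(String xml, ErrorHandler handler)
            throws ParserConfigurationException, SAXException, IOException {
        Document doc = newBuilder(handler).parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        String xml = "<!DOCTYPE a SYSTEM \"http://example.invalid/a.dtd\"><a/>";
        System.out.println("root: " + rootName(xml, handler));
        System.out.println("problems: " + handler.problems);
    }
}
```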
Listing 4 shows examples of these methods in action:
Listing 4. Using the JAXP DocumentBuilder class

    // Get a DocumentBuilder instance
    DocumentBuilder builder = builderFactory.newDocumentBuilder();

    // Find out if validation is supported
    boolean isValidating = builder.isValidating();

    // Find out if namespaces are supported
    boolean isNamespaceAware = builder.isNamespaceAware();

    // Set a SAX ErrorHandler
    builder.setErrorHandler(myErrorHandlerImpl);

    // Set a SAX EntityResolver
    builder.setEntityResolver(myEntityResolverImpl);

    // Parse, in a variety of ways
    Document doc;

    // Use a file
    doc = builder.parse(new File(args[0]));

    // Use a SAX InputSource
    doc = builder.parse(mySaxInputSource);

    // Use an InputStream (DocumentBuilder's parse() takes no handler argument)
    doc = builder.parse(myInputStream);

    // Use a URI
    doc = builder.parse("http://www.newInstance.com/xml/doc.xml");
If you're a little bored reading this section on DOM, you're not alone; I found it a little boring to write, because applying what you've learned about SAX to DOM is so straightforward.