Http://www-128.ibm.com/developerworks/cn/xml/x-jaxpval.html
JAXP validation: Use the new features of JAXP 1.3 to validate XML
Level: Intermediate
Brett McLaughlin (brett@newInstance.com), Author/Editor, O'Reilly Media, Inc., November 03, 2005
The latest version of the Java programming language, Java 5.0, includes an improved and expanded version of the Java API for XML Processing (JAXP). The most notable addition to JAXP is a new validation API, which offers better interactivity, supports XML Schema and RELAX NG, and lets you make changes while you validate. These improvements give Java developers an industrial-strength solution for XML validation. This article details the new API, covering both its basic features and its more advanced ones.
The Java API for XML Processing (JAXP) has been a stable, and rather dull, API for several years. That isn't a bad thing: dull often means reliable, which is always good for software. However, JAXP's slow pace meant that developers stopped looking to it for new features. From Java 1.3 to 1.4, JAXP changed little beyond supporting the latest versions of the SAX and DOM specifications (see References). But in Java 5.0 and JAXP 1.3, Sun expanded JAXP considerably. Besides XPath support, the new validation features are especially noteworthy. This article describes the validation features of JAXP 1.3, found in the javax.xml.validation package, in detail.
Brief historical review
Schemas everywhere
In this article (and in general), a schema is any constraint model expressed in an XML format. W3C XML Schema is one kind of schema, but a schema is not necessarily an XML Schema as defined by the W3C specification; it could, for example, be a RELAX NG schema. Using the general term schema makes it easy to refer to a particular approach (an XML-based constraint model) without being tied to a specific implementation.
Before you dig into the details of the new validation API, you need a solid understanding of how validation was done before JAXP 1.3. Sun will clearly continue to support the older DTD-based validation, but you are encouraged to use the new schema-based validation API. Even if you plan to use javax.xml.validation, then, you still need to understand how to validate documents against a DTD.
Create a parser factory
General JAXP processing starts with a factory: SAXParserFactory for SAX parsing and DocumentBuilderFactory for DOM parsing. Both factories are created with the static newInstance() method, as shown in Listing 1.
Listing 1. Creating a SAXParserFactory and a DocumentBuilderFactory
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
...
// Create a new SAX parser factory
SAXParserFactory saxFactory = SAXParserFactory.newInstance();

// Create a new DOM document builder factory
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
Turning on validation
One factory, many parsers
Options set on a factory affect every parser that factory creates. If you call setValidating() with true, you explicitly tell the factory that every parser it creates must validate. Keep this in mind, because it's an easy trap: you turn on validation in the factory, write another 100 lines of code, and forget that the parsers you're getting are validating ones.
Although SAXParserFactory and DocumentBuilderFactory each have features and properties specific to SAX and DOM respectively, they share one method for validation: setValidating(). As you'd expect, you turn on validation by passing true to this method. Remember, though, that a factory creates parsers; it does not parse documents itself. Once the factory is configured, call newSAXParser() (SAX) or newDocumentBuilder() (DOM). Listing 2 shows validation enabled for both.
Listing 2. Enabling validation (DTD)
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
...
// Create a new SAX parser factory
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
// Turn on (DTD) validation
saxFactory.setValidating(true);
// Create a validating SAX parser instance
SAXParser parser = saxFactory.newSAXParser();

// Create a new DOM document builder factory
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
// Turn on (DTD) validation
domFactory.setValidating(true);
// Create a validating DOM parser
DocumentBuilder builder = domFactory.newDocumentBuilder();
In either case, you get an object (a SAXParser or a DocumentBuilder) that parses XML and validates it during parsing. But remember that this covers DTD validation only. Calling setValidating(true) by itself has no effect on validation against XML-based schemas.
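To make the DTD-only behavior concrete, here is a minimal sketch (not from the original article) of a validating DOM parse. It assumes the DTD is declared in the document's internal subset so that no external files are needed; the class DtdValidationSketch and its nested StrictHandler are hypothetical names. Validity errors arrive through the ErrorHandler's error() method, and with the JDK's built-in parser they are typically only printed when no handler is set, so the sketch installs one that rethrows them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class DtdValidationSketch {

    // Hypothetical handler: rethrows validity and well-formedness errors so
    // that an invalid document makes parse() fail instead of just logging.
    static final class StrictHandler implements ErrorHandler {
        @Override public void warning(SAXParseException e) { /* ignore warnings */ }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    }

    /** Returns true if the XML is well-formed and valid against its DTD. */
    static boolean isValid(String xml) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // DTD validation only
        try {
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new StrictHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // invalid or malformed document
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, a document whose internal subset declares note as containing a single to element should pass, while the same document with a from element in place of to should fail.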
Introducing javax.xml.validation
Five years ago, a simple method that turned on DTD validation was enough. Even two years ago, schema languages such as XML Schema and RELAX NG were still working through their own problems. Today, however, validating documents against schemas is as common as using DTDs. The two approaches coexist largely because legacy documents still use DTDs. In the next few years, DTDs will fade away like Lisp, becoming a historical relic rather than a mainstream technology.
The javax.xml.validation package introduced in JAXP 1.3 supports schema validation and has been warmly received by developers. The package is compact and easy to use, and it is now a standard part of the Java language. Better yet, if you have used SAX and DOM through JAXP, validation is even easier to pick up: the model is similar, and you'll find this API straightforward.
Using SchemaFactory
From the brief history above, you know that the first step in using SAX is to create a new SAXParserFactory, and for DOM a DocumentBuilderFactory. So it should come as no surprise that the first step in schema validation is to create a SchemaFactory, as shown in Listing 3.
Listing 3. Creating a SchemaFactory
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;
...
// The factory is bound to one schema language, here W3C XML Schema
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
This looks much like creating the other factories, except that newInstance() takes a parameter. You pass it a constant defined in another class, javax.xml.XMLConstants. That class defines constants used throughout JAXP applications, but for now you need to know only two:
- For RELAX NG schemas: XMLConstants.RELAXNG_NS_URI
- For W3C XML Schema: XMLConstants.W3C_XML_SCHEMA_NS_URI
Because a SchemaFactory is tied to a specific constraint model, you must supply this value when you create the factory.
The SchemaFactory class has several other options, which are covered later in this article. For general XML validation, the default factory is enough.
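Because a SchemaFactory is bound to one schema language, it can help to check what the runtime actually supports before relying on it. The sketch below is an illustration rather than part of the original article: factoryFor is a hypothetical helper that relies on SchemaFactory.newInstance() throwing IllegalArgumentException when no implementation for the requested language is available. In typical JDK installations only W3C XML Schema is built in, so RELAX NG usually requires a third-party SchemaFactory on the classpath.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

final class SchemaLanguageCheck {

    /**
     * Returns a SchemaFactory for the given schema language URI, or null if no
     * implementation for that language is available in this runtime.
     */
    static SchemaFactory factoryFor(String schemaLanguage) {
        try {
            return SchemaFactory.newInstance(schemaLanguage);
        } catch (IllegalArgumentException e) {
            // newInstance() throws this when no factory supports the language.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println("W3C XML Schema: "
                + (factoryFor(XMLConstants.W3C_XML_SCHEMA_NS_URI) != null));
        System.out.println("RELAX NG:       "
                + (factoryFor(XMLConstants.RELAXNG_NS_URI) != null));
    }
}
```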
Validating against a schema
Use the Source, Luke
Whatever you make of the title, the Source interface is very important throughout JAXP. The interface comes from XML transformation processing and has become the standard form of input for many JAXP structures, wherever Java I/O classes are not used directly. If you have never used Source, see References for more information about XML transformations.
Once the factory is created, you need to load the set of constraints you want to use, through the factory's newSchema() method. However, newSchema() takes a javax.xml.transform.Source, so an intermediate step is required to turn the schema into a Source. The process is simple, as shown in Listing 4.
Listing 4. From constraints to a Schema
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
...
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// Wrap the constraint document in a Source
Source schemaSource = new StreamSource(new File("constraints.xml"));
// Compile the constraints into a Schema (throws SAXException on schema errors)
Schema schema = schemaFactory.newSchema(schemaSource);
This code is straightforward if you're familiar with JAXP. Listing 4 loads a file named constraints.xml. You can supply the Source data any way you like: you can read the constraints through a SAXSource or a DOMSource, or even from a URL.
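As a hedged illustration of the different ways to obtain a Source, the sketch below wraps a file, a URL, and an in-memory string in StreamSource objects; SchemaSources and its methods are hypothetical names, and a SAXSource or DOMSource would work just as well when you already have SAX input or a DOM tree. One design note: a schema read from a plain Reader has no system ID, so relative xs:include or xs:import locations inside it may not resolve as expected unless you give the source a system ID with setSystemId().

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SchemaSources {

    // From a file on disk (the approach used in Listing 4).
    static Source fromFile(String path) {
        return new StreamSource(new File(path));
    }

    // From a URL: the string becomes the system ID, which the parser resolves.
    static Source fromUrl(String url) {
        return new StreamSource(url);
    }

    // From an in-memory string, handy for tests or generated constraints.
    static Source fromString(String schemaText) {
        return new StreamSource(new StringReader(schemaText));
    }

    // Compiles any of the above into a W3C XML Schema.
    static Schema compile(Source source) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(source);
    }
}
```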
Once you have a Source, pass it to the factory's newSchema() method. The result is a Schema. From there, validating a document is easy; see Listing 5.
Listing 5. Validating XML
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
...
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Source schemaSource = new StreamSource(new File("constraints.xml"));
Schema schema = schemaFactory.newSchema(schemaSource);

// Get a Validator from the Schema and validate the document
// (throws SAXException if the document is invalid)
Validator validator = schema.newValidator();
validator.validate(new StreamSource("my-file.xml"));
There are no big surprises here; the classes to use and the methods to call are easy to find. You need the Validator class, and you obtain an instance of it from the Schema with the newValidator() method. Finally, you call validate() and once again pass it a Source implementation, but this time one that represents the XML to parse and validate.
When this method is called, the target XML is parsed and validated. Keep in mind that even if you supply the XML as a DOMSource (an already-parsed representation of XML), parsing may happen again. Validation is still closely tied to parsing, so the validation process can take some time.
If an error occurs, an exception (a SAXException) is thrown to tell you that something went wrong. Most JAXP implementations include the line number, and sometimes the column number, to help you locate the spot that violates the constraint model. Of course, throwing an exception is not always the best way to handle problems; I'll show a better approach in the next section.
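Here is a minimal sketch of reading that location information. It assumes the default behavior described in the Validator Javadoc: with no ErrorHandler set, errors are thrown rather than ignored. ReportingValidation and check are hypothetical names; the sketch catches SAXParseException, the SAXException subclass that carries line and column numbers.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class ReportingValidation {

    /** Returns null if the XML is valid, otherwise "line:column: message". */
    static String check(Schema schema, String xml) throws IOException {
        Validator validator = schema.newValidator(); // no ErrorHandler: errors are thrown
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            // Line and column are -1 when the implementation cannot supply them.
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return "?:?: " + e.getMessage(); // no location information available
        }
    }
}
```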
That may seem like a lot of work: get a factory, get a schema, get a validator. JAXP could easily have provided a single method to do it all, something like validate(Source schema, Source xmlDocument). However, modularity has its benefits. In the next section, you'll see that keeping the Schema and Validator classes separate helps with some very unusual situations in XML processing. And if you really want such a method, writing it yourself makes a good exercise!
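Taking up that exercise, here is one possible sketch. OneShotValidation.validate is a hypothetical helper, not part of JAXP, and it assumes W3C XML Schema; it simply chains the three steps together.

```java
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class OneShotValidation {

    /**
     * Hypothetical convenience method: validates xmlDocument against a W3C XML
     * Schema. Throws SAXException if the schema cannot be compiled or the
     * document is invalid.
     */
    static void validate(Source schema, Source xmlDocument) throws SAXException, IOException {
        // Step 1: a factory for the schema language
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // Step 2: compile the constraints
        Schema compiled = factory.newSchema(schema);
        // Step 3: a fresh Validator for this document
        compiled.newValidator().validate(xmlDocument);
    }
}
```

The trade-off is that this compiles the schema on every call. According to the JAXP Javadoc, a Schema is immutable and thread-safe and can be shared, whereas a Validator is not thread-safe; an application that validates many documents is therefore better off compiling the Schema once and creating Validators from it, which is exactly what the separate classes allow.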
Validation in depth
For many applications, what has been covered so far is enough. You hand an input document and a schema to a method, and the document gets validated. A simple Exception tells you that there is a problem and even gives you some basic information for fixing it. For applications that use XML only as a data format for passing information around, JAXP's basic validation features may well be all you need.
However, we live in a world full of XML editors, XML files, code generators, and Web services. In these applications, XML does not play a supporting role; it is the heart of the application, and basic validation often isn't enough. JAXP offers a number of features for such applications, which are discussed below.
Handling errors
First, an Exception tells you that something exceptional has happened. In XML-centric applications, however, a document failing validation may not be exceptional at all; it may be just one possible outcome. Consider an XML editor or an IDE with XML support. In such environments, invalid XML should not make the system crash or shut down. Besides, an Exception is too heavyweight a mechanism for reporting errors.
Of course, none of this is new to JAXP veterans, who are probably used to supplying an org.xml.sax.ErrorHandler when working with a SAXParser or DocumentBuilder. The interface's three methods, warning(), error(), and fatalError(), simplify error handling during parsing. Fortunately, the same facility is available for XML validation, and better still, it uses the same interface: ErrorHandler is as useful in validation as it is in parsing. Listing 6 shows a simple example.
Listing 6. Handling validation errors
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
...
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Source schemaSource = new StreamSource(new File("constraints.xml"));
Schema schema = schemaFactory.newSchema(schemaSource);
Validator validator = schema.newValidator();

// MySchemaErrorHandler is your own implementation of org.xml.sax.ErrorHandler
ErrorHandler mySchemaErrorHandler = new MySchemaErrorHandler();
validator.setErrorHandler(mySchemaErrorHandler);

validator.validate(new StreamSource("my-file.xml"));
As with SAX, you can use this interface to customize error handling. It lets your application stop validation, print error messages, or even try to recover from an error and continue validating. If you already know this interface, there's nothing new to learn!
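For illustration, here is one hypothetical ErrorHandler (CollectingErrorHandler is not part of JAXP or the original article) that records warnings and errors instead of throwing, so that validation continues and reports every problem it finds, which suits the editor or IDE scenario described above. It still rethrows fatal errors, because a document that is not well-formed cannot be processed further.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class CollectingErrorHandler implements ErrorHandler {

    private final List<String> problems = new ArrayList<>();

    // Record warnings and recoverable errors; returning normally lets validation continue.
    @Override public void warning(SAXParseException e) { problems.add("warning: " + describe(e)); }
    @Override public void error(SAXParseException e) { problems.add("error: " + describe(e)); }

    // Fatal errors (e.g. malformed XML) cannot be recovered from: record, then rethrow.
    @Override public void fatalError(SAXParseException e) throws SAXException {
        problems.add("fatal: " + describe(e));
        throw e;
    }

    List<String> problems() { return problems; }

    private static String describe(SAXParseException e) {
        return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }

    /** Validates xml and returns every reported problem (empty if the document is valid). */
    static List<String> validateAll(Schema schema, String xml) throws IOException {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        Validator validator = schema.newValidator();
        validator.setErrorHandler(handler);
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Validation stopped on a fatal error, normally already recorded by fatalError().
        }
        return handler.problems();
    }
}
```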
Loading multiple schemas
Another setErrorHandler()
If you read the Javadoc for the javax.xml.validation package, you may notice that SchemaFactory also has a setErrorHandler() method. An error handler set on a SchemaFactory handles errors found while parsing schemas during a newSchema() call. So it is part of the validation API, but it deals with errors in the schemas themselves rather than with validation errors in documents.
In some rare cases, you may need to build a Schema object from multiple schemas. This is a little confusing, because a Schema does not necessarily correspond to a single schema or file. Instead, the object represents a set of constraints, and those constraints can come from one file or from several. So you can use newSchema(Source[] sourceList), a version of newSchema() that takes an array of Source implementations (each representing a set of constraints). The result is still a single Schema object, representing the combination of the supplied schemas.
Expect plenty of errors in this situation. For that reason, it's a good idea to set an ErrorHandler on the SchemaFactory (see Handling errors, above). Problems can crop up in many places, so be prepared to deal with them when they appear.
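The sketch below shows both ideas together under a few assumptions: the schemas are W3C XML Schema documents held in strings, each has its own target namespace, and CombinedSchemas.combine is a hypothetical helper. The ErrorHandler set on the SchemaFactory records each problem found in the schemas themselves and then rethrows it, so newSchema() stops at the first error.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class CombinedSchemas {

    /**
     * Compiles several schema documents into one Schema. Problems in the schemas
     * are appended to schemaProblems; errors also abort newSchema().
     */
    static Schema combine(List<String> schemaProblems, String... schemaTexts) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // This handler sees errors in the schemas, not validation errors in documents.
        factory.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {
                schemaProblems.add("warning: " + e.getMessage());
            }
            @Override public void error(SAXParseException e) throws SAXException {
                schemaProblems.add("error: " + e.getMessage());
                throw e;
            }
            @Override public void fatalError(SAXParseException e) throws SAXException {
                schemaProblems.add("fatal: " + e.getMessage());
                throw e;
            }
        });
        // One Source per schema document; newSchema(Source[]) merges them into one Schema.
        Source[] sources = new Source[schemaTexts.length];
        for (int i = 0; i < schemaTexts.length; i++) {
            sources[i] = new StreamSource(new StringReader(schemaTexts[i]));
        }
        return factory.newSchema(sources);
    }
}
```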
Integrating validation into parsing
So far, validation has been treated as a step separate from parsing, but it doesn't have to be. Once you have a Schema object, you can assign it to a SAXParserFactory or DocumentBuilderFactory through the setSchema() method (see Listing 7).
Listing 7. Integrating validation into parsing
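import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
...
// Load up the document
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Schema validation works on namespace-qualified names
factory.setNamespaceAware(true);

// Set up an XML Schema validator, using the supplied schema
Source schemaSource = new StreamSource(new File(args[1]));
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(schemaSource);

// Instead of explicitly validating, assign the Schema to the factory
factory.setSchema(schema);

// Parsers from this factory will automatically validate against the
// associated schema
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File(args[0]));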
Note that you do not need to turn on validation explicitly with setValidating(). Any parser created by a factory whose Schema is non-null validates against that Schema. As you'd expect, validation errors are reported to the parser's ErrorHandler.
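For comparison with the DOM version in Listing 7, here is a minimal SAX sketch, assuming W3C XML Schema and documents held in strings; SaxSchemaValidation and FailOnError are hypothetical names. The handler passed to parse() is a DefaultHandler, which also serves as the parser's ErrorHandler. Its default error() does nothing, so the sketch overrides it to turn validation errors into a failed parse.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxSchemaValidation {

    // DefaultHandler ignores error() by default; rethrow so invalid input fails the parse.
    static final class FailOnError extends DefaultHandler {
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
    }

    static Schema compile(String xsd) throws SAXException {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
    }

    /** Parses xml with a SAX parser that validates against schema as it goes. */
    static boolean parseAndValidate(Schema schema, String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // schema validation relies on namespaces
        factory.setSchema(schema);       // no setValidating(true) needed
        SAXParser parser = factory.newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), new FailOnError());
            return true;
        } catch (SAXException e) {
            return false; // validation or well-formedness error
        }
    }
}
```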
Important warnings
This all looks good, but I don't think it's quite good enough yet. JAXP's new validation API has some serious problems. First, even in the final releases of Java 5.0 and JAXP 1.3, I found many bugs and some strange behavior. Parser support for the new API is still catching up, which means some (rarely used) features are only partially implemented, and sometimes not implemented at all. Many times I found that documents that validated with a standalone validator such as xmllint (see References) failed JAXP validation.
Using the Validator class and its validate() method directly seems to be more reliable than assigning the Schema to a SAXParserFactory or DocumentBuilderFactory, so I recommend taking that safer route. I'm not telling you to avoid this API; rather, I suggest testing against as many sample documents as you can, double-checking your validation results, and handling errors carefully.
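One way to follow that advice, sketched here under stated assumptions, is a small harness that runs each sample document through both paths (Validator.validate() and a DocumentBuilder whose factory has the Schema assigned) and reports any sample where the two disagree. ValidationCrossCheck and its methods are hypothetical, and the samples are assumed to be in-memory strings keyed by a name of your choosing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class ValidationCrossCheck {

    // Makes validation errors fail the parser-based path.
    private static final ErrorHandler STRICT = new ErrorHandler() {
        @Override public void warning(SAXParseException e) { }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    // Path 1: a standalone Validator (default behavior throws on errors).
    static boolean viaValidator(Schema schema, String xml) throws IOException {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Path 2: a DocumentBuilder from a factory with the Schema assigned.
    static boolean viaParser(Schema schema, String xml) throws IOException, ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(STRICT);
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    /** Returns the samples (by name) on which the two paths disagree. */
    static Map<String, String> disagreements(Schema schema, Map<String, String> samples)
            throws IOException, ParserConfigurationException {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> sample : samples.entrySet()) {
            boolean byValidator = viaValidator(schema, sample.getValue());
            boolean byParser = viaParser(schema, sample.getValue());
            if (byValidator != byParser) {
                result.put(sample.getKey(), "Validator=" + byValidator + ", parser=" + byParser);
            }
        }
        return result;
    }
}
```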
Conclusion
Frankly, the JAXP validation API doesn't let you do anything you couldn't do before. You can keep using SAX or DOM to parse and validate XML, together with an ErrorHandler, and with some clever programming you can even handle validation errors as they happen. But that takes a thorough understanding of SAX, plus plenty of time spent testing, debugging, and carefully managing memory (if you end up building a DOM Document object). This is exactly where the JAXP validation API shines. It provides a carefully tested, ready-to-use solution, and much more than a switch that turns on schema validation. It integrates easily with existing JAXP code, making it very easy to add schema validation. I'm confident that Java developers who have been working with XML for a long time will find real advantages in JAXP validation.
References
Learn
- Read the original article on the developerWorks worldwide site.
- "All about JAXP, Part 1" and "All about JAXP, Part 2" (developerWorks): Brett McLaughlin's two articles on JAXP, covering how to use the API for parsing and validation and its support for XSLT transformations.
- "What's new in JAXP 1.3, Part 1" and "What's new in JAXP 1.3, Part 2" (developerWorks, November and December 2004): an in-depth look at the new features of JAXP 1.3.
- "Tip: Validation and the SAX ErrorHandler interface" (developerWorks, June 2001): learn more about validation features and the ErrorHandler interface.
- "Install and configure the Xerces2 Java Parser" (developerWorks, July 2002): this tutorial by Nicholas Chase describes how to use Xerces-J for schema validation.
- Sun's Java technology and XML headquarters: a good starting point for JAXP.
- Java 2 Platform, Standard Edition 5.0 API Specification: the JAXP Javadoc is now integrated with the core Java 5.0 API documentation.
- Simple API for XML (SAX): learn more about one of the APIs behind JAXP, starting with SAX2 for Java.
- W3C Document Object Model (DOM): take a look at DOM, the other view of XML supported by JAXP.
- Apache Xerces2 Java Parser: Sun uses the Xerces parser in its JDK 5.0 implementation.
- Getting started with XML on developerWorks: if you need a more basic introduction to XML, you'll find many useful resources here, including Doug Tidwell's tutorial "Introduction to XML" (developerWorks, August 2002).
- IBM XML certification: find out how to become an IBM-certified developer in XML and related technologies.
Get products and technologies
- Java 2 Platform, Standard Edition 5.0: if you're new to Java programming, download JAXP along with the complete JDK.
- libxml2: libxml2 is the XML C parser and toolkit developed for the GNOME project. It includes the xmllint validation program.
Discuss
- Participate in the discussion forum.
About the author
Brett McLaughlin has worked with computers since the Logo days. (Remember the little triangle?) In recent years, he has become one of the best-known authors and programmers in the Java and XML communities. He has implemented complex enterprise systems at Nextel Communications and written application servers at Lutris Technologies, and he now writes and edits books in this area at O'Reilly Media, Inc. His latest book, Java 5.0 Tiger: A Developer's Notebook, is the first book on the newest version of Java technology, and his classic Java and XML remains one of the definitive works on using XML technologies in Java.