New Features of JAXP 1.3, part 1

Technical overview, analysis of API changes and new validation APIs

Level: Intermediate

Neil Graham, XML Parser Development Manager, IBM

Elena Litani (elitani@ca.ibm.com), software developer, IBM

November 2004

XML is a mature technology, yet the XML space remains very active. Java API for XML Processing (JAXP) 1.3 was finalized recently, and it is the channel through which many of the latest XML-related open standards enter the J2SE platform. In this first article of a two-part series, Neil Graham and Elena Litani give a brief overview of the JAXP specification, describe the changes to the javax.xml.parsers package in detail, and discuss the powerful schema caching and validation framework.

JAXP was originally named the Java API for XML Parsing. JAXP 1.0 provided only a vendor-neutral way for applications to create a DOM Level 1 or SAX 1.0 parser. With the release of JAXP 1.1 in 2001, the "P" came to stand for Processing instead of Parsing, and the API was extended to give applications a standard way to interact with XSLT processors. JAXP 1.1 is part of Java 2 Standard Edition (J2SE) 1.4 and Java 2 Enterprise Edition (J2EE) 1.3. JAXP 1.2, a minor revision of the specification released in 2002, added a standard way to invoke W3C XML Schema validation in JAXP-compliant parsers.

JAXP 1.3 will be part of J2SE 5 and J2EE 1.5. It is the most significant revision of the JAXP API in three years. In this two-part series, we explore the new features added in this version of JAXP.

JAXP 1.3 Overview

The JAXP specification supports and builds on the following specifications (see References):

  • XML 1.0 (Third Edition) and XML 1.1, W3C Recommendations.
  • Namespaces in XML 1.0 (including errata) and Namespaces in XML 1.1, W3C Recommendations.
  • XML Schema (including errata), W3C Recommendation.
  • XSL Transformations (XSLT) Version 1.0, W3C Recommendation.
  • XML Path Language (XPath) Version 1.0 (including errata), W3C Recommendation.
  • XML Inclusions (XInclude) Version 1.0, a W3C Proposed Recommendation at the time of writing.
  • Simple API for XML (SAX) 2.0.2 (sax2r3) and SAX Extensions 1.1.

All implementations compatible with JAXP 1.3 must support the above specifications.

The JAXP APIs consist of several Java packages, each providing part of JAXP's functionality:

  • javax.xml: The root package. It contains a single class, XMLConstants, which defines some useful constants.
  • javax.xml.parsers: This package has existed since JAXP 1.0. It defines a vendor-neutral API for parsing and validating XML documents with SAX or DOM.
  • javax.xml.transform: Introduced in JAXP 1.1, this package defines the XSL transformation API.
  • javax.xml.namespace: New in JAXP 1.3, this package defines the QName class and the NamespaceContext interface. These were originally defined in the Java API for XML-based RPC (JAX-RPC) specification (see References).
  • javax.xml.datatype: New in JAXP 1.3, this package defines new Java types and completes the mapping between W3C XML Schema data types and Java types.
  • javax.xml.validation: Also new in JAXP 1.3, this package defines APIs that let applications cache schemas (such as W3C XML Schema) and validate XML documents against them.
  • javax.xml.xpath: New in JAXP 1.3, this package defines a data-model-independent and implementation-independent API for evaluating XPath expressions on documents.

JAXP also includes the org.xml.sax packages (containing the SAX API) and the org.w3c.dom packages (containing the DOM Level 3 API; see References).
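To make the two smallest packages concrete, here is a minimal sketch that touches javax.xml.XMLConstants and javax.xml.namespace.QName. The namespace URI http://www.example.com/PO1 and the class name JaxpPackagesDemo are illustrative assumptions, not part of JAXP. It shows that a QName identifies a name by namespace URI and local part, while the prefix is carried along but ignored by equals().

```java
import javax.xml.XMLConstants;
import javax.xml.namespace.QName;

class JaxpPackagesDemo {
    // Illustrative namespace URI for a purchase order vocabulary (an assumption, not defined by JAXP)
    static final String PO_NS = "http://www.example.com/PO1";

    static QName purchaseOrderName() {
        // QName = namespace URI + local part, with an optional prefix
        return new QName(PO_NS, "purchaseOrder", "po");
    }

    public static void main(String[] args) {
        // XMLConstants holds well-known URIs such as the W3C XML Schema namespace
        System.out.println(XMLConstants.W3C_XML_SCHEMA_NS_URI); // http://www.w3.org/2001/XMLSchema

        QName qn = purchaseOrderName();
        // toString() uses the "{namespaceURI}localPart" form; the prefix is omitted
        System.out.println(qn); // {http://www.example.com/PO1}purchaseOrder

        // equals() compares namespace URI and local part only, so the prefix does not matter
        System.out.println(qn.equals(new QName(PO_NS, "purchaseOrder"))); // true

        // valueOf() parses the "{namespaceURI}localPart" form back into a QName
        System.out.println(QName.valueOf(qn.toString()).getLocalPart()); // purchaseOrder
    }
}
```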

JAXP 1.3 and XML Parsing

To maximize the portability of applications that depend on a specific JAXP version, each JAXP version has from the very beginning been tied to specific DOM and SAX versions and closely linked to the underlying XML and Namespaces in XML specifications. None of these specifications has stood still in the three years since the previous major version of JAXP (JAXP 1.1) was released, so JAXP has been upgraded to version 1.3 to bring these specifications into J2SE and J2EE.

XML standard evolution

In early 2004, the W3C finalized XML 1.0 Third Edition, XML 1.1, and Namespaces in XML 1.1. JAXP 1.3 requires all compliant parsers to conform to all three. XML 1.0 Third Edition consists mostly of clarifications that only applications with the most detailed understanding of XML will notice. XML 1.1, however, will have a real impact on the XML world because it greatly extends the set of characters that can be used in XML names. It makes XML forward-compatible with the Unicode standard, aligns XML's definition of end-of-line characters with Unicode's, and permits character references to every ASCII character except NUL (#x0), including all control characters. Namespaces in XML 1.1 allows namespace prefixes to be undeclared within part of a document, and naturally it refers to XML 1.1. For more information about these specifications, see the developerWorks article "XML 1.1 and Namespaces 1.1".

Another W3C deliverable is XML Inclusions (XInclude) 1.0, currently a W3C Proposed Recommendation. XInclude lets XML documents include other XML documents and text resources, in whole or in part. Unlike XML entities, XInclude operates entirely outside Document Type Definitions (DTDs), which makes it friendly to XML Schema validation. Authors of XML resources that share content among multiple documents will find XInclude extremely valuable. JAXP 1.3 requires compliant implementations to track the specification as it develops until it becomes a W3C Recommendation.

As for the XML parsing APIs themselves, JAXP 1.3 supports SAX 2.0.2 and SAX Extensions 1.1, as well as DOM Level 3 Core and DOM Level 3 Load and Save. DOM Level 3 is an important body of new functionality, but it is not discussed in this article; developerWorks has some good articles on DOM Level 3 (see References) for interested readers.

As the small change in version number suggests, SAX 2.0.2 differs little from the SAX 2.0 supported by JAXP 1.1. SAX 2.0.1 introduced some signature changes that were not backward compatible (which is why JAXP 1.2 did not support it): it added default constructors to the exception classes defined by SAX and added IOException to the throws clause of the EntityResolver#resolveEntity callback. Otherwise it is essentially the same as SAX 2.0. Among its new features, SAX 2.0.2 defines the following:

  • A way for applications to query whether a SAX parser supports XML 1.1.
  • A way for the parser to indicate that it interns XML names and namespace URIs in the JVM, so that string equality can be tested with == instead of String.equals().
  • Support for XML 1.1 normalization checking. Note that JAXP 1.3 does not require compliant parsers to support this feature.
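Each of these capabilities is exposed as a SAX feature that an application can query on an XMLReader. The sketch below assumes the feature URIs documented in the org.xml.sax package (xml-1.1, string-interning, and unicode-normalization-checking); the class name Sax202Features is hypothetical. Because a parser may not recognize or support a given feature, the code reports that case instead of failing.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class Sax202Features {
    // Standard SAX feature URIs relevant to the SAX 2.0.2 changes
    static final String[] FEATURES = {
        "http://xml.org/sax/features/xml-1.1",
        "http://xml.org/sax/features/string-interning",
        "http://xml.org/sax/features/unicode-normalization-checking"
    };

    static XMLReader newReader() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        return spf.newSAXParser().getXMLReader();
    }

    /** Returns "true", "false", "unrecognized", or "unsupported" for one feature. */
    static String query(XMLReader reader, String feature) {
        try {
            return String.valueOf(reader.getFeature(feature));
        } catch (SAXNotRecognizedException e) {
            return "unrecognized";
        } catch (SAXNotSupportedException e) {
            return "unsupported";
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        for (String feature : FEATURES) {
            System.out.println(feature + " = " + query(reader, feature));
        }
        // Only if string-interning reports "true" may names be compared with ==
    }
}
```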

SAX Extensions 1.1 is a significant improvement over the original SAX extensions. It adds the following:

  • EntityResolver2 extends EntityResolver. It provides a callback for supplying an external subset for the DTD, and its resolveEntity method takes additional parameters: the entity name and the base URI.
  • Attributes2 extends Attributes and reports information such as whether an attribute was declared in the DTD and whether its value was defaulted from the DTD.
  • Locator2 extends Locator, adding getXMLVersion() and getEncoding(), which give full access to the pseudo-attributes in the XML declaration of the entity currently being processed.
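A handler can pick up these extensions by extending org.xml.sax.ext.DefaultHandler2, which implements EntityResolver2 (along with LexicalHandler and DeclHandler) with no-op defaults. The following is a minimal sketch, assuming a parser that passes Locator2 and Attributes2 objects (the JDK's built-in parser does); the sample document and the class name SaxExtensionsDemo are hypothetical. It reads the XML version and encoding from Locator2 and uses Attributes2 to tell that the lang attribute was defaulted from the internal DTD subset rather than written in the document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.ext.Attributes2;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.Locator2;

class SaxExtensionsDemo extends DefaultHandler2 {
    // XML 1.1 document whose internal DTD subset declares a defaulted "lang" attribute
    static final String SAMPLE =
        "<?xml version=\"1.1\" encoding=\"UTF-8\"?>"
      + "<!DOCTYPE root [<!ATTLIST root lang CDATA \"en\">]>"
      + "<root id=\"r1\"/>";

    private Locator locator;
    String xmlVersion;     // from Locator2.getXMLVersion()
    String encoding;       // from Locator2.getEncoding()
    Boolean langSpecified; // from Attributes2.isSpecified(); null if Attributes2 is unavailable

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The XML declaration has been read by now, so Locator2 can report its pseudo-attributes
        if (xmlVersion == null && locator instanceof Locator2) {
            Locator2 l2 = (Locator2) locator;
            xmlVersion = l2.getXMLVersion();
            encoding = l2.getEncoding();
        }
        // Attributes2 distinguishes attributes written in the document from DTD defaults
        if ("root".equals(qName) && atts instanceof Attributes2) {
            langSpecified = ((Attributes2) atts).isSpecified("lang");
        }
    }

    static SaxExtensionsDemo parse(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        SaxExtensionsDemo handler = new SaxExtensionsDemo();
        spf.newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        SaxExtensionsDemo h = parse(SAMPLE);
        System.out.println("version=" + h.xmlVersion + ", encoding=" + h.encoding
            + ", lang specified=" + h.langSpecified);
    }
}
```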

New features in javax.xml.parsers

JAXP 1.3 makes few changes to the parsing interfaces that JAXP itself defines. The most generally useful is probably the reset() method added to DocumentBuilder and SAXParser, which returns the object to its default state. Because the JAXP factory mechanism for creating parser objects is expensive, applications often pool SAXParser and DocumentBuilder instances so that they can be reused for new parsing tasks instead of being discarded once parsing is complete. Being able to reset these objects to a known state means the pool does not need to know how the requesting code used the object, and the requesting code does not need to know how the parser was used previously. This makes pools more efficient and easier to implement. For how to implement a parser pool, see "Improve performance in your XML applications, Part 2".
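A pool built around reset() can be very small. The sketch below is a hypothetical DocumentBuilderPool class (JAXP does not ship one); it assumes callers always hand builders back, which the parse() helper guarantees with try/finally. Synchronizing borrow() and release() keeps both the idle list and the factory, which is not guaranteed to be thread-safe, under one lock.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical minimal pool for illustration; JAXP itself does not provide one. */
class DocumentBuilderPool {
    private final DocumentBuilderFactory factory;
    private final Deque<DocumentBuilder> idle = new ArrayDeque<>();

    DocumentBuilderPool(DocumentBuilderFactory factory) {
        this.factory = factory;
    }

    /** Reuses an idle builder if there is one; otherwise pays the factory cost. */
    synchronized DocumentBuilder borrow() throws ParserConfigurationException {
        DocumentBuilder db = idle.poll();
        return (db != null) ? db : factory.newDocumentBuilder();
    }

    /** reset() (new in JAXP 1.3) restores the state the builder had when the factory created it. */
    synchronized void release(DocumentBuilder db) {
        db.reset();
        idle.push(db);
    }

    /** Borrows a builder, parses, and always returns the builder to the pool. */
    Document parse(String xml) throws Exception {
        DocumentBuilder db = borrow();
        try {
            return db.parse(new InputSource(new StringReader(xml)));
        } finally {
            release(db);
        }
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderPool pool = new DocumentBuilderPool(DocumentBuilderFactory.newInstance());
        Document a = pool.parse("<a/>");
        Document b = pool.parse("<b/>"); // reuses the builder released after the first parse
        System.out.println(a.getDocumentElement().getNodeName() + " "
            + b.getDocumentElement().getNodeName()); // a b
    }
}
```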

Through the new setSchema() method on SAXParserFactory and DocumentBuilderFactory, you can associate parsers with a schema (see the discussion of the javax.xml.validation package below). Parsers created this way can be tailored to that particular schema (a javax.xml.validation.Schema object), which can significantly improve performance compared with standard parser objects that have no built-in knowledge of the grammar they validate against. Applications can also use the factories' new setXIncludeAware() method so that the parsers they produce process XInclude. Both parsers and factories can be queried with isXIncludeAware() to determine whether XInclude processing is enabled, and with getSchema() to obtain the currently associated Schema, if any.
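Here is a minimal sketch of turning on XInclude processing through the factory. The file names order.xml and address.xml, their contents, and the class name XIncludeDemo are hypothetical; the files are written to a temporary directory so the example runs on its own. The factory is made namespace-aware because XInclude elements are recognized by their namespace, and parse(File) gives the parser a base URI against which the relative href is resolved.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class XIncludeDemo {
    static final String XI_NS = "http://www.w3.org/2001/XInclude";

    /** Writes a hypothetical order.xml that includes address.xml; returns order.xml's path. */
    static Path writeSampleFiles() throws Exception {
        Path dir = Files.createTempDirectory("xinclude-demo");
        Files.write(dir.resolve("address.xml"),
            "<address><city>Mill Valley</city></address>".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("order.xml"),
            ("<order xmlns:xi=\"" + XI_NS + "\">"
           + "<xi:include href=\"address.xml\"/></order>").getBytes(StandardCharsets.UTF_8));
        return dir.resolve("order.xml");
    }

    static Document parseWithXInclude(Path file) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // XInclude elements are identified by namespace
        dbf.setXIncludeAware(true);  // factory setting added in JAXP 1.3
        DocumentBuilder db = dbf.newDocumentBuilder();
        // parse(File) supplies a base URI, so the relative href can be resolved
        return db.parse(file.toFile());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("XInclude-aware by default: "
            + DocumentBuilderFactory.newInstance().isXIncludeAware());
        Document doc = parseWithXInclude(writeSampleFiles());
        // The xi:include element has been replaced by the included <address> element
        System.out.println(doc.getElementsByTagName("city").item(0).getTextContent());
    }
}
```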

Validation and schema caching in the JAXP API

Many applications need to validate XML documents against a schema, for example one defined according to the W3C XML Schema Recommendation. To validate a document, the parser must first parse the schema documents, then build an in-memory representation of the schema components, and finally validate the XML document against that in-memory schema. If an in-memory representation of the schema has to be built before each XML document is validated, validation can therefore be very expensive. Applications typically use only a limited number of schemas, so they want the processor to build the in-memory representation of a given schema once and then reuse it to validate documents.

Until now, implementations have had to provide their own schema caching mechanisms; for example, the Apache Xerces-J parser defines its own grammar caching API (see References). JAXP 1.3 now defines a standard API (the javax.xml.validation package) that lets applications reuse schemas and thus improve overall performance.

Now let's examine the validation API more closely. To obtain the in-memory representation of a schema, you first get an instance of a schema factory (javax.xml.validation.SchemaFactory). Compliant JAXP implementations must support W3C XML Schema; other schema languages (such as RELAX NG) are optional. You can configure the factory with features and properties much as you configure an XML parser, and then ask the factory to build an in-memory representation of a given schema. The in-memory schema is represented by the Schema class, which is immutable and therefore thread-safe. The API provides no methods for querying the structure or properties of the schema.
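The sketch below illustrates both points: checking which schema languages a factory supports, and building a Schema once and sharing it across threads. The one-element schema and the class name SchemaCacheDemo are hypothetical. Because validators are not thread-safe, each validation task creates its own Validator from the shared Schema; with no ErrorHandler set, the first validation error is thrown as a SAXException.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaCacheDemo {
    // Hypothetical one-element schema used only for illustration
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"quantity\" type=\"xs:positiveInteger\"/>"
      + "</xs:schema>";

    /** Parses the schema once; the resulting Schema is immutable and can be shared. */
    static Schema compile() throws SAXException {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return sf.newSchema(new StreamSource(new StringReader(XSD)));
    }

    /** Validators are not thread-safe, so each call creates its own from the shared Schema. */
    static boolean isValid(Schema schema, String xml) {
        Validator v = schema.newValidator();
        try {
            v.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) { // no ErrorHandler set: the first error is thrown
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // W3C XML Schema support is mandatory; other languages such as RELAX NG are optional
        System.out.println("RELAX NG supported: "
            + sf.isSchemaLanguageSupported(XMLConstants.RELAXNG_NS_URI));

        Schema schema = compile(); // built once, then shared by both tasks
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Boolean> ok = pool.submit(() -> isValid(schema, "<quantity>5</quantity>"));
        Future<Boolean> bad = pool.submit(() -> isValid(schema, "<quantity>-1</quantity>"));
        System.out.println(ok.get() + " " + bad.get()); // true false
        pool.shutdown();
    }
}
```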

A Schema can be used in two ways:

  • To construct parsers whose validation is optimized for the given in-memory schema (as described above).
  • To create validators, through the Schema object, that validate various kinds of XML input (such as DOM or SAX) against the schema.

First, we illustrate how to reuse the in-memory representation of a schema to improve parsing performance. For simplicity, the sample code in Listing 1 uses an XML document describing a purchase order (po.xml) and a purchase order schema (po.xsd). Both the document and the schema are defined in the W3C XML Schema Primer (see References).

First, create a schema factory and use it to build an in-memory representation of the purchase order schema. Then obtain a DOM factory instance, set the purchase order schema on the factory, and use the factory to create a DOM parser. The new parser validates XML documents against the purchase order schema.

Listing 1. Parsing and validating XML documents with a reusable schema


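import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;

// create a SchemaFactory that conforms to W3C XML Schema
SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

// set your error handler (an org.xml.sax.ErrorHandler supplied by the
// application) to catch errors during schema construction
sf.setErrorHandler(myErrorHandler);

// parse the purchase order schema (newSchema accepts a File, URL, or Source)
Schema schema = sf.newSchema(new File("po.xsd"));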

// get a DOM factory
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

// configure the factory
dbf.setNamespaceAware(true);

// set schema on the factory
dbf.setSchema(schema);

// create a new parser that validates documents against
// the schema specified (po.xsd)
DocumentBuilder db = dbf.newDocumentBuilder();

// attach an error handler to detect document validation errors
db.setErrorHandler(myErrorHandler);

// parse and validate against po.xsd an XML document
Document purchaseOrderDoc = db.parse("po.xml");
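For readers who want to run the same flow without the Primer files, here is a self-contained sketch. The cut-down schema (an order containing item elements), the sample documents, and the class name SchemaAwareParsing are hypothetical stand-ins for po.xsd and po.xml, and CollectingErrorHandler plays the role of myErrorHandler: it records validation errors instead of stopping at the first one. Note that setValidating(true) is not needed here; that flag controls DTD validation, while setSchema() alone turns on schema validation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class SchemaAwareParsing {
    // Hypothetical cut-down stand-in for po.xsd
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"order\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"item\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Plays the role of myErrorHandler: records validation errors instead of stopping. */
    static final class CollectingErrorHandler implements ErrorHandler {
        final List<String> errors = new ArrayList<>();
        @Override public void warning(SAXParseException e) { }
        @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
        @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    }

    /** Builds the factory once; every parser it creates validates against the schema. */
    static DocumentBuilderFactory newFactory() throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        dbf.setSchema(schema); // no setValidating(true): that flag controls DTD validation
        return dbf;
    }

    static List<String> validationErrors(DocumentBuilderFactory dbf, String xml) throws Exception {
        DocumentBuilder db = dbf.newDocumentBuilder();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        db.setErrorHandler(handler);
        db.parse(new InputSource(new StringReader(xml)));
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = newFactory();
        System.out.println(validationErrors(dbf, "<order><item>pen</item></order>")); // []
        System.out.println(validationErrors(dbf, "<order><price>3</price></order>"));
    }
}
```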

Now let's look at how to use validators. You can create two kinds of validators for a given schema:

  • A Validator, which validates DOM or SAX sources and produces DOM or SAX output, respectively.
  • A ValidatorHandler, which behaves like a SAX ContentHandler. If you set your own org.xml.sax.ContentHandler on it, the ValidatorHandler acts as a filter: it validates the SAX events it receives and forwards them to your ContentHandler. This validator also lets you obtain type information through the TypeInfoProvider interface (see the ValidatorHandler.getTypeInfoProvider() method).
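The filter arrangement can be sketched as a chain: XMLReader, then ValidatorHandler, then your ContentHandler. In this sketch the schema, the sample document, and the class name ValidatorHandlerDemo are hypothetical. The downstream handler asks the TypeInfoProvider for the type of each element as it arrives; the complex type is given a name (OrderType) so that it has a type name to report.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class ValidatorHandlerDemo {
    // Hypothetical schema with a named complex type and a built-in simple type
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"order\" type=\"OrderType\"/>"
      + "<xs:complexType name=\"OrderType\"><xs:sequence>"
      + "<xs:element name=\"quantity\" type=\"xs:positiveInteger\"/>"
      + "</xs:sequence></xs:complexType>"
      + "</xs:schema>";

    /** Returns "localName:typeName" for each element, as seen after validation. */
    static List<String> elementTypes(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
        ValidatorHandler vh = schema.newValidatorHandler();
        List<String> seen = new ArrayList<>();

        // Downstream handler: receives events only after the ValidatorHandler has checked them
        vh.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                TypeInfo type = vh.getTypeInfoProvider().getElementTypeInfo();
                seen.add(localName + ":" + (type == null ? "?" : type.getTypeName()));
            }
        });

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true); // the validator expects namespace-aware SAX events
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(vh); // parser -> validator -> our handler
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementTypes("<order><quantity>5</quantity></order>"));
    }
}
```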

Neither kind of validator is thread-safe. A validator may change the output data and add information to the original data; for example, default attributes may appear in the DOM tree, or new SAX events may be produced as a result of validation. You can configure validators by setting features and properties, register a resource resolver (org.w3c.dom.ls.LSResourceResolver) to help validators resolve external resources, or attach an error handler (org.xml.sax.ErrorHandler). Note: if no error handler is specified, the default implementation throws a SAXParseException for any validation error.
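The difference between the default behavior and a registered error handler is easy to see side by side. In this sketch the schema, the sample documents, and the class name ValidatorErrorDemo are hypothetical. Without a handler, validate() stops at the first error with a SAXParseException; with a handler that only counts calls to error(), validation continues and every recoverable error is reported. org.xml.sax.helpers.DefaultHandler is used as a convenient ErrorHandler base class whose fatalError() still throws.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class ValidatorErrorDemo {
    // Hypothetical schema: an order holds any number of positive quantities
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"order\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"quantity\" type=\"xs:positiveInteger\" maxOccurs=\"unbounded\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";
    static final String GOOD = "<order><quantity>2</quantity></order>";
    static final String BAD = "<order><quantity>-1</quantity><quantity>0</quantity></order>";

    static Schema compile() throws SAXException {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)));
    }

    /** No ErrorHandler: the default behavior throws SAXParseException on the first error. */
    static String firstError(Schema schema, String xml) throws IOException {
        Validator v = schema.newValidator();
        try {
            v.validate(new StreamSource(new StringReader(xml)));
            return null; // the document is valid
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    /** A counting ErrorHandler lets validation continue past recoverable errors. */
    static int countErrors(Schema schema, String xml) throws SAXException, IOException {
        Validator v = schema.newValidator();
        int[] count = {0};
        v.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                count[0]++;
            }
        });
        v.validate(new StreamSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compile();
        System.out.println("first error: " + firstError(schema, BAD));
        System.out.println("errors reported with a handler: " + countErrors(schema, BAD));
    }
}
```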

Listing 2 shows how to use the Validator interface to validate a DOM document. Here we assume that the application wants to validate the DOM document against two schemas: po.xsd and ipo.xsd. The application may have received the DOM document from another application, or it may have modified the original DOM document, and it now wants to make sure that the document is still valid according to po.xsd or ipo.xsd.

Listing 2. Using the Validator interface to validate a DOM document


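import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Validator;

// sf and purchaseOrderDoc are the SchemaFactory and Document from Listing 1

// create JAXP transformation sources to specify
// schema sources you want to use
StreamSource po = new StreamSource("po.xsd");
StreamSource ipo = new StreamSource("ipo.xsd");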

// build in-memory representation for po.xsd and ipo.xsd
Schema schemas = sf.newSchema(new Source[]{po, ipo});

// create a validator that will be able to validate
// against po.xsd and ipo.xsd
Validator validator = schemas.newValidator();

// configure this validator
validator.setErrorHandler(myErrorHandler);

// specify a DOM tree that you want to validate
DOMSource docSource = new DOMSource(purchaseOrderDoc);

// validate the source
validator.validate(docSource, null);
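The scenario described for Listing 2, where another component edits a DOM tree and the application re-checks it, can be sketched end to end. The schema, the sample document, and the class name RevalidateDomDemo are hypothetical. The DOM is built namespace-aware so that its nodes carry local names, and one Validator is reused for both checks from a single thread, which is fine because validators may be reused sequentially but not shared between threads.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class RevalidateDomDemo {
    // Hypothetical schema: an order contains one or more item elements and nothing else
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"order\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"item\" type=\"xs:string\" maxOccurs=\"unbounded\"/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    static Validator newValidator() throws SAXException {
        return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
            .newSchema(new StreamSource(new StringReader(XSD)))
            .newValidator();
    }

    static Document load(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // gives the validator namespace-aware DOM nodes
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Validates the in-memory tree; with no ErrorHandler the first error is thrown. */
    static boolean isValid(Validator validator, Document doc) throws IOException {
        try {
            validator.validate(new DOMSource(doc));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Validator validator = newValidator();
        Document doc = load("<order><item>pen</item></order>");
        System.out.println(isValid(validator, doc)); // true

        // Simulate another component editing the tree: add an element the schema does not allow
        doc.getDocumentElement().appendChild(doc.createElementNS(null, "price"));
        System.out.println(isValid(validator, doc)); // false
    }
}
```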

Conclusion

This article provided an overview of the JAXP APIs, including the revisions to the underlying XML standards and some changes to the parsing APIs. We also discussed the new javax.xml.validation package in detail and described how applications can use it to improve performance. Part 2 describes the utilities for the new data types and namespace support in JAXP 1.3, the changes to the javax.xml.transform package, and the new javax.xml.xpath package with its data model and vendor-neutral XPath 1.0 API.

References

  • For more information, see the original article on the developerWorks worldwide site.

  • Learn more about the Java API for XML Processing (JAXP).

  • All W3C specifications can be found in the W3C Technical Reports.

  • The XML document po.xml and the purchase order schema po.xsd are both defined in the W3C XML Schema Primer.

  • Learn about the Java API for XML-based RPC (JAX-RPC).

  • Understand the Simple API for XML (SAX) and Document Object Model (DOM) specifications.

  • Read the two-part series on DOM Level 3 by Elena Litani and Arnaud Le Hors:
    • Part 1 describes how to manipulate and compare nodes and how to handle text and user data (developerWorks).
    • Part 2 discusses bootstrapping, mapping to the XML Infoset, accessing type information, and using Xerces (developerWorks).

  • Learn what XML 1.1 and Namespaces 1.1 are, what changes they bring, and how they affect other specifications and users in "XML 1.1 and Namespaces 1.1" by Arnaud Le Hors (developerWorks, May 2004).

  • Read the "Improve performance in your XML applications" series on developerWorks, Part 1 and Part 2 (2004), to learn how to get the best performance out of your XML applications.

  • Learn about the Xerces2 Java Parser and its grammar caching API.

  • Study RELAX NG, which is maintained by the Organization for the Advancement of Structured Information Standards (OASIS) and is both an OASIS and an International Organization for Standardization (ISO) standard.

  • If you are confused by the many XML standards, Uche Ogbuji's developerWorks series on XML standards can help you sort them out:
    • Part 1 - Core standards
    • Part 2 - XML processing standards
    • Part 3 - The most important vocabularies
    • Part 4 - Detailed cross-reference of major XML standards

  • Find more related resources in the developerWorks XML and Java technology zones.

  • Learn how to become an IBM Certified Developer in XML and related technologies.

Author Profile

Neil Graham is the manager of IBM XML parser development. He is a committer on the Apache Xerces-Java and Xerces-C++ XML parsers and has worked mainly on their implementations of XML Schema, XML 1.1, and grammar caching. He is also IBM's representative on the JAXP 1.3 expert group.



Elena Litani is a software developer at IBM. She is one of the main developers of the Eclipse Modeling Framework (EMF) project at eclipse.org, which provides a reference implementation of Service Data Objects (SDO). Previously, Elena was one of the main developers of the Apache Xerces2 project, where she worked on the Xerces2 implementations of XML Schema and DOM Level 3 and on analyzing and improving parser performance. She also represented IBM in the W3C DOM Working Group during the development of the DOM Level 3 specifications.
