[Repost] Tip: Use the SAX InputSource effectively

Source: Internet
Author: User

Reposted from: http://www.ibm.com/developerworks/cn/xml/tips/x-tipsaxis/index.html

When you use the SAX API, all input starts with the org.xml.sax.InputSource class. This class is part of the SAX API; it wraps the input itself (through standard Java constructs such as a file or an I/O stream) together with a public ID and a system ID. During parsing, SAX reads this information from the InputSource, which lets it resolve external entities and other source-specific resources.

Likewise, when you use a wrapper around SAX (such as the JAXP API), you can call several different parse methods, but in the end every one of them parses through a SAX InputSource. For example, consider the code fragment in Listing 1, which uses JAXP to start a SAX parse.


Listing 1. Using JAXP for SAX parsing

SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
parser.parse(myFile, myHandler);

Even though a java.io.File is passed in, it is converted to a SAX InputSource before being handed to the underlying SAX implementation. The conversion is needed because the JAXP code ultimately calls the org.xml.sax.XMLReader class, which offers only the two signatures shown in Listing 2 for starting a parse.


Listing 2. Parse entry point for XMLReader

public void parse(InputSource inputSource);
public void parse(String systemId);

Beyond that, most SAX parser implementations (such as Apache Xerces) actually convert the String system ID into an InputSource and delegate to the version of parse() that takes an InputSource. So no matter how you code your application, SAX ends up with an InputSource to parse. However, not all of these approaches are handled equally well.

To avoid unpleasant surprises in your code, it is best to use the SAX InputSource class directly rather than letting JAXP or SAX do the work for you. Because an implementation has to cover every possible case, you will often see code that builds the InputSource instance in the way shown in Listing 3.
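The File-to-InputSource conversion described above can be observed from inside a handler. The following is a minimal sketch, not the article's own code: the class name FileSystemIdDemo and the helper systemIdFor are hypothetical, and the exact form of the reported URI may vary between parser implementations. It parses a temporary file through JAXP and reads back the system ID that the SAX Locator reports.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class FileSystemIdDemo {

    /** Writes the XML to a temp file, parses it via JAXP, and returns the system ID the parser reports. */
    static String systemIdFor(String xml) throws Exception {
        File file = File.createTempFile("sax-demo", ".xml");
        try {
            Files.write(file.toPath(), xml.getBytes(StandardCharsets.UTF_8));

            final String[] seen = {null};
            DefaultHandler handler = new DefaultHandler() {
                private Locator locator;

                @Override
                public void setDocumentLocator(Locator locator) {
                    // The parser hands over a Locator that describes the current input.
                    this.locator = locator;
                }

                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (locator != null && seen[0] == null) {
                        seen[0] = locator.getSystemId();
                    }
                }
            };

            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(file, handler); // JAXP turns the File into an InputSource internally
            return seen[0];
        } finally {
            file.delete();
        }
    }

    public static void main(String[] args) throws Exception {
        // Prints a file: URI that points at the temporary file.
        System.out.println(systemIdFor("<root><child/></root>"));
    }
}
```

Although the application only passed a File, the handler sees a system ID in URI form, which is the information an InputSource carries.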


Listing 3. General method of InputSource construction

InputSource inputSource = new InputSource();
// May be a null parameter
inputSource.setByteStream(inputStream);
// May be a null parameter
inputSource.setCharacterStream(reader);
// May be a null parameter
inputSource.setSystemId(systemId);
// May be a null parameter
inputSource.setPublicId(publicId);
// Derived parameter
inputSource.setEncoding(encoding);

As the comments show, many of these methods are handed null arguments. Each call takes little time, but every second counts in an XML parsing application, and these do-nothing calls waste time. By constructing the InputSource instance yourself, you can cut the process down to one or two method calls, as shown in Listing 4.


Listing 4. Improved InputSource construction

InputSource inputSource = new InputSource(myInputStream);
inputSource.setSystemId("http://www.oreilly.com");
inputSource.setEncoding("UTF-8");

I also used the setEncoding() method to tell the SAX parser which encoding to use, which matters in XML applications that deal with internationalization or multibyte character sets.

However, there is another problem: the encoding you set manually often differs from the one actually supplied by the input (through a java.io.InputStream or java.io.Reader). If the encodings do not match, all sorts of parsing problems can occur. To avoid this, always create your InputSource from a Java InputStream rather than from a Reader or a String system ID (all of which the JAXP API accepts). When you supply an InputStream, the SAX implementation wraps the stream in an InputStreamReader, and SAX automatically detects the correct character encoding from the stream. You can then omit the setEncoding() step and save one more method call. The result is an application that runs faster and a character encoding that is always correct.
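For a runnable version of the direct construction in Listing 4, the sketch below builds the InputSource by hand and passes it to the XMLReader. The class name InputSourceDemo, the helper countElements, and the example.com system ID are placeholders chosen for illustration; they do not come from the original article.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class InputSourceDemo {

    /** Counts the elements in the XML bytes, parsing through an explicitly built InputSource. */
    static int countElements(byte[] xml, String systemId) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);

        // Build the InputSource directly: one constructor call plus two setters.
        InputSource source = new InputSource(new ByteArrayInputStream(xml));
        source.setSystemId(systemId);   // placeholder ID, used only to resolve relative references
        source.setEncoding("UTF-8");    // matches the bytes produced in main()

        reader.parse(source);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<root><a/><b/></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(countElements(xml, "http://example.com/doc.xml")); // prints 3
    }
}
```

Because a byte stream is supplied, the parser reads from that stream and does not fetch anything from the system ID.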
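The encoding mismatch can be illustrated with a small experiment. This is a sketch under stated assumptions: the class EncodingDemo and its helper methods are hypothetical, and the sample document is an ISO-8859-1 file invented for the demonstration. The same bytes are parsed twice, once as a raw InputStream and once through a Reader that was deliberately created with the wrong charset.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDemo {

    /** Parses the source and returns all character data the handler receives. */
    static String textOf(InputSource source) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return text.toString();
    }

    /** A sample document stored as ISO-8859-1 bytes; \u00E9 is the letter e with an acute accent. */
    static byte[] sampleLatin1() {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><word>caf\u00E9</word>";
        return doc.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Byte stream: the parser picks the encoding itself from the XML declaration. */
    static String viaStream(byte[] xml) throws Exception {
        return textOf(new InputSource(new ByteArrayInputStream(xml)));
    }

    /** Reader built with a mismatched charset: the bytes are already decoded before parsing. */
    static String viaMismatchedReader(byte[] xml) throws Exception {
        InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(xml), StandardCharsets.UTF_8);
        return textOf(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = sampleLatin1();
        System.out.println("stream matches: " + "caf\u00E9".equals(viaStream(xml)));
        System.out.println("reader matches: " + "caf\u00E9".equals(viaMismatchedReader(xml)));
    }
}
```

With the byte stream, the parser decodes the accented character correctly without any setEncoding() call. With the Reader, the wrong charset has already been applied before the parser sees the text, so the accented character is lost.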
