[Repost] Tip: Use SAX InputSource effectively

Last Update:2018-07-25 Source: Internet

Author: User

Reposted from: http://www.ibm.com/developerworks/cn/xml/tips/x-tipsaxis/index.html

When you use the SAX API, all input starts with the org.xml.sax.InputSource class. This class is part of the SAX API. It describes the input (through standard Java constructs such as files or I/O streams) and also carries the public and system identifiers. SAX extracts this information from the InputSource during parsing, which allows it to resolve external entities and other source-specific resources.

Similarly, when you use a wrapper for SAX (such as the JAXP API), you may call different methods, but in the end the parse still uses a SAX InputSource. For example, consider the code snippet in Listing 1, which uses JAXP to start a SAX parse.

Listing 1. Using JAXP for SAX parsing

    SAXParserFactory spf = SAXParserFactory.newInstance();
    SAXParser parser = spf.newSAXParser();
    parser.parse(myFile, myHandler);

Even though a java.io.File is passed in, it is converted to a SAX InputSource before being handed to the underlying SAX implementation. This conversion happens because the JAXP code eventually calls the org.xml.sax.XMLReader class, which provides only the two signatures shown in Listing 2 for starting a parse.

Listing 2. Parse entry point for XMLReader

    public void parse(InputSource inputSource);
    public void parse(String systemId);

Beyond this, most SAX parser implementations (such as Apache Xerces) actually convert the String system identifier to an InputSource and delegate to the version of parse() that takes an InputSource. No matter how you code your own application, SAX eventually receives an InputSource to parse. However, not all of these conversions are handled equally well.
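The equivalence of these entry points can be checked with a small experiment. The sketch below is illustrative only: the class name EntryPointsDemo, the ElementCounter handler, and the temporary file are assumptions made for the example and are not part of the original tip. It parses the same document through the JAXP File overload, through XMLReader.parse(String), and through XMLReader.parse(InputSource), and compares how many elements each parse reports.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EntryPointsDemo {

    /** Hypothetical handler: counts start tags so the parses can be compared. */
    static class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    static SAXParser newParser() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser();
    }

    /** Parses the same document through three entry points and returns the element counts. */
    static int[] countThreeWays(String xml) throws Exception {
        // Write the document to a temporary file so it has a real system identifier.
        File file = File.createTempFile("sax-tip", ".xml");
        file.deleteOnExit();
        Files.write(file.toPath(), xml.getBytes(StandardCharsets.UTF_8));
        String systemId = file.toURI().toASCIIString();

        // 1. JAXP convenience overload that accepts a java.io.File.
        ElementCounter viaFile = new ElementCounter();
        newParser().parse(file, viaFile);

        // 2. XMLReader.parse(String systemId).
        ElementCounter viaSystemId = new ElementCounter();
        XMLReader readerA = newParser().getXMLReader();
        readerA.setContentHandler(viaSystemId);
        readerA.parse(systemId);

        // 3. XMLReader.parse(InputSource), with the InputSource built by hand.
        ElementCounter viaSource = new ElementCounter();
        XMLReader readerB = newParser().getXMLReader();
        readerB.setContentHandler(viaSource);
        readerB.parse(new InputSource(systemId));

        return new int[] {viaFile.count, viaSystemId.count, viaSource.count};
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countThreeWays("<a><b/><c/></a>");
        System.out.println(counts[0] + " " + counts[1] + " " + counts[2]);
    }
}
```

All three paths report the same number of elements, which is consistent with each of them ending up at an InputSource inside the parser.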

To avoid unpleasant surprises in your code, it is best to use the SAX InputSource class directly rather than letting JAXP or SAX handle the task for you. Because an implementation must handle every possible situation, you will often see code that constructs the InputSource instance like that shown in Listing 3.

Listing 3. General method of InputSource construction

    InputSource inputSource = new InputSource();

    // May be a null parameter
    inputSource.setByteStream(inputStream);

    // May be a null parameter
    inputSource.setCharacterStream(reader);

    // May be a null parameter
    inputSource.setSystemId(systemId);

    // May be a null parameter
    inputSource.setPublicId(publicId);

    // Derived parameter
    inputSource.setEncoding(encoding);

As the comments show, many of these methods are passed null arguments. Although these calls take little time to execute, every bit of time matters in an XML parsing application, and calls that do nothing waste it. By constructing the InputSource instance yourself, you can reduce the process to one or two method calls, as shown in Listing 4.

Listing 4. Improved InputSource construction

    InputSource inputSource = new InputSource(myInputStream);
    inputSource.setSystemId("http://www.oreilly.com");
    inputSource.setEncoding("UTF-8");

I also used the setEncoding() method to tell the SAX parser which encoding to use, which matters in XML applications that involve internationalization or multibyte character sets.
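For readers who want to run the approach of Listing 4 end to end, here is a minimal sketch. The class name ExplicitInputSourceDemo, the TextCollector handler, the sample document, and the example.com system identifier are all assumptions made for illustration. The document is supplied as an in-memory byte stream, so the system identifier is only a label here and nothing is fetched from it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ExplicitInputSourceDemo {

    /** Hypothetical handler: collects all character data in the document. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Builds the InputSource by hand, as in Listing 4, and returns the parsed text. */
    static String parseText(byte[] utf8Xml, String systemId) throws Exception {
        InputStream in = new ByteArrayInputStream(utf8Xml);

        // Only the setters that carry real values are called.
        InputSource inputSource = new InputSource(in);
        inputSource.setSystemId(systemId);
        inputSource.setEncoding("UTF-8");

        TextCollector handler = new TextCollector();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(inputSource, handler);
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // \u00e9 is a non-ASCII character, so the encoding actually matters here.
        byte[] xml = "<greeting>h\u00e9llo</greeting>".getBytes(StandardCharsets.UTF_8);
        System.out.println(parseText(xml, "http://example.com/greeting.xml"));
    }
}
```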

However, there is another problem: the encoding you specify by hand is often different from the one actually used by the input (supplied through a java.io.InputStream or java.io.Reader). If these encodings do not match, various parsing problems may occur. To avoid this, always create your InputSource from a Java InputStream rather than from a Reader or a String system identifier (all of which the JAXP API accepts). When you provide an InputStream, the SAX implementation wraps the stream in an InputStreamReader and detects the correct character encoding from the stream itself, for example from a byte order mark or the XML declaration. You can then omit the setEncoding() step and save another method call. The result is an application that runs faster and always uses the correct character encoding.
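The difference between a byte stream and a mismatched Reader can be demonstrated with a short experiment. This is a sketch under stated assumptions: the class name EncodingDetectionDemo, the TextCollector handler, and the sample document are invented for illustration. The document is encoded as ISO-8859-1 and says so in its XML declaration. Parsed from an InputStream, the parser reads the declaration and decodes the bytes correctly. Parsed from a Reader that was opened with the wrong charset (UTF-8), the bytes are already decoded incorrectly before the parser sees them.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDetectionDemo {

    /** Hypothetical handler: collects all character data in the document. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    static String parse(InputSource source) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return handler.text.toString();
    }

    /** A document stored as ISO-8859-1 that declares its own encoding. */
    static byte[] latin1Document() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><word>caf\u00e9</word>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Byte stream, no setEncoding(): the parser works out the encoding itself. */
    static String parseFromStream(byte[] bytes) throws Exception {
        return parse(new InputSource(new ByteArrayInputStream(bytes)));
    }

    /** Reader opened with the wrong charset: the parser only sees already-decoded characters. */
    static String parseFromMismatchedReader(byte[] bytes) throws Exception {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        return parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = latin1Document();
        System.out.println("stream matches: " + "caf\u00e9".equals(parseFromStream(bytes)));
        System.out.println("reader matches: " + "caf\u00e9".equals(parseFromMismatchedReader(bytes)));
    }
}
```

With the byte stream the accented character survives intact. With the mismatched Reader it does not, because the decoding decision was made before the parser had a chance to look at the XML declaration.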

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
