Reposted from: http://www.ibm.com/developerworks/cn/xml/tips/x-tipsaxis/index.html
When you use the SAX API, all input starts with the org.xml.sax.InputSource class. This class is part of the SAX API and encapsulates the input specification (through standard Java constructs such as files or I/O streams) along with the public and system identifiers. SAX then extracts this information from the InputSource when parsing, which allows it to resolve external entities and other source-specific resources.
Similarly, when you use a wrapper for SAX (such as the JAXP API), you can call a variety of methods, but in the end the parse uses a SAX InputSource. For example, consider the code fragment in Listing 1, which uses JAXP to start a SAX parse.
Listing 1. Using JAXP for SAX parsing
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
parser.parse(myFile, myHandler);
Even if a java.io.File is passed in, it is converted to a SAX InputSource before being forwarded to the underlying SAX implementation. This conversion occurs because the JAXP code eventually calls the org.xml.sax.XMLReader interface, which provides only the two signatures shown in Listing 2 for starting a parse.
Listing 2. Parse entry point for XMLReader
public void parse(InputSource inputSource);
public void parse(String systemId);
On top of this, most SAX parser implementations (such as Apache Xerces) actually convert the String system identifier to an InputSource and delegate to the version of parse() that accepts an InputSource. No matter how you code your own application, SAX ultimately receives an InputSource to parse. However, not all of these approaches are handled equally well.
To avoid unpleasant surprises in your code, it is best to use the SAX InputSource class directly rather than letting JAXP or SAX handle the task for you. Because the implementation must handle every possible case, you will often see code that constructs the InputSource instance along the lines of Listing 3.
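The funnelling described above can be sketched roughly as follows. This is an illustration only, not the actual JAXP or Xerces source: the class InputSourceFunnel and its methods fromFile and fromSystemId are hypothetical names, and the use of a file: URI as the system identifier for a File is an assumption made for the example.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: one way a wrapper could reduce every kind of
// input to an InputSource. Not the actual JAXP or Xerces code.
final class InputSourceFunnel {

    // A File becomes an InputSource whose system identifier is a file: URI.
    static InputSource fromFile(File file) {
        return new InputSource(file.toURI().toASCIIString());
    }

    // A String system identifier is simply wrapped.
    static InputSource fromSystemId(String systemId) {
        return new InputSource(systemId);
    }

    public static void main(String[] args) throws Exception {
        // Write a small document to a temporary file so the example is self-contained.
        File xml = File.createTempFile("funnel", ".xml");
        xml.deleteOnExit();
        Files.write(xml.toPath(), "<doc><item/></doc>".getBytes(StandardCharsets.UTF_8));

        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler());

        // Whatever the caller started with, the reader is handed an InputSource.
        reader.parse(fromFile(xml));
        System.out.println("Parsed " + fromFile(xml).getSystemId());
    }
}
```

Note that an InputSource built this way carries only a system identifier; the parser still has to open the resource itself.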
Listing 3. General method of InputSource construction
InputSource inputSource = new InputSource();
// May be a null parameter
inputSource.setByteStream(inputStream);
// May be a null parameter
inputSource.setCharacterStream(reader);
// May be a null parameter
inputSource.setSystemId(systemId);
// May be a null parameter
inputSource.setPublicId(publicId);
// Derived parameter
inputSource.setEncoding(encoding);
As the comments show, many of these methods are passed null arguments. Although these calls take little time to execute, every bit of time counts in an XML parsing application, and calls that accomplish nothing waste it. By constructing the InputSource instance yourself, you can reduce the process to one or two method invocations, as shown in Listing 4.
Listing 4. Improved InputSource construction
InputSource inputSource = new InputSource(myInputStream);
inputSource.setSystemId("http://www.oreilly.com");
inputSource.setEncoding("UTF-8");
I also used the setEncoding() method to tell the SAX parser which encoding to use, which matters in XML applications that involve internationalization or multibyte character sets.
However, there is another problem: it is common for the encoding you set manually to differ from the encoding of the data actually supplied by the input (through a java.io.InputStream or java.io.Reader). If the two do not match, various parsing problems can occur. To avoid this, always create your InputSource from a Java InputStream rather than from a Reader or a String system identifier (all of which are possible options in the JAXP API). When you provide an InputStream, the SAX implementation wraps the stream in an InputStreamReader, and SAX automatically detects the correct character encoding from the stream. You can then omit the setEncoding() step and save another method call. The result is an application that runs faster and a character encoding that is always correct.
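To make the Listing 4 approach concrete, here is a minimal runnable sketch that builds the InputSource by hand and passes it to a JAXP SAXParser. The class DirectInputSourceDemo, the method countElements, the sample document, and the example.com system identifier are all assumptions introduced for illustration; the stream is an in-memory byte array so the example does not depend on any file or network resource.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class DirectInputSourceDemo {

    // Counts the elements in the XML document read from the given stream.
    static int countElements(InputStream in, String systemId, String encoding)
            throws IOException, SAXException, ParserConfigurationException {
        // Build the InputSource ourselves: one constructor call, two setters.
        InputSource inputSource = new InputSource(in);
        inputSource.setSystemId(systemId);
        inputSource.setEncoding(encoding);

        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++;
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Use the overload that takes an InputSource directly.
        parser.parse(inputSource, handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<books><book/><book/></books>".getBytes(StandardCharsets.UTF_8);
        int n = countElements(new ByteArrayInputStream(xml),
                "http://www.example.com/books.xml", "UTF-8");
        System.out.println(n); // prints 3
    }
}
```

Because a byte stream is supplied, the system identifier here serves only as a base for resolving relative references; the parser reads the document from the stream.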
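The difference between the two input styles can be illustrated with a small sketch. It assumes a sample document stored as ISO-8859-1 bytes with a matching XML declaration; the class EncodingDetectionDemo and its methods are hypothetical names, and the behavior shown is what one would expect from the parser bundled with the JDK rather than a guarantee for every SAX implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EncodingDetectionDemo {

    // Sample document stored as ISO-8859-1 bytes; the element text ends in e-acute.
    static final byte[] LATIN1_XML =
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><name>caf\u00e9</name>"
                    .getBytes(StandardCharsets.ISO_8859_1);

    // Parses the source and returns all character data it contains.
    static String textOf(InputSource source)
            throws IOException, SAXException, ParserConfigurationException {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return text.toString();
    }

    // Byte stream only, no setEncoding(): the parser works out the encoding
    // from the document itself.
    static String viaStream()
            throws IOException, SAXException, ParserConfigurationException {
        return textOf(new InputSource(new ByteArrayInputStream(LATIN1_XML)));
    }

    // Reader built with the wrong charset: the bytes are decoded before the
    // parser sees them, so the declared encoding cannot help.
    static String viaMismatchedReader()
            throws IOException, SAXException, ParserConfigurationException {
        return textOf(new InputSource(new InputStreamReader(
                new ByteArrayInputStream(LATIN1_XML), StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        System.out.println("caf\u00e9".equals(viaStream()));           // prints true
        System.out.println("caf\u00e9".equals(viaMismatchedReader())); // prints false
    }
}
```

In the Reader case the accented character is lost during decoding, before parsing begins, which is the kind of mismatch the paragraph above warns about.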