Java and XML Programming with SAX

Source: Internet
Author: User
SAX Concepts
SAX is short for Simple API for XML. It is not an official W3C standard; it is better described as a de facto, grassroots standard that grew out of community discussion. Even so, SAX is used with XML no less widely than DOM, and almost every XML parser supports it.

SAX is a lightweight approach compared with DOM. When we work with DOM, we have to read in the entire XML document, build the DOM tree in memory, and create every node object on that tree. This is not a problem when the document is small, but once the document is large, processing it with DOM becomes quite time-consuming and laborious. In particular, memory requirements increase many times over, so using DOM is not cost-effective in some applications (in an applet, for example). In such cases SAX is a better alternative.

SAX is conceptually completely different from DOM. First, unlike DOM, which is document-driven, SAX is event-driven: it does not need to read in the entire document, and reading the document is itself the SAX parsing process. Event-driven means a way of running a program that is based on a callback mechanism. (If you are familiar with the delegation event model introduced in newer versions of Java, this mechanism will be easy to understand.)


The XMLReader accepts an XML document and parses it while reading it in. In other words, reading the document and parsing it happen at the same time, which is very different from DOM. Before parsing begins, you register a ContentHandler with the XMLReader. The ContentHandler acts as an event listener and defines a number of methods. startDocument(), for example, is where you define what should be done when the start of the document is encountered during parsing. When the XMLReader reads the corresponding content, it fires the corresponding event and hands it to the ContentHandler by calling the matching method.

This may be hard to grasp in the abstract, but don't worry: the following example will make the SAX parsing process clear. Look at this simple XML file:

<POEM>
<author>ogden nash</author>
<TITLE>Fleas</TITLE>
<LINE>Adam</LINE>
</POEM>

When the XMLReader reads the <POEM> tag, it calls ContentHandler.startElement() and passes the tag name POEM as an argument. In your implementation of startElement() you do whatever should be done when <POEM> appears. As parsing proceeds (that is, as the document is read), events are fired one after another and the corresponding methods are called in the same order. Finally, when parsing is complete, endDocument() is called and processing of the document is finished. For the XML file above, the methods are called in this order: startDocument(); startElement() for POEM; then startElement(), characters() and endElement() for each of author, TITLE and LINE in turn; endElement() for POEM; and finally endDocument(). The line breaks between tags are normally reported through characters() as well.
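The callback order can be observed directly with a small handler that records the name of every event it receives. This is only a sketch: the class name SaxEventTrace and its trace() helper are made up for illustration, the poem is parsed from a string rather than a file, and whitespace-only character chunks are skipped to keep the list short.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records every SAX callback in the order it is invoked.
class SaxEventTrace extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() { events.add("startDocument"); }

    @Override
    public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        // Skip the whitespace-only chunks reported for line breaks between tags.
        if (!text.isEmpty()) events.add("characters " + text);
    }

    // Parses the given XML text and returns the recorded callback sequence.
    static List<String> trace(String xml) {
        try {
            SaxEventTrace handler = new SaxEventTrace();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String poem = "<POEM>\n<author>ogden nash</author>\n<TITLE>Fleas</TITLE>\n"
                + "<LINE>Adam</LINE>\n</POEM>";
        trace(poem).forEach(System.out::println);
    }
}
```

The handler uses qName for the element name because a parser obtained from a default SAXParserFactory does not perform namespace processing.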


ContentHandler is actually an interface. To process a particular XML file, you create a class that implements ContentHandler to handle the events you care about; this class is the core of processing an XML file with SAX. Let's look at some of the methods it defines:

void characters(char[] ch, int start, int length):

This method handles character data read from the XML file. Its arguments are a character array plus the starting position and length of the string within that array, so the string is easy to obtain with one of the String class constructors: String charEncountered = new String(ch, start, length).

void startDocument():

This method is called when the beginning of the document is encountered; you can do some preprocessing work here.

void endDocument():

The counterpart of the method above. It is called when the end of the document is reached; you can do any cleanup work here.

void startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName, Attributes atts)

This method is triggered when a start tag is read. Namespaces are not supported in SAX 1.0; support was added in version 2.0. Here namespaceURI is the namespace, localName is the tag name, and qName is the qualified name of the tag, including its prefix. When namespaces are not used, the first two arguments carry no useful value (SAX 2.0 specifies empty strings for them rather than null). atts is the list of attributes the tag contains; through atts you can get every attribute name and its value. Note that an important characteristic of SAX is stream processing: when a tag is encountered, the parser keeps no record of the tags seen before it. In other words, in startElement() all you know is the name and attributes of the current tag. The nesting structure of the tags, the name of the parent tag, whether there are child elements, and other structural information are all unknown and must be tracked by your own program. This makes SAX programming less convenient than DOM programming.

void endElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName)

This method is the counterpart of the one above; it is called when an end tag is encountered.
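To see what the startElement() arguments look like in practice, the following sketch prints them for one prefixed tag, once with namespace processing switched on and once with it off. The class name StartElementArgs, the describe() helper, and the urn:example:links namespace are all invented for this illustration. Exactly what a parser reports for namespaceURI and localName when namespace processing is off can vary, so only qName should be relied on in that case.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: shows the arguments passed to startElement().
class StartElementArgs {

    // Returns one line per start tag describing what startElement() received.
    static List<String> describe(String xml, boolean namespaceAware) {
        final List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                StringBuilder sb = new StringBuilder();
                sb.append("uri=").append(uri)
                  .append(" localName=").append(localName)
                  .append(" qName=").append(qName);
                // Walk the attribute list by index.
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(' ').append(atts.getQName(i))
                      .append('=').append(atts.getValue(i));
                }
                lines.add(sb.toString());
            }
        };
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(namespaceAware);
            spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        String xml = "<l:link xmlns:l=\"urn:example:links\" id=\"7\"/>";
        System.out.println("namespace-aware:     " + describe(xml, true));
        System.out.println("not namespace-aware: " + describe(xml, false));
    }
}
```

A handler that needs localName should therefore run on a factory configured with setNamespaceAware(true); otherwise qName is the safer choice.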

Because ContentHandler is an interface, it can be inconvenient to use directly, so SAX also provides a helper class for it: DefaultHandler. It implements the interface, but all of its method bodies are empty. In your own implementation you only need to extend this class and override the methods you are interested in.
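As a minimal sketch of this style, the handler below extends DefaultHandler and overrides only the three methods it needs in order to collect the text of every element with a given name. The class name ElementTextCollector and its collect() helper are hypothetical. The text is accumulated in a StringBuilder because a parser is allowed to deliver the content of one element through several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text content of all elements named 'wanted'.
class ElementTextCollector extends DefaultHandler {
    private final String wanted;
    private final List<String> values = new ArrayList<>();
    private StringBuilder current; // non-null only while inside a wanted element

    ElementTextCollector(String wanted) {
        this.wanted = wanted;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(wanted)) current = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so append instead of assigning.
        if (current != null) current.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(wanted) && current != null) {
            values.add(current.toString());
            current = null;
        }
    }

    // Parses the XML text and returns the collected values in document order.
    static List<String> collect(String xml, String elementName) {
        ElementTextCollector handler = new ElementTextCollector(elementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        String poem = "<POEM><TITLE>Fleas</TITLE><LINE>Adam</LINE><LINE>Had 'em</LINE></POEM>";
        System.out.println(collect(poem, "LINE"));
    }
}
```

All other callbacks, such as startDocument() and endDocument(), keep the empty bodies inherited from DefaultHandler.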

That covers the basics of SAX. Now let's look at two concrete examples to better understand how SAX is used.

SAX Programming Examples
We will continue to use the sample document from the discussion of DOM. First, though, let's look at a simpler application: counting the number of times each tag appears in the XML file. The example is very simple, but it is enough to illustrate the basic idea of SAX programming.

It starts, of course, with the import statements:

import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
import java.io.*;

Then we create a class that extends DefaultHandler. Set the specific program logic aside for the moment and note the structure of the program:

public class SAXCounter extends DefaultHandler {
    // This Hashtable records the number of times each tag appears.
    private Hashtable<String, Integer> tags;

    // Work done before the document is processed.
    public void startDocument() throws SAXException {
        tags = new Hashtable<String, Integer>(); // initialize the Hashtable
    }

    // Handle each start tag.
    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes atts)
            throws SAXException {
        String key = localName;
        Integer value = tags.get(key);
        if (value == null) {
            // A tag seen for the first time: add a record to the Hashtable.
            tags.put(key, Integer.valueOf(1));
        } else {
            // A tag seen before: fetch its count and add 1.
            int count = value.intValue();
            count++;
            tags.put(key, Integer.valueOf(count));
        }
    }

    // Print the statistics once parsing is complete.
    public void endDocument() throws SAXException {
        Enumeration<String> e = tags.keys();
        while (e.hasMoreElements()) {
            String tag = e.nextElement();
            int count = tags.get(tag).intValue();
            System.out.println("Tag <" + tag + "> occurs " + count + " times");
        }
    }

    // Program entry point: performs the parsing.
    public static void main(String[] args) {
        String filename = "links.xml";
        SAXParserFactory spf = SAXParserFactory.newInstance();
        // localName is only filled in when namespace processing is enabled.
        spf.setNamespaceAware(true);
        XMLReader xmlReader = null;
        SAXParser saxParser = null;
        try {
            // Create a SAXParser object.
            saxParser = spf.newSAXParser();
            // Get the SAX XMLReader wrapped by the SAXParser.
            xmlReader = saxParser.getXMLReader();
        } catch (Exception ex) {
            System.err.println(ex);
            System.exit(1);
        }
        try {
            // Register the ContentHandler and parse the XML file. Note that,
            // to keep the program simple, the main program and the
            // ContentHandler are placed in the same class here. In fact,
            // nothing done in main() has anything to do with ContentHandler.
            xmlReader.setContentHandler(new SAXCounter());
            xmlReader.parse(new File(filename).toURI().toString());
        } catch (SAXException se) {
            System.err.println(se.getMessage());
            System.exit(1);
        } catch (IOException ioe) {
            System.err.println(ioe);
            System.exit(1);
        }
    }
}

Let's look at what this program does. The main() method mainly creates a parser and then parses the document. To keep the program code independent of any specific parser, the SAXParser object is created with the same design technique used in the DOM discussion: a concrete SAXParser object is created through the SAXParserFactory class, so that when a different parser is needed, only a configuration value (the javax.xml.parsers.SAXParserFactory system property) has to change while the program code stays the same. This is the idea behind the Factory Method pattern. We won't go into details here; if anything is unclear, refer to the explanation in the DOM discussion, since the principle is the same.

One more point deserves attention here: the relationship between the SAXParser class and the XMLReader interface. This may be a little confusing. SAXParser is a JAXP wrapper class around XMLReader, and XMLReader is the interface defined in SAX 2.0 for parsing a document. You can call the parse() method of either SAXParser or XMLReader to parse a document, and the effect is exactly the same. However, the parse() methods of SAXParser accept more kinds of arguments and can handle different XML data sources, so SAXParser is more convenient to use than XMLReader.
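The two entry points can be compared side by side. In the sketch below, the class ParseTwoWays and the nested TagCounter handler are hypothetical names; both methods parse the same XML text from a string and count the start tags they see, one through SAXParser.parse() and one through XMLReader.parse().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: parses the same input through both APIs.
class ParseTwoWays {

    // Counts start tags; used only to compare the two entry points.
    static class TagCounter extends DefaultHandler {
        int count;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    // JAXP style: the handler is passed directly to SAXParser.parse().
    static int viaSaxParser(String xml) {
        try {
            TagCounter handler = new TagCounter();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.count;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    // SAX 2.0 style: the handler is registered first, then parse() is called.
    static int viaXmlReader(String xml) {
        try {
            TagCounter handler = new TagCounter();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.count;
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String poem = "<POEM><author>ogden nash</author><TITLE>Fleas</TITLE><LINE>Adam</LINE></POEM>";
        System.out.println("SAXParser: " + viaSaxParser(poem));
        System.out.println("XMLReader: " + viaXmlReader(poem));
    }
}
```

SAXParser additionally offers parse() overloads that take a File or an InputStream, whereas XMLReader.parse() accepts only an InputSource or a system identifier string.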

That example only scratches the surface of SAX; the next one is more advanced. The functionality we implement here was already implemented in the DOM example: reading content from the XML document and printing it in a formatted way. Although the program logic still looks simple, with SAX it is not as easy as with DOM, as you will see.

As mentioned earlier, when we encounter a start tag, startElement() gives us no information about the tag's position in the XML document. This is a big problem when processing XML, because the meaning of a tag is partly determined by where it appears. In programs that need to validate the structure of the document it is even more of a problem. Of course, the problem is not without a solution: we can use a stack to keep track of the document structure.

A stack is last in, first out (LIFO). Our idea is to push the tag name onto the stack in startElement() and pop it in endElement(). For a well-formed XML document the nesting structure is complete: every start tag has a matching end tag, and tags never overlap improperly. Therefore every call to startElement() is necessarily matched by a call to endElement(), so pushes and pops also occur in pairs. We only need to examine the contents of the stack to know where the current tag sits in the document structure.

public class SAXReader extends DefaultHandler {
    java.util.Stack<String> tags = new java.util.Stack<String>();
    //--------------XML Content-------------
    String text = null;
    String url = null;
    String author = null;
    String description = null;
    String day = null;
    String year = null;
    String month = null;
    //----------------------------------------------
    public void endDocument() throws SAXException {
        System.out.println("------Parse End--------");
    }
    public void startDocument() throws SAXException {
        System.out.println("------Parse Begin--------");
    }
    public void startElement(String p0, String p1, String p2, Attributes p3) throws SAXException {
        tags.push(p1);
    }
    public void endElement(String p0, String p1, String p2) throws SAXException {
        tags.pop();
        // All information for one link node has been collected; print it.
        if (p1.equals("link")) printout();
    }
    public void characters(char[] p0, int p1, int p2) throws SAXException {
        // Get the name of the current node from the top of the stack.
        String tag = tags.peek();
        if (tag.equals("text")) text = new String(p0, p1, p2);
        else if (tag.equals("url")) url = new String(p0, p1, p2);
        else if (tag.equals("author")) author = new String(p0, p1, p2);
        else if (tag.equals("day")) day = new String(p0, p1, p2);
        else if (tag.equals("month")) month = new String(p0, p1, p2);
        else if (tag.equals("year")) year = new String(p0, p1, p2);
        else if (tag.equals("description")) description = new String(p0, p1, p2);
    }
    private void printout() {
        System.out.print("Content: ");
        System.out.println(text);
        System.out.print("URL: ");
        System.out.println(url);
        System.out.print("Author: ");
        System.out.println(author);
        System.out.print("Date: ");
        System.out.println(day + "-" + month + "-" + year);
        System.out.print("Description: ");
        System.out.println(description);
        System.out.println();
    }
    public static void main(String[] args) {
        String filename = "links.xml";
        SAXParserFactory spf = SAXParserFactory.newInstance();
        // The local name (p1) is only filled in when namespace processing is enabled.
        spf.setNamespaceAware(true);
        SAXParser saxParser = null;
        try {
            saxParser = spf.newSAXParser();
        } catch (Exception ex) {
            System.err.println(ex);
            System.exit(1);
        }
        try {
            saxParser.parse(new File(filename), new SAXReader());
        } catch (SAXException se) {
            System.err.println(se.getMessage());
            System.exit(1);
        } catch (IOException ioe) {
            System.err.println(ioe);
            System.exit(1);
        }
    }
}

The stack itself is not analyzed in this example, but analyzing it is easy. Because java.util.Stack extends java.util.Vector, and the elements of a Stack are arranged from bottom to top in stack order, we can use the size() method of the Vector class to get the number of elements in the stack and Vector's get(int) method to obtain each individual element. In fact, if we list the stack's elements from bottom to top, we get the unique path from the XML root node to the current node, and with this path the structure of the document becomes much clearer.
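A short sketch of this stack analysis follows. The class name PathTracker and its pathsOf() helper are hypothetical, and the element names in the sample input (links, link, text, date, day) are assumed for illustration only. At every start tag the handler walks the stack from bottom to top with size() and get(int) and records the resulting path.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the root-to-node path of every element.
class PathTracker extends DefaultHandler {
    private final Stack<String> tags = new Stack<>();
    private final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        tags.push(qName);
        paths.add(currentPath());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        tags.pop();
    }

    // Walks the stack from bottom (index 0) to top to build the path.
    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tags.size(); i++) {
            sb.append('/').append(tags.get(i));
        }
        return sb.toString();
    }

    // Parses the XML text and returns one path per start tag, in document order.
    static List<String> pathsOf(String xml) {
        PathTracker handler = new PathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return handler.paths;
    }

    public static void main(String[] args) {
        String xml = "<links><link><text>a</text><date><day>1</day></date></link></links>";
        pathsOf(xml).forEach(System.out::println);
    }
}
```

With such a path available, a handler can tell apart tags that share a name but sit in different places in the document.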

Summary
So far we have covered two powerful tools for XML programming, DOM and SAX, and how to use them in a Java program. DOM programming is relatively simple, but it is slower and uses more memory; SAX programming is more complex, but it is fast and uses less memory. We should therefore choose between them according to the environment. Between them they can handle most XML applications. It is worth pointing out that DOM and SAX are language-independent and not unique to Java: they can be used in any object-oriented language for which an implementation exists.

Above we introduced some methods for reading XML documents, extracting their content, and adding to and modifying documents. Another class of problem is the transformation of XML documents. It can be solved with DOM and SAX, but the implementation is complex, and using XSLT is much simpler. The author will discuss this topic in detail in a future article.

