Java and XML Programming with SAX

Source: Internet
Author: User

The previous article explained how to use DOM to parse and manipulate XML in Java. This article explains how to do the same with SAX. SAX is especially useful on mobile platforms, where its event-based approach to XML can save a great deal of memory.

----- Text -----
Source: http://www0.ccidnet.com/tech/guide/2001/10/08/58_3392.html

SAX concepts

SAX is short for Simple API for XML. It is not a standard officially issued by the W3C; it is better described as a "grass-roots" de facto standard that grew out of community discussion. Even so, SAX is used in XML applications no less than DOM, and almost all XML parsers support it.

Compared with DOM, SAX is a lightweight approach. When processing with DOM, we must read the entire XML document and build a DOM tree in memory, creating an object for every node in the tree. For small documents this causes no problems, but for large documents DOM processing becomes slow and laborious, and its memory demand grows accordingly, which makes DOM uneconomical in some applications (such as applets). In these cases SAX is a better alternative.

SAX also differs from DOM conceptually. First of all, unlike DOM, which is document-driven, SAX is event-driven: it does not need to read the entire document first, because reading the document is itself the SAX parsing process. Event-driven programming is a way of running a program based on callbacks. (If you are familiar with Java's delegation event model, this mechanism will be easy to understand.)


When an XMLReader receives an XML document, it parses the document while reading it. In other words, reading and parsing happen at the same time, which is very different from DOM. Before parsing, you register a ContentHandler with the XMLReader; it acts as an event listener. ContentHandler defines a number of methods, such as startDocument(), in which you decide what should happen when the parser reaches the beginning of the document. When the XMLReader reads the corresponding content, it fires the matching event and delegates its handling to the ContentHandler by calling the corresponding method.
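The registration step described above can be sketched with the JAXP classes that ship with the JDK. This is a minimal example, not code from the original article: DocumentListener and RegisterHandlerDemo are hypothetical names, and the listener extends DefaultHandler, a convenience class that implements ContentHandler. The XMLReader is obtained from a SAXParserFactory, the handler is registered with setContentHandler(), and parse() then drives the callbacks while it reads the input.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical listener: reacts only to the start and end of the document.
class DocumentListener extends DefaultHandler {
    final StringBuilder log = new StringBuilder();

    @Override
    public void startDocument() {
        log.append("start;");
    }

    @Override
    public void endDocument() {
        log.append("end;");
    }
}

class RegisterHandlerDemo {
    // Registers the listener with an XMLReader, then lets the reader call it
    // back while it reads and parses the input in a single pass.
    static String parse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DocumentListener listener = new DocumentListener();
            reader.setContentHandler(listener); // must happen before parse()
            reader.parse(new InputSource(new StringReader(xml)));
            return listener.log.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<poem/>")); // prints start;end;
    }
}
```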

This may not be easy to grasp from a description alone. Don't worry: the following example will help you understand how SAX parses a document. Consider this simple XML file:

<poem>
<author>Ogden Nash</author>
<title>Fleas</title>
<line>Adam</line>
</poem>

When the XMLReader reads the <poem> tag, it calls ContentHandler.startElement() and passes the tag name poem as a parameter. In your implementation of startElement(), you perform whatever should be done when <poem> appears. Events are fired one after another as parsing proceeds (that is, as the document is read), and the corresponding methods are called in the same order. When parsing finishes and the last method has been called, processing of the document is complete. The following table lists the methods called, in order, while parsing the XML file above:

Item encountered    Method callback

{document start}    startDocument()
<poem>              startElement(null, "poem", null, {attributes})
"\n"                characters("<poem>\n...", 6, 1)
<author>            startElement(null, "author", null, {attributes})
"Ogden Nash"        characters("<poem>\n...", 15, 10)
</author>           endElement(null, "author", null)
"\n"                characters("<poem>\n...", 34, 1)
<title>             startElement(null, "title", null, {attributes})
"Fleas"             characters("<poem>\n...", 42, 5)
</title>            endElement(null, "title", null)
"\n"                characters("<poem>\n...", 55, 1)
<line>              startElement(null, "line", null, {attributes})
"Adam"              characters("<poem>\n...", 62, 4)
</line>             endElement(null, "line", null)
"\n"                characters("<poem>\n...", 73, 1)
</poem>             endElement(null, "poem", null)
{document end}      endDocument()
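To watch this callback sequence on a real parser, a handler can simply log each event. This sketch assumes the JDK's built-in JAXP parser; TracingHandler and PoemTrace are hypothetical names. It logs the qualified element name rather than the exact arguments, because the array handed to characters() is the parser's own buffer, so the start and length values a real parser reports may differ from the offsets in the table.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Records every ContentHandler callback in the order the parser makes it.
class TracingHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument()"); }
    @Override public void endDocument() { events.add("endDocument()"); }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement(" + qName + ")");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement(" + qName + ")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Show newlines as \n so whitespace-only text stays visible
        String text = new String(ch, start, length).replace("\n", "\\n");
        events.add("characters(\"" + text + "\")");
    }
}

class PoemTrace {
    static final String POEM =
        "<poem>\n<author>Ogden Nash</author>\n<title>Fleas</title>\n<line>Adam</line>\n</poem>";

    static List<String> trace(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TracingHandler handler = new TracingHandler();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        for (String event : trace(POEM)) {
            System.out.println(event);
        }
    }
}
```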

ContentHandler is actually an interface. To process a particular XML file, you write a class that implements ContentHandler and handles the events you care about. This is really the core of SAX-based XML processing. Let's look at some of the methods it defines:

void characters(char[] ch, int start, int length):

This method handles character data read from the XML file. Its parameters are a character array plus the start position and length of the relevant characters within that array. A String constructor turns them into a string easily: String charEncountered = new String(ch, start, length);
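One detail matters when using characters(): the SAX specification allows a parser to deliver a single run of text in several chunks (for example around an entity reference such as &amp;), so a handler should append chunks rather than overwrite them. A minimal sketch, using the hypothetical class TextCollector, that buffers text in a StringBuilder and stores the trimmed text of each leaf element when its end tag arrives:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: maps each leaf element's name to its text content.
class TextCollector extends DefaultHandler {
    final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // begin a fresh run of text
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // append each chunk, never overwrite
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            values.put(qName, text); // parents holding only whitespace are skipped
        }
        buffer.setLength(0);
    }

    static Map<String, String> collect(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            TextCollector handler = new TextCollector();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.values;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect("<t>a &amp; b</t>")); // prints {t=a & b}
    }
}
```

This simple version handles leaf elements only; text mixed in between child elements would need a stack of buffers.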

void startDocument():

Called once when the document starts; use it for any preprocessing.

void endDocument():

The counterpart of the method above: called once when the document ends, so you can do any cleanup work here.

void startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName, Attributes atts)

This method is called when a start tag is read. SAX 1.0 does not support namespaces, but SAX 2.0 does. The namespaceURI parameter is the namespace URI, localName is the tag name without any prefix, and qName is the qualified name, including the prefix if there is one. When namespace processing is not performed, namespaceURI and localName are empty strings rather than null (JAXP's SAXParserFactory does not perform namespace processing unless setNamespaceAware(true) is called). atts is the list of attributes on this tag; through atts you can get every attribute name and its value. Note an important feature of SAX: it processes the document as a stream and does not remember tags it has already encountered. Inside startElement(), all you know is the tag's name and attributes. The nesting structure, the name of the enclosing tag, whether there are child elements, and any other structural information are unknown; your program has to keep track of them itself. This makes SAX less convenient to program than DOM.
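The difference between namespaceURI, localName and qName is easiest to see by parsing the same prefixed element twice, once with namespace processing on and once with it off. A sketch using the JDK's JAXP parser; StartTagReporter and the namespace URI urn:example:poems are made up for illustration. It also walks the Attributes list with getLength(), getQName(i) and getValue(i).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: describes each start tag's names and attributes.
class StartTagReporter extends DefaultHandler {
    final List<String> lines = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        StringBuilder sb = new StringBuilder();
        sb.append("uri=[").append(uri)
          .append("] local=[").append(localName)
          .append("] qName=[").append(qName).append("]");
        // Name and attributes are all that is known at this point
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        lines.add(sb.toString());
    }

    static List<String> report(String xml, boolean namespaceAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware); // off by default in JAXP
            XMLReader reader = factory.newSAXParser().getXMLReader();
            StartTagReporter handler = new StartTagReporter();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.lines;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p:poem xmlns:p=\"urn:example:poems\" lang=\"en\"/>";
        System.out.println(report(xml, true).get(0));
        System.out.println(report(xml, false).get(0));
    }
}
```

With namespace processing off, the parser leaves uri and localName empty and reports the xmlns:p declaration as an ordinary attribute; with it on, the prefix is resolved to its URI.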

void endElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qName)

This is the counterpart of startElement(); it is called when an end tag is encountered.

Because ContentHandler is an interface, implementing all of its methods can be inconvenient. SAX therefore also provides a helper class, DefaultHandler, which implements the interface with empty method bodies. You only need to extend this class and override the methods you need.
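Because DefaultHandler already supplies empty implementations, a handler can override just the one callback it needs. A minimal sketch (ElementCounter is a hypothetical name) that counts elements with an anonymous DefaultHandler subclass and hands it straight to SAXParser.parse():

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ElementCounter {
    // Counts elements by overriding only startElement(); every other callback
    // keeps DefaultHandler's empty implementation.
    static int countElements(String xml) {
        final int[] count = {0}; // array so the anonymous class can update it
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String poem = "<poem><author>Ogden Nash</author><title>Fleas</title><line>Adam</line></poem>";
        System.out.println(countElements(poem)); // prints 4
    }
}
```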

That covers the basics of SAX. Let's look at two concrete examples to better understand how to use it.

SAX programming example

We will again use the document from the DOM article, but first let's look at a simple application: counting how many times each tag appears in the XML file. The example is simple, but it is enough to show the basic idea of SAX programming.

We start with the import statements:

import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
import java.io.*;

Next, we create a class that extends DefaultHandler. Set the details of the program logic aside for now and note the program structure:

public class SAXCounter extends DefaultHandler {
    private Hashtable<String, Integer> tags; // records how many times each tag occurs

    // Work to do before the document is processed
    public void startDocument() throws SAXException {
        tags = new Hashtable<String, Integer>(); // initialize the Hashtable
    }

    // Process each start element
    public void startElement(String namespaceURI, String localName,
                             String rawName, Attributes atts)
            throws SAXException {
        // localName is filled in because main() makes the parser namespace-aware
        String key = localName;
        Integer value = tags.get(key);
        if (value == null) {
            // A new tag: add a record to the Hashtable
            tags.put(key, 1);
        } else {
            // Seen before: take its count and add 1
            tags.put(key, value + 1);
        }
    }

    // Report the statistics after parsing
    public void endDocument() throws SAXException {
        Enumeration<String> e = tags.keys();
        while (e.hasMoreElements()) {
            String tag = e.nextElement();
            int count = tags.get(tag);
            System.out.println("Tag <" + tag + "> occurs " + count + " times");
        }
    }

    // Program entry point for parsing
    public static void main(String[] args) {
        String filename = "links.xml";
        SAXParserFactory spf = SAXParserFactory.newInstance();
        // Without this, JAXP passes an empty localName to startElement()
        spf.setNamespaceAware(true);
        XMLReader xmlReader = null;
        try {
            // Create a SAXParser object
            SAXParser saxParser = spf.newSAXParser();
            // Obtain the SAX XMLReader wrapped by the SAXParser
            xmlReader = saxParser.getXMLReader();
        } catch (Exception ex) {
            System.err.println(ex);
            System.exit(1);
        }
        try {
            // Register our ContentHandler, then parse the XML file. For simplicity the
            // main program and the ContentHandler are in the same class here; in fact
            // nothing in main() depends on the ContentHandler.
            // XMLReader.parse() takes an InputSource or a system ID, so pass the file's URI.
            xmlReader.setContentHandler(new SAXCounter());
            xmlReader.parse(new File(filename).toURI().toString());
        } catch (SAXException se) {
            System.err.println(se.getMessage());
            System.exit(1);
        } catch (IOException ioe) {
            System.err.println(ioe);
            System.exit(1);
        }
    }
}

Let's look at what this program does. The main task of main() is to create a parser and then parse the document. When creating the SAXParser object, we use the same design technique as with DOM to keep the code independent of any particular parser: a SAXParserFactory creates the concrete SAXParser object. When a different parser is needed, only the value of a system property (javax.xml.parsers.SAXParserFactory) has to change, and the program code can stay the same. This is the idea behind the Factory Method pattern. I will not go into it here; if it is still unclear, see the explanation in the previous DOM article, as the principle is the same.

The relationship between the SAXParser class and the XMLReader class deserves a little more attention, since it may be confusing. SAXParser is JAXP's wrapper around XMLReader, and XMLReader is the interface SAX 2.0 defines for parsing documents. You can parse a document with either SAXParser's parse() method or XMLReader's parse() method, and the results are exactly the same. However, SAXParser's parse() accepts more kinds of input (a File, an InputStream, an InputSource or a URI string, passed together with the handler), so it can read XML from different data sources and is more convenient to use than XMLReader.
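The two parsing routes can be compared directly. In this sketch (TwoWaysToParse and NameRecorder are hypothetical names), the same temporary file is parsed once through SAXParser.parse(File, DefaultHandler) and once through the wrapped XMLReader, which needs the handler registered with setContentHandler() and accepts only an InputSource or a system ID string. Both should record the same element names.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class TwoWaysToParse {
    // Records element names in document order.
    static class NameRecorder extends DefaultHandler {
        final StringBuilder names = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            names.append(qName).append(' ');
        }
    }

    // Route 1: SAXParser.parse() takes the File and the handler together.
    static String viaSaxParser(File file) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        NameRecorder handler = new NameRecorder();
        parser.parse(file, handler);
        return handler.names.toString();
    }

    // Route 2: the wrapped XMLReader needs the handler registered first and
    // takes an InputSource or a system ID (here, the file's URI as a string).
    static String viaXmlReader(File file) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        NameRecorder handler = new NameRecorder();
        reader.setContentHandler(handler);
        reader.parse(file.toURI().toString());
        return handler.names.toString();
    }

    // Writes the XML to a temporary file and parses it both ways.
    static String[] parseBothWays(String xml) {
        try {
            File file = File.createTempFile("sax-demo", ".xml");
            file.deleteOnExit();
            Files.write(file.toPath(), xml.getBytes(StandardCharsets.UTF_8));
            return new String[] {viaSaxParser(file), viaXmlReader(file)};
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] results = parseBothWays("<poem><author>Ogden Nash</author><title>Fleas</title></poem>");
        System.out.println(results[0].equals(results[1])); // prints true
    }
}
```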

This example only scratches the surface of SAX; the next one is more advanced. It implements the same functionality as the DOM example: reading content from the XML document and printing it in a formatted way. The program logic looks simple, but SAX is less convenient than DOM here, as you will see.

As mentioned above, when a start tag is encountered, the startElement() method cannot tell us where that tag sits in the XML document. This is a real headache when processing XML, because part of a tag's meaning in XML is determined by its position. It is also a problem for programs that need to verify the document structure. Of course, we can use a stack to record the document structure.

A stack is last in, first out (LIFO). The idea is to push the tag name onto the stack in startElement() and pop it in endElement(). In a well-formed XML document the nesting is complete: every start tag has a matching end tag, and tags never overlap. So every call to startElement() is matched by a call to endElement(), and pushes and pops come in pairs. By examining the stack, we can easily tell where the current tag is in the document structure.

public class SAXReader extends DefaultHandler {
    java.util.Stack<String> tags = new java.util.Stack<String>();
    // -------------- XML content -------------
    String text = null;
    String url = null;
    String author = null;
    String description = null;
    String day = null;
    String year = null;
    String month = null;
    //----------------------------------------------
    public void endDocument() throws SAXException {
        System.out.println("------ Parse End --------");
    }

    public void startDocument() throws SAXException {
        System.out.println("------ Parse Begin --------");
    }

    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // The factory in main() is not namespace-aware, so localName would be
        // an empty string; use the qualified name instead
        tags.push(qName);
    }

    public void endElement(String uri, String localName, String qName) throws SAXException {
        tags.pop();
        // All the information of one link node has been collected; print it formatted
        if (qName.equals("link")) printOut();
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        // Find out from the stack which element this text belongs to.
        // This simple version assumes each value arrives in a single chunk.
        String tag = tags.peek();
        String value = new String(ch, start, length);
        if (tag.equals("text")) text = value;
        else if (tag.equals("url")) url = value;
        else if (tag.equals("author")) author = value;
        else if (tag.equals("day")) day = value;
        else if (tag.equals("month")) month = value;
        else if (tag.equals("year")) year = value;
        else if (tag.equals("description")) description = value;
    }

    private void printOut() {
        System.out.print("Content: ");
        System.out.println(text);
        System.out.print("URL: ");
        System.out.println(url);
        System.out.print("Author: ");
        System.out.println(author);
        System.out.print("Date: ");
        System.out.println(day + "-" + month + "-" + year);
        System.out.print("Description: ");
        System.out.println(description);
        System.out.println();
    }

    public static void main(String[] args) {
        String filename = "links.xml";
        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser saxParser = null;
        try {
            saxParser = spf.newSAXParser();
        } catch (Exception ex) {
            System.err.println(ex);
            System.exit(1);
        }
        try {
            // SAXParser.parse() accepts the File and the handler directly
            saxParser.parse(new File(filename), new SAXReader());
        } catch (SAXException se) {
            System.err.println(se.getMessage());
            System.exit(1);
        } catch (IOException ioe) {
            System.err.println(ioe);
            System.exit(1);
        }
    }
}

Although this example does not analyze the stack, doing so is easy. java.util.Stack extends java.util.Vector, and its elements are stored from bottom to top. Vector's size() method gives the number of elements in the stack, and its get(int) method returns any individual element. If we list the stack's elements one by one from the bottom up, we get the unique path from the XML root node to the current node; with this path, the structure of the document is clear.
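The path idea can be sketched as follows. PathTracker is a hypothetical handler, and the input in the test is an invented document shaped like links.xml (the real file is not reproduced in this article). On every start tag it pushes the name and then reads the Stack from index 0 (the root) up to size() - 1 to build a path such as /links/link/text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Keeps the open tags on a Stack and turns it into a root-to-current path.
class PathTracker extends DefaultHandler {
    private final Stack<String> tags = new Stack<>();
    final List<String> paths = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        tags.push(qName);
        paths.add(currentPath());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        tags.pop(); // pops always pair with the pushes above
    }

    // Stack extends Vector, so size() and get(int) read it from the bottom (root) up.
    private String currentPath() {
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < tags.size(); i++) {
            path.append('/').append(tags.get(i));
        }
        return path.toString();
    }

    static List<String> paths(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            PathTracker handler = new PathTracker();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.paths;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<links><link><text>SAX</text><date><day>8</day></date></link></links>";
        for (String p : paths(xml)) {
            System.out.println(p);
        }
    }
}
```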

Summary

We have now covered the two major XML programming tools, DOM and SAX, and how to use them in Java programs. DOM programming is relatively simple, but it is slower and uses a lot of memory; SAX programming is more complex, but it is faster and uses less memory. We should therefore choose between them according to the environment. Most XML applications can be handled with one or the other. Note in particular that DOM and SAX are language-independent and not unique to Java: wherever a corresponding implementation exists, DOM and SAX can be used in that language.

The preceding sections described how to read and extract XML content and how to add to and modify an XML document. Another class of problem is transforming XML documents. DOM and SAX can be used for this too, but the implementation is complicated; XSLT makes it much simpler. I will discuss this in detail in a future article.
