One solution to text going missing when parsing XML with SAX

Source: Internet
Author: User

While working on a small project recently, I ran into a problem: when I parsed XML with SAX, part of the text data went missing. I found a fix and would like to share it here.
Consider the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:gd="http://schemas.google.com/g/2005"
       xmlns:opensearch="http://a9.com/-/spec/opensearchrss/1.0"
       xmlns:db="http://www.douban.com/xmlns/">
  <category scheme="http://www.douban.com/2007#kind"
            term="http://www.douban.com/2007#book"/>
  <db:tag count="15" name="Yishan Gongyi"/>
  <db:tag count="6" name="Fiction"/>
  <db:tag count="5" name="Japanese novel"/>
  <db:tag count="2" name="Japanese literature"/>
  <db:tag count="2" name="Japan"/>
  <title>If I am on the other side: Japan's best-selling love novels</title>
  <author>
    <name>Yishan Gongyi</name>
  </author>
  <summary>
    This book consists of three beautiful love stories that seem independent but are actually connected. The heroes are teachers, each devoted to a particular sport, and each has experienced or witnessed a moment close to death and learned something about life from it.
    This is the author's first collection of love stories since "Call for love at the center of the world". As a student, he read the complete collection of modern Japanese literature, including Natsume Soseki and Oe Kenzaburo, as well as modern European philosophy from Descartes and Leibniz to capitalism and Marx. He began writing novels at the age of 22 or 23. "Breath", "the world runs in a place you don't know" and "Don't trust John Lennon" are among his major works.
  </summary>
  <link rel="self" href="http://api.douban.com/book/subject/2023013"/>
  <link rel="collection" href="http://api.douban.com/collection/1234567"/> <!-- Included only after API authentication and authorization -->
  <link rel="alternate" href="http://book.douban.com/subject/2023013/"/>
  <link rel="image" href="http://t.douban.com/spic/s2328836.jpg"/>
  <db:attribute name="isbn10">7543639130</db:attribute>
  <db:attribute name="isbn13">9787543639133</db:attribute>
  <db:attribute name="pages">193</db:attribute>
  <db:attribute name="translator">Zhang Xing</db:attribute>
  <db:attribute name="price">14</db:attribute>
  <db:attribute name="author">Yishan Gongyi</db:attribute>
  <db:attribute name="publisher">Qingdao Press</db:attribute>
  <db:attribute name="binding">paperback</db:attribute>
  <db:attribute name="author-intro">
    Yi Shan Gong Yi was born in Aishu County, Japan, in 1959 and graduated from the Department of Agriculture of Kyushu University with a major in agricultural economics. As a student, he read the complete collection of modern Japanese literature, including Natsume Soseki and Oe Kenzaburo, as well as modern European philosophy from Descartes and Leibniz to capitalism. He also read Marx: his bachelor's thesis was on Marx and his master's thesis on Engels. He began writing novels at the age of 22 or 23. His major works include "Call for love in the center of the world", "the world operates in places you don't know", "the night of the full moon", "Empty shot" and his new work "if I am on the other side".
  </db:attribute>
  <id>http://api.douban.com/book/subject/2023013</id>
  <gd:rating min="1" numRaters="12" average="4.00" max="5"/>
</entry>
The problem is as follows:
When parsing the <db:attribute name="author-intro"> element, part of its text was always missing. I later revised the parser as follows:
package com.jftt.douban.parser;

import java.util.ArrayList;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import com.jftt.douban.bean.Book;
import com.jftt.douban.util.DoubanUtil;

/**
 *
 * @author changjianlong@cn.fujitsu.com
 *
 */
public class DoubanSearchParser extends DefaultHandler {
    private ArrayList<Book> bookList;
    private Book book;
    private boolean isIsbn = false;
    private boolean isPrice = false;
    private boolean isAuthor = false;
    private boolean isTitle = false;
    // Text of the current element. characters() may deliver it in several
    // chunks, so the chunks are collected here and only used in endElement().
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() throws SAXException {
        bookList = new ArrayList<Book>();
        super.startDocument();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        super.startElement(uri, localName, qName, attributes);
        text.setLength(0); // start a fresh buffer for every element
        if ("entry".equals(localName)) {
            book = new Book();
        } else if ("attribute".equals(localName)) {
            String name = attributes.getValue("name");
            if ("isbn13".equals(name)) {
                isIsbn = true;
            }
            if ("price".equals(name)) {
                isPrice = true;
            }
            if ("author".equals(name)) {
                isAuthor = true;
            }
        } else if (book != null && "title".equals(localName)) {
            isTitle = true;
        } else if (book != null && "link".equals(localName)) {
            if ("image".equals(attributes.getValue("rel"))) {
                book.setPicSrc(attributes.getValue("href"));
                book.setBitPic(DoubanUtil.queryImageByUri(book.getPicSrc()));
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        super.characters(ch, start, length);
        // Do not build a String here: this call may carry only part of the text.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, localName, qName);
        // The element is complete, so the buffer now holds all of its text.
        String value = text.toString().trim();
        if (isIsbn) {
            book.setIsbn(value);
            isIsbn = false;
        }
        if (isAuthor) {
            book.setAuthor(value);
            isAuthor = false;
        }
        if (isPrice) {
            book.setPrice(value);
            isPrice = false;
        }
        if (isTitle) {
            book.setTitle(value);
            isTitle = false;
        }
        if (book != null && "name".equals(localName)) {
            book.setAuthor(value);
        }
        if ("entry".equals(localName)) {
            bookList.add(book);
        }
        text.setLength(0);
    }

    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
    }

    public ArrayList<Book> getBookList() {
        return bookList;
    }
}
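The parser above matches elements by their localName ("entry", "attribute", "title", "link"), but the post does not show how it is invoked. If it is driven through JAXP's SAXParserFactory (an assumption), the factory must be namespace-aware: the JAXP default is not, and a non-namespace-aware parser is allowed to report an empty localName. The hypothetical LocalNameDemo class below is only a sketch that records the localName reported for each start tag so the two settings can be compared.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original project.
public class LocalNameDemo {
    // Records the localName that SAX reports for every start tag.
    public static List<String> localNames(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware); // JAXP default is false
        List<String> names = new ArrayList<>();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                names.add(localName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entry xmlns=\"http://www.w3.org/2005/Atom\" xmlns:db=\"http://www.douban.com/xmlns/\">"
                + "<db:attribute name=\"isbn13\">9787543639133</db:attribute></entry>";
        System.out.println("namespace-aware:     " + localNames(xml, true));
        System.out.println("not namespace-aware: " + localNames(xml, false));
    }
}
```

With namespace awareness switched on, `<db:attribute>` is reported with the localName "attribute", which is what the handler's comparisons expect.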
Cause:
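A SAX parser does not have to deliver an element's text in a single characters() call. It may split contiguous text into several chunks and call characters() once per chunk; this is especially likely for long text such as the author introduction, and it can also happen around entity references. Code that builds a String from a single characters() call and then clears its flag keeps only the first chunk, so the rest of the text is lost. The solution is to append every chunk to a StringBuffer (a StringBuilder works equally well in single-threaded code) and read the complete text only in endElement().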
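The buffering fix can be isolated as a small, self-contained sketch. It assumes a namespace-aware JAXP SAXParserFactory, and the class name BufferedTextHandler, its parse() helper and the sample strings are illustrative only. It collects the full text of every `<db:attribute name="author-intro">` element regardless of how many characters() calls the parser splits it into.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler illustrating the buffering technique.
public class BufferedTextHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private final List<String> intros = new ArrayList<>();
    private boolean capturing = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("attribute".equals(localName) && "author-intro".equals(attributes.getValue("name"))) {
            capturing = true;
            buffer.setLength(0); // fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (capturing) {
            buffer.append(ch, start, length); // append: this may be just one of several chunks
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (capturing && "attribute".equals(localName)) {
            intros.add(buffer.toString().trim()); // only here is the text known to be complete
            capturing = false;
        }
    }

    public static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so that localName is "attribute" for <db:attribute>
        BufferedTextHandler handler = new BufferedTextHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.intros;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entry xmlns:db=\"http://www.douban.com/xmlns/\">"
                + "<db:attribute name=\"author-intro\">Descartes &amp; Leibniz &amp; Marx</db:attribute>"
                + "</entry>";
        System.out.println(parse(xml)); // prints [Descartes & Leibniz & Marx]
    }
}
```

The key design choice is that characters() only appends and never interprets the text; the decision about what the text means is deferred to endElement(), the first point at which the parser guarantees the element's text is complete.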
