[Android] Correcting a Common SAX Parsing Mistake


Before discussing the mistake, take a look at the following code.  [The parsing approach below is WRONG]

import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import android.util.Log;

public class XmlHandler extends DefaultHandler {

    private final String TAG = this.getClass().getSimpleName();

    /** Tag names defined in the XML file */
    private final String TAG_Article = "Article";
    private final String TAG_ArticleID = "ArticleID";
    private final String TAG_Title = "Title";
    private final String TAG_Date = "Date";
    private final String TAG_SmallPictures = "SmallPictures";
    private final String TAG_LargePictures = "LargePictures";
    private final String TAG_Category = "Category";
    private static final String TAG_HeadNote = "HeadNote";
    private static final String TAG_SubTitle = "SubTitle";
    private static final String TAG_Source = "Source";

    // Tag currently being parsed
    private String currentName;
    // A single article
    private News news = null;
    // List of articles
    private List<News> newsList = null;
    // Parse start time
    private long start_time;
    private boolean flag = false;

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        super.characters(ch, start, length);
        if (!flag) {
            return;
        }
        // Read the value (this is the flaw: each call overwrites what the previous call set)
        String value = new String(ch, start, length);
        Log.d(TAG, "Element: " + currentName + " Element Value: " + value);
        if (value != null) {
            if (TAG_ArticleID.equals(currentName)) {
                news.setArticleId(value);
            } else if (TAG_Title.equals(currentName)) {
                news.setTitle(value);
            } else if (TAG_Date.equals(currentName)) {
                news.setDate(value);
            } else if (TAG_Category.equals(currentName)) {
                news.setCategory(value);
            } else if (TAG_SmallPictures.equals(currentName)) {
                news.setSmallPicture(value);
            } else if (TAG_LargePictures.equals(currentName)) {
                news.setLargePicture(value);
            } else if (TAG_HeadNote.equals(currentName)) {
                news.setHeadNote(value);
            } else if (TAG_SubTitle.equals(currentName)) {
                news.setSubTitle(value);
            } else if (TAG_Source.equals(currentName)) {
                news.setSource(value);
            }
        }
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        start_time = System.currentTimeMillis();
        newsList = new ArrayList<News>();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        super.startElement(uri, localName, qName, attributes);
        this.currentName = localName;
        flag = true;
        if (TAG_Article.equals(localName)) {
            news = new News();
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, localName, qName);
        flag = false;
        if (TAG_Article.equals(localName)) {
            newsList.add(news);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        long end = System.currentTimeMillis();
        Log.d(TAG, "Parse List's Xml Cost: " + (end - start_time) + " !!");
    }
}

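Search Baidu or Google for "Android Sax 解析" (Android SAX parsing) and the samples you find are, without exception, written this way. What a trap... Even some books do it like this, for example 《Android開發入門與實踐》. (I checked this book myself; I cannot speak for other books.)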


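True, this usually works, and in most cases the parsed result is correct. But every so often it goes wrong, and when it does you are left scratching your head: what happened? The data is fine, and the parsing code looks fine too... very strange. In fact, the problem lies entirely in the code above!!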

Everyone assumes the SAX parsing process goes roughly like this:

startDocument  ->   startElement  -> characters -> endElement -> endDocument

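Yes, that is the sequence: startElement handles the opening tag, endElement handles the closing tag, and characters? It reads the value, of course. That much is right, but everyone naively assumes that characters is called exactly once and delivers the entire content in that one call. That is where it goes wrong!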

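In fact, characters may well be called several times for a single element. The SAX API allows a parser to deliver contiguous character data either as one chunk or split into several, and content containing carriage returns, \t and the like makes a split quite likely. Some people may ask: if my content has none of those, will it be called only once? Here is what I measured in my own test: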

 


The test XML is as follows:

<News><Article><ArticleID>1000555</ArticleID><Title><![CDATA[ 鄭州“亞洲第一橋”通車6年成危橋 ]]></Title><Date>2011-11-25 14:23:52</Date><SmallPictures>livenews/images/s20.png</SmallPictures><LargePictures>livenews/images/l20.png</LargePictures><Category>聞天下</Category><HeadNote></HeadNote><SubTitle></SubTitle><Author></Author><Source>人民日報</Source><Abstract></Abstract></Article><Article><ArticleID>1000554</ArticleID><Title><![CDATA[ 內地事業單位擬設統一工資制度 ]]></Title><Date>2011-11-25 14:22:33</Date><Category><![CDATA[ 聞天下 ]]></Category><HeadNote></HeadNote><SubTitle></SubTitle><Author></Author><Source></Source><Abstract></Abstract></Article><Article><ArticleID>1000553</ArticleID><Title></Title><Date>2011-11-25 14:21:23</Date><SmallPictures>livenews/images/s21.png</SmallPictures><LargePictures>livenews/images/l21.png</LargePictures><Category><![CDATA[ 星娛樂 ]]></Category><HeadNote></HeadNote><SubTitle></SubTitle><Author></Author><Source><![CDATA[ 鳳凰網綜合 ]]></Source><Abstract></Abstract></Article></News>

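In my test run, while parsing <ArticleID>1000553</ArticleID>, characters was called twice, delivering the content "1000553" in two pieces. With the approach above, each call overwrites the previous value, so the final result is ArticleID = 00553. If your app then uses this id to fetch further content (here, to fetch the full news article by id), you are out of luck.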
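The truncation can be reproduced without a real parser by calling characters by hand. The following is a minimal sketch, not the parser's actual behavior: the class names ChunkedCharactersDemo, OverwritingHandler and AccumulatingHandler are hypothetical, and the split point ("10" followed by "00553") is an assumption chosen to match the result reported above.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: feeds hand-made chunks to two handlers to compare them.
class ChunkedCharactersDemo {

    /** Keeps only the most recent chunk, like the flawed handler. */
    static class OverwritingHandler extends DefaultHandler {
        String value;

        @Override
        public void characters(char[] ch, int start, int length) {
            value = new String(ch, start, length);
        }
    }

    /** Appends every chunk, so no part of the text is lost. */
    static class AccumulatingHandler extends DefaultHandler {
        final StringBuilder sb = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            sb.append(ch, start, length);
        }
    }

    /** Simulates a parser delivering "1000553" as "10" and then "00553". */
    static void feed(DefaultHandler handler) {
        char[] buffer = "1000553".toCharArray();
        try {
            handler.characters(buffer, 0, 2); // first chunk: "10"
            handler.characters(buffer, 2, 5); // second chunk: "00553"
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    static String overwriteResult() {
        OverwritingHandler handler = new OverwritingHandler();
        feed(handler);
        return handler.value;
    }

    static String accumulateResult() {
        AccumulatingHandler handler = new AccumulatingHandler();
        feed(handler);
        return handler.sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("overwrite:  " + overwriteResult());
        System.out.println("accumulate: " + accumulateResult());
    }
}
```

The overwriting handler ends up holding only the last chunk, while the accumulating one holds the full text regardless of how the input is split.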

Enough talk; here is the correct way to write it!  [The parsing approach below is the CORRECT one]

import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import android.util.Log;

public class XmlHandler extends DefaultHandler {

    private final String TAG = this.getClass().getSimpleName();

    /** Tag names defined in the XML file */
    private final String TAG_Article = "Article";
    private final String TAG_ArticleID = "ArticleID";
    private final String TAG_Title = "Title";
    private final String TAG_Date = "Date";
    private final String TAG_SmallPictures = "SmallPictures";
    private final String TAG_LargePictures = "LargePictures";
    private final String TAG_Category = "Category";
    private static final String TAG_HeadNote = "HeadNote";
    private static final String TAG_SubTitle = "SubTitle";
    private static final String TAG_Source = "Source";

    // A single article
    private News news = null;
    // List of articles
    private List<News> newsList = null;
    // Parse start time
    private long start_time;
    // (1)
    private StringBuilder sb = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        super.characters(ch, start, length);
        // (2) However many times characters is called between startElement
        // and endElement, every chunk is appended to the StringBuilder,
        // so no content is lost
        sb.append(ch, start, length);
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        start_time = System.currentTimeMillis();
        newsList = new ArrayList<News>();
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        super.startElement(uri, localName, qName, attributes);
        // (3) Before collecting data for a new tag, clear the old data
        sb.setLength(0);
        if (TAG_Article.equals(localName)) {
            news = new News();
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        super.endElement(uri, localName, qName);
        // (4) The value used to be read in characters; it is now read here
        String value = sb.toString();
        if (TAG_ArticleID.equals(localName)) {
            news.setArticleId(value);
        } else if (TAG_Title.equals(localName)) {
            news.setTitle(value);
        } else if (TAG_Date.equals(localName)) {
            news.setDate(value);
        } else if (TAG_Category.equals(localName)) {
            news.setCategory(value);
        } else if (TAG_SmallPictures.equals(localName)) {
            news.setSmallPicture(value);
        } else if (TAG_LargePictures.equals(localName)) {
            news.setLargePicture(value);
        } else if (TAG_HeadNote.equals(localName)) {
            news.setHeadNote(value);
        } else if (TAG_SubTitle.equals(localName)) {
            news.setSubTitle(value);
        } else if (TAG_Source.equals(localName)) {
            news.setSource(value);
        }
        if (TAG_Article.equals(localName)) {
            newsList.add(news);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        long end = System.currentTimeMillis();
        Log.d(TAG, "Parse List's Xml Cost: " + (end - start_time) + " !!");
    }
}

To sum up in three points:

1. In startElement: new StringBuilder(); or sb.setLength(0); (I recommend the latter)
2. In characters: sb.append(ch, start, length);
3. In endElement: sb.toString(); only at this point does the StringBuilder hold the complete parsed value
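The three points can be tried end to end on a desktop JVM with the standard javax.xml.parsers API. This is only a sketch under stated assumptions: the class TitleCollector and its parseTitles helper are hypothetical names, the sample input in main is made up, and the trim() call is an addition of mine to drop the padding inside the CDATA sections. It matches on qName because a default SAXParserFactory is not namespace-aware and may leave localName empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <Title> element.
class TitleCollector extends DefaultHandler {

    private final StringBuilder sb = new StringBuilder();
    final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) {
        // Point 1: reset the buffer when a new element starts
        sb.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Point 2: append every chunk, however many calls arrive
        sb.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Point 3: the buffer is complete only when the element ends
        if ("Title".equals(qName)) {
            titles.add(sb.toString().trim()); // trim() is an assumption of this sketch
        }
    }

    /** Parses the given XML string and returns the collected titles. */
    static List<String> parseTitles(String xml) {
        try {
            TitleCollector handler = new TitleCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<News><Article><Title><![CDATA[ A & B ]]></Title></Article>"
                + "<Article><Title>C &amp; D</Title></Article></News>";
        System.out.println(parseTitles(xml));
    }
}
```

Entity references such as &amp; are a typical place where a parser may split the text into several characters calls, which is why the second title makes a useful check. Because the handler accumulates, the result does not depend on whether or where the parser splits.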


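With this approach, data no longer goes mysteriously missing. You also no longer need a currentTag-style field as in the flawed approach, which only complicates the logic and still gets it wrong!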


I hope everyone sees this article sooner rather than later and stops falling into this trap!!!

