Before getting to the bug itself, take a look at the code below. [◆ The following parsing approach is WRONG ×]
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import android.util.Log;

public class XmlHandler extends DefaultHandler {
    private final String TAG = this.getClass().getSimpleName();

    /** Tag names defined in the XML file */
    private final String TAG_Article = "Article";
    private final String TAG_ArticleID = "ArticleID";
    private final String TAG_Title = "Title";
    private final String TAG_Date = "Date";
    private final String TAG_SmallPictures = "SmallPictures";
    private final String TAG_LargePictures = "LargePictures";
    private final String TAG_Category = "Category";
    private static final String TAG_HeadNote = "HeadNote";
    private static final String TAG_SubTitle = "SubTitle";
    private static final String TAG_Source = "Source";

    // Tag currently being parsed
    private String currentName;
    // A single article
    private News news = null;
    // List of articles
    private List<News> newsList = null;
    // Time at which parsing started
    private long start_time;
    private boolean flag = false;

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        super.characters(ch, start, length);
        if (!flag) {
            return;
        }
        // Read the value
        String value = new String(ch, start, length);
        Log.d(TAG, "Element: " + currentName + " Element Value: " + value);
        if (value != null) {
            if (TAG_ArticleID.equals(currentName)) {
                news.setArticleId(value);
            } else if (TAG_Title.equals(currentName)) {
                news.setTitle(value);
            } else if (TAG_Date.equals(currentName)) {
                news.setDate(value);
            } else if (TAG_Category.equals(currentName)) {
                news.setCategory(value);
            } else if (TAG_SmallPictures.equals(currentName)) {
                news.setSmallPicture(value);
            } else if (TAG_LargePictures.equals(currentName)) {
                news.setLargePicture(value);
            } else if (TAG_HeadNote.equals(currentName)) {
                news.setHeadNote(value);
            } else if (TAG_SubTitle.equals(currentName)) {
                news.setSubTitle(value);
            } else if (TAG_Source.equals(currentName)) {
                news.setSource(value);
            }
        }
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        start_time = System.currentTimeMillis();
        newsList = new ArrayList<News>();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        super.startElement(uri, localName, qName, attributes);
        this.currentName = localName;
        flag = true;
        if (TAG_Article.equals(localName)) {
            news = new News();
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
        flag = false;
        if (TAG_Article.equals(localName)) {
            newsList.add(news);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        long end = System.currentTimeMillis();
        Log.d(TAG, "Parse List's Xml Cost: " + (end - start_time) + " !!");
    }
}
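Search Baidu or Google for "Android Sax 解析" (Android SAX parsing) and practically every sample you find is written this way. What a trap... Even some books do it like this, for example 《Android開發入門與實踐》. (I checked that book myself; I can't speak for others.)
True, in ordinary cases this code works, and most of the time the parsed result is correct. But every now and then it goes wrong, and when it does you are usually left scratching your head. What happened? The data is fine, and the parsing code looks fine too... very strange. In fact, the problem lies entirely in the code above!!
Everyone assumes that SAX parsing goes roughly like this: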
startDocument -> startElement -> characters -> endElement -> endDocument
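Right, that is how it goes. startElement handles the opening tag, endElement handles the closing tag, and characters? It reads the value, of course. All of that is correct, but everyone naively assumes that characters runs exactly once and delivers the whole content in that one call. That is exactly where it goes wrong!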
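To check how the callbacks actually arrive, a small tracing handler can help. The sketch below is not part of the original code: `CallbackTrace` and its `trace` method are hypothetical names, it runs on a plain JVM instead of Android, and it uses `qName` because the default JAXP factory is not namespace-aware. How many `characters` calls you see depends on the parser, so no particular output is claimed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: records every SAX callback in order of arrival. */
class CallbackTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    // All character data seen so far, joined back together.
    final StringBuilder text = new StringBuilder();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        events.add("startElement(" + qName + ")");
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("endElement(" + qName + ")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Each call may carry only part of the element's text.
        String chunk = new String(ch, start, length);
        events.add("characters(\"" + chunk + "\")");
        text.append(chunk);
    }

    /** Parses the given XML string and returns the filled-in trace. */
    static CallbackTrace trace(String xml) throws Exception {
        CallbackTrace handler = new CallbackTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        CallbackTrace t = trace("<Title>Tom &amp; Jerry</Title>");
        for (String event : t.events) {
            System.out.println(event);
        }
    }
}
```

If more than one `characters(...)` line shows up between the start and end of `Title`, the parser has split the text; only the joined `text` is guaranteed to hold the full value.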
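In fact, characters may very well be called several times for a single element. The SAX specification explicitly allows a parser to deliver contiguous character data either in one chunk or split across several chunks; in practice it tends to happen when the content contains line breaks, \t and the like. Some people will ask: if my data has none of those, will it be called only once? Here is what I actually measured:
The test XML: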
<News>
    <Article>
        <ArticleID>1000555</ArticleID>
        <Title><![CDATA[ 鄭州“亞洲第一橋”通車6年成危橋 ]]></Title>
        <Date>2011-11-25 14:23:52</Date>
        <SmallPictures>livenews/images/s20.png</SmallPictures>
        <LargePictures>livenews/images/l20.png</LargePictures>
        <Category>聞天下</Category>
        <HeadNote></HeadNote>
        <SubTitle></SubTitle>
        <Author></Author>
        <Source>人民日報</Source>
        <Abstract></Abstract>
    </Article>
    <Article>
        <ArticleID>1000554</ArticleID>
        <Title><![CDATA[ 內地事業單位擬設統一工資制度 ]]></Title>
        <Date>2011-11-25 14:22:33</Date>
        <Category><![CDATA[ 聞天下 ]]></Category>
        <HeadNote></HeadNote>
        <SubTitle></SubTitle>
        <Author></Author>
        <Source></Source>
        <Abstract></Abstract>
    </Article>
    <Article>
        <ArticleID>1000553</ArticleID>
        <Title></Title>
        <Date>2011-11-25 14:21:23</Date>
        <SmallPictures>livenews/images/s21.png</SmallPictures>
        <LargePictures>livenews/images/l21.png</LargePictures>
        <Category><![CDATA[ 星娛樂 ]]></Category>
        <HeadNote></HeadNote>
        <SubTitle></SubTitle>
        <Author></Author>
        <Source><![CDATA[ 鳳凰網綜合 ]]></Source>
        <Abstract></Abstract>
    </Article>
</News>
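The result was unmistakable: while parsing <ArticleID>1000553</ArticleID>, characters was called twice, and the content "1000553" was delivered in two pieces. Because the code above stores the value on every call, the second piece overwrites the first and you end up with ArticleID = 00553. If your app then uses that id to fetch more content (here, the full news article), you are done for.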
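The overwrite can be reproduced without a real parser by calling the handler's callbacks by hand. The following is a minimal sketch, not the author's handler: `OverwritingHandler` and `parseInChunks` are hypothetical names, the class keeps only the ArticleID branch of the faulty logic, and the split point ("10" then "00553") is inferred from the reported result, not taken from a log.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Minimal stand-in for the faulty pattern: every characters() call overwrites the value. */
class OverwritingHandler extends DefaultHandler {
    String currentName;
    String articleId;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        currentName = localName;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if ("ArticleID".equals(currentName)) {
            // Stores only the current chunk; any earlier chunk is lost.
            articleId = new String(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        currentName = null;
    }

    /** Feeds the text of one ArticleID element in the given chunks, as a parser might. */
    static String parseInChunks(String... chunks) {
        OverwritingHandler handler = new OverwritingHandler();
        handler.startElement("", "ArticleID", "ArticleID", null);
        for (String chunk : chunks) {
            char[] buffer = chunk.toCharArray();
            handler.characters(buffer, 0, buffer.length);
        }
        handler.endElement("", "ArticleID", "ArticleID");
        return handler.articleId;
    }

    public static void main(String[] args) {
        System.out.println(parseInChunks("1000553"));     // prints 1000553
        System.out.println(parseInChunks("10", "00553")); // prints 00553
    }
}
```

Delivered in one chunk the id survives; delivered in two, only the last chunk is kept, which matches the truncated id described above.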
All right, enough talk. Here is the correct way to write it! [★ The following parsing approach is the CORRECT one √]
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import android.util.Log;

public class XmlHandler extends DefaultHandler {
    private final String TAG = this.getClass().getSimpleName();

    /** Tag names defined in the XML file */
    private final String TAG_Article = "Article";
    private final String TAG_ArticleID = "ArticleID";
    private final String TAG_Title = "Title";
    private final String TAG_Date = "Date";
    private final String TAG_SmallPictures = "SmallPictures";
    private final String TAG_LargePictures = "LargePictures";
    private final String TAG_Category = "Category";
    private static final String TAG_HeadNote = "HeadNote";
    private static final String TAG_SubTitle = "SubTitle";
    private static final String TAG_Source = "Source";

    // A single article
    private News news = null;
    // List of articles
    private List<News> newsList = null;
    // Time at which parsing started
    private long start_time;
    // (1)
    private StringBuilder sb = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        super.characters(ch, start, length);
        // (2) No matter how many times characters() runs between startElement
        // and endElement, every chunk is appended to the StringBuilder, so
        // nothing is lost
        sb.append(ch, start, length);
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();
        start_time = System.currentTimeMillis();
        newsList = new ArrayList<News>();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        super.startElement(uri, localName, qName, attributes);
        // (3) Before collecting data for a new tag, clear the old data
        sb.setLength(0);
        if (TAG_Article.equals(localName)) {
            news = new News();
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
        // (4) The value used to be read in characters(); now it is read here
        String value = sb.toString();
        if (TAG_ArticleID.equals(localName)) {
            news.setArticleId(value);
        } else if (TAG_Title.equals(localName)) {
            news.setTitle(value);
        } else if (TAG_Date.equals(localName)) {
            news.setDate(value);
        } else if (TAG_Category.equals(localName)) {
            news.setCategory(value);
        } else if (TAG_SmallPictures.equals(localName)) {
            news.setSmallPicture(value);
        } else if (TAG_LargePictures.equals(localName)) {
            news.setLargePicture(value);
        } else if (TAG_HeadNote.equals(localName)) {
            news.setHeadNote(value);
        } else if (TAG_SubTitle.equals(localName)) {
            news.setSubTitle(value);
        } else if (TAG_Source.equals(localName)) {
            news.setSource(value);
        }
        if (TAG_Article.equals(localName)) {
            newsList.add(news);
        }
    }

    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
        long end = System.currentTimeMillis();
        Log.d(TAG, "Parse List's Xml Cost: " + (end - start_time) + " !!");
    }
}
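It boils down to three points:
1. In startElement: new StringBuilder(); or sb.setLength(0); (I recommend the latter)
2. In characters: sb.append(ch, start, length);
3. In endElement: sb.toString(); only at this point does the StringBuilder hold the complete parsed value
With this approach, data no longer goes mysteriously missing. You also no longer need an extra field like currentName, as the faulty approach did; that only made the logic more convoluted and still got it wrong!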
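The three points can also be tried outside Android. The sketch below assumes a plain JVM; `ElementTextCollector` and `collect` are hypothetical names, and it keys values by `qName` because the default JAXP factory is not namespace-aware (the handler above uses `localName`). Unlike the handler above, it also clears the buffer after reading, so a parent element does not inherit its last child's text.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: maps each element name to its accumulated text. */
class ElementTextCollector extends DefaultHandler {
    final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder sb = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // Point 1: start every element with an empty buffer.
        sb.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Point 2: append each chunk, however many times this is called.
        sb.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Point 3: the buffer is complete only when the element ends.
        values.put(qName, sb.toString());
        sb.setLength(0);
    }

    /** Parses the given XML string and returns element name to text. */
    static Map<String, String> collect(String xml) throws Exception {
        ElementTextCollector handler = new ElementTextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Article><ArticleID>1000553</ArticleID>"
                + "<Title>Tom &amp; Jerry</Title></Article>";
        System.out.println(collect(xml));
    }
}
```

Feeding the callbacks by hand with "10" and then "00553" yields "1000553" here, because the chunks are joined before the value is read.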
I hope everyone sees this article as early as possible and stops falling into this trap!!!