Android XML Parser

Source: Internet
Author: User
Tags: xml parser

Recently I ran into an XML parsing problem in a project. We used the DOM parser that ships with Android to parse XML, and found that on SDK 2.3 escaped characters such as &lt;, &gt; and &amp; are not parsed correctly.

The data returned by the server should not contain such characters in the first place, but for historical reasons the server side sometimes cannot be changed, so the problem has to be solved on the client. The rest of this article describes how we solved it.

1. Symptom
Our parsing code is:

[Java]
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(in);
Element root = document.getDocumentElement();

Here the in passed to builder.parse(in) is an InputStream. For example, take the following XML:


[Html]
<?xml version="1.0"?>
<data>
  <success>1</success>
  <error>
    <code></code>
    <message></message>
  </error>
  <result>
    <history_info_list>
      <row>
        <purchase_info_id>dnrxmauxecj3z6e4</purchase_info_id>
        <title_id>134051</title_id>
        <title>day of every week! &lt;Retrouvailles&gt;</title>
        <volume_number>001</volume_number>
        <author_name>the specified field has been modified</author_name>
        <contents_name>Please wait until the day of the month expires! &lt;Retrouvailles&gt; 1 character</contents_name>
        <date_open>2011-12-02</date_open>
        <purchase_date>18:39:48</purchase_date>
        <image_url>/resources/c_media/images/thumb/262/134051_01_0000l.jpg</image_url>
        <contents>
          <story_number>1</story_number>
          <contents_id>bt1_13405100100101500014</contents_id>
          <file_size>34168162</file_size>
          <Within_Wifi>0</Within_Wifi>
        </contents>
        <text_to_speech_flg>0</text_to_speech_flg>
        <restrict_num>-1</restrict_num>
        <issue>3</issue>
        <subpartition>0</subpartition>
        <adult_flg>0</adult_flg>
      </row>
    </history_info_list>
  </result>
</data>
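
For comparison, the same parsing steps can be tried on a desktop JVM. The following is a minimal sketch, not the project's code: the class name ParseDemo, the helper textOf and the shortened sample document are made up for illustration, and it runs against the JDK's bundled parser rather than the one in Android 2.3.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: parses a small escaped document on a desktop JVM.
class ParseDemo {

    // Parses the XML string and returns the text of the first element with the given tag name.
    static String textOf(String xml, String tagName) {
        try {
            InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(in);
            Element root = document.getDocumentElement();
            // getTextContent() concatenates every text node below the element.
            return root.getElementsByTagName(tagName).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, made-up sample with the same kind of escaped text as above.
        String xml = "<?xml version=\"1.0\"?>"
                + "<data><title>Sample &lt;Retrouvailles&gt; 1</title></data>";
        System.out.println(textOf(xml, "title"));
    }
}
```

Reading the text through getTextContent() does not depend on how many text nodes the parser created, which makes the sketch usable as a reference value when checking a device.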


The title and contents_name nodes contain < and > in their text. Both characters are escaped in the XML, so the text should parse normally. On SDK 2.3 (presumably on every version below 3.0), however, the parser handles these escaped characters specially. Taking contents_name as the example, it treats the text as four text nodes with the following contents:

1. Please wait until the day of the month expires!

2. <

3. Retrouvailles

4. > 1 character

This is incorrect. The text should be a single node whose content is [Please wait until the day of the month expires! <Retrouvailles> 1 character]. The problem was fixed in SDK 3.0.
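
To see why the split matters to calling code, the situation can be imitated by hand. The sketch below is only an illustration: it does not run the Android 2.3 parser, it builds the four text nodes manually, and the class name SplitTextDemo and the sample strings are invented.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; it imitates the split by hand and does not use the Android parser.
class SplitTextDemo {

    // Builds a <contents_name> element whose text is spread over four text nodes.
    static Element buildSplitElement() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element element = doc.createElement("contents_name");
        for (String part : new String[] {"Sample title ", "<", "Retrouvailles", "> 1"}) {
            element.appendChild(doc.createTextNode(part));
        }
        return element;
    }

    public static void main(String[] args) {
        Element element = buildSplitElement();
        // Code that reads only the first child sees just the first fragment.
        System.out.println(element.getFirstChild().getNodeValue());
        System.out.println(element.getChildNodes().getLength());
    }
}
```

Any code that assumes an element holds exactly one text child, for example by calling getFirstChild().getNodeValue(), silently loses everything after the first fragment.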

 


2. Cause of the problem
That is the symptom. Now for the cause and the solution.

Looking at the Android source code, we found the following:

Android uses Apache Harmony code for XML parsing; as far as I can tell, the XML parser on Android's Dalvik is the one from Apache Harmony.

Harmony's XML parsing is in turn built on KXML. Android, it seems, is assembled from a lot of open-source code.

 

[Java]
// Line 113:
XmlPullParser parser = new KXmlParser();

// TEXT branch:
else if (token == XmlPullParser.TEXT)
    node.appendChild(document.createTextNode(parser.getText()));

// Line 277:
else if (token == XmlPullParser.ENTITY_REF) {
    String entity = parser.getName();
    if (entityResolver != null) {
        // TODO Implement this...
    }
    String replacement = resolveStandardEntity(entity);
    if (replacement != null) {
        node.appendChild(document.createTextNode(replacement));
    } else {
        node.appendChild(document.createEntityReference(entity));
    }
}

As the excerpt shows, when the parser handles text containing &lt;, &gt; or &amp;, each entity reference is appended as its own text node, so the text ends up split across several text nodes.
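
The effect of those two branches can be modelled without the Android classes. The following is a simplified sketch and not the Android source: the Token and Kind types, the class name TokenLoopModel and this version of resolveStandardEntity are stand-ins written for the example, and the token list is supplied by hand instead of coming from KXmlParser.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Simplified, hypothetical model of the quoted branches; it is not the Android source.
class TokenLoopModel {
    enum Kind { TEXT, ENTITY_REF }

    // One parser event: its kind plus the text (TEXT) or the entity name (ENTITY_REF).
    static final class Token {
        final Kind kind;
        final String value;
        Token(Kind kind, String value) { this.kind = kind; this.value = value; }
    }

    // Stand-in for the real helper: maps the predefined XML entity names to characters.
    static String resolveStandardEntity(String name) {
        switch (name) {
            case "lt": return "<";
            case "gt": return ">";
            case "amp": return "&";
            case "apos": return "'";
            case "quot": return "\"";
            default: return null;
        }
    }

    // Appends one DOM node per token, as the quoted branches do.
    static Element build(List<Token> tokens) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element node = document.createElement("contents_name");
        for (Token token : tokens) {
            if (token.kind == Kind.TEXT) {
                node.appendChild(document.createTextNode(token.value));
            } else {
                String replacement = resolveStandardEntity(token.value);
                if (replacement != null) {
                    node.appendChild(document.createTextNode(replacement));
                } else {
                    node.appendChild(document.createEntityReference(token.value));
                }
            }
        }
        return node;
    }

    public static void main(String[] args) {
        // Hand-written token stream for the text "A &lt;B&gt; C".
        Element node = build(Arrays.asList(
                new Token(Kind.TEXT, "A "), new Token(Kind.ENTITY_REF, "lt"),
                new Token(Kind.TEXT, "B"), new Token(Kind.ENTITY_REF, "gt"),
                new Token(Kind.TEXT, " C")));
        System.out.println(node.getChildNodes().getLength() + " child nodes");
    }
}
```

Because every token results in its own appendChild call, the number of text nodes follows the number of tokens rather than the logical text of the element.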

 

3. Solution
Now that we know the cause, how do we solve it?

1. When reading a node's value, check whether its child nodes are text nodes, and if so concatenate the text of all of them.

2. Change the processing shown above, that is, the node.appendChild call: when the first child of the node is a text node, append the current characters to it instead of adding a new text node.
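
The second option was not used in the project, so the following is only a sketch of what it could look like. The class name MergingAppender and the helper appendText are hypothetical, and the sketch checks the last child rather than the first, on the assumption that new text should be joined to the text node immediately before it.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;

// Hypothetical helper illustrating option 2; it is not part of the Android parser.
class MergingAppender {

    // Appends text to parent, reusing the last child when it is already a text node.
    static void appendText(Node parent, String text) {
        Node last = parent.getLastChild();
        if (last != null && last.getNodeType() == Node.TEXT_NODE) {
            ((Text) last).appendData(text);
        } else {
            parent.appendChild(parent.getOwnerDocument().createTextNode(text));
        }
    }

    // Builds an element from the given fragments using appendText.
    static Element buildMerged(String... parts) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element element = doc.createElement("contents_name");
        for (String part : parts) {
            appendText(element, part);
        }
        return element;
    }

    public static void main(String[] args) {
        Element element = buildMerged("Sample title ", "<", "Retrouvailles", "> 1");
        System.out.println(element.getChildNodes().getLength());
        System.out.println(element.getFirstChild().getNodeValue());
    }
}
```

Applying this inside the parser would mean shipping a patched parser with the application, which is one reason the simpler first option is attractive.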

Our project uses the first method because it is simple. The implementation is as follows:

 

[Java]
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// The enclosing class declaration is missing from the original listing;
// XmlNodeUtils is a placeholder name.
public final class XmlNodeUtils
{
    /**
     * Indicates whether all child nodes of the specified node are text nodes.
     *
     * @param node The specified node.
     *
     * @return true if all child nodes are text nodes, otherwise false.
     */
    public static boolean areAllSubNodesTextType(Node node)
    {
        if (null != node)
        {
            NodeList list = node.getChildNodes();
            int nodeCount = list.getLength();
            for (int i = 0; i < nodeCount; ++i)
            {
                short nodeType = list.item(i).getNodeType();
                if (Node.TEXT_NODE != nodeType)
                {
                    return false;
                }
            }
        }

        return true;
    }

    /**
     * Gets the node value by concatenating the text of all child text nodes
     * into a single string. Child nodes of other types are skipped.
     *
     * @param node The specified node.
     *
     * @return The value, or an empty string if the node is null.
     */
    private static String getNodeValue(Node node)
    {
        if (null == node)
        {
            return "";
        }

        StringBuilder sb = new StringBuilder();

        NodeList list = node.getChildNodes();
        int nodeCount = list.getLength();
        for (int i = 0; i < nodeCount; ++i)
        {
            short nodeType = list.item(i).getNodeType();
            if (Node.TEXT_NODE == nodeType)
            {
                sb.append(list.item(i).getNodeValue());
            }
        }

        return sb.toString();
    }
}
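
The standard DOM API also has two calls that work on the same idea: Node.getTextContent() returns the concatenated text below a node, and Node.normalize() merges adjacent text nodes in place. The sketch below exercises both on a hand-built element on a desktop JVM. Whether they are available and behave the same way on Android 2.3 has not been checked here, and the class name NormalizeDemo and the sample strings are invented.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class comparing two standard DOM calls on a split element.
class NormalizeDemo {

    // Builds an element whose text is spread over four text nodes.
    static Element buildSplitElement() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element element = doc.createElement("contents_name");
        for (String part : new String[] {"Sample ", "<", "Retrouvailles", "> 1"}) {
            element.appendChild(doc.createTextNode(part));
        }
        return element;
    }

    // Returns the concatenated text below the element without modifying the tree.
    static String wholeText(Element element) {
        return element.getTextContent();
    }

    // Merges adjacent text nodes in place and returns how many children remain.
    static int normalizeAndCount(Element element) {
        element.normalize();
        return element.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        Element element = buildSplitElement();
        System.out.println(wholeText(element));
        System.out.println(normalizeAndCount(element));
    }
}
```

Unlike the helper above, getTextContent() also includes the text of nested child elements, so the two are only interchangeable for elements that contain nothing but text.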

