XML parsing techniques in Android development

Last Update:2017-01-13 Source: Internet

Author: User


In Android, the common XML parsers are the DOM parser, the SAX parser, and the pull parser, each of which I'll explain in more detail.

The first way: DOM parser:

The DOM represents an XML document as a tree of nodes, which lets developers traverse the tree and retrieve the data they need through the DOM API. The parser usually has to load the entire document and build the tree before node information can be read or updated. Android fully supports DOM parsing. Through DOM objects you can read, search, modify, add to, and delete from an XML document.

How the DOM works: when you process an XML file with the DOM, the file is first parsed and split into separate elements, attributes, and comments. The file is then represented in memory as a node tree. You access the contents of the document through that tree and modify the document as needed.

A DOM implementation first defines a set of interfaces for working with a parsed XML document. The parser reads the entire document and builds a tree structure that resides in memory, so that code can manipulate the whole tree through the DOM interfaces.

Because the DOM tree is held in memory, retrieval and updates are efficient. For very large documents, however, parsing and loading the whole document is resource intensive. DOM is a reasonable choice when the XML file is relatively small.
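To make the in-memory tree concrete, here is a minimal sketch using the standard DOM API (javax.xml.parsers and org.w3c.dom), run on a plain JVM rather than on a device. The class name DomEditSketch, its method names, and the tiny rivers document are made up for illustration, and the sketch assumes the root element has at least two river children.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the original article.
class DomEditSketch {

    /** Parses the whole XML string into an in-memory DOM tree. */
    static Document load(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Modifies, deletes, and adds nodes on the tree (assumes at least two river children). */
    static Document edit(Document doc) {
        Element root = doc.getDocumentElement();
        NodeList rivers = root.getElementsByTagName("river");
        Element first = (Element) rivers.item(0);
        Element second = (Element) rivers.item(1);

        first.setAttribute("length", "10");          // modify an existing node
        root.removeChild(second);                    // delete a node
        Element added = doc.createElement("river");  // create a new node
        added.setAttribute("name", "C");
        root.appendChild(added);                     // add it to the tree
        return doc;
    }
}
```

Calling DomEditSketch.edit(DomEditSketch.load(xml)) on a document with rivers A and B leaves a tree holding A (with its length changed) and C. Nothing is written back to a file; all changes happen on the tree in memory.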

Common DOM interfaces and classes:

Document: This interface defines a series of methods for parsing and creating a DOM document. It is the root of the document tree and the basis for manipulating the DOM.

Element: This interface extends the Node interface and provides methods to get and modify XML element names and attributes.

Node: This interface provides methods for processing nodes and child nodes and obtaining their values.

NodeList: Provides methods for obtaining the number of nodes and the node at a given position, which lets you iterate over each node.

DOMParser: This class is the DOM parser class in Apache Xerces, which can parse XML files directly.
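As a rough illustration of how Document, Element, Node, and NodeList fit together, the sketch below walks a tree recursively and records the path of every element. It uses only the standard DOM API; the class name DomWalkSketch and the method elementPaths are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of the original article.
class DomWalkSketch {

    /** Returns the path of every element in document order, e.g. "/rivers/river". */
    static List<String> elementPaths(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            walk(doc.getDocumentElement(), "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    private static void walk(Element element, String parentPath, List<String> out) {
        String path = parentPath + "/" + element.getTagName();
        out.add(path);
        // NodeList exposes the child count and the child at each index.
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Children can also be text or comment nodes; only descend into elements.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, path, out);
            }
        }
    }
}
```

The node-type check matters because whitespace and text between tags appear in the NodeList as text nodes, and casting those to Element would fail.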

The following is the parsing process for the DOM:

The second way: SAX parser:

The SAX (Simple API for XML) parser is event based. It performs event-driven streaming parsing from the beginning of the document to the end, without pausing or going back. Its core is the event-handling pattern, built around an event source and event handlers. When the event source generates an event, it calls the corresponding method on the event handler so the event can be processed. When it calls that method, the event source also passes state information about the event, so the handler can decide what to do based on it.

The advantages of the SAX parser are fast parsing and low memory use, which makes it well suited to Android mobile devices.

How SAX works: SAX scans the document sequentially and notifies the event handler when the document starts and ends, when an element starts and ends, and so on. The handler acts on each notification, and scanning continues until the end of the document.
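The notification order can be observed with a small sketch that records each callback as the parser scans the input. It uses the standard javax.xml.parsers and org.xml.sax classes; the class name SaxTraceSketch and the method trace are hypothetical names for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of the original article.
class SaxTraceSketch {

    /** Returns the document and element callbacks in the order the parser fired them. */
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return events;
    }
}
```

For the input <rivers><river/></rivers>, the recorded order is startDocument, start:rivers, start:river, end:river, end:rivers, endDocument. The handler never sees the document as a whole, only one event at a time.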

In the SAX API, the event source is XMLReader in the org.xml.sax package, which parses the XML document through its parse() method and generates events. The event handlers are the four interfaces ContentHandler, DTDHandler, ErrorHandler, and EntityResolver, also in the org.xml.sax package. XMLReader is connected to these four interfaces through the corresponding handler registration methods, setXXX().
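A minimal sketch of this wiring, assuming a plain JVM: an XMLReader gets a ContentHandler and an ErrorHandler registered through setContentHandler and setErrorHandler before parse() is called. The class XmlReaderWiringSketch and its Report holder are hypothetical names, and DefaultHandler is used only as a convenient ContentHandler implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not part of the original article.
class XmlReaderWiringSketch {

    /** Collects what the two registered handlers were told. */
    static final class Report {
        final List<String> elements = new ArrayList<>();
        final List<String> problems = new ArrayList<>();
    }

    static Report parse(String xml) {
        Report report = new Report();

        ContentHandler content = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                report.elements.add(qName);
            }
        };

        ErrorHandler errors = new ErrorHandler() {
            @Override
            public void warning(SAXParseException e) {
                report.problems.add("warning: " + e.getMessage());
            }

            @Override
            public void error(SAXParseException e) {
                report.problems.add("error: " + e.getMessage());
            }

            @Override
            public void fatalError(SAXParseException e) throws SAXParseException {
                report.problems.add("fatal: " + e.getMessage());
                throw e; // stop parsing on malformed input
            }
        };

        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(content); // document events go here
            reader.setErrorHandler(errors);    // error events go here
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Usually already recorded by the ErrorHandler; record it here otherwise.
            if (report.problems.isEmpty()) {
                report.problems.add("failed: " + e);
            }
        }
        return report;
    }
}
```

With well-formed input the problems list stays empty. With malformed input, the elements seen before the error are still reported, which reflects the streaming nature of SAX.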

Common SAX interfaces and classes:

Attributes: Used to get the number, names, and values of attributes.

ContentHandler: Defines the events associated with the document itself (for example, start and end tags). Most applications register for these events.

DTDHandler: Defines the events associated with the DTD. It does not define enough events to report the DTD completely. If you need to parse the DTD, use the optional DeclHandler.

DeclHandler is an extension to SAX. Not all parsers support it.

EntityResolver: Defines the events associated with loading entities. Only a few applications register for these events.

ErrorHandler: Defines error events. Many applications register for these events so that they can report errors in their own way.

DefaultHandler: Provides default implementations of these interfaces. In most cases, it is easier for an application to extend DefaultHandler and override the relevant methods than to implement an interface directly.

Please refer to the table below:

We need XMLReader and DefaultHandler working together to parse XML.
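As a small sketch of that pairing, the class below extends DefaultHandler, overrides only startElement, and is registered on an XMLReader. It reads the name and length attributes through the Attributes interface. The class name RiverLengthHandler and the sample element names are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not part of the original article.
class RiverLengthHandler extends DefaultHandler {

    // River name -> length, in document order.
    private final Map<String, Integer> lengths = new LinkedHashMap<>();

    // Only the callback we care about is overridden; DefaultHandler ignores the rest.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("river".equals(qName)) {
            lengths.put(attributes.getValue("name"),
                    Integer.parseInt(attributes.getValue("length")));
        }
    }

    /** Connects the handler to an XMLReader and parses the given XML string. */
    static Map<String, Integer> parse(String xml) {
        RiverLengthHandler handler = new RiverLengthHandler();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.lengths;
    }
}
```

The sketch compares against qName because a SAXParserFactory is not namespace aware by default, in which case localName may be empty.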

The following is the parsing process for SAX:

The third way: Pull parser:

Android does not provide support for the Java StAX API. However, Android comes with a pull parser that works like StAX. It lets the application code pull events from the parser, whereas a SAX parser automatically pushes events into the handler.

The pull parser works much like SAX and is also event based. The difference is that pull parsing returns an integer event code, and our code must fetch each event and act on it, rather than having the parser trigger our handler code as in SAX.

Reading the XML declaration returns START_DOCUMENT.

Reading the end of the XML returns END_DOCUMENT.

Reading a start tag returns START_TAG.

Reading an end tag returns END_TAG.

Reading text returns the corresponding text event.
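Android's XmlPullParser is not part of the standard JDK, so the sketch below swaps in the JDK's StAX XMLStreamReader, which this article describes as working the same way. It only illustrates the pull loop: the code asks for the next event and switches on the integer event code. The StAX constant names (START_ELEMENT, CHARACTERS, and so on) differ from the XmlPullParser ones, and the class name PullLoopSketch is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; uses StAX as a stand-in for XmlPullParser.
class PullLoopSketch {

    /** Pulls events one by one and returns a readable name for each. */
    static List<String> events(String xml) {
        List<String> out = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Deliver adjacent character data as a single event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));

            int event = reader.getEventType(); // START_DOCUMENT before the first next()
            while (true) {
                switch (event) {
                    case XMLStreamConstants.START_DOCUMENT:
                        out.add("START_DOCUMENT");
                        break;
                    case XMLStreamConstants.START_ELEMENT:
                        out.add("START_ELEMENT " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        if (!reader.isWhiteSpace()) {
                            out.add("CHARACTERS " + reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        out.add("END_ELEMENT " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.END_DOCUMENT:
                        out.add("END_DOCUMENT");
                        break;
                    default:
                        break;
                }
                if (event == XMLStreamConstants.END_DOCUMENT) {
                    break;
                }
                event = reader.next(); // our code, not the parser, decides when to advance
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return out;
    }
}
```

The control flow is the point of the example: the loop belongs to the application, which is the opposite of SAX, where the parser owns the loop and calls the handler.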

The pull parser is compact, lightweight, fast, and easy to use, which makes it ideal for Android mobile devices. Android itself uses pull parsers internally to parse various XML files, and Android officially recommends that developers use pull parsing. Pull parsing is an open-source technology developed by a third party, and it can also be used in Java SE development.

How pull parsing works: XML pull reports an event when an element starts and when it ends. When an element starts, we can call parser.nextText() to extract the character data of that element. The END_DOCUMENT event is generated automatically when the parser reaches the end of the document.

Common XML pull interfaces and classes:

XmlPullParser: The XML pull parser is an interface that defines the parsing functionality provided by the XmlPull V1 API.

XmlSerializer: An interface that defines the serialization of an XML information set.

XmlPullParserFactory: This class is used to create XML pull parsers in the XmlPull V1 API.

XmlPullParserException: Thrown to signal an error related to the XML pull parser.

The pull parsing process is as follows:

[Additional] The fourth way: the android.util.Xml class

The Android API also provides the android.util.Xml class, which can parse XML files as well. It is used like SAX, in that you write a handler to process the XML, but it is simpler to use than SAX, as follows:

To parse XML with android.util.Xml:

MyHandler myHandler = new MyHandler();
android.util.Xml.parse(url.openConnection().getInputStream(), Xml.Encoding.UTF_8, myHandler);

Below is the reference document River.xml, placed in the assets directory:

<?xml version="1.0" encoding="utf-8"?>
<rivers>
  <river name="Lingqu" length="605">
    <introduction>
      Lingqu, in Xing'an County in the Guangxi Zhuang Autonomous Region, is one of the world's oldest canals and is known as "the pearl of the world's ancient water conservancy construction". In ancient times it was called the Qin Chisel Ditch, the Ling Canal, the Douhe, and the Xing'an Canal. It opened to navigation in 214 BC and, 2,217 years later, is still in use.
    </introduction>
    <imageurl>
      http://imgsrc.baidu.com/baike/pic/item/389aa8fdb7b8322e08244d3c.jpg
    </imageurl>
  </river>

  <river name="Jiaolai Canal" length="200">
    <introduction>
      The Jiaolai Canal runs from Lingshan Haikou on the Yellow Sea in the south to Sanshandao on the Bohai Sea in the north, flowing through present-day Jiaonan, Jiaozhou, Pingdu, Gaomi, Changyi, and Laizhou. It is 200 km long, has a watershed area of 5,400 square kilometers, and crosses the Shandong Peninsula from south to north, linking the two seas. The watershed east of Yao Jia Cun in Pingdu divides the canal into a southward and a northward flow. The southward flow, the South Jiaolai River, enters Jiaozhou Bay at the mouth of Ma Wan and is 30 km long. The northward flow, the North Jiaolai River, enters Laizhou Bay at Haicang Kou and is more than 100 km long.
    </introduction>
    <imageurl>
      http://imgsrc.baidu.com/baike/pic/item/389aa8fdb7b8322e08244d3c.jpg
    </imageurl>
  </river>

  <river name="North Jiangsu Main Irrigation Canal" length="168">
    <introduction>
      A large artificial river in the lower reaches of the Huai River in Jiangsu Province. It starts at Gaoliangjian on Hongze Lake in the west, passes through six counties and districts (Hongze, Qingpu, Huai'an, Funing, Sheyang, and Binhai), and enters the sea at Biandan Port in the east. It is 168 km long.
    </introduction>
    <imageurl>
      http://imgsrc.baidu.com/baike/pic/item/389aa8fdb7b8322e08244d3c.jpg
    </imageurl>
  </river>
</rivers>

The specific processing steps when using DOM parsing are:

1 First, create a DocumentBuilderFactory instance
2 Then use the DocumentBuilderFactory to create a DocumentBuilder
3 Then load the XML document (Document)
4 Then get the root node (Element) of the document
5 Then get the list of all child nodes under the root node (NodeList)
6 Then read the nodes you need from the list of child nodes

Looking at the nodes, we need an object to hold the data of each river, so we abstract a River class:

import java.io.Serializable;

public class River implements Serializable {

    private static final long serialVersionUID = 1L;

    private String name;
    private int length;
    private String introduction;
    private String imageUrl;

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getLength() {
        return length;
    }

    public void setLength(int length) {
        this.length = length;
    }

    public String getIntroduction() {
        return introduction;
    }

    public void setIntroduction(String introduction) {
        this.introduction = introduction;
    }

    public String getImageUrl() {
        return imageUrl;
    }

    public void setImageUrl(String imageUrl) {
        this.imageUrl = imageUrl;
    }
}

Here we start reading the XML document and adding each river to the list.

We use the River.xml file in assets, so we first need to read the XML file and get an input stream. This is done with inputStream = this.context.getResources().getAssets().open(fileName); where the parameter is the path of the XML file, relative to the assets directory, which is the default root.

You can then parse the input stream with the parse method of the DocumentBuilder object, which returns a Document object, and then traverse the nodes of the Document object and read their attributes.

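Get all the river data:

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// The members below belong to a helper class that keeps an android.content.Context in its context field.
// Assumed values: the original does not show these constants; they mirror the names used in River.xml.
private static final String RIVER = "river";
private static final String NAME = "name";
private static final String LENGTH = "length";
private static final String INTRODUCTION = "introduction";
private static final String IMAGEURL = "imageurl";

/**
 * @param fileName path to the XML document, relative to the assets directory
 */
public List<River> getRiversFromXml(String fileName) {
    List<River> rivers = new ArrayList<River>();
    InputStream inputStream = null;
    // First create the factory
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    try {
        // Locate the XML file and load the document
        DocumentBuilder builder = factory.newDocumentBuilder();
        inputStream = this.context.getResources().getAssets().open(fileName);
        Document document = builder.parse(inputStream);
        // Find the root element
        Element root = document.getDocumentElement();
        NodeList nodes = root.getElementsByTagName(RIVER);
        // Traverse all river child nodes of the root node rivers
        for (int i = 0; i < nodes.getLength(); i++) {
            River river = new River();
            // Get the river element node
            Element riverElement = (Element) nodes.item(i);
            // Get the values of the name and length attributes of river
            river.setName(riverElement.getAttribute(NAME));
            river.setLength(Integer.parseInt(riverElement.getAttribute(LENGTH)));
            // Get the introduction tag under river
            Element introduction = (Element) riverElement.getElementsByTagName(INTRODUCTION).item(0);
            river.setIntroduction(introduction.getFirstChild().getNodeValue());
            // Get the imageurl tag under river
            Element imageUrl = (Element) riverElement.getElementsByTagName(IMAGEURL).item(0);
            river.setImageUrl(imageUrl.getFirstChild().getNodeValue());
            rivers.add(river);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } finally {
        // The stream is null if opening the asset failed
        if (inputStream != null) {
            try {
                inputStream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return rivers;
}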

Each river is added to the list here, and we then use a ListView to display them, as shown in the figure:

The specific processing steps when using SAX parsing are:

1 Create a SAXParserFactory object

2 Get a SAXParser from the SAXParserFactory.newSAXParser() method

3 Get the event source object XMLReader from the SAXParser

4 Instantiate a DefaultHandler object

5 Connect the event source object XMLReader to the event-handling class DefaultHandler

6 Call the parse method of XMLReader to read the XML data from the input source

7 Return the data collection we need through the DefaultHandler

The code is as follows:

public List<River> parse(String xmlPath) {
    List<River> rivers = null;
    SAXParserFactory factory = SAXParserFactory.newInstance();
    try {
        SAXParser parser = factory.newSAXParser();
        // Get the event source
        XMLReader xmlReader = parser.getXMLReader();
        // Set up the handler
        RiverHandler handler = new RiverHandler();
        xmlReader.setContentHandler(handler);
        // Parse the XML document from the assets directory.
        // The original also shows a variant that appears to read from a URL instead:
        // xmlReader.parse(new InputSource(new URL(xmlPath).openStream()));
        xmlReader.parse(new InputSource(this.context.getAssets().open(xmlPath)));
        rivers = handler.getRivers();
    } catch (ParserConfigurationException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return rivers;
}

The focus is on how the DefaultHandler object processes each element node, its attributes, its text content, and the document as a whole.

As mentioned earlier, DefaultHandler is based on the event-handling model. The SAX parser calls back the startDocument method when it reaches the start of the document and the endDocument method when it reaches the end of the document. It calls back the startElement method when it reaches an element's start tag, the characters method when it reaches the element's text content, and the endElement method when it reaches the element's end tag.

Based on the above, we can work out the following logic for handling the XML document:

1: When the parser reaches the start of the document, nothing needs to be done in the startDocument callback, although you could, for example, verify the UTF-8 encoding there.

2: When the parser reaches the rivers start tag, a collection could be instantiated in the startElement callback to hold the list, but we do not have to, because it is already instantiated in the constructor.

3: When the parser reaches a river start tag, we need to instantiate a River object. The river tag also has name and length attributes, so after instantiating River we must also read the attribute values with attributes.getValue(NAME). In addition, we set a boolean flag to true to record that the parser is now inside a river element.

4: The river tag contains child tags (nodes), but the SAX parser does not know which tag it has reached. It only knows about starts and ends. So how do we make it recognize our tags? We have to check ourselves, by comparing the localName string parameter of the startElement callback with our tag strings. We also need to record which tag is currently being read, so we set a boolean flag for that tag to true.

5: The parser also reaches the text inside a tag (that is, the content between the start and end tags) and calls back the characters method. In this method we usually read that content and save it.

6: The parser is bound to reach the end tag </river> or </rivers>. If it is the </river> tag, remember to add the River object to the list. If it is a child tag inside river, such as </introduction>, set the boolean flag that was set to true for that tag back to false.

Following these ideas, you can implement the following code:

// Fields of the handler class, declared elsewhere: List<River> rivers (created in the constructor),
// River river, and the boolean flags isRiver, xIntroduction, and xImageUrl.

/** Called when the parser reaches a start tag. */
public void startElement(String uri, String localName, String qName, Attributes attributes) {
    String tagName = localName.length() != 0 ? localName : qName;
    tagName = tagName.toLowerCase().trim();
    // If the river tag is read, instantiate a River
    if (tagName.equals(RIVER)) {
        isRiver = true;
        river = new River();
        // Read the attributes on the river start tag
        river.setName(attributes.getValue(NAME));
        river.setLength(Integer.parseInt(attributes.getValue(LENGTH)));
    }
    // Then read the other nodes
    if (isRiver) {
        if (tagName.equals(INTRODUCTION)) {
            xIntroduction = true;
        } else if (tagName.equals(IMAGEURL)) {
            xImageUrl = true;
        }
    }
}

/** Called when the parser reaches an end tag. */
public void endElement(String uri, String localName, String qName) {
    String tagName = localName.length() != 0 ? localName : qName;
    tagName = tagName.toLowerCase().trim();
    // If the river end tag is read, add the River to the collection
    if (tagName.equals(RIVER)) {
        isRiver = false; // we have left the river element
        rivers.add(river);
    }
    // Then handle the other nodes
    if (isRiver) {
        if (tagName.equals(INTRODUCTION)) {
            xIntroduction = false;
        } else if (tagName.equals(IMAGEURL)) {
            xImageUrl = false;
        }
    }
}

/** Called when the parser reads text content. It may be called several times for one element. */
public void characters(char[] ch, int start, int length) {
    String text = new String(ch, start, length);
    if (xIntroduction) {
        // Treat null as an empty string, then append the new text
        river.setIntroduction((river.getIntroduction() == null ? "" : river.getIntroduction()) + text);
    } else if (xImageUrl) {
        // Treat null as an empty string, then append the new text
        river.setImageUrl((river.getImageUrl() == null ? "" : river.getImageUrl()) + text);
    }
}

The result is the same as in the DOM example above.
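Because a SAX parser may deliver the text of one element in several characters calls, one common variation is to collect the pieces in a StringBuilder and assign the value when the end tag arrives. The sketch below takes that approach on a plain JVM. It is not the article's RiverHandler: the class name RiverSaxSketch and its nested River stand-in (with public fields instead of getters and setters) are assumptions made to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical variation for illustration; not the article's RiverHandler.
class RiverSaxSketch extends DefaultHandler {

    /** Minimal stand-in for the River class, to keep the sketch self-contained. */
    static final class River {
        String name;
        int length;
        String introduction;
        String imageUrl;
    }

    private final List<River> rivers = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private River current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start collecting text for this element
        if ("river".equals(qName)) {
            current = new River();
            current.name = attributes.getValue("name");
            current.length = Integer.parseInt(attributes.getValue("length"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null) {
            if ("introduction".equals(qName)) {
                current.introduction = text.toString().trim();
            } else if ("imageurl".equals(qName)) {
                current.imageUrl = text.toString().trim();
            } else if ("river".equals(qName)) {
                rivers.add(current);
                current = null;
            }
        }
        text.setLength(0);
    }

    static List<River> parse(String xml) {
        RiverSaxSketch handler = new RiverSaxSketch();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.rivers;
    }
}
```

Compared with per-tag boolean flags, this keeps a single buffer and decides what the text means only at the end tag, at the cost of trimming the surrounding whitespace explicitly.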

The basic approach when parsing with pull:

When the pull parser reaches the start of the document, we can instantiate the list collection that stores the data objects. When it reaches an element's start tag, we check the tag type. If it is a river tag, we instantiate a River object. If it is another tag, we read its content and assign it to the River object. The parser also reports text events, but we do not need to handle them here.

Based on the above, we can work out the following logic for handling the XML document:

1: When the parser reaches XmlPullParser.START_DOCUMENT, nothing needs to be done, although you could instantiate the collection object there.

2: When the parser reaches XmlPullParser.START_TAG, check whether it is a river tag. If so, instantiate a River object and call the getAttributeValue method to get the attribute values on the tag.

3: When the parser reaches another tag, such as introduction, check whether the River object is null. If it is not null, read the content of introduction with the nextText method, which returns the content of the text node.

4: The parser will of course also reach XmlPullParser.END_TAG, since every start has an end. Here we check whether it is a river end tag. If so, we add the River object to the list collection and set the River object to null.

From this processing logic, we can write the following code:

public List<River> parse(String xmlPath) {
    List<River> rivers = new ArrayList<River>();
    River river = null;
    InputStream inputStream = null;
    // Get an XmlPullParser
    XmlPullParser xmlParser = Xml.newPullParser();
    try {
        // Get the file stream and set the encoding
        inputStream = this.context.getResources().getAssets().open(xmlPath);
        xmlParser.setInput(inputStream, "utf-8");
        // Get the type of the current event: start document, end document, start tag, end tag, text, and so on
        int evtType = xmlParser.getEventType();
        // Loop until the end of the document
        while (evtType != XmlPullParser.END_DOCUMENT) {
            switch (evtType) {
                case XmlPullParser.START_TAG:
                    String tag = xmlParser.getName();
                    // If a river tag starts, instantiate the object
                    if (tag.equalsIgnoreCase(RIVER)) {
                        river = new River();
                        // Read the attribute values on the river tag
                        river.setName(xmlParser.getAttributeValue(null, NAME));
                        river.setLength(Integer.parseInt(xmlParser.getAttributeValue(null, LENGTH)));
                    } else if (river != null) {
                        // If the introduction or imageurl tag is encountered, read its content
                        if (tag.equalsIgnoreCase(INTRODUCTION)) {
                            river.setIntroduction(xmlParser.nextText());
                        } else if (tag.equalsIgnoreCase(IMAGEURL)) {
                            river.setImageUrl(xmlParser.nextText());
                        }
                    }
                    break;
                case XmlPullParser.END_TAG:
                    // If the river end tag is encountered, add the River object to the collection
                    if (xmlParser.getName().equalsIgnoreCase(RIVER) && river != null) {
                        rivers.add(river);
                        river = null;
                    }
                    break;
                default:
                    break;
            }
            // If the XML has not ended, move on to the next event
            evtType = xmlParser.next();
        }
    } catch (XmlPullParserException e) {
        e.printStackTrace();
    } catch (IOException e1) {
        e1.printStackTrace();
    } finally {
        // The stream is null if opening the asset failed
        if (inputStream != null) {
            try {
                inputStream.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return rivers;
}

The effect is the same as above.
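For trying the same pull logic off-device, here is a rough equivalent written against the JDK's StAX API, which is a different library from Android's XmlPullParser but follows the same pull model. In this sketch getElementText() plays the role of nextText(), and getAttributeValue(null, ...) reads an attribute without checking its namespace. The class name RiverStaxSketch and its nested River stand-in are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical off-device equivalent; uses StAX, not Android's XmlPullParser.
class RiverStaxSketch {

    /** Minimal stand-in for the River class, to keep the sketch self-contained. */
    static final class River {
        String name;
        int length;
        String introduction;
        String imageUrl;
    }

    static List<River> parse(String xml) {
        List<River> rivers = new ArrayList<>();
        River river = null;
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String tag = reader.getLocalName();
                    if ("river".equals(tag)) {
                        // A river starts: create the object and read its attributes.
                        river = new River();
                        river.name = reader.getAttributeValue(null, "name");
                        river.length = Integer.parseInt(reader.getAttributeValue(null, "length"));
                    } else if (river != null && "introduction".equals(tag)) {
                        // Reads the element's text and moves to its end tag.
                        river.introduction = reader.getElementText().trim();
                    } else if (river != null && "imageurl".equals(tag)) {
                        river.imageUrl = reader.getElementText().trim();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "river".equals(reader.getLocalName())
                        && river != null) {
                    // A river ends: store it and reset the current object.
                    rivers.add(river);
                    river = null;
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return rivers;
    }
}
```

The structure mirrors the four steps listed earlier: create the object at the start tag, fill it from child tags, and add it to the list at the end tag.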

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
