Parsing xml with ElementTree in Python

Last Update:2015-02-25 Source: Internet

Author: User

Tags closing tag

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

"Introduction to XML Basic Concepts "
XML refers to Extensible Markup Language (extensible Markup Language).
XML is designed to transmit and store data.

Concept One :

<foo>      # foo element's start tag </foo>     # foo element's end tag           # Note: Each start tag must have a corresponding closing tag to be closed, or it can be written as a <foo/>

Concept two :

<foo>           # elements can be nested into any parameter  <bar></bar>   # bar element as a child of the Foo element </foo>          # The end tag of the parent element foo

Concept three :

<foo lang= ' en ' >                  # foo element has a lang attribute value: EN; corresponding Python dictionary (name-value) pair;                              <bar id= ' 001 ' lang= "CH" > The </bar> # bar element has a lang attribute with a value of: CH, and an id attribute with a value of: 001, placed in ' or ',                           and the lang attribute in the </foo> # Bar element does not conflict with the Foo element. Each element has a separate set of attributes;

Concept four :

<title>learning python</title> # element can have text content                                # Note: If an element has no textual content and no child elements, it is an empty element.

Concept Five:

<info>                                  # Info element is root node    <list id= ' 001 ' > A </list>           # list element is child node    <list id= ' 002 ' > B </list>     <list id= ' 003 ' > C </list></info>

Concept VI:

<feed xmlns= ' Http://www.w3.org/2005/Atom ' >  # The default namespace can be defined by declaring xmlns, and the feed element is in the http://www.w3.org/2005/ The  <title>dive into Mark</title> # title element in the Atom namespace             is also. A namespace declaration not only acts on the element that currently declares it, but also affects all child elements of that element </feed>

You can also define a namespace and take its name prefix with the Xmlns:prefix declaration.

Each element in the namespace must then be explicitly declared with this prefix (prefix).

<atom:feed xmlns:atom= ' Http://www.w3.org/2005/Atom ' >  # Feed belongs to namespace Atom  <atom:title>dive into mark</atom:title>             # Title element also belongs to the namespace           </atom:feed>                                          # xmlns (XML Name space)

" several analytic methods of XML "

The common XML programming interface has DOM and sax, and the two interfaces handle XML files in different ways, and the use cases are naturally different.

Python has three ways to parse xml: SAX,DOM, and elementtree:

1.SAX (Simple API for XML)
The Pyhton standard library contains the SAX parser, which uses the event-driven model to process XML files by triggering events and invoking user-defined callback functions during parsing of XML. Sax is an event-driven API. Parsing an XML document with sax involves two parts: the parser and the event handler.
The parser is responsible for reading the XML document and sending events to the event handler, such as element start and end events, while the event handler is responsible for handling the event.

Advantages: Sax streams read XML files faster and consumes less memory.

Disadvantages: A user implementation callback function (handler) is required.

2.DOM (Document Object Model)
Parses XML data into a tree in memory, manipulating the XML by manipulating the tree. A DOM parser parses an XML document, reads the entire document at once, stores all the elements of the document in a tree structure in memory, and then you can use the different functions provided by the DOM to read or modify the contents and structure of the document, or to write the modified content to an XML file.

Pros: The advantage of using the DOM is that you don't need to track the status because each node knows who it is and who is the child node.

Disadvantages: Dom needs to map the XML data into the tree in memory, one is slower, the other is more memory consumption, and it is more troublesome to use.

3.ElementTree (element Tree)
ElementTree is like a lightweight dom, with a convenient and friendly API. Good code availability, fast speed, low memory consumption.

Compared to the third method, that is convenient, and fast, we have been using it! The following describes how XML is parsed with the element tree:

"ElementTree parsing "

Two kinds of implementations

ElementTree is born to handle XML, and it has two implementations in the Python standard library.

One is a pure Python implementation, for example: Xml.etree.ElementTree

The other one is faster: Xml.etree.cElementTree

Try to use C as much as possible, because it's faster and consumes less memory! This can be written in the program:

Try:    import xml.etree.cElementTree as etexcept importerror:    import Xml.etree.ElementTree as ET

Common methods

# When you want to get the property value, use the Attrib method. # When you want to get the node value, use the text method. # When you want to get a node name, use the tag method.

Sample XML

<?xml version= "1.0" encoding= "Utf-8"?><info> <intro>book message</intro>    <list Id= ' 001 ' >        
###########
##   Load XML  
###########
Method One : Loading files 
root = Et.parse (' Book.xml ')
 Method Two : load the string 
Root = et.fromstring (xmltext)

###########
##  get node 
###########
 Method One : get the specified node->getiterator () method 
Book_node = root.getiterator (' list ')
 Method Two : get the specified node->findall () method 
Book_node = Root.findall (' list ')
 Method Three : get the specified node->find () method 
Book_node = root.find (' list ')
method Four : get son node->getchildren ()
For node in Book_node:    book_node_child = Node.getchildren () [0]    print book_node_child.tag, ' = = ', Book_node_ Child.text
###########
##     Example
###########
# Coding=utf-8try:                                           # Importing module import    xml.etree.cElementTree as etexcept importerror:    Import Xml.etree.ElementTree as Etroot   = Et.parse (' book.xml ')                 # Parse XML file Books  = Root.findall ('/list ')                #  Find child nodes of list under all root for book_list in books:                       # traverse The result of a lookup    print "=" *                            # Output format for book in               book_list:                    # Iterate over each sub-node and find out your property and value                             if Book.attrib.has_key (' id '):         # an ID to make the conditional judgment            print "ID:", book.attrib[' id ')    # Print        Book.tag + ' + ' + book.text    # output label and text content Printed "=" * 30 based on ID
Output Result:
==============================head=> bookonename=> python checknumber=> 001page=> 200================== ============head=> booktwoname=> python learnnumber=> 002page=> 300==============================





Parsing xml with ElementTree in Python

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More