Parse XML using ElementTree in Python

Last Update:2015-03-03 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

【Introduction to basic XML concepts]
XML refers to the eXtensible Markup Language ).
XML is designed to transmit and store data.

Concept 1:

 
  
# Start label of the foo Element
 # End tag of the foo element # note: Each start tag must have a corresponding end tag to close. You can also write it

Concept 2:

 
  
# Element can be nested to any parameter
  # The bar element is a child element of the foo element.
 # End label of the parent element foo

Concept 3:

 
  
# The foo element has a lang attribute. The attribute Value is EN. It corresponds to the Python Dictionary (Name-Value) pair;
  # The bar element has a lang attribute. The attribute value is CH, And the id attribute value is 001, which is placed in ''or;
 # The lang attribute in the bar element does not conflict with the foo element. Each element has an independent attribute set;

Concept 4:

Learning Python# Elements can have text content # Note: if an element has no text content or child elements, it is a null element.

Concept 5:

 
  
# The info element is the root node.
  
   
A
  # The list element is a subnode.
  
   
B
       
  
   
C

Concept 6:

 
  
# You can declare xmlns to define the default namespace. The feed element is in the http://www.w3.org/2005/atom space.
  Dive into mark# The title element is also. The namespace declaration not only applies to the elements currently declared, but also affects all child elements of the element.

You can also use the xmlns: prefix declaration to define a namespace and get its name prefix.

Then, each element in the namespace must be explicitly declared using this prefix.

# Feed is a namespace atom dive into mark# The title element also belongs to the namespace# Xmlns (XML Name Space)

【XML parsing Methods]

Common XML programming interfaces include DOM and SAX. These two interfaces process XML files in different ways, so they are naturally used differently.

Python can parse XML in three ways:SAX,DOM, AndElementTree:

1. SAX(Simple API for XML)
The Pyhton standard library contains the SAX Parser. With the event-driven model, SAX triggers events one by one during XML parsing and calls user-defined callback functions to process XML files. SAX is an event-driven API. Parsing XML documents using SAX involves two parts: the parser and the event processor.
The parser is responsible for reading XML documents and sending events to the event processor, such as element start and end events, while the event processor is responsible for processing the events.

Advantages:The SAX stream reads XML files, which is faster and consumes less memory.

Disadvantages:You need to implement the callback function (handler ).

2. DOM(Document Object Model)
Parse XML data into a tree in the memory and operate the XML in the tree. When a DOM parser parses an XML document, it reads the entire document at one time and stores all the elements in the document in a tree structure in the memory, then you can use different functions provided by DOM to read or modify the content and structure of the document, or write the modified content into the xml file.

Advantages:The advantage of using DOM is that you do not need to track the status, because every node knows who is its parent node and who is a child node.

Disadvantages:DOM needs to map XML data to the tree in the memory. First, it is slow, second, it is memory-consuming, and it is troublesome to use!

3. ElementTree(Element tree)
ElementTree is like a lightweight DOM with convenient and friendly APIs. Good code availability, fast speed, and low memory consumption.

In comparison, the third method is convenient and fast. We always use it! The following describes how to parse XML using an element tree:

【ElementTree Parsing]

Two implementations

ElementTree is born to process XML, which has two implementations in the Python standard library.

One is pure Python implementation, such as xml. etree. ElementTree.

The other is faster: xml. etree. cElementTree

Try to use the C language, because it is faster and consumes less memory! In the program, you can write as follows:

try:    import xml.etree.cElementTree as ETexcept ImportError:    import xml.etree.ElementTree as ET

Common Methods

# Use the attrib method to obtain attribute values. # Use the text method to obtain the node value. # Use the tag method to obtain the node name.

Example XML

 
    
  
   Book message
      
          bookone        
   
    python check
           
   
    001
           
   
    200
       
      
          booktwo        
   
    python learn
           
   
    002
           
   
    300

###########

##Load XML

###########
Method 1:Load files

root = ET.parse('book.xml')

Method 2: Load string

root = ET.fromstring(xmltext)

###########

##Get Node

###########
Method 1: Obtain the getiterator () method of the specified node.

book_node = root.getiterator('list')

Method 2: Obtain the findall () method of the specified node.

book_node = root.findall('list')

Method 3: Obtain the find () method of the specified node.

book_node = root.find('list')

Method 4:Get the son node-> getchildren ()

for node in book_node:    book_node_child = node.getchildren()[0]    print book_node_child.tag, '=> ', book_node_child.text

###########

##Example 01

###########

# Coding = utf-8try: # import module import xml. etree. cElementTree as ettings t ImportError: import xml. etree. elementTree as ETroot = ET. parse ('book. xml ') # analyze the XML file books = root. findall ('/list') # Find the subnode of list under all root directories for book_list in books: # print "=" * 30 # output format for book in book_list: # traverse each subnode to find your attributes and values in if book. attrib. has_key ('id'): # use an id to determine the condition. print "id:", book. attrib ['id'] # print the print book of the attribute value based on the id. tag + '=>' + book. text # print the output tag and text content "=" * 30

Output result:

==============================head=> bookonename=> python checknumber=> 001page=> 200==============================head=> booktwoname=> python learnnumber=> 002page=> 300==============================

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Parse XML using ElementTree in Python

Contact Us

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support