Using ElementTree to parse XML in Python

Source: Internet
Author: User

Using ElementTree to parse XML in Python

This article mainly introduces the example of using ElementTree to parse XML in Python. This article also describes the basic concepts of XML, several XML parsing methods, and ElementTree parsing examples. For more information, see

[Introduction to basic XML concepts]

XML refers to the eXtensible Markup Language ).

XML is designed to transmit and store data.

Concept 1:

The Code is as follows:

   # Start label of the foo Element

# End tag of the foo Element

# Note: Each start tag must have a corresponding end tag to close. It can also be written

Concept 2:

The Code is as follows:

   # Element can be nested to any parameter

   # The bar element is a child element of the foo element.

# End label of the parent element foo

Concept 3:

The Code is as follows:

   # The foo element has a lang attribute. The attribute Value is EN. It corresponds to the Python Dictionary (Name-Value) pair;

   # The bar element has a lang attribute. The attribute value is CH, And the id attribute value is 001, which is placed in ''or;

# The lang attribute in the bar element does not conflict with the foo element. Each element has an independent attribute set;

Concept 4:

The Code is as follows:

# Elements can contain text content

# Note: if an element has no text content or child elements, it is a null element.

Concept 5:

The Code is as follows:

   # The info element is the root node.

A # The list element is a subnode.

B

C

  

Concept 6:

The Code is as follows:

   # You can declare xmlns to define the default namespace. The feed element is in the http://www.w3.org/2005/atom space.

# The title element is also. The namespace declaration not only applies to the elements currently declared, but also affects all child elements of the element.

  

You can also use the xmlns: prefix declaration to define a namespace and get its name prefix.

Then, each element in the namespace must be explicitly declared using this prefix.

# Feed is a namespace atom

Dive into mark# The title element also belongs to the namespace

# Xmlns (XML Name Space)

[XML parsing methods]

Common XML programming interfaces include DOM and SAX. These two interfaces process XML files in different ways, so they are naturally used differently.

Python has three methods to parse XML: SAX, DOM, and ElementTree:

1. SAX (Simple API for XML)

The Pyhton standard library contains the SAX Parser. With the event-driven model, SAX triggers events one by one during XML parsing and calls user-defined callback functions to process XML files. SAX is an event-driven API. Parsing XML documents using SAX involves two parts: the parser and the event processor.

The parser is responsible for reading XML documents and sending events to the event processor, such as element start and end events, while the event processor is responsible for processing the events.

Advantage: The SAX stream reads XML files, which is faster and consumes less memory.

Disadvantage: You need to implement the callback function (handler ).

2. DOM (Document Object Model)

Parse XML data into a tree in the memory and operate the XML in the tree. When a DOM parser parses an XML document, it reads the entire document at one time and stores all the elements in the document in a tree structure in the memory, then you can use different functions provided by DOM to read or modify the content and structure of the document, or write the modified content into the xml file.

Advantage: the advantage of using DOM is that you do not need to track the status, because every node knows who is its parent node and who is its child node.

Disadvantage: DOM needs to map XML data to the tree in the memory. First, it is relatively slow, second, it is relatively memory-consuming and troublesome to use!

3. ElementTree (element tree)

ElementTree is like a lightweight DOM with convenient and friendly APIs. Good code availability, fast speed, and low memory consumption.

In comparison, the third method is convenient and fast. We always use it! The following describes how to parse XML using an element tree:

[ElementTree parsing]

Two implementations

ElementTree is born to process XML, which has two implementations in the Python standard library.

One is pure Python implementation, such as xml. etree. ElementTree.

The other is faster: xml. etree. cElementTree

Try to use the C language, because it is faster and consumes less memory! In the program, you can write as follows:

The Code is as follows:

Try:

Import xml. etree. cElementTree as ET

Failed t ImportError:

Import xml. etree. ElementTree as ET

Common Methods

The Code is as follows:

# Use the attrib method to obtain attribute values.

# Use the text method to obtain the node value.

# Use the tag method to obtain the node name.

Example XML

The Code is as follows:

  

  

   Book message

Bookone

   Python check

   001

200

  

Booktwo

   Python learn

   002

300

  

  

###########

# Load XML

###########

Method 1: load files

The Code is as follows:

Root = ET. parse ('book. xml ')

Method 2: load a string

The Code is as follows:

Root = ET. fromstring (xmltext)

###########

# Obtain a node

###########

Method 1: Obtain the getiterator () method of the specified node.

The Code is as follows:

Book_node = root. getiterator ('LIST ')

Method 2: Obtain the specified node-> findall () method

The Code is as follows:

Book_node = root. findall ('LIST ')

Method 3: Obtain the specified node-> find () method

The Code is as follows:

Book_node = root. find ('LIST ')

Method 4: Get the son node-> getchildren ()

The Code is as follows:

For node in book_node:

Book_node_child = node. getchildren () [0]

Print book_node_child.tag, '=>', book_node_child.text

###########

# Example 01

###########

Copy the Code as follows:

# Coding = UTF-8

Try: # import module

Import xml. etree. cElementTree as ET

Failed t ImportError:

Import xml. etree. ElementTree as ET

Root = ET. parse ('book. xml') # analyze the xml file

Books = root. findall ('/list') # Find list subnodes in all root directories

For book_list in books: # traverse the search results

Print "=" * 30 # output format

For book in book_list: # traverse each subnode to find your attributes and values.

If book. attrib. has_key ('id'): # use an id to determine conditions.

Print "id:", book. attrib ['id'] # print the attribute value based on the id

Print book. tag + '=>' + book. text # output tag and text content

Print "=" * 30

Output result:

The Code is as follows:

====================================

Head => bookone

Name => python check

Number => 001

Page => 200

====================================

Head => booktwo

Name => python learn

Number => 002

Page => 300

====================================

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.