Using ElementTree to parse XML samples in Python

Source: Internet
Author: User
Tags closing tag object model xmlns in python advantage

This article mainly introduces the use of ElementTree parsing XML Examples in Python, this article also explained the basic concept of XML, XML several analytic methods and elementtree analytic examples, the need for friends can refer to the following

"Introduction to Basic XML Concepts"

XML refers to Extensible Markup Language (extensible Markup Language).

XML is designed to transmit and store data.

Concept One:

The code is as follows:

   # foo element's start tag

# foo Element end tag

# Note: Each start tag must have a corresponding closing tag to close it, or it can be written as

Concept two:

The code is as follows:

   # elements can be nested to arbitrary parameters

   # bar element is child element of Foo element

# end tag for parent element foo

Concept three:

The code is as follows:

   The # foo element has an attribute of Lang, which is: EN; corresponding to the Python dictionary (name-value);

   The # bar element has an attribute of Lang, which is: CH; there is also an id attribute, the value is: 001, placed in ' or ';

# The lang attribute in the bar element does not conflict with the Foo element, and each element has a separate set of attributes;

Concept four:

The code is as follows:

# elements can have text content

# Note: If an element has no textual content and no child elements, it is an empty element.

Concept Five:

The code is as follows:

   # info element is the root node

a # list element is a child node

B

C

  

Concept Six:

The code is as follows:

   # You can define the default namespace by declaring xmlns, and the feed element is in the Http://www.w3.org/2005/Atom namespace

# The title element is also. A namespace declaration affects not only the elements that currently declare it, but also all child elements of that element

  

You can also define a namespace by Xmlns:prefix declaration and take its name as prefix.

Each element of the namespace must then be explicitly declared using this prefix (prefix).

   # Feed belongs to namespace Atom

   Dive into Mark # Title element also belongs to this namespace

# xmlns (XML Name space)

"XML several parsing methods"

Common XML programming interfaces are Dom and sax, and the two interfaces handle XML files in different ways, and the use of the context is naturally different.

Python has three ways to parse Xml:sax,dom and ElementTree:

1.SAX (simple APIs for XML)

The Pyhton standard library contains a SAX parser, which Sax uses an event-driven model to process an XML file by triggering events in parsing XML and invoking user-defined callback functions. Sax is an event-based-driven API. Using SAX to parse XML documents involves two parts: parsers and event handlers.

The parser is responsible for reading the XML document and sending events to the event handler, such as element start and end events, while the event handler is responsible for handling the event.

Advantage: Sax streams read XML files faster and consume less memory.

Disadvantage: User implementation callback function (handler) is required.

2.DOM (Document Object Model)

Parses XML data into a tree in memory, manipulating XML by manipulating the tree. When parsing an XML document, a DOM parser once you read the entire document and keep all the elements in the document in a tree structure in memory, you can then use the different functions provided by DOM to read or modify the contents and structure of the document, or you can write the modified content to an XML file.

Advantage: The advantage of using DOM is that you do not need to track the status, because each node knows who is its parent and who is the child node.

Disadvantage: DOM needs to map the XML data to the tree in memory, one is slower, the other is the memory consumption, it is more troublesome to use!

3.ElementTree (element Tree)

ElementTree is like a lightweight DOM with a user-friendly API. Code availability is good, fast, and consumes less memory.

In comparison, the third method, that is, convenient and fast, we have been using it! How to parse XML with an element tree is described below:

"ElementTree Resolution"

Two kinds of implementations

ElementTree was born to process XML, and it has two implementations in the Python standard library.

One is a pure Python implementation, for example: Xml.etree.ElementTree

The other one is a bit faster: Xml.etree.cElementTree

Try to use the kind that C language implements, because it is faster and consumes less memory! You can write this in your program:

The code is as follows:

Try

Import Xml.etree.cElementTree as ET

Except Importerror:

Import Xml.etree.ElementTree as ET

Common methods

The code is as follows:

# When you want to get the property value, use the Attrib method.

# When you want to get the node value, use the text method.

# When you want to get the section roll call, use the tag method.

Sample XML

The code is as follows:

  

  

   Book Message

Bookone

   Python check

   001

A

  

Booktwo

   Python Learn

   002

-

  

  

###########

# # Load XML

###########

Method One: Load file

The code is as follows:

root = Et.parse (' Book.xml ')

Method Two: Load string

The code is as follows:

Root = et.fromstring (xmltext)

###########

# # Get Node

###########

Method One: Get the specified node->getiterator () method

The code is as follows:

Book_node = root.getiterator (' list ')

Method Two: Get the specified node->findall () method

The code is as follows:

Book_node = Root.findall (' list ')

Method Three: Get the specified node->find () method

The code is as follows:

Book_node = root.find (' list ')

Method Four: Get son node->getchildren ()

The code is as follows:

For node in Book_node:

Book_node_child = Node.getchildren () [0]

Print Book_node_child.tag, ' => ', Book_node_child.text

###########

# # Example 01

###########

Copy code code as follows:

# Coding=utf-8

Try: # import Module

Import Xml.etree.cElementTree as ET

Except Importerror:

Import Xml.etree.ElementTree as ET

root = Et.parse (' book.xml ') # Parsing XML file

Books = Root.findall ('/list ') # Find child nodes of list in all root directories

For book_list in books: # Traversing results after lookup

print "=" * 30 # Output format

For books in Book_list: # Iterate over each child node to find out your attributes and values

If Book.attrib.has_key (' id '): # A sentence ID to make conditional judgments

Print "ID:", book.attrib[' id '] # Print out property values based on ID

Print Book.tag + ' => ' + book.text # output label and text content

print "=" * 30

Output results:

The code is as follows:

==============================

Head=> Bookone

name=> python Check

Number=> 001

Page=> 200

==============================

Head=> Booktwo

Name=> python Learn

Number=> 002

Page=> 300

==============================

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.