Parsing XML samples using ElementTree in Python

Source: Internet
Author: User
Tags closing tag
"Introduction to XML Basic Concepts"

XML refers to Extensible Markup Language (extensible Markup Language).
XML is designed to transmit and store data.
Concept One:
Copy the Code code as follows:


# The start tag of the Foo element
# The end tag of foo element
# Note: Each start tag must have a corresponding closing tag to be closed, or it can be written as


Concept two:
Copy CodeThe code is as follows:


# elements can be nested to any parameter
# bar element is a child element of Foo element
# end tag of parent element foo


Concept three:
Copy CodeThe code is as follows:


# foo element has a property of Lang, the value of which is: EN; corresponding Python dictionary (name-value) pairs;
The # bar element has a lang attribute with the value: CH; and an id attribute with a value of: 001, placed in ' or ';
The lang attribute in the # bar element does not conflict with the Foo element, and each element has its own set of attributes;

Concept four:
Copy the Code code as follows:


<title>Learning Python</title># elements can have textual content
# Note: If an element has no text content and no child elements, it is an empty element.


Concept Five:
Copy CodeThe code is as follows:


# Info element is the root node
a # list element is a child node
B
C


Concept VI:
Copy CodeThe code is as follows:


# You can define default namespaces by declaring xmlns, and feed elements are in the Http://www.w3.org/2005/Atom namespace
Dive into Mark# The title element is also. A namespace declaration not only acts on the element that currently declares it, but also affects all child elements of that element

You can also define a namespace and take its name prefix with the Xmlns:prefix declaration.
Each element in the namespace must then be explicitly declared with this prefix (prefix).
# Feed belongs to namespace Atom
Dive into Mark# The title element also belongs to the namespace
# xmlns (XML Name Space)

"Several analytic methods of XML"

The common XML programming interface has DOM and sax, and the two interfaces handle XML files in different ways, and the use cases are naturally different.

Python has three ways to parse Xml:sax,dom, and ElementTree:

1.SAX (Simple API for XML)

The Pyhton standard library contains the SAX parser, which uses the event-driven model to process XML files by triggering events and invoking user-defined callback functions during parsing of XML. Sax is an event-driven API. Parsing an XML document with sax involves two parts: the parser and the event handler.
The parser is responsible for reading the XML document and sending events to the event handler, such as element start and end events, while the event handler is responsible for handling the event.
Pros: Sax streams read XML files faster and consumes less memory.
Cons: User-implemented callback functions (handler) are required.

2.DOM (Document Object Model)

Parses XML data into a tree in memory, manipulating the XML by manipulating the tree. A DOM parser parses an XML document, reads the entire document at once, stores all the elements of the document in a tree structure in memory, and then you can use the different functions provided by the DOM to read or modify the contents and structure of the document, or to write the modified content to an XML file.
Pros: The advantage of using the DOM is that you don't need to track the status because each node knows who it is and who is the child node.
Disadvantage: DOM needs to map XML data to the tree in memory, one is slower, the other is more memory consumption, and it is more troublesome to use.

3.ElementTree (element Tree)

ElementTree is like a lightweight dom, with a convenient and friendly API. Good code availability, fast speed, low memory consumption.
Compared to the third method, that is convenient, and fast, we have been using it! The following describes how XML is parsed with the element tree:

"ElementTree parsing"

Two kinds of implementations

ElementTree is born to handle XML, and it has two implementations in the Python standard library.
One is a pure Python implementation, for example: Xml.etree.ElementTree
The other one is faster: Xml.etree.cElementTree
Try to use C as much as possible, because it's faster and consumes less memory! This can be written in the program:
Copy the Code code as follows:


Try
Import Xml.etree.cElementTree as ET
Except Importerror:
Import Xml.etree.ElementTree as ET


Common Methods
Copy CodeThe code is as follows:


# When you want to get the property value, use the Attrib method.
# When you want to get the node value, use the text method.
# When you want to get a node name, use the tag method.

Sample XML
Copy the Code code as follows:


<?xml version= "1.0" encoding= "Utf-8"?>

Book Message

Bookone
Python check
001
200


Booktwo
Python Learn
002
300



###########
## Load XML
###########

Method One: Load the file
Copy the Code code as follows:


root = Et.parse (' Book.xml ')


Method Two: Load a string
Copy CodeThe code is as follows:


Root = et.fromstring (xmltext)

###########
# # Get node
###########

Method One: Get the specified node->getiterator () method
Copy the Code code as follows:


Book_node = root.getiterator (' list ')


Method Two: Get the specified node->findall () method
Copy CodeThe code is as follows:


Book_node = Root.findall (' list ')


Method Three: Get the specified node->find () method
Copy CodeThe code is as follows:


Book_node = root.find (' list ')


Method Four: Get son node->getchildren ()
Copy CodeThe code is as follows:


For node in Book_node:
Book_node_child = Node.getchildren () [0]
Print Book_node_child.tag, ' = = ', Book_node_child.text


###########
# # Example
###########
Copy the Code code as follows:


# Coding=utf-8

Try: # import Module
Import Xml.etree.cElementTree as ET
Except Importerror:
Import Xml.etree.ElementTree as ET

root = Et.parse (' book.xml ') # Parse XML file
Books = Root.findall ('/list ') # Find child nodes of list under all root directories
For book_list in books: # Traversal of results after lookup
print "=" * 30 # Output format
In Book_list: # Iterate over each child node to find out your attributes and values
If Book.attrib.has_key (' id '): # an ID to make conditional judgments
Print "ID:", book.attrib[' id ') # Prints the attribute value based on the ID
Print Book.tag + ' = ' + book.text # output label and text content
print "=" * 30


Output Result:
Copy CodeThe code is as follows:


==============================
Head=> Bookone
name=> python Check
Number=> 001
Page=> 200
==============================
Head=> Booktwo
Name=> python Learn
Number=> 002
Page=> 300
==============================
  • Contact Us

    The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

    If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.