Python network programming learning notes (8): XML generation and parsing (DOM, ElementTree)-Python tutorial

Last Update:2017-05-14 Source: Internet

Author: User

DOM is short for Document Object Model, a high-level tree representation of XML documents. The model is not specific to Python; it is a general XML model. Python's DOM package is built on SAX and has been part of the standard XML support since Python 2.0.

I. Introduction to xml.dom

1. Main methods:

minidom.parse(filename): load and parse an XML file
doc.documentElement: get the root element of the XML document
node.getAttribute(attributeName): get the value of an attribute of the node
node.getElementsByTagName(tagName): get the collection of elements with that tag name
node.childNodes: return the list of child nodes
node.childNodes[index].nodeValue: get the value of a child node
node.firstChild: access the first child node, equivalent to node.childNodes[0]

Return the XML text of a node:

doc = minidom.parse(filename)
doc.toxml('utf-8')

Access element attributes:

a = node.attributes["id"]
a.name   # the attribute name, "id" above
a.value  # the attribute value
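
The calls listed above can be tried together in a few lines. The following is a minimal sketch written for Python 3 (the listings in this article use Python 2 syntax). It assumes a small inline document and uses minidom.parseString instead of a file so that it runs on its own; the variable names are illustrative only.

```python
from xml.dom import minidom

# Assumed sample document, inlined so the sketch runs without a file.
BOOK_XML = """<info>
    <intro>Book message</intro>
    <list id="001"><head>bookone</head></list>
    <list id="002"><head>booktwo</head></list>
</info>"""

doc = minidom.parseString(BOOK_XML)        # minidom.parse(filename) does the same for a file
root = doc.documentElement                 # root element of the document
lists = root.getElementsByTagName("list")  # all <list> elements
first_id = lists[0].getAttribute("id")     # attribute value as a string
attr = lists[0].attributes["id"]           # attribute node with .name and .value
head_text = lists[0].getElementsByTagName("head")[0].firstChild.nodeValue

print(root.nodeName, len(lists), first_id, attr.name, attr.value, head_text)
```

Note that getAttribute returns the value directly, while attributes["id"] returns a node object whose name and value are read separately.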

2. Examples

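Example 1: file name book.xml

The code is as follows:

<info>
    <intro>Book message</intro>
    <list id='001'>
        <head>bookone</head>
        <name>python check</name>
        <number>001</number>
        <page>200</page>
    </list>
    <list id='002'>
        <head>booktwo</head>
        <name>python learn</name>
        <number>002</number>
        <page>300</page>
    </list>
</info>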

(1) Create a DOM object

The code is as follows:

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')

(2) Get the root node

root = dom1.documentElement  # the root node is obtained here
print root.nodeName, ',', root.nodeValue, ',', root.nodeType

The returned result is:

info , None , 1

Where:

info is the root node name, root.nodeName.
None is the root node value, root.nodeValue.

1 is the root node type, root.nodeType. More node types are shown in the table below:

NodeType	Named Constant
1	ELEMENT_NODE
2	ATTRIBUTE_NODE
3	TEXT_NODE
4	CDATA_SECTION_NODE
5	ENTITY_REFERENCE_NODE
6	ENTITY_NODE
7	PROCESSING_INSTRUCTION_NODE
8	COMMENT_NODE
9	DOCUMENT_NODE
10	DOCUMENT_TYPE_NODE
11	DOCUMENT_FRAGMENT_NODE
12	NOTATION_NODE
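
The numeric values in this table are available as named constants on xml.dom.Node, so code does not need to hard-code the numbers. The sketch below (Python 3, with an assumed one-line document) builds a reverse lookup from number to constant name and uses it to label a few nodes. The type_names dictionary is a helper introduced here for illustration.

```python
from xml.dom import Node, minidom

# Assumed tiny document containing a comment and an element.
doc = minidom.parseString("<info><!-- note --><intro>Book message</intro></info>")
root = doc.documentElement

# Map numeric nodeType values back to their constant names.
type_names = {
    getattr(Node, name): name
    for name in dir(Node)
    if name.endswith("_NODE")
}

kinds = [type_names[child.nodeType] for child in root.childNodes]
print(type_names[doc.nodeType], type_names[root.nodeType], kinds)
```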

(3) Access child elements and child nodes

A. Return the list of the root's child nodes

The code is as follows:

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
# print root.nodeName, ',', root.nodeValue, ',', root.nodeType
print root.childNodes

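The running result is a list of seven nodes, shown here in abbreviated form (the real output also includes the text contents and memory addresses):

[Text, Element: intro, Text, Element: list, Text, Element: list, Text]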

B. Get a node's name and value. For example, to print the name and value of the second child node of the root (the intro element), add the following statement:

The code is as follows:

print root.childNodes[1].nodeName, root.childNodes[1].nodeValue

The running result is:

intro None

C. Access the first node

The code is as follows:

print root.firstChild.nodeName

The running result is:

#text
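
The first child is a #text node because the line break and indentation between the root's opening tag and its first child element are themselves stored as a text node. A minimal Python 3 sketch, assuming a shortened inline version of the document, shows this and one way to keep only the element children:

```python
from xml.dom import minidom

# Assumed shortened sample, indented the same way as a hand-written file.
BOOK_XML = """<info>
    <intro>Book message</intro>
    <list id="001"><head>bookone</head></list>
</info>"""

root = minidom.parseString(BOOK_XML).documentElement

# The newline and indentation after <info> form a whitespace-only text node.
print(repr(root.firstChild.nodeName), repr(root.firstChild.nodeValue))

# Keep only element children, skipping the whitespace text nodes.
elements = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
print([n.nodeName for n in elements])
```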

D. Get the text of an element with a known name. To get the "Book message" text inside intro, you can use the following method:

The code is as follows:

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
# print root.nodeName, ',', root.nodeValue, ',', root.nodeType
node = root.getElementsByTagName('intro')[0]
for child in node.childNodes:
    if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
        print child.data

The disadvantage of this method is that you have to check each node's type, which is inconvenient. The running result is:

Book message
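
One way to reduce the inconvenience of the type check is to wrap it in a small function. The sketch below is written for Python 3; get_text is a hypothetical helper name, not part of xml.dom, and the inline document is an assumption chosen to include a CDATA section.

```python
from xml.dom import minidom


def get_text(element):
    """Join the text and CDATA children of a DOM element (hypothetical helper)."""
    wanted = (element.TEXT_NODE, element.CDATA_SECTION_NODE)
    return "".join(c.data for c in element.childNodes if c.nodeType in wanted)


# Assumed sample: the intro text is split between plain text and a CDATA section.
doc = minidom.parseString("<info><intro>Book <![CDATA[message]]></intro></info>")
intro = doc.getElementsByTagName("intro")[0]
print(get_text(intro))
```

Child elements nested inside the element are ignored by this helper; only its direct text and CDATA children are joined.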

II. XML parsing

Parse the book.xml file above.

Method 1:

The code is as follows:

# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# xml parsing

The running result is:

====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300
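
Only the header comments of the Method 1 listing appear above. As a rough illustration of a traversal that loops level by level through the tree, and not the author's original code, a Python 3 sketch might look like the following. The inline document and the lines list are assumptions made so that the example is self-contained.

```python
from xml.dom import minidom

# Assumed inline copy of book.xml.
BOOK_XML = """<info>
    <intro>Book message</intro>
    <list id="001">
        <head>bookone</head>
        <name>python check</name>
        <number>001</number>
        <page>200</page>
    </list>
    <list id="002">
        <head>booktwo</head>
        <name>python learn</name>
        <number>002</number>
        <page>300</page>
    </list>
</info>"""

root = minidom.parseString(BOOK_XML).documentElement
lines = []
for node in root.childNodes:                  # first level: children of <info>
    if node.nodeType != node.ELEMENT_NODE or node.nodeName != "list":
        continue
    lines.append("=" * 20)
    lines.append("id:" + node.getAttribute("id"))
    for child in node.childNodes:             # second level: head, name, number, page
        if child.nodeType == child.ELEMENT_NODE:
            for text in child.childNodes:     # third level: the text inside each element
                if text.nodeType == text.TEXT_NODE:
                    lines.append(child.nodeName + ":" + text.data.strip())

print("\n".join(lines))
```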

Method 2:

The code is as follows:

# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# xml parsing

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = {}
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print '=' * 20
    print 'id:' + booklist.getAttribute('id')
    print 'head:' + booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    print 'name:' + booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    print 'number:' + booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    print 'page:' + booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()

The running result is the same as that of Method 1. Comparing the two, Method 1 loops several times following the XML tree structure and is less readable than Method 2, which operates directly on each node and is therefore clearer. To make the data easier to use from other code, you can store it in a list of dictionaries, as in Method 3:

The code is as follows:

# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# xml parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = []
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    bookdict = {}
    bookdict['id'] = booklist.getAttribute('id')
    bookdict['head'] = booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    bookdict['name'] = booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    bookdict['number'] = booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    bookdict['page'] = booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()
    book.append(bookdict)
print book

The running result is:

[{'head': u'bookone', 'page': u'200', 'number': u'001', 'id': u'001', 'name': u'python check'}, {'head': u'booktwo', 'page': u'300', 'number': u'002', 'id': u'002', 'name': u'python learn'}]

The list contains two dictionaries, one per list element.
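
The four nearly identical lookups in Method 3 can also be driven by a tuple of field names. This is a sketch in Python 3 under the assumption that every list element contains all four fields; FIELDS and text_of are names introduced here for illustration.

```python
from xml.dom import minidom

# Assumed inline copy of book.xml.
BOOK_XML = """<info>
    <intro>Book message</intro>
    <list id="001">
        <head>bookone</head><name>python check</name>
        <number>001</number><page>200</page>
    </list>
    <list id="002">
        <head>booktwo</head><name>python learn</name>
        <number>002</number><page>300</page>
    </list>
</info>"""

FIELDS = ("head", "name", "number", "page")  # child elements expected in each <list>


def text_of(parent, tag):
    """Return the stripped text of the first <tag> element under parent."""
    return parent.getElementsByTagName(tag)[0].firstChild.nodeValue.strip()


root = minidom.parseString(BOOK_XML).documentElement
books = []
for item in root.getElementsByTagName("list"):
    record = {"id": item.getAttribute("id")}
    record.update((tag, text_of(item, tag)) for tag in FIELDS)
    books.append(record)

print(books)
```

In Python 3 the values are plain str objects, so the u'' prefixes seen in the Python 2 output do not appear.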

III. Create an XML file
Here, the result obtained in Method 3 is used to create an XML file.

The code is as follows:

# -*- coding: cp936 -*-
# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# xml creation

import xml.dom

def create_element(doc, tag, attr):
    # create an element node
    elementNode = doc.createElement(tag)
    # create a text node
    textNode = doc.createTextNode(attr)
    # make the text node a child of the element node
    elementNode.appendChild(textNode)
    return elementNode

dom1 = xml.dom.getDOMImplementation()  # get a DOM implementation
doc = dom1.createDocument(None, "info", None)  # create the document object, which is used to create all nodes
top_element = doc.documentElement  # get the root node
books = [{'head': u'bookone', 'page': u'200', 'number': u'001', 'id': u'001', 'name': u'python check'}, {'head': u'booktwo', 'page': u'300', 'number': u'002', 'id': u'002', 'name': u'python learn'}]
for book in books:
    sNode = doc.createElement('list')
    sNode.setAttribute('id', str(book['id']))
    headNode = create_element(doc, 'head', book['head'])
    nameNode = create_element(doc, 'name', book['name'])
    numberNode = create_element(doc, 'number', book['number'])
    pageNode = create_element(doc, 'page', book['page'])
    sNode.appendChild(headNode)
    sNode.appendChild(nameNode)
    sNode.appendChild(numberNode)
    sNode.appendChild(pageNode)
    top_element.appendChild(sNode)  # add each list node to the root node
xmlfile = open('bookdate.xml', 'w')
doc.writexml(xmlfile, addindent=' ' * 4, newl='\n', encoding='utf-8')
xmlfile.close()

Running the script generates bookdate.xml, which contains the same list entries as book.xml (the intro element is not created).
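
To inspect the generated markup without writing a file, the document can be serialized to a string. The following Python 3 sketch rebuilds the same structure and calls toprettyxml; the books data is copied from the listing above, and the loop over tag names is a simplification introduced here rather than the author's code.

```python
import xml.dom

# Same records as in the listing above, as plain Python 3 strings.
books = [
    {"id": "001", "head": "bookone", "name": "python check", "number": "001", "page": "200"},
    {"id": "002", "head": "booktwo", "name": "python learn", "number": "002", "page": "300"},
]

impl = xml.dom.getDOMImplementation()
doc = impl.createDocument(None, "info", None)
root = doc.documentElement

for book in books:
    item = doc.createElement("list")
    item.setAttribute("id", book["id"])
    for tag in ("head", "name", "number", "page"):
        child = doc.createElement(tag)
        child.appendChild(doc.createTextNode(book[tag]))
        item.appendChild(child)
    root.appendChild(item)

xml_text = doc.toprettyxml(indent="    ")  # returns a str when no encoding is given
print(xml_text)
```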

IV. xml.etree.ElementTree

The following sections again use book.xml from Example 1 to illustrate parsing.

1. Load XML

Method 1: load a file directly

The code is as follows:

import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')

Method 2: load from a string

The code is as follows:

import xml.etree.ElementTree
root = xml.etree.ElementTree.fromstring(xmltext)  # xmltext is the XML string to parse
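
The two loading methods return different kinds of objects: parse returns an ElementTree that wraps the whole document, while fromstring returns the root Element directly. A short Python 3 sketch, using an assumed one-line document and an in-memory file object in place of book.xml, makes the difference visible:

```python
import io
import xml.etree.ElementTree as ET

xmltext = "<info><intro>Book message</intro></info>"  # assumed sample string

root_element = ET.fromstring(xmltext)      # returns the root Element
tree = ET.parse(io.StringIO(xmltext))      # returns an ElementTree; file names work too

print(type(root_element).__name__, type(tree).__name__)
print(root_element.tag, tree.getroot().tag)
```

Both objects offer find and findall, so the listings in this article work with either one.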

2. Get nodes

Method 1: use the getiterator method to get the specified nodes

book_node = root.getiterator("list")

Method 2: use the getchildren method to get child nodes. In Example 1, to get the value of the head child node under each list node:

The code is as follows:

# @xiaowuyi http://www.cnblogs.com/xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book_node = root.getiterator("list")
for node in book_node:
    book_node_child = node.getchildren()[0]
    print book_node_child.tag + ':' + book_node_child.text

The running result is:

head:bookone
head:booktwo
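
The getiterator and getchildren methods were deprecated and then removed in Python 3.9. On current Python versions the same traversal can be written with iter() and by treating an element as a sequence of its children. A sketch, assuming a shortened inline document:

```python
import xml.etree.ElementTree as ET

# Assumed shortened inline copy of book.xml.
BOOK_XML = """<info>
    <intro>Book message</intro>
    <list id="001"><head>bookone</head><name>python check</name></list>
    <list id="002"><head>booktwo</head><name>python learn</name></list>
</info>"""

root = ET.fromstring(BOOK_XML)
firsts = []
for node in root.iter("list"):      # replaces getiterator("list")
    first_child = list(node)[0]     # replaces node.getchildren()[0]
    firsts.append(first_child.tag + ":" + first_child.text)

print(firsts)
```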

Method 3: use the find and findall methods

The find method returns the first node that matches:

The code is as follows:

# -*- coding: cp936 -*-
# @xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book_find = root.find('list')
for note in book_find:
    print note.tag + ':' + note.text

Running result:

head:bookone
name:python check
number:001
page:200

The findall method returns all nodes that match:

The code is as follows:

# -*- coding: cp936 -*-
# @xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book = root.findall('list')
for book_list in book:
    for note in book_list:
        print note.tag + ':' + note.text

Running result:

head:bookone
name:python check
number:001
page:200
head:booktwo
name:python learn
number:002
page:300
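
Both find and findall also accept simple path expressions, and findtext returns the text of the first match directly. The Python 3 sketch below uses an assumed inline copy of the document; the magazine tag is a made-up name used only to show what a failed lookup returns.

```python
import xml.etree.ElementTree as ET

# Assumed shortened inline copy of book.xml.
BOOK_XML = """<info>
    <intro>Book message</intro>
    <list id="001"><head>bookone</head><name>python check</name></list>
    <list id="002"><head>booktwo</head><name>python learn</name></list>
</info>"""

root = ET.fromstring(BOOK_XML)

first_list = root.find("list")                           # first direct <list> child
heads = [h.text for h in root.findall("list/head")]      # every <head> under a <list>
second_name = root.findtext("list[@id='002']/name")      # select by attribute value
missing = root.find("magazine")                          # no such element: None

print(first_list.get("id"), heads, second_name, missing)
```

Because find returns None when nothing matches, it is worth checking the result before reading its tag or text.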

3. Example of parsing book.xml

The code is as follows:

# -*- coding: cp936 -*-
# @xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book = root.findall('list')
for book_list in book:
    print '=' * 20
    if 'id' in book_list.attrib:
        print "id:" + book_list.attrib['id']
    for note in book_list:
        print note.tag + ':' + note.text
print '=' * 20

The running result is:

====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300
====================

Note:
To get an attribute value, such as the id in <list id='001'>, use the attrib attribute.
To get a node's text, such as bookone in <head>bookone</head>, use the text attribute.
To get a node's name, use the tag attribute.
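
The three attributes in the note can be seen side by side on a single element. In this Python 3 sketch the one-element document is an assumption, and lang is a made-up attribute name used to show the default value of get():

```python
import xml.etree.ElementTree as ET

item = ET.fromstring("<list id='001'><head>bookone</head></list>")  # assumed sample
head = item.find("head")

print(item.tag)                 # node name
print(item.attrib)              # all attributes as a dictionary
print(item.get("id"))           # one attribute value
print(item.get("lang", "n/a"))  # default when the attribute is absent
print(head.tag, head.text)      # name and text of the child node
```

Using get() avoids the separate membership test on attrib when an attribute may be missing.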

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
