Python Network Programming Learning Notes (eight): XML generation and parsing (DOM, ElementTree)


Last Update: 2017-01-19 Source: Internet

Author: User


The xml.dom module

DOM stands for Document Object Model, a high-level tree representation of an XML document. The model is not specific to Python; it is a general, language-neutral model for XML. Python's DOM packages build on SAX and have been part of Python's standard XML support since Python 2.0.

I. A brief introduction to xml.dom

1. Main methods:

minidom.parse(filename): load and parse an XML file
doc.documentElement: get the root element of the XML document
node.getAttribute(attributeName): get the value of one of the node's attributes
node.getElementsByTagName(tagName): get a list of all descendant elements with that tag name
node.childNodes: return a list of the node's child nodes
node.childNodes[index].nodeValue: get the value of a child node
node.firstChild: access the first child node, equivalent to node.childNodes[0]
node.toxml(encoding): return the XML text of a node (as bytes in Python 3 when an encoding is given):
doc = minidom.parse(filename)
doc.toxml('UTF-8')

To access an element's attributes through its attributes map:

a = node.attributes["id"]
a.name   # the attribute name, i.e. "id" above
a.value  # the attribute's value
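The methods above can be tried together on a tiny inline document. This is a minimal sketch, assuming Python 3. It uses minidom.parseString (the string counterpart of minidom.parse) so it runs without book.xml on disk, and the names XML_TEXT and first_list are illustrative only:

```python
from xml.dom import minidom

# A small inline document so the example runs without book.xml on disk
XML_TEXT = "<info><intro>book message</intro><list id='001'><name>python check</name></list></info>"

doc = minidom.parseString(XML_TEXT)        # parse a string instead of a file
root = doc.documentElement                 # the <info> element
first_list = root.getElementsByTagName('list')[0]

list_id = first_list.getAttribute('id')    # attribute value as a string
attr = first_list.attributes['id']         # the Attr node itself
name_text = first_list.getElementsByTagName('name')[0].firstChild.nodeValue
xml_bytes = doc.toxml('UTF-8')             # bytes, because an encoding is given

print(root.nodeName, list_id, attr.name, attr.value, name_text)
```

For this input the last line should print `info 001 id 001 python check`.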

2. Worked examples

Example 1: file name book.xml


<?xml version="1.0" encoding="utf-8"?>
<info>
<intro>book message</intro>
<list id='001'>
<head>bookone</head>
<name>python check</name>
<number>001</number>
<page>200</page>
</list>

<list id='002'>
<head>booktwo</head>
<name>python learn</name>
<number>002</number>
<page>300</page>
</list>

</info>

(1) Create a DOM object


import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')

(2) Get the root node

root = dom1.documentElement  # this is the root node
print(root.nodeName, root.nodeValue, root.nodeType, sep=', ')

The output is:

info, None, 1

Here:

info is the name of the root node (root.nodeName).
None is the value of the root node (root.nodeValue); element nodes have no value of their own.

1 is the type of the root node (root.nodeType). The full list of node types is below; the named constants are attributes of xml.dom.Node:

nodeType	Named constant
1	ELEMENT_NODE
2	ATTRIBUTE_NODE
3	TEXT_NODE
4	CDATA_SECTION_NODE
5	ENTITY_REFERENCE_NODE
6	ENTITY_NODE
7	PROCESSING_INSTRUCTION_NODE
8	COMMENT_NODE
9	DOCUMENT_NODE
10	DOCUMENT_TYPE_NODE
11	DOCUMENT_FRAGMENT_NODE
12	NOTATION_NODE
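Rather than comparing nodeType with bare numbers, code can use the named constants from the table. This is a minimal sketch, assuming Python 3; NODE_TYPE_NAMES is an illustrative reverse lookup built for this example, not something the library provides:

```python
from xml.dom import Node, minidom

# Build a reverse lookup {1: 'ELEMENT_NODE', 3: 'TEXT_NODE', ...} from the constants on Node
NODE_TYPE_NAMES = {value: name for name, value in vars(Node).items() if name.endswith('_NODE')}

doc = minidom.parseString('<info>\n<intro>book message</intro></info>')
root = doc.documentElement

for child in root.childNodes:
    # Compare against the named constant rather than a bare number
    print(child.nodeType, NODE_TYPE_NAMES[child.nodeType], child.nodeType == Node.ELEMENT_NODE)
```

For the sample string this should print `3 TEXT_NODE False` for the line break before `<intro>` and `1 ELEMENT_NODE True` for `<intro>` itself.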

(3) Accessing child elements and child nodes

a. List the root node's children


import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
# print(root.nodeName, root.nodeValue, root.nodeType, sep=', ')
print(root.childNodes)

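The output is shown below (the memory addresses differ between runs). The line breaks between tags are kept as text nodes, so whitespace text nodes alternate with the intro and list elements:

[<DOM Text node "'\n'">, <DOM Element: intro at 0x124ef58>, <DOM Text node "'\n'">, <DOM Element: list at 0x1254058>, <DOM Text node "'\n\n'">, <DOM Element: list at 0x1254418>, <DOM Text node "'\n\n'">]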

b. Get a node's name and value. For example, to print the name and value of intro, the second child node of the root (the first child is a whitespace text node), add the following line:


print(root.childNodes[1].nodeName, root.childNodes[1].nodeValue)

The output is:

intro None

c. Access the first child node


print(root.firstChild.nodeName)

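The output is shown below. The first child of info is the line-break text node in front of <intro>, and every text node's nodeName is #text:

#text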

d. If you already know the element's tag name, you can read its text. For example, to get the text book message inside intro:


import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
# print(root.nodeName, root.nodeValue, root.nodeType, sep=', ')
intro = root.getElementsByTagName('intro')[0]
for child in intro.childNodes:
    # only text and CDATA nodes carry the element's text
    if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
        print(child.data)

The drawback of this approach is that you must check each child's node type, which is inconvenient. The output is:

book message
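To hide the type check, one option is to wrap it in a small helper. This is a minimal sketch, assuming Python 3; get_text is a hypothetical name, not part of minidom:

```python
from xml.dom import Node, minidom

def get_text(element):
    """Concatenate the data of every text and CDATA child of element (hypothetical helper)."""
    return ''.join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )

doc = minidom.parseString('<info><intro>book message</intro></info>')
intro = doc.getElementsByTagName('intro')[0]
print(get_text(intro))  # book message
```

Joining all matching children also covers elements whose text is split across several text nodes, for example around a nested child element.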

II. XML parsing

Parsing the book.xml file above

Method 1:

# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing
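The listing for Method 1 breaks off after its two header comments. The comparison further down describes it as walking the XML tree with several nested loops. A minimal sketch of that style follows, assuming Python 3 and an inline BOOK_XML string standing in for book.xml (with the head fields the output implies). It illustrates the approach and is not the author's original code:

```python
from xml.dom import Node, minidom

# Inline stand-in for book.xml (assumed content, matching the output shown below)
BOOK_XML = """<info>
<intro>book message</intro>
<list id='001'>
<head>bookone</head><name>python check</name><number>001</number><page>200</page>
</list>
<list id='002'>
<head>booktwo</head><name>python learn</name><number>002</number><page>300</page>
</list>
</info>"""

lines = []
root = minidom.parseString(BOOK_XML).documentElement
for node in root.childNodes:                       # loop 1: children of <info>
    if node.nodeType != Node.ELEMENT_NODE or node.nodeName != 'list':
        continue                                   # skip whitespace text and <intro>
    lines.append('=' * 20)
    lines.append('id:' + node.getAttribute('id'))
    for field in node.childNodes:                  # loop 2: children of <list>
        if field.nodeType == Node.ELEMENT_NODE:
            for text in field.childNodes:          # loop 3: text inside each field
                if text.nodeType == Node.TEXT_NODE:
                    lines.append(field.nodeName + ':' + text.data.strip())
print('\n'.join(lines))
```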

The output is:

====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300

Method 2:

# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print('=' * 20)
    print('id:' + booklist.getAttribute('id'))
    # first matching child element -> its text node -> the text, trimmed
    print('head:' + booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip())
    print('name:' + booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip())
    print('number:' + booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip())
    print('page:' + booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip())

The output is the same as Method 1. Comparing the two, Method 1 loops over the XML tree structure several times, while Method 2 works on each node directly, which is more readable and clearer. If other code needs to use the results, you can store them in a list of dictionaries, as in Method 3:


# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = []
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    bookdict = {}  # one dictionary per <list> element
    bookdict['id'] = booklist.getAttribute('id')
    bookdict['head'] = booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    bookdict['name'] = booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    bookdict['number'] = booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    bookdict['page'] = booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()
    book.append(bookdict)
print(book)

The output (Python 3.7 or later, where dictionaries keep insertion order) is:

[{'id': '001', 'head': 'bookone', 'name': 'python check', 'number': '001', 'page': '200'}, {'id': '002', 'head': 'booktwo', 'name': 'python learn', 'number': '002', 'page': '300'}]

The result is a list containing two dictionaries, one per book.
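Method 3 names every field by hand. If the set of fields may change, one option is a variant that reads whatever child elements each list element has. This is a minimal sketch, assuming Python 3 and that every field holds a single text node; list_to_dict and BOOK_XML are hypothetical names:

```python
from xml.dom import Node, minidom

# Inline stand-in for book.xml (assumed content)
BOOK_XML = """<info><intro>book message</intro>
<list id='001'><head>bookone</head><name>python check</name><number>001</number><page>200</page></list>
<list id='002'><head>booktwo</head><name>python learn</name><number>002</number><page>300</page></list>
</info>"""

def list_to_dict(list_element):
    """Build {'id': ..., '<tag>': text, ...} from one <list> element (hypothetical helper)."""
    record = {'id': list_element.getAttribute('id')}
    for child in list_element.childNodes:
        # only element children with a text child become fields
        if child.nodeType == Node.ELEMENT_NODE and child.firstChild is not None:
            record[child.tagName] = child.firstChild.nodeValue.strip()
    return record

root = minidom.parseString(BOOK_XML).documentElement
books = [list_to_dict(el) for el in root.getElementsByTagName('list')]
print(books)
```

The trade-off is that a typo in the XML silently becomes a new key instead of raising an error, which the hand-written version would catch.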

III. Creating an XML file
This section writes the data produced by Method 3 back out as an XML file.


# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML creation

import xml.dom.minidom

def create_element(doc, tag, text):
    # create an element node
    element_node = doc.createElement(tag)
    # create a text node
    text_node = doc.createTextNode(text)
    # make the text node a child of the element node
    element_node.appendChild(text_node)
    return element_node

impl = xml.dom.minidom.getDOMImplementation()  # the DOM implementation creates documents
doc = impl.createDocument(None, "info", None)  # the document object then creates all other nodes
top_element = doc.documentElement  # get the root node
books = [{'head': 'bookone', 'page': '200', 'number': '001', 'id': '001', 'name': 'python check'},
         {'head': 'booktwo', 'page': '300', 'number': '002', 'id': '002', 'name': 'python learn'}]
for book in books:
    snode = doc.createElement('list')
    snode.setAttribute('id', str(book['id']))
    headnode = create_element(doc, 'head', book['head'])
    namenode = create_element(doc, 'name', book['name'])
    numbernode = create_element(doc, 'number', book['number'])
    pagenode = create_element(doc, 'page', book['page'])
    snode.appendChild(headnode)
    snode.appendChild(namenode)
    snode.appendChild(numbernode)
    snode.appendChild(pagenode)
    top_element.appendChild(snode)  # add the finished <list> node to the root node
with open('bookdate.xml', 'w', encoding='utf-8') as xmlfile:
    doc.writexml(xmlfile, addindent=' ' * 4, newl='\n', encoding='utf-8')

Running this generates bookdate.xml, which contains the same list entries as book.xml (the intro element is not recreated).
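To check the generated document without writing a file, one option is to build it in memory, serialize it with toprettyxml, and parse the text back. This is a minimal sketch for a single book, assuming Python 3:

```python
from xml.dom import minidom

def create_element(doc, tag, text):
    """Return <tag>text</tag> as a new element owned by doc."""
    element = doc.createElement(tag)
    element.appendChild(doc.createTextNode(text))
    return element

impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, 'info', None)
book = {'id': '001', 'head': 'bookone', 'name': 'python check', 'number': '001', 'page': '200'}

node = doc.createElement('list')
node.setAttribute('id', book['id'])
for tag in ('head', 'name', 'number', 'page'):
    node.appendChild(create_element(doc, tag, book[tag]))
doc.documentElement.appendChild(node)

# toprettyxml returns the indented text instead of writing it to a file
pretty = doc.toprettyxml(indent='    ')
print(pretty)

# Parse the text back; strip() guards against indentation added around text
reparsed = minidom.parseString(pretty)
page = reparsed.getElementsByTagName('page')[0].firstChild.nodeValue.strip()
```

Older Python versions indent the text inside leaf elements as well, which is why the read-back value is stripped.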

The xml.etree.ElementTree module

We again use book.xml from Example 1 to demonstrate parsing.

1. Loading XML

Method 1: load a file directly

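import xml.etree.ElementTree
tree = xml.etree.ElementTree.parse('book.xml')  # parse() returns an ElementTree object
root = tree.getroot()                           # getroot() returns the root element, <info>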

Method 2: load from a string

import xml.etree.ElementTree
root = xml.etree.ElementTree.fromstring(xmltext)  # xmltext is a string holding the XML; returns the root element
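The two loaders return different kinds of objects. This is a minimal sketch, assuming Python 3; an in-memory byte stream stands in for the file so the example runs anywhere, and XML_TEXT is an illustrative name:

```python
import io
import xml.etree.ElementTree as ET

XML_TEXT = "<info><intro>book message</intro><list id='001'><head>bookone</head></list></info>"

# parse() accepts a filename or file object and returns an ElementTree wrapper
tree = ET.parse(io.BytesIO(XML_TEXT.encode('utf-8')))
root_from_file = tree.getroot()

# fromstring() returns the root Element directly
root_from_string = ET.fromstring(XML_TEXT)

print(type(tree).__name__, root_from_file.tag, root_from_string.tag)
```

In other words, parse() gives a tree wrapper whose getroot() is the same kind of object that fromstring() returns.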

2. Getting nodes

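Method 1: use the getiterator method to get all nodes with a given tag. In Python 3 use iter() instead; getiterator() was deprecated in Python 3.2 and removed in Python 3.9.

book_node = root.iter("list")  # Python 2: root.getiterator("list")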

Method 2: use the getchildren method to get child nodes. In Example 1, to get the value of the head node under each list (getchildren() was also removed in Python 3.9; list(node) returns the same children):

# @xiaowuyi http://www.cnblogs.com/xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml').getroot()
book_node = root.iter("list")
for node in book_node:
    book_node_child = list(node)[0]  # Python 2: node.getchildren()[0]
    print(book_node_child.tag + ':' + book_node_child.text)

The output is:

head:bookone
head:booktwo
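Since getiterator() and getchildren() no longer exist in Python 3.9 and later, the same traversal can be written with iter() and element indexing. This is a minimal sketch, assuming Python 3 and an inline BOOK_XML string in place of book.xml:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for book.xml (assumed content, trimmed to two fields)
BOOK_XML = """<info><intro>book message</intro>
<list id='001'><head>bookone</head><name>python check</name></list>
<list id='002'><head>booktwo</head><name>python learn</name></list>
</info>"""

root = ET.fromstring(BOOK_XML)

heads = []
for book in root.iter('list'):   # replaces getiterator('list')
    first_child = book[0]        # Elements support indexing; list(book) replaces getchildren()
    heads.append(first_child.tag + ':' + first_child.text)

print(heads)                     # ['head:bookone', 'head:booktwo']
print(len(list(root.iter())))    # iter() with no tag walks every element, root included
```

Note that iter() searches all descendants, while indexing and list(element) only look at direct children.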

Method 3: use the find and findall methods

find returns the first matching node:


# @xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml').getroot()
book_find = root.find('list')  # the first <list> child of the root
for note in book_find:         # iterate over its child elements
    print(note.tag + ':' + note.text)

Output:

head:bookone
name:python check
number:001
page:200

findall returns all matching nodes:

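No listing for findall survives here. A minimal sketch that prints the output below follows, assuming Python 3 and an inline BOOK_XML string standing in for book.xml; it is a reconstruction for illustration, not the author's code:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for book.xml (assumed content)
BOOK_XML = """<info><intro>book message</intro>
<list id='001'><head>bookone</head><name>python check</name><number>001</number><page>200</page></list>
<list id='002'><head>booktwo</head><name>python learn</name><number>002</number><page>300</page></list>
</info>"""

root = ET.fromstring(BOOK_XML)

lines = []
for book_list in root.findall('list'):  # every <list> child of the root, in document order
    for note in book_list:              # the child elements of each <list>
        lines.append(note.tag + ':' + note.text)
print('\n'.join(lines))
```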

Output:

head:bookone
name:python check
number:001
page:200
head:booktwo
name:python learn
number:002
page:300
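find and findall also accept a limited XPath subset, and findtext returns a child's text directly. This is a minimal sketch, assuming Python 3 and an inline copy of book.xml; the list/price path is deliberately absent from the data to show the default value:

```python
import xml.etree.ElementTree as ET

# Inline stand-in for book.xml (assumed content)
BOOK_XML = """<info><intro>book message</intro>
<list id='001'><head>bookone</head><name>python check</name><number>001</number><page>200</page></list>
<list id='002'><head>booktwo</head><name>python learn</name><number>002</number><page>300</page></list>
</info>"""

root = ET.fromstring(BOOK_XML)

# The predicate [@id='002'] selects the <list> whose id attribute equals '002'
second = root.find("list[@id='002']")
print(second.findtext('name'))              # text of the first <name> child

# './/page' searches at any depth below root
pages = [page.text for page in root.findall('.//page')]
print(pages)

# findtext returns the default when nothing matches
print(root.findtext('list/price', default='n/a'))
```

This should print `python learn`, then `['200', '300']`, then `n/a`.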

3. A complete example: analyzing book.xml


# @xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml').getroot()
book = root.findall('list')
for book_list in book:
    print('=' * 20)
    if 'id' in book_list.attrib:  # attrib is a plain dict of the element's attributes
        print('id:' + book_list.attrib['id'])
    for note in book_list:
        print(note.tag + ':' + note.text)
print('=' * 20)

The output is:

====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300
====================

Note:
To get an attribute value, such as id='001' in <list id='001'>, use the attrib dictionary.
To get the text of a node, such as bookone in <head>bookone</head>, use the text attribute; to get a node's name, such as head, use the tag attribute.
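The three accessors in the note can be compared side by side. This is a minimal sketch, assuming Python 3; Element.get() is the dictionary-style shortcut for reading attrib with a fallback:

```python
import xml.etree.ElementTree as ET

book_list = ET.fromstring("<list id='001'><head>bookone</head></list>")
head = book_list[0]

print(book_list.attrib)                  # {'id': '001'}  (a plain dict)
print(book_list.get('id'))               # 001
print(book_list.get('lang', 'unknown'))  # unknown, since the attribute is missing
print(head.tag, head.text)               # head bookone
```

Using get() avoids the separate membership check that the complete example performs before reading attrib['id'].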

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
