Python Network Programming Study Notes (8): XML Generation and Parsing (DOM, ElementTree)

Source: Internet
Author: User
xml.dom section

DOM is short for Document Object Model, a high-level tree representation of an XML document. The model is not specific to Python; it is a generic XML model. Python's DOM package is built on SAX and has been part of Python's standard XML support since Python 2.0.

I. A brief introduction to xml.dom

1. Main methods:

minidom.parse(filename): load and parse an XML file
doc.documentElement: get the root element of the XML document
node.getAttribute(attributeName): get the value of an attribute of an XML node
node.getElementsByTagName(tagName): get the collection of XML node objects with that tag name
node.childNodes: return a list of child nodes
node.childNodes[index].nodeValue: get the value of an XML node
node.firstChild: access the first child node, equivalent to node.childNodes[0]
To return the text of a node's XML representation:
doc = minidom.parse(filename)
doc.toxml('UTF-8')

To access element attributes:

a = node.attributes["id"]
a.name   # the "id" above
a.value  # the attribute's value
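
The calls listed above can be tried without a file on disk. The following is a minimal sketch, assuming Python 3 and a small inline document (XML_TEXT is a hypothetical stand-in for book.xml); minidom.parseString is the in-memory counterpart of minidom.parse.

```python
from xml.dom import minidom

# Hypothetical inline document, written without whitespace between tags
# so that firstChild is an element rather than a text node.
XML_TEXT = '<info><list id="001"><head>bookone</head></list></info>'

doc = minidom.parseString(XML_TEXT)
root = doc.documentElement                       # the root element <info>
first_list = root.getElementsByTagName('list')[0]

attr = first_list.attributes['id']               # an Attr node
head = first_list.firstChild                     # same as first_list.childNodes[0]

print(root.nodeName)                             # info
print(first_list.getAttribute('id'))             # 001
print(attr.name, attr.value)                     # id 001
print(head.childNodes[0].nodeValue)              # bookone
print(doc.toxml())                               # serialized document
```

Note that the text "bookone" is not the value of the head element itself; it lives in a text node that is a child of head.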

2. Examples and explanations

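Example 1: file name book.xml

<info>
    <intro>Book message</intro>
    <list id='001'>
        <head>bookone</head>
        <name>python check</name>
        <number>001</number>
        <page>200</page>
    </list>
    <list id='002'>
        <head>booktwo</head>
        <name>python learn</name>
        <number>002</number>
        <page>300</page>
    </list>
</info>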

(1) Create the DOM object

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')

(2) Get the root node

root = dom1.documentElement  # this gets the root node
print root.nodeName, ',', root.nodeValue, ',', root.nodeType

The result is:

info , None , 1

where:

info is the name of the root node (root.nodeName)
None is the value of the root node (root.nodeValue)
1 is the type of the root node (root.nodeType); the other node types are listed below:

NodeType    Named constant
1           ELEMENT_NODE
2           ATTRIBUTE_NODE
3           TEXT_NODE
4           CDATA_SECTION_NODE
5           ENTITY_REFERENCE_NODE
6           ENTITY_NODE
7           PROCESSING_INSTRUCTION_NODE
8           COMMENT_NODE
9           DOCUMENT_NODE
10          DOCUMENT_TYPE_NODE
11          DOCUMENT_FRAGMENT_NODE
12          NOTATION_NODE
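
These numbers are available as named constants on xml.dom.Node, so code does not need to hard-code them. The following sketch, assuming Python 3, builds a number-to-name lookup from those constants; NODE_TYPE_NAMES is a hypothetical name chosen for illustration.

```python
from xml.dom import Node

# Collect every class attribute of Node whose name ends in _NODE
# and map its numeric value back to the constant's name.
NODE_TYPE_NAMES = {
    value: name
    for name, value in vars(Node).items()
    if name.endswith('_NODE')
}

for number in sorted(NODE_TYPE_NAMES):
    print(number, NODE_TYPE_NAMES[number])
```

With such a lookup, a check like node.nodeType == 1 can be written as node.nodeType == Node.ELEMENT_NODE, which is easier to read.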


(3) Accessing child elements and child nodes

a. Return the list of the root's child nodes

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
#print root.nodeName, ',', root.nodeValue, ',', root.nodeType
print root.childNodes

The result is a list of seven nodes (node contents and memory addresses elided here):

[<DOM Text node ...>, <DOM Element: intro at 0x...>, <DOM Text node ...>, <DOM Element: list at 0x...>, <DOM Text node ...>, <DOM Element: list at 0x...>, <DOM Text node ...>]

The text nodes hold the whitespace between the elements.

b. Get the value of an XML node. For example, to print the name and value of intro, the second child node of the root, add the following line:

print root.childNodes[1].nodeName, root.childNodes[1].nodeValue

The result is:

intro None

c. Access the first child node

print root.firstChild.nodeName

The result is:

#text
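
The #text result follows from how the file is laid out: each line break and indentation between tags is stored as a whitespace-only text node. The sketch below, assuming Python 3 and a shortened inline copy of book.xml (XML_TEXT is hypothetical), lists all child node names and then keeps only the element children.

```python
from xml.dom import minidom

# Shortened, hypothetical copy of book.xml with the same top-level layout.
XML_TEXT = """<info>
    <intro>Book message</intro>
    <list id="001"><head>bookone</head></list>
    <list id="002"><head>booktwo</head></list>
</info>"""

root = minidom.parseString(XML_TEXT).documentElement

# Whitespace between tags shows up as '#text' entries.
all_names = [child.nodeName for child in root.childNodes]

# Filter on nodeType to skip those text nodes.
element_names = [
    child.nodeName
    for child in root.childNodes
    if child.nodeType == child.ELEMENT_NODE
]

print(all_names)
print(element_names)
```

This is why index 1, not index 0, was needed to reach intro in step b.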

d. Get the value of an element whose name is already known. To get the text "Book message" inside intro, use the following:

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
#print root.nodeName, ',', root.nodeValue, ',', root.nodeType
node = root.getElementsByTagName('intro')[0]
for node in node.childNodes:
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        print node.data

The drawback of this approach is that the node type has to be checked, which makes it inconvenient to use. The result is:

Book message
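
One way to hide the type check is to wrap it in a small helper. This is a sketch under the assumption of Python 3; get_text is a hypothetical helper, not part of xml.dom, and the inline document is made up to show text and CDATA children side by side.

```python
from xml.dom import minidom


def get_text(element):
    """Join the data of all text and CDATA children of a DOM element."""
    return ''.join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Hypothetical document mixing plain text and a CDATA section.
doc = minidom.parseString('<info><intro>Book <![CDATA[message]]></intro></info>')
intro = doc.getElementsByTagName('intro')[0]

print(get_text(intro))
```

Callers then work with plain strings and no longer need to know which node types carry character data.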

II. XML parsing

Parse the XML above.

Method one:

#@ Xiao Wu Yi http://www.cnblogs.com/xiaowuyi
# xml parsing

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = {}
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print '=' * 20
    print 'id:' + booklist.getAttribute('id')
    for nodelist in booklist.childNodes:
        if nodelist.nodeType == 1:
            print nodelist.nodeName + ':',
            for node in nodelist.childNodes:
                print node.data

The result is:

====================
id:001
head: bookone
name: python check
number: 001
page: 200
====================
id:002
head: booktwo
name: python learn
number: 002
page: 300

Method two:

#@ Xiao Wu Yi http://www.cnblogs.com/xiaowuyi
# xml parsing

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = {}
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print '=' * 20
    print 'id:' + booklist.getAttribute('id')
    print 'head:' + booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    print 'name:' + booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    print 'number:' + booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    print 'page:' + booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()

The result is the same as for method one. Comparing the two: method one walks the XML tree with several nested loops and is less readable, while method two operates on each node directly and is clearer. To make the data easier to reuse in later calls, it can be stored as a list of dictionaries, as in method three:

#@ Xiao Wu Yi http://www.cnblogs.com/xiaowuyi
# xml parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = []
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    bookdict = {}
    bookdict['id'] = booklist.getAttribute('id')
    bookdict['head'] = booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    bookdict['name'] = booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    bookdict['number'] = booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    bookdict['page'] = booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()
    book.append(bookdict)
print book

The result is:

[{'head': u'bookone', 'page': u'200', 'number': u'001', 'id': u'001', 'name': u'python check'}, {'head': u'booktwo', 'page': u'300', 'number': u'002', 'id': u'002', 'name': u'python learn'}]

The list contains two dictionaries.
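
Method three repeats the same lookup for each field, which can be folded into a loop. The following is a sketch assuming Python 3, with the content of book.xml held in a hypothetical XML_TEXT string; parse_books and FIELDS are illustrative names, not part of the original code.

```python
from xml.dom import minidom

# Hypothetical in-memory copy of book.xml.
XML_TEXT = """<info>
    <intro>Book message</intro>
    <list id="001">
        <head>bookone</head><name>python check</name>
        <number>001</number><page>200</page>
    </list>
    <list id="002">
        <head>booktwo</head><name>python learn</name>
        <number>002</number><page>300</page>
    </list>
</info>"""

FIELDS = ('head', 'name', 'number', 'page')


def parse_books(xml_text):
    """Return one dictionary per <list> element."""
    root = minidom.parseString(xml_text).documentElement
    books = []
    for entry in root.getElementsByTagName('list'):
        record = {'id': entry.getAttribute('id')}
        for field in FIELDS:
            node = entry.getElementsByTagName(field)[0]
            record[field] = node.childNodes[0].nodeValue.strip()
        books.append(record)
    return books


print(parse_books(XML_TEXT))
```

In Python 3 the values are ordinary str objects, so the u prefix seen in the Python 2 output does not appear.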

III. Building an XML file
Here the result of method three is used to create an XML file.

# -*- coding: cp936 -*-
#@ Xiao Wu Yi http://www.cnblogs.com/xiaowuyi
# xml create

import xml.dom
def create_element(doc, tag, attr):
    # create an element node
    elementNode = doc.createElement(tag)
    # create a text node
    textNode = doc.createTextNode(attr)
    # make the text node a child of the element node
    elementNode.appendChild(textNode)
    return elementNode

dom1 = xml.dom.getDOMImplementation()  # DOM implementation, used to create the document object
doc = dom1.createDocument(None, "info", None)  # the document object is used to create the various nodes
top_element = doc.documentElement  # get the root node
books = [{'head': u'bookone', 'page': u'200', 'number': u'001', 'id': u'001', 'name': u'python check'}, {'head': u'booktwo', 'page': u'300', 'number': u'002', 'id': u'002', 'name': u'python learn'}]
for book in books:
    sNode = doc.createElement('list')
    sNode.setAttribute('id', str(book['id']))
    headNode = create_element(doc, 'head', book['head'])
    nameNode = create_element(doc, 'name', book['name'])
    numberNode = create_element(doc, 'number', book['number'])
    pageNode = create_element(doc, 'page', book['page'])
    sNode.appendChild(headNode)
    sNode.appendChild(nameNode)
    sNode.appendChild(numberNode)  # the original listing created this node but never appended it
    sNode.appendChild(pageNode)
    top_element.appendChild(sNode)  # add the finished node to the root node
xmlfile = open('bookdate.xml', 'w')
doc.writexml(xmlfile, addindent=' ' * 4, newl='\n', encoding='utf-8')
xmlfile.close()

Running the script generates bookdate.xml, which contains the same list entries as book.xml (the intro element is not recreated).
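
The same construction can be sketched in Python 3 with the result kept in memory. This version is an illustration under assumptions, not the author's script: build_document is a hypothetical helper, and toprettyxml is used in place of writing to a file so the output can be inspected directly.

```python
from xml.dom import minidom


def build_document(books):
    """Build an <info> document with one <list> element per dictionary."""
    doc = minidom.getDOMImplementation().createDocument(None, 'info', None)
    root = doc.documentElement
    for book in books:
        entry = doc.createElement('list')
        entry.setAttribute('id', book['id'])
        for field in ('head', 'name', 'number', 'page'):
            child = doc.createElement(field)
            child.appendChild(doc.createTextNode(book[field]))
            entry.appendChild(child)
        root.appendChild(entry)
    return doc


books = [
    {'id': '001', 'head': 'bookone', 'name': 'python check',
     'number': '001', 'page': '200'},
    {'id': '002', 'head': 'booktwo', 'name': 'python learn',
     'number': '002', 'page': '300'},
]

doc = build_document(books)
xml_text = doc.toprettyxml(indent='    ')  # indented string form of the document
print(xml_text)
```

To save the document instead, doc.writexml can be called on a file opened in text mode, passing the same encoding to open and to writexml so the XML declaration matches the bytes on disk.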

xml.etree.ElementTree section

Example 1 (book.xml) is again used as the XML to parse.

1. Loading XML

Method one: load the file directly

import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')

Method two: load from a string

import xml.etree.ElementTree
root = xml.etree.ElementTree.fromstring(xmltext)  # xmltext is the string that holds the XML
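
The two loading calls return different kinds of objects: parse returns an ElementTree wrapper, while fromstring returns the root Element itself. The sketch below, assuming Python 3, shows the difference; io.StringIO stands in for a real file and XML_TEXT is a hypothetical document.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document used for both loading styles.
XML_TEXT = '<info><intro>Book message</intro></info>'

tree = ET.parse(io.StringIO(XML_TEXT))      # ElementTree; a file name works as well
root_from_tree = tree.getroot()             # the root Element inside the tree
root_from_string = ET.fromstring(XML_TEXT)  # already the root Element

print(type(tree).__name__, root_from_tree.tag)
print(type(root_from_string).__name__, root_from_string.tag)
```

Methods such as find and findall exist on both objects; on an ElementTree they search starting from the root element.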

2. Getting nodes

Method one: use the getiterator method to get the specified nodes

book_node = root.getiterator("list")

Method two: use the getchildren method to get child nodes. For example, to get the value of the head node under each list node in Example 1:

#@ Xiao Wu Yi http://www.cnblogs.com/xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book_node = root.getiterator("list")
for node in book_node:
    book_node_child = node.getchildren()[0]
    print book_node_child.tag + ':' + book_node_child.text

The result is:

head:bookone
head:booktwo
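
Both getiterator and getchildren were removed in Python 3.9. A sketch of the same lookup for current Python 3 follows, using iter in place of getiterator and list(element) in place of getchildren; XML_TEXT is a hypothetical shortened copy of book.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened copy of book.xml.
XML_TEXT = """<info>
    <intro>Book message</intro>
    <list id="001"><head>bookone</head><name>python check</name></list>
    <list id="002"><head>booktwo</head><name>python learn</name></list>
</info>"""

root = ET.fromstring(XML_TEXT)

heads = []
for entry in root.iter('list'):        # replaces getiterator('list')
    first_child = list(entry)[0]       # replaces getchildren()[0]
    heads.append(first_child.tag + ':' + first_child.text)

print(heads)
```

Unlike the DOM childNodes list, iterating an Element yields only child elements, so there are no whitespace text nodes to skip.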

Method three: use the find and findall methods

The find method returns the first matching node:

# -*- coding: cp936 -*-
#@ Xiao Wu Yi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book_find = root.find('list')
for note in book_find:
    print note.tag + ':' + note.text

The result is:

head:bookone
name:python check
number:001
page:200

The findall method returns all matching nodes:

# -*- coding: cp936 -*-
#@ Xiao Wu Yi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book = root.findall('list')
for book_list in book:
    for note in book_list:
        print note.tag + ':' + note.text

The result is:

head:bookone
name:python check
number:001
page:200
head:booktwo
name:python learn
number:002
page:300
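
A few details of find and findall are easy to miss: a bare tag name matches direct children only, a path can reach deeper, and find returns None when nothing matches. The sketch below illustrates this under the assumption of Python 3, with a hypothetical shortened copy of book.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical shortened copy of book.xml.
XML_TEXT = """<info>
    <list id="001"><head>bookone</head><name>python check</name></list>
    <list id="002"><head>booktwo</head><name>python learn</name></list>
</info>"""

root = ET.fromstring(XML_TEXT)

first = root.find('list')                  # first direct child named list
every = root.findall('list')               # all direct children named list
direct_names = root.findall('name')        # empty: name is not a direct child
nested_names = root.findall('.//name')     # './/' searches all descendants
first_name = root.findtext('list/name')    # text of the first match for the path
missing = root.find('publisher')           # None when nothing matches

print(first.get('id'), len(every), len(direct_names), len(nested_names))
print(first_name, missing)
```

Because find can return None, code that goes on to read .text or .attrib from the result usually checks for None first.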

3. A complete example: parsing book.xml

# -*- coding: cp936 -*-
#@ Xiao Wu Yi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book = root.findall('list')
for book_list in book:
    print '=' * 20
    if book_list.attrib.has_key('id'):
        print "id:" + book_list.attrib['id']
    for note in book_list:
        print note.tag + ':' + note.text
print '=' * 20

The result is:

====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300
====================

Note:
To get an attribute value, such as the id in <list id='001'>, use attrib.
To get a node's text, such as bookone in <head>bookone</head>, use text.
To get a node's name, use tag.
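
The three attributes in the note are enough to turn each list element into a dictionary. The following is a sketch assuming Python 3, where dict.has_key no longer exists, so Element.get is used to read the attribute; parse_books and XML_TEXT are hypothetical names for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory copy of book.xml.
XML_TEXT = """<info>
    <intro>Book message</intro>
    <list id="001">
        <head>bookone</head><name>python check</name>
        <number>001</number><page>200</page>
    </list>
    <list id="002">
        <head>booktwo</head><name>python learn</name>
        <number>002</number><page>300</page>
    </list>
</info>"""


def parse_books(xml_text):
    """Return one dictionary per <list> element, built from attrib, tag and text."""
    root = ET.fromstring(xml_text)
    books = []
    for entry in root.findall('list'):
        record = {'id': entry.get('id', '')}   # attribute value, '' if absent
        for child in entry:
            # child.tag is the node name, child.text its text (None if empty)
            record[child.tag] = (child.text or '').strip()
        books.append(record)
    return books


for book in parse_books(XML_TEXT):
    print(book)
```

The (child.text or '') guard matters because text is None for an empty element such as <page/>.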
