The xml.dom module
DOM stands for Document Object Model, a high-level tree representation of an XML document. The model is not specific to Python; it is a general, language-independent model for XML. Python's DOM package is built on SAX and has been part of Python's standard XML support since Python 2.0.
I. A brief introduction to xml.dom
1. Main methods
minidom.parse(filename): load and parse an XML file
doc.documentElement: get the root element of the document
node.getAttribute(attributeName): get the value of an attribute of an XML node
node.getElementsByTagName(tagName): get the list of descendant elements with the given tag name
node.childNodes: return the list of child nodes
node.childNodes[index].nodeValue: get the value of an XML node
node.firstChild: access the first child node, equivalent to node.childNodes[0]
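The following sketch exercises the methods listed above on a trimmed-down copy of the Example 1 data. It assumes the XML is held in a string and uses minidom.parseString() in place of parse() so that it runs without a file on disk.

```python
from xml.dom import minidom

# A trimmed-down version of the Book.xml data, kept in a string
# so that the sketch does not need a file on disk.
XML = '<info><intro>Book message</intro><list id="001"><head>bookone</head></list></info>'

doc = minidom.parseString(XML)    # like minidom.parse(filename), but from a string
root = doc.documentElement        # the root element, <info>

lists = root.getElementsByTagName('list')   # all <list> descendants
first_list = lists[0]
list_id = first_list.getAttribute('id')     # attribute value as a string

intro = root.firstChild                     # same node as root.childNodes[0]
intro_text = intro.childNodes[0].nodeValue  # the text node inside <intro>

print(root.nodeName, list_id, intro.nodeName, intro_text)
```

There is no whitespace between the tags in this string, so the first child of info is the intro element itself rather than a text node.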
To get the XML representation of a node as text:
doc = minidom.parse(filename)
doc.toxml('UTF-8')
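A small sketch of what toxml() returns, assuming Python 3: without an encoding argument it returns a str, and with one (such as 'UTF-8') it returns bytes in that encoding. The XML string is illustrative only.

```python
from xml.dom import minidom

doc = minidom.parseString('<info><intro>Book message</intro></info>')

as_text = doc.toxml()          # no encoding given: returns a str
as_bytes = doc.toxml('UTF-8')  # encoding given: returns bytes in that encoding
element_xml = doc.documentElement.toxml()  # serialize only the root element

print(type(as_text).__name__, type(as_bytes).__name__)
print(element_xml)
```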
To access an element's attributes:
a = node.attributes["id"]
a.name   # the attribute's name, i.e. "id" above
a.value  # the attribute's value
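A runnable version of the attribute access above, assuming a single list element with an id attribute parsed from a string. It also shows that getAttribute() returns the same value as the Attr node's value.

```python
from xml.dom import minidom

doc = minidom.parseString('<info><list id="001"/></info>')
node = doc.getElementsByTagName('list')[0]

a = node.attributes['id']       # an Attr node
print(a.name, a.value)          # attribute name and value
print(node.attributes.length)   # number of attributes on the element
```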
2. Examples and explanations
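Example 1: the file Book.xml
<info>
    <intro>Book message</intro>
    <list id="001">
        <head>bookone</head>
        <name>python check</name>
        <number>001</number>
        <page>200</page>
    </list>
    <list id="002">
        <head>booktwo</head>
        <name>python learn</name>
        <number>002</number>
        <page>300</page>
    </list>
</info>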
(1) Creating a DOM object
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('Book.xml')
(2) Getting the root node
root = dom1.documentElement  # this is the root node
print(root.nodeName, root.nodeValue, root.nodeType, sep=', ')
The output is:
info, None, 1
Here, info is the name of the root node (root.nodeName), None is its value (root.nodeValue; element nodes have no value of their own), and 1 is its node type (root.nodeType). All node types are listed below:
nodeType | Named constant
1 | ELEMENT_NODE
2 | ATTRIBUTE_NODE
3 | TEXT_NODE
4 | CDATA_SECTION_NODE
5 | ENTITY_REFERENCE_NODE
6 | ENTITY_NODE
7 | PROCESSING_INSTRUCTION_NODE
8 | COMMENT_NODE
9 | DOCUMENT_NODE
10 | DOCUMENT_TYPE_NODE
11 | DOCUMENT_FRAGMENT_NODE
12 | NOTATION_NODE
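The named constants in this table are available as attributes of xml.dom.Node, so code can compare against them instead of bare numbers. A small sketch, assuming an illustrative document that contains a comment and an element:

```python
from xml.dom import Node, minidom

doc = minidom.parseString('<info><!-- a comment --><intro>Book message</intro></info>')
root = doc.documentElement

# Compare against the named constants instead of bare numbers.
print(doc.nodeType == Node.DOCUMENT_NODE)   # the document itself
print(root.nodeType == Node.ELEMENT_NODE)   # <info>
for child in root.childNodes:
    print(child.nodeType, child.nodeName)
```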
(3) Accessing child elements and child nodes
a. Getting the list of the root node's child nodes
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('Book.xml')
root = dom1.documentElement
print(root.childNodes)
The output is a list of seven nodes: the intro element and the two list elements, each preceded by a text node holding the line break and indentation between tags, plus a final whitespace text node before </info>. The exact text printed for each node varies.
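The whitespace text nodes in childNodes are easy to trip over. A minimal sketch, assuming a shortened indented document, that counts all child nodes and then keeps only the element children:

```python
from xml.dom import Node, minidom

# Indented XML, as in Book.xml: the line breaks and spaces between
# tags become text nodes in childNodes.
XML = """<info>
    <intro>Book message</intro>
    <list id="001"/>
</info>"""

root = minidom.parseString(XML).documentElement
all_children = root.childNodes
element_children = [n for n in all_children if n.nodeType == Node.ELEMENT_NODE]

print(len(all_children))                        # text, intro, text, list, text
print([n.nodeName for n in element_children])   # only the elements
```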
b. Getting a node's name and value. For example, to print the name and value of intro, the root node's second child node, add the following line:
print(root.childNodes[1].nodeName, root.childNodes[1].nodeValue)
The output is:
intro None
The value is None because intro is an element node; its text "Book message" is stored in a child text node.
c. Accessing the first child node
print(root.firstChild.nodeName)
The output is:
#text
The first child is the whitespace text node between <info> and <intro>, not the intro element.
d. Getting the text of an element whose name is known. To get the text "Book message" inside intro, you can use the following method:
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('Book.xml')
root = dom1.documentElement
node = root.getElementsByTagName('intro')[0]
for child in node.childNodes:
    # Only text and CDATA nodes carry the element's text
    if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
        print(child.data)
The drawback of this approach is that you have to check each node's type, which is not very convenient. The output is:
Book message
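One way to hide the type check is to wrap it in a helper. The function get_text below is hypothetical (minidom does not provide it); it joins the data of every text and CDATA child of an element.

```python
from xml.dom import Node, minidom

def get_text(element):
    """Join the data of every text and CDATA child of element (hypothetical helper)."""
    return ''.join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )

doc = minidom.parseString('<info><intro>Book message</intro></info>')
intro = doc.getElementsByTagName('intro')[0]
print(get_text(intro))
```

Because the helper only looks at direct children, calling it on an element whose children are all elements returns an empty string.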
II. XML parsing
The following examples parse the Book.xml file above.
Method 1:
# @ Xiao Wu Yi http://www.cnblogs.com/xiaowuyi
# XML parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('Book.xml')
root = dom1.documentElement
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print('=' * 20)
    print('id:' + booklist.getAttribute('id'))
    for nodelist in booklist.childNodes:
        if nodelist.nodeType == 1:  # element nodes only
            print(nodelist.nodeName + ':', end='')
            for node in nodelist.childNodes:
                print(node.data)
The output is:
====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300
Method 2:
# @ Xiao Wu Yi http://www.cnblogs.com/xiaowuyi
# XML parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('Book.xml')
root = dom1.documentElement
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print('=' * 20)
    print('id:' + booklist.getAttribute('id'))
    print('head:' + booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip())
    print('name:' + booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip())
    print('number:' + booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip())
    print('page:' + booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip())
The output is the same as for Method 1. Comparing the two, Method 1 walks the XML tree with several nested loops and is harder to read, while Method 2 operates on each node directly and is clearer. To make the results easier to use in later code, you can store them in a list of dictionaries, as in Method 3:
# @ Xiao Wu Yi http://www.cnblogs.com/xiaowuyi
# XML parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('Book.xml')
root = dom1.documentElement
book = []
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    bookdict = {}
    bookdict['id'] = booklist.getAttribute('id')
    bookdict['head'] = booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    bookdict['name'] = booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    bookdict['number'] = booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    bookdict['page'] = booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()
    book.append(bookdict)
print(book)
The output is:
[{'id': '001', 'head': 'bookone', 'name': 'python check', 'number': '001', 'page': '200'}, {'id': '002', 'head': 'booktwo', 'name': 'python learn', 'number': '002', 'page': '300'}]
The list contains two dictionaries, one per list element.
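Methods 2 and 3 repeat the same getElementsByTagName(...)[0].childNodes[0].nodeValue.strip() chain for every field. A sketch of one way to factor it out, using a hypothetical helper child_text (not part of minidom) and a one-book XML string in place of Book.xml:

```python
from xml.dom import minidom

def child_text(parent, tag):
    """Return the stripped text of the first <tag> under parent (hypothetical helper)."""
    return parent.getElementsByTagName(tag)[0].childNodes[0].nodeValue.strip()

XML = ('<info><list id="001"><head>bookone</head><name>python check</name>'
       '<number>001</number><page>200</page></list></info>')
root = minidom.parseString(XML).documentElement

books = []
for booklist in root.getElementsByTagName('list'):
    book = {'id': booklist.getAttribute('id')}
    for tag in ('head', 'name', 'number', 'page'):
        book[tag] = child_text(booklist, tag)
    books.append(book)

print(books)
```

Like the original chain, the helper assumes each tag is present and has a text child; a missing tag raises IndexError.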
III. Building an XML file
Here the result of Method 3 is used to create an XML file.
# @ Xiao Wu Yi http://www.cnblogs.com/xiaowuyi
# XML creation
import xml.dom

def create_element(doc, tag, text):
    # Create an element node
    element_node = doc.createElement(tag)
    # Create a text node
    text_node = doc.createTextNode(text)
    # Make the text node a child of the element node
    element_node.appendChild(text_node)
    return element_node

impl = xml.dom.getDOMImplementation()  # the DOM implementation creates the document object
doc = impl.createDocument(None, 'info', None)  # the document object is used to create all other nodes
top_element = doc.documentElement  # get the root node
books = [{'id': '001', 'head': 'bookone', 'name': 'python check', 'number': '001', 'page': '200'},
         {'id': '002', 'head': 'booktwo', 'name': 'python learn', 'number': '002', 'page': '300'}]
for book in books:
    snode = doc.createElement('list')
    snode.setAttribute('id', book['id'])
    head_node = create_element(doc, 'head', book['head'])
    name_node = create_element(doc, 'name', book['name'])
    number_node = create_element(doc, 'number', book['number'])
    page_node = create_element(doc, 'page', book['page'])
    snode.appendChild(head_node)
    snode.appendChild(name_node)
    snode.appendChild(number_node)
    snode.appendChild(page_node)
    top_element.appendChild(snode)  # add the new list node to the root node
with open('Bookdate.xml', 'w', encoding='utf-8') as xmlfile:
    doc.writexml(xmlfile, addindent='    ', newl='\n', encoding='utf-8')
Running the script generates Bookdate.xml. Its list elements match those in Book.xml; the intro element is not created by this script.
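To check a generated document without writing a file, it can be serialized with toprettyxml() and parsed back. A minimal sketch, assuming a single list element with only a head child:

```python
from xml.dom import minidom

impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, 'info', None)
root = doc.documentElement

item = doc.createElement('list')
item.setAttribute('id', '001')
head = doc.createElement('head')
head.appendChild(doc.createTextNode('bookone'))
item.appendChild(head)
root.appendChild(item)

# Serialize to a string and parse it back to check the structure.
xml_text = doc.toprettyxml(indent='    ')
reparsed = minidom.parseString(xml_text)
check = reparsed.getElementsByTagName('list')[0]
head_text = check.getElementsByTagName('head')[0].firstChild.data
print(check.getAttribute('id'), head_text.strip())
```

The strip() call guards against pretty-printing whitespace around the text, which some Python versions add.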
The xml.etree.ElementTree module
The examples below again parse Book.xml from Example 1.
1. Loading XML
Method 1: load the file directly. parse() returns an ElementTree object; call getroot() on it to get the root element.
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('Book.xml').getroot()
Method 2: load from a string
import xml.etree.ElementTree
root = xml.etree.ElementTree.fromstring(xmltext)
Here xmltext is the string containing the XML; fromstring() returns the root element directly.
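A short sketch contrasting the two loading methods, assuming an illustrative XML string. Wrapping the bytes in io.BytesIO lets parse() read them as if from a file.

```python
import io
import xml.etree.ElementTree as ET

XML = '<info><intro>Book message</intro></info>'

tree = ET.parse(io.BytesIO(XML.encode('utf-8')))  # parse() accepts a filename or a file object
root_from_file = tree.getroot()        # parse() returns an ElementTree; getroot() gives the root Element
root_from_string = ET.fromstring(XML)  # fromstring() returns the root Element directly

print(type(tree).__name__, root_from_file.tag, root_from_string.tag)
```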
2. Getting nodes
Method 1: use getiterator() to get every node with a given tag. getiterator() was deprecated and then removed in Python 3.9; iter() does the same job:
book_node = root.iter('list')
Method 2: use getchildren() to get an element's child nodes. getchildren() was also removed in Python 3.9; list(node) returns the same list of children. For example, to get the value of the head node under each list node in Example 1:
# @ Xiao Wu Yi http://www.cnblogs.com/xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('Book.xml').getroot()
book_node = root.iter('list')
for node in book_node:
    book_node_child = list(node)[0]
    print(book_node_child.tag + ':' + book_node_child.text)
The output is:
head:bookone
head:booktwo
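A self-contained version of Method 2 that runs on Python 3.9 and later, using iter() and list() in place of the removed getiterator() and getchildren(). The two-book XML string is an assumption standing in for Book.xml.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<info><list id="001"><head>bookone</head></list>'
    '<list id="002"><head>booktwo</head></list></info>')

heads = []
# iter(tag) replaces getiterator(tag): it walks all descendants with that tag.
for node in root.iter('list'):
    first_child = list(node)[0]   # list(node) replaces node.getchildren()
    heads.append(first_child.tag + ':' + first_child.text)

print(heads)
```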
Method 3: use find() and findall()
find() returns the first matching child element:
# @ Xiao Wu Yi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('Book.xml').getroot()
book_find = root.find('list')
for note in book_find:  # iterate over the children of the first <list>
    print(note.tag + ':' + note.text)
The output is:
head:bookone
name:python check
number:001
page:200
findall() returns all matching child elements:
# @ Xiao Wu Yi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('Book.xml').getroot()
book = root.findall('list')
for book_list in book:
    for note in book_list:
        print(note.tag + ':' + note.text)
The output is:
head:bookone
name:python check
number:001
page:200
head:booktwo
name:python learn
number:002
page:300
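find() and findall() also accept the limited XPath subset supported by ElementTree, including paths and attribute predicates. A sketch on an assumed two-book string; it also shows that find() returns None when nothing matches and that findtext() returns the text of the first match.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<info><list id="001"><name>python check</name></list>'
    '<list id="002"><name>python learn</name></list></info>')

second = root.find("list[@id='002']")                 # first <list> whose id is 002
names = [n.text for n in root.findall('list/name')]   # <name> under every <list>
missing = root.find('chapter')                        # no match: find() returns None
first_name = root.findtext('list/name')               # text of the first match

print(second.get('id'), names, missing, first_name)
```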
3. A complete example: parsing Book.xml
# @ Xiao Wu Yi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('Book.xml').getroot()
book = root.findall('list')
for book_list in book:
    print('=' * 20)
    if 'id' in book_list.attrib:
        print('id:' + book_list.attrib['id'])
    for note in book_list:
        print(note.tag + ':' + note.text)
print('=' * 20)
The output is:
====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300
====================
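For comparison with Method 3 in the xml.dom section, the same list-of-dictionaries structure can be built with ElementTree. This is a sketch assuming each list element holds only simple text children; the one-book string stands in for Book.xml.

```python
import xml.etree.ElementTree as ET

XML = ('<info><list id="001"><head>bookone</head><name>python check</name>'
       '<number>001</number><page>200</page></list></info>')
root = ET.fromstring(XML)

books = []
for book_list in root.findall('list'):
    book = {'id': book_list.get('id')}   # attribute value, or None if absent
    for child in book_list:              # iterate over the child elements
        book[child.tag] = child.text
    books.append(book)

print(books)
```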
Note:
To get an attribute value, such as id="001" on a list element, use the attrib dictionary (book_list.attrib['id']) or the get() method.
To get an element's text, such as bookone in <head>bookone</head>, use the text attribute.
To get an element's name, use the tag attribute.
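The three accessors from the note above, on an assumed single list element. get() also accepts a default value that is returned when the attribute is missing.

```python
import xml.etree.ElementTree as ET

node = ET.fromstring('<list id="001"><head>bookone</head></list>')
head = node.find('head')

print(node.attrib)                 # all attributes as a dict
print(node.get('id'))              # one attribute value
print(node.get('lang', 'n/a'))     # default returned when the attribute is missing
print(head.text)                   # the element's text
print(head.tag)                    # the element's name
```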