The xml.dom module
DOM is short for Document Object Model, a high-level tree representation of an XML document. The model is not specific to Python; it is a general, language-independent model for XML. Python's DOM packages are built on SAX and have been part of Python's standard XML support since Python 2.0.
I. A brief introduction to xml.dom
1. Main methods
minidom.parse(filename): load and parse an XML file, returning a Document object
doc.documentElement: get the root element of the document
node.getAttribute(attributeName): get the value of an element's attribute
node.getElementsByTagName(tagName): get a NodeList of all descendant elements with the given tag name
node.childNodes: return the list of a node's child nodes
node.childNodes[index].nodeValue: get the value of a child node (for a text node, its text)
node.firstChild: access the first child node, equivalent to node.childNodes[0]
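The methods above can be tried together in one short sketch. It assumes an inline copy of part of book.xml (parsed with minidom.parseString(), the in-memory counterpart of parse()) so that it runs without a file on disk.

```python
import xml.dom.minidom

# Part of book.xml, inlined so the sketch runs without a file (an assumption for illustration)
XML = """<info>
<intro>Book message</intro>
<list id='001'><head>bookone</head><name>python check</name></list>
<list id='002'><head>booktwo</head><name>python learn</name></list>
</info>"""

doc = xml.dom.minidom.parseString(XML)     # like parse(), but reads from a string
root = doc.documentElement                 # the <info> element
lists = root.getElementsByTagName("list")  # NodeList of every <list> descendant
first_id = lists[0].getAttribute("id")     # attribute value as a string
head = lists[0].getElementsByTagName("head")[0]
head_text = head.childNodes[0].nodeValue   # the text lives in the child text node
same = root.firstChild is root.childNodes[0]

print(root.tagName, first_id, head_text, same)
```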
To get the XML text of a node (here, the whole document):
doc = minidom.parse(filename)
doc.toxml('UTF-8')
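A minimal sketch of what toxml() returns in Python 3, using a small inline document rather than a file: without an encoding argument the result is a str, and with one it is bytes that start with a matching XML declaration.

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString("<info><intro>Book message</intro></info>")

as_text = doc.toxml()                   # no encoding argument: returns str
as_bytes = doc.toxml(encoding="utf-8")  # with an encoding: returns bytes

print(as_text)
print(as_bytes)
```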
To access an element's attributes:
a = node.attributes["id"]
a.name   # the attribute name, i.e. "id" above
a.value  # the attribute's value
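The attribute access above can be sketched as follows. The element and its second attribute (lang) are made up for illustration; the point is that node.attributes["id"] returns an Attr node, and that attributes also offers items() for all (name, value) pairs.

```python
import xml.dom.minidom

# Hypothetical element with two attributes
doc = xml.dom.minidom.parseString("<list id='001' lang='en'/>")
node = doc.documentElement

a = node.attributes["id"]  # an Attr node, looked up by attribute name
print(a.name, a.value)     # the attribute's name and its value

# items() returns (name, value) pairs for every attribute
all_attrs = dict(node.attributes.items())
print(all_attrs)
```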
2. Worked example
Example 1: the file book.xml
<?xml version="1.0" encoding="utf-8"?>
<info>
<intro>Book message</intro>
<list id='001'>
<head>bookone</head>
<name>python check</name>
<number>001</number>
<page>200</page>
</list>
<list id='002'>
<head>booktwo</head>
<name>python learn</name>
<number>002</number>
<page>300</page>
</list>
</info>
(1) Create a DOM object
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
(2) Get the root node
root = dom1.documentElement  # this gets the root node
print(root.nodeName, ',', root.nodeValue, ',', root.nodeType)
The output is:
info , None , 1
where:
info is the name of the root node, root.nodeName
None is the value of the root node, root.nodeValue
1 is the type of the root node, root.nodeType; all node types are listed below:
nodeType | Named constant
1 | ELEMENT_NODE
2 | ATTRIBUTE_NODE
3 | TEXT_NODE
4 | CDATA_SECTION_NODE
5 | ENTITY_REFERENCE_NODE
6 | ENTITY_NODE
7 | PROCESSING_INSTRUCTION_NODE
8 | COMMENT_NODE
9 | DOCUMENT_NODE
10 | DOCUMENT_TYPE_NODE
11 | DOCUMENT_FRAGMENT_NODE
12 | NOTATION_NODE
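The named constants in this table are available as attributes of xml.dom.Node, so a numeric nodeType can be turned back into its name. A small sketch, assuming an inline document with a comment added so that more than one node type appears:

```python
import xml.dom.minidom
from xml.dom import Node

doc = xml.dom.minidom.parseString("<info><!-- note --><intro>Book message</intro></info>")
root = doc.documentElement

# Build {1: "ELEMENT_NODE", ..., 12: "NOTATION_NODE"} from the constants on Node
names = {getattr(Node, n): n for n in dir(Node) if n.endswith("_NODE")}
kinds = [names[child.nodeType] for child in root.childNodes]

print(doc.nodeType, root.nodeType)  # 9 1
print(kinds)
```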
(3) Accessing child elements and child nodes
a. Return the list of the root node's child nodes
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
print(root.childNodes)
The output is (the memory addresses vary from run to run):
[<DOM Text node "'\n'">, <DOM Element: intro at 0x124ef58>, <DOM Text node "'\n'">, <DOM Element: list at 0x1254058>, <DOM Text node "'\n'">, <DOM Element: list at 0x1254418>, <DOM Text node "'\n'">]
The newline characters between elements are returned as text nodes too.
b. Get an XML node's name and value. For example, to return the name and value of intro, the second child node of the root, add the following line:
print(root.childNodes[1].nodeName, root.childNodes[1].nodeValue)
The output is:
intro None
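The None above is expected: an element's own nodeValue is always None, and its text is held in a child text node. A minimal sketch on an inline fragment shaped like the top of book.xml:

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString("<info>\n<intro>Book message</intro>\n</info>")
root = doc.documentElement

intro = root.childNodes[1]               # index 0 is the '\n' text node before <intro>
print(intro.nodeName, intro.nodeValue)   # intro None

text_node = intro.firstChild             # the Text child holding the content
print(text_node.nodeName, text_node.nodeValue)
```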
c. Access the first child node
print(root.firstChild.nodeName)
The output is:
#text
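firstChild returns #text here because the newline after <info> is the first child. One way to skip such whitespace nodes is to filter on nodeType; the helper names element_children and first_element_child below are hypothetical, written only for this sketch.

```python
import xml.dom.minidom
from xml.dom import Node

doc = xml.dom.minidom.parseString(
    "<info>\n<intro>Book message</intro>\n<list id='001'/>\n</info>")
root = doc.documentElement

def element_children(node):
    """Return only the element children, skipping whitespace text nodes."""
    return [c for c in node.childNodes if c.nodeType == Node.ELEMENT_NODE]

def first_element_child(node):
    """Return the first element child, or None if there is none."""
    children = element_children(node)
    return children[0] if children else None

print(root.firstChild.nodeName)            # #text
print(first_element_child(root).nodeName)  # intro
print([c.nodeName for c in element_children(root)])
```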
d. Get the text of an element whose name you already know. For example, to get "Book message" from intro:
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
node = root.getElementsByTagName('intro')[0]
for child in node.childNodes:
    if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
        print(child.data)
The drawback of this approach is that you have to check the node type, which makes it inconvenient to use. The output is:
Book message
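The type check can be wrapped in a reusable function so callers no longer repeat it. A sketch with a hypothetical get_text() helper; the inline intro element deliberately mixes plain text and a CDATA section to show that both are collected.

```python
import xml.dom.minidom
from xml.dom import Node

def get_text(element):
    """Hypothetical helper: concatenate the Text and CDATA children of an element."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )

doc = xml.dom.minidom.parseString(
    "<info><intro>Book <![CDATA[message]]></intro></info>")
intro = doc.getElementsByTagName("intro")[0]
print(get_text(intro))
```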
II. Parsing XML
Parse the book.xml file above.
Method 1:
# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print('=' * 20)
    print('id:' + booklist.getAttribute('id'))
    for nodelist in booklist.childNodes:
        if nodelist.nodeType == 1:  # element nodes only
            print(nodelist.nodeName + ':', end='')
            for node in nodelist.childNodes:
                print(node.data)
The output is:
====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300
Method 2:
# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print('=' * 20)
    print('id:' + booklist.getAttribute('id'))
    print('head:' + booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip())
    print('name:' + booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip())
    print('number:' + booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip())
    print('page:' + booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip())
This produces the same output as Method 1. Comparing the two: Method 1 walks the XML tree with several nested loops, while Method 2 accesses each node directly, which is clearer and more readable. If the results need to be used elsewhere in a program, they can be stored in a list of dictionaries, as in Method 3:
# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = []
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    bookdict = {}
    bookdict['id'] = booklist.getAttribute('id')
    bookdict['head'] = booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    bookdict['name'] = booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    bookdict['number'] = booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    bookdict['page'] = booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()
    book.append(bookdict)
print(book)
The output is:
[{'id': '001', 'head': 'bookone', 'name': 'python check', 'number': '001', 'page': '200'}, {'id': '002', 'head': 'booktwo', 'name': 'python learn', 'number': '002', 'page': '300'}]
The list contains two dictionaries, one per list element.
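Method 3 repeats the same getElementsByTagName(...)[0].childNodes[0].nodeValue.strip() chain for every field. A sketch of one way to factor it out, using a hypothetical child_text() helper and keying the records by id so a book can be looked up directly; the XML is an inline copy of the list elements from book.xml.

```python
import xml.dom.minidom

XML = """<info>
<list id='001'><head>bookone</head><name>python check</name><number>001</number><page>200</page></list>
<list id='002'><head>booktwo</head><name>python learn</name><number>002</number><page>300</page></list>
</info>"""

def child_text(parent, tag):
    """Hypothetical helper: stripped text of the first <tag> under parent, or '' if absent."""
    found = parent.getElementsByTagName(tag)
    if not found or found[0].firstChild is None:
        return ""
    return found[0].firstChild.nodeValue.strip()

root = xml.dom.minidom.parseString(XML).documentElement
books = {}
for book in root.getElementsByTagName("list"):
    # Key each record by its id attribute instead of its position in a list
    books[book.getAttribute("id")] = {
        tag: child_text(book, tag) for tag in ("head", "name", "number", "page")
    }

print(books["002"]["name"])  # python learn
```

Returning '' for a missing element is a design choice for this sketch; raising an error instead would surface malformed files sooner.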
III. Creating XML files
The following code creates an XML file from the result of Method 3.
# @xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML creation
import xml.dom.minidom

def create_element(doc, tag, attr):
    # create an element node
    elementnode = doc.createElement(tag)
    # create a text node
    textnode = doc.createTextNode(attr)
    # make the text node a child of the element node
    elementnode.appendChild(textnode)
    return elementnode

# get a DOM implementation; it creates the document object, which in turn creates the nodes
dom1 = xml.dom.minidom.getDOMImplementation()
doc = dom1.createDocument(None, "info", None)
top_element = doc.documentElement  # get the root node
books = [{'id': '001', 'head': 'bookone', 'name': 'python check', 'number': '001', 'page': '200'},
         {'id': '002', 'head': 'booktwo', 'name': 'python learn', 'number': '002', 'page': '300'}]
for book in books:
    snode = doc.createElement('list')
    snode.setAttribute('id', str(book['id']))
    headnode = create_element(doc, 'head', book['head'])
    namenode = create_element(doc, 'name', book['name'])
    numbernode = create_element(doc, 'number', book['number'])
    pagenode = create_element(doc, 'page', book['page'])
    snode.appendChild(headnode)
    snode.appendChild(namenode)
    snode.appendChild(numbernode)
    snode.appendChild(pagenode)
    top_element.appendChild(snode)  # add the new list node to the root node
with open('bookdate.xml', 'w', encoding='utf-8') as xmlfile:
    doc.writexml(xmlfile, addindent=' ' * 4, newl='\n', encoding='utf-8')
Running it generates bookdate.xml, whose list entries match those in book.xml (the intro element is not recreated).
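To check the generated XML without touching the disk, the same document can be written to an in-memory buffer and parsed back. This sketch assumes a single book record and loops over the field names instead of creating each node by hand; create_element() mirrors the function above.

```python
import io
import xml.dom.minidom

def create_element(doc, tag, text):
    """Element node with a single text child, mirroring create_element() above."""
    element = doc.createElement(tag)
    element.appendChild(doc.createTextNode(text))
    return element

books = [{"id": "001", "head": "bookone", "name": "python check", "number": "001", "page": "200"}]

impl = xml.dom.minidom.getDOMImplementation()
doc = impl.createDocument(None, "info", None)
for book in books:
    node = doc.createElement("list")
    node.setAttribute("id", book["id"])
    for tag in ("head", "name", "number", "page"):
        node.appendChild(create_element(doc, tag, book[tag]))
    doc.documentElement.appendChild(node)

# Write to an in-memory buffer instead of a file, then parse it back to check the result
buffer = io.StringIO()
doc.writexml(buffer, addindent="    ", newl="\n", encoding="utf-8")
reparsed = xml.dom.minidom.parseString(buffer.getvalue())
first = reparsed.getElementsByTagName("list")[0]
print(first.getAttribute("id"), first.getElementsByTagName("page")[0].firstChild.data.strip())
```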
The xml.etree.ElementTree module
The examples below again parse book.xml from Example 1.
1. Loading XML
Method 1: load a file directly
import xml.etree.ElementTree
tree = xml.etree.ElementTree.parse('book.xml')
root = tree.getroot()  # parse() returns an ElementTree; getroot() gives the root element
Method 2: load XML from a string
import xml.etree.ElementTree
root = xml.etree.ElementTree.fromstring(xmltext)  # xmltext is the XML string; the root element is returned directly
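The two loaders return different kinds of object, which is easy to trip over. A minimal sketch, assuming a short inline XML string (wrapped in a BytesIO so parse() can read it like a file):

```python
import io
import xml.etree.ElementTree as ET

XML = "<info><intro>Book message</intro><list id='001'/></info>"

tree = ET.parse(io.BytesIO(XML.encode("utf-8")))  # parse() takes a filename or file object
root_from_tree = tree.getroot()                   # the <info> Element
root = ET.fromstring(XML)                         # fromstring() returns the root Element directly

print(type(tree).__name__, root_from_tree.tag, root.tag)
```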
2. Getting nodes
Method 1: use iter() to get the specified nodes (older Python versions also provided getiterator(), which was removed in Python 3.9):
book_node = root.iter("list")
Method 2: get a node's children. Older versions used getchildren(), also removed in Python 3.9; iterating over the element or calling list(node) does the same. In Example 1, this gets the head node under each list:
# @xiaowuyi http://www.cnblogs.com/xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml').getroot()
book_node = root.iter("list")
for node in book_node:
    book_node_child = list(node)[0]  # first child element, formerly node.getchildren()[0]
    print(book_node_child.tag + ':' + book_node_child.text)
The output is:
head:bookone
head:booktwo
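A self-contained version of Method 2, assuming an inline copy of the list elements. It also shows that an Element can be indexed directly (node[0]), which avoids building a list just to take the first child.

```python
import xml.etree.ElementTree as ET

XML = """<info>
<list id='001'><head>bookone</head><name>python check</name></list>
<list id='002'><head>booktwo</head><name>python learn</name></list>
</info>"""

root = ET.fromstring(XML)
lines = []
for node in root.iter("list"):  # every <list> in the tree, in document order
    first_child = node[0]       # elements support indexing like a list of children
    lines.append(first_child.tag + ":" + first_child.text)

print("\n".join(lines))
```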
Method 3: use find() and findall()
find() returns the first matching node:
# @xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml').getroot()
book_find = root.find('list')
for note in book_find:
    print(note.tag + ':' + note.text)
Output:
head:bookone
name:python check
number:001
page:200
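find() returns None when nothing matches, so the result should be checked before use. For single values, findtext() combines the lookup with reading .text and accepts a default. A short sketch on an inline fragment; the catalog and price tags are made up to show the no-match cases.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<info><list id='001'><head>bookone</head></list></info>")

first = root.find("list")                             # first matching child element
missing = root.find("catalog")                        # no match: returns None, does not raise
head = root.findtext("list/head")                     # text of the first match for a path
default = root.findtext("list/price", default="n/a")  # default when no element matches

print(first.get("id"), missing, head, default)
```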
findall() returns all matching nodes:
# @xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml').getroot()
book = root.findall('list')
for book_list in book:
    for note in book_list:
        print(note.tag + ':' + note.text)
Output:
head:bookone
name:python check
number:001
page:200
head:booktwo
name:python learn
number:002
page:300
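findall() also accepts the limited XPath subset that ElementTree supports, such as multi-step paths and attribute filters, and iterfind() is its lazy counterpart. A sketch on an inline copy of the list data:

```python
import xml.etree.ElementTree as ET

XML = """<info>
<list id='001'><name>python check</name><page>200</page></list>
<list id='002'><name>python learn</name><page>300</page></list>
</info>"""

root = ET.fromstring(XML)

names = [e.text for e in root.findall("list/name")]  # path steps separated by '/'
second = root.findall("list[@id='002']")             # filter by attribute value
total_pages = sum(int(p.text) for p in root.iterfind("list/page"))

print(names, second[0].find("name").text, total_pages)
```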
3. A complete analysis of book.xml
# @xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml').getroot()
book = root.findall('list')
for book_list in book:
    print('=' * 20)
    if 'id' in book_list.attrib:
        print('id:' + book_list.attrib['id'])
    for note in book_list:
        print(note.tag + ':' + note.text)
print('=' * 20)
The output is:
====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300
====================
Note:
To get an attribute value, such as id='001' in list, use the attrib dictionary (or the get() method).
To get a node's text, such as bookone in <head>bookone</head>, use the text attribute; to get a node's name, use the tag attribute.
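The three accessors from the note, side by side on one inline list element; the lang attribute queried at the end does not exist and only shows get() falling back to a default.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring("<list id='001'><head>bookone</head></list>")

print(book.attrib)            # all attributes as a dict
print(book.get("id"))         # one attribute value, same as book.attrib["id"]
print(book.get("lang", "-"))  # get() takes a default for missing attributes
head = book[0]
print(head.tag, head.text)    # element name and its text content
```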