xml.dom
DOM is short for Document Object Model, a tree representation of XML documents. The model is not specific to Python; it is a general XML model shared across languages. Python's DOM package is built on top of SAX and has been part of the standard XML support since Python 2.0.
I. Introduction to xml.dom
1. Main methods:
minidom.parse(filename): load and parse an XML file
doc.documentElement: get the root element of the XML document
node.getAttribute(AttributeName): get the value of an attribute of an XML node
node.getElementsByTagName(TagName): get the collection of XML node objects with the given tag name
node.childNodes: return the list of child nodes
node.childNodes[index].nodeValue: get the value of a child node
node.firstChild: access the first child node, equivalent to node.childNodes[0]
Return the XML text of a node:
doc = minidom.parse(filename)
doc.toxml('utf-8')
Access element attributes:
a = node.attributes["id"]
a.name   # the "id" above
a.value  # the attribute value
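The calls listed above can be tried together on a small in-memory document. This is a minimal sketch written for Python 3 (the listings in this article are Python 2); the SAMPLE string is a hypothetical stand-in for a file, and minidom.parseString is used in place of minidom.parse so that no file is needed.

```python
from xml.dom import minidom

# Hypothetical sample document, parsed from a string instead of a file.
SAMPLE = "<info><list id='001'><head>bookone</head></list></info>"

doc = minidom.parseString(SAMPLE)
root = doc.documentElement                          # the <info> element
first = root.getElementsByTagName("list")[0]        # first <list> element
book_id = first.getAttribute("id")                  # attribute value as a string
attr = first.attributes["id"]                       # Attr object with .name and .value
head_text = first.firstChild.firstChild.nodeValue   # text inside <head>

print(root.nodeName, book_id, attr.name, attr.value, head_text)
```

Because the sample contains no whitespace between tags, firstChild is the head element here; in a formatted file it would usually be a whitespace text node.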
2. Examples
Example 1: file name: book.xml
<?xml version="1.0" encoding="UTF-8"?>
<info>
<intro>Book message</intro>
<list id='001'>
<head>bookone</head>
<name>python check</name>
<number>001</number>
<page>200</page>
</list>
<list id='002'>
<head>booktwo</head>
<name>python learn</name>
<number>002</number>
<page>300</page>
</list>
</info>
(1) Create a DOM object
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
(2) Get the root node
root = dom1.documentElement  # the root node is obtained here
print root.nodeName, ',', root.nodeValue, ',', root.nodeType
The returned result is:
info , None , 1
Where:
info is the root node name, root.nodeName
None is the root node value, root.nodeValue
1 is the root node type, root.nodeType. More node types are shown in the table below:
NodeType | Named Constant
1 | ELEMENT_NODE
2 | ATTRIBUTE_NODE
3 | TEXT_NODE
4 | CDATA_SECTION_NODE
5 | ENTITY_REFERENCE_NODE
6 | ENTITY_NODE
7 | PROCESSING_INSTRUCTION_NODE
8 | COMMENT_NODE
9 | DOCUMENT_NODE
10 | DOCUMENT_TYPE_NODE
11 | DOCUMENT_FRAGMENT_NODE
12 | NOTATION_NODE
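The named constants in the table are available on xml.dom.Node, so code does not have to compare against bare numbers. The following is a small Python 3 sketch under that assumption; the sample string and the kinds list are made up for illustration.

```python
from xml.dom import Node, minidom

# Hypothetical sample with a comment and an element under the root.
doc = minidom.parseString("<info><!-- note --><intro>Book message</intro></info>")
root = doc.documentElement

kinds = []
for child in root.childNodes:
    if child.nodeType == Node.ELEMENT_NODE:      # 1
        kinds.append("element")
    elif child.nodeType == Node.COMMENT_NODE:    # 8
        kinds.append("comment")
    else:
        kinds.append("other")

print(doc.nodeType == Node.DOCUMENT_NODE, kinds)
```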
(3) Access child elements and child nodes
a. Return the list of child nodes of the root
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
# print root.nodeName, ',', root.nodeValue, ',', root.nodeType
print root.childNodes
The running result is:
[<DOM Text node "u'\n'">, <DOM Element: intro at 0x124ef58>, <DOM Text node "u'\n'">, <DOM Element: list at 0x1254058>, <DOM Text node "u'\n\n'">, <DOM Element: list at 0x1254418>, <DOM Text node "u'\n\n'">]
b. Get an XML node's value. For example, to print the name and value of the second child node under the root node (intro), add the following statement:
print root.childNodes[1].nodeName, root.childNodes[1].nodeValue
The running result is:
intro None
c. Access the first node
print root.firstChild.nodeName
The running result is:
#text
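The #text result comes from the line break between the root tag and the first element, which the parser keeps as a text node. As a hedged Python 3 sketch (the sample string is hypothetical), the element children can be separated from such whitespace nodes by checking nodeType:

```python
from xml.dom import minidom

# Hypothetical sample with line breaks between the tags, as in a formatted file.
doc = minidom.parseString("<info>\n<intro>Book message</intro>\n</info>")
root = doc.documentElement

first = root.firstChild   # the "\n" text node, not <intro>
# Keep only element children and skip the whitespace text nodes.
elements = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]

print(first.nodeName, repr(first.data), [e.tagName for e in elements])
```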
d. Get the value of an element with a known name. To get the text "Book message" inside intro, you can use the following method:
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
# print root.nodeName, ',', root.nodeValue, ',', root.nodeType
node = root.getElementsByTagName('intro')[0]
for node in node.childNodes:
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        print node.data
The disadvantage of this method is that you have to check the node type, which is not very convenient. The running result is:
Book message
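One way to reduce the inconvenience of the type check is to wrap it in a helper. The sketch below is an assumption-laden Python 3 illustration: get_text is a hypothetical helper name, not part of xml.dom, and the sample string stands in for book.xml.

```python
from xml.dom import minidom

def get_text(element):
    """Concatenate the text and CDATA children of one element (hypothetical helper)."""
    parts = []
    for child in element.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts).strip()

doc = minidom.parseString("<info><intro> Book message </intro><empty/></info>")
message = get_text(doc.getElementsByTagName("intro")[0])
nothing = get_text(doc.getElementsByTagName("empty")[0])   # no children at all

print(message)
```

The helper returns an empty string for an element without text, so callers do not need to guard against a missing first child.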
II. XML parsing
Parse the XML file shown above.
The code for method 1 is as follows:
# @Xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = {}
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print '=' * 20
    print 'id:' + booklist.getAttribute('id')
    for nodelist in booklist.childNodes:
        if nodelist.nodeType == 1:
            print nodelist.nodeName + ':',
            for node in nodelist.childNodes:
                print node.data
The running result is:
====================
id:001
head: bookone
name: python check
number: 001
page: 200
====================
id:002
head: booktwo
name: python learn
number: 002
page: 300
Method 2:
# @Xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = {}
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print '=' * 20
    print 'id:' + booklist.getAttribute('id')
    print 'head:' + booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    print 'name:' + booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    print 'number:' + booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    print 'page:' + booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()
The running result is essentially the same as that of method 1. Comparing the two, method 1 uses nested loops that follow the XML tree structure, which makes it less readable; method 2 operates on each node directly, which is clearer. To make the data easier to reuse elsewhere, you can store it in a list of dictionaries, as in method 3:
# @Xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = []
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    bookdict = {}
    bookdict['id'] = booklist.getAttribute('id')
    bookdict['head'] = booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    bookdict['name'] = booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    bookdict['number'] = booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    bookdict['page'] = booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()
    book.append(bookdict)
print book
The running result is:
[{'head': u'bookone', 'page': u'200', 'number': u'001', 'id': u'001', 'name': u'python check'}, {'head': u'booktwo', 'page': u'300', 'number': u'002', 'id': u'002', 'name': u'python learn'}]
The list contains two dictionaries, one per book.
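The four nearly identical lookups in method 3 can also be driven by a tuple of field names. This is only a sketch in Python 3, not the author's code; FIELDS, book_to_dict, and the inline SAMPLE string are hypothetical names introduced for the illustration.

```python
from xml.dom import minidom

# Hypothetical inline copy of the two book entries.
SAMPLE = """<info>
<list id='001'><head>bookone</head><name>python check</name><number>001</number><page>200</page></list>
<list id='002'><head>booktwo</head><name>python learn</name><number>002</number><page>300</page></list>
</info>"""

FIELDS = ("head", "name", "number", "page")

def book_to_dict(list_node):
    """Turn one <list> element into a dictionary (hypothetical helper)."""
    record = {"id": list_node.getAttribute("id")}
    for field in FIELDS:
        element = list_node.getElementsByTagName(field)[0]
        record[field] = element.firstChild.nodeValue.strip()
    return record

doc = minidom.parseString(SAMPLE)
books = [book_to_dict(node) for node in doc.getElementsByTagName("list")]
print(books)
```

Adding a new field then means extending FIELDS rather than copying another lookup line.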
III. Create an XML file
Here, we use the result obtained in method 3 to create an XML file.
# -*- coding: cp936 -*-
# @Xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML creation
import xml.dom
def create_element(doc, tag, attr):
    # Create an element node
    elementNode = doc.createElement(tag)
    # Create a text node
    textNode = doc.createTextNode(attr)
    # Make the text node a child of the element node
    elementNode.appendChild(textNode)
    return elementNode
dom1 = xml.dom.getDOMImplementation()  # get a DOM implementation, used to create the document object
doc = dom1.createDocument(None, "info", None)  # the document object is used to create all other nodes
top_element = doc.documentElement  # get the root node
books = [{'head': u'bookone', 'page': u'200', 'number': u'001', 'id': u'001', 'name': u'python check'}, {'head': u'booktwo', 'page': u'300', 'number': u'002', 'id': u'002', 'name': u'python learn'}]
for book in books:
    sNode = doc.createElement('list')
    sNode.setAttribute('id', str(book['id']))
    headNode = create_element(doc, 'head', book['head'])
    nameNode = create_element(doc, 'name', book['name'])
    numberNode = create_element(doc, 'number', book['number'])
    pageNode = create_element(doc, 'page', book['page'])
    sNode.appendChild(headNode)
    sNode.appendChild(nameNode)
    sNode.appendChild(numberNode)  # the number node must be appended as well
    sNode.appendChild(pageNode)
    top_element.appendChild(sNode)  # add each list node to the root node
xmlfile = open('bookdate.xml', 'w')
doc.writexml(xmlfile, addindent=' ' * 4, newl='\n', encoding='utf-8')
xmlfile.close()
Running the script generates bookdate.xml, whose list entries correspond to those in book.xml.
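To inspect the generated markup without writing a file, the document can be serialized to a string. The sketch below assumes Python 3 and uses minidom's toprettyxml instead of writexml; the single book record and the reduced set of tags are hypothetical, chosen only to keep the example short.

```python
from xml.dom import minidom

def create_element(doc, tag, text):
    """Create <tag>text</tag> as an element with one text child."""
    node = doc.createElement(tag)
    node.appendChild(doc.createTextNode(text))
    return node

impl = minidom.getDOMImplementation()
doc = impl.createDocument(None, "info", None)
root = doc.documentElement

book = {"id": "001", "head": "bookone", "page": "200"}   # hypothetical record
entry = doc.createElement("list")
entry.setAttribute("id", book["id"])
for tag in ("head", "page"):
    entry.appendChild(create_element(doc, tag, book[tag]))
root.appendChild(entry)

xml_text = doc.toprettyxml(indent="    ")   # serialize to an indented string
print(xml_text)
```

Parsing xml_text again with minidom.parseString is a quick way to confirm that the structure round-trips.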
xml.etree.ElementTree
We again use book.xml from Example 1 to demonstrate parsing.
1. Load XML
Method 1: load a file directly
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
Method 2: load from a string
import xml.etree.ElementTree
root = xml.etree.ElementTree.fromstring(xmltext)  # xmltext is the XML string to parse
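The two loading methods return different kinds of objects: parse gives an ElementTree wrapper, while fromstring gives the root Element directly. A minimal Python 3 sketch of the difference follows; XML_TEXT is a hypothetical sample, and io.StringIO stands in for a file on disk.

```python
import io
import xml.etree.ElementTree as ET

XML_TEXT = "<info><intro>Book message</intro></info>"   # hypothetical sample

root_from_string = ET.fromstring(XML_TEXT)    # an Element
tree = ET.parse(io.StringIO(XML_TEXT))        # an ElementTree wrapping the document
root_from_tree = tree.getroot()               # its root Element

print(root_from_string.tag, root_from_tree.tag)
```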
2. Get nodes
Method 1: use the getiterator method to get the specified nodes
book_node = root.getiterator("list")
Method 2: use the getchildren method to get child nodes. For Example 1, to get the value of the head child node under each list:
# @Xiaowuyi http://www.cnblogs.com/xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book_node = root.getiterator("list")
for node in book_node:
    book_node_child = node.getchildren()[0]
    print book_node_child.tag + ':' + book_node_child.text
The running result is:
head:bookone
head:booktwo
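Note that getiterator and getchildren were removed in Python 3.9. On current Python versions the same traversal can be sketched with iter() and list(), as below; the inline sample string is a hypothetical stand-in for book.xml.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<info><list id='001'><head>bookone</head></list>"
    "<list id='002'><head>booktwo</head></list></info>"
)

heads = []
for node in root.iter("list"):       # replaces getiterator("list")
    first_child = list(node)[0]      # replaces node.getchildren()[0]
    heads.append(first_child.tag + ":" + first_child.text)

print(heads)
```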
Method 3: use the find and findall methods
The find method returns the first matching node:
# -*- coding: cp936 -*-
# @Xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book_find = root.find('list')
for note in book_find:
    print note.tag + ':' + note.text
The running result is:
head:bookone
name:python check
number:001
page:200
The findall method returns all matching nodes:
# -*- coding: cp936 -*-
# @Xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book = root.findall('list')
for book_list in book:
    for note in book_list:
        print note.tag + ':' + note.text
The running result is:
head:bookone
name:python check
number:001
page:200
head:booktwo
name:python learn
number:002
page:300
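Beyond plain tag names, find and findall accept simple path expressions, and findtext returns a child's text in one call. The following Python 3 sketch illustrates this on a hypothetical inline sample; the tag name magazine is made up to show the no-match case.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<info>"
    "<list id='001'><name>python check</name><page>200</page></list>"
    "<list id='002'><name>python learn</name><page>300</page></list>"
    "</info>"
)

names = [book.findtext("name") for book in root.findall("list")]
second = root.find("list[@id='002']")   # select by attribute value
missing = root.find("magazine")         # no such tag, so find returns None

print(names, second.findtext("page"), missing)
```

Since find returns None when nothing matches, the result should be checked before its children are accessed.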
3. Example of parsing book.xml
# -*- coding: cp936 -*-
# @Xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book = root.findall('list')
for book_list in book:
    print '=' * 20
    if book_list.attrib.has_key('id'):
        print "id:" + book_list.attrib['id']
    for note in book_list:
        print note.tag + ':' + note.text
print '=' * 20
The running result is:
====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300
====================
Note:
To get an attribute value, such as the id='001' of a list node, use the attrib attribute.
To get a node's value, such as bookone in <head>bookone</head>, use the text attribute. To get the node name, use the tag attribute.
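The three attributes in the note can be combined to collect each book into a dictionary. This is a hedged Python 3 sketch on a hypothetical inline sample; it uses attrib.get in place of has_key, which does not exist on Python 3 dictionaries.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<info><list id='001'><head>bookone</head><page>200</page></list></info>"
)

records = []
for book in root.findall("list"):
    record = {"id": book.attrib.get("id")}   # attrib: dictionary of attributes
    for child in book:
        record[child.tag] = child.text       # tag: node name, text: node value
    records.append(record)

print(records)
```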