Python network programming learning notes (8): XML generation and parsing (DOM, ElementTree)

Source: Internet
Author: User

xml.dom

DOM is short for Document Object Model, a high-level tree representation of XML documents. The model is not specific to Python; it is a general XML model. Python's DOM package is built on top of SAX and has been part of the standard XML support since Python 2.0.

I. Introduction to xml.dom

1. Main methods:

minidom.parse(filename): load and parse an XML file
doc.documentElement: get the root element of the XML document
node.getAttribute(AttributeName): get the value of an XML node attribute
node.getElementsByTagName(TagName): get the collection of matching XML node objects
node.childNodes: return the list of child nodes
node.childNodes[index].nodeValue: get the value of an XML node
node.firstChild: access the first child node, equivalent to node.childNodes[0]
Return the XML text of a node:
doc = minidom.parse(filename)
doc.toxml('utf-8')

Access element attributes:

a = node.attributes["id"]
a.name   # the "id" above
a.value  # the attribute value
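
As a minimal sketch of the calls listed above, the following uses Python 3 syntax and parses an in-memory string with minidom.parseString instead of a file. The sample XML and all variable names are assumptions made for illustration only.

```python
from xml.dom import minidom

# Hypothetical sample document, used only for this sketch.
SAMPLE = "<info><list id='001'><head>bookone</head></list></info>"

doc = minidom.parseString(SAMPLE)          # parse from a string instead of a file
root = doc.documentElement                 # the root element, <info>
first_list = root.getElementsByTagName("list")[0]

list_id = first_list.getAttribute("id")    # attribute value as a string
head = first_list.firstChild               # same node as first_list.childNodes[0]
head_text = head.childNodes[0].nodeValue   # value of the text node inside <head>

attr = first_list.attributes["id"]         # an Attr object with .name and .value
print(list_id, head_text, attr.name, attr.value)
```

Note that the text "bookone" lives in a text node under <head>, which is why nodeValue is read from head.childNodes[0] rather than from head itself.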

2. Examples

Example 1: the file book.xml

The content is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<info>
<intro>Book message</intro>
<list id='001'>
<head>bookone</head>
<name>python check</name>
<number>001</number>
<page>200</page>
</list>

<list id='002'>
<head>booktwo</head>
<name>python learn</name>
<number>002</number>
<page>300</page>
</list>

</info>

(1) Create a DOM object

The code is as follows:
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')

(2) Get the root node

root = dom1.documentElement  # the root node is obtained here
print root.nodeName, ',', root.nodeValue, ',', root.nodeType

The returned result is:

info , None , 1

Where:

info is the root node name, root.nodeName
None is the root node value, root.nodeValue
1 is the root node type, root.nodeType. More node types are shown in the table below:

NodeType    Named Constant
1           ELEMENT_NODE
2           ATTRIBUTE_NODE
3           TEXT_NODE
4           CDATA_SECTION_NODE
5           ENTITY_REFERENCE_NODE
6           ENTITY_NODE
7           PROCESSING_INSTRUCTION_NODE
8           COMMENT_NODE
9           DOCUMENT_NODE
10          DOCUMENT_TYPE_NODE
11          DOCUMENT_FRAGMENT_NODE
12          NOTATION_NODE
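
The constants in the table are available as attributes of xml.dom.Node, so code can compare against names instead of bare numbers. The sketch below (Python 3 syntax) is only an illustration; the sample document and the NAMES lookup table are assumptions, not part of the original notes.

```python
from xml.dom import Node, minidom

# Hypothetical sample with a comment and an element under the root.
doc = minidom.parseString("<info><!-- note --><intro>Book message</intro></info>")
root = doc.documentElement

# Map a few numeric node types back to their constant names.
NAMES = {
    Node.ELEMENT_NODE: "ELEMENT_NODE",
    Node.TEXT_NODE: "TEXT_NODE",
    Node.COMMENT_NODE: "COMMENT_NODE",
    Node.DOCUMENT_NODE: "DOCUMENT_NODE",
}

kinds = [NAMES[child.nodeType] for child in root.childNodes]
print(NAMES[doc.nodeType], NAMES[root.nodeType], kinds)
```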


(3) Access child elements and child nodes

a. Return the list of the root node's child nodes

The code is as follows:
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
# print root.nodeName, ',', root.nodeValue, ',', root.nodeType
print root.childNodes

The running result is:

[<DOM Text node "u'\n'">, <DOM Element: intro at 0x124ef58>, <DOM Text node "u'\n'">, <DOM Element: list at 0x1254058>, <DOM Text node "u'\n\n'">, <DOM Element: list at 0x1254418>, <DOM Text node "u'\n\n'">]

b. Get an XML node's value. For example, to return the name and value of intro, the second child node under the root node, add the following statement:

The code is as follows:
print root.childNodes[1].nodeName, root.childNodes[1].nodeValue

The running result is:

intro None

c. Access the first node

The code is as follows:
print root.firstChild.nodeName

The running result is:

#text
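
The first child is a text node because the line break after the opening root tag is itself part of the tree. A small sketch (Python 3 syntax, with an assumed sample string) shows this and one possible way to keep only element children:

```python
from xml.dom import minidom

# Hypothetical sample: line breaks around <intro> become text nodes.
doc = minidom.parseString("<info>\n<intro>Book message</intro>\n</info>")
root = doc.documentElement

print(root.firstChild.nodeName)  # the whitespace text node

# Keep only the element children and skip whitespace text nodes.
elements = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]
print([e.nodeName for e in elements])
```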

d. Get the value of an element with a known name. To get the text "Book message" inside intro, you can use the following method:

The code is as follows:
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
# print root.nodeName, ',', root.nodeValue, ',', root.nodeType
node = root.getElementsByTagName('intro')[0]
for node in node.childNodes:
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        print node.data

The disadvantage of this method is that you have to check the node type, which is not very convenient. The running result is:

Book message
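
One way to hide the type check is to wrap it in a small helper. The function name get_text and the sample document below are assumptions for this sketch (Python 3 syntax); the logic is the same TEXT_NODE / CDATA_SECTION_NODE filter used above.

```python
from xml.dom import minidom

def get_text(element):
    """Join the data of all text and CDATA children of an element."""
    wanted = (element.TEXT_NODE, element.CDATA_SECTION_NODE)
    return "".join(
        child.data for child in element.childNodes if child.nodeType in wanted
    )

# Hypothetical sample mixing plain text and a CDATA section.
doc = minidom.parseString("<info><intro>Book <![CDATA[message]]></intro></info>")
intro = doc.getElementsByTagName("intro")[0]
print(get_text(intro))
```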

II. XML parsing

Parse the preceding XML file.

The code for method 1 is as follows:

# @Xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = {}
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print '=' * 20
    print 'id: ' + booklist.getAttribute('id')
    for nodelist in booklist.childNodes:
        if nodelist.nodeType == 1:  # element nodes only
            print nodelist.nodeName + ':',
            for node in nodelist.childNodes:
                print node.data

The running result is:

====================
id: 001
head: bookone
name: python check
number: 001
page: 200
====================
id: 002
head: booktwo
name: python learn
number: 002
page: 300

The code for method 2 is as follows:

# @Xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing

import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = {}
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    print '=' * 20
    print 'id: ' + booklist.getAttribute('id')
    print 'head: ' + booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    print 'name: ' + booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    print 'number: ' + booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    print 'page: ' + booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()

The running result is the same as that of method 1. Comparing the two, method 1 uses nested loops that follow the XML tree structure, which makes it less readable than method 2; method 2 operates on each node directly and is clearer. To make the data easier for other code to use, you can store it in a list of dictionaries, as in method 3:

The code is as follows:
# @Xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML parsing
import xml.dom.minidom
dom1 = xml.dom.minidom.parse('book.xml')
root = dom1.documentElement
book = []
booknode = root.getElementsByTagName('list')
for booklist in booknode:
    bookdict = {}
    bookdict['id'] = booklist.getAttribute('id')
    bookdict['head'] = booklist.getElementsByTagName('head')[0].childNodes[0].nodeValue.strip()
    bookdict['name'] = booklist.getElementsByTagName('name')[0].childNodes[0].nodeValue.strip()
    bookdict['number'] = booklist.getElementsByTagName('number')[0].childNodes[0].nodeValue.strip()
    bookdict['page'] = booklist.getElementsByTagName('page')[0].childNodes[0].nodeValue.strip()
    book.append(bookdict)
print book

The running result is:

[{'head': u'bookone', 'page': u'200', 'number': u'001', 'id': u'001', 'name': u'python check'}, {'head': u'booktwo', 'page': u'300', 'number': u'002', 'id': u'002', 'name': u'python learn'}]

The list contains two dictionaries.
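
The repeated getElementsByTagName lines in method 3 can also be driven by a tuple of field names. The following is a minimal sketch in Python 3 syntax, not the author's code; BOOK_XML, FIELDS and parse_books are hypothetical names, and the XML is inlined so the example runs without book.xml on disk.

```python
from xml.dom import minidom

# Inline copy of the sample data so the sketch needs no file.
BOOK_XML = """<info>
<intro>Book message</intro>
<list id='001'>
<head>bookone</head>
<name>python check</name>
<number>001</number>
<page>200</page>
</list>
<list id='002'>
<head>booktwo</head>
<name>python learn</name>
<number>002</number>
<page>300</page>
</list>
</info>"""

FIELDS = ("head", "name", "number", "page")

def parse_books(xml_text):
    """Return one dictionary per <list> element."""
    root = minidom.parseString(xml_text).documentElement
    books = []
    for entry in root.getElementsByTagName("list"):
        record = {"id": entry.getAttribute("id")}
        for field in FIELDS:
            node = entry.getElementsByTagName(field)[0]
            record[field] = node.firstChild.nodeValue.strip()
        books.append(record)
    return books

books = parse_books(BOOK_XML)
print(books)
```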

III. Creating an XML file
Here, we use the result obtained in method 3 to create an XML file.

The code is as follows:
# -*- coding: cp936 -*-
# @Xiaowuyi http://www.cnblogs.com/xiaowuyi
# XML creation

import xml.dom
def create_element(doc, tag, attr):
    # Create an element node
    elementNode = doc.createElement(tag)
    # Create a text node
    textNode = doc.createTextNode(attr)
    # Use the text node as a child of the element node
    elementNode.appendChild(textNode)
    return elementNode

dom1 = xml.dom.getDOMImplementation()  # get a DOM implementation, used to create the document object
doc = dom1.createDocument(None, "info", None)  # the document object is used to create the various nodes
top_element = doc.documentElement  # get the root node
books = [{'head': u'bookone', 'page': u'200', 'number': u'001', 'id': u'001', 'name': u'python check'}, {'head': u'booktwo', 'page': u'300', 'number': u'002', 'id': u'002', 'name': u'python learn'}]
for book in books:
    sNode = doc.createElement('list')
    sNode.setAttribute('id', str(book['id']))
    headNode = create_element(doc, 'head', book['head'])
    nameNode = create_element(doc, 'name', book['name'])
    numberNode = create_element(doc, 'number', book['number'])
    pageNode = create_element(doc, 'page', book['page'])
    sNode.appendChild(headNode)
    sNode.appendChild(nameNode)
    sNode.appendChild(numberNode)
    sNode.appendChild(pageNode)
    top_element.appendChild(sNode)  # add each book node to the root node
xmlfile = open('bookdate.xml', 'w')
doc.writexml(xmlfile, addindent=' ' * 4, newl='\n', encoding='utf-8')
xmlfile.close()

Running the script generates the file bookdate.xml, which contains the same book entries as book.xml.
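
The same construction can be sketched in Python 3 syntax and checked with a round trip in memory. This is an illustration under assumptions: build_document is a hypothetical helper name, only one book is used, and the result is serialized with toxml instead of being written to a file.

```python
from xml.dom import minidom

def build_document(books):
    """Build an <info> document with one <list> element per book."""
    impl = minidom.getDOMImplementation()
    doc = impl.createDocument(None, "info", None)
    root = doc.documentElement
    for book in books:
        entry = doc.createElement("list")
        entry.setAttribute("id", book["id"])
        for field in ("head", "name", "number", "page"):
            child = doc.createElement(field)
            child.appendChild(doc.createTextNode(book[field]))
            entry.appendChild(child)
        root.appendChild(entry)
    return doc

books = [
    {"id": "001", "head": "bookone", "name": "python check",
     "number": "001", "page": "200"},
]
doc = build_document(books)
xml_bytes = doc.toxml(encoding="utf-8")  # bytes, including the XML declaration

# Round trip: parse the generated bytes again and read a value back.
reparsed = minidom.parseString(xml_bytes)
page = reparsed.getElementsByTagName("page")[0].firstChild.nodeValue
print(page)
```

Parsing the output back is a cheap way to confirm that every field, including number, actually made it into the document.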

xml.etree.ElementTree

We again use book.xml from Example 1 and parse it.

1. Load XML

Method 1: load a file directly

The code is as follows:
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')

Method 2: load from a string

The code is as follows:
import xml.etree.ElementTree
root = xml.etree.ElementTree.fromstring(xmltext)  # xmltext is the XML string to load
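
The two loaders return different kinds of objects: parse returns an ElementTree that wraps the document, while fromstring returns the root Element directly. A short sketch in Python 3 syntax, using an in-memory file object (io.StringIO) in place of book.xml as an assumption for the example:

```python
import io
import xml.etree.ElementTree as ET

XML_TEXT = "<info><intro>Book message</intro></info>"  # hypothetical sample

# parse() accepts a filename or file object and returns an ElementTree.
tree = ET.parse(io.StringIO(XML_TEXT))
root_from_file = tree.getroot()

# fromstring() returns the root Element directly.
root_from_text = ET.fromstring(XML_TEXT)

print(type(tree).__name__, root_from_file.tag, root_from_text.tag)
```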

2. Get nodes

Method 1: use the getiterator method to get the specified nodes

book_node = root.getiterator("list")

Method 2: use the getchildren method to get child nodes. For example, to get the value of head, the first child node under each list in Example 1:

The code is as follows:
# @Xiaowuyi http://www.cnblogs.com/xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book_node = root.getiterator("list")
for node in book_node:
    book_node_child = node.getchildren()[0]
    print book_node_child.tag + ':' + book_node_child.text

The running result is:

head:bookone
head:booktwo
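
getiterator and getchildren were removed from ElementTree in Python 3.9. In current Python the same traversal can be written with iter() and plain indexing, as in this sketch (the inline sample XML is an assumption for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample with the same shape as book.xml.
root = ET.fromstring(
    "<info>"
    "<list id='001'><head>bookone</head><name>python check</name></list>"
    "<list id='002'><head>booktwo</head><name>python learn</name></list>"
    "</info>"
)

heads = []
for node in root.iter("list"):   # replaces getiterator("list")
    first_child = node[0]        # replaces node.getchildren()[0]
    heads.append(first_child.tag + ":" + first_child.text)

print(heads)
```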

Method 3: use the find and findall methods

The find method finds the first matching node:

The code is as follows:
# -*- coding: cp936 -*-
# @Xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book_find = root.find('list')
for note in book_find:
    print note.tag + ':' + note.text

Running result:

head:bookone
name:python check
number:001
page:200

The findall method finds all matching nodes:

The code is as follows:
# -*- coding: cp936 -*-
# @Xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book = root.findall('list')
for book_list in book:
    for note in book_list:
        print note.tag + ':' + note.text

Running result:

head:bookone
name:python check
number:001
page:200
head:booktwo
name:python learn
number:002
page:300
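
A compact comparison of the lookups, written in Python 3 syntax with an assumed inline sample: find returns the first matching direct child or None, findall returns a list of all matches, and findtext returns the text of the first match.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample with the same shape as book.xml.
root = ET.fromstring(
    "<info>"
    "<list id='001'><head>bookone</head><name>python check</name></list>"
    "<list id='002'><head>booktwo</head><name>python learn</name></list>"
    "</info>"
)

first = root.find("list")                    # first matching direct child
all_lists = root.findall("list")             # every matching direct child
missing = root.find("chapter")               # no such child, so None
second_name = all_lists[1].findtext("name")  # text of the first <name> child

print(first.get("id"), len(all_lists), missing, second_name)
```

Because find returns None when nothing matches, it is worth checking the result before iterating over it.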

3. Example of parsing book.xml

The code is as follows:
# -*- coding: cp936 -*-
# @Xiaowuyi
import xml.etree.ElementTree
root = xml.etree.ElementTree.parse('book.xml')
book = root.findall('list')
for book_list in book:
    print '=' * 20
    if book_list.attrib.has_key('id'):
        print "id:" + book_list.attrib['id']
    for note in book_list:
        print note.tag + ':' + note.text
print '=' * 20

The running result is:

====================
id:001
head:bookone
name:python check
number:001
page:200
====================
id:002
head:booktwo
name:python learn
number:002
page:300
====================

Note:
To get an attribute value, such as the id in list id='001', use the attrib attribute.
To get a node's value, such as bookone in <head>bookone</head>, use the text attribute. Use the tag attribute to get the node name.
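
Putting attrib, tag and text together, the sketch below collects each list element into a dictionary. It is written in Python 3 syntax, where dict.has_key no longer exists and the in operator is used instead; the inline sample and the variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline sample with the same shape as book.xml.
root = ET.fromstring(
    "<info>"
    "<intro>Book message</intro>"
    "<list id='001'><head>bookone</head><page>200</page></list>"
    "<list id='002'><head>booktwo</head><page>300</page></list>"
    "</info>"
)

records = []
for entry in root.findall("list"):
    record = {}
    if "id" in entry.attrib:            # replaces attrib.has_key('id')
        record["id"] = entry.attrib["id"]
    for child in entry:
        record[child.tag] = child.text  # node name -> node value
    records.append(record)

print(records)
```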
