There are the following xml files:
Copy codeThe Code is as follows:
<? Xml version = "1.0" encoding = "UTF-8"?>
<Root>
<Childs>
<Child name = 'first'> 1 </child>
<Child value = "2"> 2 </child>
</Childs>
</Root>
The following describes several methods for parsing xml files using python.
Method 1: the python module automatically traverses all nodes:
Copy codeThe Code is as follows:
#! /Usr/bin/env python
#-*-Coding: UTF-8 -*-
From xml. sax. handler import ContentHandler
From xml. sax import parse
Class TestHandle (ContentHandler ):
Def _ init _ (self, inlist ):
Self. inlist = inlist
Def startElement (self, name, attrs ):
Print 'name: ', name, 'attrs:', attrs. keys ()
Def endElement (self, name ):
Print 'endname', name
Def characters (self, chars ):
Print 'chars', chars
Self. inlist. append (chars)
If _ name _ = '_ main __':
Lt = []
Parse ('test. xml', TestHandle (lt ))
Print lt
Result:
[Html] view plaincopy
Name: root attrs: []
Chars
Name: childs attrs: []
Chars
Name: child attrs: [u'name']
Chars 1
Endname child
Chars
Name: child attrs: [u'value']
Chars 2
Endname child
Chars
Endname childs
Chars
Endname root
[U' \ n', U' \ n', u'1', U' \ n', u'2', U' \ n ', u' \ n']
Method 2: The python module obtains the root node and searches for the specified node as needed:
Copy codeThe Code is as follows:
#! /Usr/bin/env python
#-*-Coding: UTF-8 -*-
From xml. dom import minidom
Xmlstr = ''' <? Xml version = "1.0" encoding = "UTF-8"?>
<Hash>
<Request name = 'first'>/2/photos/square/type. xml </request>
<Error_code> 21301 </error_code>
<Error> auth faild! </Error>
</Hash>
'''
Def doxml (xmlstr ):
Dom = minidom. parseString (xmlstr)
Print 'dom :'
Print dom. toxml ()
Root = dom. firstChild
Print 'root :'
Print root. toxml ()
Childs = root. childNodes
For child in childs:
Print child. toxml ()
If child. nodeType = child. TEXT_NODE:
Pass
Else:
Print 'child node attribute name: ', child. getAttribute ('name ')
Print 'child node name: ', child. nodeName
Print 'child node len: ', len (child. childNodes)
Print 'child data: ', child. childNodes [0]. data
Print '============================================== ='
Print 'more help info to see :'
For med in dir (child ):
Print help (med)
If _ name _ = '_ main __':
Doxml (xmlstr)
Result:
[Html] view plaincopy
Dom:
<? Xml version = "1.0"?> <Hash>
<Request name = "first">/2/photos/square/type. xml </request>
<Error_code> 21301 </error_code>
<Error> auth faild! </Error>
</Hash>
Root:
<Hash>
<Request name = "first">/2/photos/square/type. xml </request>
<Error_code> 21301 </error_code>
<Error> auth faild! </Error>
</Hash>
<Request name = "first">/2/photos/square/type. xml </request>
Child node attribute name: first
Child node name: request
Child node len: 1
Child data:/2/photos/square/type. xml
========================================================
More help info to see:
The two methods have their own advantages. There are too many python xml processing modules. Currently, only these two methods are used.
====== Supplemental split line ============================
In practice, it is found that the mimidom of python cannot parse the xml of other encodings. It can only parse the UTF-8 encoding, And the header Declaration of the xml file must also be UTF-8. If it is another encoding, an error will be reported.
The solution on the Internet is to replace the encoding statement in the header of the xml file, and then convert the encoding to UTF-8 and then use minidom for decoding. The actual test is feasible, but it is a bit cumbersome.
This section is the second part of the code encapsulated by the python parsing xml module.
===== Split line for writing xml content ==========
Copy codeThe Code is as follows:
#! \ Urs \ bin \ env python
# Encoding: UTF-8
From xml. dom import minidom
Class xmlwrite:
Def _ init _ (self, resultfile ):
Self. resultfile = resultfile
Self. rootname = 'api'
Self. _ create_xml_dom ()
Def _ create_xml_dom (self ):
Xmlimpl = minidom. getDOMImplementation ()
Self. dom = xmlimpl. createDocument (None, self. rootname, None)
Self. root = self.dom.doc umentElement
Def _ get_spec_node (self, xpath ):
Patharr = xpath. split (R '/')
Parentnode = self. root
Exist = 1
For nodename in patharr:
If nodename. strip () = '':
Continue
If not exist:
Return None
Spcindex = nodename. find ('[')
If spcindex>-1:
Index = int (nodename [spcindex + 1:-1])
Else:
Index = 0
Count = 0
Childs = parentnode. childNodes
For child in childs:
If child. nodeName = nodename [: spcindex]:
If count = index:
Parentnode = child
Exist = 1
Break
Count + = 1
Continue
Else:
Exist = 0
Return parentnode
Def write_node (self, parent, nodename, value, attribute = None, CDATA = False ):
Node = self. dom. createElement (nodename)
If value:
If CDATA:
Nodedata = self. dom. createCDATASection (value)
Else:
Nodedata = self. dom. createTextNode (value)
Node. appendChild (nodedata)
If attribute and isinstance (attribute, dict ):
For key, value in attribute. items ():
Node. setAttribute (key, value)
Try:
Parentnode = self. _ get_spec_node (parent)
Except t:
Print 'get parent Node Fail, Use the Root as parent node'
Parentnode = self. root
Parentnode. appendChild (node)
Def write_start_time (self, time ):
Self. write_node ('/', 'starttime', time)
Def write_end_time (self, time ):
Self. write_node ('/', 'endtime', time)
Def write_pass_count (self, count ):
Self. write_node ('/', 'passcount ', count)
Def write_fail_count (self, count ):
Self. write_node ('/', 'failcount', count)
Def write_case (self ):
Self. write_node ('/', 'case', None)
Def write_case_no (self, index, value ):
Self. write_node ('/Case [% s]/' % index, 'No', value)
Def write_case_url (self, index, value ):
Self. write_node ('/Case [% s]/' % index, 'url', value)
Def write_case_dbdata (self, index, value ):
Self. write_node ('/Case [% s]/' % index, 'dbdata', value)
Def write_case_apidata (self, index, value ):
Self. write_node ('/Case [% s]/' % index, 'apidata', value)
Def write_case_dbsql (self, index, value ):
Self. write_node ('/Case [% s]/' % index, 'dbsql', value, CDATA = True)
Def write_case_apixpath (self, index, value ):
Self. write_node ('/Case [% s]/' % index, 'apixpath ', value)
Def save_xml (self ):
Myfile = file (self. resultfile, 'w ')
Self. dom. writexml (myfile, encoding = 'utf-8 ')
Myfile. close ()
If _ name _ = '_ main __':
Xr = xmlwrite (r 'd: \ test. xml ')
Xr. write_start_time ('20140901 ')
Xr. write_end_time ('20140901 ')
Xr. write_pass_count ('22 ')
Xr. write_fail_count ('33 ')
Xr. write_case ()
Xr. write_case ()
Xr. write_case_no (0, '0 ')
Xr. write_case_url (0, 'HTTP: // www.google.com ')
Xr. write_case_url (0, 'HTTP: // www.google.com ')
Xr. write_case_dbsql (0, 'select * from ')
Xr. write_case_dbdata (0, 'dbtata ')
Xr. write_case_apixpath (0, '/xpath ')
Xr. write_case_apidata (0, 'apidata ')
Xr. write_case_no (1, '1 ')
Xr. write_case_url (1, 'HTTP: // www.baidu.com ')
Xr. write_case_url (1, 'HTTP: // www.baidu.com ')
Xr. write_case_dbsql (1, 'select 1 from ')
Xr. write_case_dbdata (1, 'dbtata1 ')
Xr. write_case_apixpath (1, '/xpath1 ')
Xr. write_case_apidata (1, 'apidata1 ')
Xr. save_xml ()
The above Code encapsulates minidom and supports writing nodes through xpath. It does not support matching with attributes in xpath, but supports matching with indexes.
For example,/root/child [1] indicates the root 2nd child nodes.