Parsing XML in Python: module wrapper code

Source: Internet
Author: User

Consider the following XML file (test.xml):

<?xml version="1.0" encoding="UTF-8"?>
<root>
<childs>
<child name='first'>1</child>
<child value="2">2</child>
</childs>
</root>

The following describes two ways to parse XML files in Python.

Method 1: the xml.sax module traverses all nodes automatically and reports each one through handler callbacks:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
from xml.sax.handler import ContentHandler
from xml.sax import parse


class TestHandle(ContentHandler):
    def __init__(self, inlist):
        ContentHandler.__init__(self)
        self.inlist = inlist

    def startElement(self, name, attrs):
        # Called for every opening tag
        print('name:', name, 'attrs:', attrs.keys())

    def endElement(self, name):
        # Called for every closing tag
        print('endname', name)

    def characters(self, chars):
        # Called for text, including the newlines between tags
        print('chars', chars)
        self.inlist.append(chars)


if __name__ == '__main__':
    lt = []
    parse('test.xml', TestHandle(lt))
    print(lt)

Result (Python 3):
name: root attrs: []
chars 

name: childs attrs: []
chars 

name: child attrs: ['name']
chars 1
endname child
chars 

name: child attrs: ['value']
chars 2
endname child
chars 

endname childs
chars 

endname root
['\n', '\n', '1', '\n', '2', '\n', '\n']
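
The result list contains whitespace-only entries because characters() is also called for the newlines between tags. The following is a minimal sketch of one way to keep only meaningful text. It assumes Python 3; TextCollector and SAMPLE are hypothetical names introduced for illustration, not part of the original code.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler

# Same document as test.xml, inlined so the sketch is self-contained.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<childs>
<child name='first'>1</child>
<child value="2">2</child>
</childs>
</root>
"""


class TextCollector(ContentHandler):
    """Collects (element name, text) pairs, skipping blank text."""

    def __init__(self):
        super().__init__()
        self.texts = []    # (element name, text) pairs
        self._buffer = []  # text chunks seen since the last tag

    def startElement(self, name, attrs):
        # Discard whitespace that preceded this opening tag.
        self._buffer = []

    def characters(self, content):
        # The parser may deliver one text run in several chunks.
        self._buffer.append(content)

    def endElement(self, name):
        text = ''.join(self._buffer).strip()
        if text:
            self.texts.append((name, text))
        self._buffer = []


handler = TextCollector()
parseString(SAMPLE, handler)
print(handler.texts)
```

Buffering the chunks and joining them in endElement() avoids relying on characters() being called exactly once per text run.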

Method 2: the xml.dom.minidom module builds a node tree; you get the root node and look up specific nodes as needed:

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
from xml.dom import minidom

xmlstr = '''<?xml version="1.0" encoding="UTF-8"?>
<hash>
<request name='first'>/2/photos/square/type.xml</request>
<error_code>21301</error_code>
<error>auth faild!</error>
</hash>
'''


def doxml(xmlstr):
    dom = minidom.parseString(xmlstr)
    print('dom:')
    print(dom.toxml())

    # The first child of the document is the root element
    root = dom.firstChild
    print('root:')
    print(root.toxml())

    childs = root.childNodes
    for child in childs:
        print(child.toxml())
        if child.nodeType == child.TEXT_NODE:
            # Whitespace between elements shows up as text nodes
            pass
        else:
            print('child node attribute name:', child.getAttribute('name'))
            print('child node name:', child.nodeName)
            print('child node len:', len(child.childNodes))
            print('child data:', child.childNodes[0].data)
            print('==============================================')
            print('more help info to see:')
            for med in dir(child):
                # Look up the attribute itself; help() on the bare name
                # string would search help topics instead
                help(getattr(child, med))


if __name__ == '__main__':
    doxml(xmlstr)

Result (Python 3):
dom:
<?xml version="1.0" ?><hash>
<request name="first">/2/photos/square/type.xml</request>
<error_code>21301</error_code>
<error>auth faild!</error>
</hash>
root:
<hash>
<request name="first">/2/photos/square/type.xml</request>
<error_code>21301</error_code>
<error>auth faild!</error>
</hash>

<request name="first">/2/photos/square/type.xml</request>
child node attribute name: first
child node name: request
child node len: 1
child data: /2/photos/square/type.xml
==============================================
more help info to see:
... (help output omitted)
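
When only a few specific nodes are needed, walking childNodes by hand is not required: minidom also provides getElementsByTagName(). The following is a minimal sketch under that assumption, written for Python 3; XML_TEXT and the element_text helper are hypothetical names introduced for illustration.

```python
from xml.dom import minidom

# Same response document as in the listing above.
XML_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<hash>
<request name='first'>/2/photos/square/type.xml</request>
<error_code>21301</error_code>
<error>auth faild!</error>
</hash>
"""


def element_text(node):
    """Concatenate the text children of an element."""
    return ''.join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


dom = minidom.parseString(XML_TEXT)

# getElementsByTagName returns every matching descendant, in document order.
request = dom.getElementsByTagName('request')[0]
error_code = dom.getElementsByTagName('error_code')[0]

print(request.getAttribute('name'))
print(element_text(request))
print(int(element_text(error_code)))
```

This skips the whitespace text nodes automatically, since only element nodes carry a tag name.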
Each of the two methods has its own advantages. Python has many XML processing modules; only these two are covered here.

====== Supplement ======
In practice, I found that Python's minidom could not parse XML in other encodings. It only handled UTF-8, and the encoding declaration in the XML header also had to say UTF-8; with any other encoding it raised an error.
The workaround found online is to convert the content to UTF-8, replace the encoding declaration in the XML header to match, and then parse the result with minidom. In my tests this worked, but it is a bit cumbersome.
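
The workaround can be sketched as follows. This is an illustration under stated assumptions rather than the original author's code: it assumes Python 3, assumes the caller already knows the source encoding, and uses a hypothetical helper name, parse_as_utf8. GBK is used only as an example encoding.

```python
import re
from xml.dom import minidom


def parse_as_utf8(raw_bytes, source_encoding):
    """Re-encode an XML document to UTF-8, then parse it with minidom."""
    text = raw_bytes.decode(source_encoding)
    # Rewrite the encoding named in the XML declaration, if there is one.
    text = re.sub(r'(<\?xml[^>]*encoding=["\'])[^"\']*',
                  r'\g<1>UTF-8', text, count=1)
    return minidom.parseString(text.encode('utf-8'))


# A GBK-encoded document whose <child> contains two Chinese characters.
gbk_xml = ('<?xml version="1.0" encoding="GBK"?>'
           '<root><child>\u4e2d\u6587</child></root>').encode('gbk')

dom = parse_as_utf8(gbk_xml, 'gbk')
child = dom.documentElement.firstChild
print(child.firstChild.data)
```

Decoding first and rewriting the declaration second keeps the declared encoding and the actual bytes in agreement, which is the point of the workaround.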

This section is the second part of the Python XML wrapper code: writing XML.
===== Writing XML content =====

#!/usr/bin/env python
# encoding: UTF-8
from xml.dom import minidom


class xmlwrite:
    def __init__(self, resultfile):
        self.resultfile = resultfile
        self.rootname = 'api'
        self.__create_xml_dom()

    def __create_xml_dom(self):
        # Create an empty document whose root element is <api>
        xmlimpl = minidom.getDOMImplementation()
        self.dom = xmlimpl.createDocument(None, self.rootname, None)
        self.root = self.dom.documentElement

    def __get_spec_node(self, xpath):
        # Walk a path such as '/case[1]/' down from the root element.
        # Indexes are zero-based; returns None if a step has no match.
        patharr = xpath.split('/')
        parentnode = self.root
        for nodename in patharr:
            if nodename.strip() == '':
                continue
            spcindex = nodename.find('[')
            if spcindex > -1:
                tagname = nodename[:spcindex]
                index = int(nodename[spcindex + 1:-1])
            else:
                # No [index] suffix: take the first child with this name
                tagname = nodename
                index = 0
            count = 0
            for child in parentnode.childNodes:
                if child.nodeName == tagname:
                    if count == index:
                        parentnode = child
                        break
                    count += 1
            else:
                # No matching child at this level
                return None
        return parentnode


    def write_node(self, parent, nodename, value, attribute=None, CDATA=False):
        node = self.dom.createElement(nodename)
        if value:
            if CDATA:
                nodedata = self.dom.createCDATASection(value)
            else:
                nodedata = self.dom.createTextNode(value)
            node.appendChild(nodedata)
        if attribute and isinstance(attribute, dict):
            for key, attrvalue in attribute.items():
                node.setAttribute(key, attrvalue)
        try:
            parentnode = self.__get_spec_node(parent)
        except Exception:
            parentnode = None
        if parentnode is None:
            print('get parent Node Fail, Use the Root as parent node')
            parentnode = self.root
        parentnode.appendChild(node)


    def write_start_time(self, time):
        self.write_node('/', 'starttime', time)

    def write_end_time(self, time):
        self.write_node('/', 'endtime', time)

    def write_pass_count(self, count):
        self.write_node('/', 'passcount', count)

    def write_fail_count(self, count):
        self.write_node('/', 'failcount', count)

    def write_case(self):
        self.write_node('/', 'case', None)

    def write_case_no(self, index, value):
        self.write_node('/case[%s]/' % index, 'no', value)

    def write_case_url(self, index, value):
        self.write_node('/case[%s]/' % index, 'url', value)

    def write_case_dbdata(self, index, value):
        self.write_node('/case[%s]/' % index, 'dbdata', value)

    def write_case_apidata(self, index, value):
        self.write_node('/case[%s]/' % index, 'apidata', value)

    def write_case_dbsql(self, index, value):
        self.write_node('/case[%s]/' % index, 'dbsql', value, CDATA=True)

    def write_case_apixpath(self, index, value):
        self.write_node('/case[%s]/' % index, 'apixpath', value)

    def save_xml(self):
        # Write the document as UTF-8; the with block closes the file
        with open(self.resultfile, 'w', encoding='utf-8') as myfile:
            self.dom.writexml(myfile, encoding='utf-8')

if __name__ == '__main__':
    xr = xmlwrite(r'd:\test.xml')
    xr.write_start_time('20140901')
    xr.write_end_time('20140901')
    xr.write_pass_count('22')
    xr.write_fail_count('33')
    xr.write_case()
    xr.write_case()
    xr.write_case_no(0, '0')
    xr.write_case_url(0, 'http://www.google.com')
    xr.write_case_url(0, 'http://www.google.com')
    xr.write_case_dbsql(0, 'select * from')
    xr.write_case_dbdata(0, 'dbtata')
    xr.write_case_apixpath(0, '/xpath')
    xr.write_case_apidata(0, 'apidata')
    xr.write_case_no(1, '1')
    xr.write_case_url(1, 'http://www.baidu.com')
    xr.write_case_url(1, 'http://www.baidu.com')
    xr.write_case_dbsql(1, 'select 1 from')
    xr.write_case_dbdata(1, 'dbtata1')
    xr.write_case_apixpath(1, '/xpath1')
    xr.write_case_apidata(1, 'apidata1')
    xr.save_xml()
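
To see what the underlying minidom calls produce without writing a file, the document can be built directly and serialized to a string. This is a minimal sketch assuming Python 3; it uses only the minidom calls that appear in the class above, plus toprettyxml() for display, and the variable names are illustrative.

```python
from xml.dom import minidom

# Create an empty document with <api> as the root element.
impl = minidom.getDOMImplementation()
dom = impl.createDocument(None, 'api', None)
root = dom.documentElement

case = dom.createElement('case')
root.appendChild(case)

# A CDATA section keeps SQL text free of entity escaping.
sql = dom.createElement('dbsql')
sql.appendChild(dom.createCDATASection('select * from'))
case.appendChild(sql)

# An ordinary text node is escaped as needed on output.
url = dom.createElement('url')
url.appendChild(dom.createTextNode('http://www.google.com'))
case.appendChild(url)

compact = dom.toxml()
print(dom.toprettyxml(indent='  '))
```

toxml() returns the compact form on one line, while toprettyxml() adds indentation and newlines for reading.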

The code above wraps minidom and supports writing nodes through a simple XPath-like path. It does not support matching on attributes in the path, but it does support matching by index.
For example, /root/child[1] refers to the second child node under root (indexes are zero-based, unlike standard XPath, where they start at 1).
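
The index-based lookup can also be sketched in isolation. The function below is a hypothetical standalone helper (find_by_path is not part of the original class) that resolves such a path against a minidom tree, assuming Python 3 and zero-based indexes that count only children with the same tag name.

```python
import re
from xml.dom import minidom


def find_by_path(root, path):
    """Resolve a path such as '/case[1]/no' relative to root.

    Returns None as soon as one step has no matching child.
    """
    node = root
    for step in path.strip('/').split('/'):
        if not step:
            continue
        match = re.fullmatch(r'([^\[\]]+)(?:\[(\d+)\])?', step)
        if match is None:
            return None
        name = match.group(1)
        index = int(match.group(2) or 0)  # no [n] suffix means the first match
        same_named = [child for child in node.childNodes
                      if child.nodeType == child.ELEMENT_NODE
                      and child.nodeName == name]
        if index >= len(same_named):
            return None
        node = same_named[index]
    return node


dom = minidom.parseString(
    '<api><case><no>0</no></case><case><no>1</no></case></api>')
second_no = find_by_path(dom.documentElement, '/case[1]/no')
print(second_no.firstChild.data)
```

Returning None for a missing step lets the caller decide whether to fall back to the root node or report an error.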
