This article shows how to write a custom parser in Python for simple XML-format files. It covers techniques for parsing XML in Python and should be of practical value; readers who need it can use it as a reference.
Our company's internal interface returns strings in two forms: PHP arrays and XML. Python cannot use a PHP array directly, and the XML strings are not standard, so the standard modules cannot parse them either. The non-standard part is that some node names begin with a digit. I therefore wrote a simple script to parse these strings, for use in interface testing.
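To see why a standard module rejects such strings, here is a minimal Python 3 sketch using xml.etree.ElementTree. The sample string and the node name `1st_item` are made up for illustration; they are not the interface's actual output, and `can_parse` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the node name "1st_item" begins with a digit,
# which the XML specification does not allow.
bad_xml = '<resultObject><1st_item>aaaaa</1st_item></resultObject>'
good_xml = '<resultObject><item>aaaaa</item></resultObject>'


def can_parse(xml_text):
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(can_parse(bad_xml))   # the digit-led name is rejected
print(can_parse(good_xml))  # the ordinary name is accepted
```

The parser stops at the first invalid name, so there is no partial tree to work with, which is the reason for falling back to string matching.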
#!/usr/bin/env python
# encoding: utf-8
# Written for Python 2 (print statement, str.decode).
import re


class xmlparse:

    def __init__(self, xmlstr):
        self.xmlstr = xmlstr
        self.xmldom = self.__convet2utf8()
        self.xmlnodelist = []
        self.xpath = ''

    def __convet2utf8(self):
        # Strip the XML declaration and convert GBK/GB2312 input to UTF-8.
        headstr = self.__get_head()
        xmldomstr = self.xmlstr.replace(headstr, '')
        if 'gbk' in headstr:
            xmldomstr = xmldomstr.decode('gbk').encode('utf-8')
        elif 'gb2312' in headstr:
            xmldomstr = xmldomstr.decode('gb2312').encode('utf-8')
        return xmldomstr

    def __get_head(self):
        # Return the leading <?xml ...?> declaration, or '' if there is none.
        headpat = r'<\?xml.*?>'
        headpatobj = re.compile(headpat)
        headregobj = headpatobj.match(self.xmlstr)
        if headregobj:
            headstr = headregobj.group()
            return headstr
        else:
            return ''

    def parse(self, xpath):
        self.xpath = xpath
        # Build one (regex, index) pair per path step, e.g. 'product_id[1]'.
        xpatlist = []
        xpatharr = self.xpath.split('/')
        for xnode in xpatharr:
            if xnode:
                spcindex = xnode.find('[')
                if spcindex > -1:
                    index = int(xnode[spcindex+1:-1])
                    xnode = xnode[:spcindex]
                else:
                    index = 0
                temppat = ('<%s>(.*?)</%s>' % (xnode, xnode), index)
                xpatlist.append(temppat)
        # Apply the patterns in order, narrowing the string at each step.
        xmlnodestr = self.xmldom
        xmlnodelist = []
        for xpat, index in xpatlist:
            xmlnodelist = re.findall(xpat, xmlnodestr)
            xmlnodestr = xmlnodelist[index]
            # Unwrap a CDATA section: drop '<![CDATA[' and the trailing ']]>'.
            if xmlnodestr.startswith(r'<![CDATA['):
                xmlnodestr = xmlnodestr.replace(r'<![CDATA[', '')[:-3]
        self.xmlnodelist = xmlnodelist
        return xmlnodestr


if '__main__' == __name__:
    xmlstr = ('<?xml version="1.0" encoding="utf-8" standalone="yes" ?>'
              '<resultObject>'
              '<a><product_id>aaaaa</product_id>'
              '<product_name><![CDATA[bbbbb]]></product_name></a>'
              '<b><product_id>bbbbb</product_id>'
              '<product_name><![CDATA[bbbbb]]></product_name></b>'
              '</resultObject>')
    xpath1 = '/product_id'
    xpath2 = '/product_id[1]'
    xpath3 = '/a/product_id'
    xp = xmlparse(xmlstr)
    print 'xmlstr:', xp.xmlstr
    print 'xmldom:', xp.xmldom
    print '------------------------------'
    getstr = xp.parse(xpath1)
    print 'xpath:', xp.xpath
    print 'get list:', xp.xmlnodelist
    print 'get string:', getstr
    print '------------------------------'
    getstr = xp.parse(xpath2)
    print 'xpath:', xp.xpath
    print 'get list:', xp.xmlnodelist
    print 'get string:', getstr
    print '------------------------------'
    getstr = xp.parse(xpath3)
    print 'xpath:', xp.xpath
    print 'get list:', xp.xmlnodelist
    print 'get string:', getstr
Run Result:
xmlstr: <?xml version="1.0" encoding="utf-8" standalone="yes" ?><resultObject><a><product_id>aaaaa</product_id><product_name><![CDATA[bbbbb]]></product_name></a><b><product_id>bbbbb</product_id><product_name><![CDATA[bbbbb]]></product_name></b></resultObject>
xmldom: <resultObject><a><product_id>aaaaa</product_id><product_name><![CDATA[bbbbb]]></product_name></a><b><product_id>bbbbb</product_id><product_name><![CDATA[bbbbb]]></product_name></b></resultObject>
------------------------------
xpath: /product_id
get list: ['aaaaa', 'bbbbb']
get string: aaaaa
------------------------------
xpath: /product_id[1]
get list: ['aaaaa', 'bbbbb']
get string: bbbbb
------------------------------
xpath: /a/product_id
get list: ['aaaaa']
get string: aaaaa
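The path handling inside parse can also be tried in isolation. The following is a Python 3 sketch rather than the article's listing; the helper names `split_xpath` and `extract` and the short sample string are hypothetical, chosen only to show how a path such as /a/product_id[1] becomes a sequence of (node name, index) steps, each applied as one regular expression.

```python
import re


def split_xpath(xpath):
    """Turn '/a/product_id[1]' into [('a', 0), ('product_id', 1)]."""
    steps = []
    for part in xpath.split('/'):
        if not part:
            continue  # skip the empty piece before the leading '/'
        bracket = part.find('[')
        if bracket > -1:
            index = int(part[bracket + 1:-1])  # the number inside [...]
            name = part[:bracket]
        else:
            index, name = 0, part  # no index given: take the first match
        steps.append((name, index))
    return steps


def extract(xml_text, xpath):
    """Walk the steps, narrowing the text with one regex per step."""
    node_text = xml_text
    for name, index in split_xpath(xpath):
        tag = re.escape(name)
        pattern = '<%s>(.*?)</%s>' % (tag, tag)
        node_text = re.findall(pattern, node_text)[index]
    return node_text


# Hypothetical sample with two sibling nodes that share a child name.
sample = '<r><a><id>aaaaa</id></a><b><id>bbbbb</id></b></r>'
print(split_xpath('/a/id'))
print(extract(sample, '/id[1]'))
print(extract(sample, '/a/id'))
```

Note that a step matches anywhere in the remaining text, not only among direct children, which is why /product_id finds both product_id nodes in the run above.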
Because the XML returned is fairly simple and contains no nodes with attributes, it is easy to handle. Testing still turned up a bug, though: when a node is nested inside another node of the same name, the regular expression matches the wrong span. This can be worked around by avoiding nested nodes in the XPath; otherwise the parser has to be rewritten with a more complex mechanism.
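The nesting problem is easy to reproduce. The Python 3 sketch below uses a made-up input string; the second half shows one possible depth-counting alternative, `find_balanced`, which is a hypothetical helper and not something the original script contains.

```python
import re

# Hypothetical input: an <a> node nested inside another <a> node.
nested = '<a><a>inner</a>outer</a>'

# The non-greedy pattern stops at the first closing tag it meets,
# so the outer node is cut short.
matches = re.findall(r'<a>(.*?)</a>', nested)
print(matches)


def find_balanced(text, name):
    """Return the contents of each outermost <name>...</name> by counting depth."""
    open_tag, close_tag = '<%s>' % name, '</%s>' % name
    token = re.compile('%s|%s' % (re.escape(open_tag), re.escape(close_tag)))
    results, depth, start = [], 0, 0
    for m in token.finditer(text):
        if m.group() == open_tag:
            if depth == 0:
                start = m.end()  # content begins after the outermost open tag
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                results.append(text[start:m.start()])
    return results


print(find_balanced(nested, 'a'))
```

Like the original, this sketch assumes tags without attributes; supporting attributes or self-closing tags would need a more general tokenizer.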
I hope this article will help you with your Python programming.