This article presents a custom Python method for parsing files in a simple XML format. It is shared here for your reference. The details are as follows:
The company's internal interface returns strings in two forms: PHP arrays and XML. Python cannot use a PHP array directly, and the XML strings are not standard, so the standard modules cannot parse them. (The nonstandard part is that some node names start with a digit.) I therefore wrote a simple script to parse the data and use it for interface testing.
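To illustrate why a standard module cannot be used here, the following minimal sketch feeds two small documents to xml.etree.ElementTree. The helper name try_parse and the element names id1 and 1id are hypothetical, chosen only for the demonstration; the interface's real node names are not given in this article.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return the root tag if the standard parser accepts the text, else None."""
    try:
        return ET.fromstring(xml_text).tag
    except ET.ParseError:
        # XML names may not begin with a digit, so the parser rejects the input.
        return None


ok = try_parse('<item><id1>42</id1></item>')   # hypothetical valid name
bad = try_parse('<item><1id>42</1id></item>')  # hypothetical name starting with a digit
print(ok)   # item
print(bad)  # None
```

The second document fails because a name that starts with a digit is not well-formed XML, which is the situation described above.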
#!/usr/bin/env python
# encoding: utf-8
# Written for Python 2 (print statements, str.decode).
import re


class xmlparse:

    def __init__(self, xmlstr):
        self.xmlstr = xmlstr
        self.xmldom = self.__convet2utf8()
        self.xmlnodelist = []
        self.xpath = ''

    def __convet2utf8(self):
        # Strip the XML declaration and normalize GBK/GB2312 input to UTF-8.
        headstr = self.__get_head()
        xmldomstr = self.xmlstr.replace(headstr, '')
        if 'gbk' in headstr:
            xmldomstr = xmldomstr.decode('gbk').encode('utf-8')
        elif 'gb2312' in headstr:
            # Decode the header-stripped text, as in the GBK branch.
            xmldomstr = xmldomstr.decode('gb2312').encode('utf-8')
        return xmldomstr

    def __get_head(self):
        # Return the leading <?xml ... ?> declaration, or '' if there is none.
        headpat = r'<\?xml.*\?>'
        headpatobj = re.compile(headpat)
        headregobj = headpatobj.match(self.xmlstr)
        if headregobj:
            return headregobj.group()
        else:
            return ''

    def parse(self, xpath):
        self.xpath = xpath
        xpatlist = []
        xpatharr = self.xpath.split('/')
        # Turn each path step into a (regex, index) pair; "name[n]" selects match n.
        for xnode in xpatharr:
            if xnode:
                spcindex = xnode.find('[')
                if spcindex > -1:
                    index = int(xnode[spcindex + 1:-1])
                    xnode = xnode[:spcindex]
                else:
                    index = 0
                temppat = ('<%s>(.*?)</%s>' % (xnode, xnode), index)
                xpatlist.append(temppat)
        # Narrow the text step by step, keeping the selected match each time.
        xmlnodestr = self.xmldom
        for xpat, index in xpatlist:
            xmlnodelist = re.findall(xpat, xmlnodestr)
            xmlnodestr = xmlnodelist[index]
            if xmlnodestr.startswith(r'<![CDATA['):
                # Drop the CDATA wrapper: the prefix and the trailing "]]>".
                xmlnodestr = xmlnodestr.replace(r'<![CDATA[', '')[:-3]
        self.xmlnodelist = xmlnodelist
        return xmlnodestr


if '__main__' == __name__:
    # Sample input. The wrapper element names <a> and <b> are assumptions
    # inferred from the XPath examples and the output.
    xmlstr = ('<?xml version="1.0" encoding="utf-8" standalone="yes" ?>'
              '<resultObject>'
              '<a><product_id>aaaaa</product_id>'
              '<product_name><![CDATA[bbbbb]]></product_name></a>'
              '<b><product_id>bbbbb</product_id>'
              '<product_name>bbbbb</product_name></b>'
              '</resultObject>')
    xpath1 = '/product_id'
    xpath2 = '/product_id[1]'
    xpath3 = '/a/product_id'
    xp = xmlparse(xmlstr)
    print 'xmlstr:', xp.xmlstr
    print 'xmldom:', xp.xmldom
    for xpath in (xpath1, xpath2, xpath3):
        print '------------------------------'
        getstr = xp.parse(xpath)
        print 'xpath:', xp.xpath
        print 'get list:', xp.xmlnodelist
        print 'get string:', getstr
Output:
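xmlstr: <?xml version="1.0" encoding="utf-8" standalone="yes" ?><resultObject><a><product_id>aaaaa</product_id><product_name><![CDATA[bbbbb]]></product_name></a><b><product_id>bbbbb</product_id><product_name>bbbbb</product_name></b></resultObject>
xmldom: <resultObject><a><product_id>aaaaa</product_id><product_name><![CDATA[bbbbb]]></product_name></a><b><product_id>bbbbb</product_id><product_name>bbbbb</product_name></b></resultObject>
------------------------------
xpath: /product_id
get list: ['aaaaa', 'bbbbb']
get string: aaaaa
------------------------------
xpath: /product_id[1]
get list: ['aaaaa', 'bbbbb']
get string: bbbbb
------------------------------
xpath: /a/product_id
get list: ['aaaaa']
get string: aaaaa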
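For readers on Python 3, the same path-walking idea can be sketched as a single function. This is a minimal sketch rather than the author's implementation: the function name parse_path and the sample data are hypothetical, re.escape and re.DOTALL are additions that the original script does not use, and encoding conversion is left out.

```python
import re

CDATA_OPEN = '<![CDATA['
CDATA_CLOSE = ']]>'


def parse_path(xml_text, path):
    """Walk a '/'-separated path, narrowing the text with one regex per step.

    Returns (selected_text, matches_of_last_step).
    """
    current = xml_text
    matches = []
    for step in path.split('/'):
        if not step:
            continue  # skip the empty piece produced by the leading '/'
        name, index = step, 0
        if step.endswith(']') and '[' in step:
            # "name[n]" selects the n-th match (zero-based)
            name, _, idx = step[:-1].partition('[')
            index = int(idx)
        pattern = '<%s>(.*?)</%s>' % (re.escape(name), re.escape(name))
        # DOTALL (an addition here) lets node content span several lines
        matches = re.findall(pattern, current, re.DOTALL)
        current = matches[index]
        if current.startswith(CDATA_OPEN) and current.endswith(CDATA_CLOSE):
            current = current[len(CDATA_OPEN):-len(CDATA_CLOSE)]
    return current, matches


# Hypothetical sample with node names that start with a digit
sample = '<r><a><1id>x</1id><n><![CDATA[hi]]></n></a><b><1id>y</1id></b></r>'
print(parse_path(sample, '/1id[1]'))  # ('y', ['x', 'y'])
print(parse_path(sample, '/a/n'))     # ('hi', ['<![CDATA[hi]]>'])
```

As in the script above, a missing node raises IndexError, because the list of matches is empty.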
Because the returned XML is relatively simple and has no attributes, it is easy to process. Testing still turned up a bug, however: when a node is nested inside another node of the same name, the non-greedy regular expression stops at the first closing tag, so the match is cut short. The problem can be avoided by not using the name of a nested node in the XPath; otherwise the matching mechanism has to be rewritten in a more complex form.
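The nested-node bug described above can be reproduced with the same kind of pattern the script builds. The helper name find_nodes and the two sample strings are hypothetical and serve only to show the effect.

```python
import re


def find_nodes(tag, text):
    """Apply the non-greedy <tag>(.*?)</tag> pattern and return all matches."""
    pattern = '<%s>(.*?)</%s>' % (tag, tag)
    return re.findall(pattern, text)


nested = '<a><a>inner</a>outer</a>'  # <a> nested inside <a>
flat = '<a><b>inner</b>outer</a>'    # different inner name

print(find_nodes('a', nested))  # ['<a>inner']
print(find_nodes('a', flat))    # ['<b>inner</b>outer']
```

In the nested case the match ends at the inner closing tag, so the outer node's content is truncated. A regular expression of this form cannot pair opening and closing tags by depth, which is why a full fix needs a different mechanism.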
Hopefully this article will help you with Python programming.