Python: Custom Parsing of Simple XML-Format Files

Source: Internet
Author: User

Our company's internal interface can return its data in two forms: a PHP array or an XML string. Python cannot use the PHP array directly, and the XML string is not well-formed (some element names start with digits), so the standard XML modules cannot parse it. I therefore wrote a simple parser of my own for interface testing.
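The following minimal Python 3 check illustrates the problem. It is a sketch, not part of the original script, and the tag name `1st_product` is a hypothetical stand-in for the interface's digit-prefixed nodes. The standard `xml.etree.ElementTree` parser rejects the document, while a plain regular expression still extracts the value.

```python
import re
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if ElementTree can parse xml_text, False on a ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: the second uses a tag name that begins with a digit,
# which XML's naming rules do not allow.
good = '<resultObject><product_id>aaaaa</product_id></resultObject>'
bad = '<resultObject><1st_product>aaaaa</1st_product></resultObject>'

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
# A regex ignores XML naming rules, so it still finds the value.
print(re.findall(r'<1st_product>(.*?)</1st_product>', bad))  # ['aaaaa']
```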

#!/usr/bin/env python
#encoding: utf-8
import re


class xmlparse:
    def __init__(self, xmlstr):
        self.xmlstr = xmlstr                # raw string returned by the interface
        self.xmldom = self.__convet2utf8()  # body without the <?xml ...?> header
        self.xmlnodelist = []               # all matches found at the last path step
        self.xpath = ''

    def __convet2utf8(self):
        # Strip the XML declaration; re-encode GBK/GB2312 bodies as UTF-8.
        headstr = self.__get_head()
        xmldomstr = self.xmlstr.replace(headstr, '')
        headlower = headstr.lower()
        if 'gbk' in headlower:
            xmldomstr = xmldomstr.decode('gbk').encode('utf-8')
        elif 'gb2312' in headlower:
            # Convert the header-free body, not the full original string.
            xmldomstr = xmldomstr.decode('gb2312').encode('utf-8')
        return xmldomstr

    def __get_head(self):
        # Return the leading <?xml ...?> declaration, or '' if there is none.
        headpatobj = re.compile(r'<\?xml.*?\?>')
        headregobj = headpatobj.match(self.xmlstr)
        if headregobj:
            return headregobj.group()
        return ''

    def parse(self, xpath):
        # Turn '/a/b[1]' into [('<a>(.*?)</a>', 0), ('<b>(.*?)</b>', 1)].
        self.xpath = xpath
        xpatlist = []
        for xnode in self.xpath.split('/'):
            if xnode:
                spcindex = xnode.find('[')
                if spcindex > -1:
                    index = int(xnode[spcindex+1:-1])
                    xnode = xnode[:spcindex]
                else:
                    index = 0
                xpatlist.append(('<%s>(.*?)</%s>' % (xnode, xnode), index))

        # Narrow the string one path step at a time.
        xmlnodestr = self.xmldom
        xmlnodelist = []
        for xpat, index in xpatlist:
            # re.S lets a match span line breaks in pretty-printed responses.
            xmlnodelist = re.findall(xpat, xmlnodestr, re.S)
            xmlnodestr = xmlnodelist[index]
            if xmlnodestr.startswith('<![CDATA['):
                # Drop the '<![CDATA[' prefix and the trailing ']]>'.
                xmlnodestr = xmlnodestr.replace('<![CDATA[', '')[:-3]
        self.xmlnodelist = xmlnodelist
        return xmlnodestr


if '__main__' == __name__:
    xmlstr = '<?xml version="1.0" encoding="utf-8" standalone="yes" ?><resultObject><a><product_id>aaaaa</product_id><product_name><![CDATA[bbbbb]]></a><b><product_id>bbbbb</product_id><product_name><![CDATA[bbbbb]]></b></product_name></resultObject>'
    xpath1 = '/product_id'
    xpath2 = '/product_id[1]'
    xpath3 = '/a/product_id'
    xp = xmlparse(xmlstr)
    print 'xmlstr:', xp.xmlstr
    print 'xmldom:', xp.xmldom
    for xpath in (xpath1, xpath2, xpath3):
        print '------------------------------'
        getstr = xp.parse(xpath)
        print 'xpath:', xp.xpath
        print 'get list:', xp.xmlnodelist
        print 'get string:', getstr
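The `__convet2utf8` method relies on Python 2's `str.decode`. In Python 3 the response would arrive as `bytes`, and one hedged equivalent is to read the encoding name from the declaration and decode once. This is a sketch, not the author's code: `decode_xml_bytes` is a hypothetical name, and it assumes the declaration itself is ASCII-compatible, which holds for utf-8, gbk and gb2312.

```python
import re

# Leading <?xml ...?> declaration and its encoding="..." attribute (bytes patterns).
DECL = re.compile(rb'\s*<\?xml.*?\?>', re.S)
ENC = re.compile(rb'encoding\s*=\s*[\'"]([\w.-]+)[\'"]')


def decode_xml_bytes(raw, default='utf-8'):
    """Decode raw bytes with the declared encoding and drop the declaration.

    Falls back to `default` when there is no declaration or no encoding attribute.
    """
    encoding = default
    body = raw
    decl = DECL.match(raw)
    if decl:
        enc = ENC.search(decl.group())
        if enc:
            encoding = enc.group(1).decode('ascii')
        body = raw[decl.end():]
    return body.decode(encoding)


# Hypothetical GBK-encoded response.
raw = '<?xml version="1.0" encoding="gbk"?><r>\u4e2d\u6587</r>'.encode('gbk')
print(decode_xml_bytes(raw))
```

Reading the declared name avoids hard-coding a branch per encoding; Python's codec lookup is case-insensitive, so `GBK` and `gbk` behave the same.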

Output:

xmlstr: <?xml version="1.0" encoding="utf-8" standalone="yes" ?><resultObject><a><product_id>aaaaa</product_id><product_name><![CDATA[bbbbb]]></a><b><product_id>bbbbb</product_id><product_name><![CDATA[bbbbb]]></b></product_name></resultObject>
xmldom: <resultObject><a><product_id>aaaaa</product_id><product_name><![CDATA[bbbbb]]></a><b><product_id>bbbbb</product_id><product_name><![CDATA[bbbbb]]></b></product_name></resultObject>
------------------------------
xpath: /product_id
get list: ['aaaaa', 'bbbbb']
get string: aaaaa
------------------------------
xpath: /product_id[1]
get list: ['aaaaa', 'bbbbb']
get string: bbbbb
------------------------------
xpath: /a/product_id
get list: ['aaaaa']
get string: aaaaa
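The script above targets Python 2 (print statements, `str.decode`). The same regex-walking idea can be sketched in Python 3 as a standalone function. This is an illustrative port, not the author's code: `parse_simple_xpath` is a hypothetical name, and like the original it assumes attribute-free elements and no element nested inside another element of the same name. It only strips a CDATA wrapper when both delimiters are present, rather than blindly slicing off three characters.

```python
import re

CDATA_OPEN, CDATA_CLOSE = '<![CDATA[', ']]>'


def parse_simple_xpath(xml_text, xpath):
    """Follow a '/tag/tag[i]' path through xml_text with non-greedy regexes.

    Returns (matches, value): every raw match at the last step and the selected one.
    """
    # Remove a leading <?xml ...?> declaration, if present.
    current = re.sub(r'^\s*<\?xml.*?\?>', '', xml_text, count=1, flags=re.S)
    matches = []
    for step in filter(None, xpath.split('/')):
        # 'product_id[1]' -> ('product_id', 1); a bare name means index 0.
        m = re.fullmatch(r'([^\[\]]+)(?:\[(\d+)\])?', step)
        if m is None:
            raise ValueError('unsupported path step: %r' % step)
        tag, index = m.group(1), int(m.group(2) or 0)
        pattern = '<{0}>(.*?)</{0}>'.format(re.escape(tag))
        matches = re.findall(pattern, current, flags=re.S)
        current = matches[index]  # raises IndexError when nothing matched
        # Unwrap a CDATA section only when both delimiters are present.
        if current.startswith(CDATA_OPEN) and current.endswith(CDATA_CLOSE):
            current = current[len(CDATA_OPEN):-len(CDATA_CLOSE)]
    return matches, current


sample = ('<?xml version="1.0" encoding="utf-8" standalone="yes" ?>'
          '<resultObject><a><product_id>aaaaa</product_id>'
          '<product_name><![CDATA[bbbbb]]></a><b><product_id>bbbbb</product_id>'
          '<product_name><![CDATA[bbbbb]]></b></product_name></resultObject>')
for path in ('/product_id', '/product_id[1]', '/a/product_id'):
    print(path, parse_simple_xpath(sample, path))
```

For the three paths in the sample, this should yield the same lists and strings as the Python 2 run.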

Because the returned XML is fairly simple and no element has attributes, this regex approach is sufficient. Testing did uncover one bug, however: when an element contains another element with the same name, the non-greedy regular expression stops at the inner closing tag and returns the wrong text. The problem can be avoided by keeping such nested, same-name elements out of the XPath; handling them properly would require a considerably more complicated rewrite.
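The sketch below reproduces the nesting bug and shows one way such a rewrite might start. It is a hypothetical Python 3 helper, not the author's fix: `find_balanced` counts opening and closing tags of the same name instead of relying on a non-greedy regex, and it assumes attribute-free tags as in the article's data.

```python
import re


def find_balanced(text, tag):
    """Return the contents of every top-level <tag>...</tag>, honoring nesting."""
    results = []
    depth = 0
    start = None
    for m in re.finditer(r'<(/?){0}>'.format(re.escape(tag)), text):
        if m.group(1) == '':      # opening tag
            if depth == 0:
                start = m.end()   # contents begin after the outermost <tag>
            depth += 1
        elif depth > 0:           # closing tag that has a matching opener
            depth -= 1
            if depth == 0:
                results.append(text[start:m.start()])
    return results


nested = '<item><item>inner</item>tail</item>'
print(re.findall(r'<item>(.*?)</item>', nested))  # ['<item>inner']
print(find_balanced(nested, 'item'))             # ['<item>inner</item>tail']
```

Stray closing tags with no opener are ignored, which matches the tolerant spirit of the original parser.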
