Custom parsing of simple XML files in Python

Source: Internet
Author: User
This article describes a custom Python method for parsing simple XML files. It covers the techniques needed to extract values from XML that the standard parsers reject, and the example may serve as a reference for similar cases. The details are as follows:

The company's internal interface can return strings in two forms: a PHP array or XML. Python cannot use the PHP array directly, and the XML string is not standard (the names of some nodes start with digits), so the standard modules cannot parse it. I therefore wrote a simple parser of my own for use in interface testing.
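To illustrate why a standard parser rejects this kind of response, here is a minimal Python 3 sketch. The sample string and the node name `1st_item` are made up for illustration and do not come from the interface described above. The helper `try_parse` is likewise hypothetical. Only `xml.etree.ElementTree` from the standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment: the node name starts with a digit,
# which the XML specification does not allow.
bad_xml = '<resultObject><1st_item>aaaaa</1st_item></resultObject>'


def try_parse(text):
    """Return the parsed root element, or None if the text is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


print(try_parse(bad_xml))                      # the digit-led name is rejected
print(try_parse('<r><item>aaaaa</item></r>'))  # a conventional name parses
```

The first call returns `None` because the parser stops at the invalid name, which is the situation that motivates a hand-written parser.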

#!/usr/bin/env python
# encoding: utf-8
# Written for Python 2: it relies on print statements and str.decode().
import re


class xmlparse:

    def __init__(self, xmlstr):
        self.xmlstr = xmlstr
        self.xmldom = self.__convet2utf8()
        self.xmlnodelist = []
        self.xpath = ''

    def __convet2utf8(self):
        # Drop the XML declaration, then re-encode the body as UTF-8
        # if the declaration names a GBK or GB2312 encoding.
        headstr = self.__get_head()
        xmldomstr = self.xmlstr.replace(headstr, '')
        if 'gbk' in headstr:
            xmldomstr = xmldomstr.decode('gbk').encode('utf-8')
        elif 'gb2312' in headstr:
            xmldomstr = xmldomstr.decode('gb2312').encode('utf-8')
        return xmldomstr

    def __get_head(self):
        # Return the leading <?xml ... ?> declaration, or '' if absent.
        headpat = r'<\?xml.*\?>'
        headpatobj = re.compile(headpat)
        headregobj = headpatobj.match(self.xmlstr)
        if headregobj:
            return headregobj.group()
        return ''

    def parse(self, xpath):
        self.xpath = xpath
        xpatlist = []
        # Turn each path step into a (regex, index) pair:
        # 'name[1]' selects the second match, a bare name the first.
        for xnode in self.xpath.split('/'):
            if xnode:
                spcindex = xnode.find('[')
                if spcindex > -1:
                    index = int(xnode[spcindex + 1:-1])
                    xnode = xnode[:spcindex]
                else:
                    index = 0
                temppat = ('<%s>(.*?)</%s>' % (xnode, xnode), index)
                xpatlist.append(temppat)
        # Narrow the text one path step at a time.
        xmlnodestr = self.xmldom
        for xpat, index in xpatlist:
            xmlnodelist = re.findall(xpat, xmlnodestr)
            xmlnodestr = xmlnodelist[index]
            if xmlnodestr.startswith(r'<![CDATA['):
                # Strip the CDATA wrapper: the prefix and the trailing ']]>'.
                xmlnodestr = xmlnodestr.replace(r'<![CDATA[', '')[:-3]
        self.xmlnodelist = xmlnodelist
        return xmlnodestr


if '__main__' == __name__:
    # NOTE: reconstructed sample. The original string is incomplete here;
    # this value is an assumption chosen to match the results listed below.
    xmlstr = ('<?xml version="1.0" encoding="utf-8" standalone="yes" ?>'
              '<resultObject>'
              '<a><product_id>aaaaa</product_id>'
              '<product_name><![CDATA[bbbbb]]></product_name></a>'
              '<b><product_id>bbbbb</product_id>'
              '<product_name><![CDATA[bbbbb]]></product_name></b>'
              '</resultObject>')
    xpath1 = '/product_id'
    xpath2 = '/product_id[1]'
    xpath3 = '/a/product_id'
    xp = xmlparse(xmlstr)
    print 'xmlstr:', xp.xmlstr
    print 'xmldom:', xp.xmldom
    for xpath in (xpath1, xpath2, xpath3):
        print '------------------------------'
        getstr = xp.parse(xpath)
        print 'xpath:', xp.xpath
        print 'get list:', xp.xmlnodelist
        print 'get string:', getstr

Running result:

xmlstr: <?xml version="1.0" encoding="utf-8" standalone="yes" ?><resultObject><a><product_id>aaaaa</product_id><product_name><![CDATA[bbbbb]]></product_name></a><b><product_id>bbbbb</product_id><product_name><![CDATA[bbbbb]]></product_name></b></resultObject>
xmldom: <resultObject><a><product_id>aaaaa</product_id><product_name><![CDATA[bbbbb]]></product_name></a><b><product_id>bbbbb</product_id><product_name><![CDATA[bbbbb]]></product_name></b></resultObject>
------------------------------
xpath: /product_id
get list: ['aaaaa', 'bbbbb']
get string: aaaaa
------------------------------
xpath: /product_id[1]
get list: ['aaaaa', 'bbbbb']
get string: bbbbb
------------------------------
xpath: /a/product_id
get list: ['aaaaa']
get string: aaaaa
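The path handling in `parse` amounts to two steps: split the path into (node name, index) pairs, then narrow the text with one non-greedy regular expression per pair. The following is a Python 3 sketch of that idea, not the author's code. The function names `split_path` and `find_text` and the sample string are illustrative assumptions. The `re.S` flag is an addition so that node bodies may span several lines.

```python
import re


def split_path(xpath):
    """Turn '/a/product_id[1]' into [('a', 0), ('product_id', 1)]."""
    steps = []
    for part in xpath.split('/'):
        if not part:
            continue  # skip the empty piece before the leading '/'
        name, bracket, rest = part.partition('[')
        index = int(rest[:-1]) if bracket else 0
        steps.append((name, index))
    return steps


def find_text(xml, xpath):
    """Follow each path step with a non-greedy regex over the current text."""
    current = xml
    for name, index in split_path(xpath):
        pattern = '<%s>(.*?)</%s>' % (re.escape(name), re.escape(name))
        matches = re.findall(pattern, current, re.S)
        current = matches[index]  # raises IndexError if the node is missing
        if current.startswith('<![CDATA[') and current.endswith(']]>'):
            current = current[len('<![CDATA['):-3]
    return current


# Hypothetical sample, smaller than the one used in the article.
sample = ('<r><a><id>aaaaa</id><name><![CDATA[bbbbb]]></name></a>'
          '<id>ccccc</id></r>')
print(find_text(sample, '/id[1]'))
print(find_text(sample, '/a/name'))
```

As in the original class, a step without brackets means the first match, and a missing node is not handled gracefully; a caller would need to catch `IndexError`.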

Because the returned XML is relatively simple and has no nodes with attributes, it is easy to process. Testing still turned up a bug, however: when a node is nested inside another node of the same name, the regular expression matches the wrong span. The problem can be sidestepped by not using the nested node name in the xpath; a real fix would require a more complicated matching mechanism.
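The nesting problem can be reproduced in a few lines of Python 3. This is a minimal sketch: the helper name `find_nodes` and the sample document are assumptions made for illustration, and the pattern is the same non-greedy form used by the parser.

```python
import re


def find_nodes(name, text):
    """Return the bodies matched by the non-greedy node pattern."""
    return re.findall('<%s>(.*?)</%s>' % (name, name), text)


# Hypothetical input: an <item> nested inside another <item>.
doc = '<item><id>1</id><item><id>2</id></item></item>'

# The lazy match stops at the FIRST closing tag, so the outer node is cut short.
print(find_nodes('item', doc))  # ['<id>1</id><item><id>2</id>']

# A name that is never nested inside itself is matched correctly.
print(find_nodes('id', doc))    # ['1', '2']
```

The first result pairs the outer opening tag with the inner closing tag, which is why addressing only non-nested names in the path avoids the bug. Matching nested nodes correctly would require tracking tag depth, which a single regular expression of this form does not do.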

I hope this article will help you with Python programming.
