Python custom method for parsing simple XML format files

Source: Internet
Author: User
Tags cdata
The example in this article describes a Python custom method for parsing a simple XML format file. Share to everyone for your reference. The specific analysis is as follows:

Because the company's internal interface returns a string that supports 2 forms: PHP array, XML, PHP array Python cannot be used directly, and XML string format is not standard, so it cannot be parsed with standard modules. "The nonstandard place is that some nodes will have names that start with numbers," so write a simple step to parse the file and use it for the interface test.

#!/usr/bin/env python#encoding:utf-8import Reclass xmlparse:def __init__ (self, xmlstr): Self.xmlstr = Xmlstr Self . XMLDOM = Self.__convet2utf8 () self.xmlnodelist = [] Self.xpath = "def __convet2utf8 (self): Headstr = Self.__g Et_head () Xmldomstr = Self.xmlstr.replace (Headstr, ') if ' gbk ' in headstr:xmldomstr = Xmldomstr.decode (' GBK ' ). Encode (' utf-8 ') elif ' gb2312 ' in headstr:xmldomstr = Self.xmlstr.decode (' gb2312 '). Encode (' Utf-8 ') return XML Domstr def __get_head (self): Headpat = R ' <\?xml.*\?> ' Headpatobj = Re.compile (headpat) headregobj = HEADPA Tobj.match (self.xmlstr) If headregobj:headstr = Headregobj.group () return headstr Else:return ' d EF Parse (self, xpath): Self.xpath = XPath xpatlist = [] Xpatharr = Self.xpath.split ('/') for XNode in Xpatharr          : If Xnode:spcindex = Xnode.find (' [') if spcindex > -1:index = Int (xnode[spcindex+1:-1]) XNode = xnode[:Spcindex] Else:index = 0; Temppat = (' <%s> (. *?)
 '% (XNode, xnode), index) xpatlist.append (temppat) xmlnodestr = Self.xmldom for xpat,index in xpatlist: XmlNodeList = Re.findall (xpat,xmlnodestr) xmlnodestr = Xmlnodelist[index] If Xmlnodestr.startswith (R '<![CDATA['):        xmlnodestr = xmlnodestr.replace(r'<![CDATA[','')[:-3]    self.xmlnodelist = xmlnodelist    return xmlnodestrif '__main__' == __name__:  xmlstr = '<?xml version="1.0" encoding="utf-8" standalone="yes" ?> <resultObject> <product_id></product_id> aaaaa <product_name> <![CDATA[bbbbb]]>   bbbbb bbbbb  ' xpath1 = '/product_id ' xpath2 = '/product_id[1] ' xpath3 = '/a/product_id ' XP = Xmlparse (xmlstr) print ' Xmlstr: ', XP. Xmlstr print ' xmldom: ', xp.xmldom print '------------------------------' getstr = Xp.parse (xpath1) print ' XPath: ', XP.XP Ath print ' Get list: ', xp.xmlnodelist print ' Get string: ', getstr print '------------------------------' getstr = XP.PA RSE (xpath2) print ' XPath: ', xp.xpath print ' Get list: ', xp.xmlnodelist print ' Get string: ', getstr print '-------------- ----------------' getstr = Xp.parse (xpath3) print ' XPath: ', xp.xpath print ' Get list: ', xp.xmlnodelist print ' Get string : ', Getstr

Operation Result:

XMLSTR: <?xml version= "1.0" encoding= "Utf-8" standalone= "yes"?>
 
  
  
   
    
   aaaaa
  
   
 
    bbbbb
   
    
    
     
      
     bbbbb
    
     
    
      bbbbb
    
      
  
    
   XMLDOM: 
 
  
  
   
    
   aaaaa
  
   
  
    bbbbb
   
    
    
     
      
     bbbbb
    
     ------------------------------XPath:/product_
    
      bbbbb
    
      
  
   
 
   Idget list: [' aaaaa ', ' bbbbb ']get string:aaaaa------------------------------XPath:/product_id[1] Get list: [' aaaaa ', ' BBBBB ']get string:bbbbb------------------------------XPath:/a/product_idget list: [' AAAAA ']get string:aaaaa

Because the returned XML format is relatively simple and has no attributes, it is easier to process. But the test still found a bug. That is, when a regular match problem occurs when the same node is nested, the problem can be resolved by avoiding the name of the nested node in the XPath, or only by rewriting the complex mechanism.

Hopefully this article will help you with Python programming.

  • Contact Us

    The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

    If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.