Python custom method for parsing simple XML format files

Last Update:2016-06-06 Source: Internet

Author: User



This article describes a custom Python method for parsing a simple XML file. It is shared here for reference. The details are as follows:

Our company's internal interface returns a string in one of two forms: a PHP array or XML. Python cannot use a PHP array directly, and the XML string is not standard (some node names start with a digit), so the standard modules cannot parse it. I therefore wrote a simple parser of my own and use it for interface testing.
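To make the problem concrete, here is a minimal sketch of what happens when a node name starts with a digit. The sample string and the node name `1abc` are invented for illustration, since the interface's real output is not shown in this article; `standard_parse_ok` and `regex_value` are hypothetical helper names.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: <1abc> is not a legal XML name because it starts with a digit.
sample = '<resultObject><1abc>value</1abc></resultObject>'


def standard_parse_ok(text):
    """Return True if ElementTree accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def regex_value(name, text):
    """Return the text of the first <name>...</name> pair, or None if absent."""
    found = re.findall('<%s>(.*?)</%s>' % (name, name), text)
    return found[0] if found else None


print(standard_parse_ok(sample))    # the standard parser rejects the document
print(regex_value('1abc', sample))  # a plain regular expression still finds the text
```

A regular expression does not check whether a name is legal XML, which is why the parser below is built on `re` rather than on an XML library.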

    #!/usr/bin/env python3
    # encoding: utf-8
    import re


    class xmlparse:
        def __init__(self, xmlstr):
            self.xmlstr = xmlstr
            self.xmldom = self.__convet2utf8()
            self.xmlnodelist = []
            self.xpath = ''

        def __convet2utf8(self):
            # Return the document body as text, without the XML declaration.
            # bytes input is decoded with the encoding named in the declaration
            # (GBK or GB2312), falling back to UTF-8.
            headstr = self.__get_head()
            xmldomstr = self.xmlstr
            if isinstance(xmldomstr, bytes):
                if 'gbk' in headstr.lower():
                    xmldomstr = xmldomstr.decode('gbk')
                elif 'gb2312' in headstr.lower():
                    xmldomstr = xmldomstr.decode('gb2312')
                else:
                    xmldomstr = xmldomstr.decode('utf-8')
            return xmldomstr.replace(headstr, '')

        def __get_head(self):
            # Return the <?xml ...?> declaration as text, or '' if there is none.
            xmlstr = self.xmlstr
            if isinstance(xmlstr, bytes):
                # The declaration itself is ASCII, so latin-1 reads it safely.
                xmlstr = xmlstr.decode('latin-1')
            headregobj = re.match(r'<\?xml.*?\?>', xmlstr)
            if headregobj:
                return headregobj.group()
            return ''

        def parse(self, xpath):
            # Each path step becomes a (pattern, index) pair; 'name[1]' selects
            # the second match, and a step without an index selects the first.
            self.xpath = xpath
            xpatlist = []
            for xnode in self.xpath.split('/'):
                if xnode:
                    spcindex = xnode.find('[')
                    if spcindex > -1:
                        index = int(xnode[spcindex + 1:-1])
                        xnode = xnode[:spcindex]
                    else:
                        index = 0
                    temppat = ('<%s>(.*?)</%s>' % (xnode, xnode), index)
                    xpatlist.append(temppat)
            # Apply the steps in order, narrowing the text to the chosen match.
            xmlnodestr = self.xmldom
            xmlnodelist = []
            for xpat, index in xpatlist:
                xmlnodelist = re.findall(xpat, xmlnodestr, re.S)
                xmlnodestr = xmlnodelist[index]
                if xmlnodestr.startswith('<![CDATA['):
                    # Drop the CDATA wrapper: the prefix, then the trailing ']]>'.
                    xmlnodestr = xmlnodestr.replace('<![CDATA[', '')[:-3]
            self.xmlnodelist = xmlnodelist
            return xmlnodestr


    if __name__ == '__main__':
        # The tags of this sample are reconstructed from the XPaths and the
        # surviving text; <b> is an assumed name for the second group.
        xmlstr = ('<?xml version="1.0" encoding="utf-8" standalone="yes" ?>'
                  '<resultObject>'
                  '<a><product_id>aaaaa</product_id>'
                  '<product_name><![CDATA[bbbbb]]></product_name></a>'
                  '<b><product_id>bbbbb</product_id>'
                  '<product_name>bbbbb</product_name></b>'
                  '</resultObject>')
        xpath1 = '/product_id'
        xpath2 = '/product_id[1]'
        xpath3 = '/a/product_id'
        xp = xmlparse(xmlstr)
        print('xmlstr:', xp.xmlstr)
        print('xmldom:', xp.xmldom)
        for xpath in (xpath1, xpath2, xpath3):
            print('------------------------------')
            getstr = xp.parse(xpath)
            print('xpath:', xp.xpath)
            print('get list:', xp.xmlnodelist)
            print('get string:', getstr)

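Output (the first two lines echo the sample document; its tags are reconstructed from the XPath expressions and the surviving text, and `<b>` is an assumed name):

    xmlstr: <?xml version="1.0" encoding="utf-8" standalone="yes" ?><resultObject><a><product_id>aaaaa</product_id><product_name><![CDATA[bbbbb]]></product_name></a><b><product_id>bbbbb</product_id><product_name>bbbbb</product_name></b></resultObject>
    xmldom: <resultObject><a><product_id>aaaaa</product_id><product_name><![CDATA[bbbbb]]></product_name></a><b><product_id>bbbbb</product_id><product_name>bbbbb</product_name></b></resultObject>
    ------------------------------
    xpath: /product_id
    get list: ['aaaaa', 'bbbbb']
    get string: aaaaa
    ------------------------------
    xpath: /product_id[1]
    get list: ['aaaaa', 'bbbbb']
    get string: bbbbb
    ------------------------------
    xpath: /a/product_id
    get list: ['aaaaa']
    get string: aaaaa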
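The demo never looks up a node whose value is wrapped in CDATA, so the sketch below isolates the two small steps that `parse` relies on and runs them on a `product_name` lookup. The helper names `split_step`, `strip_cdata` and `lookup` are hypothetical, and the one-node document is a cut-down sample written for this example.

```python
import re


def split_step(step):
    """Split a path step such as 'product_id[1]' into ('product_id', 1)."""
    pos = step.find('[')
    if pos > -1:
        return step[:pos], int(step[pos + 1:-1])
    return step, 0


def strip_cdata(value):
    """Remove a surrounding <![CDATA[ ... ]]> wrapper if there is one."""
    if value.startswith('<![CDATA[') and value.endswith(']]>'):
        return value[len('<![CDATA['):-len(']]>')]
    return value


def lookup(path, text):
    """Walk the path one step at a time, narrowing text to the chosen match."""
    for step in path.split('/'):
        if not step:
            continue
        name, index = split_step(step)
        matches = re.findall('<%s>(.*?)</%s>' % (name, name), text, re.S)
        text = strip_cdata(matches[index])
    return text


doc = '<a><product_name><![CDATA[bbbbb]]></product_name></a>'
print(split_step('product_id[1]'))
print(lookup('/a/product_name', doc))
```

Here `split_step('product_id[1]')` gives `('product_id', 1)`, and the lookup returns `bbbbb` with the CDATA wrapper removed.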

Because the returned XML is relatively simple and has no attributes, it is easy to process. Testing still turned up a bug, though: when a node is nested inside another node of the same name, the regular expression pairs the outer opening tag with the first closing tag it meets, so the match is cut short. The workaround is to avoid naming such a nested node in the XPath; the only real fix is to rewrite the matching with a more complex mechanism.
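The nesting bug is easy to reproduce. The sketch below compares the lazy pattern with one possible rewrite that counts nesting depth. It is an illustration under simple assumptions (no attributes, no self-closing tags), not the author's fix; `first_match` and `find_nodes` are hypothetical names.

```python
import re


def first_match(name, text):
    """The lazy pattern: stops at the first closing tag it meets."""
    return re.findall('<%s>(.*?)</%s>' % (name, name), text, re.S)


def find_nodes(name, text):
    """Depth-counting alternative: pair each opening tag with its own closing tag."""
    tag = re.compile('<(/?)%s>' % re.escape(name))
    results, depth, start = [], 0, None
    for m in tag.finditer(text):
        if not m.group(1):            # opening tag
            if depth == 0:
                start = m.end()       # content of an outermost node begins here
            depth += 1
        elif depth > 0:               # closing tag with an opening tag to pair with
            depth -= 1
            if depth == 0:
                results.append(text[start:m.start()])
    return results


nested = '<a><a>x</a>y</a><a>z</a>'
print(first_match('a', nested))
print(find_nodes('a', nested))
```

With this sample, the lazy pattern returns `['<a>x', 'z']`, cutting the first node short, while the depth counter returns `['<a>x</a>y', 'z']`.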

Hopefully this article will help you with Python programming.



This article is an English version of an article originally published in Chinese on aliyun.com.
