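In my current project, I received XML view files from the database developers that contain table information. That information is buried in a large amount of UI configuration and is hard to read, so I wrote a simple Python program to parse the XML, convert the column information I needed into CSV, and import the result into Excel (about 2 hours of work). Here are a few technical takeaways: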
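- When parsing with minidom, my XML files had to be UTF-8 encoded; files in other encodings failed to parse (the underlying expat parser natively handles only a few encodings, such as UTF-8, UTF-16, and ISO-8859-1). Convert the encoding before parsing;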
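- To read the text of a DOM element in Python, use firstChild.nodeValue. Calling nodeValue on the element itself returns None, because the text is stored in a child text node;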
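- The strings returned by the parser are Unicode strings (unicode objects in Python 2), not UTF-8 bytes, so writing them to a file requires an explicit encoding, for example by opening the file with codecs.open;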
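- While writing the program I gradually moved from a line-by-line script to an OOP structure. It takes more steps, but it makes debugging easier;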
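The first two points can be illustrated with a minimal sketch. It is written for Python 3 (an assumption; the program in this post was written for Python 2), and the helper names to_utf8_xml and text_of, the GBK source encoding, and the sample document are all hypothetical, chosen only for illustration.

```python
import re
import xml.dom.minidom


def to_utf8_xml(raw_bytes, source_encoding):
    """Re-encode an XML document as UTF-8 (hypothetical helper).

    Decodes raw_bytes from source_encoding, rewrites the encoding named
    in the XML declaration (if there is one), and returns UTF-8 bytes.
    """
    text = raw_bytes.decode(source_encoding)
    text = re.sub(r'\A(<\?xml[^>]*encoding=["\'])[^"\']+',
                  r'\g<1>utf-8', text, count=1)
    return text.encode('utf-8')


def text_of(element, tag_name, default=''):
    """Return the text of the first <tag_name> descendant (hypothetical helper).

    Falls back to default when the tag is missing or has no text child,
    where a bare firstChild.nodeValue would raise an error.
    """
    nodes = element.getElementsByTagName(tag_name)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.nodeValue


# Made-up sample document, encoded as GBK to mimic a non-UTF-8 input file.
sample = (
    '<?xml version="1.0" encoding="gbk"?>'
    '<BizObject><EName>T_USER</EName><CName>用户表</CName>'
    '<Description></Description></BizObject>'
).encode('gbk')

doc = xml.dom.minidom.parseString(to_utf8_xml(sample, 'gbk'))
ename = doc.getElementsByTagName('EName')[0]

print(ename.nodeValue)             # None: an element has no value of its own
print(ename.firstChild.nodeValue)  # T_USER: the text is in the child text node
print(repr(text_of(doc, 'Description')))  # '': empty element, no text child
```

The text_of helper is a defensive variant: an empty element such as the Description tag in the sample has no child node at all, so firstChild is None and reading nodeValue from it would fail.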
The reference code is as follows:
import codecs
import xml.dom.minidom


class dbviewxmladapter:
    """Collects column information from DB view XML files as CSV lines."""

    def __init__(self):
        self._version = "0.1"
        self._path = "e:\\Temp\\Work"
        self._files = []
        self._lines = []

    def setPath(self, path):
        self._path = path

    def addFile(self, filename):
        self._files.append(filename)

    def getNodeValue(self, element, tagName):
        # The text of an element lives in its first child (a text node).
        return element.getElementsByTagName(tagName)[0].firstChild.nodeValue

    def getSubNodeValue(self, element, tagName):
        # Look up tagName inside the nested BizObjPropertyDBInfo element.
        subNode = element.getElementsByTagName('BizObjPropertyDBInfo')[0]
        return subNode.getElementsByTagName(tagName)[0].firstChild.nodeValue

    def parseXml(self):
        try:
            for file in self._files:
                filename = self._path + '\\' + file
                print(filename)
                # Open in binary mode so the parser handles the decoding.
                with open(filename, 'rb') as f:
                    doc = xml.dom.minidom.parse(f)
                # Header line for the view: English name first, Chinese name last.
                bizObject = doc.getElementsByTagName('BizObject')[0]
                viewEName = bizObject.getElementsByTagName('EName')[0].firstChild.nodeValue
                viewCName = bizObject.getElementsByTagName('CName')[0].firstChild.nodeValue
                line = viewEName + ', , , , , , ' + viewCName
                self._lines.append(line)
                # One line per column (BizObjProperty).
                items = doc.getElementsByTagName('BizObjProperty')
                for item in items:
                    EName = self.getNodeValue(item, 'EName')
                    CName = self.getNodeValue(item, 'CName')
                    Description = self.getNodeValue(item, 'Description')
                    Type = self.getSubNodeValue(item, 'Type')
                    Length = self.getSubNodeValue(item, 'Length')
                    Size = self.getSubNodeValue(item, 'Size')
                    IsPK = self.getSubNodeValue(item, 'IsPK') == '1'
                    IsNullable = self.getSubNodeValue(item, 'IsNullable') == '1'
                    line = EName + ',' + Type + ',' + Length + ',' + Size + ',' + str(IsPK) + ', ' + str(IsNullable) + ',' + CName + ':' + Description
                    self._lines.append(line)
        finally:
            print("over")

    def printLines(self):
        for line in self._lines:
            print(line)

    def writeToCSVFile(self, outfilename):
        filename = self._path + '\\' + outfilename
        # codecs.open encodes the Unicode lines as UTF-8 on write.
        with codecs.open(filename, 'w', 'utf-8') as f:
            for line in self._lines:
                f.write(line + '\n')


# TestSuite Scripts
aObject = dbviewxmladapter()
for i in range(5):
    filename = str(i+1) + ".xml"
    aObject.addFile(filename)
#aObject.addFile("5.xml")
aObject.parseXml()
#aObject.printLines()
aObject.writeToCSVFile("all.csv")
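The listing builds each CSV line by joining fields with ',' by hand, which produces a misaligned row if a description itself contains a comma. As a possible alternative (not what the program above does), the standard csv module quotes such fields automatically. The sketch below assumes Python 3; the function names rows_to_csv and save_csv and the sample rows are hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Render a list of field lists as CSV text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')
    writer.writerows(rows)
    return buf.getvalue()


def save_csv(path, rows):
    """Write rows to path as UTF-8 with a BOM (hypothetical helper).

    The 'utf-8-sig' codec prepends a byte order mark, which helps Excel
    recognize the file as UTF-8; newline='' lets csv control line endings.
    """
    with open(path, 'w', encoding='utf-8-sig', newline='') as f:
        csv.writer(f).writerows(rows)


# Made-up rows in the same column order as the listing:
# name, type, length, size, is-PK, is-nullable, description.
rows = [
    ['T_USER', '', '', '', '', '', 'User table'],
    ['NAME', 'varchar', '50', '50', 'False', 'True', 'Name: first, last'],
]

print(rows_to_csv(rows))
```

In the second row the description contains a comma, so the writer wraps that field in double quotes and Excel still reads it as a single cell.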