Python offers three common ways to parse XML: SAX, DOM, and ElementTree.
### 1. SAX (Simple API for XML)
The Python standard library includes a SAX parser. SAX is a typical event-driven parser: it is very fast and uses little memory, because it streams through the document instead of loading it all at once.
However, it is built on callbacks: as it reads the data, it calls handler methods (for element starts, element ends, and text) that you must supply.
Because the parser keeps no record of where it is in the document, your handler has to maintain its own state, which makes SAX awkward to use.
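
The callback style and the state-tracking burden described above can be sketched as follows. This is a minimal illustration, not part of the original article: the `TitleHandler` class, the `<title>` tag, and the sample document are all made up for the example.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collects the text of every <title> element (hypothetical tag name)."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # state the handler must track by itself
        self.buffer = []       # text chunks of the <title> being read
        self.titles = []       # finished results

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.buffer = []

    def characters(self, content):
        # The parser may deliver one text node in several chunks.
        if self.in_title:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self.buffer))
            self.in_title = False


# Made-up sample data for the sketch.
SAMPLE = (b"<books><book><title>XML Basics</title></book>"
          b"<book><title>SAX in Practice</title></book></books>")

handler = TitleHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.titles)
```

Note that the `in_title` flag exists only because the parser itself does not tell the handler which element the current text belongs to.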
### 2. DOM (Document Object Model)
Compared with SAX, the typical disadvantage of DOM is that it is slower and consumes more memory, because it reads the entire XML document into memory and builds an object for every node in the tree.
The advantage of DOM is that you don't need to track state yourself, because every node knows who its parent is and who its children are.
The DOM API is, however, somewhat cumbersome to use.
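
To illustrate how each DOM node knows its parent and children, here is a small sketch using `xml.dom.minidom` from the standard library. The sample document and its `dir`/`file` tag names are invented for the example.

```python
from xml.dom import minidom

# Made-up sample data for the sketch.
SAMPLE = "<root><dir name='a'><file name='1.obj'/><file name='2.obj'/></dir></root>"

doc = minidom.parseString(SAMPLE)
first_file = doc.getElementsByTagName("file")[0]

# Every node knows its parent...
parent = first_file.parentNode
print(parent.tagName, parent.getAttribute("name"))

# ...and its children (childNodes can also hold text nodes, so filter them out).
child_names = [
    child.getAttribute("name")
    for child in parent.childNodes
    if child.nodeType == child.ELEMENT_NODE
]
print(child_names)
```

The need to filter by `nodeType` hints at why the DOM API feels heavier than ElementTree for everyday work.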
### 3. ElementTree (element tree)
ElementTree is like a lightweight DOM with a convenient, friendly API. Code written with it is easy to read, and it is fast and uses little memory.
The rest of this article focuses on ElementTree.
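
As a first taste of the API, the sketch below shows the two ways ElementTree can load a document: from a string and from a file. The XML text and the temporary file are assumptions made for the example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Made-up sample data for the sketch.
XML_TEXT = "<root><item id='1'>first</item><item id='2'>second</item></root>"

# Method 1: parse a string; fromstring() returns the root Element directly.
root_from_string = ET.fromstring(XML_TEXT)

# Method 2: parse a file; parse() returns an ElementTree, so call getroot().
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(XML_TEXT)
    path = tmp.name
try:
    root_from_file = ET.parse(path).getroot()
finally:
    os.remove(path)  # clean up the temporary file

# Iterating over an element yields its direct children.
for item in root_from_file:
    print(item.tag, item.attrib["id"], item.text)
```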
Basic knowledge
1. Inserting nodes
Element.insert(index, element), Element(tag[, attrib][, **extra]), SubElement(parent, tag[, attrib][, **extra]), Element.append(subelement)
2. Deleting nodes
Element.remove(subelement) deletes one child node; Element.clear() removes all child nodes of the element (and also resets its attributes and text)
3. Setting attributes on a node
Element.set(key, value)
4. Finding nodes
a) Element.getiterator() b) Element.getchildren() c) Element.find() d) Element.findall()
Note: getiterator() and getchildren() were removed in Python 3.9; use Element.iter() and list(element) instead.
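
The four groups of operations listed above can be tried out together in a short sketch. The tag and attribute names (`file`, `name`, `size`) are placeholders chosen for the example, and the code assumes Python 3, where `iter()` and `list(element)` take the place of `getiterator()` and `getchildren()`.

```python
import xml.etree.ElementTree as ET

# 1. Inserting nodes
root = ET.Element("root")
first = ET.SubElement(root, "file", {"name": "a.obj"})  # create and attach in one step
extra = ET.Element("file", name="c.obj")                # create a detached element...
root.append(extra)                                      # ...then append it as the last child
root.insert(1, ET.Element("file", name="b.obj"))        # insert at child index 1

# 2. Deleting nodes
root.remove(extra)  # remove() only works on direct children

scratch = ET.fromstring("<dir name='x'><file/><file/></dir>")
scratch.clear()     # drops all children and also resets attributes and text

# 3. Setting attributes on a node
first.set("size", "5064")

# 4. Finding nodes
names = [f.get("name") for f in root.findall("file")]  # all direct children tagged 'file'
first_match = root.find("file")                        # first match, or None
all_tags = [e.tag for e in root.iter()]                # root plus every descendant
children = list(root)                                  # direct children

print(names)
print(ET.tostring(root, encoding="unicode"))
```

`find()` and `findall()` search direct children unless the path says otherwise (for example `.//file`), whereas `iter()` always walks the whole subtree.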
A complete example (Python 3): the script below reads every XML file under a directory, recreates the directory tree described by the FILE_DIRECTORY elements, and writes the remaining content of each FILE_NAME element to a text file as an indented dump.

    #!/usr/bin/env python3
    import os
    import xml.etree.ElementTree as ET


    def read_xml(xmlfile, destDir):
        r"""Recreate the layout described by one XML file under destDir.

        Example input (c:\xml\1.xml); the code relies on the NAME attribute:

        <?xml version="1.0"?>
        <root>
          <FILE_DIRECTORY NAME="ca002">
            <FILE_DIRECTORY NAME="RT_CA">
              <FILE_NAME NAME="0000.obj">
                <COFF_FILE_HEAD begin="0" end="...">
                  <Machine>x86</Machine>
                  <NumberOfSections>2</NumberOfSections>
                  <PointerToSymbolTable>21205</PointerToSymbolTable>
                  <NumberOfSymbols>107</NumberOfSymbols>
                  <SizeOfOptionalHeader>0</SizeOfOptionalHeader>
                  <Characteristics>0</Characteristics>
                </COFF_FILE_HEAD>
                <COFF_IMAGE_SECTIONS>
                  <COFF_IMAGE_SECTION index="0">
                    <Name>.rdata</Name>
                    <SizeOfRawData>5064</SizeOfRawData>
                    <PointerToRawData>100</PointerToRawData>
                    <PointerToRelocations>0</PointerToRelocations>
                    <PointerToLinenumbers>0</PointerToLinenumbers>
                    <NumberOfRelocations>0</NumberOfRelocations>
                    <NumberOfLinenumbers>0</NumberOfLinenumbers>
                  </COFF_IMAGE_SECTION>
                </COFF_IMAGE_SECTIONS>
              </FILE_NAME>
            </FILE_DIRECTORY>
          </FILE_DIRECTORY>
        </root>

        :param xmlfile: path of the XML file to read
        :param destDir: directory under which the output tree is created
        :return: None
        """
        # Load the XML. There are two ways: ET.parse() loads a file,
        # ET.fromstring(xmlcontent) loads a string and returns the root.
        tree = ET.parse(xmlfile)
        root = tree.getroot()

        # Level 1: one directory per child of the root.
        for dir1_node in root:
            dir1 = os.path.join(destDir, dir1_node.attrib['NAME'])
            os.makedirs(dir1, exist_ok=True)

            # Level 2: nested directories.
            for dir2_node in dir1_node:
                dir2 = os.path.join(dir1, dir2_node.attrib['NAME'])
                os.makedirs(dir2, exist_ok=True)

                # Level 3: one output file per FILE_NAME element.
                for dir3_node in dir2_node:
                    dir3 = os.path.join(dir2, dir3_node.attrib['NAME'])
                    # Mode 'w' creates the file or overwrites an existing one.
                    with open(dir3, 'w') as f:
                        # Dump everything below the FILE_NAME element.
                        for dir4_node in dir3_node:
                            traversal(dir4_node, f, 0)


    def traversal(node, f, prelen):
        """Recursively write the rest of the XML content to the open file f."""
        line = '-' * prelen + node.tag
        if node.attrib:
            attrs = ','.join(str(key) + ':' + str(value)
                             for key, value in node.attrib.items())
            line += '(' + attrs + ')'
        # Skip whitespace-only text, which is just indentation between tags.
        if node.text is not None and node.text.strip():
            line += ': ' + node.text.strip()
        f.write(line + '\n')
        # Children are indented four more dashes than their parent.
        for child in node:
            traversal(child, f, prelen + 4)


    def parsexmls(filePath, destDir):
        """Walk filePath recursively and convert every .xml file found."""
        if os.path.isfile(filePath):
            # Ignore non-XML files instead of trying to list them as directories.
            if os.path.basename(filePath).endswith('.xml'):
                read_xml(filePath, destDir)
        else:
            for item in os.listdir(filePath):
                print(item)
                subpath = os.path.join(filePath, item)
                parsexmls(subpath, destDir)


    def main():
        """Main function."""
        # Ask for the input XML directory until an existing one is given.
        while True:
            src_dir = input("Input the dir: ")
            if os.path.exists(src_dir):
                break
            print("The directory you entered does not exist!")

        # Create the destination directory that stores the parsed output.
        # A disabled variant in the original named it after the date:
        #   destDir = os.path.split(dir)[0] + os.sep + time.strftime('%y%m%d')
        destDir = (os.path.split(src_dir)[0] + os.path.sep
                   + os.path.basename(src_dir) + '_xml')
        os.makedirs(destDir, exist_ok=True)

        # Parse the XML files.
        parsexmls(src_dir, destDir)


    if __name__ == '__main__':
        main()
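
To make the output format of the recursive dump in the script above concrete, here is a small stand-alone sketch. The `dump` helper is a simplified stand-in written for this illustration (it is not the script's own function), it writes to an in-memory buffer instead of a file, and the sample is a trimmed fragment of the XML shown in the script.

```python
import io
import xml.etree.ElementTree as ET


def dump(node, out, indent=0):
    """Write one line per element: dashes for depth, then tag(attrs): text."""
    attrs = ",".join(f"{key}:{value}" for key, value in node.attrib.items())
    line = "-" * indent + node.tag + (f"({attrs})" if attrs else "")
    text = (node.text or "").strip()  # ignore whitespace-only text
    if text:
        line += ": " + text
    out.write(line + "\n")
    for child in node:
        dump(child, out, indent + 4)


# Trimmed fragment of the sample document, used here only for illustration.
SAMPLE = """<COFF_IMAGE_SECTION index="0">
  <Name>.rdata</Name>
  <SizeOfRawData>5064</SizeOfRawData>
</COFF_IMAGE_SECTION>"""

buf = io.StringIO()
dump(ET.fromstring(SAMPLE), buf)
print(buf.getvalue())
# COFF_IMAGE_SECTION(index:0)
# ----Name: .rdata
# ----SizeOfRawData: 5064
```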
Python's xml.etree.ElementTree module