Parsing XML in Python
The two common XML programming interfaces are DOM and SAX. They handle XML files in different ways and therefore suit different scenarios.
Python offers three ways to parse XML: SAX, DOM, and ElementTree.
1. SAX (Simple API for XML)
The Python standard library includes a SAX parser. SAX uses an event-driven model: as it reads the XML file, the parser triggers events and invokes user-defined callback functions to handle them.
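The event-driven model described above can be sketched roughly as follows. This is a minimal illustration, not the article's own code: the sample XML, the MovieHandler class, and its attributes are hypothetical names chosen for the example, while xml.sax.ContentHandler and xml.sax.parseString come from the standard library.

```python
import xml.sax

# Hypothetical sample data, not taken from the article.
SAMPLE = b"""<collection shelf="New Arrivals">
  <movie title="Enemy Behind"><type>War, Thriller</type></movie>
  <movie title="Transformers"><type>Anime, Science Fiction</type></movie>
</collection>"""


class MovieHandler(xml.sax.ContentHandler):
    """Collects (title, type) pairs as the parser fires events."""

    def __init__(self):
        super().__init__()
        self.movies = []       # finished (title, type) pairs
        self._title = None     # title of the movie currently being read
        self._in_type = False  # True while inside a <type> element
        self._text = []        # text chunks of the current <type>

    def startElement(self, name, attrs):
        # Called for every opening tag.
        if name == "movie":
            self._title = attrs.get("title")
        elif name == "type":
            self._in_type = True
            self._text = []

    def characters(self, content):
        # May be called several times for one text node, so accumulate.
        if self._in_type:
            self._text.append(content)

    def endElement(self, name):
        # Called for every closing tag.
        if name == "type":
            self._in_type = False
            self.movies.append((self._title, "".join(self._text)))


handler = MovieHandler()
xml.sax.parseString(SAMPLE, handler)
for title, kind in handler.movies:
    print(title, "-", kind)
```

The parser never builds a tree here; the handler keeps only the state it needs, which is why SAX stays light on memory.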
2. DOM (Document Object Model)
DOM parses XML data into a tree in memory; you work with the XML by manipulating that tree.
The Document Object Model (DOM) is the standard programming interface recommended by the W3C for processing Extensible Markup Language (XML) documents.
A DOM parser reads the entire document at once and stores all of its elements in a tree structure in memory. You can then use the functions DOM provides to read or modify the content and structure of the document, or to write the modified content back to an XML file.
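As a rough sketch of the read, modify, and write-back cycle described above, the following uses xml.dom.minidom on an in-memory string. The sample XML and the new attribute values are assumptions made up for the example.

```python
import xml.dom.minidom

# Hypothetical sample data, not taken from the article.
SAMPLE = (
    '<collection shelf="New Arrivals">'
    '<movie title="Enemy Behind"><type>War, Thriller</type></movie>'
    '</collection>'
)

# The whole document is parsed into a tree in memory.
dom = xml.dom.minidom.parseString(SAMPLE)
root = dom.documentElement

# Read: attributes and text are reached by walking the tree.
shelf = root.getAttribute("shelf")
first_type = root.getElementsByTagName("type")[0]
type_text = first_type.childNodes[0].data

# Modify: change an attribute and append a new element.
root.setAttribute("shelf", "Classics")
new_movie = dom.createElement("movie")
new_movie.setAttribute("title", "Ishtar")
root.appendChild(new_movie)

# Serialize the modified tree; the resulting string could be written to a file.
result = root.toxml()
print(shelf, "|", type_text)
print(result)
```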
3. ElementTree (element tree)
ElementTree is like a lightweight DOM with a convenient, friendly API. It is easy to use, fast, and consumes little memory.
1. Loading an XML file
There are two ways to load XML: from a string, or from a file.
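A minimal sketch of both loading styles follows. The sample XML is hypothetical, and a temporary file stands in for a real XML file on disk. Note that fromstring() returns the root element directly, while parse() returns an ElementTree object whose root is obtained with getroot().

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Hypothetical sample data, not taken from the article.
SAMPLE = """<collection shelf="New Arrivals">
  <movie title="Enemy Behind"><type>War, Thriller</type></movie>
</collection>"""

# 1) Load from a string: fromstring() returns the root Element.
root_from_string = ET.fromstring(SAMPLE)

# 2) Load from a file: parse() returns an ElementTree.
#    A temporary file is used here only so the example is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".xml", delete=False, encoding="utf-8"
) as f:
    f.write(SAMPLE)
    path = f.name
try:
    tree = ET.parse(path)
    root_from_file = tree.getroot()
finally:
    os.remove(path)

print(root_from_string.tag, root_from_string.attrib)
print(root_from_file.tag, root_from_file.attrib)
```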
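2. How to obtain elements
a) With getiterator (removed in Python 3.9; use iter instead)
b) With getchildren (removed in Python 3.9; iterate over the element or use list(element) instead)
c) With the find method
d) With the findall method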
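The four access styles listed above might look like this on current Python 3, where iter() and list(element) take the place of getiterator() and getchildren(). The sample XML is a hypothetical stand-in for the article's test file.

```python
from xml.etree import ElementTree as ET

# Hypothetical sample data, not taken from the article.
SAMPLE = """<collection shelf="New Arrivals">
  <movie title="Enemy Behind"><type>War, Thriller</type><format>DVD</format></movie>
  <movie title="Transformers"><type>Anime, Science Fiction</type><format>DVD</format></movie>
</collection>"""

root = ET.fromstring(SAMPLE)

# a) iter(): walk every descendant with the given tag.
titles = [movie.get("title") for movie in root.iter("movie")]

# b) Direct children: an Element behaves like a sequence of its children.
first_movie = list(root)[0]
child_tags = [child.tag for child in first_movie]

# c) find(): the first match for a tag or path, or None if there is none.
first_type = root.find("movie/type").text

# d) findall(): every match for a tag or path, as a list.
all_types = [t.text for t in root.findall("movie/type")]

print(titles)
print(child_tags)
print(first_type)
print(all_types)
```

find() and findall() take paths relative to the element they are called on, which is why "movie/type" works from the root here.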
Note: because DOM has to map the XML data into a tree in memory, it is slower and uses more memory. SAX streams the XML file, so it is faster and uses less memory, but it requires the user to implement callback functions (a handler).
# coding: utf-8
# filename: testxml.py

# Method one: parse the XML file using xml.dom.minidom
import xml.dom.minidom

xmlfile = r'E:\Program\python\testxml.xml'


def testxmlusedom():
    domtree = xml.dom.minidom.parse(xmlfile)
    # Get the root node
    root = domtree.documentElement
    # Get an attribute
    if root.hasAttribute("shelf"):
        print("root element: %s" % root.getAttribute("shelf"))
    # Get the child nodes
    movies = root.getElementsByTagName("movie")
    for movie in movies:
        print("*****movie******")
        if movie.hasAttribute("title"):
            print("movie: %s" % movie.getAttribute("title"))
        # The text of a node is the data of its first child
        t = movie.getElementsByTagName("type")[0]
        print("type: %s" % t.childNodes[0].data)
        f = movie.getElementsByTagName("format")[0]
        print("format: %s" % f.childNodes[0].data)


testxmlusedom()

# Method two: using ElementTree
from xml.etree import ElementTree as et


def print_node(node):
    '''Print node'''
    print("==========================")
    # attrib is a dictionary of the node's attributes
    print("node.attrib: %s" % node.attrib)
    if "title" in node.attrib:
        print("node.attrib['title']: %s" % node.attrib['title'])
    # tag: node name
    print("node.tag: %s" % node.tag)
    # text: node content
    print("node.text: %s" % node.text)


def testxmlet():
    print("testxmlet")
    # Load from a file; parse() returns an ElementTree whose
    # iter/find/findall search from the root element
    root = et.parse(xmlfile)
    # Load from text
    # root = et.fromstring(text)

    # Traverse the movie nodes
    # (iter() replaces getiterator(), which was removed in Python 3.9)
    lst_node = list(root.iter("movie"))
    for node in lst_node:
        print_node(node)

    # Get the first child of the first movie node
    # (list(element) replaces getchildren(), which was removed in Python 3.9)
    print("First movie:")
    lst_node_child = list(lst_node[0])[0]
    print_node(lst_node_child)

    # find: first matching node
    print("find movie:")
    node_find = root.find("movie")
    print_node(node_find)

    # findall: all matching nodes; take the second one
    node_findall = root.findall("movie")[1]
    print_node(node_findall)

    # find directly using a path
    node_findone = root.find("movie/type")
    print_node(node_findone)

    # findall directly using a path
    node_findall2 = root.findall("movie/type")
    for node in node_findall2:
        print_node(node)


testxmlet()