Python parsing of XML
The common XML programming interface has DOM and sax, and the two interfaces handle XML files in different ways, of course, using different scenarios.
Python has three methods for parsing XML, namely Sax,dom, and ElementTree three methods.
The following cases describe three methods in turn:
Write an XML file about book first
<Books> < BookID= "On"> <BookName>Getting Started with Python</BookName> <author>Li Qiang</author> < Price>25</ Price> </ Book> < BookID= "The "> <BookName>Java Basics</BookName> <author>Wang Yang</author> < Price>30</ Price> </ Book> < BookID= "OK"> <BookName>The Statue of God</BookName> <author>Jin yong 's</author> < Price>212</ Price> </ Book></Books>
1.DOM (Document Object Model)
Parses XML data into a tree in memory, manipulating the XML by manipulating the tree.
The operation code is as follows:
#introducing the Parse package fromXml.dom.minidomImportParsedoc=parse ("Book.xml")#first load the XML file inRoot=doc.documentelement#gets the root node of the elementBooks=root.getelementsbytagname (' Book')#Find the child node and get an array forBookinchBooks#To traverse all the child nodes Print("===book====") ifBook.hasattribute ('ID'):#If there is an id attribute, the output Print('the ID of the book is:%s'% Book.getattribute ('ID')) BookName=book.getelementsbytagname ("BookName") [0]#found by tag name and output first element Print("The title is:%s"%bookname.childnodes[0].data)#outputs the first value of the child node of the label name and goes to the data typeAuthor=book.getelementsbytagname ("author") [0]Print("author is:%s"%author.childnodes[0].data) Price=book.getelementsbytagname (" Price") [0]Print("Price is:%s"%price.childnodes[0].data)
2.SAX (Simple API for XML)
The Python standard library contains the SAX parser, which uses the event-driven model to process XML files by triggering events and invoking user-defined callback functions during parsing of XML.
fromXml.saxImportParse, ContentHandler#introducing inherit package ContentHandler#the class of the bookclassBook :#define initialization properties, same as XML file attributes def __init__(self,bookname=none,author=none,price=None): Self.bookname=bookname Self.author=author Self.price= Pricedef __str__(self):#Convert to string output returnself.bookname+","+self.author+","+Self.pricebooks=[]#define an array of books to hold every data you get#defines the class that inherits ContentHandler, and can implement the corresponding methodclassBkdemo (ContentHandler):def __init__(self):#Defining global VariablesSelf.book=none#the corresponding data used to receive the bookSelf.tag=none#to receive content from the characters method defStartdocument (self):#Books Object Start Print("Object Start") defEnddocument (self):#Books Object End Print("end of Object") defStartelement (self, Name, attrs):#the start of each tag element, Name: Tag name attrs: Tag internal corresponding property ifname==' Book':#if the label signature is bookSelf.book=book ()#Create a book () object defEndElement (self, name):#the end of each tag element, name: Tag name (this will get the content) ifname=='BookName': Self.book.bookname=self.tag#the label name of the object = the value of the corresponding content ifname=='author': Self.book.author=Self.tagifname==' Price': Self.book.price=Self.tagifname==' Book': Books.append (Self.book)#appends the resulting element to the defined array defcharacters (self, content): Self.tag=content#If you write self, you can define it as a global variable .Parse ("Book.xml", Bkdemo ())#Parse method, which indicates the XML file separately, and calls the Lookup class method forIinchBooks#Logarithmic group books[] Loop Print(i)
3.ElementTree (element Tree)
ElementTree is like a lightweight dom, with a convenient and friendly API. Good code availability, fast speed, low memory consumption.
fromXml.etreeImportElementTree#introduction of ElementTree packages#the class of the bookclassBook :#define initialization properties, same as XML file attributes def __init__(self,bookname=none,author=none,price=None): Self.bookname=bookname Self.author=author Self.price= Pricedef __str__(self):#Convert to string output returnself.bookname+","+self.author+","+Self.priceroota=elementtree.parse ("Book.xml")#Parse method reads an XML file to get the element treeBk=roota.findall (" Book")#findall Query all book tagsBoo=[]#Define a collection forAainchBk:#to get all the sub tags under the root element loop outputBook=book ()#defining a Class objectBook.bookname=aa.find ("BookName"). Text#the corresponding label value of the object = The name of the fixed label found by the child label, and output in text formBook.author=aa.find ("author"). Text Book.price=aa.find (" Price"). Text Boo.append (book)#appends the resulting property value to the defined collection forIinchBoo#iterating through the collection Print(i)
Output Result:
Getting started with Python, Li Qiang,Java Foundation, Wang Yang, statue of God, Jin Yong,212Process finished with exit code 0
Note: because the DOM needs to map XML data to a tree in memory, one is slower, the other is less memory, and Sax streams the XML file faster and consumes less memory, but requires the user to implement the callback function (handler).
The XML instance file used in this section movies.xml the following:
XML file parsing of Python