XML parsing of Python
What is XML?
XML refers to the Extensible Markup Language (eXTensibleMArkupLAnguage ).
XML is designed to transmit and store data.
XML is a set of rules for defining semantic tags that divide documents into many parts and identify them.
It is also a meta-markup language, which defines the syntax language used to define other semantic and structured markup languages related to specific fields.
Parse XML in python
Common XML Programming interfaces include DOM and SAX. These two interfaces process XML files in different ways. of course, they are used in different scenarios.
Python has three methods to parse XML, SAX, DOM, and ElementTree:
1. SAX (simple API for XML)
The python Standard Library contains the SAX Parser. with the event-driven model, SAX triggers events one by one during XML parsing and calls user-defined callback functions to process XML files.
2. DOM (Document Object Model)
Parse XML data into a tree in the memory and operate the XML in the tree.
3. ElementTree (element tree)
ElementTree is like a lightweight DOM with convenient and friendly APIs. Good code availability, fast speed, and low memory consumption.
Note:Because DOM needs to map XML data to trees in the memory, the first is relatively slow, and the second is relatively memory-consuming, while the SAX stream reads XML files, which is faster and consumes less memory, however, you need to implement the callback function (handler ).
The movies. XML file used in this section is as follows:
War, Thriller
DVD
2003
PG
10
Talk about a US-Japan war
Anime, Science Fiction
DVD
1989
R
8
A schientific fiction
Anime, Action
DVD
4
PG
10
Vash the Stampede!
Comedy
VHS
PG
2
Viewable boredom
Python parses xml using SAX
SAX is an event-driven API.
Parsing XML documents using SAX involves two parts: the parser and the event processor.
The parser reads XML documents and sends events to the event processor, such as the element start and Element end events;
The event processor is responsible for responding to the event and processing the transmitted XML data.
1. process large files;
2. you only need part of the file, or you only need to get specific information from the file.
3. when you want to create your own object model.
In python, to use the sax method to process xml, you must first introduce the parse Function in xml. sax, and the ContentHandler in xml. sax. handler.
ContentHandler class method introduction
Characters (content) method
Call time:
Starting from the row, there are characters before the tag is met, and the value of content is these strings.
From a tag, before the next tag is met, there are characters whose content value is these strings.
There are characters before a line Terminator from a tag and the value of content is these strings.
A tag can be a start tag or an end tag.
StartDocument () method
This document is called at startup.
EndDocument () method
The parser is called when it reaches the end of the document.
StartElement (name, attrs) method
When a tag is started in XML, the name is the tag name, and the attrs is the attribute value Dictionary of the tag.
EndElement (name) method
It is called when an XML end tag is encountered.
Make_parser method
The following method creates a new parser object and returns it.
xml.sax.make_parser( [parser_list] )
Parameter description:
Parser method
Follow these steps to create a SAX parser and parse the xml document:
xml.sax.parse( xmlfile, contenthandler[, errorhandler])
Parameter description:
Xmlfile-Xml file name
Contenthandler-It must be a ContentHandler object.
Errorhandler-If this parameter is specified, errorhandler must be a SAX ErrorHandler object.
ParseString method
The parseString method creates an XML parser and parses the xml string:
xml.sax.parseString(xmlstring, contenthandler[, errorhandler])
Parameter description:
Xmlstring-Xml string
Contenthandler-It must be a ContentHandler object.
Errorhandler-If this parameter is specified, errorhandler must be a SAX ErrorHandler object.
Python parsing XML instance
#! /Usr/bin/python #-*-coding: UTF-8-*-import xml. saxclass MovieHandler (xml. sax. contentHandler): def init (self): self. currentData = "" self. type = "" self. format = "" self. year = "" self. rating = "" self. stars = "" self. description = "" # element start event processing def startElement (self, tag, attributes): self. currentData = tag if tag = "movie": print "****** Movie *****" title = attributes ["title"] print "Title :", title # def endElement (self, tag): if self. currentData = "type": print "Type:", self. type elif self. currentData = "format": print "Format:", self. format elif self. currentData = "year": print "Year:", self. year elif self. currentData = "rating": print "Rating:", self. rating elif self. currentData = "stars": print "Stars:", self. stars elif self. currentData = "description": print "Description:", self. description self. currentData = "" # content Event Processing def characters (self, content): if self. currentData = "type": self. type = content elif self. currentData = "format": self. format = content elif self. currentData = "year": self. year = content elif self. currentData = "rating": self. rating = content elif self. currentData = "stars": self. stars = content elif self. currentData = "description": self. description = content if (name = "main"): # Create an XMLReader parser = xml. sax. make_parser () # turn off namepsaces parser. setFeature (xml. sax. handler. feature_namespaces, 0) # override ContextHandler Handler = MovieHandler () parser. setContentHandler (Handler) parser. parse ("movies. xml ")
The code execution result is as follows:
*****Movie*****Title: Enemy BehindType: War, ThrillerFormat: DVDYear: 2003Rating: PGStars: 10Description: Talk about a US-Japan war*****Movie*****Title: TransformersType: Anime, Science FictionFormat: DVDYear: 1989Rating: RStars: 8Description: A schientific fiction*****Movie*****Title: TrigunType: Anime, ActionFormat: DVDRating: PGStars: 10Description: Vash the Stampede!*****Movie*****Title: IshtarType: ComedyFormat: VHSRating: PGStars: 2Description: Viewable boredom
Use xml. dom to parse xml
The Document Object Model (DOM) is a standard programming interface recommended by W3C for processing extensible slogans.
When a DOM parser parses an XML document, it reads the entire document at one time and stores all the elements in the document in a tree structure in the memory, then you can use different functions provided by DOM to read or modify the content and structure of the document, or write the modified content into the xml file.
In python, xml. dom. minidom is used to parse xml files. The example is as follows:
#! /Usr/bin/python #-*-coding: UTF-8-*-from xml. dom. minidom import parseimport xml. dom. minidom # Use the minidom parser to open the XML document DOMTree = xml. dom. minidom. parse ("movies. xml ") collection = DOMTree.doc umentElementif collection. hasAttribute ("shelf"): print "Root element: % s" % collection. getAttribute ("shelf") # get all movies in the collection movies = collection. getElementsByTagName ("movie") # print the details of each movie for Movie in movies: print "****** movie *****" if movie. hasAttribute ("title"): print "Title: % s" % movie. getAttribute ("title") type = movie. getElementsByTagName ('type') [0] print "type: % s" % Type. childNodes [0]. data format = movie. getElementsByTagName ('format') [0] print "format: % s" % Format. childNodes [0]. data rating = movie. getElementsByTagName ('rating') [0] print "rating: % s" % Rating. childNodes [0]. data description = movie. getElementsByTagName ('Description') [0] print "description: % s" % Description. childNodes [0]. data
The execution result of the above program is as follows:
Root element : New Arrivals*****Movie*****Title: Enemy BehindType: War, ThrillerFormat: DVDRating: PGDescription: Talk about a US-Japan war*****Movie*****Title: TransformersType: Anime, Science FictionFormat: DVDRating: RDescription: A schientific fiction*****Movie*****Title: TrigunType: Anime, ActionFormat: DVDRating: PGDescription: Vash the Stampede!*****Movie*****Title: IshtarType: ComedyFormat: VHSRating: PGDescription: Viewable boredom
The above is the detailed explanation of XML parsing in Python. For more information, see other related articles on php Chinese network!