XML Parsing in Python

Source: Internet
Author: User

What is XML?


XML stands for Extensible Markup Language (eXtensible Markup Language).

XML is designed to transmit and store data.

XML is a set of rules for defining semantic tags. The tags divide a document into parts and identify each part.

It is also a meta-markup language: it defines the syntax used to create other domain-specific, semantic, structured markup languages.

Parsing XML in Python

The common XML programming interfaces are DOM and SAX. They process XML files in different ways and therefore suit different scenarios.

Python offers three ways to parse XML: SAX, DOM, and ElementTree.

1. SAX (Simple API for XML)

The Python standard library includes a SAX parser. SAX uses an event-driven model: while parsing, it fires events one by one and calls user-defined callback functions to process the XML file.

2. DOM (Document Object Model)

DOM parses the XML data into a tree in memory, and you work with the XML by operating on that tree.

3. ElementTree (element tree)

ElementTree is like a lightweight DOM with a convenient, friendly API. The code is easy to work with, it is fast, and it uses little memory.

Note: Because DOM has to map the XML data to a tree in memory, it is relatively slow and relatively memory-hungry. SAX reads the XML file as a stream, so it is faster and uses less memory, but you have to implement the callback functions (handlers) yourself.
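ElementTree is listed above but not demonstrated in the sections that follow, so here is a minimal sketch of it using the standard library module xml.etree.ElementTree. The inline SAMPLE string is an assumption made so the example is self-contained; it is a cut-down version of the movie data used in this article.

```python
import xml.etree.ElementTree as ET

# Inline sample data (an assumption for this sketch) so the example is self-contained.
SAMPLE = """<collection shelf="New Arrivals">
  <movie title="Enemy Behind">
    <type>War, Thriller</type>
    <year>2003</year>
  </movie>
  <movie title="Ishtar">
    <type>Comedy</type>
  </movie>
</collection>"""

root = ET.fromstring(SAMPLE)          # parse the string into an element tree
shelf = root.get("shelf")             # attribute lookup on the root element

movies = []
for movie in root.findall("movie"):   # direct children named <movie>
    movies.append({
        "title": movie.get("title"),
        "type": movie.findtext("type"),
        # findtext returns the default when the child element is missing
        "year": movie.findtext("year", default="unknown"),
    })

print(shelf)
for entry in movies:
    print(entry)
```

To read from a file rather than a string, ET.parse(filename).getroot() returns the same kind of root element.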

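The movies.xml file used in this section is as follows:

    <collection shelf="New Arrivals">
    <movie title="Enemy Behind">
       <type>War, Thriller</type>
       <format>DVD</format>
       <year>2003</year>
       <rating>PG</rating>
       <stars>10</stars>
       <description>Talk about a US-Japan war</description>
    </movie>
    <movie title="Transformers">
       <type>Anime, Science Fiction</type>
       <format>DVD</format>
       <year>1989</year>
       <rating>R</rating>
       <stars>8</stars>
       <description>A schientific fiction</description>
    </movie>
    <movie title="Trigun">
       <type>Anime, Action</type>
       <format>DVD</format>
       <episodes>4</episodes>
       <rating>PG</rating>
       <stars>10</stars>
       <description>Vash the Stampede!</description>
    </movie>
    <movie title="Ishtar">
       <type>Comedy</type>
       <format>VHS</format>
       <rating>PG</rating>
       <stars>2</stars>
       <description>Viewable boredom</description>
    </movie>
    </collection>
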
Parsing XML with SAX in Python

SAX is an event-driven API.

Parsing an XML document with SAX involves two parts: the parser and the event handler.

The parser reads the XML document and sends events, such as element start and element end events, to the event handler.

The event handler responds to those events and processes the XML data passed to it.

SAX is a good fit in the following cases:

  • You need to process large files.

  • You need only part of the file, or only specific pieces of information from it.

  • You want to build your own object model.

To process XML with SAX in Python, first import the parse function from xml.sax and the ContentHandler class from xml.sax.handler.

ContentHandler class methods

characters(content) method

When it is called:

  • From the start of a line up to the first tag, if there is any text, content holds that text.

  • From one tag up to the next tag, if there is any text, content holds that text.

  • From a tag up to the line terminator, if there is any text, content holds that text.

A tag here can be either a start tag or an end tag.
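Because characters() fires for each run of text, including whitespace between tags, the text of a single element can arrive in several pieces. A common way to cope is to buffer the pieces and join them when the end tag arrives. The sketch below assumes a hypothetical handler named TextCollector and a small inline document.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects the text of every <description> element (hypothetical example)."""

    def __init__(self):
        super().__init__()
        self.buffer = None          # None means "not inside <description>"
        self.descriptions = []

    def startElement(self, name, attrs):
        if name == "description":
            self.buffer = []        # start collecting text chunks

    def characters(self, content):
        # May be called several times for one element, and is also
        # called for the whitespace between tags.
        if self.buffer is not None:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "description":
            self.descriptions.append("".join(self.buffer).strip())
            self.buffer = None


collector = TextCollector()
xml.sax.parseString(
    b"<movie>\n  <description>Vash the\nStampede!</description>\n</movie>",
    collector,
)
print(collector.descriptions)
```

Assigning content directly to an attribute, as simpler handlers often do, keeps only the last chunk, so buffering is the safer choice when the text may contain line breaks.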

startDocument() method

Called when the document starts.

endDocument() method

Called when the parser reaches the end of the document.

startElement(name, attrs) method

Called when a start tag is encountered. name is the tag name and attrs is a dictionary-like object holding the tag's attributes.

endElement(name) method

Called when an end tag is encountered.
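To see the order in which these callbacks fire, the following sketch records every event in a list. EventLogger and the one-line sample document are assumptions made for illustration.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records the SAX callbacks in the order they are received."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute names to values
        self.events.append(("startElement", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("endElement", name))


logger = EventLogger()
xml.sax.parseString(b'<movie title="Trigun"><format>DVD</format></movie>', logger)
for event in logger.events:
    print(event)
```

The log starts with startDocument, ends with endDocument, and every startElement is matched by an endElement in nested order.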

make_parser method

The following method creates a new parser object and returns it.

xml.sax.make_parser( [parser_list] )

Parameter description:

  • parser_list: optional; a list of names of parser modules to try
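A parser object obtained from make_parser() can be driven by hand. The sketch below assumes the default parser, which in CPython is expat-based and supports incremental parsing through feed() and close(), and feeds it a document in chunks. MovieCounter and the chunk contents are hypothetical.

```python
import xml.sax


class MovieCounter(xml.sax.ContentHandler):
    """Counts <movie> start tags (hypothetical helper for this sketch)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "movie":
            self.count += 1


# The document arrives in pieces; a split may fall in the middle of a tag.
chunks = [
    b'<collection shelf="New Arrivals"><movie title="Enemy Beh',
    b'ind"/><movie title="Trigun"/>',
    b'</collection>',
]

counter = MovieCounter()
parser = xml.sax.make_parser()       # no parser_list: use the default parser
parser.setContentHandler(counter)
for chunk in chunks:
    parser.feed(chunk)               # parse incrementally, chunk by chunk
parser.close()                       # signal the end of the input

print(counter.count)
```

Feeding data in chunks like this is one way to handle input that does not fit in memory or that arrives over a network connection.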

parse method

The following method creates a SAX parser and parses the XML document:

xml.sax.parse( xmlfile, contenthandler[, errorhandler])

Parameter description:

  • xmlfile: name of the XML file

  • contenthandler: must be a ContentHandler object

  • errorhandler: if specified, must be a SAX ErrorHandler object
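As a minimal sketch of xml.sax.parse, the example below passes a file-like object instead of a file name, which the function also accepts, so that it runs without a file on disk. TitleHandler and the inline data are assumptions made for illustration.

```python
import io
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Collects the title attribute of each <movie> (hypothetical example)."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        if name == "movie":
            # get() returns the fallback when the attribute is absent
            self.titles.append(attrs.get("title", "(untitled)"))


# A byte stream standing in for an opened XML file.
stream = io.BytesIO(
    b'<collection shelf="New Arrivals">'
    b'<movie title="Transformers"/><movie title="Ishtar"/>'
    b'</collection>'
)

title_handler = TitleHandler()
xml.sax.parse(stream, title_handler)
print(title_handler.titles)
```

With a real file, the call would be xml.sax.parse("movies.xml", title_handler).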

parseString method

The parseString method creates an XML parser and parses an XML string:

xml.sax.parseString(xmlstring, contenthandler[, errorhandler])

Parameter description:

  • xmlstring: the XML string

  • contenthandler: must be a ContentHandler object

  • errorhandler: if specified, must be a SAX ErrorHandler object
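When no errorhandler is given, the default one raises an exception on malformed input. The sketch below relies on that behavior to check whether a string is well-formed XML; is_well_formed is a hypothetical helper, not part of xml.sax.

```python
import xml.sax


def is_well_formed(xml_text):
    """Return True if xml_text parses without a SAX parse error."""
    try:
        # A bare ContentHandler ignores every event; only errors matter here.
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        print("parse error at line", exc.getLineNumber(), ":", exc.getMessage())
        return False
    return True


print(is_well_formed("<movie><format>DVD</format></movie>"))
print(is_well_formed("<movie><format>DVD</movie>"))
```

The second call fails because the format element is never closed.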

Example: parsing XML with SAX in Python

    #!/usr/bin/python3
    # -*- coding: UTF-8 -*-

    import xml.sax


    class MovieHandler(xml.sax.ContentHandler):
        def __init__(self):
            super().__init__()
            self.currentData = ""
            self.type = ""
            self.format = ""
            self.year = ""
            self.rating = ""
            self.stars = ""
            self.description = ""

        # Element start event
        def startElement(self, tag, attributes):
            self.currentData = tag
            if tag == "movie":
                print("*****Movie*****")
                title = attributes["title"]
                print("Title:", title)

        # Element end event
        def endElement(self, tag):
            if self.currentData == "type":
                print("Type:", self.type)
            elif self.currentData == "format":
                print("Format:", self.format)
            elif self.currentData == "year":
                print("Year:", self.year)
            elif self.currentData == "rating":
                print("Rating:", self.rating)
            elif self.currentData == "stars":
                print("Stars:", self.stars)
            elif self.currentData == "description":
                print("Description:", self.description)
            self.currentData = ""

        # Character data event
        def characters(self, content):
            if self.currentData == "type":
                self.type = content
            elif self.currentData == "format":
                self.format = content
            elif self.currentData == "year":
                self.year = content
            elif self.currentData == "rating":
                self.rating = content
            elif self.currentData == "stars":
                self.stars = content
            elif self.currentData == "description":
                self.description = content


    if __name__ == "__main__":
        # Create an XMLReader
        parser = xml.sax.make_parser()
        # Turn off namespaces
        parser.setFeature(xml.sax.handler.feature_namespaces, 0)

        # Override the default ContentHandler
        handler = MovieHandler()
        parser.setContentHandler(handler)

        parser.parse("movies.xml")

The code produces the following output:

    *****Movie*****
    Title: Enemy Behind
    Type: War, Thriller
    Format: DVD
    Year: 2003
    Rating: PG
    Stars: 10
    Description: Talk about a US-Japan war
    *****Movie*****
    Title: Transformers
    Type: Anime, Science Fiction
    Format: DVD
    Year: 1989
    Rating: R
    Stars: 8
    Description: A schientific fiction
    *****Movie*****
    Title: Trigun
    Type: Anime, Action
    Format: DVD
    Rating: PG
    Stars: 10
    Description: Vash the Stampede!
    *****Movie*****
    Title: Ishtar
    Type: Comedy
    Format: VHS
    Rating: PG
    Stars: 2
    Description: Viewable boredom

Parsing XML with xml.dom

The Document Object Model (DOM) is a standard programming interface recommended by the W3C for processing XML.

A DOM parser reads the entire XML document in one pass and stores all of its elements in a tree structure in memory. You can then use the functions that DOM provides to read or modify the content and structure of the document, or to write the modified content back to an XML file.
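The read, modify, and write-back cycle described above can be sketched with xml.dom.minidom as follows. The inline document and the particular changes (new shelf name, new format, added rating element) are assumptions chosen only for illustration.

```python
import xml.dom.minidom

dom = xml.dom.minidom.parseString(
    '<collection shelf="New Arrivals">'
    '<movie title="Ishtar"><format>VHS</format></movie>'
    '</collection>'
)
movie = dom.getElementsByTagName("movie")[0]

# Modify an existing text node.
format_node = movie.getElementsByTagName("format")[0]
format_node.firstChild.data = "DVD"

# Add a new child element that contains text.
rating = dom.createElement("rating")
rating.appendChild(dom.createTextNode("PG"))
movie.appendChild(rating)

# Change an attribute on the root element.
dom.documentElement.setAttribute("shelf", "Classics")

# Serialize the modified tree back to XML text.
result = dom.documentElement.toxml()
print(result)
```

To save the result, write the string returned by dom.toxml() to a file opened in text mode.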

In Python, xml.dom.minidom is used to parse XML files. An example follows:

    #!/usr/bin/python3
    # -*- coding: UTF-8 -*-

    import xml.dom.minidom

    # Use the minidom parser to open the XML document
    DOMTree = xml.dom.minidom.parse("movies.xml")
    collection = DOMTree.documentElement
    if collection.hasAttribute("shelf"):
        print("Root element : %s" % collection.getAttribute("shelf"))

    # Get all movies in the collection
    movies = collection.getElementsByTagName("movie")

    # Print the details of each movie
    for movie in movies:
        print("*****Movie*****")
        if movie.hasAttribute("title"):
            print("Title: %s" % movie.getAttribute("title"))

        type_node = movie.getElementsByTagName("type")[0]
        print("Type: %s" % type_node.childNodes[0].data)
        format_node = movie.getElementsByTagName("format")[0]
        print("Format: %s" % format_node.childNodes[0].data)
        rating_node = movie.getElementsByTagName("rating")[0]
        print("Rating: %s" % rating_node.childNodes[0].data)
        description_node = movie.getElementsByTagName("description")[0]
        print("Description: %s" % description_node.childNodes[0].data)

The program produces the following output:

    Root element : New Arrivals
    *****Movie*****
    Title: Enemy Behind
    Type: War, Thriller
    Format: DVD
    Rating: PG
    Description: Talk about a US-Japan war
    *****Movie*****
    Title: Transformers
    Type: Anime, Science Fiction
    Format: DVD
    Rating: R
    Description: A schientific fiction
    *****Movie*****
    Title: Trigun
    Type: Anime, Action
    Format: DVD
    Rating: PG
    Description: Vash the Stampede!
    *****Movie*****
    Title: Ishtar
    Type: Comedy
    Format: VHS
    Rating: PG
    Description: Viewable boredom
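The example indexes getElementsByTagName(...)[0] directly, which raises an IndexError when a movie lacks the requested element. A small helper can return a default instead. child_text and the inline data below are hypothetical, offered as one possible approach.

```python
import xml.dom.minidom


def child_text(element, tag, default=None):
    """Return the text of the first <tag> descendant, or default if absent or empty."""
    nodes = element.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data


doc = xml.dom.minidom.parseString(
    '<collection>'
    '<movie title="Enemy Behind"><year>2003</year></movie>'
    '<movie title="Ishtar"><format>VHS</format></movie>'
    '</collection>'
)

years = {
    movie.getAttribute("title"): child_text(movie, "year", default="n/a")
    for movie in doc.getElementsByTagName("movie")
}
print(years)
```

Here the second movie has no year element, so the helper falls back to the default rather than raising.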

