Method instance analysis of accessing XML using Python

Source: Internet
Author: User
Tags tagname xml attribute
This article mainly introduced the Python accesses the XML the common method, unifies the concrete instance form to analyze in detail the python accesses the XML the common method, the advantage and disadvantage comparison and the related attention matter, the need friend can refer to the next

The examples in this article describe the common ways in which Python accesses XML. Share to everyone for your reference, as follows:

For now, Python 3.2 accesses XML in the following four ways:

1.Expat
2.DOM
3.SAX
4.ElementTree

Use the following XML as a basis for discussion

<?xml version= "1.0" encoding= "Utf-8"?><schools> <school name= "Xidian" > <class Id= "030612" > <student name= "Salomon" > <Scores> <Math>98</Math> <English>85< /english> <physics>89</physics> </Scores> </Student> <student Name          = "Jupiter" > <Scores> <Math>74</Math> <English>83</English> <physics>69</physics> </Scores> </Student> </Class> <class id= "0306 "> <student name=" Venus "> <Scores> <Math>98</Math> <english& gt;85</english> <physics>89</physics> </Scores> </Student> <stu           Dent name= "Mars" > <Scores> <Math>74</Math> <English>83</English> <physics>69</physics> </Scores> </Student> </Class> </School></Schools> 

Expat

Expat is a stream-oriented parser. You register the parser callback (or handler) feature, and then start searching for its document. When the parser recognizes the specified location of the file, it invokes the appropriate handler for that part (if you have already registered one). The file is sent to the parser, which is segmented into multiple fragments and loaded into memory. So expat can parse those huge files.

Sax

Sax is a parser API for sequential access to XML, and a parser that implements sax (that is, "Sax Parser") takes the form of a stream parser and has an event-driven API. The callback function is defined by the user, and is invoked when an event occurs. Events are raised when any of the XML attributes are encountered, and are raised again when they are encountered at the end. The XML attribute is also used as part of the event data passed to the element. SAX processing is single-directional, and parsed data cannot be read again without restarting.

Dom

The DOM parser must put the entire tree in memory before any processing begins, so the memory usage of the DOM parser is based entirely on the size of the input data (relative to the memory content of the SAX parser, is based on the maximum depth of the XML file (the maximum depth of the XML tree) and the maximum data stored on the XML attribute on a single XML item).

Dom is implemented in python3.2 in two ways:

1.xml.minidom is a basic implementation.
2.xml.pulldom constructs the subtree that is accessed only when needed.

"' Created on 2012-5-25@author:salomon '" Import xml.dom.minidom as Minidomdom = Minidom.parse ("e:\\test.xml") root = Dom.getelementsbytagname ("schools") #The function getElementsByTagName returns Nodelist.print (Root.length) for node in Root:print ("Root element is%s.  "%node.tagname" # formatted output, which differs greatly from the C-series language.    Schools = Node.getelementsbytagname ("School") for School in Schools:print (school.nodename) print (school.tagname) Print (School.getattribute ("name")) print (school.attributes["name"].value) classes = School.getelementsbytagname ("C Lass ") Print (" There is%d classes in school%s "% (Classes.length, School.getattribute (" Name "))) the for MClass in class Es:print (Mclass.getattribute ("Id"))) for student in Mclass.getelementsbytagname ("Student"): Print (student        . attributes["Name"].value) print (Student.getelementsbytagname ("中文版") [0].nodevalue] #这个为什么啊? Print (Student.getelementsbytagname ("中文版") [0].childnodes[0].nodevalue] Student.geteleMentsbytagname ("中文版") [0].childnodes[0].nodevalue = 75f = open (' New.xml ', ' w ', encoding = ' utf-8 ') Dom.writexml (F, encoding = ' utf-8 ') f.close ()

ElementTree

At present, the ElementTree information is less, and the working mechanism is not known at present. There is data to show that ElementTree is near a lightweight dom, but elementtree all Element nodes work in the same way. It is similar to the XPathNavigator in C #.

"Created on 2012-5-25@author:salomon" from xml.etree.ElementTree import Elementtreetree = ElementTree () tree.parse ( "E:\\test.xml") root = Tree.getroot () print (Root.tag) print (root[0].tag) print (root[0].attrib) schools = Root.getchildren () for school in schools:  print (School.get ("Name"))  classes = School.findall ("Class")  For MClass in classes: Print (    mclass.items ())    print (Mclass.keys ())    print (mclass.attrib["Id"])    Math = Mclass.find ("Student"). Find ("Scores"). Find ("Math")    print (Math.text)    math.set ("Teacher", "Bada") Tree.write ("New.xml")

Compare:

For the above points, expat and sax parse XML the same way, just don't know how performance compares. Dom consumes memory relative to both of these parsers, and processing files is relatively slow due to time-consuming access. Dom is not available if the file is too large to load into memory, but the DOM is the only option for some kind of XML validation that requires access to the entire file, or if some XML processing requires only the need to access the entire file.

Note:

It is important to point out that the techniques for accessing XML are not unique to Python, but that Python is also introduced by reference to other languages or directly from other languages. For example, expat is a development library developed in C that is used to parse XML documents. While Sax was originally developed by Davidmegginson in the Java language, the DOM can access and modify the content and structure of a document in a platform-independent and language way. can be applied to any programming language.

As a comparison, I would also like to enumerate the ways in which C # accesses XML documents:

1. Dom-based XmlDocument
2. XmlReader and XmlWriter based on stream files (unlike Sax stream file implementations, Sax is an event-driven model).
3. Linq to Xml

Stream files Two models: Xmlreader/xmlwriter VS SAX

Each time the flow model iterates over a node in an XML document, it is suitable for processing large documents with little memory space. There are two variants in the flow model-the "push" model and the "pull" model.

Push model is often said Sax,sax is an event-driven model, that is to say: Every node it discovers a push model to trigger an event, and we have to write the handlers of these events, which is very inflexible and cumbersome.

. NET is based on the implementation of the "pull" model, the "pull" model when traversing the document will be interested in the document part from the reader, do not need to raise events, allowing us to programmatically access the document, which greatly improves the flexibility, the performance of the "pull" model can be selective processing node, While Sax notifies the client of every node it discovers, using a "pull" model can improve the overall efficiency of the application.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.