Charming Python: Revisiting XML tools for Python


The first and second installments of David Mertz's Charming Python column outlined the use of XML in Python. However, Python's XML tools have evolved significantly since those first articles appeared. Unfortunately, most of these improvements are not backward compatible. This installment revisits the author's earlier discussion of XML tools and provides up-to-date code examples.

In many cases, Python is an ideal language for working with XML documents. Like Perl, REBOL, REXX, and TCL, it is a flexible scripting language with powerful text manipulation capabilities. Moreover, more so than most types of text files (or streams), XML documents encode rich and complex data structures.

Continued XML support in Python 2.0

The "read a few lines and compare them to some regular expressions" style common in text processing is often poorly suited to thoroughly parsing and processing XML. Fortunately, Python (more than most other languages) has not only straightforward ways of dealing with complex data structures (typically using classes and attributes), but also a range of XML-related modules that help with parsing, processing, and generating XML.
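A minimal sketch of why the line-and-regular-expression style falls short. The sample fragment is invented for illustration (only the tag names `quotations` and `quotation` come from the listing later in this article), and the helper names are hypothetical. The structure-aware version uses `xml.dom.minidom` from the standard library and is written for a current Python 3 interpreter.

```python
import re
from xml.dom import minidom

# Hypothetical sample: the second quotation spans two lines and nests a tag.
SAMPLE = """<quotations>
<quotation>Short and on one line.</quotation>
<quotation>Spread over
two lines with <i>inline</i> markup.</quotation>
</quotations>"""


def quotes_by_regex(text):
    """Line-oriented approach: match each line against a pattern."""
    pattern = re.compile(r"<quotation>(.*)</quotation>")
    found = []
    for line in text.splitlines():
        match = pattern.search(line)
        if match:
            found.append(match.group(1))
    return found


def node_text(node):
    """Recursively collect the character data beneath a DOM node."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(node_text(child) for child in node.childNodes)


def quotes_by_parser(text):
    """Structure-aware approach: let an XML parser locate the elements."""
    doc = minidom.parseString(text)
    return [" ".join(node_text(q).split())
            for q in doc.getElementsByTagName("quotation")]


if __name__ == "__main__":
    print(quotes_by_regex(SAMPLE))   # misses the quotation that spans two lines
    print(quotes_by_parser(SAMPLE))  # finds both quotations
```

The regular expression sees only what fits on one line, while the parser follows the document's actual element structure regardless of line breaks or nesting.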

Members of the XML-SIG (Special Interest Group) have done a great deal to maintain Python's range of XML tools. Like other Python special interest groups, the XML-SIG maintains a mailing list, list archives, useful references, documentation, a standard package, and other resources (see Resources later in this article).

Starting with Python 2.0, Python includes most of the XML-SIG project in its standard distribution. The latest XML-SIG package may contain some bleeding-edge features not available in the standard Python distribution, but for most purposes (including the discussion in this article) it is Python 2.0's XML support that will be of interest. Fortunately, the basic xmllib support found in earlier Python versions has been greatly improved upon in Python 2.0+. Python users can now choose among DOM, SAX, and expat techniques for processing XML (all of which will be familiar to XML developers coming from other programming languages).
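To make the three options concrete, here is a small sketch that counts `quotation` elements once with each technique, using the standard-library modules `xml.dom.minidom`, `xml.sax`, and `xml.parsers.expat`. The sample string and all function and class names are assumptions made for this illustration, and the syntax targets a current Python 3 interpreter rather than Python 2.0.

```python
import xml.sax
from xml.dom import minidom
from xml.parsers import expat

# Hypothetical document used only for this illustration.
SAMPLE = ("<quotations><quotation>One</quotation>"
          "<quotation>Two</quotation></quotations>")


def count_with_dom(text):
    """DOM: build the whole tree in memory, then query it."""
    doc = minidom.parseString(text)
    return len(doc.getElementsByTagName("quotation"))


class CountingHandler(xml.sax.ContentHandler):
    """SAX: react to events as the parser streams through the document."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "quotation":
            self.count += 1


def count_with_sax(text):
    handler = CountingHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count


def count_with_expat(text):
    """expat: the low-level parser, driven by plain callback functions."""
    counter = {"quotation": 0}

    def start_element(name, attrs):
        if name == "quotation":
            counter["quotation"] += 1

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(text, True)  # True marks this as the final chunk of input
    return counter["quotation"]


if __name__ == "__main__":
    print(count_with_dom(SAMPLE), count_with_sax(SAMPLE), count_with_expat(SAMPLE))
```

DOM trades memory for convenient random access to the tree, while SAX and expat process the document as a stream and leave any state-keeping to the caller.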

Module: xmllib

xmllib is a non-validating, low-level parser. An application programmer using xmllib overrides the XMLParser class and provides methods for handling document elements such as specific or generic tags, or character entities. The use of xmllib has not changed from Python 1.5x to Python 2.0+, and in most cases the better option is SAX, which is also a stream-oriented technique but is more standard across languages and developers.

The examples in this article are the same as in the original column: a DTD called quotations.dtd and a document, sample.xml, that conforms to this DTD (see Resources for an archive of the files mentioned in this article). The following code displays the first few lines of each quotation in sample.xml and produces very simple ASCII indicators for unknown tags and entities. The parsed text is handled as a continuous stream, and any accumulators used are the programmer's responsibility (such as the string inside a tag (#PCDATA), or a list or dictionary of the tags encountered).

Listing 1: try_xmllib.py

    import xmllib, string

    class QuotationParser(xmllib.XMLParser):
        """Crude xmllib extractor for quotations.dtd document"""

        def __init__(self):
            xmllib.XMLParser.__init__(self)
            self.thisquote = ''             # quotation accumulator

        def handle_data(self, data):
            self.thisquote = self.thisquote + data

        def syntax_error(self, message):
            pass

        def start_quotations(self, attrs):  # top level tag
            print '--- Begin Document ---'

        def start_quotation(self, attrs):
            print 'QUOTATION:'

        def end_quotation(self):
            print string.join(string.split(self.thisquote[:230])) + '...',
            print '(' + str(len(self.thisquote)) + ' bytes)\n'
            self.thisquote = ''

        def unknown_starttag(self, tag, attrs):
            self.thisquote = self.thisquote + '{'

        def unknown_endtag(self, tag):
            self.thisquote = self.thisquote + '}'

        def unknown_charref(self, ref):
            self.thisquote = self.thisquote + '?'

        def unknown_entityref(self, ref):
            self.thisquote = self.thisquote + '#'

    if __name__ == '__main__':
        parser = QuotationParser()
        for c in open("sample.xml").read():
            parser.feed(c)
        parser.close()
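The xmllib module is no longer part of current Python releases, so the listing above runs only on old interpreters. As a rough sketch of the SAX alternative mentioned earlier, the following handler keeps the same accumulator idea using `xml.sax` from the standard library. The names `QuotationHandler` and `summarize` are hypothetical, the sample input is invented, and the `?` and `#` markers for character and entity references are not reproduced here because the underlying parser resolves references itself.

```python
import io
import xml.sax


class QuotationHandler(xml.sax.ContentHandler):
    """Hypothetical SAX counterpart of the xmllib extractor above."""

    def __init__(self, out):
        super().__init__()
        self.out = out          # file-like object that receives the report
        self.thisquote = ""     # quotation accumulator

    def characters(self, content):
        self.thisquote += content

    def startElement(self, name, attrs):
        if name == "quotations":            # top-level tag
            self.out.write("--- Begin Document ---\n")
        elif name == "quotation":
            self.out.write("QUOTATION:\n")
        else:                               # any other tag
            self.thisquote += "{"

    def endElement(self, name):
        if name == "quotation":
            # Normalize whitespace in the first 230 characters, as the listing does.
            preview = " ".join(self.thisquote[:230].split())
            # The label mirrors the listing; in Python 3 this counts characters.
            self.out.write("%s... (%d bytes)\n\n" % (preview, len(self.thisquote)))
            self.thisquote = ""
        elif name != "quotations":
            self.thisquote += "}"


def summarize(xml_text):
    """Parse a string of XML and return the report as text."""
    out = io.StringIO()
    xml.sax.parseString(xml_text.encode("utf-8"), QuotationHandler(out))
    return out.getvalue()


if __name__ == "__main__":
    # Invented sample; the real sample.xml is not reproduced here.
    print(summarize("<quotations><quotation>To <i>be</i> or not.</quotation></quotations>"))
```

Where xmllib dispatches to methods named after each tag (`start_quotation`, `end_quotation`), SAX routes every element through `startElement` and `endElement`, so the tag name is tested explicitly.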
