A Full Analysis of XML Parsing in Python 2.0

Source: Internet
Author: User

In Python, a "regular expression" style of text handling is usually not suitable for thorough parsing and processing of XML. Python not only has direct ways to work with complex data structures, it also has a series of XML-related modules that help with parsing, processing, and generating XML.

Members of the XML-SIG (the XML special interest group) have done a lot of work to maintain a series of Python XML tools. Like other Python interest groups, XML-SIG maintains mailing lists, list archives, useful references, documentation, standard packages, and other resources (see the references later in this article).

Since Python 2.0, Python has included most of the XML-SIG projects in its standard distribution. The latest XML-SIG package may contain cutting-edge features not available in a standard Python release, but for the vast majority of people, and for this article, the Python 2.0 XML support is what will be of interest. Fortunately, the basic xmllib support found in earlier Python versions has been greatly improved in Python 2.0+. Python users can now choose among the DOM, SAX, and expat techniques for processing XML (developers who have used these techniques to process XML in other programming languages will recognize them).

xmllib is a non-validating, low-level parser. Application programmers who use xmllib override the XMLParser class and provide methods to handle document elements, such as specific or generic tags, or character entities.

The usage of xmllib has not changed from Python 1.5x to Python 2.0+. In most cases, the better choice is SAX, which is also a stream-oriented technique but is more standard across languages and developers. The example in this article is the same as in the original column: it includes a DTD named quotations.dtd and a sample document for that DTD (for more information about XML, see the references).

The following code displays the first few lines of each quotation in sample.xml and generates simple ASCII indicators for unknown tags and entities. The parsed text is processed as a continuous stream, and any accumulators used are the programmer's responsibility, such as the string (#PCDATA) inside a tag, or a list or dictionary of the tags encountered.

 
 
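    import string
    from xml.sax.handler import ContentHandler

    class QuotationHandler(ContentHandler):
        """Crude extractor for quotations.dtd compliant XML document"""
        def __init__(self):
            self.in_quote = 0      # 1 while inside a <quotation> element
            self.thisquote = ''    # accumulator for the current quotation
        def startDocument(self):
            print '--- Begin Document ---'
        def startElement(self, name, attrs):
            if name == 'quotation':
                print 'QUOTATION:'
                self.in_quote = 1
            else:
                # Any other tag is shown as a simple ASCII marker
                self.thisquote = self.thisquote + '{'
        def endElement(self, name):
            if name == 'quotation':
                # Collapse whitespace and show only the first 230 characters
                print string.join(string.split(self.thisquote[:230]))+'...',
                print '('+str(len(self.thisquote))+' bytes)'
                self.thisquote = ''
                self.in_quote = 0
            else:
                # The source listing breaks off at this point; closing the
                # marker with '}' is an assumption that mirrors startElement
                self.thisquote = self.thisquote + '}'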
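The listing above defines only the handler class; it does not show how character data is accumulated or how the handler is attached to a parser. The following is a minimal sketch of those two steps, not the article's own code. It is written in Python 3 syntax for a current interpreter rather than the Python 2 syntax of the listing, and the names `QuoteCollector`, `collect_quotes`, and the inline `SAMPLE` document are hypothetical stand-ins (the article's sample.xml is not reproduced here).

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical sample data standing in for the article's sample.xml.
SAMPLE = b"""<quotations>
  <quotation>Simple is <em>better</em> than complex.</quotation>
  <quotation>Readability counts.</quotation>
</quotations>"""


class QuoteCollector(ContentHandler):
    """Collect the text of each <quotation>, marking nested tags with braces."""

    def __init__(self):
        super().__init__()
        self.in_quote = False
        self.thisquote = ''   # accumulator maintained by the programmer
        self.quotes = []

    def startElement(self, name, attrs):
        if name == 'quotation':
            self.in_quote = True
        elif self.in_quote:
            self.thisquote += '{'

    def endElement(self, name):
        if name == 'quotation':
            # Collapse runs of whitespace before storing the quotation
            self.quotes.append(' '.join(self.thisquote.split()))
            self.thisquote = ''
            self.in_quote = False
        elif self.in_quote:
            self.thisquote += '}'

    def characters(self, content):
        # SAX may deliver text in several chunks, so append rather than assign
        if self.in_quote:
            self.thisquote += content


def collect_quotes(data):
    """Parse XML bytes with a SAX parser and return the collected quotations."""
    parser = xml.sax.make_parser()
    handler = QuoteCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.quotes


quotes = collect_quotes(SAMPLE)
for quote in quotes:
    print(quote)
```

The handler stores results in a list instead of printing them directly, which is a design choice of this sketch: it keeps the stream-processing logic separate from the output.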

One reason you may need to look beyond the standard XML support is that your parsing requires validation. Unfortunately, the standard Python 2.0 XML package does not include a validating parser. xmlproc is a parser written natively in Python that performs almost complete validation, and it is currently the only option for a validating parser in Python. In addition, xmlproc provides a variety of high-level and testing interfaces that other parsers do not.

You can also import xml.parsers.expat directly. If you do, you get access to some special tricks that the SAX interface does not provide. In that sense, xml.parsers.expat is somewhat "low-level" compared with SAX. The SAX technique, however, is highly standard and well suited to stream-oriented processing.
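To make the direct use of xml.parsers.expat more concrete, here is a minimal sketch, written in Python 3 syntax and not taken from the article. The function name `parse_events` and the inline `SAMPLE` document are hypothetical. It registers plain callback functions on an expat parser and records the events it sees.

```python
import xml.parsers.expat

# Hypothetical sample data; not the article's sample.xml.
SAMPLE = (b'<quotations><quotation source="anon">'
          b'Hello &amp; goodbye</quotation></quotations>')


def parse_events(data):
    """Run expat over XML bytes and return a list of (kind, ...) event tuples."""
    events = []

    def start_element(name, attrs):
        # attrs arrives as a plain dict of attribute names to values
        events.append(('start', name, attrs))

    def end_element(name):
        events.append(('end', name))

    def char_data(text):
        events.append(('text', text))

    parser = xml.parsers.expat.ParserCreate()
    # Ask expat to merge adjacent character data before calling the handler
    parser.buffer_text = True
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(data, True)   # True marks this as the final chunk of input
    return events


events = parse_events(SAMPLE)
for event in events:
    print(event)
```

Unlike SAX, there is no handler class to subclass here: callbacks are assigned as attributes on the parser object itself.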

In most cases, the level of abstraction that SAX offers is appropriate. Generally, because the make_parser() function already obtains the performance provided by expat, the difference in raw speed is very small. DOM is the technique to use when you need to modify XML documents, because it builds a tree of the document.

You can modify the tree by adding new nodes and moving subtrees around, and then generate a new XML document as output. You can also construct a DOM tree yourself and convert it to XML. Producing XML output this way is more flexible than simply writing <tag1>...</tag1> to a file.
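The DOM operations described above can be sketched as follows. This is an illustration only, written in Python 3 syntax; it uses xml.dom.minidom as the DOM implementation, which is this sketch's choice rather than something the article specifies, and the sample document and variable names are hypothetical.

```python
from xml.dom import minidom

# Hypothetical sample data; not the article's sample.xml.
SAMPLE = ('<quotations><quotation>First</quotation>'
          '<quotation>Second</quotation></quotations>')

# Build a DOM tree from existing XML
doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# Add a new node to the tree
new_quote = doc.createElement('quotation')
new_quote.appendChild(doc.createTextNode('Third'))
root.appendChild(new_quote)

# Move a subtree: appending a node that is already in the tree
# detaches it from its old position first
root.appendChild(root.firstChild)

# Generate new XML from the modified tree
modified_xml = root.toxml()
print(modified_xml)

# Construct a DOM tree from scratch and convert it to XML
impl = minidom.getDOMImplementation()
new_doc = impl.createDocument(None, 'tag1', None)
new_doc.documentElement.appendChild(new_doc.createTextNode('some text'))
built_xml = new_doc.documentElement.toxml()
print(built_xml)
```

Because the serializer handles tag closing and character escaping, the output stays well-formed even as the tree is rearranged, which is the flexibility the paragraph above refers to.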
