In Python, a regular-expression style of text handling is usually not adequate for thorough parsing and processing of XML. Python not only offers straightforward ways to work with complex data structures, it also provides a range of XML-related modules that help with parsing, processing, and generating XML.
Members of the XML-SIG (special interest group) have done a lot of work maintaining a set of Python XML tools. Like other Python special interest groups, the XML-SIG maintains a mailing list, list archives, useful references, documentation, a standard package, and other resources (see the references later in this article).
Since Python 2.0, Python has included most of the XML-SIG projects in its standard distribution. The latest XML-SIG package may contain "bleeding-edge" features that are not available in a standard Python release, but for the vast majority of purposes, including those discussed in this article, the Python 2.0 XML support will suit you. Fortunately, the basic xmllib support found in earlier Python versions has been greatly improved on in Python 2.0+. Python users can now choose among the DOM, SAX, and expat techniques for processing XML (each will be familiar to XML developers who have worked in other programming languages).
xmllib is a non-validating, low-level parser. An application programmer using xmllib subclasses the XMLParser class and overrides its methods to handle document elements, such as specific or generic tags, or character entities.
The usage of xmllib has not changed between Python 1.5x and Python 2.0+. In most cases the better choice is SAX, which is also a stream-oriented technique but is more standard across languages and developers. The examples in this article are the same as those in the original column: a DTD named quotations.dtd and a sample document for that DTD, sample.xml (for more information about XML, see the references).
The following code displays the first part of each quotation in sample.xml, and generates simple ASCII indicators for unknown tags and entities. The parsed text is processed as a continuous stream. Any accumulators are the programmer's responsibility, such as the string (#PCDATA) inside a tag, or a list or dictionary of the tags encountered.
    from xml.sax.handler import ContentHandler

    class QuotationHandler(ContentHandler):
        """Crude extractor for quotations.dtd compliant XML document"""
        def __init__(self):
            self.in_quote = 0
            self.thisquote = ''
        def startDocument(self):
            print('--- Begin Document ---')
        def startElement(self, name, attrs):
            if name == 'quotation':
                print('QUOTATION:')
                self.in_quote = 1
            else:
                # Mark any other opening tag with a simple ASCII indicator
                self.thisquote = self.thisquote + '{'
        def endElement(self, name):
            if name == 'quotation':
                # Collapse whitespace and show at most the first 230 characters
                print(' '.join(self.thisquote[:230].split()) + '... ('
                      + str(len(self.thisquote)) + ' bytes)')
                self.thisquote = ''
                self.in_quote = 0
            else:
                # The listing is cut off here in the source; a closing '}'
                # is assumed by symmetry with startElement
                self.thisquote = self.thisquote + '}'
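To see how a handler like the one above is driven, here is a minimal, self-contained sketch for a current Python 3 interpreter. The inline SAMPLE document, the QuoteCollector class, and the collect_quotes() helper are hypothetical stand-ins invented for illustration; they are not the article's sample.xml or its original listing.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical stand-in for sample.xml (not the article's actual file)
SAMPLE = b"""<quotations>
  <quotation>Text is <emphasis>easy</emphasis> to stream.</quotation>
  <quotation>Accumulate what you need.</quotation>
</quotations>"""


class QuoteCollector(ContentHandler):
    """Accumulates the character data found inside each <quotation>."""

    def __init__(self):
        super().__init__()
        self.in_quote = False
        self.current = []   # text pieces of the quotation being read
        self.quotes = []    # finished quotations

    def startElement(self, name, attrs):
        if name == 'quotation':
            self.in_quote = True
            self.current = []

    def characters(self, content):
        # SAX may deliver text in several chunks, so collect them all
        if self.in_quote:
            self.current.append(content)

    def endElement(self, name):
        if name == 'quotation':
            # Join the chunks and collapse runs of whitespace
            self.quotes.append(' '.join(''.join(self.current).split()))
            self.in_quote = False


def collect_quotes(data):
    """Parse XML bytes with SAX and return the list of quotation texts."""
    handler = QuoteCollector()
    xml.sax.parseString(data, handler)
    return handler.quotes


if __name__ == '__main__':
    for quote in collect_quotes(SAMPLE):
        print(quote)
```

To parse a file instead of an in-memory string, the usual pattern is to create a parser with xml.sax.make_parser(), attach the handler with setContentHandler(), and call parse() with the file name.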
One reason you may need to look beyond the standard XML support is that your parsing requires validation. Unfortunately, the standard Python 2.0 XML package does not include a validating parser. xmlproc is a Python-native parser that performs nearly complete validation. If you need a validating parser, xmlproc is currently the only choice for Python 2.0 XML. In addition, xmlproc provides a variety of high-level and testing interfaces that other parsers do not.
You can import xml.parsers.expat directly. If you do, you gain access to some special tricks that the SAX interface does not provide; in that sense, xml.parsers.expat is somewhat "lower-level" than SAX. However, SAX is highly standardized and well suited to stream-oriented processing, and in most cases the level SAX works at is appropriate. Because the make_parser() function generally gets its performance from expat anyway, the raw speed difference is small.
DOM can be used to modify XML documents, because it builds a tree of the document. You modify the tree by adding new nodes and moving subtrees around, and then generate a new XML document as output. You can also construct a DOM tree yourself and convert it to XML. Producing XML this way is more flexible than simply writing <tag1>...</tag1> to a file.
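The DOM operations described above can be sketched with xml.dom.minidom from the standard library. This is an illustrative sketch for a current Python 3 interpreter, not code from the original column; the SAMPLE string, the element names, and both helper functions are hypothetical.

```python
from xml.dom import minidom

# Hypothetical input document, invented for this sketch
SAMPLE = ("<quotations><quotation>First</quotation>"
          "<quotation>Second</quotation></quotations>")


def add_and_reorder(xml_text):
    """Parse a document, add a node, move a subtree, and return the DOM."""
    doc = minidom.parseString(xml_text)
    root = doc.documentElement

    # Add a new node at the end of the tree
    new_quote = doc.createElement('quotation')
    new_quote.appendChild(doc.createTextNode('Third'))
    root.appendChild(new_quote)

    # Move a subtree: appending a node that is already in the tree
    # detaches it from its old position first
    root.appendChild(root.firstChild)
    return doc


def build_from_scratch():
    """Construct a DOM tree by hand, without parsing anything."""
    doc = minidom.Document()
    root = doc.createElement('quotations')
    doc.appendChild(root)

    quote = doc.createElement('quotation')
    quote.setAttribute('source', 'example')
    quote.appendChild(doc.createTextNode('Built by hand'))
    root.appendChild(quote)
    return doc


if __name__ == '__main__':
    # Serialize both trees back to XML text
    print(add_and_reorder(SAMPLE).documentElement.toxml())
    print(build_from_scratch().documentElement.toxml())
```

Serializing with toxml() leaves escaping and tag balancing to the library, which is where the extra flexibility over writing tag strings by hand comes from.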