The dynamics of DOM methods in Python

Last Update:2015-04-13 Source: Internet

Author: User


Document Object Model

The xml.dom module is probably the most powerful tool a Python programmer has for working with XML documents. Unfortunately, the documentation provided by the XML-SIG is currently sparse. The W3C's language-independent DOM specification fills part of this gap, but Python programmers would be better served by a quick-start guide to DOM that is specific to the Python language. This article aims to provide such a guide. As in the previous column, some examples use the sample quotations.dtd file, which is available together with the code samples for this article.

It is worth understanding exactly what DOM is. The official explanation puts it well:

The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page. (World Wide Web Consortium DOM Working Group)

DOM converts an XML document into a tree (or forest) representation. The W3C specification gives the DOM version of an HTML table as an example.

Viewed more abstractly, DOM defines a set of methods for traversing, pruning, reorganizing, outputting, and otherwise manipulating the tree. Working this way is more convenient than working with the linear representation of an XML document.
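To make the tree view concrete, here is a minimal sketch that walks a small table document and prints one indented line per node. It uses xml.dom.minidom from the current Python standard library rather than the xml.dom package discussed in this article, and the outline() helper and the sample markup are illustrative assumptions, not part of the original code.

```python
from xml.dom import minidom

# Made-up sample markup, loosely modeled on an HTML table.
XML = "<table><tr><td>A</td><td>B</td></tr><tr><td>C</td></tr></table>"

def outline(node, depth=0):
    """Return one indented line per element or non-blank text node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + repr(node.data))
    # Text nodes have an empty childNodes list, so recursion stops there.
    for child in node.childNodes:
        lines.extend(outline(child, depth + 1))
    return lines

doc = minidom.parseString(XML)
tree_lines = outline(doc.documentElement)
print("\n".join(tree_lines))
```

Each level of indentation corresponds to one level of nesting in the document, which is the structure the DOM methods operate on.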

Convert HTML to XML

Valid HTML is almost valid XML, but not quite. There are two major differences: XML tags are case-sensitive, and every XML tag requires an explicit close (an end tag is optional for some HTML tags). A simple example of using xml.dom is to convert HTML into XML with the HtmlBuilder() class.
try_dom1.py

    """Convert a valid HTML document to XML

    USAGE: python try_dom1.py < infile.html > outfile.xml
    """
    import sys
    from xml.dom import core
    from xml.dom.html_builder import HtmlBuilder

    # Construct an HtmlBuilder object and feed the data to it
    b = HtmlBuilder()
    b.feed(sys.stdin.read())

    # Get the newly-constructed document object
    doc = b.document

    # Output it as XML
    print doc.toxml()

The HtmlBuilder() class needs to implement only a few functions of the underlying xml.dom.builder template that it inherits from, and its source code is worth studying. Even if we implemented the template functions ourselves, however, the outline of a DOM program would be similar. In the general case, we build a DOM instance by some means and then operate on that instance. The .toxml() method of a DOM instance is a simple way to produce a string representation of the instance (in the case above, we just print it once it has been generated).
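The same outline (build a DOM instance, then call .toxml() on it) can be sketched for Python 3, whose standard library does not include xml.dom.html_builder. The version below combines html.parser with xml.dom.minidom. The DomBuilder class and the VOID_TAGS set are hypothetical names introduced for illustration, and the sketch assumes a single root element and explicit end tags for everything except the listed void elements.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# Assumption: a small subset of HTML elements that never take an end tag.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

class DomBuilder(HTMLParser):
    """Illustrative stand-in for HtmlBuilder: builds a minidom tree from HTML."""

    def __init__(self):
        super().__init__()
        self.document = minidom.Document()
        self._stack = [self.document]  # currently open nodes, root first

    def handle_starttag(self, tag, attrs):
        # html.parser has already lower-cased tag and attribute names.
        elem = self.document.createElement(tag)
        for name, value in attrs:
            elem.setAttribute(name, value if value is not None else name)
        self._stack[-1].appendChild(elem)
        if tag not in VOID_TAGS:
            self._stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tagName == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        # Text outside the root element cannot be attached to a Document.
        if len(self._stack) > 1:
            self._stack[-1].appendChild(self.document.createTextNode(data))

b = DomBuilder()
b.feed('<HTML><BODY><P>Line one<BR>line two</P><IMG SRC="x.png"></BODY></HTML>')
b.close()
xml_text = b.document.documentElement.toxml()
print(xml_text)
```

Both differences mentioned earlier show up in the result: tag names are normalized to one case, and the elements without end tags are written as explicitly closed empty elements.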

Convert a Python object to XML

Python programmers can gain a good deal of power and generality by exporting arbitrary Python objects as XML instances. This lets us handle Python objects in the way we are used to, with the option of using instance attributes as tags in the generated XML. With just a few lines (adapted from the building.py example), we can convert a "native" Python object to a DOM object, recursing into any attributes that are themselves objects.
try_dom2.py

    """Build a DOM instance from scratch, write it to XML

    USAGE: python try_dom2.py > outfile.xml
    """
    import types
    from xml.dom import core
    from xml.dom.builder import Builder

    # Recursive function to build DOM instance from Python instance
    def object_convert(builder, inst):
        # Put entire object inside an elem w/ same name as the class.
        builder.startElement(inst.__class__.__name__)
        for attr in inst.__dict__.keys():
            if attr[0] == '_':      # Skip internal attributes
                continue
            value = getattr(inst, attr)
            if type(value) == types.InstanceType:
                # Recursively process subobjects
                object_convert(builder, value)
            else:
                # Convert anything else to string, put it in an element
                builder.startElement(attr)
                builder.text(str(value))
                builder.endElement(attr)
        builder.endElement(inst.__class__.__name__)

    if __name__ == '__main__':
        # Create container classes
        class quotations: pass
        class quotation: pass

        # Create an instance, fill it with hierarchy of attributes
        inst = quotations()
        inst.title = "Quotations file (not quotations.dtd conformant)"
        inst.quot1 = quot1 = quotation()
        quot1.text = """'"is not a quine" is not a quine' is a quine"""
        quot1.source = "Joshua Shagam, kuro5hin.org"
        inst.quot2 = quot2 = quotation()
        quot2.text = "Python is not a democracy. Voting doesn't help. "+\
                     "Crying may..."
        quot2.source = "Guido van Rossum, comp.lang.python"

        # Create the DOM Builder
        builder = Builder()
        object_convert(builder, inst)
        print builder.document.toxml()

The object_convert() function has some limitations. For example, the procedure above cannot generate an XML document that conforms to quotations.dtd: #PCDATA text cannot be placed directly inside a <quotation> element, only inside an attribute of the class (such as .text). A simple workaround is to have object_convert() treat an attribute with a special name, such as .PCDATA, in a special way. The conversion to DOM could be made more sophisticated in various ways, but the beauty of this approach is that we can start from thoroughly Pythonic objects and convert them to XML documents in a straightforward way.

It is also worth noting that elements at the same level have no obvious ordering in the generated XML document. For example, on the author's system, with one particular version of Python, the second quotation defined in the source code appears first in the output. This order can change between versions and systems. Attributes of Python objects are not inherently ordered, so this behavior makes sense. It is the behavior we would want for data related to database systems, but clearly not for an article marked up as XML (unless we wanted to update William Burroughs's "cut-up" method).
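The .PCDATA workaround described above can be sketched as follows. This is a Python 3 adaptation built on xml.dom.minidom, not the xml.dom.builder API used in try_dom2.py, and the signature of object_convert() is changed accordingly (it receives the document and a parent node, which is an assumption of this sketch). Note also that Python 3.7 and later keep instance attributes in insertion order, so in this version sibling elements follow the order of assignment.

```python
from xml.dom import minidom

def object_convert(doc, parent, inst):
    """Append an element named after inst's class to parent, recursing into sub-objects."""
    elem = doc.createElement(type(inst).__name__)
    parent.appendChild(elem)
    for attr, value in vars(inst).items():
        if attr.startswith("_"):
            continue  # skip internal attributes
        if attr == "PCDATA":
            # Assumed convention: this text goes directly inside the element.
            elem.appendChild(doc.createTextNode(str(value)))
        elif hasattr(value, "__dict__"):
            # Recursively process sub-objects.
            object_convert(doc, elem, value)
        else:
            # Convert anything else to a string inside its own element.
            child = doc.createElement(attr)
            child.appendChild(doc.createTextNode(str(value)))
            elem.appendChild(child)
    return elem

class quotations: pass
class quotation: pass

inst = quotations()
inst.title = "Quotations file"
inst.quot1 = quot1 = quotation()
quot1.PCDATA = "Python is not a democracy."
quot1.source = "Guido van Rossum, comp.lang.python"

doc = minidom.Document()
object_convert(doc, doc, inst)
xml_text = doc.documentElement.toxml()
print(xml_text)
```

With the special-cased attribute, the quotation text sits directly inside the <quotation> element instead of inside a nested <text> element.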

Convert an XML document to a Python object

Generating a Python object from an XML document is just as simple as the reverse process. For many purposes, the xml.dom methods are good enough. In some cases, however, it is preferable to handle objects generated from XML documents with the same techniques used for all other "generic" Python objects. In the code below, for example, the pyobj_printer() function is one that might already have been in use to handle arbitrary Python objects.
try_dom3.py

    """Read in a DOM instance, convert it to a Python object"""
    from xml.dom.utils import FileReader

    class PyObject: pass

    def pyobj_printer(py_obj, level=0):
        """Return a "deep" string description of a Python object"""
        from string import join, split
        import types
        descript = ''
        for membname in dir(py_obj):
            member = getattr(py_obj,membname)
            if type(member) == types.InstanceType:
                descript = descript + (' '*level) + '{'+membname+'}\n'
                descript = descript + pyobj_printer(member, level+3)
            elif type(member) == types.ListType:
                descript = descript + (' '*level) + '['+membname+']\n'
                for i in range(len(member)):
                    descript = descript+(' '*level)+str(i+1)+': '+ \
                               pyobj_printer(member[i],level+3)
            else:
                descript = descript + membname+'='
                descript = descript + join(split(str(member)[:50]))+'...\n'
        return descript

    def pyobj_from_dom(dom_node):
        """Converts a DOM tree to a "native" Python object"""
        py_obj = PyObject()
        py_obj.PCDATA = ''
        for node in dom_node.get_childNodes():
            if node.name == '#text':
                py_obj.PCDATA = py_obj.PCDATA + node.value
            elif hasattr(py_obj, node.name):
                getattr(py_obj, node.name).append(pyobj_from_dom(node))
            else:
                setattr(py_obj, node.name, [pyobj_from_dom(node)])
        return py_obj

    # Main test
    dom_obj = FileReader("quotes.xml").document
    py_obj = pyobj_from_dom(dom_obj)
    if __name__ == "__main__":
        print pyobj_printer(py_obj)

The focus here should be on the pyobj_from_dom() function, and in particular on the xml.dom method .get_childNodes(), which does the real work. In pyobj_from_dom(), we pull out any text between tags and put it in the reserved attribute .PCDATA. For each nested tag, we create a new attribute whose name matches the tag and assign a list to it, so that the attribute can hold multiple occurrences of that tag within the parent block. Using a list also preserves the order in which the tags were encountered in the XML document.
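For comparison, the same conversion can be sketched in Python 3 with xml.dom.minidom, where the child list is the .childNodes attribute rather than a .get_childNodes() method. The inline sample document is an assumption standing in for quotes.xml, whose contents are not shown in this article.

```python
from xml.dom import minidom

class PyObject:
    pass

def pyobj_from_dom(dom_node):
    """Convert a DOM tree to a "native" Python object."""
    py_obj = PyObject()
    py_obj.PCDATA = ""
    for node in dom_node.childNodes:
        if node.nodeType == node.TEXT_NODE:
            # Collect text found directly inside this element.
            py_obj.PCDATA += node.data
        elif node.nodeType == node.ELEMENT_NODE:
            # One list per tag name; repeated tags are appended in document order.
            vars(py_obj).setdefault(node.tagName, []).append(pyobj_from_dom(node))
    return py_obj

# Assumed sample data, reusing the quotations from try_dom2.py.
XML = """<quotations>
<quotation>'"is not a quine" is not a quine' is a quine<source>Joshua Shagam, kuro5hin.org</source></quotation>
<quotation>Crying may...<source>Guido van Rossum, comp.lang.python</source></quotation>
</quotations>"""

py_obj = pyobj_from_dom(minidom.parseString(XML))
second = py_obj.quotations[0].quotation[1]
print(second.PCDATA, "/", second.source[0].PCDATA)
```

One design consequence of the reserved name is that an element actually called PCDATA would collide with the text attribute, so a real converter would need a different convention for such documents.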

Besides using the old generic pyobj_printer() function (or a more sophisticated and robust one), we can now access the elements of py_obj with normal attribute notation.
Python interactive session

    >>> from try_dom3 import *
    >>> py_obj.quotations[0].quotation[3].source[0].PCDATA
    'Guido van Rossum, '

Rearrange the DOM tree

One advantage of DOM is that it lets programmers manipulate XML documents in a non-linear manner. Each block enclosed by matching open/close tags is simply a "node" in the DOM tree. Although nodes are maintained in a list-like manner to preserve ordering information, nothing about the ordering is special or immutable. We can easily snip off a node and graft it onto another location in the DOM tree (even at a different level, if the DTD permits). We can also add new nodes, delete existing nodes, and so on.
try_dom4.py

    """Manipulate the arrangement of nodes in a DOM object"""
    from try_dom3 import *

    #-- Var 'doc' will hold the single <quotations> "trunk"
    doc = dom_obj.get_childNodes()[0]

    #-- Pull off all the nodes into a Python list
    # (each node is a <quotation> block, or a whitespace text node)
    nodes = []
    while 1:
        try: node = doc.removeChild(doc.get_childNodes()[0])
        except: break
        nodes.append(node)

    #-- Reverse the order of the quotations using a list method
    # (we could also perform more complicated operations on the list:
    # delete elements, add new ones, sort on complex criteria, etc.)
    nodes.reverse()

    #-- Fill 'doc' back up with our rearranged nodes
    for node in nodes:
        # if second arg is None, insert is to end of list
        doc.insertBefore(node, None)

    #-- Output the manipulated DOM
    print dom_obj.toxml()

If we had treated the XML document as just a text file, or had used a sequence-oriented module such as xmllib, rearranging the quotation nodes as in the few lines above would have been a considerable problem. With DOM, the task is no harder than any other operation on a Python list.
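The same rearrangement can be sketched in Python 3 with xml.dom.minidom, which also provides removeChild() and insertBefore(). The inline document is an assumed stand-in for quotes.xml, kept free of whitespace text nodes so that the result is easy to read.

```python
from xml.dom import minidom

# Assumed sample data in place of quotes.xml.
XML = ("<quotations><quotation>one</quotation>"
       "<quotation>two</quotation><quotation>three</quotation></quotations>")

dom_obj = minidom.parseString(XML)
doc = dom_obj.documentElement  # the single <quotations> "trunk"

# Pull all child nodes off the trunk into an ordinary Python list.
nodes = []
while doc.firstChild is not None:
    nodes.append(doc.removeChild(doc.firstChild))

# Any list operation could be applied here; reversing is the simplest.
nodes.reverse()

# Passing None as the reference node appends at the end of the child list.
for node in nodes:
    doc.insertBefore(node, None)

result = doc.toxml()
print(result)
```

Once the nodes are in a list, sorting or filtering them before reinserting works the same way as for any other Python list.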

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
