A detailed explanation of the dynamics of DOM methods in Python

Last Update:2017-01-19 Source: Internet

Author: User


Document Object Model

The xml.dom module is probably the most powerful tool available to Python programmers working with XML documents. Unfortunately, the documentation provided by the XML-SIG is still sparse. The W3C's language-independent DOM specification fills some of this gap, but Python programmers would be better served by a quick-start guide to the DOM that is specific to Python. This article aims to provide such a guide. As in the previous column, the sample quotations.dtd file is used in some examples, and it is available together with the code samples for this article.

It helps to be clear about exactly what the DOM is. The official definition puts it well:

The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page. (World Wide Web Consortium DOM Working Group)

The DOM turns an XML document into a tree (or forest) representation. The World Wide Web Consortium specification gives an example of the DOM version of an HTML table.

As that example illustrates, the DOM defines a set of methods for traversing, pruning, reorganizing, outputting, and otherwise manipulating the tree at a more abstract level, which is more convenient than working with the linear representation of an XML document.
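To make the tree view concrete, here is a minimal sketch that parses a small table and prints one line per node. It assumes Python 3 and the standard-library xml.dom.minidom module rather than the older xml.dom classes used in this article's listings; the sample markup and the helper name outline() are made up for illustration.

```python
from xml.dom import minidom

# Illustrative markup only; any well-formed XML would do
XML_TEXT = "<table><tr><td>Shady Grove</td><td>Aeolian</td></tr></table>"

def outline(node, level=0):
    """Return an indented outline of a DOM subtree, one node per line."""
    if node.nodeType == node.TEXT_NODE:
        lines = [" " * level + "#text: " + node.data]
    else:
        lines = [" " * level + node.nodeName]
    # Text nodes have an empty childNodes list, so recursion stops there
    for child in node.childNodes:
        lines.extend(outline(child, level + 2))
    return lines

doc = minidom.parseString(XML_TEXT)
tree_lines = outline(doc.documentElement)
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of nesting in the document, which is the structure the DOM methods operate on.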

Convert HTML to XML

Valid HTML is almost valid XML, but not quite. There are two main differences: XML tags are case-sensitive, and every XML tag requires an explicit close (a closing tag, which is optional for some HTML tags). A simple example of using xml.dom is converting HTML to XML with the HtmlBuilder() class.
try_dom1.py

    """Convert a valid HTML document to XML
       USAGE: python try_dom1.py < infile.html > outfile.xml
    """
    import sys
    from xml.dom import core
    from xml.dom.html_builder import HtmlBuilder

    # Construct an HtmlBuilder object and feed the data to it
    b = HtmlBuilder()
    b.feed(sys.stdin.read())

    # Get the newly-constructed document object
    doc = b.document

    # Output it as XML
    print doc.toxml()

The HtmlBuilder() class readily implements some of the functionality of the basic xml.dom.builder template it inherits from, and its source code is worth studying. Even if we implemented the template functions ourselves, however, the outline of a DOM program would be similar. In general, we build a DOM instance by some means and then manipulate that instance. The .toxml() method of a DOM instance is a simple way to produce a string representation of it (in the case above, we just print it once it is built).
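The same build-then-serialize pattern can be sketched with modules that ship with current Python. This is not the HtmlBuilder implementation: it is a rough stand-in, assuming Python 3, html.parser.HTMLParser, and xml.dom.minidom, and the class name DomBuilder and the VOID_TAGS set are hypothetical names chosen for this example. It assumes the input has a single root element.

```python
from html.parser import HTMLParser
from xml.dom import minidom

# HTML elements that never take a closing tag
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class DomBuilder(HTMLParser):
    """Hypothetical helper: feed it HTML, read the result from .document."""

    def __init__(self):
        super().__init__()
        self.document = minidom.Document()
        self._stack = [self.document]      # currently open nodes

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already lowercased tag and attribute names
        elem = self.document.createElement(tag)
        for name, value in attrs:
            elem.setAttribute(name, value if value is not None else name)
        self._stack[-1].appendChild(elem)
        if tag not in VOID_TAGS:
            self._stack.append(elem)

    def handle_endtag(self, tag):
        # Close back to the nearest open element with this name, if any
        for i in range(len(self._stack) - 1, 0, -1):
            if self._stack[i].tagName == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        if len(self._stack) > 1:           # ignore text outside the root
            self._stack[-1].appendChild(self.document.createTextNode(data))

b = DomBuilder()
b.feed('<HTML><Body><P>Hello<BR>world<IMG SRC="a.png"></P></Body></HTML>')
xml_text = b.document.documentElement.toxml()
print(xml_text)
```

The two differences between HTML and XML both show up in the result: tag names come out in a single case, and the elements that HTML leaves unclosed are written as explicitly closed empty elements.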

Convert a Python object into XML

Python programmers can achieve quite a lot of power and generality by exporting arbitrary Python objects as XML instances. This lets us work with Python objects in the ways we are used to, and choose whether instance attributes will ultimately be used as tags in the generated XML. In just a few lines (adapted from the building.py example), we can convert "native" Python objects to DOM objects, recursing into any attributes that are themselves objects.
try_dom2.py

    """Build a DOM instance from scratch, write it to XML
       USAGE: python try_dom2.py > outfile.xml
    """
    import types
    from xml.dom import core
    from xml.dom.builder import Builder

    # Recursive function to build DOM instance from Python instance
    def object_convert(builder, inst):
        # Put entire object inside an elem w/ same name as the class.
        builder.startElement(inst.__class__.__name__)
        for attr in inst.__dict__.keys():
            if attr[0] == '_':      # Skip internal attributes
                continue
            value = getattr(inst, attr)
            if type(value) == types.InstanceType:
                # Recursively process subobjects
                object_convert(builder, value)
            else:
                # Convert anything else to string, put it in an element
                builder.startElement(attr)
                builder.text(str(value))
                builder.endElement(attr)
        builder.endElement(inst.__class__.__name__)

    if __name__ == '__main__':
        # Create container classes
        class quotations: pass
        class quotation: pass

        # Create an instance, fill it with hierarchy of attributes
        inst = quotations()
        inst.title = "Quotations file (not quotations.dtd conformant)"
        inst.quot1 = quot1 = quotation()
        quot1.text = """'"is not a quine" is not a quine' is a quine"""
        quot1.source = "Joshua Shagam, kuro5hin.org"
        inst.quot2 = quot2 = quotation()
        quot2.text = "Python is not a democracy. Voting doesn't help. "+\
                     "Crying may..."
        quot2.source = "Guido van Rossum, comp.lang.python"

        # Create the DOM Builder
        builder = Builder()
        object_convert(builder, inst)
        print builder.document.toxml()

The function object_convert() has some limitations. For example, it cannot produce an XML document that conforms to quotations.dtd using the procedure above: #PCDATA text cannot be placed directly inside the quotation class, only in an attribute of the class (such as .text). A simple workaround would be to have object_convert() give special treatment to an attribute with a reserved name, for example .PCDATA. There are various ways to make the conversion to DOM more sophisticated, but the beauty of this approach is that we can start with whole Python objects and turn them into XML documents in a concise way.

It should also be noted that in the generated XML document, elements at the same level have no obvious ordering. For example, with the particular Python version on the author's system, the second quotation defined in the source appears first in the output. This ordering can change between versions and systems. The attributes of a Python object are not inherently ordered, so this behavior makes sense. For data related to database systems this is the behavior we want, but for an article marked up as XML it obviously is not (unless we want to update William Burroughs's "cut-up" method).
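The .PCDATA workaround described above can be sketched as follows. This is an adaptation, not the article's listing: it assumes Python 3 and xml.dom.minidom in place of xml.dom.builder, changes the function signature to take a document and a parent node, and detects sub-objects by the presence of a __dict__ attribute (types.InstanceType does not exist in Python 3).

```python
from xml.dom import minidom

def object_convert(doc, parent, inst):
    """Append an element describing `inst` to `parent` and return it."""
    elem = doc.createElement(type(inst).__name__)
    parent.appendChild(elem)
    for attr, value in vars(inst).items():
        if attr.startswith("_"):           # skip internal attributes
            continue
        if attr == "PCDATA":               # reserved name: text goes
            elem.appendChild(doc.createTextNode(str(value)))  # in directly
        elif hasattr(value, "__dict__"):   # recurse into sub-objects
            object_convert(doc, elem, value)
        else:                              # anything else: <attr>str</attr>
            child = doc.createElement(attr)
            child.appendChild(doc.createTextNode(str(value)))
            elem.appendChild(child)
    return elem

# Container classes, named after the tags they should produce
class quotations: pass
class quotation: pass

inst = quotations()
inst.quot1 = quotation()
inst.quot1.PCDATA = "Python is not a democracy."
inst.quot1.source = "Guido van Rossum, comp.lang.python"

doc = minidom.Document()
object_convert(doc, doc, inst)
xml_text = doc.documentElement.toxml()
print(xml_text)
```

One difference from the behavior described above: since Python 3.7, instance attributes are kept in assignment order, so this sketch emits sibling elements in the order they were assigned rather than in an arbitrary order.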

Convert an XML document into a Python object

Generating a Python object from an XML document is as simple as the reverse process. In most cases the xml.dom methods are all we need. In some cases, though, it is better to handle objects generated from XML documents with the same techniques as all "generic" Python objects. For example, in the following code, pyobj_printer() could be a function already in use for handling arbitrary Python objects.
try_dom3.py

    """Read in a DOM instance, convert it to a Python object
    """
    from xml.dom.utils import FileReader

    class PyObject: pass

    def pyobj_printer(py_obj, level=0):
        """Return a "deep" string description of a Python object"""
        from string import join, split
        import types
        descript = ''
        for membname in dir(py_obj):
            member = getattr(py_obj,membname)
            if type(member) == types.InstanceType:
                descript = descript + (' '*level) + '{'+membname+'}\n'
                descript = descript + pyobj_printer(member, level+3)
            elif type(member) == types.ListType:
                descript = descript + (' '*level) + '['+membname+']\n'
                for i in range(len(member)):
                    descript = descript+(' '*level)+str(i+1)+': '+ \
                               pyobj_printer(member[i],level+3)
            else:
                descript = descript + membname+'='
                # Slice length of 50 is assumed; the number is missing in this copy
                descript = descript + join(split(str(member)[:50]))+'...\n'
        return descript

    def pyobj_from_dom(dom_node):
        """Converts a DOM tree to a "native" Python object"""
        py_obj = PyObject()
        py_obj.PCDATA = ''
        for node in dom_node.get_childNodes():
            if node.name == '#text':
                py_obj.PCDATA = py_obj.PCDATA + node.value
            elif hasattr(py_obj, node.name):
                getattr(py_obj, node.name).append(pyobj_from_dom(node))
            else:
                setattr(py_obj, node.name, [pyobj_from_dom(node)])
        return py_obj

    # Main test
    dom_obj = FileReader("quotes.xml").document
    py_obj = pyobj_from_dom(dom_obj)
    if __name__ == "__main__":
        print pyobj_printer(py_obj)

The focus here should be the function pyobj_from_dom(), and in particular the xml.dom method that does the real work, .get_childNodes(). In pyobj_from_dom(), we pull out all the text between tags directly and put it in the reserved attribute .PCDATA. For any nested tag we encounter, we create a new attribute whose name matches the tag and assign a list to it, so that the list can hold tags that occur more than once within the parent block. Using lists also preserves the order in which the tags occur in the XML document.
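For readers on a current interpreter, the same idea can be sketched with xml.dom.minidom. This is an adaptation under stated assumptions, not the listing above: it assumes Python 3, uses the childNodes attribute and nodeType checks in place of .get_childNodes() and the '#text' name test, and parses an inline sample string (made up here) instead of reading quotes.xml.

```python
from xml.dom import minidom

class PyObject:
    """Empty container; attributes are added as the DOM is walked."""

def pyobj_from_dom(dom_node):
    """Convert a DOM subtree into nested PyObject instances."""
    py_obj = PyObject()
    py_obj.PCDATA = ""
    for node in dom_node.childNodes:
        if node.nodeType == node.TEXT_NODE:
            py_obj.PCDATA += node.data
        elif node.nodeType == node.ELEMENT_NODE:
            # One list per tag name, so repeated tags keep document order
            siblings = py_obj.__dict__.setdefault(node.tagName, [])
            siblings.append(pyobj_from_dom(node))
    return py_obj

# Made-up sample document standing in for quotes.xml
XML_TEXT = """<quotations>
  <quotation>Python is not a democracy.
    <source>Guido van Rossum, comp.lang.python</source>
  </quotation>
  <quotation>A second, made-up entry.</quotation>
</quotations>"""

py_obj = pyobj_from_dom(minidom.parseString(XML_TEXT))
source = py_obj.quotations[0].quotation[0].source[0].PCDATA
print(source)
```

Because every nested tag becomes a list, even a tag that occurs once is reached with an index such as [0].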

Besides using our old generic pyobj_printer() function (or a more complex and robust one), we can access the elements of py_obj with ordinary attribute notation.
Python Interactive Session

    >>> from try_dom3 import *
    >>> py_obj.quotations[0].quotation[3].source[0].PCDATA
    'Guido van Rossum, '

Rearranging the DOM tree

A great advantage of the DOM is that it lets programmers manipulate XML documents in a nonlinear way. Each block enclosed in matching open/close tags is simply a "node" in the DOM tree. Although nodes are kept in a list-like fashion to preserve order information, that order is neither special nor immutable. We can easily cut a node and graft it in at another position in the DOM tree (even onto another level, if the DTD allows). We can also add new nodes, delete existing ones, and so on.
try_dom4.py

    """Manipulate the arrangement of nodes in a DOM object
    """
    from try_dom3 import *

    #-- Var 'doc' will hold the single <quotations> "trunk"
    doc = dom_obj.get_childNodes()[0]

    #-- Pull the nodes into a Python list
    # (each node is a <quotation> block, or a whitespace text node)
    nodes = []
    while 1:
        try: node = doc.removeChild(doc.get_childNodes()[0])
        except: break
        nodes.append(node)

    #-- Reverse the order of the quotations using a list method
    # (we could also perform more complicated operations on the list:
    # delete elements, add new ones, sort on complex criteria, etc.)
    nodes.reverse()

    #-- Fill 'doc' back up with our rearranged nodes
    for node in nodes:
        # if second arg is None, insert is to end of list
        doc.insertBefore(node, None)

    #-- Output the manipulated DOM
    print dom_obj.toxml()

If we viewed an XML document as just a text file, or used a sequence-oriented module (such as xmllib or xml.sax), rearranging the quotation nodes as in the few lines above would pose a problem worth pondering. With the DOM, however, the task is about as simple as any other operation performed on a Python list.
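The same rearrangement can be sketched against xml.dom.minidom. As with the other sketches, this assumes Python 3 and an inline, made-up sample document rather than the quotes.xml file used by the listings; removeChild() and insertBefore() are standard DOM methods that minidom provides.

```python
from xml.dom import minidom

# Made-up sample document with three quotation nodes
XML_TEXT = ("<quotations>"
            "<quotation>first</quotation>"
            "<quotation>second</quotation>"
            "<quotation>third</quotation>"
            "</quotations>")

dom_obj = minidom.parseString(XML_TEXT)
trunk = dom_obj.documentElement            # the single <quotations> element

# Detach every child into an ordinary Python list
nodes = []
while trunk.firstChild is not None:
    nodes.append(trunk.removeChild(trunk.firstChild))

# Any list operation works here; reversing is the simplest
nodes.reverse()

# Re-attach in the new order; a reference node of None appends at the end
for node in nodes:
    trunk.insertBefore(node, None)

xml_text = trunk.toxml()
print(xml_text)
```

Between the detach and re-attach steps the nodes are plain list items, so sorting, filtering, or inserting new nodes would use the usual list methods.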

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
