Python-minidom module "Parsing xml"

Source: Internet
Author: User

http://zc0604.iteye.com/blog/1520703

Http://www.cnblogs.com/rirtue/archive/2012/01/12/2321028.html

http://blog.csdn.net/zgx007/article/details/6107226
1,xml Document Structure 1.1,xml document includes XML header information and XML information body 1.1.1,xml Document header information

<?xml version= "1.0" encoding= "Utf-8"?>
It shows the version used for this XML document and how it is encoded. Somewhat complex there are also some definitions of document types (DOCTYPE) that define the DTD or schema used for this XML document and the definitions of some entities.

1.1.2,xml Document Information Body

<Table>
	<Name>
		tbl_test
	</Name>
	<Comment> This
		ia a test Table
	</Comment>
	<schema format= "Json" >
        </Schema>
</Table>

The XML information body is composed of the top elements of the tree. Each XML document has a document element, which is the root element of the tree, and all other elements and content are contained within the root element.
DOM is the abbreviation for Document Object model, which is a method of representing an XML document in an object tree, and the advantage of using it is that you can easily iterate through the object.


2,minidom module reads XML

As I understand it, after getting the root node of the XML document tree, there are actually two kinds of nodes, "test only uses these two nodes here, actually according to NodeType know there are many more": element node and the text node. The element node, like the name tag above, is an element node. Text nodes, such as the tbl_test above, also serve as a node, or text node.

Nodes have such three attributes;

Node.nodename NodeName is the name of the knot.
Node.nodevalue NodeValue is the value of the node and is valid only for the text node.
Node.nodetype NodeType is the type of node.

Element node can use Root.getelementsbytagname ("Table") to get a list of table labels.

Text nodes can use Column.getattribute (' Name ') to get such a property value for Name. Properties refer to: <column name= "PT" value= "1"/> Such a structure. You can use Node.data or node.nodevalue to get text values. 2.1, get the DOM object

Getting DOM objects from an XML file

>>> Import xml.dom.minidom
>>> dom = xml.dom.minidom.parse (' D:/catalog.xml ')
To get a DOM object from an XML string
>>> Import xml.dom.minidom
>>> dom = xml.dom.minidom.parseString (xmlstring)

2.2, get the document element object

>>> root = dom.documentelement


3, Test 3.1, Experiment 1

<?xml version= "1.0" encoding= "UTF-8"?>
<Table>
	<Name>
		tbl_test
	</Name>
	<Comment>
		This IA a test table
	</Comment>
	<schema format= "Json" >
        </Schema>
</Table>


Dom = parsestring (string1)
#root = dom.documentelement
Table  =  dom.getelementsbytagname ("table ") [0]
name  =  table.getelementsbytagname (" name ") [0] for
textnode in Name.childnodes:
   Print Textnode.data
   print Textnode.nodevalue
1,dom gets the entire XML object

2, "Not running" root gets the entire document object, if executed, actually gets the root node unique label <Table></Table> under the things if there are multiple <table></table> The estimate will be ignored and the first one as the root by default.

3,root.getelementsbytagname ("Table") will get all <Table></Table> tag pairs, which is a list-like thing that can be obtained by using a list method. Because here is a <Table></Table> tag, so direct [0] returns this individual object.

4,table gets the real single one of the <table></Table> objects.

5,table.getelementsbytagname ("Name") also obtains a list of [<name></name>,..., <name></name>

6,name gets a single, now-only <Name></Name> object.

7, because the text node tbl_test under name. Although there is only one, but can have multiple. At this point, the above are all element nodes, the name tag is a text node, you can use Name.childnodes to get a list of text nodes, note, or list.

8,textnode is one of the only tbl_test.

9, because it is a text node, all have the data attribute. Of course, Node.nodevalue can also read it.


3.2, Experiment 2

<?xml version= "1.0" encoding= "UTF-8"?>
<Partitions>
	<Partition>
		<column name= "pt "value=" 1 "/>
	</Partition>
</Partitions>

Dom = parsestring (string2)
#root = dom.documentelement
partitions = dom.getelementsbytagname ("Partitions") [0 ]
partition = Partitions.getelementsbytagname ("Partition") [0]
column = Partition.getelementsbytagname (" Column ") [0]
print column.getattribute (' Name ')

1,dom gets the entire XML object

2, "not run" root gets the entire document object, if executed, actually gets the root node unique label <Table></Table> under the things if there are multiple <partitions></ Partitions>, the estimate will be ignored, default to the first as a root.

3,root.getelementsbytagname ("Partitions") will get all <Partitions></Partitions> tag pairs, which is a list-like thing that can be obtained using a list method. Because here is a <Partitions></Partitions> tag, so direct [0] returns this individual object.

4,partitions gets the real single one of the <Partitions></Partitions> objects.

5,partitions.getelementsbytagname ("Partition") also obtains a [<partition></partition>,..., <partition ></partition>] such a list.

6,partition gets a single, now-only <Partition></Partition> object.

7,column get to a single <Column></Column> object in the same way

8, because Name is a property of column, use Column.getattribute (' Name ') to get this property value


3.3, Experiment 3

String1= ' ' <?xml version= "1.0" encoding= "UTF-8"?>
           <Table>
                 <Name>
                    tbl_test
                 < /name>
                 <Comment>
                        <Name>
                            gexing
                        </Name> This
                        ia a test table
                 </ comment>
                 <schema format= "Json" >
                 </Schema>
                 <Name>
                       Dandan
                 </name >
                 </Table>
       ""

Dom = parsestring (string1)
    root = dom.documentelement
    names = Root.getelementsbytagname ("Name") for
    Name In names: For child in
        name.childnodes:
            print Child.nodevalue

Output:

[admin@r42h06016.xy2.aliyun.com] $python test.py

                        tbl_test


                                gexing


                           Dandan

Note that 1 is a blank line because the XML you actually want is a space-free content.

Note that point 2 shows that the list obtained with getElementsByTagName is the traversal of all "nodes", and then regardless of which level, encounters a match will be added in. If there is no textual information, none is output.


3.4, simple function

For simple elements, such as: <caption>python</caption>, we can write a function to get its contents (this is Python).

def gettagtext (root, tag):
    node = root.getelementsbytagname (tag) [0]
    rc = "" For
    node in Node.childnodes:
    if Node.nodetype in (node. Text_node, NODE. Cdata_section_node):
        rc = rc + node.data return
    RC

4,xml.etree.elementtree module reads XML

Import xml.etree.ElementTree
content = xml.etree.ElementTree.fromstring (string1)
name = Content.findall (' Name ') #找到所有的Name的列表
name = Content.findtext (' name ') #找到下一层的Name节点























Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.