Python minidom module: Parsing XML

Last Update:2018-07-24 Source: Internet

Author: User


http://zc0604.iteye.com/blog/1520703

http://www.cnblogs.com/rirtue/archive/2012/01/12/2321028.html

http://blog.csdn.net/zgx007/article/details/6107226
1. XML Document Structure

1.1 An XML document consists of a header and a body

1.1.1 XML document header

<?xml version="1.0" encoding="UTF-8"?>

The header declares which XML version the document uses and how it is encoded. More complex documents may also contain a document type declaration (DOCTYPE), which names the DTD or schema the document follows and can define entities.

1.1.2 XML document body

<Table>
	<Name>
		tbl_test
	</Name>
	<Comment>
		This is a test table
	</Comment>
	<Schema format="Json">
	</Schema>
</Table>

The body is a tree of elements. Every XML document has exactly one document element, the root of the tree, and all other elements and content are nested inside it.
DOM stands for Document Object Model. It represents an XML document as a tree of objects, which makes the document easy to traverse in code.
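To make the "object tree" idea concrete, here is a minimal sketch that walks a minidom tree depth-first and records one line per element or text node. The helper name walk and the compact sample string are illustrative assumptions, not part of minidom; the sample is written without indentation so that no whitespace-only text nodes appear.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# The body from section 1.1.2, written on one line (illustrative sample).
XML = ('<Table><Name>tbl_test</Name>'
       '<Comment>This is a test table</Comment>'
       '<Schema format="Json"></Schema></Table>')

def walk(node, depth=0, out=None):
    """Depth-first walk; records one indented line per element or text node."""
    if out is None:
        out = []
    if node.nodeType == Node.ELEMENT_NODE:
        out.append("  " * depth + "<%s>" % node.nodeName)
    elif node.nodeType == Node.TEXT_NODE:
        out.append("  " * depth + repr(node.data))
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out

dom = parseString(XML)
for line in walk(dom.documentElement):
    print(line)
```

Each element's children are printed one level deeper, so the printed shape mirrors the nesting of the XML.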

2. Reading XML with the minidom module

As I understand it, once you have the root node of the document tree, you mainly deal with two kinds of nodes: element nodes and text nodes. (Only these two are used in the tests below; the nodeType attribute distinguishes many more.) An element node corresponds to a tag: the Name tag above is an element node. A text node holds character data: the tbl_test inside Name is a text node.

Every node has these three attributes:

node.nodeName	The name of the node: the tag name for an element, "#text" for a text node.
node.nodeValue	The value of the node. For a text node it is the text; for an element node it is None.
node.nodeType	The type of the node, such as Node.ELEMENT_NODE or Node.TEXT_NODE.
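One way to see these three attributes side by side is to parse a one-element document and inspect both the element and the text node inside it. This is a small standard-library sketch; Node.ELEMENT_NODE and Node.TEXT_NODE are the type constants from xml.dom.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

dom = parseString("<Name>tbl_test</Name>")
element = dom.documentElement   # the <Name> element node
text = element.firstChild       # the text node inside it

print(element.nodeName, element.nodeValue, element.nodeType == Node.ELEMENT_NODE)
print(text.nodeName, text.nodeValue, text.nodeType == Node.TEXT_NODE)
```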

For an element node, root.getElementsByTagName("Table") returns a list of all Table elements beneath it.
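The object returned by getElementsByTagName behaves like a Python list. The sketch below uses a made-up <Tables> document (an assumption for illustration) to show indexing, len(), iteration in document order, and case-sensitive matching: a tag name in the wrong case yields an empty list, so indexing it with [0] would raise IndexError.

```python
from xml.dom.minidom import parseString

dom = parseString('<Tables><Table id="a"/><Table id="b"/></Tables>')
root = dom.documentElement

tables = root.getElementsByTagName("Table")    # a NodeList of <Table> elements
print(len(tables))                             # supports len() like a list
print(tables[0].getAttribute("id"))            # and indexing
print([t.getAttribute("id") for t in tables])  # and iteration, in document order

# Tag names are case-sensitive: "table" matches nothing.
missing = root.getElementsByTagName("table")
print(len(missing))
```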

Element nodes can also carry attributes. An attribute is a name/value pair inside a start tag, as in <column name="pt" value="1"/>; column.getAttribute('name') returns the value of the name attribute. For a text node, use node.data or node.nodeValue to get its text.

2.1 Getting the DOM object

Getting a DOM object from an XML file

>>> import xml.dom.minidom
>>> dom = xml.dom.minidom.parse('D:/catalog.xml')

Getting a DOM object from an XML string

>>> import xml.dom.minidom
>>> dom = xml.dom.minidom.parseString(xmlstring)
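parse() is not limited to file paths; it also accepts an open file object. As a sketch, the example below feeds the same bytes to parse() through an in-memory io.BytesIO (standing in for a real file such as D:/catalog.xml) and to parseString(), then checks that both produce the same document.

```python
import io
from xml.dom.minidom import parse, parseString

xml_bytes = b'<?xml version="1.0" encoding="UTF-8"?><Table><Name>tbl_test</Name></Table>'

# BytesIO stands in for a file opened from disk.
dom_from_file = parse(io.BytesIO(xml_bytes))
dom_from_string = parseString(xml_bytes)

print(dom_from_file.documentElement.tagName)
print(dom_from_file.toxml() == dom_from_string.toxml())
```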

2.2 Getting the document element object

>>> root = dom.documentElement
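The document object and the document element are different nodes: dom is the Document node, and dom.documentElement is the single root element inside it. The sketch below shows this, and also that a string with two top-level elements is rejected by the parser. The helper is_well_formed is a hypothetical name introduced only for this illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

dom = parseString("<Table><Name>tbl_test</Name></Table>")
root = dom.documentElement

print(dom.nodeName, dom.nodeType == Node.DOCUMENT_NODE)   # the Document node itself
print(root.tagName, root.nodeType == Node.ELEMENT_NODE)   # the single root element

def is_well_formed(text):
    """Hypothetical helper: True if minidom can parse text, False on an expat error."""
    try:
        parseString(text)
    except ExpatError:
        return False
    return True

print(is_well_formed("<Table/>"))          # one root element
print(is_well_formed("<Table/><Table/>"))  # two top-level elements
```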

3. Tests

3.1 Experiment 1

<?xml version="1.0" encoding="UTF-8"?>
<Table>
	<Name>
		tbl_test
	</Name>
	<Comment>
		This is a test table
	</Comment>
	<Schema format="Json">
	</Schema>
</Table>

from xml.dom.minidom import parseString

dom = parseString(string1)  # string1 holds the XML document shown above
# root = dom.documentElement
table = dom.getElementsByTagName("Table")[0]
name = table.getElementsByTagName("Name")[0]
for textnode in name.childNodes:
    print(textnode.data)
    print(textnode.nodeValue)

1. dom holds the entire parsed XML document (a Document object).

2. The root line is commented out and not executed. If it ran, root would be the document element, that is, the single root <Table> element. A well-formed XML document has exactly one root element, so a string with several top-level <Table> elements is rejected by the parser (ExpatError) rather than having all but the first ignored.

3. dom.getElementsByTagName("Table") returns all Table elements as a NodeList, a list-like object that supports indexing and iteration. There is only one Table here, so [0] returns that element.

4. table holds that single Table element.

5. table.getElementsByTagName("Name") likewise returns a list of the form [<Name>, ..., <Name>].

6. name holds the only Name element in the document.

7. tbl_test is a text node, a child of the Name element; Table and Name themselves are element nodes. An element can have several child nodes, so name.childNodes is again a list, even though here it holds only one text node.

8. textnode takes each entry of that list in turn; here the only one is the text node containing tbl_test (together with the surrounding newlines and tabs).

9. Because it is a text node, it has a data attribute holding its text; nodeValue returns the same string.
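Item 7 has a practical consequence: because the XML is indented, minidom keeps the newlines and indentation between tags as text nodes, so Table's childNodes contains more than its three elements. This sketch reuses the Experiment 1 body (indented with spaces here, an assumption for readability) and shows one way to keep only element children and strip the whitespace around a value.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# The Experiment 1 body, indented with spaces instead of tabs.
string1 = """<Table>
    <Name>
        tbl_test
    </Name>
    <Comment>
        This is a test table
    </Comment>
    <Schema format="Json">
    </Schema>
</Table>"""

table = parseString(string1).documentElement

# childNodes interleaves element nodes with whitespace-only text nodes.
print([child.nodeName for child in table.childNodes])

# Keep only the element children, then strip the whitespace around a value.
elements = [c for c in table.childNodes if c.nodeType == Node.ELEMENT_NODE]
print([e.tagName for e in elements])
print(elements[0].firstChild.data.strip())
```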

3.2 Experiment 2

<?xml version="1.0" encoding="UTF-8"?>
<Partitions>
	<Partition>
		<column name="pt" value="1"/>
	</Partition>
</Partitions>

from xml.dom.minidom import parseString

dom = parseString(string2)  # string2 holds the XML document shown above
# root = dom.documentElement
partitions = dom.getElementsByTagName("Partitions")[0]
partition = partitions.getElementsByTagName("Partition")[0]
column = partition.getElementsByTagName("column")[0]
print(column.getAttribute("name"))

1. dom holds the entire parsed XML document.

2. The root line is commented out and not executed. If it ran, root would be the document element, that is, the single root <Partitions> element. A document with several top-level <Partitions> elements is not well-formed and is rejected by the parser.

3. dom.getElementsByTagName("Partitions") returns all Partitions elements as a list-like NodeList. There is only one here, so [0] returns it.

4. partitions holds that single Partitions element.

5. partitions.getElementsByTagName("Partition") likewise returns a list of the form [<Partition>, ..., <Partition>].

6. partition holds the only Partition element.

7. column is obtained the same way: it holds the single <column/> element.

8. name is an attribute of column, so column.getAttribute("name") returns its value, pt.
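Beyond getAttribute, minidom elements offer a few more attribute helpers. The sketch below applies them to the Experiment 2 document: hasAttribute checks for presence (attribute names are case-sensitive), getAttribute returns an empty string for a missing attribute, and attributes.items() lists every name/value pair.

```python
from xml.dom.minidom import parseString

string2 = """<Partitions>
    <Partition>
        <column name="pt" value="1"/>
    </Partition>
</Partitions>"""

column = parseString(string2).getElementsByTagName("column")[0]

print(column.getAttribute("name"), column.getAttribute("value"))
print(column.hasAttribute("Name"))           # attribute names are case-sensitive
print(repr(column.getAttribute("missing")))  # absent attribute gives an empty string
print(sorted(column.attributes.items()))     # all attributes as (name, value) pairs
```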

3.3 Experiment 3

string1 = '''<?xml version="1.0" encoding="UTF-8"?>
           <Table>
                 <Name>
                    tbl_test
                 </Name>
                 <Comment>
                        <Name>
                            gexing
                        </Name> This
                        is a test table
                 </Comment>
                 <Schema format="Json">
                 </Schema>
                 <Name>
                       Dandan
                 </Name>
                 </Table>
       '''

from xml.dom.minidom import parseString

dom = parseString(string1)
root = dom.documentElement
names = root.getElementsByTagName("Name")
for name in names:
    for child in name.childNodes:
        print(child.nodeValue)

Output:

[admin@r42h06016.xy2.aliyun.com] $python test.py

                        tbl_test


                                gexing


                           Dandan

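Note 1: The blank lines in the output appear because each text node also contains the newlines and indentation around the value. If you want only the value itself, strip the whitespace (for example with child.nodeValue.strip()) or write the XML without whitespace around the text.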

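Note 2: getElementsByTagName searches all descendants, not only direct children, and returns every matching element at any depth in document order. That is why gexing, the Name nested inside Comment, is printed as well. An empty Name element produces no output, and a child that is not a text node (a nested element, for instance) has a nodeValue of None, so None would be printed for it.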
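When only the Name elements directly under the root are wanted, one option is to filter childNodes instead of calling getElementsByTagName. This sketch assumes a compact, unindented version of the Experiment 3 document and compares the two approaches; strip() removes the surrounding spaces described in Note 1.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

# A compact version of the Experiment 3 document (illustrative).
string1 = ("<Table><Name> tbl_test </Name>"
           "<Comment><Name> gexing </Name> This is a test table</Comment>"
           "<Name> Dandan </Name></Table>")
root = parseString(string1).documentElement

# getElementsByTagName searches every depth, so the nested Name is included.
all_names = [n.firstChild.data.strip() for n in root.getElementsByTagName("Name")]

# Filtering childNodes keeps only the Name elements directly under <Table>.
direct_names = [c.firstChild.data.strip() for c in root.childNodes
                if c.nodeType == Node.ELEMENT_NODE and c.tagName == "Name"]

print(all_names)
print(direct_names)
```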

3.4 A simple helper function

For simple elements such as <caption>python</caption>, we can write a function that returns their text content:

def gettagtext(root, tag):
    # Text and CDATA found directly inside the first <tag> element under root
    node = root.getElementsByTagName(tag)[0]
    rc = ""
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            rc = rc + child.data
    return rc
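gettagtext only collects text that sits directly inside the element, so for a nested structure like the Comment in Experiment 3 it skips the text of the inner Name. A recursive variant is sketched below; gettext_deep is a hypothetical name, and the sample document (including its CDATA section) is made up for illustration.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

def gettext_deep(node):
    """Hypothetical helper: concatenate text and CDATA of node and all descendants."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == Node.ELEMENT_NODE:
            parts.append(gettext_deep(child))  # recurse into nested elements
    return "".join(parts)

dom = parseString("<doc><caption>py<![CDATA[thon]]></caption>"
                  "<Comment><Name>gexing</Name> is here</Comment></doc>")
print(gettext_deep(dom.getElementsByTagName("caption")[0]))
print(gettext_deep(dom.getElementsByTagName("Comment")[0]))
```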

4. Reading XML with the xml.etree.ElementTree module

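import xml.etree.ElementTree

content = xml.etree.ElementTree.fromstring(string1)  # the root <Table> element
name = content.findall('Name')   # list of all Name elements directly under the root
name = content.findtext('Name')  # text of the first Name element directly under the root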
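findall and findtext only look one level down, which is easy to miss. Here is a sketch of how the lookups differ on a compact version of the Experiment 3 document: findall('Name') sees only direct children, while findall('.//Name') and iter('Name') search every depth. findtext returns the first matching child's text as-is, surrounding whitespace included.

```python
import xml.etree.ElementTree as ET

# A compact version of the Experiment 3 document (illustrative).
string1 = ("<Table><Name> tbl_test </Name>"
           "<Comment><Name> gexing </Name> This is a test table</Comment>"
           "<Name> Dandan </Name></Table>")
content = ET.fromstring(string1)  # the root <Table> Element

direct = [e.text.strip() for e in content.findall("Name")]      # children only
nested = [e.text.strip() for e in content.findall(".//Name")]   # any depth
every = [e.text.strip() for e in content.iter("Name")]          # any depth
first = content.findtext("Name")                                # first child's text

print(direct)
print(nested)
print(repr(first))
```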

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
