http://zc0604.iteye.com/blog/1520703
Http://www.cnblogs.com/rirtue/archive/2012/01/12/2321028.html
http://blog.csdn.net/zgx007/article/details/6107226
1, XML document structure
1.1, An XML document consists of XML header information and an XML body
1.1.1, XML document header information
<?xml version="1.0" encoding="UTF-8"?>
This declares the XML version the document uses and how it is encoded. More complex documents also carry a document type definition (DOCTYPE), which names the DTD or schema the document follows and defines some entities.
1.1.2, XML document body
<Table>
<Name>
tbl_test
</Name>
<Comment>
This is a test table
</Comment>
<Schema format="Json">
</Schema>
</Table>
The XML body is a tree of elements. Each XML document has exactly one document element, which is the root of the tree; all other elements and content are contained within it.
DOM stands for Document Object Model. It represents an XML document as a tree of objects, and its advantage is that you can easily walk through those objects.
2, minidom module reads XML
As I understand it, once you have the root node of the XML document tree there are really two kinds of node to deal with (this test only uses these two; nodeType shows there are many more): element nodes and text nodes. A tag such as Name above is an element node. Text such as tbl_test above is also a node, a text node.
Nodes have these three attributes:
| node.nodeName | The name of the node. |
| node.nodeValue | The value of the node; only meaningful for text nodes. |
| node.nodeType | The type of the node. |
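To make the three attributes concrete, here is a minimal sketch (assuming Python 3; the one-element document is made up for illustration) that prints them for an element node and for the text node inside it.

```python
from xml.dom.minidom import parseString

# Illustrative one-element document, not taken from the experiments.
doc = parseString("<Name>tbl_test</Name>")
element = doc.documentElement   # the <Name> element node
text = element.firstChild       # the text node inside <Name>

# An element node has a tag name but no value.
print(element.nodeName, element.nodeValue, element.nodeType == element.ELEMENT_NODE)

# A text node is always named "#text" and carries the text as its value.
print(text.nodeName, text.nodeValue, text.nodeType == text.TEXT_NODE)
```

For element nodes nodeValue is None, which is why the text has to be read from the child text node.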
On an element node, root.getElementsByTagName("Table") returns a list of the Table elements below it.
On an element node, column.getAttribute('Name') returns the value of its Name attribute. Attributes are structures such as: <Column Name="PT" Value="1"/>. On a text node, use node.data or node.nodeValue to get the text.
2.1, get the DOM object
Get a DOM object from an XML file:
>>> import xml.dom.minidom
>>> dom = xml.dom.minidom.parse('D:/catalog.xml')
Get a DOM object from an XML string:
>>> import xml.dom.minidom
>>> dom = xml.dom.minidom.parseString(xmlstring)
2.2, get the document element object
>>> root = dom.documentElement
3, Test
3.1, Experiment 1
<?xml version="1.0" encoding="UTF-8"?>
<Table>
<Name>
tbl_test
</Name>
<Comment>
This is a test table
</Comment>
<Schema format="Json">
</Schema>
</Table>
from xml.dom.minidom import parseString

# string1 holds the XML text shown above
dom = parseString(string1)
# root = dom.documentElement
table = dom.getElementsByTagName("Table")[0]
name = table.getElementsByTagName("Name")[0]
for textnode in name.childNodes:
    print(textnode.data)
    print(textnode.nodeValue)
1, dom holds the entire XML document object.
2, (Not run here.) root would hold the document element, that is, the single root tag <Table></Table> and everything under it. A well-formed document has only one root element; if there were several top-level <Table></Table> elements, the parser would reject the document instead of picking the first one.
3, root.getElementsByTagName("Table") returns all <Table></Table> elements as a list-like object that supports the usual list operations. Because there is only one <Table></Table> here, [0] returns that single object directly.
4, table holds that single <Table></Table> object.
5, table.getElementsByTagName("Name") likewise returns a list [<Name></Name>, ..., <Name></Name>].
6, name holds a single <Name></Name> object, the only one at this point.
7, Under name there is the text node tbl_test. There is only one here, but there could be several. Everything up to this point has been an element node; the content of the Name tag is a text node. name.childNodes returns the child nodes, and note that this is again a list.
8, textnode is the only item in that list, tbl_test.
9, Because it is a text node, it has a data attribute. node.nodeValue reads the same text.
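One way to check what happens with more than one top-level element is to feed such a string to the parser. This is a small sketch assuming Python 3; the two_roots string and the parsed flag are made up for illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Hypothetical input with two top-level elements (not well-formed XML).
two_roots = "<Table></Table><Table></Table>"

try:
    parseString(two_roots)
    parsed = True
except ExpatError as exc:
    # minidom relies on expat, which rejects anything after the root element.
    parsed = False
    print("not well-formed:", exc)
```

So documentElement never has to choose between several roots: a document that parses has exactly one.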
3.2, Experiment 2
<?xml version="1.0" encoding="UTF-8"?>
<Partitions>
<Partition>
<Column Name="pt" Value="1"/>
</Partition>
</Partitions>
from xml.dom.minidom import parseString

# string2 holds the XML text shown above
dom = parseString(string2)
# root = dom.documentElement
partitions = dom.getElementsByTagName("Partitions")[0]
partition = partitions.getElementsByTagName("Partition")[0]
column = partition.getElementsByTagName("Column")[0]
print(column.getAttribute('Name'))
1, dom holds the entire XML document object.
2, (Not run here.) root would hold the document element, that is, the single root tag <Partitions></Partitions> and everything under it. A well-formed document has only one root element; several top-level <Partitions></Partitions> elements would make the parser reject the document.
3, root.getElementsByTagName("Partitions") returns all <Partitions></Partitions> elements as a list-like object that supports the usual list operations. Because there is only one <Partitions></Partitions> here, [0] returns that single object directly.
4, partitions holds that single <Partitions></Partitions> object.
5, partitions.getElementsByTagName("Partition") likewise returns a list [<Partition></Partition>, ..., <Partition></Partition>].
6, partition holds a single <Partition></Partition> object, the only one at this point.
7, column holds a single <Column/> object, obtained in the same way.
8, Because Name is an attribute of Column, column.getAttribute('Name') returns its value.
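Beyond reading one known attribute, it can help to see how minidom behaves when an attribute is absent and how to list all of them. The sketch below assumes Python 3; the shortened xml_text and the attribute name "Missing" are made up for illustration.

```python
from xml.dom.minidom import parseString

# Shortened version of the Experiment 2 document (illustrative).
xml_text = '<Partition><Column Name="pt" Value="1"/></Partition>'
column = parseString(xml_text).getElementsByTagName("Column")[0]

# Reading an attribute that exists.
print(column.getAttribute("Name"))

# getAttribute returns an empty string for a missing attribute,
# so hasAttribute is the way to tell "absent" from "empty".
print(column.hasAttribute("Missing"))
print(repr(column.getAttribute("Missing")))

# All attributes of the element as a plain dict.
attrs = dict(column.attributes.items())
print(attrs)
```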
3.3, Experiment 3
string1 = '''<?xml version="1.0" encoding="UTF-8"?>
<Table>
<Name>
tbl_test
</Name>
<Comment>
<Name>
gexing
</Name>
This is a test table
</Comment>
<Schema format="Json">
</Schema>
<Name>
Dandan
</Name>
</Table>
'''
from xml.dom.minidom import parseString

dom = parseString(string1)
root = dom.documentElement
names = root.getElementsByTagName("Name")
for name in names:
    for child in name.childNodes:
        print(child.nodeValue)
Output:
[admin@r42h06016.xy2.aliyun.com] $python test.py
tbl_test
gexing
Dandan
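Note 1: the real output has blank lines around each value, because each text node also contains the line breaks between the tags and the text. To get rid of them, the XML would have to be written with no whitespace around the content.
Note 2: the list returned by getElementsByTagName comes from traversing all descendant nodes; every match is added regardless of its level, which is why the Name nested inside Comment appears. If a child node has no text (an element node, for example), its nodeValue is None and None is printed.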
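The two notes can be illustrated with a small sketch (assuming Python 3; xml_text is a shortened, made-up variant of string1). It strips the surrounding whitespace from each value and contrasts the recursive search with a filter over direct children only.

```python
from xml.dom.minidom import parseString

# Shortened, illustrative variant of the Experiment 3 document.
xml_text = """<Table>
<Name>
tbl_test
</Name>
<Comment>
<Name>
gexing
</Name>
</Comment>
</Table>"""

root = parseString(xml_text).documentElement

# getElementsByTagName searches every level; strip() drops the line breaks.
all_names = [n.firstChild.nodeValue.strip()
             for n in root.getElementsByTagName("Name")]

# Only <Name> elements that are direct children of the root element.
direct_names = [
    child.firstChild.nodeValue.strip()
    for child in root.childNodes
    if child.nodeType == child.ELEMENT_NODE and child.tagName == "Name"
]

print(all_names)
print(direct_names)
```

Filtering root.childNodes by nodeType is needed because the whitespace between tags shows up as text nodes in that list.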
3.4, simple function
For simple elements such as <caption>python</caption>, we can write a function that returns the content (here, python).
def getTagText(root, tag):
    # take the first element with the given tag name
    node = root.getElementsByTagName(tag)[0]
    rc = ""
    for child in node.childNodes:
        # concatenate plain text and CDATA sections, skip everything else
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            rc = rc + child.data
    return rc
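A possible way to try the helper, as a sketch assuming Python 3. The function is repeated so the block runs on its own, and both sample documents are made up for illustration.

```python
from xml.dom.minidom import parseString


def getTagText(root, tag):
    """Return the concatenated text and CDATA content of the first `tag` element."""
    node = root.getElementsByTagName(tag)[0]
    rc = ""
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            rc = rc + child.data
    return rc


# Illustrative documents: one plain, one mixing text with a CDATA section.
plain = parseString("<doc><caption>python</caption></doc>")
mixed = parseString("<doc><caption>py<![CDATA[<thon>]]></caption></doc>")

plain_text = getTagText(plain.documentElement, "caption")
mixed_text = getTagText(mixed.documentElement, "caption")
print(plain_text)
print(mixed_text)
```

Note that the helper raises IndexError when no element with that tag exists, since it indexes [0] without checking.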
4, xml.etree.ElementTree module reads XML
import xml.etree.ElementTree
content = xml.etree.ElementTree.fromstring(string1)
name = content.findall('Name')  # list of the Name elements directly under the root
name = content.findtext('Name')  # text of the first Name element on the next level down
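To see how these calls differ from minidom's getElementsByTagName, here is a sketch assuming Python 3; xml_text is a compact, made-up variant of string1. With a plain tag name, findall and findtext only look at direct children, while iter walks the whole subtree.

```python
import xml.etree.ElementTree as ET

# Compact, illustrative variant of the Experiment 3 document.
xml_text = """<Table>
<Name>tbl_test</Name>
<Comment><Name>gexing</Name></Comment>
<Name>Dandan</Name>
</Table>"""

root = ET.fromstring(xml_text)

# Direct children of the root only; the nested Name is not included.
direct = [e.text for e in root.findall("Name")]

# Text of the first matching direct child.
first = root.findtext("Name")

# Every Name element at any depth, in document order.
everywhere = [e.text for e in root.iter("Name")]

print(direct)
print(first)
print(everywhere)
```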