Parsing XML in Python with the xml.dom module
1. What is XML, and what are its features?
XML (Extensible Markup Language) can be used to tag data and define data types. It is a meta-language that lets you define your own markup language.
Example: del.xml
<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='123456'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>
The structure is similar to HTML (Hypertext Markup Language), but the two were designed for different purposes. HTML is designed to display data, and its focus is on how the data looks. XML is designed to transmit and store data, and its focus is on the data content.
XML has the following features:
• It is composed of tag pairs: <aa></aa>
• Tags can have attributes: <aa id='123'></aa>
• Tag pairs can enclose data: <aa>abc</aa>
• Tags can contain sub-tags, which gives the document a hierarchical structure
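The four features above can be seen in a few lines of code. This is only a minimal sketch, assuming Python 3; the inline document and the variable names are made up for illustration, and minidom.parseString() is used instead of a file so that the snippet runs on its own.

```python
from xml.dom import minidom

# Hypothetical inline document: a tag pair with an attribute and a nested sub-tag.
doc = minidom.parseString("<aa id='123'><bb>abc</bb></aa>")
root = doc.documentElement

tag_name = root.nodeName                     # the tag pair <aa>...</aa>
attr_value = root.getAttribute("id")         # the attribute on the tag
child = root.getElementsByTagName("bb")[0]   # the nested sub-tag
text = child.firstChild.data                 # the data enclosed by the tag pair

print(tag_name, attr_value, child.nodeName, text)
```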
2. Obtain node attributes
# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")   # open and parse the XML document
root = dom.documentElement               # get the root element of the document
print("nodeName:", root.nodeName)        # every node has nodeName, nodeValue and nodeType attributes
print("nodeValue:", root.nodeValue)      # nodeValue is the value of the node; it is only meaningful for text nodes
print("nodeType:", root.nodeType)
print("ELEMENT_NODE:", root.ELEMENT_NODE)
nodeType is the type of the node. The catalog element is of type ELEMENT_NODE.
The following node types are currently defined:
• ATTRIBUTE_NODE
• CDATA_SECTION_NODE
• COMMENT_NODE
• DOCUMENT_FRAGMENT_NODE
• DOCUMENT_NODE
• DOCUMENT_TYPE_NODE
• ELEMENT_NODE
• ENTITY_NODE
• ENTITY_REFERENCE_NODE
• NOTATION_NODE
• PROCESSING_INSTRUCTION_NODE
• TEXT_NODE
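Since nodeType is only an integer, it can help to map it back to the constant name when exploring a document. The sketch below is one possible way to do that, assuming Python 3; NODE_TYPE_NAMES and describe() are hypothetical helper names, and the inline document is invented for the example.

```python
from xml.dom import minidom, Node

# Reverse lookup from the integer nodeType to the name of its constant.
NODE_TYPE_NAMES = {
    getattr(Node, name): name
    for name in dir(Node)
    if name.endswith("_NODE")
}

def describe(node):
    """Return the constant name for a node's type (hypothetical helper)."""
    return NODE_TYPE_NAMES[node.nodeType]

# Hypothetical inline document containing a comment and one element.
doc = minidom.parseString("<catalog><!-- note --><maxid>4</maxid></catalog>")
root = doc.documentElement

kinds = [describe(doc), describe(root)] + [describe(c) for c in root.childNodes]
print(kinds)
```

The document itself, the root element and each child of the root are reported with their own type, which shows that elements are not the only kind of node in the tree.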
Output:
nodeName: catalog
nodeValue: None
nodeType: 1
ELEMENT_NODE: 1
3. Obtain sub-tags
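# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement
bb = root.getElementsByTagName('maxid')   # returns a NodeList of matching elements
print(type(bb))
print(bb)
b = bb[0]                                  # take the first match
print(b.nodeName)
print(b.nodeValue)                         # None, because b is an element, not a text node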
Output:
<class 'xml.dom.minicompat.NodeList'>
[<DOM Element: maxid at 0x2707a48>]
maxid
None
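Two details of getElementsByTagName() are easy to miss: it searches all descendants of the node, not only its direct children, and indexing the result with [0] fails when nothing matches. The following sketch illustrates both, assuming Python 3; the XML is the content of del.xml inlined as a string, and first_element() is a hypothetical helper, not part of minidom.

```python
from xml.dom import minidom

# Content of del.xml, inlined so that the snippet runs without the file.
XML = """<catalog>
  <maxid>4</maxid>
  <login username="pytest" passwd="123456">
    <caption>Python</caption>
    <item id="4"><caption>test</caption></item>
  </login>
  <item id="2"><caption>Zope</caption></item>
</catalog>"""

root = minidom.parseString(XML).documentElement

# Both <item> tags are found, although one of them is nested inside <login>.
items = root.getElementsByTagName("item")
parents = [item.parentNode.nodeName for item in items]
print(len(items), parents)

def first_element(node, name):
    """Return the first matching descendant, or None if there is none (hypothetical helper)."""
    matches = node.getElementsByTagName(name)
    return matches[0] if matches else None

missing = first_element(root, "nosuchtag")
print(missing)
```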
4. Obtain tag attribute values
# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement

itemlist = root.getElementsByTagName('login')
item = itemlist[0]
print(item.getAttribute("username"))
print(item.getAttribute("passwd"))

itemlist = root.getElementsByTagName("item")
item = itemlist[0]                 # elements are told apart by their position in itemlist
print(item.getAttribute("id"))
item2 = itemlist[1]                # elements are told apart by their position in itemlist
print(item2.getAttribute("id"))
Output:
pytest
123456
4
2
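When an attribute may be absent, it is worth knowing how minidom reports that. The sketch below is an illustration only, assuming Python 3 and a made-up inline login element: getAttribute() returns an empty string for a missing attribute, hasAttribute() tells the two cases apart, and attributes.items() lists every name and value pair.

```python
from xml.dom import minidom

# Hypothetical inline element, modelled on the <login> tag of del.xml.
doc = minidom.parseString('<login username="pytest" passwd="123456"/>')
login = doc.documentElement

has_username = login.hasAttribute("username")   # True: the attribute exists
has_email = login.hasAttribute("email")         # False: no such attribute
missing_value = login.getAttribute("email")     # empty string, not an error

# Collect all attributes of the element into a plain dict.
all_attrs = dict(login.attributes.items())

print(has_username, has_email, repr(missing_value))
print(all_attrs)
```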
5. Obtain data between tag pairs
# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement
itemlist = root.getElementsByTagName('caption')
item = itemlist[0]
print(item.firstChild.data)    # firstChild is the text node inside <caption>
item2 = itemlist[1]
print(item2.firstChild.data)
Output:
Python
test
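Reading firstChild.data works for the captions above, but firstChild is None for an empty tag, and the text of a tag can be split across several child nodes. A more defensive variant could look like the sketch below, assuming Python 3; get_text() is a hypothetical helper and the inline document is invented for the example.

```python
from xml.dom import minidom

def get_text(element):
    """Join the data of all text and CDATA children of an element (hypothetical helper)."""
    return "".join(
        child.data
        for child in element.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )

# Hypothetical inline document: a normal tag, an empty tag, and text mixed with CDATA.
doc = minidom.parseString(
    "<root><caption>Python</caption><empty/><mixed>a<![CDATA[<b>]]>c</mixed></root>"
)
caption, empty, mixed = doc.documentElement.childNodes

print(get_text(caption))
print(repr(get_text(empty)), empty.firstChild)
print(get_text(mixed))
```

For the empty tag the helper returns an empty string, whereas empty.firstChild.data would raise an AttributeError because firstChild is None.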
6. Example
<?xml version="1.0" encoding="UTF-8" ?>
<users>
    <user id="1000001">
        <username>Admin</username>
        <email>admin@live.cn</email>
        <age>23</age>
        <sex>boy</sex>
    </user>
    <user id="1000002">
        <username>Admin2</username>
        <email>admin2@live.cn</email>
        <age>22</age>
        <sex>boy</sex>
    </user>
    <user id="1000003">
        <username>Admin3</username>
        <email>admin3@live.cn</email>
        <age>27</age>
        <sex>boy</sex>
    </user>
    <user id="1000004">
        <username>Admin4</username>
        <email>admin4@live.cn</email>
        <age>25</age>
        <sex>girl</sex>
    </user>
    <user id="1000005">
        <username>Admin5</username>
        <email>admin5@live.cn</email>
        <age>20</age>
        <sex>boy</sex>
    </user>
    <user id="1000006">
        <username>Admin6</username>
        <email>admin6@live.cn</email>
        <age>23</age>
        <sex>girl</sex>
    </user>
</users>
Task: read the XML above (saved as user.xml) and print each user's name, email, age, and sex.
Reference code:
# -*- coding:utf-8 -*-
from xml.dom import minidom


def get_attrvalue(node, attrname):
    # Return the attribute value, or '' if there is no node
    return node.getAttribute(attrname) if node else ''


def get_nodevalue(node, index=0):
    # Return the value of the child node at the given index (the text of the tag)
    return node.childNodes[index].nodeValue if node else ''


def get_xmlnode(node, name):
    # Return all descendant elements with the given tag name
    return node.getElementsByTagName(name) if node else []


def get_xml_data(filename='user.xml'):
    doc = minidom.parse(filename)
    root = doc.documentElement

    user_nodes = get_xmlnode(root, 'user')
    print("user_nodes:", user_nodes)
    user_list = []
    for node in user_nodes:
        user_id = get_attrvalue(node, 'id')
        node_name = get_xmlnode(node, 'username')
        node_email = get_xmlnode(node, 'email')
        node_age = get_xmlnode(node, 'age')
        node_sex = get_xmlnode(node, 'sex')

        user_name = get_nodevalue(node_name[0])
        user_email = get_nodevalue(node_email[0])
        user_age = int(get_nodevalue(node_age[0]))
        user_sex = get_nodevalue(node_sex[0])

        user = {
            'id': int(user_id),
            'username': user_name,
            'email': user_email,
            'age': user_age,
            'sex': user_sex,
        }
        user_list.append(user)
    return user_list


def test_load_xml():
    user_list = get_xml_data()
    for user in user_list:
        print('-----------------------------------------------------')
        if user:
            user_str = 'No.:\t%d\nname:\t%s\nsex:\t%s\nage:\t%s\nEmail:\t%s' % (
                int(user['id']), user['username'], user['sex'], user['age'], user['email'])
            print(user_str)


if __name__ == "__main__":
    test_load_xml()
Result
C:\Users\wzh94434\Desktop\xml>python user.py
user_nodes: [<DOM Element: user at 0x2758c48>, <DOM Element: user at 0x2756288>, <DOM Element: user at 0x2756888>, <DOM Element: user at 0x2756e88>, <DOM Element: user at 0x275e4c8>, <DOM Element: user at 0x275eac8>]
-----------------------------------------------------
No.:    1000001
name:   Admin
sex:    boy
age:    23
Email:  admin@live.cn
-----------------------------------------------------
No.:    1000002
name:   Admin2
sex:    boy
age:    22
Email:  admin2@live.cn
-----------------------------------------------------
No.:    1000003
name:   Admin3
sex:    boy
age:    27
Email:  admin3@live.cn
-----------------------------------------------------
No.:    1000004
name:   Admin4
sex:    girl
age:    25
Email:  admin4@live.cn
-----------------------------------------------------
No.:    1000005
name:   Admin5
sex:    boy
age:    20
Email:  admin5@live.cn
-----------------------------------------------------
No.:    1000006
name:   Admin6
sex:    girl
age:    23
Email:  admin6@live.cn
7. Summary
• minidom.parse(filename): loads and parses the XML file
• doc.documentElement: gets the root element of the XML document
• node.getAttribute(AttributeName): gets the value of an attribute of the node
• node.getElementsByTagName(TagName): gets the collection of elements with that tag name under the node
• node.childNodes: returns the list of child nodes
• node.childNodes[index].nodeValue: gets the value of a child node
• node.firstChild: accesses the first child node; equivalent to node.childNodes[0]
• doc = minidom.parse(filename), then doc.toxml('utf-8'): returns the XML text of the node
• node.attributes["id"]: accesses an attribute of the element; its .name is the "id" above and its .value is the attribute value
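The last two entries of the summary are not used in the earlier sections, so here is a short sketch of them. It assumes Python 3 and uses one user record from the example above, inlined with minidom.parseString() so that no file is needed.

```python
from xml.dom import minidom

# One record from the user example, inlined as a string.
doc = minidom.parseString('<user id="1000001"><username>Admin</username></user>')
user = doc.documentElement

# attributes["id"] returns an attribute node with a name and a value.
id_attr = user.attributes["id"]
print(id_attr.name, id_attr.value)

# toxml() without arguments serializes the node to a str.
xml_text = user.toxml()
print(xml_text)

# With an explicit encoding, toxml() on the document returns bytes in Python 3.
xml_bytes = doc.toxml("utf-8")
print(type(xml_bytes))
```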
That is all for this article. I hope it helps you in your study or work. If you have any questions, feel free to leave a comment. Thank you for your support.