Python uses the xml.dom module to parse XML

Source: Internet
Author: User


1. What is XML, and what are its features?

XML (Extensible Markup Language) can be used to mark up data and define data types. It is a meta-language: it lets you define your own markup language.

Example: del.xml

<?xml version="1.0" encoding="utf-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='123456'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>

Its structure is similar to HTML (HyperText Markup Language), but the two are designed for different purposes. HTML is designed to display data and focuses on how the data looks; XML is designed to transport and store data and focuses on what the data is.

It has the following features:

• It is composed of tag pairs: <aa></aa>

• Tags can have attributes: <aa id='123'></aa>

• Tag pairs can enclose data: <aa>abc</aa>

• Tags can contain nested child tags, giving a hierarchical structure
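The four features above can be seen directly from Python. Below is a minimal sketch, assuming Python 3 and the standard-library xml.dom.minidom; the SAMPLE string and the describe() helper are hypothetical, built from the <aa> examples in the list plus an invented nested <bb> tag.

```python
from xml.dom import minidom

# Hypothetical sample: a tag pair with an attribute, embedded text and a nested child tag.
SAMPLE = "<aa id='123'>abc<bb>nested</bb></aa>"


def describe(xml_text):
    """Return (tag name, id attribute, leading text, names of child elements)."""
    root = minidom.parseString(xml_text).documentElement
    text = root.firstChild.data  # assumes the element starts with a text node ("abc" here)
    children = [child.nodeName for child in root.childNodes
                if child.nodeType == child.ELEMENT_NODE]
    return root.nodeName, root.getAttribute("id"), text, children


if __name__ == "__main__":
    print(describe(SAMPLE))  # ('aa', '123', 'abc', ['bb'])
```

Each feature maps to one part of the DOM: the tag pair becomes an element node, the attribute is read with getAttribute, the enclosed data is a text child, and nesting shows up as child element nodes.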

2. Obtain basic node properties

# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")  # open and parse the XML document
root = dom.documentElement              # get the root element (<catalog>)
print("nodeName:", root.nodeName)       # every node has nodeName, nodeValue and nodeType attributes
print("nodeValue:", root.nodeValue)     # nodeValue is None for elements; for text nodes it holds the text
print("nodeType:", root.nodeType)
print("ELEMENT_NODE:", root.ELEMENT_NODE)

nodeType is the node's type. The root element, catalog, is of type ELEMENT_NODE, whose value is 1.

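The DOM defines the following node types; each is also available as a constant on every node (for example, root.TEXT_NODE):

• ELEMENT_NODE = 1
• ATTRIBUTE_NODE = 2
• TEXT_NODE = 3
• CDATA_SECTION_NODE = 4
• ENTITY_REFERENCE_NODE = 5
• ENTITY_NODE = 6
• PROCESSING_INSTRUCTION_NODE = 7
• COMMENT_NODE = 8
• DOCUMENT_NODE = 9
• DOCUMENT_TYPE_NODE = 10
• DOCUMENT_FRAGMENT_NODE = 11
• NOTATION_NODE = 12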


Running result

nodeName: catalog
nodeValue: None
nodeType: 1
ELEMENT_NODE: 1
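Because del.xml is indented, the line breaks and spaces between tags are also stored in the tree as text nodes. The sketch below, assuming Python 3 and with the del.xml content inlined (minus the XML declaration) so it runs standalone, lists the nodeType of each direct child of catalog. DEL_XML, TYPE_NAMES and child_types are hypothetical names used only for illustration.

```python
from xml.dom import Node, minidom

# The del.xml document from above, inlined without its XML declaration.
DEL_XML = """<catalog>
 <maxid>4</maxid>
 <login username="pytest" passwd='123456'>
  <caption>Python</caption>
  <item id="4">
   <caption>test</caption>
  </item>
 </login>
 <item id="2">
  <caption>Zope</caption>
 </item>
</catalog>"""

# Readable names for the two nodeType values that occur among catalog's children here.
TYPE_NAMES = {Node.ELEMENT_NODE: "ELEMENT_NODE", Node.TEXT_NODE: "TEXT_NODE"}


def child_types(xml_text):
    """Return (nodeName, nodeType name) for each direct child of the root element."""
    root = minidom.parseString(xml_text).documentElement
    return [(child.nodeName, TYPE_NAMES.get(child.nodeType, str(child.nodeType)))
            for child in root.childNodes]


if __name__ == "__main__":
    for name, kind in child_types(DEL_XML):
        print(name, kind)
```

The element children (maxid, login, item) come out interleaved with #text nodes that hold the whitespace, which is why code that walks childNodes usually filters on nodeType first.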

3. Obtain sub-tags

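# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement
bb = root.getElementsByTagName('maxid')  # NodeList of every <maxid> element under root
print(type(bb))
print(bb)
b = bb[0]           # the first (and only) <maxid> element
print(b.nodeName)
print(b.nodeValue)  # None: an element's text is stored in its child text node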

Running result

<class 'xml.dom.minicompat.NodeList'>
[<DOM Element: maxid at 0x2707a48>]
maxid
None
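getElementsByTagName searches the whole subtree below a node, not only its direct children. A minimal sketch, assuming Python 3 and the del.xml content inlined on one line; the two helper functions are hypothetical and contrast a subtree search with a filter over root.childNodes.

```python
from xml.dom import minidom

# The del.xml content from above, on one line (no whitespace text nodes between tags).
DEL_XML = ('<catalog><maxid>4</maxid>'
           '<login username="pytest" passwd="123456"><caption>Python</caption>'
           '<item id="4"><caption>test</caption></item></login>'
           '<item id="2"><caption>Zope</caption></item></catalog>')


def item_ids_everywhere(root):
    """All <item> elements anywhere below root, in document order."""
    return [e.getAttribute("id") for e in root.getElementsByTagName("item")]


def item_ids_direct(root):
    """Only the <item> elements that are direct children of root."""
    return [e.getAttribute("id") for e in root.childNodes
            if e.nodeType == e.ELEMENT_NODE and e.tagName == "item"]


if __name__ == "__main__":
    root = minidom.parseString(DEL_XML).documentElement
    print(item_ids_everywhere(root))  # ['4', '2']
    print(item_ids_direct(root))      # ['2']
```

The <item id="4"> nested inside <login> is found by getElementsByTagName but not by the direct-children filter.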

4. Obtain tag attribute values

# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement
itemlist = root.getElementsByTagName('login')
item = itemlist[0]
print(item.getAttribute("username"))
print(item.getAttribute("passwd"))

itemlist = root.getElementsByTagName("item")
item = itemlist[0]   # elements with the same tag are distinguished by their position in itemlist
print(item.getAttribute("id"))
item2 = itemlist[1]  # the second <item> in document order
print(item2.getAttribute("id"))

Running result

pytest
123456
4
2
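One detail worth knowing about getAttribute: for an attribute that is not present it returns an empty string rather than raising an error, so hasAttribute is the way to tell "missing" from "empty". A minimal sketch, assuming Python 3; LOGIN_XML and attribute_report are hypothetical, and the email attribute is deliberately absent.

```python
from xml.dom import minidom

# A <login> element like the one in del.xml; it has no "email" attribute.
LOGIN_XML = '<login username="pytest" passwd="123456"/>'


def attribute_report(element, name):
    """Return (is the attribute present?, its value); getAttribute gives '' when absent."""
    return element.hasAttribute(name), element.getAttribute(name)


if __name__ == "__main__":
    login = minidom.parseString(LOGIN_XML).documentElement
    print(attribute_report(login, "username"))  # (True, 'pytest')
    print(attribute_report(login, "email"))     # (False, '')
    # All attributes at once, read from the element's NamedNodeMap.
    print(sorted(login.attributes.items()))     # [('passwd', '123456'), ('username', 'pytest')]
```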


5. Obtain data between tag pairs

# coding: utf-8
import xml.dom.minidom

dom = xml.dom.minidom.parse("del.xml")
root = dom.documentElement
itemlist = root.getElementsByTagName('caption')
item = itemlist[0]
print(item.firstChild.data)   # text inside the first <caption>
item2 = itemlist[1]
print(item2.firstChild.data)  # text inside the second <caption>

Running result

Python
test
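firstChild.data only works when the element actually contains text: for an empty element such as <email></email>, firstChild is None and .data raises AttributeError. A minimal sketch of a safer text getter, assuming Python 3; element_text is a hypothetical helper that joins the element's direct text and CDATA children.

```python
from xml.dom import minidom


def element_text(element):
    """Join the data of the element's direct text and CDATA children.

    Returns '' for an empty element such as <email></email>, where
    element.firstChild is None and element.firstChild.data would fail.
    """
    return "".join(node.data for node in element.childNodes
                   if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE))


if __name__ == "__main__":
    doc = minidom.parseString("<r><caption>Python</caption><email></email></r>")
    caption = doc.getElementsByTagName("caption")[0]
    email = doc.getElementsByTagName("email")[0]
    print(repr(element_text(caption)))  # 'Python'
    print(repr(element_text(email)))    # ''
    print(email.firstChild)             # None
```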


6. Example

<?xml version="1.0" encoding="UTF-8" ?>
<users>
    <user id="1000001">
        <username>Admin</username>
        <email></email>
        <age>23</age>
        <sex>boy</sex>
    </user>
    <user id="1000002">
        <username>Admin2</username>
        <email></email>
        <age>22</age>
        <sex>boy</sex>
    </user>
    <user id="1000003">
        <username>Admin3</username>
        <email></email>
        <age>27</age>
        <sex>boy</sex>
    </user>
    <user id="1000004">
        <username>Admin4</username>
        <email></email>
        <age>25</age>
        <sex>girl</sex>
    </user>
    <user id="1000005">
        <username>Admin5</username>
        <email></email>
        <age>20</age>
        <sex>boy</sex>
    </user>
    <user id="1000006">
        <username>Admin6</username>
        <email></email>
        <age>23</age>
        <sex>girl</sex>
    </user>
</users>

Goal: read this file, saved as user.xml, and print each user's ID, name, email, age, and sex.

Reference Code

# -*- coding: utf-8 -*-
from xml.dom import minidom


def get_attrvalue(node, attrname):
    """Return the value of attribute attrname, or '' if node is missing."""
    return node.getAttribute(attrname) if node else ''


def get_nodevalue(node, index=0):
    """Return the value of the node's child at position index.

    Returns '' when there is no such child, e.g. for an empty <email></email>.
    """
    if node and len(node.childNodes) > index:
        return node.childNodes[index].nodeValue
    return ''


def get_xmlnode(node, name):
    """Return all descendant elements of node with the given tag name."""
    return node.getElementsByTagName(name) if node else []


def get_xml_data(filename='user.xml'):
    doc = minidom.parse(filename)
    root = doc.documentElement
    user_nodes = get_xmlnode(root, 'user')
    print("user_nodes:", user_nodes)
    user_list = []
    for node in user_nodes:
        user_id = get_attrvalue(node, 'id')
        node_name = get_xmlnode(node, 'username')
        node_email = get_xmlnode(node, 'email')
        node_age = get_xmlnode(node, 'age')
        node_sex = get_xmlnode(node, 'sex')
        user = {
            'id': int(user_id),
            'username': get_nodevalue(node_name[0]),
            'email': get_nodevalue(node_email[0]),
            'age': int(get_nodevalue(node_age[0])),
            'sex': get_nodevalue(node_sex[0]),
        }
        user_list.append(user)
    return user_list


def test_load_xml():
    user_list = get_xml_data()
    for user in user_list:
        print('-----------------------------------------------------')
        if user:
            user_str = 'No.:\t%d\nname:\t%s\nsex:\t%s\nage:\t%s\nEmail:\t%s' % (
                user['id'], user['username'], user['sex'], user['age'], user['email'])
            print(user_str)


if __name__ == "__main__":
    test_load_xml()


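C:\Users\wzh94434\Desktop\xml>python user.py
user_nodes: [<DOM Element: user at 0x2758c48>, <DOM Element: user at 0x2756288>, <DOM Element: user at 0x2756888>, <DOM Element: user at 0x2756e88>, <DOM Element: user at 0x275e4c8>, <DOM Element: user at 0x275eac8>]
-----------------------------------------------------
No.: 1000001
name: Admin
sex: boy
age: 23
Email:
-----------------------------------------------------
No.: 1000002
name: Admin2
sex: boy
age: 22
Email:
-----------------------------------------------------
No.: 1000003
name: Admin3
sex: boy
age: 27
Email:
-----------------------------------------------------
No.: 1000004
name: Admin4
sex: girl
age: 25
Email:
-----------------------------------------------------
No.: 1000005
name: Admin5
sex: boy
age: 20
Email:
-----------------------------------------------------
No.: 1000006
name: Admin6
sex: girl
age: 23
Email: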

7. Summary

The calls used in this article:

• minidom.parse(filename): load and parse an XML file
• doc.documentElement: get the document's root element
• node.getAttribute(AttributeName): get the value of one of the node's attributes
• node.getElementsByTagName(TagName): get the list of descendant elements with that tag name
• node.childNodes: return the list of child nodes
• node.childNodes[index].nodeValue: get the value of a child node, for example the text inside an element
• node.firstChild: access the first child node; equivalent to node.childNodes[0]
• doc.toxml('utf-8'): return the node (here the whole document, from doc = minidom.parse(filename)) serialized as XML; when an encoding is passed, Python 3 returns bytes
• node.attributes["id"]: access an attribute node; its .name is the attribute name ("id") and its .value is the attribute value
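The last three entries of the summary are the ones not exercised in the sections above. A minimal sketch, assuming Python 3, that tries them on a small inline document; SAMPLE and summary_demo are hypothetical names.

```python
from xml.dom import minidom

# A fragment of del.xml, inlined so the sketch runs without a file.
SAMPLE = '<catalog><item id="2"><caption>Zope</caption></item></catalog>'


def summary_demo(xml_text):
    doc = minidom.parseString(xml_text)
    item = doc.documentElement.getElementsByTagName("item")[0]
    id_attr = item.attributes["id"]  # an Attr node, not a plain string
    return {
        "attr": (id_attr.name, id_attr.value),
        "first_is_child0": item.firstChild is item.childNodes[0],
        "item_xml": item.toxml(),          # str when no encoding is given
        "doc_bytes": doc.toxml("utf-8"),   # bytes when an encoding is given
    }


if __name__ == "__main__":
    result = summary_demo(SAMPLE)
    print(result["attr"])      # ('id', '2')
    print(result["item_xml"])  # <item id="2"><caption>Zope</caption></item>
```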

That's all for this article. I hope it helps with your study or work. If you have any questions, feel free to leave a message. Thank you for your support.
