Python example code for XML file parsing
1. XML Introduction
XML (eXtensible Markup Language) is a markup language designed to transmit and store data. It underlies many newer technologies and is applied in many different fields; it emerged naturally once the web had developed to a certain stage. It combines the core features of SGML with the simplicity of HTML, and adds properties of its own, such as a clear, well-defined structure.
The test.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>
For a detailed introduction to XML, see: http://www.w3school.com.cn/xmldom/dom_nodetype.asp
2. XML file parsing
There are three common ways to parse XML in Python. The first is the xml.dom.* modules, the implementation of the W3C DOM API; they are a good fit when you need to work with the DOM API. The second is the xml.sax.* modules, the implementation of the SAX API; they trade convenience for speed and lower memory usage. SAX is an event-based API, which means it can process very large documents on the fly without loading them fully into memory. The third is the xml.etree.ElementTree module (ET for short), which provides a lightweight, Pythonic API. Compared with DOM, ET is much faster and its API is more pleasant to use. ET.iterparse also offers on-the-fly processing, so the entire document need not be loaded into memory. ET's average performance is similar to that of SAX, but its API is higher-level and easier to use.
2.1 xml.dom.*
The Document Object Model (DOM) is the standard programming interface recommended by the W3C for processing Extensible Markup Language. When a DOM parser parses an XML document, it reads the entire document at once and stores all of its elements in memory as a tree. You can then use the functions that DOM provides to read or modify the content and structure of the document, or write the modified content back to an XML file. In Python, xml.dom.minidom is used to parse XML files.
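The on-the-fly processing that ET.iterparse offers can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the SAMPLE document is an inlined copy of test.xml so the sketch runs without a file on disk, and the helper name collect_captions is hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Assumed sample document, inlined so the sketch runs without test.xml on disk.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="000000">
        <caption>Python</caption>
        <item id="4"><caption>test</caption></item>
    </login>
    <item id="2"><caption>Zope</caption></item>
</catalog>"""


def collect_captions(source):
    """Stream through the document and return the text of every <caption>."""
    captions = []
    # An "end" event fires once an element has been read completely.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "caption":
            captions.append(elem.text)
            # Release the element's content once it has been handled.
            elem.clear()
    return captions


captions = collect_captions(io.BytesIO(SAMPLE))
print(captions)  # ['Python', 'test', 'Zope']
```

For a real file, a path such as "test.xml" can be passed to ET.iterparse in place of the BytesIO object.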
A. Obtain sub-tags
B. Differentiate tags with the same tag name
C. Get tag attribute values
D. Obtain the data between tag pairs
# coding=utf-8
# Parse the XML file
import xml.dom.minidom as xmldom
import os

'''
XML file to read:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>dasdas
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>
'''

xmlfilepath = os.path.abspath("test.xml")
print("xml file path:", xmlfilepath)

# Obtain the Document object
domobj = xmldom.parse(xmlfilepath)
print("xmldom.parse:", type(domobj))

# Obtain the root Element object
elementobj = domobj.documentElement
print("domobj.documentElement:", type(elementobj))

# Obtain the sub-tags
subElementObj = elementobj.getElementsByTagName("login")
print("getElementsByTagName:", type(subElementObj))
print(len(subElementObj))

# Obtain the tag attribute values
print(subElementObj[0].getAttribute("username"))
print(subElementObj[0].getAttribute("passwd"))

# Differentiate tags with the same tag name
subElementObj1 = elementobj.getElementsByTagName("caption")
for i in range(len(subElementObj1)):
    print("subElementObj1[i]:", type(subElementObj1[i]))
    print(subElementObj1[i].firstChild.data)  # display the data between the tag pair
Output result:
>>> D:\Pystu>python xml_instance.py
>>> xml file path: D:\Pystu\test.xml
>>> xmldom.parse: <class 'xml.dom.minidom.Document'>
>>> domobj.documentElement: <class 'xml.dom.minidom.Element'>
>>> getElementsByTagName: <class 'xml.dom.minicompat.NodeList'>
>>> 1
>>> pytest
>>> 000000
>>> subElementObj1[i]: <class 'xml.dom.minidom.Element'>
>>> Python
>>> subElementObj1[i]: <class 'xml.dom.minidom.Element'>
>>> test
>>> subElementObj1[i]: <class 'xml.dom.minidom.Element'>
>>> Zope
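Section 2.1 also mentions that DOM can modify a document and write it back. A minimal sketch of that follows, assuming a shortened, inlined copy of test.xml (the SAMPLE string and the new values are illustrative, not part of the original example):

```python
import xml.dom.minidom as xmldom

# Assumed, shortened copy of test.xml, inlined so the sketch is self-contained.
SAMPLE = (
    '<catalog><maxid>4</maxid>'
    '<login username="pytest" passwd="000000"><caption>Python</caption></login>'
    '</catalog>'
)

dom = xmldom.parseString(SAMPLE)
login = dom.documentElement.getElementsByTagName("login")[0]

# Change an attribute value in the in-memory tree.
login.setAttribute("passwd", "999999")

# Change the data between a tag pair by editing its text node.
caption = login.getElementsByTagName("caption")[0]
caption.firstChild.data = "Python 3"

# Serialize the root element back to an XML string.
result = dom.documentElement.toxml()
print(result)
```

To save the change to disk, the string returned by dom.toxml() can be written to a file opened in text mode.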
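2.2 xml.etree.ElementTree
ElementTree was designed for processing XML. The Python standard library has had two implementations of it: a pure-Python one, xml.etree.ElementTree, and a faster C one, xml.etree.cElementTree. Note: prefer the C implementation where possible, because it is faster and uses less memory. Since Python 3.3, xml.etree.ElementTree uses the C accelerator automatically when it is available; cElementTree was deprecated at that point and removed in Python 3.9, so on current versions simply import xml.etree.ElementTree.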
A. Traverse the next layer below the root node
B. Access tags, attributes, and text by subscript
C. Search for a specified tag under root
D. Traverse the XML file
E. Modify the XML file
# coding=utf-8
# Parse the XML file

'''
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
From Python 3.3 onward, the ElementTree module automatically looks for an available C library to speed things up
'''
import xml.etree.ElementTree as ET
import os
import sys

'''
XML file to read:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd='000000'>dasdas
        <caption>Python</caption>
        <item id="4">
            <caption>test</caption>
        </item>
    </login>
    <item id="2">
        <caption>Zope</caption>
    </item>
</catalog>
'''

# Traverse the XML file recursively
def traverseXml(element):
    # print(len(element))
    if len(element) > 0:
        for child in element:
            print(child.tag, "----", child.attrib)
            traverseXml(child)
    # else:
    #     print(element.tag, "----", element.attrib)

if __name__ == "__main__":
    xmlFilePath = os.path.abspath("test.xml")
    print(xmlFilePath)
    try:
        tree = ET.parse(xmlFilePath)
        print("tree type:", type(tree))

        # Obtain the root node
        root = tree.getroot()
    except Exception as e:  # catches every exception except those related to program exit (sys.exit)
        print("parse test.xml fail!")
        sys.exit()
    print("root type:", type(root))
    print(root.tag, "----", root.attrib)

    # Traverse the next layer below root
    for child in root:
        print("traverse the next layer of root", child.tag, "----", child.attrib)

    # Access by subscript
    print(root[0].text)
    print(root[1][1][0].text)

    print(20 * "*")
    # Traverse the XML file
    traverseXml(root)
    print(20 * "*")

    # Search for tags under root by tag name (direct children of root only)
    captionList = root.findall("item")
    print(len(captionList))
    for caption in captionList:
        print(caption.tag, "----", caption.attrib, "----", caption.text)

    # Modify the XML tree: change passwd to 999999
    login = root.find("login")
    passwdValue = login.get("passwd")
    print("not modify passwd:", passwdValue)
    login.set("passwd", "999999")  # modify the attribute; to modify the text, assign to login.text
    print("modify passwd:", login.get("passwd"))
Output result:
>>> D:\Pystu\test.xml
>>> tree type: <class 'xml.etree.ElementTree.ElementTree'>
>>> root type: <class 'xml.etree.ElementTree.Element'>
>>> catalog ---- {}
>>> traverse the next layer of root maxid ---- {}
>>> traverse the next layer of root login ---- {'username': 'pytest', 'passwd': '000000'}
>>> traverse the next layer of root item ---- {'id': '2'}
>>> 4
>>> test
>>> ********************
>>> maxid ---- {}
>>> login ---- {'username': 'pytest', 'passwd': '000000'}
>>> caption ---- {}
>>> item ---- {'id': '4'}
>>> caption ---- {}
>>> item ---- {'id': '2'}
>>> caption ---- {}
>>> ********************
>>> 1
>>> item ---- {'id': '2'} ----
>>> not modify passwd: 000000
>>> modify passwd: 999999
Appendix:
# coding=utf-8
'''
XML parsing class
@function - add, delete, modify, and query nodes
'''
import xml.etree.ElementTree as ET
import sys
import os.path


class XmlParse:
    def __init__(self, file_path):
        self.tree = None
        self.root = None
        self.xml_file_path = file_path

    def ReadXml(self):
        try:
            print("xmlfile:", self.xml_file_path)
            self.tree = ET.parse(self.xml_file_path)
            self.root = self.tree.getroot()
        except Exception as e:
            print("parse xml failed!")
            sys.exit()
        else:
            print("parse xml success!")
        # Return after the try block: a return inside finally would swallow sys.exit()
        return self.tree

    def CreateNode(self, tag, attrib, text):
        element = ET.Element(tag, attrib)
        element.text = text
        print("tag:%s;attrib:%s;text:%s" % (tag, attrib, text))
        return element

    def AddNode(self, Parent, tag, attrib, text):
        element = self.CreateNode(tag, attrib, text)
        # Compare against None: an Element without children evaluates as false
        if Parent is not None:
            Parent.append(element)
            # Look up a node with the new tag under Parent to confirm it is there
            el = Parent.find(tag)
            print(el.tag, "----", el.attrib, "----", el.text)
        else:
            print("parent is none")

    def WriteXml(self, destfile):
        dest_xml_file = os.path.abspath(destfile)
        self.tree.write(dest_xml_file, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    xml_file = os.path.abspath("test.xml")
    parse = XmlParse(xml_file)
    tree = parse.ReadXml()
    root = tree.getroot()
    print(root)
    parse.AddNode(root, "Python", {"age": "22", "hello": "world"}, "YES")
    parse.WriteXml("testtest.xml")
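The appendix class is described as covering add, delete, modify, and query, but its code only adds a node. The remaining operations might look like the sketch below. It is not part of the original class; the SAMPLE string is an assumed, shortened copy of test.xml, and the new maxid value is arbitrary.

```python
import xml.etree.ElementTree as ET

# Assumed, shortened copy of test.xml, inlined so the sketch is self-contained.
SAMPLE = (
    '<catalog><maxid>4</maxid>'
    '<login username="pytest" passwd="000000"><caption>Python</caption></login>'
    '<item id="2"><caption>Zope</caption></item>'
    '</catalog>'
)

root = ET.fromstring(SAMPLE)

# Modify: update the text of an existing node.
root.find("maxid").text = "5"

# Delete: remove() must be called on the direct parent of the node.
item = root.find("item")
if item is not None:
    root.remove(item)

# Query: list what is left under root and serialize the tree.
remaining_tags = [child.tag for child in root]
serialized = ET.tostring(root, encoding="unicode")

print(remaining_tags)  # ['maxid', 'login']
print(serialized)
```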
2.3 xml.sax.*
SAX is an event-driven API. Parsing XML with SAX involves two parts: a parser and an event handler.
The parser reads the XML document and sends events, such as element-start and element-end events, to the event handler.
The event handler responds to those events and processes the XML data passed to it.
Common scenarios:
(1) Processing large files
(2) You need only part of the file, or only specific information from it
(3) Building your own object model
A fuller treatment of event-driven SAX parsing will be added later.
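The parser and event handler split described above can be sketched with the standard xml.sax module. This is a minimal illustration under stated assumptions: the CaptionHandler class is hypothetical, and the SAMPLE document is an inlined copy of test.xml.

```python
import xml.sax

# Assumed copy of test.xml, inlined so the sketch is self-contained.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
    <maxid>4</maxid>
    <login username="pytest" passwd="000000">
        <caption>Python</caption>
        <item id="4"><caption>test</caption></item>
    </login>
    <item id="2"><caption>Zope</caption></item>
</catalog>"""


class CaptionHandler(xml.sax.ContentHandler):
    """Hypothetical event handler that collects the text of every <caption>."""

    def __init__(self):
        super().__init__()
        self.captions = []
        self._in_caption = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Element-start event: begin buffering when a <caption> opens.
        if name == "caption":
            self._in_caption = True
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one text run in several pieces.
        if self._in_caption:
            self._buffer.append(content)

    def endElement(self, name):
        # Element-end event: store the buffered text when </caption> closes.
        if name == "caption":
            self.captions.append("".join(self._buffer))
            self._in_caption = False


handler = CaptionHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.captions)  # ['Python', 'test', 'Zope']
```

Only the caption text is kept in memory here, which is what makes this style suitable for large files or for extracting specific information.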