Use the xml.etree.ElementTree module to parse XML files. It provides two classes for this purpose: ElementTree, which represents the whole XML document as a tree, and Element, which represents a single node in that tree.
The examples below operate on the following XML file: migapp.xml
The ElementTree module can be imported like this:

    import xml.etree.ElementTree as ET

Alternatively, you can import only the parse() function:

    from xml.etree.ElementTree import parse
First open the XML file. For a local file use open(); for a file on the Internet use urlopen() (urllib.request.urlopen in Python 3):

    f = open('migapp.xml', 'rt', encoding='utf-8')

Then parse the XML.
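Opening a local file and opening a URL can be folded into one step. The sketch below uses a hypothetical helper, load_tree(), that picks urlopen() for http(s) sources and open() otherwise, and hands the resulting file object to ET.parse(). It opens local files in binary mode so the parser follows the encoding in the XML declaration (the text-mode open() shown above also works). The demo parses a tiny hypothetical document written to a temporary file, not the real migapp.xml.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def load_tree(source):
    """Parse XML from a local path or an http(s) URL (hypothetical helper)."""
    if source.startswith(('http://', 'https://')):
        # urlopen() returns a binary file-like object that ET.parse() can read
        with urlopen(source) as resp:
            return ET.parse(resp)
    # Binary mode lets the parser honour the encoding in the XML declaration
    with open(source, 'rb') as f:
        return ET.parse(f)


# Demo: write a tiny hypothetical document to a temporary file, then parse it
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.xml')
    with open(path, 'w', encoding='utf-8') as out:
        out.write('<?xml version="1.0" encoding="utf-8"?>\n'
                  '<migration><environment name="GlobalEnv"/></migration>')
    root_tag = load_tree(path).getroot().tag

print(root_tag)  # migration
```

ET.parse() reads the whole document before returning, so closing the file right afterwards is safe.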
1 Parsing XML files
1.1 Parsing the root element

    tree = ET.parse(f)
    root = tree.getroot()
    print('root.tag =', root.tag)
    print('root.attrib =', root.attrib)
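When the XML is already in a string, ET.fromstring() parses it and returns the root Element directly, which is convenient for experimenting. The document below is a hypothetical stand-in, since the contents of migapp.xml are not reproduced here; only its shape matters.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for migapp.xml
SAMPLE = '''<migration urlid="http://example.com/migapp">
  <library prefix="MigSysHelper">MigSys.dll</library>
  <environment name="GlobalEnv"/>
</migration>'''

# fromstring() returns the root Element, not an ElementTree
root = ET.fromstring(SAMPLE)
print('root.tag =', root.tag)        # root.tag = migration
print('root.attrib =', root.attrib)  # root.attrib = {'urlid': 'http://example.com/migapp'}
```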
1.2 Parsing the children of the root

    for child in root:  # yields only root's direct children, not deeper descendants
        print(child.tag)
        print(child.attrib)  # attrib is a dict
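The difference between direct children and all descendants is easy to see on a small hypothetical document: iterating over an Element stops at the first level, while Element.iter() with no argument walks the whole subtree, starting with the element itself.

```python
import xml.etree.ElementTree as ET

SAMPLE = '''<migration>
  <library prefix="MigSysHelper">MigSys.dll</library>
  <environment name="GlobalEnv">
    <variable name="HklmWowSoftware"/>
  </environment>
</migration>'''

root = ET.fromstring(SAMPLE)

# Iterating over an Element yields only its direct children
child_tags = [child.tag for child in root]
print(child_tags)  # ['library', 'environment']

# root.iter() walks the whole subtree in document order, including root itself
all_tags = [elem.tag for elem in root.iter()]
print(all_tags)  # ['migration', 'library', 'environment', 'variable']
```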
1.3 Accessing descendants of the root by index

    print(root[1][1].tag)
    print(root[1][1].text)
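An Element behaves like a list of its children, so indices can be chained to reach deeper nodes. A minimal sketch on a hypothetical document; note that an element's .text is only the text before its first child, so the text inside a nested element is reached through one more index.

```python
import xml.etree.ElementTree as ET

SAMPLE = '''<migration>
  <library prefix="MigSysHelper">MigSys.dll</library>
  <environment name="GlobalEnv">
    <conditions/>
    <variable name="HklmWowSoftware"><text>HKLM-Software</text></variable>
  </environment>
</migration>'''

root = ET.fromstring(SAMPLE)

# root[1] is root's second child; root[1][1] is that element's second child
target = root[1][1]
print(target.tag)          # variable
print(target.get('name'))  # HklmWowSoftware
print(target[0].text)      # HKLM-Software

# Indexing past the end raises IndexError, so check len() when unsure
print(len(root[1]))  # 2
```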
1.4 Iterating over all elements with a given tag

Unlike iterating over root directly, iter() visits matching elements at any depth:

    for element in root.iter('environment'):
        print(element.attrib)
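The contrast between iter() and findall() with the same tag is worth keeping in mind. In the hypothetical document below, one environment element sits directly under the root and another is nested one level deeper; iter() finds both, findall() only the first.

```python
import xml.etree.ElementTree as ET

SAMPLE = '''<migration>
  <environment name="GlobalEnv">
    <variable name="A"/>
  </environment>
  <namedElements>
    <environment name="GlobalEnvX64"/>
  </namedElements>
</migration>'''

root = ET.fromstring(SAMPLE)

# iter('environment') finds matching elements at any depth
deep = [e.get('name') for e in root.iter('environment')]
print(deep)     # ['GlobalEnv', 'GlobalEnvX64']

# findall('environment') only looks at root's direct children
shallow = [e.get('name') for e in root.findall('environment')]
print(shallow)  # ['GlobalEnv']
```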
1.5 A few useful methods

    # Element.findall() finds all direct children with the given tag
    # Element.find() finds the first direct child with the given tag
    # Element.get() returns the value of the given attribute
    for environment in root.findall('environment'):
        first_variable = environment.find('variable')
        print(first_variable.get('name'))
2 Modifying an XML file
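Suppose we want to add the attribute size="50" to every text element, change its text to "Benxin Tuzi", and add a child element date whose text is 2016/01/16:

    for text_elem in root.iter('text'):
        text_elem.set('size', '50')              # add the attribute size="50"
        text_elem.text = 'Benxin Tuzi'           # replace the element's text
        date = ET.SubElement(text_elem, 'date')  # append a <date> child
        date.text = '2016/01/16'
    tree.write('output.xml')

Note that keyword arguments passed to ET.Element() become attributes, so ET.Element('date', text='2016/01/16') would produce <date text="2016/01/16" /> rather than an element whose text is 2016/01/16.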
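To check the effect of such edits without writing a file, the same changes can be applied to a small hypothetical fragment and serialized with ET.tostring(). The sketch also shows the pitfall of passing text= to ET.Element(), which creates an attribute instead of element text.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment containing one <text> element
SAMPLE = '<variable name="HklmWowSoftware"><text>HKLM-Software</text></variable>'
root = ET.fromstring(SAMPLE)

for text_elem in root.iter('text'):
    text_elem.set('size', '50')              # new attribute
    text_elem.text = 'Benxin Tuzi'           # new text content
    date = ET.SubElement(text_elem, 'date')  # new child element
    date.text = '2016/01/16'

# encoding='unicode' returns a str without an XML declaration
result = ET.tostring(root, encoding='unicode')
print(result)
# <variable name="HklmWowSoftware"><text size="50">Benxin Tuzi<date>2016/01/16</date></text></variable>

# By contrast, keyword arguments to ET.Element() become attributes
wrong_xml = ET.tostring(ET.Element('date', text='2016/01/16'), encoding='unicode')
print(wrong_xml)  # <date text="2016/01/16" />
```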
Part of migapp.xml:
The corresponding part of output.xml:
3 Points to note
Problem: running the script fails with

    ImportError: No module named 'xml.etree'; 'xml' is not a package
Analysis:
This happens because import searches the directory containing the script before the standard library. There it finds a file named xml.py that we wrote ourselves, and our xml.py is of course not a package, so xml.etree cannot be imported from it.
Note:
Deleting xml.py alone may not fix the error. An xml.pyc file was also generated in the same directory, and the interpreter can still import the module from that compiled file after xml.py is gone, so xml.pyc must be deleted as well. (This applies where .pyc files are written next to the source, as in Python 2; Python 3 keeps them in a __pycache__ directory and ignores them once the source file is deleted.)
Conclusion:
Avoid giving your own files the same name as a package or module. Even if your script does not use that module or package directly, the name clash can cause strange errors.
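One way to find out which file Python will actually import under the name xml is to ask the import system with importlib.util.find_spec(), a sketch that prints the location and checks whether the result is a package. The standard-library xml is a package, so its spec has submodule search locations; a stray xml.py or xml.pyc picked up from the script's directory would not.

```python
import importlib.util

# Ask the import system where 'xml' resolves to
spec = importlib.util.find_spec('xml')
print(spec.origin)  # path of the file that would be loaded

# Packages have submodule_search_locations; plain modules have None
is_package = spec.submodule_search_locations is not None
print('xml is a package:', is_package)
```

If the printed path points into your own project directory, a file there is shadowing the standard library.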
Many of the parsing functions in the ElementTree module read the entire XML document into memory first. That is a problem for large XML documents, especially when reading XML from the network or a pipe, where non-blocking parsing matters.

In that case you can use the XMLPullParser class of the ElementTree module, which accepts data incrementally and never blocks. Alternatively, the module's iterparse() function processes a document incrementally; combined with clearing elements once they have been handled, it avoids holding the whole tree in memory when parsing large XML.
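A minimal sketch of both approaches on a hypothetical document. The chunk list stands in for data arriving piece by piece from a socket or pipe; XMLPullParser.feed() parses whatever has arrived and read_events() hands back completed elements. The iterparse() part reads from an in-memory file object and clears each element after use, which is what keeps memory usage bounded on large inputs.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document, split into chunks as if received over the network
CHUNKS = [
    '<migration><environment name="A">',
    '<variable name="v1"/></environment>',
    '<environment name="B"/></migration>',
]

parser = ET.XMLPullParser(events=('end',))
pulled = []


def drain():
    """Collect the names of any <environment> elements completed so far."""
    for event, elem in parser.read_events():
        if elem.tag == 'environment':
            pulled.append(elem.get('name'))


for chunk in CHUNKS:
    parser.feed(chunk)  # non-blocking: parses only the data fed so far
    drain()
parser.close()
drain()  # events not yet retrieved can still be read after close()
print(pulled)  # ['A', 'B']

# iterparse() reads a file object incrementally; clearing each handled
# element keeps the in-memory tree small
source = io.BytesIO(''.join(CHUNKS).encode('utf-8'))
streamed = []
for event, elem in ET.iterparse(source, events=('end',)):
    if elem.tag == 'environment':
        streamed.append(elem.get('name'))
        elem.clear()
print(streamed)  # ['A', 'B']
```

XMLPullParser suits push-style input where your code receives bytes and decides when to feed them; iterparse() suits pull-style input where the parser can read from a file object itself.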