Although the xml.etree.ElementTree library is most often used for parsing, it can also be used to create XML documents. For example, consider the following function:
from xml.etree.ElementTree import Element

def dict_to_xml(tag, d):
    """Turn a simple dict of key/value pairs into XML."""
    elem = Element(tag)
    for key, val in d.items():
        child = Element(key)
        child.text = str(val)
        elem.append(child)
    return elem
Here is an example of use:
>>> s = {'name': 'GOOG', 'shares': 100, 'price': 490.1}
>>> e = dict_to_xml('stock', s)
>>> e
<Element 'stock' at 0x...>
>>>
The result of this conversion is an Element instance. For I/O, it is easy to convert it to a byte string using the tostring() function in xml.etree.ElementTree. For example:
>>> from xml.etree.ElementTree import tostring
>>> tostring(e)
b'<stock><price>490.1</price><shares>100</shares><name>GOOG</name></stock>'
>>>
If you want to attach attributes to an element, use its set() method:
>>> e.set('_id', '1234')
>>> tostring(e)
b'<stock _id="1234"><price>490.1</price><shares>100</shares><name>GOOG</name></stock>'
>>>
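As a small sketch of the same idea, attributes can also be supplied as keyword arguments when the element is created, and tostring() returns text rather than bytes when encoding='unicode' is passed. The 'currency' attribute and the variable names below are illustrative assumptions, not part of the original example.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Attributes can be passed as keyword arguments at construction time ...
stock = Element('stock', _id='1234')

# ... or added (and overwritten) afterwards with set()
stock.set('currency', 'USD')   # 'currency' is a made-up attribute for illustration

name = SubElement(stock, 'name')
name.text = 'GOOG'

# encoding='unicode' yields a str instead of a byte string
xml_text = tostring(stock, encoding='unicode')
print(xml_text)
```

Attribute values, like element text, are escaped automatically when the element is serialized.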
If the order of the elements matters, consider building an OrderedDict instead of an ordinary dictionary (plain dictionaries only guarantee insertion order from Python 3.7 onward). When creating XML, you might be tempted to just build strings instead. For example:
def dict_to_xml_str(tag, d):
    '''Turn a simple dict of key/value pairs into XML'''
    parts = ['<{}>'.format(tag)]
    for key, val in d.items():
        parts.append('<{0}>{1}</{0}>'.format(key, val))
    parts.append('</{}>'.format(tag))
    return ''.join(parts)
The problem is that you are asking for trouble if you build the XML by hand like this. For example, what happens when a value in the dictionary contains special characters?
>>> d = {'name': '<spam>'}
>>> # String creation
>>> dict_to_xml_str('item', d)
'<item><name><spam></name></item>'
>>> # Proper XML creation
>>> e = dict_to_xml('item', d)
>>> tostring(e)
b'<item><name>&lt;spam&gt;</name></item>'
>>>
Note that in the second case the characters '<' and '>' are replaced by &lt; and &gt;.
Just for reference, if you ever need to convert these characters by hand, you can use the escape() and unescape() functions in xml.sax.saxutils. For example:
>>> from xml.sax.saxutils import escape, unescape
>>> escape('<spam>')
'&lt;spam&gt;'
>>> unescape(_)
'<spam>'
>>>
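Both functions also accept an optional dictionary of extra replacements, and the same module provides quoteattr() for attribute values. The following is a minimal sketch of those calls; the sample strings are made up for illustration.

```python
from xml.sax.saxutils import escape, unescape, quoteattr

raw = 'Tom & "Jerry" <cartoon>'

# escape() always handles &, < and >; extra replacements go in a dict
escaped = escape(raw, {'"': '&quot;'})
print(escaped)

# unescape() takes the reverse mapping to undo the extra replacement
restored = unescape(escaped, {'&quot;': '"'})
print(restored == raw)

# quoteattr() escapes a value and wraps it in quotes for use as an attribute
attr = quoteattr('a < b')
print(attr)
```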
Apart from producing correct output, there is another reason to create Element instances rather than strings: it is hard to build a larger document by gluing strings together. Element instances, on the other hand, can be manipulated in many ways without ever having to worry about parsing XML text. That is, you can do all of your work on a high-level data structure and emit it as a string only at the very end.
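To make this concrete, here is a rough sketch of working at the Element level and serializing only once. The build_portfolio() helper, the <portfolio> layout and the 'size' attribute are hypothetical names introduced for this illustration.

```python
from xml.etree.ElementTree import Element, SubElement, tostring

def build_portfolio(holdings):
    """Build a <portfolio> element from a list of dicts (hypothetical layout)."""
    root = Element('portfolio')
    for holding in holdings:
        stock = SubElement(root, 'stock')
        for key, val in holding.items():
            SubElement(stock, key).text = str(val)
    return root

portfolio = build_portfolio([
    {'name': 'GOOG', 'shares': 100},
    {'name': 'R&D <Labs>', 'shares': 50},
])

# Manipulate the tree directly; no XML text has been generated or parsed yet
for stock in portfolio.findall('stock'):
    if int(stock.findtext('shares')) < 75:
        stock.set('size', 'small')

# Serialize once, at the very end; special characters are escaped for us
xml_text = tostring(portfolio, encoding='unicode')
print(xml_text)
```

Because the elements are ordinary objects, they can be nested, queried and modified freely before any text is produced.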
Parsing XML documents with namespaces
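Consider a document in which part of the content, here an <html> element, belongs to an XML namespace (http://www.w3.org/1999/xhtml). If you parse such a document and run ordinary queries, you will find that they no longer work so easily, because every query becomes quite verbose.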
>>> # Some queries that work
>>> doc.findtext('author')
'David Beazley'
>>> doc.find('content')
<Element 'content' at 0x...>
>>> # A query involving a namespace (doesn't work)
>>> doc.find('content/html')
>>> # Works if fully qualified
>>> doc.find('content/{http://www.w3.org/1999/xhtml}html')
<Element '{http://www.w3.org/1999/xhtml}html' at 0x...>
>>> # Doesn't work
>>> doc.findtext('content/{http://www.w3.org/1999/xhtml}html/head/title')
>>> # Fully qualified
>>> doc.findtext('content/{http://www.w3.org/1999/xhtml}html/'
... '{http://www.w3.org/1999/xhtml}head/{http://www.w3.org/1999/xhtml}title')
'Hello World'
>>>
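For readers who want to reproduce these queries, the following sketch uses a small stand-in document. The XML text is an assumption, reconstructed only so that it matches the element names and values used in the queries above, and it is parsed from a string rather than from a file.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical sample document, reconstructed only to match the queries above
SAMPLE = """\
<top>
  <author>David Beazley</author>
  <content>
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head><title>Hello World</title></head>
    </html>
  </content>
</top>
"""

doc = fromstring(SAMPLE)
XHTML = '{http://www.w3.org/1999/xhtml}'

# Elements outside the namespace are found by their plain names
author = doc.findtext('author')

# An unqualified name does not match a namespaced element, so this is None
unqualified = doc.find('content/html')

# Every namespaced step in the path has to be fully qualified
qualified = doc.find('content/' + XHTML + 'html')
title = doc.findtext('content/{0}html/{0}head/{0}title'.format(XHTML))

print(author, unqualified, qualified.tag, title)
```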
You can simplify matters a bit by wrapping the namespace handling in a utility class:
class XMLNamespaces:
    def __init__(self, **kwargs):
        self.namespaces = {}
        for name, uri in kwargs.items():
            self.register(name, uri)

    def register(self, name, uri):
        # Store the URI in the '{uri}' form that ElementTree expects
        self.namespaces[name] = '{' + uri + '}'

    def __call__(self, path):
        # Expand '{name}' placeholders in the path to '{uri}'
        return path.format_map(self.namespaces)
Use this class in the following way:
>>> ns = XMLNamespaces(html='http://www.w3.org/1999/xhtml')
>>> doc.find(ns('content/{html}html'))
<Element '{http://www.w3.org/1999/xhtml}html' at 0x...>
>>> doc.findtext(ns('content/{html}html/{html}head/{html}title'))
'Hello World'
>>>
Discussion
Parsing XML documents that contain namespaces can be tedious. The XMLNamespaces class above simply lets you use a short name in place of the full URI in each query, which makes things a little more concise.
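As an alternative to a wrapper class, the find(), findtext() and findall() methods in ElementTree accept an optional namespaces mapping from prefix to URI. The sketch below is not the approach taken above; it assumes a small hypothetical document with the same shape as the one being queried.

```python
from xml.etree.ElementTree import fromstring

# Hypothetical document with the same shape as the one queried above
doc = fromstring(
    '<top><content>'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Hello World</title></head>'
    '</html></content></top>'
)

# Map a prefix of our own choosing to the namespace URI
ns = {'html': 'http://www.w3.org/1999/xhtml'}

# 'prefix:tag' steps are expanded using the mapping
html = doc.find('content/html:html', ns)
title = doc.findtext('content/html:html/html:head/html:title', namespaces=ns)

print(html.tag)
print(title)
```

The prefix used in the query is local to the mapping; it does not have to match any prefix used in the document itself.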
Unfortunately, the basic ElementTree parser offers no mechanism for getting further information about namespaces. However, you can get a bit more information about the scope of namespace processing if you use the iterparse() function. For example:
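>>> from xml.etree.ElementTree import iterparse
>>> for evt, elem in iterparse('ns2.xml', ('end', 'start-ns', 'end-ns')):
...     print(evt, elem)
...
end <Element 'author' at 0x...>
start-ns ('', 'http://www.w3.org/1999/xhtml')
end <Element '{http://www.w3.org/1999/xhtml}title' at 0x...>
end <Element '{http://www.w3.org/1999/xhtml}head' at 0x...>
end <Element '{http://www.w3.org/1999/xhtml}h1' at 0x...>
end <Element '{http://www.w3.org/1999/xhtml}body' at 0x...>
end <Element '{http://www.w3.org/1999/xhtml}html' at 0x...>
end-ns None
end <Element 'content' at 0x...>
end <Element 'top' at 0x...>
>>> elem # This is the topmost element
<Element 'top' at 0x...>
>>>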
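One possible use of these events is to collect the namespace declarations while parsing. The sketch below assumes an in-memory stand-in for a file such as ns2.xml; the document content and the variable names are illustrative only.

```python
import io
from xml.etree.ElementTree import iterparse

# Hypothetical in-memory document standing in for a file such as ns2.xml
data = io.BytesIO(
    b'<top><author>David Beazley</author><content>'
    b'<html xmlns="http://www.w3.org/1999/xhtml">'
    b'<head><title>Hello World</title></head>'
    b'</html></content></top>'
)

namespaces = {}   # prefix -> URI, filled in from 'start-ns' events
tags = []         # tags in the order their 'end' events arrive

for evt, payload in iterparse(data, ('end', 'start-ns')):
    if evt == 'start-ns':
        prefix, uri = payload      # for 'start-ns' the payload is a (prefix, uri) tuple
        namespaces[prefix] = uri
    else:
        tags.append(payload.tag)   # for 'end' the payload is the Element

print(namespaces)
print(tags)
```

A default namespace (one declared with a plain xmlns attribute) is reported with an empty string as its prefix.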
Finally, if the text you are parsing uses namespaces in addition to other advanced XML features, you are better off using the lxml library instead of ElementTree. For example, lxml provides better support for validating documents against a DTD, more complete XPath support, and other advanced XML features. This recipe is really just a simple fix to make parsing a little easier.