Serialization:
Serialization usually uses the tostring() function, which returns a string, or the ElementTree.write() method, which writes to a file, a file-like object, or a URL (via FTP PUT or HTTP POST). Both accept the same keyword arguments, such as pretty_print for formatted output or encoding to select a specific output encoding other than plain ASCII.
>>> from lxml import etree
>>> root = etree.XML("<root><a><b/></a></root>")
>>> etree.tostring(root)
b'<root><a><b/></a></root>'
>>> print(etree.tostring(root, xml_declaration=True).decode())
<?xml version='1.0' encoding='ASCII'?>
<root><a><b/></a></root>
>>> print(etree.tostring(root, encoding="iso-8859-1").decode("iso-8859-1"))
<?xml version='1.0' encoding='iso-8859-1'?>
<root><a><b/></a></root>
>>> print(etree.tostring(root, pretty_print=True).decode(), end="")
<root>
  <a>
    <b/>
  </a>
</root>
Note that pretty printing appends a newline at the end.
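The ElementTree.write() method mentioned above has no example of its own, so here is a minimal sketch of writing a tree to a file-like object. It assumes an in-memory io.BytesIO buffer stands in for a real file, and the helper name serialize_to_buffer is hypothetical. As a further assumption, the import falls back to the standard library's xml.etree.ElementTree, which offers the same write() keywords used here, in case lxml is not installed.

```python
import io

try:
    from lxml import etree
except ImportError:  # assumption: fall back to the stdlib ElementTree API
    import xml.etree.ElementTree as etree


def serialize_to_buffer(root, encoding="UTF-8"):
    """Write the document rooted at `root` into a buffer and return its bytes."""
    tree = etree.ElementTree(root)  # wrap the Element in a document
    buffer = io.BytesIO()           # stands in for a file opened in binary mode
    tree.write(buffer, encoding=encoding, xml_declaration=True)
    return buffer.getvalue()


root = etree.XML("<root><a><b/></a></root>")
data = serialize_to_buffer(root)
print(data.decode("UTF-8"))
```

Passing a filename instead of the buffer writes the same bytes to disk.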
Since lxml 2.0, the serialization functions can do more than XML serialization: you can serialize to HTML or extract the text content by passing the method keyword.
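>>> root = etree.XML(
...     "<html><head/><body><p>Hello<br/>World</p></body></html>")
>>> etree.tostring(root)  # default: method="xml"
b'<html><head/><body><p>Hello<br/>World</p></body></html>'
>>> etree.tostring(root, method="xml")  # same as above
b'<html><head/><body><p>Hello<br/>World</p></body></html>'
>>> etree.tostring(root, method="html")
b'<html><head></head><body><p>Hello<br>World</p></body></html>'
>>> print(etree.tostring(root, method="html", pretty_print=True).decode(), end="")
<html>
<head></head>
<body><p>Hello<br>World</p></body>
</html>
>>> etree.tostring(root, method="text")
b'HelloWorld'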
As with XML serialization, the default encoding for plain text serialization is ASCII.
>>> br = root.find(".//br")
>>> br.tail = "W\xf6rld"
>>> etree.tostring(root, method="text")  # doctest: +ELLIPSIS
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' ...
>>> etree.tostring(root, method="text", encoding="UTF-8")
b'HelloW\xc3\xb6rld'
>>> etree.tostring(root, encoding="unicode", method="text")
'HelloWörld'
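To compare the encoding options discussed above side by side, the following sketch serializes the same element three ways. The helper name text_variants is hypothetical. As an assumption, the import falls back to the standard library's xml.etree.ElementTree, which accepts the same encoding and method keywords, when lxml is not available.

```python
try:
    from lxml import etree
except ImportError:  # assumption: the stdlib ElementTree API is an acceptable stand-in
    import xml.etree.ElementTree as etree


def text_variants(element):
    """Return the element serialized with three different encoding choices."""
    return {
        # Default ASCII XML output: non-ASCII characters become character references.
        "xml_ascii": etree.tostring(element),
        # Text content only, encoded as UTF-8 bytes.
        "text_utf8": etree.tostring(element, method="text", encoding="UTF-8"),
        # encoding="unicode" returns a str instead of bytes.
        "text_str": etree.tostring(element, method="text", encoding="unicode"),
    }


root = etree.XML("<p>Hello<br/>W\xf6rld</p>")
for name, value in text_variants(root).items():
    print(name, repr(value))
```

Only the first two variants return bytes. The third returns a Python string, which is convenient when the result is processed further in memory rather than written to a file.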
ElementTree class:
An ElementTree is mainly a document wrapper around a tree with a root node. It provides a number of methods for parsing, serialization, and general document handling. The biggest difference from an Element is that an ElementTree is serialized as a complete document, including its DOCTYPE and DTD content, whereas an Element is serialized as a single element.
>>> from io import BytesIO
>>> tree = etree.parse(BytesIO(b"""\
... <?xml version="1.0"?>
... <!DOCTYPE root SYSTEM "test" [ <!ENTITY tasty "eggs"> ]>
... <root>
...   <a>&tasty;</a>
... </root>
... """))
>>> print(tree.docinfo.doctype)
<!DOCTYPE root SYSTEM "test">
>>> # lxml 1.3.4 and later
>>> print(etree.tostring(tree).decode())
<!DOCTYPE root SYSTEM "test" [
<!ENTITY tasty "eggs">
]>
<root>
  <a>eggs</a>
</root>
>>> # lxml 1.3.4 and later
>>> print(etree.tostring(etree.ElementTree(tree.getroot())).decode())
<!DOCTYPE root SYSTEM "test" [
<!ENTITY tasty "eggs">
]>
<root>
  <a>eggs</a>
</root>
>>> # root Element only (what ElementTree and lxml <= 1.3.3 output for a tree)
>>> print(etree.tostring(tree.getroot()).decode())
<root>
  <a>eggs</a>
</root>
Parse from strings and files:
The fromstring() function is the easiest way to parse a string.
>>> some_xml_data = "<root>data</root>"
>>> root = etree.fromstring(some_xml_data)
>>> print(root.tag)
root
>>> etree.tostring(root)
b'<root>data</root>'
The XML() function behaves like fromstring(), but it is mainly used to write XML literals directly into the source code.
>>> root = etree.XML("<root>data</root>")
>>> print(root.tag)
root
>>> etree.tostring(root)
b'<root>data</root>'
The parse() function is used to parse from files and file-like objects.
>>> from io import BytesIO
>>> some_file_like = BytesIO(b"<root>data</root>")
>>> tree = etree.parse(some_file_like)
>>> etree.tostring(tree)
b'<root>data</root>'
Note that parse() returns an ElementTree object, not an Element object as the string parsing functions do.
>>> root = tree.getroot()
>>> print(root.tag)
root
>>> etree.tostring(root)
b'<root>data</root>'
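Since parse() also accepts a filename, here is a small sketch that round-trips a document through a temporary file. The file name example.xml and the helper root_tag_of are made up for illustration. As an assumption, the import falls back to the standard library's xml.etree.ElementTree, whose parse() and write() behave the same way for this case.

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:  # assumption: the stdlib ElementTree API is an acceptable stand-in
    import xml.etree.ElementTree as etree


def root_tag_of(path):
    """Parse the XML file at `path` and return the tag of its root element."""
    tree = etree.parse(path)   # parse() returns an ElementTree
    return tree.getroot().tag  # getroot() gives the root Element


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.xml")  # hypothetical file name
    etree.ElementTree(etree.XML("<root>data</root>")).write(path)
    tag = root_tag_of(path)

print(tag)  # prints: root
```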