Python lxml Library Study Notes, Part 3

Serialisation:

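Serialisation usually uses either the tostring() function, which returns the serialised document (bytes by default), or the ElementTree.write() method, which writes to a file, a file-like object, or a URL (via FTP PUT or HTTP POST). Both accept the same keyword arguments, such as pretty_print for formatted output or encoding to select a specific output encoding other than plain ASCII.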

>>> from lxml import etree
>>> root = etree.XML("<root><a><b/></a></root>")
>>> etree.tostring(root)
b'<root><a><b/></a></root>'

>>> print(etree.tostring(root, xml_declaration=True).decode())
<?xml version='1.0' encoding='ASCII'?>
<root><a><b/></a></root>

>>> print(etree.tostring(root, encoding="iso-8859-1").decode("iso-8859-1"))
<?xml version='1.0' encoding='iso-8859-1'?>
<root><a><b/></a></root>

>>> print(etree.tostring(root, pretty_print=True).decode(), end="")
<root>
  <a>
    <b/>
  </a>
</root>

Note that pretty printing appends a newline at the end.
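The opening paragraph of this section says ElementTree.write() takes the same keyword arguments as tostring(). A minimal sketch of that, assuming an in-memory io.BytesIO stands in for a real file (a filename would work too); the names `out`, `written` and `same` are only illustrative:

```python
from io import BytesIO

from lxml import etree

root = etree.XML("<root><a><b/></a></root>")
tree = etree.ElementTree(root)

# write() accepts the same keywords as tostring(): here an XML declaration,
# an explicit output encoding and pretty printing.
out = BytesIO()
tree.write(out, xml_declaration=True, encoding="UTF-8", pretty_print=True)
written = out.getvalue()

# For comparison: tostring() on the same tree with the same keywords.
same = etree.tostring(tree, xml_declaration=True, encoding="UTF-8", pretty_print=True)

print(written.decode("utf-8"), end="")
print(written == same)
```

Writing to a BytesIO keeps the sketch free of temporary files; the bytes it collects are exactly what a real file would receive, including the trailing newline added by pretty printing.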


Since lxml 2.0, the serialisation functions can do more than XML serialisation: passing the method keyword lets you serialise to HTML or extract the text content.

>>> root = etree.XML("<html><head/><body><p>Hello<br/>World</p></body></html>")

>>> etree.tostring(root)  # default: method = 'xml'
b'<html><head/><body><p>Hello<br/>World</p></body></html>'

>>> etree.tostring(root, method="xml")  # same as above
b'<html><head/><body><p>Hello<br/>World</p></body></html>'

>>> etree.tostring(root, method="html")
b'<html><head></head><body><p>Hello<br>World</p></body></html>'

>>> print(etree.tostring(root, method="html", pretty_print=True).decode(), end="")
<html>
<head></head>
<body><p>Hello<br>World</p></body>
</html>

>>> etree.tostring(root, method="text")
b'HelloWorld'

As with XML serialisation, the default encoding for plain text serialisation is ASCII:

>>> br = root.find(".//br")
>>> br.tail = "W\xf6rld"

>>> etree.tostring(root, method="text")  # doctest: +ELLIPSIS
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' ...

>>> etree.tostring(root, method="text", encoding="UTF-8")
b'HelloW\xc3\xb6rld'

>>> etree.tostring(root, method="text", encoding="unicode")
'HelloWörld'
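For plain text extraction there is also Element.itertext(), which yields the same text and tail strings that method="text" concatenates, already as Python str, so no output encoding is involved. A short sketch comparing the two on the document used above (rebuilt here so the block runs on its own):

```python
from lxml import etree

root = etree.XML("<html><head/><body><p>Hello<br/>World</p></body></html>")
# Put a non-ASCII character into the tail of <br>, as in the session above.
root.find(".//br").tail = "W\xf6rld"

# method="text" with encoding="unicode" returns a str instead of bytes.
via_tostring = etree.tostring(root, method="text", encoding="unicode")

# itertext() walks the tree and yields text and tail strings in document order.
via_itertext = "".join(root.itertext())

print(via_tostring)
print(via_itertext)
```

Which one to prefer is a matter of taste; itertext() is handy when you want to filter or process the individual text pieces before joining them.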

The ElementTree class:

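An ElementTree is mainly a document wrapper around a tree with a root node. It provides several methods for parsing, serialisation and general document handling. One of the bigger differences is that it serialises as a complete document, as opposed to a single Element, so document-level content such as the DOCTYPE and its internal subset is kept in the output: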

>>> from io import BytesIO
>>> tree = etree.parse(BytesIO(b"""\
... <?xml version="1.0"?>
... <!DOCTYPE root SYSTEM "test" [ <!ENTITY tasty "eggs"> ]>
... <root>
...   <a>&tasty;</a>
... </root>
... """))
>>> print(tree.docinfo.doctype)
<!DOCTYPE root SYSTEM "test">

>>> # lxml 1.3.4 and later
>>> print(etree.tostring(tree).decode())
<!DOCTYPE root SYSTEM "test" [
<!ENTITY tasty "eggs">
]>
<root>
  <a>eggs</a>
</root>

>>> # lxml 1.3.4 and later
>>> print(etree.tostring(etree.ElementTree(tree.getroot())).decode())
<!DOCTYPE root SYSTEM "test" [
<!ENTITY tasty "eggs">
]>
<root>
  <a>eggs</a>
</root>

>>> # ElementTree and lxml <= 1.3.3
>>> print(etree.tostring(tree.getroot()).decode())
<root>
  <a>eggs</a>
</root>
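The same whole-document behaviour applies to comments and processing instructions that sit outside the root element. A small sketch, assuming a made-up document with one top-level comment (the comment text and the `<doc>` element are only illustrative): serialising the ElementTree keeps the comment, serialising the root Element alone drops it.

```python
from io import BytesIO

from lxml import etree

data = b"""<?xml version="1.0"?>
<!-- a top-level comment, outside the root element -->
<doc><item>1</item></doc>
"""
tree = etree.parse(BytesIO(data))

# Serialising the ElementTree writes the whole document, including the comment.
whole_document = etree.tostring(tree)
# Serialising only the root Element writes just <doc> and its subtree.
root_only = etree.tostring(tree.getroot())

# docinfo exposes document-level information parsed from the prolog.
print(tree.docinfo.xml_version, tree.docinfo.root_name)
print(whole_document.decode())
print(root_only.decode())
```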

Parsing from strings and files:

fromstring() is the easiest way to parse a string:

>>> some_xml_data = "<root>data</root>"
>>> root = etree.fromstring(some_xml_data)
>>> print(root.tag)
root
>>> etree.tostring(root)
b'<root>data</root>'

The XML() function behaves like fromstring(), but it is commonly used to write XML literals directly into the source code:
>>> root = etree.XML("<root>data</root>")
>>> print(root.tag)
root
>>> etree.tostring(root)
b'<root>data</root>'
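Both fromstring() and XML() accept str as well as bytes. One practical caveat, based on lxml's behaviour for decoded input: a str that carries an XML declaration with an encoding is rejected with ValueError, because the text has already been decoded; passing the raw bytes works. A minimal sketch (the sample document is illustrative):

```python
from lxml import etree

declared = '<?xml version="1.0" encoding="UTF-8"?><root>data</root>'

# A str with an encoding declaration is rejected with ValueError.
try:
    etree.fromstring(declared)
    str_rejected = False
except ValueError:
    str_rejected = True

# The same document as bytes parses fine, with either function.
root_a = etree.fromstring(declared.encode("utf-8"))
root_b = etree.XML(declared.encode("utf-8"))

print(str_rejected, root_a.tag, root_b.text)
```

In practice this means data read from a file or the network is best handed to the parser as bytes, letting lxml honour the declared encoding itself.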

The parse() function is used to parse from files and file-like objects:
>>> from io import BytesIO
>>> some_file_like = BytesIO(b"<root>data</root>")
>>> tree = etree.parse(some_file_like)
>>> etree.tostring(tree)
b'<root>data</root>'

Note that parse() returns an ElementTree object, not an Element object as the string parser functions do:

>>> root = tree.getroot()
>>> print(root.tag)
root
>>> etree.tostring(root)
b'<root>data</root>'
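parse() also accepts a filename instead of a file-like object. A sketch using a temporary file (its name and location do not matter here) that also checks the distinction noted above: parse() gives an ElementTree, while getroot() and fromstring() give Elements.

```python
import os
import tempfile

from lxml import etree

# Write a small document to a temporary file, then parse it by path.
with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as f:
    f.write(b"<root>data</root>")
    path = f.name

try:
    tree = etree.parse(path)  # an ElementTree wrapping the whole document
finally:
    os.remove(path)  # the parsed tree lives in memory, so the file can go

root = tree.getroot()  # the root Element
parsed_from_string = etree.fromstring("<root>data</root>")

# iselement() is True for Elements and False for the ElementTree wrapper.
print(etree.iselement(tree), etree.iselement(root), etree.iselement(parsed_from_string))
print(etree.tostring(root))
```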
