使用XPath尋找文本
另一個抽取XML樹的常值內容是XPath,
>>> print(html.xpath("string()")) # lxml.etree only!
TEXTTAIL
>>> print(html.xpath("//text()")) # lxml.etree only!
[’TEXT’, ’TAIL’]
如果經常使用,可以封裝成一個方法:
>>> build_text_list = etree.XPath("//text()") # lxml.etree only!
>>> print(build_text_list(html))
[’TEXT’, ’TAIL’]
也可以通過getparent方法得到父節點
>>> texts = build_text_list(html)
>>> print(texts[0])
TEXT
>>> parent = texts[0].getparent()
>>> print(parent.tag)
body
>>> print(texts[1])
TAIL
>>> print(texts[1].getparent().tag)
br
You can also find out if it’s normal text content or tail text:
>>> print(texts[0].is_text)
True
>>> print(texts[1].is_text)
False
>>> print(texts[1].is_tail)
True
樹的迭代:
Elements提供一個樹的迭代器可以迭代訪問樹的元素。
>>> root = etree.Element("root")
>>> etree.SubElement(root, "child").text = "Child 1"
>>> etree.SubElement(root, "child").text = "Child 2"
>>> etree.SubElement(root, "another").text = "Child 3"
>>> print(etree.tostring(root, pretty_print=True))
<root>
<child>Child 1</child>
<child>Child 2</child>
<another>Child 3</another>
</root>
>>> for element in root.iter():
... print("%s - %s" % (element.tag, element.text))
root – None
child - Child 1
child - Child 2
another - Child 3
如果知道感興趣的tag,可以把tag的名字傳給iter方法,起到過濾作用。
>>> for element in root.iter("child"):
... print("%s - %s" % (element.tag, element.text))
child - Child 1
child - Child 2
預設情況下,迭代器得到一個樹的所有節點,包括ProcessingInstructions, Comments and Entity的執行個體。如果想確認只有Elements對象返回,可以把Element factory作為參數傳入。
>>> root.append(etree.Entity("#234"))
>>> root.append(etree.Comment("some comment"))
>>> for element in root.iter():
... if isinstance(element.tag, basestring):
... print("%s - %s" % (element.tag, element.text))
... else:
... print("SPECIAL: %s - %s" % (element, element.text))
root - None
child - Child 1
child - Child 2
another - Child 3
SPECIAL: ê - ê
SPECIAL: <!--some comment--> - some comment
>>> for element in root.iter(tag=etree.Element):
... print("%s - %s" % (element.tag, element.text))
root - None
child - Child 1
child - Child 2
another - Child 3
>>> for element in root.iter(tag=etree.Entity):
... print(element.text)
ê