from lxml import etree

mypage = '''
<title>TITLE</title>
<body>
<div> </div>
<div id="Photos">
<span id="Pic1">*</span>
<span id="Pic2">****</span>
<p><a href="http://www.example.com/more_pic.html">* </a></p>
<a href="http://www.baidu.com">****</a>
<a href="http://www.163.com">*****</a>
<a href="http://www.sohu.com">****</a>
</div>
<p class="myClassName">hello,\nworld!<br/>--by adam</p>
<div class="Foot"> Other notes on the tail </div>
</body>
'''

# etree.HTML uses lxml's lenient HTML parser. etree.fromstring expects
# well-formed XML with a single root element and would reject this fragment.
html = etree.HTML(mypage)
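# 1. Locating elements
divs1 = html.xpath('//div')                 # every <div>
divs2 = html.xpath('//div[@id]')            # <div> elements that have an id attribute
divs3 = html.xpath('//div[@class="Foot"]')  # <div> whose class is exactly "Foot" (case-sensitive)
divs4 = html.xpath('//div[@*]')             # <div> elements with at least one attribute
divs5 = html.xpath('//div[1]')              # the first <div> child of each parent
divs6 = html.xpath('//div[last()-1]')       # the second-to-last <div> child of each parent
divs7 = html.xpath('//div[position()<3]')   # the first two <div> children of each parent
divs8 = html.xpath('//div|//h1')            # union: every <div> and every <h1>
divs9 = html.xpath('//div[not(@*)]')        # <div> elements with no attributes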
# 2. Getting text with text(); compare with html.xpath('string()')
# text() selects an element's direct text nodes; string() returns the
# concatenated text of the node it is applied to, descendants included.
text1 = html.xpath('//div/text()')
text2 = html.xpath('//div[@id]/text()')
text3 = html.xpath('//div[@class="Foot"]/text()')
text4 = html.xpath('//div[@*]/text()')
text5 = html.xpath('//div[1]/text()')
text6 = html.xpath('//div[last()-1]/text()')
text7 = html.xpath('//div[position()<3]/text()')
text8 = html.xpath('//div/text()|//h1/text()')
# 3. Getting attribute values with @
value1 = html.xpath('//a/@href')
value2 = html.xpath('//img/@src')
value3 = html.xpath('//div[2]/span/@id')
# 4. Locating elements (advanced)
# 1. The find and findall methods of document (DOM) elements (Element)
divs = html.xpath('//div[position()<3]')
for div in divs:
    anchors = div.findall('a')  # only finds div->a here; does not find div->p->a
    for a in anchors:
        if a is not None:
            # print(dir(a))
            print(a.text, a.attrib.get('href'))  # document (DOM) element properties: text, attrib
# 2. Equivalent to 1
a_href = html.xpath('//div[position()<3]/a/@href')
print(a_href)
# 3. Note the difference from 1 and 2: //a also matches div->p->a
a_href = html.xpath('//div[position()<3]//a/@href')
print(a_href)
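
The difference between cases 1, 2 and 3 comes down to child versus descendant steps. The following is a minimal sketch of that difference on a tiny stand-in page; the page content, the `links` id and the variable names are assumptions made for illustration and are not part of the original listing.

```python
from lxml import etree

# Hypothetical stand-in page: one <a> nested in a <p>, one directly in the <div>.
page = '''
<body>
<div id="links">
<p><a href="http://www.example.com/inner.html">inner</a></p>
<a href="http://www.example.com/direct.html">direct</a>
</div>
</body>
'''
html = etree.HTML(page)
div = html.xpath('//div[@id="links"]')[0]

# findall('a') is a relative path that matches direct children only.
child_hrefs = [a.get('href') for a in div.findall('a')]

# findall('.//a') searches the whole subtree under the <div>.
all_hrefs = [a.get('href') for a in div.findall('.//a')]

# The same two cases written as XPath on the document:
# "/a" is a child step, "//a" is a descendant step.
xpath_child = html.xpath('//div[@id="links"]/a/@href')
xpath_desc = html.xpath('//div[@id="links"]//a/@href')

print(child_hrefs)
print(all_hrefs)
```

In this sketch `findall('a')` and the `/a` step return only the direct link, while `findall('.//a')` and the `//a` step also return the link nested inside the `<p>`.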
Reference: https://www.cnblogs.com/hhh5460/p/5079465.html
Use of XPath: locating elements, getting text and attribute values
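
On getting text, the listing mentions that `text()` differs from `string()` without showing it. A small sketch of the difference follows, assuming a made-up one-paragraph page; the `note` class and the variable names are illustrative only.

```python
from lxml import etree

# Hypothetical page: a <p> whose text is interrupted by a child <b> element.
page = '<body><p class="note">hello, <b>big</b> world!</p></body>'
html = etree.HTML(page)

# text() returns only the text nodes that are direct children of <p>.
direct_text = html.xpath('//p[@class="note"]/text()')

# string() returns one string: all text under <p>, descendants included.
full_text = html.xpath('string(//p[@class="note"])')

# Element.text holds only the text before the first child element.
p = html.xpath('//p[@class="note"]')[0]
leading_text = p.text

print(direct_text)
print(full_text)
print(leading_text)
```

Here `text()` yields the two fragments around the `<b>` element as separate list items, whereas `string()` yields the whole sentence as a single string, which is usually what is wanted when an element mixes text and inline tags.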