3. Matching Results
You can use the xpath() method to run a query. Note that the method returns a list of matches: each matched element in the list is an _Element object, while text() and attribute queries return strings.
(1) /
Represents a direct child. For example, E1/E2 selects the E2 nodes among the children of E1, and /E selects the E node among the children of the document root.
>>> test = html.xpath('/html/body/div/a')
>>> print(test)
[<Element a at 0x3843bc0>, <Element a at 0x3843c10>, <Element a at 0x3843c38>, <Element a at 0x3843c60>, <Element a at 0x3843c88>]
(2) //
Represents descendants at any depth. For example, E1//E2 selects the E2 nodes among the descendants of E1, and //E selects the E nodes anywhere in the document.
>>> test = html.xpath('//a')
>>> print(test)
[<Element a at 0x3843bc0>, <Element a at 0x3843c10>, <Element a at 0x3843c38>, <Element a at 0x3843c60>, <Element a at 0x3843c88>]
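The difference between / and // only shows up when a tag is nested more deeply than expected. The following is a minimal sketch, assuming a small hypothetical page (SAMPLE is invented for illustration and is not the page used in this article) in which one a tag sits inside a p tag:

```python
from lxml import etree

# Hypothetical sample page, assumed for illustration only.
SAMPLE = """
<html>
  <head><title>Example website</title></head>
  <body>
    <div>
      <a href="image1.html">Image1</a>
      <a href="image2.html">Image2</a>
      <p><a href="image3.html">Image3</a></p>
    </div>
  </body>
</html>
"""

html = etree.HTML(SAMPLE)

# "/" walks one level at a time, so only a tags directly under div match.
children = [a.get("href") for a in html.xpath("/html/body/div/a")]

# "//" searches every depth, so the a nested inside p is found as well.
descendants = [a.get("href") for a in html.xpath("//a")]

# The same distinction applies to paths evaluated relative to an element.
div = html.xpath("//div")[0]
relative_children = [a.get("href") for a in div.xpath("a")]
relative_descendants = [a.get("href") for a in div.xpath(".//a")]

print(children)     # ['image1.html', 'image2.html']
print(descendants)  # ['image1.html', 'image2.html', 'image3.html']
```

On a page where every a tag is a direct child of the same div, both queries return the same list, which is why the two results above in this article look identical.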
(3) *
A wildcard that matches any element node. For example, E/* selects all element nodes among the children of E.
>>> test = html.xpath('/html/*')
>>> print(test)
[<Element head at 0x3843be8>, <Element body at 0x3843c10>]
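To make the scope of the wildcard concrete, here is a small sketch under the assumption of a hypothetical page (SAMPLE is made up for this example) whose div contains loose text and a comment in addition to tags. Only the element nodes are matched by *:

```python
from lxml import etree

# Hypothetical sample page, assumed for illustration only.
SAMPLE = """
<html>
  <head><title>Example website</title></head>
  <body>
    <div>
      Some loose text
      <a href="image1.html">Image1</a>
      <!-- a comment -->
      <p>Paragraph</p>
    </div>
  </body>
</html>
"""

html = etree.HTML(SAMPLE)

# Every element that is a direct child of the root html node.
top_level = [el.tag for el in html.xpath("/html/*")]

# Every element that is a direct child of div; the loose text and the
# comment are not element nodes, so "*" skips them.
div_children = [el.tag for el in html.xpath("//div/*")]

print(top_level)     # ['head', 'body']
print(div_children)  # ['a', 'p']
```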
(4) text()
Represents a text node. For example, E/text() selects the text nodes among the children of E.
>>> test = html.xpath('/html/head/title/text()')
>>> print(test)
['Example website']
(5) @ATTR
Represents an attribute node. For example, E/@ATTR selects the ATTR attribute of E.
>>> test = html.xpath('//a/@href')
>>> print(test)
['image1.html', 'image2.html', 'image3.html', 'image4.html', 'image5.html']
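Unlike element queries, text() and @ATTR queries give back strings, so the results can be used directly without touching _Element objects. A minimal sketch, assuming a hypothetical three-link page (SAMPLE and the name link_map are invented for this example):

```python
from lxml import etree

# Hypothetical sample page, assumed for illustration only.
SAMPLE = """
<html>
  <head><title>Example website</title></head>
  <body>
    <div>
      <a href="image1.html">Image1</a>
      <a href="image2.html">Image2</a>
      <a href="image3.html">Image3</a>
    </div>
  </body>
</html>
"""

html = etree.HTML(SAMPLE)

title = html.xpath("/html/head/title/text()")  # text of the title tag
hrefs = html.xpath("//a/@href")                # href value of every a tag
labels = html.xpath("//a/text()")              # link text of every a tag

# Both result lists hold str objects (lxml returns a str subclass).
all_strings = all(isinstance(item, str) for item in hrefs + labels)

# Pairing the two lists is only safe here because every a tag in SAMPLE
# has exactly one text node and an href attribute.
link_map = dict(zip(labels, hrefs))

print(title)        # ['Example website']
print(all_strings)  # True
print(link_map)
```

When some a tags lack an href or contain several text nodes, the two lists can fall out of step; iterating over the elements themselves is the safer route in that case.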
(6) Predicates
A predicate, written in square brackets, narrows a match to specific tags.
# Select the second a tag (position is counted among sibling a tags)
>>> test = html.xpath('//a[2]')
>>> print(test)
[<Element a at 0x3843c88>]
# Select the first two a tags
>>> test = html.xpath('//a[position()<=2]')
>>> print(test)
[<Element a at 0x3843c60>, <Element a at 0x3843c88>]
# Select a tags that have an href attribute
>>> test = html.xpath('//a[@href]')
>>> print(test)
[<Element a at 0x3843c38>, <Element a at 0x385c300>, <Element a at 0x385c2d8>, <Element a at 0x385c350>, <Element a at 0x385c328>]
# Select a tags whose href attribute equals image1.html
>>> test = html.xpath('//a[@href="image1.html"]')
>>> print(test)
[<Element a at 0x3843c38>]
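A positional predicate such as [2] counts among sibling tags under the same parent, not across the whole document. The sketch below assumes a hypothetical page with two div blocks (SAMPLE and the helper texts() are invented for this example) to show where that matters:

```python
from lxml import etree

# Hypothetical sample page, assumed for illustration only.
SAMPLE = """
<html>
  <body>
    <div>
      <a href="image1.html">Image1</a>
      <a href="image2.html">Image2</a>
    </div>
    <div>
      <a href="image3.html">Image3</a>
      <a>No link</a>
    </div>
  </body>
</html>
"""

html = etree.HTML(SAMPLE)


def texts(query):
    """Hypothetical helper: run an XPath query and return each match's text."""
    return [a.text for a in html.xpath(query)]


per_parent = texts("//a[2]")               # 2nd a under each parent
whole_document = texts("(//a)[2]")         # 2nd a in the whole document
first_two = texts("//a[position()<=2]")    # 1st and 2nd a under each parent
with_href = texts("//a[@href]")            # a tags that carry an href
exact_href = texts('//a[@href="image1.html"]')

print(per_parent)      # ['Image2', 'No link']
print(whole_document)  # ['Image2']
print(first_two)       # ['Image1', 'Image2', 'Image3', 'No link']
print(with_href)       # ['Image1', 'Image2', 'Image3']
print(exact_href)      # ['Image1']
```

Wrapping the path in parentheses, as in (//a)[2], applies the position to the combined result set instead of to each parent separately.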
4. Common properties and methods of _Element objects
First we use the xpath() method to get a list of matches, tests. Each item in tests is an _Element object.
>>> tests = html.xpath('//a')
(1) Property tag
Returns the tag name
>>> for test in tests:
...     test.tag
...
'a'
'a'
'a'
'a'
'a'
(2) Property attrib
Returns a dictionary of attribute names and values
>>> for test in tests:
...     test.attrib
...
{'href': 'image1.html'}
{'href': 'image2.html'}
{'href': 'image3.html'}
{'href': 'image4.html'}
{'href': 'image5.html'}
(3) Method get()
Returns the value of the specified attribute
>>> for test in tests:
...     test.get('href')
...
'image1.html'
'image2.html'
'image3.html'
'image4.html'
'image5.html'
(4) Property text
Returns the text content
>>> for test in tests:
...     test.text
...
'Image1'
'Image2'
'Image3'
'Image4'
'Image5'
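In the interactive interpreter a bare expression such as test.tag is echoed automatically, but in a script the values have to be printed or stored. The following sketch, assuming a hypothetical page (SAMPLE and the records list are invented for this example), collects the four members into one dictionary per link and shows two edge cases: get() with a default value, and text being None when a tag has no leading text.

```python
from lxml import etree

# Hypothetical sample page, assumed for illustration only.
SAMPLE = """
<html>
  <body>
    <div>
      <a href="image1.html" title="First">Image1</a>
      <a href="image2.html">Image2</a>
      <a href="image3.html"><img src="image3_thumb.jpg"/></a>
    </div>
  </body>
</html>
"""

html = etree.HTML(SAMPLE)

records = []
for a in html.xpath("//a"):
    records.append({
        "tag": a.tag,                        # tag name, always 'a' here
        "attrib": dict(a.attrib),            # all attributes as a plain dict
        "href": a.get("href"),               # one attribute, None if missing
        "title": a.get("title", "(none)"),   # second argument is the default
        "text": a.text,                      # None when no text precedes the first child
    })

for record in records:
    print(record)
```

The third link wraps an img tag and has no text of its own, so its text value comes back as None rather than an empty string.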
Closing words: We have now covered the basic use of the requests and lxml.etree modules. In the next article we will use them to build a basic crawler as practice. Thank you.