This article introduces basic techniques for using Selenium to scrape web page data and to automate web page testing. When using Selenium, we often need to locate DOM elements on the page: we must tell Selenium how to find a given element before it can act on it, for example by clicking it or reading its text. XPath serves this purpose. It is a language for locating information in XML and HTML documents. With a short expression it describes where a node sits in the document, so that Selenium can find that node.
The following describes the most common uses of XPath, based on the author's experience with Selenium. The example document is as follows:
<div> <div class='center'> <p> test1 </p> <p> test2 </p> </div></div>
Suppose we are looking for the node that contains the string test1. How do we describe it? The string is the content of a p node, and that p node is the first p child of the div whose class value is center.
Its XPath can therefore be written as //div[@class='center']/p[1]. Note the '//', '@', and '[1]' in the expression. What do they do?
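To try the expression without a browser, here is a minimal sketch using Python's standard-library xml.etree.ElementTree, which supports only a subset of XPath. It is an illustration under stated assumptions, not Selenium itself: ElementTree paths are relative, so the leading // is written as .//, and the wrapper element named "document" is a hypothetical stand-in for the document node. With Selenium's Python bindings, the same expression would typically be passed to driver.find_element(By.XPATH, "//div[@class='center']/p[1]").

```python
import xml.etree.ElementTree as ET

# The example document from this article.
DOC = "<div> <div class='center'> <p> test1 </p> <p> test2 </p> </div></div>"

# Hypothetical wrapper standing in for the document node, so that the
# outermost <div> is also reachable by a descendant search.
wrapper = ET.Element("document")
wrapper.append(ET.fromstring(DOC))

# ElementTree paths are relative, so "//div..." is written ".//div...".
matches = wrapper.findall(".//div[@class='center']/p[1]")

# The matched <p> holds " test1 "; strip the surrounding whitespace.
first_text = matches[0].text.strip()
print(first_text)
```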
1. // searches the entire document. In this example, //div means: find all div elements anywhere in the document. The example contains two div elements, so two nodes match. If // is changed to /, the search starts from the root node, so /div matches only a div that is the root element. Although the example document contains two divs, only the outermost one meets that condition.
2. @ selects by attribute. The example document contains two divs, so how can we pinpoint the one we want? Note that this div has a class attribute whose value is center. Its XPath expression is div[@class='center']. The attribute name can be replaced by any other attribute. For example, if the div has an id whose value is test, the corresponding XPath is div[@id='test'].
3. [1] selects by position. In the example document, //div[@class='center']/p matches two p nodes. If we want the first one, we add [1]. If we want the second one, we add [2], and so on. Positions are counted from 1, not 0.
4. * is a wildcard. It matches an element with any tag name. In this example only one node has a class attribute, so we do not need to specify that it is a div, and the expression can be changed to //*[@class='center']/p[1].
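The four points above can be checked against the example document with a short sketch. It again uses Python's standard-library xml.etree.ElementTree rather than Selenium, so several things here are assumptions made for illustration: the "document" wrapper element is a hypothetical stand-in for the document node, the texts helper is not part of any library, and the paths use ElementTree's relative forms (.// for // and ./ for a leading /).

```python
import xml.etree.ElementTree as ET

# The example document from this article.
DOC = "<div> <div class='center'> <p> test1 </p> <p> test2 </p> </div></div>"

# Hypothetical wrapper standing in for the document node.
wrapper = ET.Element("document")
wrapper.append(ET.fromstring(DOC))


def texts(path):
    """Return the stripped text of every node matched by an ElementTree path."""
    return [(node.text or "").strip() for node in wrapper.findall(path)]


# 1. "//" searches all descendants; "/" only looks one level down.
all_divs = wrapper.findall(".//div")  # both div elements
top_divs = wrapper.findall("./div")   # only the outermost div

# 2. "@" filters on an attribute value.
center_divs = wrapper.findall(".//div[@class='center']")

# 3. "[n]" picks the n-th matching child, counting from 1.
first_p = texts(".//div[@class='center']/p[1]")
second_p = texts(".//div[@class='center']/p[2]")

# 4. "*" matches an element with any tag name.
any_tag_first_p = texts(".//*[@class='center']/p[1]")

print(len(all_divs), len(top_divs), len(center_divs))
print(first_p, second_p, any_tag_first_p)
```

A real browser-driven test would pass the original absolute forms (//div, /div, and so on) to Selenium instead, since Selenium evaluates full XPath in the page.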