Detailed Jsoup Select selector syntax

Source: Internet
Author: User

This article is based on the Jsoup Chinese documentation.

Problem

You want to find or manipulate elements using CSS or jQuery-like selector syntax.

Method

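Use the Element.select(String selector) and Elements.select(String selector) methods:

import java.io.File;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Load an HTML file from the local disk
File input = new File("/tmp/input.html");
// Arguments: the file, its character encoding, and the base URL of the page
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

Elements links = doc.select("a[href]"); // a elements with an href attribute
Elements pngs = doc.select("img[src$=.png]"); // img elements whose src ends with .png
Element masthead = doc.select("div.masthead").first(); // first div with class masthead
Elements resultLinks = doc.select("h3.r > a"); // a elements that are direct children of h3.r
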
Description

Jsoup elements support a selector syntax similar to CSS (or jQuery), which allows powerful and flexible queries.

The select method is available on Document, Element, and Elements objects. It is context-sensitive, so you can filter by selecting from a specific element, or chain select calls.

The select method returns an Elements collection, which provides a range of methods to extract and manipulate the results.
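
The context-sensitive and chained behavior described above can be sketched as follows. This is a minimal illustration, assuming jsoup is on the classpath; the markup, the class name ScopedSelectDemo, and its helper methods are made up for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class ScopedSelectDemo {
    // Hypothetical markup, made up for this illustration.
    static final String HTML =
        "<div id='nav'><a href='/home'>Home</a><a href='/about'>About</a></div>"
      + "<div id='main'><a href='/post/1'>Post</a><a>No link</a></div>";

    // Context-sensitive: select() on an Element only searches that element's subtree.
    static int countLinksIn(String id) {
        Document doc = Jsoup.parse(HTML);
        Element scope = doc.select("#" + id).first(); // null when nothing matches
        return scope == null ? 0 : scope.select("a[href]").size();
    }

    // Chained: each select() call searches within the Elements returned by the previous one.
    static int countPostLinks() {
        Document doc = Jsoup.parse(HTML);
        Elements postLinks = doc.select("div").select("a").select("[href^=/post]");
        return postLinks.size();
    }

    public static void main(String[] args) {
        System.out.println("links in #nav: " + countLinksIn("nav"));
        System.out.println("links in #main: " + countLinksIn("main"));
        System.out.println("post links: " + countPostLinks());
    }
}
```

Note that first() returns null when the selector matches nothing, so the sketch checks for that before selecting further.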

Selector overview
    • tagname: finds elements by tag name, e.g. a
    • ns|tag: finds elements by tag in a namespace, e.g. fb|name finds <fb:name> elements
    • #id: finds elements by ID, e.g. #logo
    • .class: finds elements by class name, e.g. .masthead
    • [attribute]: finds elements that have the attribute, e.g. [href]
    • [^attr]: finds elements by attribute name prefix, e.g. [^data-] finds elements with HTML5 dataset attributes
    • [attr=value]: finds elements by attribute value, e.g. [width=500]
    • [attr^=value], [attr$=value], [attr*=value]: finds elements whose attribute value starts with, ends with, or contains the value, e.g. [href*=/path/]
    • [attr~=regex]: finds elements whose attribute value matches the regular expression, e.g. img[src~=(?i)\.(png|jpe?g)] (in a Java string literal, write the backslash as \\)
    • *: matches all elements
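
The basic selectors listed above can be tried out with a short sketch like the one below. It assumes jsoup is on the classpath; the markup, the class name BasicSelectorDemo, and the count helper are hypothetical and exist only for this illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class BasicSelectorDemo {
    // Hypothetical markup, made up for this illustration.
    static final String HTML =
        "<div id='logo' class='masthead' data-role='banner'>Logo</div>"
      + "<a href='/path/page.html'>Link</a><a href='/other'>Other</a>"
      + "<img src='a.PNG' width='500'><img src='b.jpeg'><img src='c.gif'>";

    // Returns how many elements in the sample markup match the selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        String[] selectors = {
            "a",                              // by tag name
            "#logo",                          // by ID
            ".masthead",                      // by class name
            "[href]",                         // has the attribute
            "[^data-]",                       // attribute name prefix
            "[width=500]",                    // exact attribute value
            "[href*=/path/]",                 // attribute value contains
            "img[src~=(?i)\\.(png|jpe?g)]"    // attribute value matches regex
        };
        for (String selector : selectors) {
            System.out.println(selector + " -> " + count(selector));
        }
    }
}
```

The regular expression is written with a doubled backslash because it sits inside a Java string literal; the selector jsoup receives contains a single backslash.
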
Combining selectors
    • el#id: element with an ID, e.g. div#logo
    • el.class: element with a class, e.g. div.masthead
    • el[attr]: element with an attribute, e.g. a[href]
    • Any combination, e.g. a[href].highlight
    • ancestor child: finds child elements that descend from the ancestor, e.g. .body p finds all p elements anywhere under an element with the class "body"
    • parent > child: finds elements that are direct children of the parent, e.g. div.content > p finds p elements directly under div.content, and body > * finds all direct children of the body tag
    • siblingA + siblingB: finds a sibling B element immediately preceded by sibling A, e.g. div.head + div
    • siblingA ~ siblingX: finds a sibling X element preceded by sibling A, e.g. h1 ~ p
    • el, el, el: groups multiple selectors and finds the unique elements that match any of them, e.g. div.masthead, div.logo
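
A small sketch of the combinators above, again assuming jsoup is on the classpath. The markup, the class name CombinatorDemo, and the count helper are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class CombinatorDemo {
    // Hypothetical markup, made up for this illustration.
    static final String HTML =
        "<div class='head'>Head</div>"
      + "<div class='content'><p>One</p><section><p>Nested</p></section><p>Two</p></div>"
      + "<h1>Title</h1><p>A</p>"
      + "<span><a class='highlight' href='/x'>x</a></span>"
      + "<p>B</p>";

    // Returns how many elements in the sample markup match the selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        String[] selectors = {
            "a[href].highlight", // tag + attribute + class combined
            "div.content p",     // any descendant p of div.content
            "div.content > p",   // only direct p children of div.content
            "div.head + div",    // div immediately preceded by div.head
            "h1 ~ p",            // p siblings that come after an h1
            "div.head, h1",      // elements matching either selector
            "body > *"           // all direct children of body
        };
        for (String selector : selectors) {
            System.out.println(selector + " -> " + count(selector));
        }
    }
}
```

Comparing "div.content p" with "div.content > p" shows the difference between descendant and direct-child matching: the nested p inside the section is only found by the former.
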
Pseudo selectors
    • :lt(n): finds elements whose sibling index (their position relative to the parent node in the DOM tree) is less than n, e.g. td:lt(3) matches the first three cells of each row
    • :gt(n): finds elements whose sibling index is greater than n, e.g. div p:gt(2) matches p elements inside a div whose sibling index is greater than 2
    • :eq(n): finds elements whose sibling index is equal to n, e.g. form input:eq(1) matches input elements inside a form whose sibling index is 1
    • :has(selector): finds elements that contain an element matching the selector, e.g. div:has(p) matches div elements that contain a p element
    • :not(selector): finds elements that do not match the selector, e.g. div:not(.logo) matches all div elements that do not have the class "logo"
    • :contains(text): finds elements that contain the given text; the search is case-insensitive, e.g. p:contains(jsoup)
    • :containsOwn(text): finds elements that directly contain the given text
    • :matches(regex): finds elements whose text matches the specified regular expression, e.g. div:matches((?i)login)
    • :matchesOwn(regex): finds elements whose own text matches the specified regular expression
    • Note: the pseudo-selector indexes above start at 0, so the first element has index 0, the second has index 1, and so on
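
The zero-based sibling index used by :lt, :gt, and :eq, together with :has and :not, can be sketched as follows. This assumes jsoup is on the classpath; the markup, the class name IndexPseudoDemo, and its helpers are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class IndexPseudoDemo {
    // Hypothetical markup, made up for this illustration.
    static final String HTML =
        "<table><tr><td>a</td><td>b</td><td>c</td><td>d</td></tr></table>"
      + "<div class='logo'><p>jsoup rocks</p></div>"
      + "<div><span>text</span></div>";

    // Returns the combined text of all matches, separated by spaces.
    static String texts(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).text();
    }

    // Returns how many elements match the selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        System.out.println(texts("td:lt(3)"));          // cells with sibling index 0, 1, 2
        System.out.println(texts("td:gt(1)"));          // cells with sibling index 2 and up
        System.out.println(texts("td:eq(0)"));          // the first cell
        System.out.println(count("div:has(p)"));        // divs containing a p
        System.out.println(count("div:not(.logo)"));    // divs without the logo class
        System.out.println(count("p:contains(JSOUP)")); // text match ignores case
    }
}
```
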

See the Selector API reference to learn more.

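How to select elements with multiple class values

Example: <ul class="ul-ss-3 jb-xx-ks">

Method:

Elements select = document.select(".ul-ss-3").select(".jb-xx-ks");

Or, as a single compound selector:

Elements select = document.select(".ul-ss-3.jb-xx-ks");

Note that getElementsByClass(String className) expects a single class name, so passing a space-separated list such as "ul-ss-3 jb-xx-ks" is not a reliable way to match several classes.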
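
The two approaches are not always equivalent. The sketch below, assuming jsoup is on the classpath, compares a chained select with a compound class selector; the markup, the class name MultiClassDemo, and its methods are hypothetical. Because Elements.select also searches the descendants of each element, the chained form can return nested elements that carry only the second class.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class MultiClassDemo {
    // Hypothetical markup: the li carries only the second class.
    static final String HTML =
        "<ul class='ul-ss-3 jb-xx-ks'><li class='jb-xx-ks'>x</li></ul>"
      + "<ul class='ul-ss-3'><li>y</li></ul>"
      + "<ul class='jb-xx-ks ul-ss-3 extra'></ul>";

    // Compound selector: one element must carry both classes, in any order.
    static int compoundCount() {
        Document doc = Jsoup.parse(HTML);
        return doc.select(".ul-ss-3.jb-xx-ks").size();
    }

    // Chained select: the second call also looks at descendants of the first result.
    static int chainedCount() {
        Document doc = Jsoup.parse(HTML);
        return doc.select(".ul-ss-3").select(".jb-xx-ks").size();
    }

    public static void main(String[] args) {
        System.out.println("compound: " + compoundCount());
        System.out.println("chained: " + chainedCount());
    }
}
```

With this markup the chained form also picks up the li inside the first list, so the compound selector is the safer choice when both classes must sit on the same element.
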

Selector API documentation

Official API reference: Selector (jsoup Java HTML Parser 1.11.3 API)

Pattern | Matches | Example
* | Any element | *
tag | Elements with the given tag name | div
*|E | Elements of type E in any namespace | *|name finds <fb:name> elements
ns|E | Elements of type E in the namespace ns | fb|name finds <fb:name> elements
#id | Elements with an id attribute of "id" | div#wrap, #logo
.class | Elements with a class name of "class" | div.left, .result
[attr] | Elements with an attribute named "attr" (with any value) | a[href], [title]
[^attrPrefix] | Elements with an attribute name starting with "attrPrefix"; use it to find elements with HTML5 datasets | [^data-], div[^data-]
[attr=val] | Elements with an attribute named "attr" and a value equal to "val" | img[width=500], a[rel=nofollow]
[attr="val"] | Elements with an attribute named "attr" and a value equal to "val" | span[hello="Cleveland"][goodbye="Columbus"], a[rel="nofollow"]
[attr^=valPrefix] | Elements with an attribute named "attr" and a value starting with "valPrefix" | a[href^=http:]
[attr$=valSuffix] | Elements with an attribute named "attr" and a value ending with "valSuffix" | img[src$=.png]
[attr*=valContaining] | Elements with an attribute named "attr" and a value containing "valContaining" | a[href*=/search/]
[attr~=regex] | Elements with an attribute named "attr" and a value matching the regular expression | img[src~=(?i)\\.(png|jpe?g)]
(combination) | The above can be combined in any order | div.header[title]

Combinators
Pattern | Matches | Example
E F | An F element descended from an E element | div a, .logo h1
E > F | An F element that is a direct child of E | ol > li
E + F | An F element immediately preceded by sibling E | li + li, div.head + div
E ~ F | An F element preceded by sibling E | h1 ~ p
E, F, G | All matching elements E, F, or G | a[href], div, h3

Pseudo selectors
Pattern | Matches | Example
:lt(n) | Elements whose sibling index is less than n | td:lt(3) finds the first 3 cells of each row
:gt(n) | Elements whose sibling index is greater than n | td:gt(1) finds the cells after skipping the first two
:eq(n) | Elements whose sibling index is equal to n | td:eq(0) finds the first cell of each row
:has(selector) | Elements that contain at least one element matching the selector | div:has(p) finds divs that contain p elements
:not(selector) | Elements that do not match the selector. See also Elements.not(String) | div:not(.logo) finds all divs that do not have the "logo" class. div:not(:has(div)) finds divs that do not contain divs.
:contains(text) | Elements that contain the specified text. The search is case-insensitive. The text may appear in the found element or in any of its descendants. | p:contains(jsoup) finds p elements containing the text "jsoup".
:matches(regex) | Elements whose text matches the specified regular expression. The text may appear in the found element or in any of its descendants. | td:matches(\\d+) finds table cells containing digits. div:matches((?i)login) finds divs containing the text, case-insensitively.
:containsOwn(text) | Elements that directly contain the specified text. The search is case-insensitive. The text must appear in the found element, not in any of its descendants. | p:containsOwn(jsoup) finds p elements with own text "jsoup".
:matchesOwn(regex) | Elements whose own text matches the specified regular expression. The text must appear in the found element, not in any of its descendants. | td:matchesOwn(\\d+) finds table cells directly containing digits. div:matchesOwn((?i)login) finds divs directly containing the text, case-insensitively.
:containsData(data) | Elements that contain the specified data. The contents of script and style elements, and comment nodes (etc.), are considered data nodes, not text nodes. The search is case-insensitive. The data may appear in the found element or in any of its descendants. | script:containsData(jsoup) finds script elements containing the data "jsoup".
(combination) | The above can be combined in any order and with other selectors | .light:contains(name):eq(0)
:matchText | Treats text nodes as elements, so you can match and select text nodes. Note that using this selector modifies the DOM, so you may want to clone your document before using it. | p:matchText:firstChild with the input <p>One<br />Two</p> returns one PseudoTextElement with the text "One".
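
The difference between the text selectors and their "Own" variants, and between text and data, can be sketched like this. The example assumes jsoup is on the classpath (a version that supports :containsData, such as the 1.11.3 release referenced above); the markup, the class name TextPseudoDemo, and the count helper are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class TextPseudoDemo {
    // Hypothetical markup, made up for this illustration.
    static final String HTML =
        "<div id='outer'>Please <span>login</span> first</div>"
      + "<script>var lib = 'jsoup';</script>"
      + "<p>about jsoup</p>";

    // Returns how many elements in the sample markup match the selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        String[] selectors = {
            "div:contains(login)",         // text in the div or its descendants
            "div:containsOwn(login)",      // text directly in the div only
            "span:containsOwn(login)",     // the span holds the text itself
            "div:matches((?i)LOGIN)",      // regex against the div's full text
            "div:matchesOwn((?i)LOGIN)",   // regex against the div's own text
            "script:contains(jsoup)",      // script content is data, not text
            "script:containsData(jsoup)"   // so it needs :containsData
        };
        for (String selector : selectors) {
            System.out.println(selector + " -> " + count(selector));
        }
    }
}
```
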

Structural pseudo selectors
Pattern | Matches | Example
:root | The element that is the root of the document. In HTML, this is the html element. | :root
:nth-child(an+b) | Elements that have an+b-1 siblings before them in the document tree, for any positive integer or zero value of n, and that have a parent element. For values of a and b greater than zero, this effectively divides the element's children into groups of a elements (the last group taking the remainder) and selects the bth element of each group. For example, this allows a selector to address every other row in a table, and could be used to alternate the color of paragraph text in a cycle of four. The a and b values must be integers (positive, negative, or zero). The index of the first child of an element is 1. In addition, :nth-child() can take odd and even as arguments: odd has the same meaning as 2n+1, and even has the same meaning as 2n. | tr:nth-child(2n+1) finds every odd row of a table. :nth-child(10n-1) finds the 9th, 19th, 29th, etc., element. li:nth-child(5) finds the 5th li.
:nth-last-child(an+b) | Elements that have an+b-1 siblings after them in the document tree. Otherwise like :nth-child() | tr:nth-last-child(-n+2) finds the last two rows of a table
:nth-of-type(an+b) | Elements that have an+b-1 siblings with the same expanded element name before them in the document tree, for any zero or positive integer value of n, and that have a parent element | img:nth-of-type(2n+1)
:nth-last-of-type(an+b) | Elements that have an+b-1 siblings with the same expanded element name after them in the document tree, for any zero or positive integer value of n, and that have a parent element | img:nth-last-of-type(2n+1)
:first-child | Elements that are the first child of some other element | div > p:first-child
:last-child | Elements that are the last child of some other element | ol > li:last-child
:first-of-type | Elements that are the first sibling of their type in the list of children of their parent element | dl dt:first-of-type
:last-of-type | Elements that are the last sibling of their type in the list of children of their parent element | tr > td:last-of-type
:only-child | Elements that have a parent element and whose parent element has no other element children |
:only-of-type | Elements that have a parent element and whose parent element has no other element children with the same expanded element name |
:empty | Elements that have no children at all |
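
The an+b notation is easier to follow with a concrete list. The sketch below assumes jsoup is on the classpath; the markup, the class name StructuralPseudoDemo, and its helpers are hypothetical. Unlike :lt, :gt, and :eq, the structural selectors count children starting at 1.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class StructuralPseudoDemo {
    // Hypothetical markup, made up for this illustration.
    static final String HTML =
        "<ul><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li></ul>"
      + "<div><p>only</p></div><div></div>";

    // Returns the combined text of all matches, separated by spaces.
    static String texts(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).text();
    }

    // Returns how many elements match the selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        System.out.println(texts("li:nth-child(2n+1)"));      // every odd item
        System.out.println(texts("li:nth-child(even)"));      // same as 2n
        System.out.println(texts("li:nth-child(5)"));         // the fifth item
        System.out.println(texts("li:nth-last-child(-n+2)")); // the last two items
        System.out.println(texts("li:first-child"));
        System.out.println(texts("li:last-child"));
        System.out.println(count("p:only-child"));            // p with no element siblings
        System.out.println(count("div:empty"));               // div with no children
    }
}
```
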
