Detailed Jsoup Select selector syntax

Last Update:2018-07-26 Source: Internet

Author: User

Reference: the Chinese-language Jsoup documentation

Problem

You want to find and manipulate elements using CSS or jQuery-like selector syntax.

Method

Use the Element.select(String selector) and Elements.select(String selector) methods:

// Load an HTML file from the local disk
File input = new File("/tmp/input.html");
// Arguments: the file, its character encoding, and the base URI of the page
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

Elements links = doc.select("a[href]"); // a elements that have an href attribute
Elements pngs = doc.select("img[src$=.png]"); // images whose src ends with .png
Element masthead = doc.select("div.masthead").first(); // first div with the class "masthead"
Elements resultLinks = doc.select("h3.r > a"); // a elements that are direct children of an h3 with the class "r"

Description

Jsoup elements support a selector syntax similar to CSS (or jQuery), which allows powerful and flexible lookups.

The select method is available on Document, Element, and Elements objects. It is context-sensitive: it searches only within the object it is called on, so you can filter starting from a specific element or chain select calls.

The select method returns an Elements collection, which provides a set of methods for extracting and manipulating the results.
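The context sensitivity described above can be sketched as follows. This is a minimal illustration that assumes the jsoup library is on the classpath; the class name SelectContextDemo and the sample markup are hypothetical and do not come from the original article.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class SelectContextDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
          "<div class='masthead'><a href='/home'>Home</a></div>"
        + "<div class='content'><a href='/a'>A</a><a name='x'>no href</a></div>";

    // Document.select searches the whole document.
    static int linksInDocument() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("a[href]").size();
    }

    // Element.select searches only inside the element it is called on.
    static int linksInMasthead() {
        Document doc = Jsoup.parse(HTML);
        Element masthead = doc.select("div.masthead").first();
        return masthead.select("a[href]").size();
    }

    // Elements.select runs the query against each element of the collection,
    // which is what makes chained calls possible.
    static String firstContentHref() {
        Document doc = Jsoup.parse(HTML);
        Elements links = doc.select("div.content").select("a[href]");
        return links.attr("href"); // attribute value of the first match
    }

    public static void main(String[] args) {
        System.out.println(linksInDocument());
        System.out.println(linksInMasthead());
        System.out.println(firstContentHref());
    }
}
```

The same query, a[href], gives different results depending on whether it is run against the document or against a single element.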

Selector overview

tagname: Finds elements by tag name, for example: a
ns|tag: Finds elements by tag name within a namespace, for example: fb|name finds <fb:name> elements
#id: Finds elements by ID, for example: #logo
.class: Finds elements by class name, for example: .masthead
[attribute]: Finds elements that have the given attribute, for example: [href]
[^attr]: Finds elements by attribute name prefix, for example: [^data-] finds elements that have HTML5 dataset attributes
[attr=value]: Finds elements by attribute value, for example: [width=500]
[attr^=value], [attr$=value], [attr*=value]: Find elements whose attribute value starts with, ends with, or contains the given value, for example: [href*=/path/]
[attr~=regex]: Finds elements whose attribute value matches the regular expression, for example: img[src~=(?i)\.(png|jpe?g)]
*: Matches all elements
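The basic selectors listed above can be tried out with a short sketch like the one below. It assumes the jsoup library is available; the class name BasicSelectorDemo, the count helper, and the sample markup are hypothetical names introduced for illustration only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class BasicSelectorDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
          "<div id='logo' class='masthead'>"
        + "<img src='logo.png' width='500'>"
        + "<img src='Photo.JPEG'>"
        + "<img src='anim.gif'>"
        + "</div>"
        + "<a href='http://example.com/path/page' data-id='7'>link</a>"
        + "<a name='anchor'>anchor</a>";

    // Parses the sample markup and counts the elements matched by the selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        String[] selectors = {
            "a", "#logo", ".masthead", "[href]", "[^data-]", "[width=500]",
            "[href*=/path/]", "img[src$=.png]",
            // In Java source the regex backslash must itself be escaped.
            "img[src~=(?i)\\.(png|jpe?g)]"
        };
        for (String selector : selectors) {
            System.out.println(selector + " -> " + count(selector));
        }
    }
}
```

Note that the regular-expression selector is written with a doubled backslash only because it sits inside a Java string literal.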

Selector combinations

el#id: Element with an ID, for example: div#logo
el.class: Element with a class, for example: div.masthead
el[attr]: Element with an attribute, for example: a[href]
Any combination of the above, for example: a[href].highlight
ancestor child: Finds child elements that descend from an ancestor element, for example: body p finds all p elements anywhere under the body element
parent > child: Finds elements that are direct children of a parent element, for example: div.content > p finds p elements directly under div.content, and body > * finds all direct children of the body tag
siblingA + siblingB: Finds a sibling B element that immediately follows sibling A, for example: div.head + div
siblingA ~ siblingX: Finds sibling X elements that come after sibling A, for example: h1 ~ p
el, el, el: Groups multiple selectors and finds the unique elements that match any of them, for example: div.masthead, div.logo
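The difference between the descendant, child, and sibling combinators is easier to see on a concrete document. The following is a minimal sketch assuming the jsoup library is available; CombinatorDemo, its helpers, and the sample markup are hypothetical and chosen only to make each combinator return a different result.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class CombinatorDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
          "<div class='content'><p>one</p><section><p>nested</p></section><p>two</p></div>"
        + "<h1>Title</h1><p>after h1</p>"
        + "<div class='head'></div><div id='next'>next</div><p>later</p>";

    // Counts the elements matched by the selector in the sample markup.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    // Returns the id attribute of the first element matched by the selector.
    static String firstId(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).first().id();
    }

    public static void main(String[] args) {
        System.out.println(count("div.content p"));     // descendants at any depth
        System.out.println(count("div.content > p"));   // direct children only
        System.out.println(firstId("div.head + div"));  // sibling right after div.head
        System.out.println(count("h1 ~ p"));            // p siblings that follow the h1
        System.out.println(count("h1, div.head"));      // union of both selectors
    }
}
```

In this markup the nested p is matched by the descendant combinator but not by the child combinator, and the p elements inside div.content are not matched by h1 ~ p because they are not siblings of the h1.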

Pseudo selectors

:lt(n): Finds elements whose sibling index (their position among their parent's children in the DOM tree) is less than n, for example: td:lt(3) matches the first three cells of each row
:gt(n): Finds elements whose sibling index is greater than n, for example: div p:gt(2) matches p elements inside a div whose sibling index is greater than 2
:eq(n): Finds elements whose sibling index is equal to n, for example: form input:eq(1) matches an input inside a form whose sibling index is 1
:has(selector): Finds elements that contain an element matching the selector, for example: div:has(p) matches div elements that contain a p element
:not(selector): Finds elements that do not match the selector, for example: div:not(.logo) matches all div elements that do not have the class "logo"
:contains(text): Finds elements that contain the given text; the search is case-insensitive, for example: p:contains(jsoup)
:containsOwn(text): Finds elements that directly contain the given text
:matches(regex): Finds elements whose text matches the specified regular expression, for example: div:matches((?i)login)
:matchesOwn(regex): Finds elements whose own text matches the specified regular expression
Note: The indexes used by the pseudo selectors above start at 0, so the first element has index 0, the second element has index 1, and so on
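The zero-based sibling index and the :has and :not filters can be illustrated with a small sketch. It assumes the jsoup library is available; PseudoSelectorDemo, the text helper, and the sample markup are hypothetical names used only for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class PseudoSelectorDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
          "<table><tr><td>a</td><td>b</td><td>c</td><td>d</td></tr></table>"
        + "<div class='logo'><p>x</p></div>"
        + "<div><span>y</span></div>";

    // Returns the combined text of all elements matched by the selector.
    static String text(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).text();
    }

    public static void main(String[] args) {
        System.out.println(text("td:lt(3)"));        // cells with sibling index 0, 1, 2
        System.out.println(text("td:gt(1)"));        // cells with sibling index 2 and up
        System.out.println(text("td:eq(0)"));        // cell with sibling index 0
        System.out.println(text("div:has(p)"));      // divs that contain a p
        System.out.println(text("div:not(.logo)"));  // divs without the class "logo"
    }
}
```

Because the index is zero-based, td:gt(1) skips the first two cells rather than only the first.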

See the Selector API reference to learn more.

How to select elements with multiple class values

Example: <ul class="ul-ss-3 jb-xx-ks">

Method:

// Alternative 1: chain two select calls
Elements select = document.select(".ul-ss-3").select(".jb-xx-ks");

// Alternative 2: getElementsByClass with both class names
Elements select = document.getElementsByClass("ul-ss-3 jb-xx-ks");
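A further option, not shown in the original article, is a single compound class selector, which follows from the "any combination" rule in the selector overview. The sketch below assumes the jsoup library is available and reuses the class names from the example markup above; MultiClassDemo and the second ul element are hypothetical additions for contrast.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class MultiClassDemo {
    // Hypothetical sample markup: one list with both classes, one with a single class.
    static final String HTML =
          "<ul class='ul-ss-3 jb-xx-ks'><li>both classes</li></ul>"
        + "<ul class='ul-ss-3'><li>one class</li></ul>";

    // Compound selector: the element must carry both classes.
    static int compound() {
        Document doc = Jsoup.parse(HTML);
        return doc.select(".ul-ss-3.jb-xx-ks").size();
    }

    // Chained select calls, as in the article's first method.
    static int chained() {
        Document doc = Jsoup.parse(HTML);
        return doc.select(".ul-ss-3").select(".jb-xx-ks").size();
    }

    // A single class selector, for comparison.
    static int singleClass() {
        Document doc = Jsoup.parse(HTML);
        return doc.select(".ul-ss-3").size();
    }

    public static void main(String[] args) {
        System.out.println(compound());
        System.out.println(chained());
        System.out.println(singleClass());
    }
}
```

One difference worth keeping in mind: the chained form also searches the descendants of each .ul-ss-3 element for the second class, while the compound selector requires both classes on the same element.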

Selector API documentation

Original official API documentation: Selector (jsoup Java HTML Parser 1.11.3 API)

Pattern	Matches	Example
`*`	Any element	`*`
`tag`	Elements with the given tag name	`div`
`*|E`	Elements of type E in any namespace	`*|name` finds `<fb:name>` elements
`ns|E`	Elements of type E in the namespace ns	`fb|name` finds `<fb:name>` elements
`#id`	Elements with an id attribute of "id"	`div#wrap`, `#logo`
`.class`	Elements with a class name of "class"	`div.left`, `.result`
`[attr]`	Elements with an attribute named "attr" (with any value)	`a[href]`, `[title]`
`[^attrPrefix]`	Elements with an attribute name starting with "attrPrefix". Use it to find elements with HTML5 datasets	`[^data-]`, `div[^data-]`
`[attr=val]`	Elements with an attribute named "attr" and a value equal to "val"	`img[width=500]`, `a[rel=nofollow]`
`[attr="val"]`	Elements with an attribute named "attr" and a value equal to "val"	`span[hello="Cleveland"][goodbye="Columbus"]`, `a[rel="nofollow"]`
`[attr^=valPrefix]`	Elements with an attribute named "attr" and a value starting with "valPrefix"	`a[href^=http:]`
`[attr$=valSuffix]`	Elements with an attribute named "attr" and a value ending with "valSuffix"	`img[src$=.png]`
`[attr*=valContaining]`	Elements with an attribute named "attr" and a value containing "valContaining"	`a[href*=/search/]`
`[attr~=regex]`	Elements with an attribute named "attr" and a value matching the regular expression	`img[src~=(?i)\\.(png|jpe?g)]`
	The above may be combined in any order	`div.header[title]`

Combinators

Pattern	Matches	Example
`E F`	An F element descended from an E element	`div a`, `.logo h1`
`E > F`	An F element that is a direct child of E	`ol > li`
`E + F`	An F element immediately preceded by sibling E	`li + li`, `div.head + div`
`E ~ F`	An F element preceded by sibling E	`h1 ~ p`
`E, F, G`	All matching elements E, F, or G	`a[href], div, h3`

Pseudo Selectors

Pattern	Matches	Example
`:lt(n)`	Elements whose sibling index is less than n	`td:lt(3)` finds the first 3 cells of each row
`:gt(n)`	Elements whose sibling index is greater than n	`td:gt(1)` finds cells after skipping the first two
`:eq(n)`	Elements whose sibling index is equal to n	`td:eq(0)` finds the first cell of each row
`:has(selector)`	Elements that contain at least one element matching the selector	`div:has(p)` finds div elements that contain p elements
`:not(selector)`	Elements that do not match the selector. See also `Elements.not(String)`	`div:not(.logo)` finds all divs that do not have the "logo" class. `div:not(:has(div))` finds divs that do not contain divs.
`:contains(text)`	Elements that contain the specified text. The search is case-insensitive. The text may appear in the found element or in any of its descendants.	`p:contains(jsoup)` finds p elements containing the text "jsoup".
`:matches(regex)`	Elements whose text matches the specified regular expression. The text may appear in the found element or in any of its descendants.	`td:matches(\\d+)` finds table cells containing digits. `div:matches((?i)login)` finds divs containing the text, case-insensitively.
`:containsOwn(text)`	Elements that directly contain the specified text. The search is case-insensitive. The text must appear in the found element itself, not in any of its descendants.	`p:containsOwn(jsoup)` finds p elements whose own text contains "jsoup".
`:matchesOwn(regex)`	Elements whose own text matches the specified regular expression. The text must appear in the found element itself, not in any of its descendants.	`td:matchesOwn(\\d+)` finds table cells directly containing digits. `div:matchesOwn((?i)login)` finds divs directly containing the text, case-insensitively.
`:containsData(data)`	Elements that contain the specified data. The contents of `script` and `style` elements, and `comment` nodes (etc.), are considered data nodes, not text nodes. The search is case-insensitive. The data may appear in the found element or in any of its descendants.	`script:containsData(jsoup)` finds script elements containing the data "jsoup".
	The above may be combined in any order and with other selectors	`.light:contains(name):eq(0)`
`:matchText`	Treats text nodes as elements, so that you can match and select text nodes. Note that using this selector modifies the DOM, so you may want to clone the document before using it.	`p:matchText:firstChild` with the input `<p>One<br />Two</p>` returns one `PseudoTextElement` with the text "`One`".
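The distinction between text found anywhere under an element, an element's own text, and data nodes can be checked with a sketch like this one. It assumes the jsoup library is available (in a version that supports :containsData, such as the 1.11.3 release referenced above); TextMatchDemo, the count helper, and the sample markup are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class TextMatchDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
          "<div><p>Hello <b>jsoup</b></p><p>plain JSOUP text</p></div>"
        + "<script>var lib = 'jsoup';</script>";

    // Counts the elements matched by the selector in the sample markup.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // Matches both paragraphs: descendant text counts, and case is ignored.
        System.out.println(count("p:contains(jsoup)"));
        // Matches only the second paragraph: the first holds "jsoup" inside <b>.
        System.out.println(count("p:containsOwn(jsoup)"));
        // Regular-expression match against the element text.
        System.out.println(count("p:matches((?i)hello)"));
        // Script content is data, not text, so :contains does not see it.
        System.out.println(count("script:contains(jsoup)"));
        System.out.println(count("script:containsData(jsoup)"));
    }
}
```

The last two lines show why :containsData exists: the content of a script element is stored as a data node, so a text-based selector does not match it.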

Structural Pseudo Selectors

Pattern	Matches	Example
`:root`	The element that is the root of the document. In HTML, this is the `html` element	`:root`
`:nth-child(an+b)`	Elements that have `an+b-1` siblings before them in the document tree, for any positive integer or zero value of n, and that have a parent element. For values of a and b greater than zero, this effectively divides the element's children into groups of a elements (the last group taking the remainder) and selects the bth element of each group. For example, this allows a selector to address every other row in a table, and could be used to alternate the color of paragraph text in a cycle of four. The a and b values must be integers (positive, negative, or zero). The index of the first child of an element is 1. In addition, `:nth-child()` accepts odd and even as arguments: odd has the same meaning as 2n+1, and even has the same meaning as 2n.	`tr:nth-child(2n+1)` finds every odd row of a table. `:nth-child(10n-1)` finds the 9th, 19th, 29th, etc., element. `li:nth-child(5)` finds the 5th li
`:nth-last-child(an+b)`	Elements that have `an+b-1` siblings after them in the document tree. Otherwise like `:nth-child()`	`tr:nth-last-child(-n+2)` finds the last two rows of a table
`:nth-of-type(an+b)`	Elements that have `an+b-1` siblings with the same expanded element name before them in the document tree, for any zero or positive integer value of n, and that have a parent element	`img:nth-of-type(2n+1)`
`:nth-last-of-type(an+b)`	Elements that have `an+b-1` siblings with the same expanded element name after them in the document tree, for any zero or positive integer value of n, and that have a parent element	`img:nth-last-of-type(2n+1)`
`:first-child`	Elements that are the first child of some other element	`div > p:first-child`
`:last-child`	Elements that are the last child of some other element	`ol > li:last-child`
`:first-of-type`	Elements that are the first sibling of their type in the list of children of their parent element	`dl dt:first-of-type`
`:last-of-type`	Elements that are the last sibling of their type in the list of children of their parent element	`tr > td:last-of-type`
`:only-child`	Elements that have a parent element and whose parent element has no other element children
`:only-of-type`	Elements that have a parent element and whose parent element has no other element children with the same expanded element name
`:empty`	Elements that have no children at all
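A few of the structural pseudo selectors in the table can be exercised with the sketch below. It assumes the jsoup library is available; StructuralPseudoDemo, its helpers, and the sample markup are hypothetical. Unlike :lt, :gt, and :eq, the nth-child family counts from 1.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class StructuralPseudoDemo {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
          "<ul><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li></ul>"
        + "<div><p></p><p>text</p></div>";

    // Returns the combined text of all elements matched by the selector.
    static String text(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).text();
    }

    // Counts the elements matched by the selector.
    static int count(String selector) {
        Document doc = Jsoup.parse(HTML);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        System.out.println(text("li:nth-child(2n+1)"));   // odd positions: 1st, 3rd, 5th
        System.out.println(text("li:nth-child(even)"));   // same as 2n: 2nd, 4th
        System.out.println(text("li:nth-child(5)"));      // the 5th li
        System.out.println(text("li:nth-last-child(2)")); // second from the end
        System.out.println(text("li:first-child"));
        System.out.println(text("ul > li:last-child"));
        System.out.println(count("p:empty"));             // p elements with no children
    }
}
```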

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
