Jsoup Selector Syntax Description

Source: Internet
Author: User

Jsoup is a Java HTML parser that can parse HTML directly from a URL or from a string of HTML text. It provides a convenient, labor-saving API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods.
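
As a minimal sketch of the two entry points mentioned above, the example below assumes the jsoup library (org.jsoup:jsoup) is on the classpath. The class name ParseExample and the sample HTML are illustrative only. The URL variant needs network access, so it is left commented out in main, and its URL is a placeholder.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseExample {

    // Parse HTML that is already held in a String.
    static Document fromString(String html) {
        return Jsoup.parse(html);
    }

    // Fetch a page over HTTP and parse it (requires network access).
    static Document fromUrl(String url) throws IOException {
        return Jsoup.connect(url).get();
    }

    public static void main(String[] args) {
        Document doc = fromString(
            "<html><head><title>Demo</title></head>"
          + "<body><p>Hello, <b>jsoup</b>!</p></body></html>");
        System.out.println(doc.title());       // Demo
        System.out.println(doc.body().text()); // Hello, jsoup!

        // Uncomment to try a live page; the URL is only a placeholder.
        // Document remote = fromUrl("https://example.com/");
        // System.out.println(remote.title());
    }
}
```

Both paths produce the same Document type, so everything that follows (selectors, extraction, manipulation) works the same way regardless of where the HTML came from.
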
Jsoup is especially powerful at locating document elements: the select method returns an Elements collection, which provides a set of methods for extracting and manipulating the results. To master Jsoup, first become familiar with its selector syntax.
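
The sketch below illustrates the select/Elements workflow described above, assuming jsoup is on the classpath. The class SelectExample, its links helper and the sample markup are hypothetical. It reads data from each match (text() and attr()) and then manipulates all matches at once with addClass().

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SelectExample {

    // Hypothetical sample markup used only for this demonstration.
    static final String HTML =
        "<div id='nav'>"
      + "<a href='/home'>Home</a>"
      + "<a href='/docs'>Docs</a>"
      + "</div>";

    // Collect "text -> href" strings for every link matched by the selector.
    static List<String> links(Document doc) {
        List<String> out = new ArrayList<>();
        Elements anchors = doc.select("a[href]");
        for (Element a : anchors) {
            out.add(a.text() + " -> " + a.attr("href"));
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        System.out.println(links(doc)); // [Home -> /home, Docs -> /docs]

        // Elements also supports bulk manipulation of every match at once.
        doc.select("a").addClass("nav-link");
        System.out.println(doc.select("a.nav-link").size()); // 2
    }
}
```
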
1. Basic selector syntax

    • tagname: Finds elements by tag name, for example: a
    • ns|tag: Finds elements by tag name in a namespace, for example: fb|name finds <fb:name> elements
    • #id: Finds elements by ID, for example: #logo
    • .class: Finds elements by class name, for example: .masthead
    • [attribute]: Finds elements that have the given attribute, for example: [href]
    • [^attr]: Finds elements with an attribute whose name starts with the given prefix, for example: [^data-] finds elements with HTML5 dataset attributes
    • [attr=value]: Finds elements with the given attribute value, for example: [width=500]
    • [attr^=value], [attr$=value], [attr*=value]: Finds elements whose attribute value starts with, ends with, or contains the given value, for example: [href*=/path/]
    • [attr~=regex]: Finds elements whose attribute value matches a regular expression, for example: img[src~=(?i)\.(png|jpe?g)]
    • *: Matches all elements
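
To try each basic selector from the list above, the sketch below runs them against a small hypothetical document and prints how many elements each one matches. It assumes jsoup is on the classpath; the class BasicSelectors, its count helper and the markup are illustrative. Note that in Java source the backslash in the regex example has to be doubled.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BasicSelectors {

    // Hypothetical sample markup used only for this demonstration.
    static final String HTML =
        "<div id='logo' class='masthead'>"
      + "<fb:name>Jane</fb:name>"
      + "<a href='/path/one'>One</a>"
      + "<a name='anchor'>Two</a>"
      + "<img src='photo.JPG' width='500'>"
      + "<img src='icon.gif' width='16'>"
      + "<span data-id='7'>Seven</span>"
      + "</div>";

    static final Document DOC = Jsoup.parse(HTML);

    // Number of elements the selector matches in the sample document.
    static int count(String cssQuery) {
        return DOC.select(cssQuery).size();
    }

    public static void main(String[] args) {
        String[] queries = {
            "a",                              // by tag
            "fb|name",                        // namespaced tag <fb:name>
            "#logo",                          // by id
            ".masthead",                      // by class
            "[href]",                         // has attribute
            "[^data-]",                       // attribute name prefix
            "[width=500]",                    // exact attribute value
            "[href^=/path]",                  // value starts with
            "[src$=.gif]",                    // value ends with
            "[href*=/path/]",                 // value contains
            "img[src~=(?i)\\.(png|jpe?g)]",   // value matches a regex
            "*"                               // every element
        };
        for (String q : queries) {
            System.out.printf("%-30s %d%n", q, count(q));
        }
    }
}
```
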

2. Selector combinations

    • el#id: Element with an ID, for example: div#logo
    • el.class: Element with a class, for example: div.masthead
    • el[attr]: Element with an attribute, for example: a[href]
    • Any combination of the above, for example: a[href].highlight
    • ancestor child: Finds elements that descend from an ancestor element, for example: .body p finds all p elements anywhere under an element with class "body"
    • parent > child: Finds the direct children of a parent element, for example: div.content > p finds p elements that are direct children of div.content, and body > * finds all direct children of the body tag
    • siblingA + siblingB: Finds a sibling B element immediately preceded by sibling A, for example: div.head + div
    • siblingA ~ siblingX: Finds sibling X elements preceded by sibling A, for example: h1 ~ p
    • el, el, el: Groups multiple selectors and finds the unique elements that match any of them, for example: div.masthead, div.logo
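
The combinators above are easiest to compare side by side on one tree. The following is a sketch assuming jsoup is on the classpath; the class CombinatorSelectors, its ids helper and the markup are hypothetical, and every element of interest carries an id so each result can be printed compactly.

```java
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CombinatorSelectors {

    // Hypothetical sample markup; every element of interest carries an id.
    static final String HTML =
        "<div id='masthead' class='masthead'><div id='logo' class='logo'></div></div>"
      + "<div id='head' class='head'></div>"
      + "<div id='content' class='content'>"
      + "<h1 id='title'>Title</h1>"
      + "<p id='p1'>Intro</p>"
      + "<section id='sec'><p id='p2'>Nested</p></section>"
      + "<p id='p3'>Outro</p>"
      + "</div>";

    static final Document DOC = Jsoup.parse(HTML);

    // Comma-separated ids of the matched elements, in document order.
    static String ids(String cssQuery) {
        return DOC.select(cssQuery).stream()
                .map(Element::id)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(ids("div#logo"));               // logo
        System.out.println(ids("div.masthead"));           // masthead
        System.out.println(ids("div[id].head"));           // head (tag + attribute + class)
        System.out.println(ids("div.content p"));          // p1,p2,p3 (any depth)
        System.out.println(ids("div.content > p"));        // p1,p3 (direct children only)
        System.out.println(ids("body > *"));               // masthead,head,content
        System.out.println(ids("div.head + div"));         // content (immediately after div.head)
        System.out.println(ids("h1 ~ p"));                 // p1,p3 (any later sibling)
        System.out.println(ids("div.masthead, div.logo")); // masthead,logo
    }
}
```

The contrast between "div.content p" and "div.content > p" is the main point: p2 sits inside a section, so it is a descendant of div.content but not a direct child.
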

3. Pseudo-selector syntax

    • :lt(n): Finds elements whose sibling index (their position in the DOM tree relative to their parent) is less than n, for example: td:lt(3) finds td elements with sibling index 0, 1 or 2, typically the first three cells of each row
    • :gt(n): Finds elements whose sibling index is greater than n, for example: div p:gt(2) finds p elements inside a div whose sibling index is greater than 2
    • :eq(n): Finds elements whose sibling index is equal to n, for example: form input:eq(1) finds input elements inside a form whose sibling index is 1
    • :has(selector): Finds elements that contain at least one element matching the selector, for example: div:has(p) finds div elements that contain a p element
    • :not(selector): Finds elements that do not match the selector, for example: div:not(.logo) finds all div elements that do not have the class "logo"
    • :contains(text): Finds elements that contain the given text, either directly or in a descendant; the search is case-insensitive, for example: p:contains(jsoup)
    • :containsOwn(text): Finds elements that directly contain the given text (text in descendants is not considered)
    • :matches(regex): Finds elements whose text matches the given regular expression, for example: div:matches((?i)login)
    • :matchesOwn(regex): Finds elements whose own text (excluding descendants) matches the given regular expression
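
The structural and text-based pseudo-selectors above can be checked against a small sample. This is a sketch assuming jsoup is on the classpath; the class PseudoSelectors, its ids helper and the markup are hypothetical. It highlights the difference between text anywhere inside an element (:contains, :matches) and the element's own text (:containsOwn, :matchesOwn).

```java
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PseudoSelectors {

    // Hypothetical sample markup used only for this demonstration.
    static final String HTML =
        "<div id='d1' class='logo'><p>Login here</p></div>"
      + "<div id='d2'><span>jsoup rocks</span></div>"
      + "<div id='d3'>LOGIN <b>now</b></div>";

    static final Document DOC = Jsoup.parse(HTML);

    // Comma-separated ids of the matched elements, in document order.
    static String ids(String cssQuery) {
        return DOC.select(cssQuery).stream()
                .map(Element::id)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(ids("div:has(p)"));               // d1
        System.out.println(ids("div:not(.logo)"));           // d2,d3
        System.out.println(ids("div:contains(JSOUP)"));      // d2 (case-insensitive, descendant text counts)
        System.out.println(ids("div:containsOwn(login)"));   // d3 (d1's "Login" is in a child <p>)
        System.out.println(ids("div:matches((?i)login)"));   // d1,d3
        System.out.println(ids("div:matchesOwn((?i)login)")); // d3
    }
}
```
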

Note: The indexes used by the pseudo-selectors above are zero-based: the first element has index 0, the second has index 1, and so on.
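
To make the zero-based indexing concrete, the sketch below applies :lt, :gt and :eq to a single table row and a small form. It assumes jsoup is on the classpath; the class IndexSelectors, its ids helper and the markup are hypothetical. The form case shows that the sibling index counts every element sibling, not only elements of the selected tag.

```java
import java.util.stream.Collectors;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class IndexSelectors {

    // Hypothetical sample markup: one table row of five cells and a small form.
    static final String HTML =
        "<table><tr>"
      + "<td id='c0'></td><td id='c1'></td><td id='c2'></td><td id='c3'></td><td id='c4'></td>"
      + "</tr></table>"
      + "<form><label id='lbl'>Name</label><input id='first'><input id='second'></form>";

    static final Document DOC = Jsoup.parse(HTML);

    // Comma-separated ids of the matched elements, in document order.
    static String ids(String cssQuery) {
        return DOC.select(cssQuery).stream()
                .map(Element::id)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(ids("td:lt(3)"));         // c0,c1,c2 (indices 0, 1, 2)
        System.out.println(ids("td:gt(2)"));         // c3,c4 (indices 3, 4)
        System.out.println(ids("td:eq(1)"));         // c1 (the second cell)
        // <label> has index 0, so the first <input> has index 1.
        System.out.println(ids("form input:eq(1)")); // first
    }
}
```
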
