Jsoup Code Interpretation: Implementing a CSS Selector

Source: Internet
Author: User

Jsoup Code Interpretation VII: Implementing a CSS Selector

At last we arrive at Jsoup's signature feature: the CSS selector. The selector is also one of the key parts I wrote while developing the crawler framework WebMagic. Attached is a Street Fighter picture; I hope WebMagic will be able to challenge Jsoup one day!

Select mechanism

In the Jsoup select package, the class structure is as follows:

NodeVisitor and Selector were already mentioned at the beginning of this series. Selector is the external facade of the select package, while NodeVisitor is the underlying API for traversing the tree; the CSS selector is also implemented on top of NodeVisitor traversal.
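As a rough illustration of the facade-over-visitor idea, here is a minimal sketch. The types Node and Visitor and the helper selectByTag are simplified stand-ins invented for this example, not jsoup's real classes; only the head/tail callback shape is borrowed from jsoup's NodeVisitor, and the recursive traversal is an assumption made for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for illustration; not jsoup's real classes.
class VisitorSketch {
    // Minimal tree node: a tag name plus child nodes.
    static class Node {
        final String tag;
        final List<Node> children = new ArrayList<>();

        Node(String tag) { this.tag = tag; }

        // Appends a child and returns this node so calls can be chained.
        Node add(Node child) { children.add(child); return this; }
    }

    // Callback invoked when traversal enters (head) and leaves (tail) a node.
    interface Visitor {
        void head(Node node, int depth);
        void tail(Node node, int depth);
    }

    // Depth-first traversal that drives the visitor.
    static void traverse(Visitor visitor, Node node, int depth) {
        visitor.head(node, depth);
        for (Node child : node.children) {
            traverse(visitor, child, depth + 1);
        }
        visitor.tail(node, depth);
    }

    // Facade-style helper: collect every node with the given tag, in document order.
    static List<Node> selectByTag(Node root, final String tag) {
        final List<Node> found = new ArrayList<>();
        traverse(new Visitor() {
            public void head(Node node, int depth) {
                if (node.tag.equals(tag)) found.add(node);
            }
            public void tail(Node node, int depth) { /* nothing to do on exit */ }
        }, root, 0);
        return found;
    }
}
```

The caller only sees selectByTag; the visitor and the traversal stay hidden behind it, which is the same division of labour the text describes between Selector and NodeVisitor.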

The core of Jsoup's select package is Evaluator. The expression passed to Selector goes through QueryParser and is eventually compiled into an Evaluator. Evaluator is an abstract class with only one method:

public abstract boolean matches(Element root, Element element);

Note that root is passed in because some evaluators need it when they traverse the tree.

Evaluator's design is simple and straightforward: every token of a selector expression is compiled to a corresponding Evaluator. For example, #xx corresponds to Id, .xx corresponds to Class, and [] corresponds to Attribute. For reference, here is the CSS selector specification: http://www.w3.org/TR/CSS2/selector.html

Of course, this is not enough. Jsoup also defines CombiningEvaluator (And/Or combinations of evaluators) and StructuralEvaluator (evaluators that take the DOM tree structure into account).
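To make the token-to-evaluator mapping concrete, here is a minimal sketch. El, the Evaluator interface and compileToken are hypothetical names invented for this example and are not jsoup's real classes or its QueryParser; attribute selectors are left out to keep it short.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins for illustration; not jsoup's real classes.
class TokenSketch {
    // Minimal element: a tag name, an optional id, and a set of class names.
    static class El {
        final String tag;
        final String id;
        final Set<String> classes;

        El(String tag, String id, String... classes) {
            this.tag = tag;
            this.id = id;
            this.classes = new HashSet<>(Arrays.asList(classes));
        }
    }

    // Same shape as the single abstract method quoted above.
    interface Evaluator {
        boolean matches(El root, El element);
    }

    // Map one simple token to one evaluator: "#xx" -> id, ".xx" -> class, otherwise tag.
    static Evaluator compileToken(final String token) {
        if (token.startsWith("#")) {
            final String id = token.substring(1);
            return (root, el) -> id.equals(el.id);
        }
        if (token.startsWith(".")) {
            final String name = token.substring(1);
            return (root, el) -> el.classes.contains(name);
        }
        return (root, el) -> el.tag.equalsIgnoreCase(token);
    }
}
```

Each token becomes one small object with a single responsibility, so adding a new kind of selector only means adding one more evaluator.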
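The And/Or combination can be sketched as follows. El, Evaluator and the factory methods and, or, tag and cssClass are hypothetical helpers invented for this example; jsoup's own CombiningEvaluator.And and CombiningEvaluator.Or are classes, not lambdas.

```java
// Hypothetical, simplified stand-ins for illustration; not jsoup's real classes.
class CombineSketch {
    // Minimal element: a tag name and a single class name (may be null).
    static class El {
        final String tag;
        final String cssClass;

        El(String tag, String cssClass) {
            this.tag = tag;
            this.cssClass = cssClass;
        }
    }

    interface Evaluator {
        boolean matches(El root, El element);
    }

    static Evaluator tag(final String name) {
        return (root, el) -> el.tag.equals(name);
    }

    static Evaluator cssClass(final String name) {
        return (root, el) -> name.equals(el.cssClass);
    }

    // And: every part must match, e.g. "div.note".
    static Evaluator and(final Evaluator... parts) {
        return (root, el) -> {
            for (Evaluator part : parts) {
                if (!part.matches(root, el)) return false;
            }
            return true;
        };
    }

    // Or: at least one part must match, e.g. "h1, h2".
    static Evaluator or(final Evaluator... parts) {
        return (root, el) -> {
            for (Evaluator part : parts) {
                if (part.matches(root, el)) return true;
            }
            return false;
        };
    }
}
```

With these helpers, "div.note" would be and(tag("div"), cssClass("note")) and "h1, h2" would be or(tag("h1"), tag("h2")).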

What we probably care about most here is how ancestor-descendant selectors such as div ul li are implemented. This is done in StructuralEvaluator.Parent; here is the code:

    static class Parent extends StructuralEvaluator {
        public Parent(Evaluator evaluator) {
            this.evaluator = evaluator;
        }

        public boolean matches(Element root, Element element) {
            if (root == element)
                return false;

            Element parent = element.parent();
            while (parent != root) {
                if (evaluator.matches(root, parent))
                    return true;
                parent = parent.parent();
            }
            return false;
        }
    }

Here Parent holds an evaluator field and checks every ancestor node against that evaluator. Note that Parent can be nested, so the expression "div ul li" is eventually compiled into an evaluator combination such as And(Parent(And(Parent(Tag("div")),Tag("ul"))),Tag("li")).

The select package is simpler than you might expect, and the code is very readable. After studying the parser, this part should feel quite familiar.
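To see how the nested combination behaves on a small tree, here is a self-contained sketch. El, Evaluator, tag, and, parent and divUlLi are hypothetical names invented for this example rather than jsoup's real API; the ancestor loop follows the Parent code above, with an added null check (an assumption) so the sketch cannot run off the top of the tree.

```java
// Hypothetical, simplified stand-ins for illustration; not jsoup's real classes.
class ParentSketch {
    // Minimal element: a tag name and a link to its parent (null at the top).
    static class El {
        final String tag;
        final El parent;

        El(String tag, El parent) {
            this.tag = tag;
            this.parent = parent;
        }
    }

    interface Evaluator {
        boolean matches(El root, El element);
    }

    static Evaluator tag(final String name) {
        return (root, el) -> el.tag.equals(name);
    }

    static Evaluator and(final Evaluator a, final Evaluator b) {
        return (root, el) -> a.matches(root, el) && b.matches(root, el);
    }

    // True if some ancestor strictly below root satisfies the inner evaluator.
    static Evaluator parent(final Evaluator inner) {
        return (root, el) -> {
            if (root == el) return false;
            El p = el.parent;
            while (p != null && p != root) {
                if (inner.matches(root, p)) return true;
                p = p.parent;
            }
            return false;
        };
    }

    // Hand-built equivalent of "div ul li":
    // And(Parent(And(Parent(Tag("div")), Tag("ul"))), Tag("li"))
    static Evaluator divUlLi() {
        Evaluator ulUnderDiv = and(parent(tag("div")), tag("ul"));
        return and(parent(ulUnderDiv), tag("li"));
    }
}
```

Because Parent walks all ancestors rather than only the direct parent, an li inside div > section > ul still matches, while an li inside an ol that has no div ancestor does not.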

A follow-up plan for WebMagic

WebMagic is a crawler framework. Its selectors are used to extract specified text from HTML, and their mechanism is very similar to Jsoup's Evaluator; the difference is that, for now, WebMagic wraps selectors in a simpler API, whereas an Evaluator is built directly from an expression. I had previously considered designing a custom DSL for HTML extraction. Having read the Jsoup source, I now believe I could implement one, but implementation is only a small part of introducing a DSL; the hard part is making the DSL easy to write and easy to understand.

Having read the Jsoup source, I also find its granularity finer than WebMagic's: nearly every class corresponds to a real conceptual abstraction. I may put more work into this aspect later.
