About Sizzle's "Compilation Principle"

This article is for readers who are studying the Sizzle source code or who already have some front-end background. You can read the source alongside these articles to check your understanding. I analyze the regular expressions in the source code with plenty of comments, but I will not cover the basics of regular expressions.

Sizzle is the DOM selector engine written by John Resig, the author of jQuery, and at the time it was billed as the fastest in the industry. It first appeared as a new, independent selector engine in jQuery 1.3, and John Resig released it as a separate open-source project. Sizzle is self-contained and does not depend on any library. If you don't want jQuery, you can use Sizzle on its own or inside other frameworks such as MooTools, Dojo, and YUI.

A few days ago I prepared a slide deck to present jQuery to my colleagues. I asked them what they knew about jQuery beyond its usage. Some asked how the selector is implemented, and some said jQuery's queries are faster than those of other frameworks. On speed, the Sizzle website offers test cases you can download, and the advantage is real. But why is it so efficient? That comes down to the implementation principle discussed here.

Before studying Sizzle, you need to understand what a selector looks like. Here is a simple example. Anyone familiar with jQuery will recognize this format:

The Code is as follows:


tag#id.class, a:first

Matching DOM elements are found by filtering step by step from left to right. This particular statement isn't complicated, and implementing this one query ourselves wouldn't be hard. However, selector syntax only has basic rules. The number and order of simple selectors and combinators are not fixed. How could hand-written code adapt to every possible combination? Sizzle parses and executes selectors correctly in all of these situations.

The Sizzle source code is genuinely complex and not easy to understand. Let's start with the three methods I personally consider the core of the whole implementation:

The first core method is the tokenize function, at line 3 of the source code:

The Code is as follows:


function tokenize( selector, parseOnly ) {}

The first parameter, selector, is the query string. The second parameter, parseOnly, controls what is returned. When it is false, tokenize returns the token groups and caches them for later reuse. When it is true, tokenize only parses the selector and returns the length of any unparsed remainder instead of the tokens.

For example, if you pass in selector = "#id tag.class, a:first", the function returns a result in a format similar to the following:

[[{matches: "id", type: "ID"}, {matches: "tag", type: "TAG"}, {matches: "class", type: "CLASS"}, ...], [{matches: "a", type: "TAG"}, ...], [...], ...]


The name and role of tokenize immediately remind me of "compilation principle", that is, compiler theory. It resembles lexical analysis, though it is much simpler than the lexical analysis a compiler performs.

The tokenize method performs tokenization ("word segmentation"). It splits the selector using commas, combinators (whitespace, >, +, ~), and a regular expression for each selector type. The result is a two-dimensional array (please forgive the loose term). The first dimension is split on commas, and the source code calls it groups.
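
To make the idea concrete, here is a minimal tokenizer sketch. It is not Sizzle's tokenize. The name tokenizeSketch, its regular expressions, and the token shape { type, value, matches } are assumptions for illustration. It supports only #id, tag, .class, :pseudo, and the combinators " ", ">", "+", "~". Like the structure described above, it returns one token array per comma-separated group. It also records each combinator as a token of its own.

```javascript
// Hypothetical, simplified tokenizer in the spirit of Sizzle's tokenize.
// Supports only #id, tag, .class, :pseudo and the combinators " ", ">", "+", "~".
const TOKEN_PATTERNS = [
  { type: "ID", re: /^#([\w-]+)/ },
  { type: "CLASS", re: /^\.([\w-]+)/ },
  { type: "TAG", re: /^([\w-]+|\*)/ },
  { type: "PSEUDO", re: /^:([\w-]+)/ },
];
const COMMA = /^\s*,\s*/;
// Either an explicit combinator (with optional spaces) or bare whitespace.
const COMBINATOR = /^\s*([>+~])\s*|^\s+/;

function tokenizeSketch(selector) {
  const groups = [];
  let tokens = [];
  let soFar = selector.trim();
  while (soFar) {
    let m = COMMA.exec(soFar);
    if (m) {
      // A comma closes the current group (first dimension of the result).
      groups.push(tokens);
      tokens = [];
      soFar = soFar.slice(m[0].length);
      continue;
    }
    m = COMBINATOR.exec(soFar);
    if (m) {
      // m[1] is undefined for a plain descendant (whitespace) combinator.
      const type = m[1] || " ";
      tokens.push({ type, value: type });
      soFar = soFar.slice(m[0].length);
      continue;
    }
    const pattern = TOKEN_PATTERNS.find((p) => p.re.test(soFar));
    if (!pattern) throw new SyntaxError("Unsupported selector near: " + soFar);
    m = pattern.re.exec(soFar);
    tokens.push({ type: pattern.type, value: m[0], matches: m[1] });
    soFar = soFar.slice(m[0].length);
  }
  groups.push(tokens);
  return groups;
}
```

For "#id tag.class, a:first" this yields two groups. The first has the types ID, " ", TAG, CLASS, and the second has TAG and PSEUDO.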

Look at line 405 of the source code, where Expr = Sizzle.selectors = {} begins. Its filter definition at line 567 lists the basic filter types: "ID", "TAG", "CLASS", "ATTR", "CHILD", and "PSEUDO". These are also the types that tokenize ultimately assigns to tokens.

Once tokenization is done, look again at the Expr = Sizzle.selectors = {} object defined at line 405. It contains every selector we are familiar with, and each one maps to a method definition. A natural guess follows. Does Sizzle simply tokenize the selector, then look up the corresponding method in Expr for each token and run the specific query or filter?
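
The naive version of this guess takes only a few lines to sketch. The table exprSketch.filter and the function matchesCompound are hypothetical stand-ins, not Sizzle's Expr. Elements are plain objects instead of DOM nodes, so the code runs outside a browser. Each filter returns a boolean for a single element.

```javascript
// Hypothetical, heavily simplified stand-in for the filter part of Sizzle's
// Expr object. Each entry takes a token's value and returns a predicate that
// answers true/false for one element. Elements are plain objects
// ({ tag, id, classes }) so the sketch runs without a DOM.
const exprSketch = {
  filter: {
    ID: (id) => (el) => el.id === id,
    TAG: (tag) => (el) => tag === "*" || el.tag === tag.toLowerCase(),
    CLASS: (cls) => (el) => el.classes.includes(cls),
  },
};

// The "naive guess": for every token of one compound selector, look up the
// filter registered for its type and test the element against it.
function matchesCompound(el, tokens) {
  return tokens.every((token) => {
    const makeFilter = exprSketch.filter[token.type];
    if (!makeFilter) throw new Error("No filter for type " + token.type);
    return makeFilter(token.matches)(el);
  });
}
```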

The answer is basically yes, but Sizzle does it in a more refined and clever way. Let's look at the second method I consider core:

The matcherFromTokens function is at line 1 of the source code:

The Code is as follows:


function matcherFromTokens( tokens ) {}

Its argument comes from the tokenize method. "Matcher" can be read as "matching program". As the name says, the function generates a matching program from tokens, and that is indeed what it does. For now, this article only shares the parts of Sizzle's implementation principle that I understand, without walking through the source code. I may write a more detailed source-code analysis later.

matcherFromTokens confirms the earlier guess. It is the glue between the tokenized selector and the matching methods defined in Expr, which lets it handle any combination of selectors. What makes Sizzle clever is that it does not test the tokens against the Expr methods one at a time. Instead, it first combines them according to the rules into one large matching function, which runs in the final step. Once that combined function exists, the key third method comes into play:
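
Here is a sketch of the "combine first, execute last" idea. It assumes a token list like the one tokenize produces for a single group. The helper names echo functions in the Sizzle source, but these bodies are simplified assumptions, not Sizzle's code. Only ID, TAG, CLASS and the " " and ">" combinators are handled. Each combinator wraps everything to its left into a matcher that is tested against the parent or an ancestor. The final function therefore starts from a candidate element and works back toward the left end of the selector.

```javascript
// Hypothetical sketch of the idea behind matcherFromTokens: fold a token list
// into ONE boolean function. Elements are plain objects { tag, id, classes, parent }.
const filters = {
  ID: (id) => (el) => el.id === id,
  TAG: (tag) => (el) => tag === "*" || el.tag === tag,
  CLASS: (cls) => (el) => el.classes.includes(cls),
};

// All matchers must accept the element. They are checked from the last one
// back, so the element's own simple selectors run before any ancestor walk.
function elementMatcher(matchers) {
  return (el) => {
    for (let i = matchers.length - 1; i >= 0; i--) {
      if (!matchers[i](el)) return false;
    }
    return true;
  };
}

// Test the matcher for everything LEFT of a combinator against the
// parent (">") or any ancestor (" ") of the current element.
function addCombinator(matcher, combinator) {
  if (combinator === ">") {
    return (el) => !!el.parent && matcher(el.parent);
  }
  return (el) => {
    for (let p = el.parent; p; p = p.parent) {
      if (matcher(p)) return true;
    }
    return false;
  };
}

function matcherFromTokensSketch(tokens) {
  let matchers = [];
  for (const token of tokens) {
    if (token.type === " " || token.type === ">") {
      // Fold everything collected so far into one combinator matcher.
      matchers = [addCombinator(elementMatcher(matchers), token.type)];
    } else if (filters[token.type]) {
      matchers.push(filters[token.type](token.matches));
    } else {
      throw new Error("Unsupported token type: " + token.type);
    }
  }
  return elementMatcher(matchers);
}
```

With the tokens for "#main ul > li.item", the compiled function returns true for an li.item whose parent is a ul inside #main. The element's own checks (li, .item) run before any walk up the tree, which keeps this reverse order cheap.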

The third method is superMatcher, at line 2 of the source code:

The Code is as follows:


superMatcher = function( seed, context, xml, results, expandContext ) {}

superMatcher is not defined directly. It is the function returned by matcherFromGroupMatchers( elementMatchers, setMatchers ) at line 1345, and it plays the central role in the final execution.

superMatcher decides where the query starts based on its seed, expandContext, and context parameters. It may filter seed directly, or search within context or context's parent node. If it does not start from seed, it first runs Expr.find["TAG"]( "*", expandContext && context.parentNode || context ) to get a set of elements, elems (an array). It then iterates over elems and tests each element with the pre-generated matcher. Every element for which the matcher returns true is pushed straight into the result set.
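
Here is a sketch of that outer loop. It is a hypothetical simplification: superMatcherSketch and allDescendants are invented names, and the context is a plain object tree with a children array. Collecting every descendant stands in for Expr.find["TAG"]( "*", context ). The matcher can be any function that returns true or false for one element, such as one produced by compiling tokens.

```javascript
// Hypothetical sketch of superMatcher's outer loop, using plain objects
// ({ tag, children }) instead of DOM nodes.

// Stand-in for Expr.find["TAG"]("*", context): every descendant of the
// context, in document (pre-order) order.
function allDescendants(node) {
  const out = [];
  const stack = [...(node.children || [])].reverse();
  while (stack.length) {
    const el = stack.pop();
    out.push(el);
    // Push children reversed so the first child is popped next.
    stack.push(...[...(el.children || [])].reverse());
  }
  return out;
}

// Start from the seed if one was given, otherwise from all descendants,
// and keep each element for which the precompiled matcher returns true.
function superMatcherSketch(matcher, context, seed) {
  const elems = seed || allDescendants(context);
  const results = [];
  for (const el of elems) {
    if (matcher(el)) results.push(el);
  }
  return results;
}
```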

So the raw result of running a matcher is a boolean. Going back to line 3, the methods in Expr.filter all return booleans, and so do the many PSEUDO-class methods. This somewhat overturned my original idea. The lookup is not done layer by layer; it works in reverse, by matching and filtering. In Expr, only find and preFilter return arrays rather than booleans.

One question remains: why does Sizzle build the result set by matching and filtering elements one by one? Even so, I think Sizzle's most basic "compilation principle" has now been explained.

But that question shouldn't be left hanging, so let's continue. This article actually tells the story somewhat backwards. Anyone reading the source will not meet these three key methods first, because Sizzle does a series of other work before it reaches them.

The real entry point of Sizzle is at line 220 of the source code:

The Code is as follows:


function Sizzle( selector, context, results, seed ) {}

The first part of this method is easy to understand. If the selector is a single ID selector (#id), the element is found directly with context.getElementById( m ). If it is a single TAG selector, context.getElementsByTagName( selector ) is used. If the current browser supports native getElementsByClassName and the selector is a single CLASS selector, context.getElementsByClassName( m ) is used. All three are native browser methods, so they are the most efficient option.
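
The fast-path decision can be sketched as a single regular expression test. The pattern rquick and the functions classifyQuickSelector and quickQuery are assumptions modeled on the description above, not Sizzle's exact code. quickQuery calls a native method only if the context actually provides it, which mirrors the check for getElementsByClassName support. It returns null when the selector must take the slower path.

```javascript
// Hypothetical sketch of the fast-path check at the top of Sizzle():
// recognize a selector that is exactly one #id, one tag, or one .class.
const rquick = /^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/;

function classifyQuickSelector(selector) {
  const m = rquick.exec(selector);
  if (!m) return null; // not a single simple selector
  if (m[1]) return { method: "getElementById", arg: m[1] };
  if (m[2]) return { method: "getElementsByTagName", arg: m[2] };
  return { method: "getElementsByClassName", arg: m[3] };
}

// Use the native method only if the context supports it; null means
// "fall through to the general select() path".
function quickQuery(selector, context) {
  const q = classifyQuickSelector(selector);
  if (!q || typeof context[q.method] !== "function") return null;
  const found = context[q.method](q.arg);
  // getElementById returns one element (or null): normalize to an array.
  if (q.method === "getElementById") return found ? [found] : [];
  return Array.from(found);
}
```

In a browser, quickQuery("#main", document) would call document.getElementById("main"). A selector such as "div.item" returns null and falls through to select.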

If none of these basic methods applies, execution enters the select method, defined at line 3 of the source code:

The Code is as follows:


function select( selector, context, results, seed, xml ) {}

In select, the selector is first tokenized. After that, however, the matching function is not assembled right away; some find operations run first. These correspond to the find methods in Expr, which perform queries and return result sets.

In other words, select first uses the tokens to fetch whatever it can with the find method for each token's type. A leading ID selector narrows the context first. Then the tokens of the last compound selector are scanned from right to left, stopping at the first combinator, for one that find can handle. Its result becomes the seed set. If nothing remains of the selector once that token is removed, the seed is returned directly. Otherwise, execution moves on to the "compilation" filtering process described earlier.
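
The seed step can be sketched like this. The functions descendants, finders, and findSeed are hypothetical. In Sizzle the find methods wrap native calls such as getElementById, whereas here they simply filter a plain object tree. The loop follows the shape of the seed search in select. It walks the tokens from the right, stops at the first combinator, uses the first token that has a finder, and removes that token from the list.

```javascript
// Hypothetical sketch of the seed step in select(). Elements are plain
// objects ({ tag, id, classes, children }); the finders filter this tree,
// whereas real find methods would call native DOM methods.
function descendants(node) {
  const out = [];
  for (const child of node.children || []) {
    out.push(child, ...descendants(child));
  }
  return out;
}

const finders = {
  ID: (id, ctx) => descendants(ctx).filter((el) => el.id === id),
  CLASS: (cls, ctx) => descendants(ctx).filter((el) => el.classes.includes(cls)),
  TAG: (tag, ctx) => descendants(ctx).filter((el) => el.tag === tag),
};

const COMBINATOR_TYPES = new Set([" ", ">", "+", "~"]);

function findSeed(tokens, context) {
  // Walk from the right end of the selector toward the left.
  for (let i = tokens.length - 1; i >= 0; i--) {
    const token = tokens[i];
    if (COMBINATOR_TYPES.has(token.type)) break; // stop at a combinator
    const find = finders[token.type];
    if (find) {
      const seed = find(token.matches, context);
      // Remove the token that produced the seed; the rest must still be matched.
      const rest = tokens.slice(0, i).concat(tokens.slice(i + 1));
      return { seed, rest, done: rest.length === 0 || seed.length === 0 };
    }
  }
  // Nothing usable: no seed, and every token still has to be matched.
  return { seed: null, rest: tokens, done: false };
}
```

done is true when no tokens remain or the seed is empty, which corresponds to returning the seed directly. Otherwise, the remaining tokens would be compiled into a matcher and run over the seed.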

Following this flow, we now have a basic picture of how Sizzle works, and the question left open above is resolved. By the time reverse matching and filtering runs, its search range is already the smallest set produced by the narrowing steps. For the selectors it handles, such as pseudo-classes, reverse matching and filtering is also an efficient choice.

Let's briefly summarize why Sizzle is very efficient.

First, Sizzle always prefers the most efficient native method available. So far we have only described Sizzle's own selector implementation. When Sizzle actually runs, it first checks whether the current browser supports the native querySelectorAll method (line 1545 of the source code). If so, it uses that method first. Native browser methods are faster than Sizzle's own JavaScript, so trying them first keeps Sizzle fast. (Search online for more about querySelectorAll.) If querySelectorAll is not supported, Sizzle first checks whether methods such as getElementById, getElementsByTagName, and getElementsByClassName can handle the query.
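
The "native first" preference can be sketched as a try/fallback wrapper. querySelectorAllFirst and its fallback parameter are hypothetical names. The fallback stands for Sizzle's own JavaScript engine. The try/catch matters because querySelectorAll throws a SyntaxError for selectors it does not understand, such as jQuery's :first extension.

```javascript
// Hypothetical sketch of "native first, JavaScript fallback".
// `context` is anything that may expose querySelectorAll (e.g. document);
// `fallback` stands in for the selector engine's own implementation.
function querySelectorAllFirst(selector, context, fallback) {
  if (typeof context.querySelectorAll === "function") {
    try {
      // Native engine handled it: convert the NodeList to a plain array.
      return Array.from(context.querySelectorAll(selector));
    } catch (err) {
      // The native engine rejected the selector (e.g. ":first"); fall through.
    }
  }
  return fallback(selector, context);
}
```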

Second, in more complex cases, Sizzle always uses native methods first to query and narrow the selection. Only then does it apply the "compilation" approach described above to match and filter the narrowed set element by element. The "compilation" workflow is more involved and somewhat slower than the native methods. Sizzle therefore uses it as little as possible and keeps the set it processes as small and simple as possible.

Third, even within the "compilation" process, Sizzle implements a caching mechanism, which we set aside earlier to focus on the flow. Line 1535 of the source code is the entry to "compilation", which eventually calls the third core method, superMatcher. Following the trail to line 2, the compile method caches the matching function generated for each selector. The tokenize method likewise caches its tokens per selector. In other words, after Sizzle( selector ) has run once, the next call to Sizzle( selector ) does not repeat the expensive "compilation" step; it simply reuses the cached function. I suspect one of the biggest advantages of "compilation" is that it is easy to cache. "Compilation" here can be understood as generating preprocessed functions and storing them for reuse.
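
The caching idea can be sketched in a few lines. createCache, buildMatcher, compile, and compileCount are hypothetical names, and the size limit of 50 is an assumption for the sketch. buildMatcher is a placeholder for the expensive tokenize-and-compile work. The point is only that a second call with the same selector string returns the stored function without compiling again.

```javascript
// Hypothetical sketch of a bounded compile cache keyed by selector string.
function createCache(limit) {
  const store = new Map(); // Map keeps insertion order: oldest key comes first
  return {
    get: (key) => store.get(key),
    set(key, value) {
      store.delete(key); // re-inserting moves the key to the "newest" end
      store.set(key, value);
      if (store.size > limit) {
        store.delete(store.keys().next().value); // drop the oldest entry
      }
      return value;
    },
    size: () => store.size,
  };
}

let compileCount = 0;
// Placeholder for the expensive step: counts calls and returns a trivial
// matcher (element's tag equals the selector string).
function buildMatcher(selector) {
  compileCount++;
  return (el) => el.tag === selector;
}

const compileCache = createCache(50);
// Reuse the stored function when this exact selector was compiled before.
function compile(selector) {
  return compileCache.get(selector) || compileCache.set(selector, buildMatcher(selector));
}
```

Once "li" has been compiled, compile("li") returns the same function object and compileCount stays at 1.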

That answers my questions about how the selector is implemented and why it executes efficiently. The analysis in this article is based on the latest version of Sizzle at the time of writing. The line numbers mentioned here refer to that version, which can be downloaded from http://sizzlejs.com. Time was short, so if you spot problems, please go easy on the criticism. Further discussion is welcome.

