Sizzle Engine: Principles and Practices (II)

Source: Internet
Author: User
Main process and regular-expression chunking
var chunker = /((?:\((?:\([^()]+\)|[^()]+)+\)|\[(?:\[[^[\]]*\]|['"][^'"]*['"]|[^[\]'"]+)+\]|\\.|[^ >+~,(\[\\]+)+|[>+~])(\s*,\s*)?((?:.|\r|\n)*)/g;

This regular expression is long. It is mainly used to split a selector into chunks and to perform one step of preprocessing.


'div#test + p > a.tab'    --> ['div#test','+','p','>','a.tab']
Extract the corresponding type from the expression:

jQuery selectors fall into 7 types:
ID selector, class selector, tag selector, attribute (ATTR) selector, child-element selector, pseudo-class selector, and position (POS) selector.

The type is determined with regular expressions. The specific expressions are as follows:

ID    : /#((?:[\w\u00c0-\uFFFF\-]|\\.)+)/,
CLASS : /\.((?:[\w\u00c0-\uFFFF\-]|\\.)+)/,
NAME : /\[name=['"]*((?:[\w\u00c0-\uFFFF\-]|\\.)+)['"]*\]/,
ATTR : /\[\s*((?:[\w\u00c0-\uFFFF\-]|\\.)+)\s*(?:(\S?=)\s*(?:(['"])(.*?)\3|(#?(?:[\w\u00c0-\uFFFF\-]|\\.)*)|)|)\s*\]/,
TAG : /^((?:[\w\u00c0-\uFFFF\*\-]|\\.)+)/,
CHILD : /:(only|nth|last|first)-child(?:\(\s*(even|odd|(?:[+\-]?\d+|(?:[+\-]?\d*)?n\s*(?:[+\-]\s*\d+)?))\s*\))?/,
POS : /:(nth|eq|gt|lt|first|last|even|odd)(?:\((\d*)\))?(?=[^\-]|$)/,
PSEUDO: /:((?:[\w\u00c0-\uFFFF\-]|\\.)+)(?:\((['"]?)((?:\([^\)]+\)|[^\(\)]*)+)\2\))?/

 


Regular expression tips:

*? and +?   non-greedy (lazy) quantifiers
\3          backreference to the text captured by group 3
(?=...)     positive lookahead
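
The three constructs above can be tried in isolation. The following is a minimal sketch with simplified patterns written for illustration (they are not taken from Sizzle, and they use \1 instead of \3 because there is only one earlier group):

```javascript
var text = '"a" and "b"';

// Lazy vs. greedy: the lazy form stops at the first possible closing quote.
var lazy = /(['"])(.*?)\1/.exec(text);
var greedy = /(['"])(.*)\1/.exec(text);

// \1 is a backreference: the closing quote must be the same character
// as the opening quote, so a ' opened and a " closed does not match.
var mixedQuotes = /(['"])(.*?)\1/.test("'a\"");

// (?=...) is a lookahead: it checks what follows without consuming it.
// Same idea as the tail of the POS expression above.
var first = /:first(?=[^\-]|$)/;

console.log(lazy[2], "|", greedy[2], "|", mixedQuotes);
console.log(first.test(":first"), first.test(":first-child"));
```

The lookahead is what keeps POS from claiming :first-child, which belongs to CHILD.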

These regular expressions may be hard to read at first. They are easier to follow with the corresponding jQuery selectors in mind:

POS covers :first, :nth(), :eq(), :last, :gt, :lt, :even and :odd. These were added by Sizzle and are not part of CSS.

The others are essentially the same as in CSS. Note that, because of PSEUDO, the same expression may match several types at once. This comes up again in the filter section later.
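
To see how one expression can match several types at once, here is a small sketch. The regular expressions are copied from the list above (NAME and ATTR are left out for brevity); the helper name matchingTypes is hypothetical and does not exist in Sizzle.

```javascript
var match = {
    ID: /#((?:[\w\u00c0-\uFFFF\-]|\\.)+)/,
    CLASS: /\.((?:[\w\u00c0-\uFFFF\-]|\\.)+)/,
    TAG: /^((?:[\w\u00c0-\uFFFF\*\-]|\\.)+)/,
    CHILD: /:(only|nth|last|first)-child(?:\(\s*(even|odd|(?:[+\-]?\d+|(?:[+\-]?\d*)?n\s*(?:[+\-]\s*\d+)?))\s*\))?/,
    POS: /:(nth|eq|gt|lt|first|last|even|odd)(?:\((\d*)\))?(?=[^\-]|$)/,
    PSEUDO: /:((?:[\w\u00c0-\uFFFF\-]|\\.)+)(?:\((['"]?)((?:\([^\)]+\)|[^\(\)]*)+)\2\))?/
};

// Hypothetical helper: list every type whose expression matches.
function matchingTypes(expr) {
    return Object.keys(match).filter(function (type) {
        return match[type].test(expr);
    });
}

console.log(matchingTypes("a.tab"));          // CLASS and TAG
console.log(matchingTypes("p:first"));        // TAG, POS and PSEUDO
console.log(matchingTypes("li:first-child")); // TAG, CHILD and PSEUDO
```

Both :first and :first-child are also matched by PSEUDO, which is the overlap mentioned above.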

 

The regular expressions above are stored in the match property of Expr:

Expr = {
    match: {
        // ID: ...
    }
}

These regular expressions are not used directly. They are processed further.
First, a lookahead is appended to each expression to ensure that the match is not followed by a closing ] or ), that is, that it does not sit inside brackets or parentheses.

/#((?:[\w\u00c0-\uFFFF\-]|\\.)+)/
becomes
/#((?:[\w\u00c0-\uFFFF\-]|\\.)+)(?![^\[]*\])(?![^\(]*\))/

Second, Sizzle also needs to detect escape characters, so a capture group is added at the head of each expression to capture the part before the target string.
Because this adds a group at the front, backreferences such as \3 in the original expression must each be incremented by one.

/#((?:[\w\u00c0-\uFFFF\-]|\\.)+)(?![^\[]*\])(?![^\(]*\))/
becomes
/(^(?:.|\r|\n)*?)#((?:[\w\u00c0-\uFFFF\-]|\\.)+)(?![^\[]*\])(?![^\(]*\))/

/:((?:[\w\u00c0-\uFFFF\-]|\\.)+)(?:\((['"]?)((?:\([^\)]+\)|[^\(\)]*)+)\2\))?/
becomes
/(^(?:.|\r|\n)*?):((?:[\w\u00c0-\uFFFF\-]|\\.)+)(?:\((['"]?)((?:\([^\)]+\)|[^\(\)]*)+)\3\))?(?![^\[]*\])(?![^\(]*\))/

The corresponding source code is:

var fescape = function(all, num){
    // shift each backreference by one: \2 becomes \3, and so on
    return "\\" + (num - 0 + 1);
};

for ( var type in Expr.match ) {
    // append the "not inside [] or ()" lookahead
    Expr.match[ type ] = new RegExp( Expr.match[ type ].source + (/(?![^\[]*\])(?![^\(]*\))/.source) );
    // prepend the leading capture group and renumber the backreferences
    Expr.leftMatch[ type ] = new RegExp( /(^(?:.|\r|\n)*?)/.source + Expr.match[ type ].source.replace(/\\(\d+)/g, fescape) );
}

Expr.leftMatch stores the processed regular expressions. Another advantage of doing this once up front is that it avoids creating a new RegExp object for every match.
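
The effect of the two processing steps can be checked on a reduced Expr. This is a sketch that runs the loop quoted above on only the ID and PSEUDO expressions; the variable rawID and the test strings are illustrative assumptions.

```javascript
var Expr = {
    match: {
        ID: /#((?:[\w\u00c0-\uFFFF\-]|\\.)+)/,
        PSEUDO: /:((?:[\w\u00c0-\uFFFF\-]|\\.)+)(?:\((['"]?)((?:\([^\)]+\)|[^\(\)]*)+)\2\))?/
    },
    leftMatch: {}
};

// Keep the unprocessed ID expression for comparison.
var rawID = Expr.match.ID;

var fescape = function (all, num) {
    return "\\" + (num - 0 + 1);
};

for (var type in Expr.match) {
    Expr.match[type] = new RegExp(Expr.match[type].source + (/(?![^\[]*\])(?![^\(]*\))/.source));
    Expr.leftMatch[type] = new RegExp(/(^(?:.|\r|\n)*?)/.source + Expr.match[type].source.replace(/\\(\d+)/g, fescape));
}

// Step 1: "#x" inside an attribute selector is no longer taken for an ID.
console.log(rawID.test("[href=#x]"), Expr.match.ID.test("[href=#x]"));

// Step 2: group 1 now holds the text before the match, and the
// backreference \2 in PSEUDO has been renumbered to \3.
var m = Expr.leftMatch.PSEUDO.exec("div:not(.a)");
console.log(m[1], m[2], m[4]);
```

Here m[1] is the text before the pseudo-class, m[2] its name, and m[4] its argument.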

 

Back to the main process function:

var Sizzle = function (selector, context, results, seed) {}
Sizzle takes four parameters:
selector: the selector expression
context: the context
results: the result set
seed: the candidate set

Example:

Sizzle('div', #test, [#A, #B], [#C, #D, #E]) searches the set [#C, #D, #E] for elements that satisfy the condition (inside #test and with tag name div), then appends the matches to [#A, #B]. Assuming #D and #E match, the final result is [#A, #B, #D, #E].

Sample code:
var Sizzle = function (selector, context, results, seed) {
    var soFar = selector,
        extra, // holds the rest of a comma-separated selector; only one expression is processed at a time
        parts = [],
        m;
    do {
        chunker.exec(""); // resets chunker.lastIndex; setting chunker.lastIndex = 0 has the same effect
        m = chunker.exec(soFar);
        if (m) {
            soFar = m[3];
            parts.push(m[1]);
            if (m[2]) { // a comma follows: stop here and save the remaining selectors
                extra = m[3];
                break;
            }
        }
    } while (m);
};

For 'div#test + p > a.tab'
the resulting parts array is ['div#test', '+', 'p', '>', 'a.tab'].
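
The chunking loop can be run on its own. The sketch below wraps the same loop in a hypothetical helper named chunk that returns both parts and extra; the chunker expression is the one given at the top of this article.

```javascript
var chunker = /((?:\((?:\([^()]+\)|[^()]+)+\)|\[(?:\[[^[\]]*\]|['"][^'"]*['"]|[^[\]'"]+)+\]|\\.|[^ >+~,(\[\\]+)+|[>+~])(\s*,\s*)?((?:.|\r|\n)*)/g;

// Hypothetical helper: split one selector into parts, and return
// whatever follows the first comma untouched in "extra".
function chunk(selector) {
    var soFar = selector, parts = [], extra, m;
    do {
        chunker.exec("");        // reset lastIndex of the global regex
        m = chunker.exec(soFar);
        if (m) {
            soFar = m[3];        // the rest of the selector
            parts.push(m[1]);    // one simple selector or one combinator
            if (m[2]) {          // a comma was found
                extra = m[3];
                break;
            }
        }
    } while (m);
    return { parts: parts, extra: extra };
}

var single = chunk("div#test + p > a.tab");
var grouped = chunk("div p, a.tab");
console.log(single.parts);
console.log(grouped.parts, grouped.extra);
```

For the grouped selector, only 'div p' is chunked; 'a.tab' is kept in extra for a later pass.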

After chunking, the next step is to decide the order in which the selector is processed. Following the discussion in part (I), there are two branches:

if (parts.length > 1 && origPOS.exec(selector)) {
    // Left to right. The criterion is that the selector has combinators and contains a position selector.
    // A single expression such as div#test has no ordering problem.

} else {
    // Everything else: right to left
}

[Note: origPOS holds Expr.match.POS; see line 901 of the source]
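
The branch condition can be exercised separately. In this sketch origPOS is assumed to be the unprocessed POS expression listed earlier, and the helper name direction is hypothetical; parts is passed in already chunked.

```javascript
var origPOS = /:(nth|eq|gt|lt|first|last|even|odd)(?:\((\d*)\))?(?=[^\-]|$)/;

// Hypothetical helper: report which branch a selector would take.
function direction(parts, selector) {
    return parts.length > 1 && origPOS.exec(selector) ? "left-to-right" : "right-to-left";
}

console.log(direction(["div:first", "a"], "div:first a"));
console.log(direction(["div#test", "+", "p", ">", "a.tab"], "div#test + p > a.tab"));
console.log(direction(["div:first"], "div:first"));
console.log(direction(["ul", "li:first-child"], "ul li:first-child"));
```

Only the first call takes the left-to-right branch: the second has no position selector, the third has a single part, and in the fourth :first-child is excluded by the lookahead at the end of POS.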

First look at the normal (right-to-left) case.
Next comes the ID question: if the first selector expression contains an ID, the context is reset.
Processing continues only if a context exists [without a context there is no need to search, because there can be no result].

After the context is reset, because processing runs from right to left, the first step is to obtain the set waiting to be filtered:

ret = seed ? { expr: parts.pop(), set: makeArray(seed) } : Sizzle.find(parts.pop(), context); // Sizzle.find does the searching
set = ret.expr ? Sizzle.filter(ret.expr, ret.set) : ret.set; // Sizzle.filter does the filtering

If a candidate set (seed) is given, it is used directly as the set to filter. Otherwise, the set is obtained by searching with the rightmost selector.

The rest of the process takes the remaining entries of parts one by one, checks the set against each of them, and filters until all parts have been processed.

while (parts.length) {
    // cur is the current combinator; context here stands for the search context and is not the literal parameter name in the source
    Expr.relative[cur](checkSet, context, contextXML);
}

 

Example:

Processing ['div#test', '+', 'p', '>', 'a.tab']:
Step 1. There is no seed candidate set. The first item 'div#test' contains an ID and the last item 'a.tab' does not, so the context is reset: context = Sizzle.find('div#test', document).
Step 2. The remaining parts are ['+', 'p', '>', 'a.tab'] and there is no seed. First obtain the set A waiting to be filtered [tag name a], then filter A down to the set B of elements whose class name is tab.
Step 3. The remaining parts are ['+', 'p', '>']. Filtering by combinators runs in reverse. Assume B = [#A, #B, #C, #D] from step 2. First find the elements whose direct parent is a p, giving the set
C = [#A, #B, false, false]. Then keep those whose p immediately follows the context element from step 1, giving the set D = [#A, false, false, false].
Step 4. Obtain the final set E = [#A] and merge it into the result set.
Step 5. Process the second part of a comma-separated selector by the same rules.
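
Step 3 can be mimicked without a DOM. The following is a simplified sketch, not Sizzle's implementation: nodes are plain objects built by a hypothetical node helper, and the two relative functions are reduced to the cases used in this example. As in the walkthrough, an entry that fails a check becomes false while the array keeps its length.

```javascript
// Stand-ins for DOM nodes: id, tag, parent, previous sibling.
function node(id, tag, parent, prev) {
    return { id: id, tag: tag, parent: parent || null, prev: prev || null };
}

var body = node("body", "body");
var test = node("test", "div", body);      // the context, div#test
var p1 = node("p1", "p", body, test);      // immediately follows div#test
var p2 = node("p2", "p", body, p1);        // follows p1, not div#test
var A = node("A", "a", p1);
var B = node("B", "a", p2);
var C = node("C", "a", test);              // parent is a div
var D = node("D", "a", body);              // parent is body

// Hypothetical, reduced versions of the ">" and "+" checks.
var relative = {
    ">": function (checkSet, tag) {
        for (var i = 0; i < checkSet.length; i++) {
            var elem = checkSet[i];
            if (elem) {
                checkSet[i] = elem.parent && elem.parent.tag === tag ? elem.parent : false;
            }
        }
    },
    "+": function (checkSet, target) {
        for (var i = 0; i < checkSet.length; i++) {
            var elem = checkSet[i];
            if (elem) {
                checkSet[i] = elem.prev === target ? elem.prev : false;
            }
        }
    }
};

var set = [A, B, C, D];        // set B from step 2
var checkSet = set.slice();

relative[">"](checkSet, "p");  // parent must be a p
var afterChild = checkSet.map(function (e) { return e ? e.id : false; });

relative["+"](checkSet, test); // that p must directly follow div#test
var afterAdjacent = checkSet.map(function (e) { return e ? e.id : false; });

// Keep the original candidates whose checkSet entry survived.
var result = set.filter(function (elem, i) { return checkSet[i]; })
                .map(function (e) { return e.id; });

console.log(afterChild, afterAdjacent, result);
```

In this sketch checkSet tracks the ancestor or sibling reached so far, while set keeps the original candidates, so the final result is read from set.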

 

About context selection

When there is no candidate set, in which of the following cases does the context need to be reset because of an ID?
1. div#id_1 a#id_2
2. div#id_1
3. div a#id_2
In Sizzle, the context is reset only in case (2).

In step 2, if the combinator is "+" or "~", which express a sibling relationship, the context [search scope] is set to context.parentNode.

Example:
<body>
<div id="test_a">
<p class="tab" id="a1">a1</p>
<p class="tab" id="a2">a2</p>
<p class="tab" id="a3">a3</p>
</div>
<div id="test_b">
<p class="tab" id="b1">b1</p>
<p class="tab" id="b2">b2</p>
<p class="tab" id="b3">b3</p>
</div>
</body>

Take the selector 'div#test_a ~ div'.
Step 1 resets the context to div#test_a.
In step 2, executing (div#test_a).getElementsByTagName('div') directly would find nothing, because div#test_a has no div descendants, so the result would be wrong.
Therefore (div#test_a).parentNode.getElementsByTagName('div') should be executed instead. Then proceed to step 3.
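
The choice of search root described above can be sketched as a small function. The helper name pickSearchRoot is hypothetical, and the context is a plain object standing in for a DOM node; the condition follows the rule stated in this section (a lone "+" or "~" left in parts means the search starts from the parent).

```javascript
// Hypothetical helper: decide where the search for the rightmost
// selector should start, given the parts that are still left.
function pickSearchRoot(parts, context) {
    var siblingOnly = parts.length === 1 && (parts[0] === "~" || parts[0] === "+");
    return siblingOnly && context.parentNode ? context.parentNode : context;
}

// Stand-ins for <body> and <div id="test_a">.
var bodyNode = { id: "body", parentNode: null };
var testA = { id: "test_a", parentNode: bodyNode };

// 'div#test_a ~ div': after the context is reset and 'div' is taken
// from the right, only the combinator '~' is left in parts.
console.log(pickSearchRoot(["~"], testA).id);

// 'div#test_a > div': descendants are searched from the context itself.
console.log(pickSearchRoot([">"], testA).id);
```

With "~" the root is the parent, so the sibling div#test_b can be found; with ">" the root stays at div#test_a.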

Next: analysis of the Sizzle.find process, in Sizzle Engine: Principles and Practices (III)

 

[For more information, see http://www.cnblogs.com/xesam/]
Address: http://www.cnblogs.com/xesam/archive/2012/02/15/2352471.html
