[JavaScript Ninja series] Getting Started with CSS Selector Engines


The target audience for this article is the entry-level Web front-end developer.

This article describes the fundamentals of the CSS selector engine, a tool that front-end developers use almost every day, and walks through the various strategies for implementing one. First, we introduce a method based on the standard API.

The standard Selectors API

Supported platforms: Safari, Firefox 3.1+, Internet Explorer 8+, Chrome and Opera

The two most common methods:

querySelector, which takes a CSS selector string and returns the first element found, or null if nothing matches.

querySelectorAll, which takes a CSS selector string and returns a collection of all the elements found (a NodeList).

These two methods exist on all DOM elements, DOM document objects, and DOM document fragment objects.

<div id="test"><b>Hello</b>, I'm a ninja!</div>
<div id="test2"></div>
<script>
window.onload = function () {
    // Query from the document: every div that is a direct child of body.
    var divs = document.querySelectorAll("body > div");
    assert(divs.length === 2, "Divs found using a CSS selector.");

    // Query relative to an element.
    var b = document.getElementById("test").querySelector("b:only-child");
    assert(b, "The bold element is found relative to another element.");
};
</script>

One drawback of the example above is that it relies on browser support for CSS selectors, so it breaks in old versions of IE. A second catch appears when an element, rather than the document, is used as the root of the query. Consider the following code.

<script>
window.onload = function () {
    var b = document.getElementById("test").querySelector("div b");
    assert(b, "Only the last part of the selector matters.");
};
</script>

The problem with the code above is that, when a query is rooted at an element, the selector is still matched against the whole document; the query function only checks that the element matched by the rightmost part lies inside the root element. Notice that there is no div element underneath #test, yet the query succeeds, because the earlier parts of the selector are free to match elements outside the root (here, #test itself is the div).

This behavior contradicts what we expect from a CSS selector engine, so we need to do some patching. The most common technique is to temporarily give the root element a new ID and prefix the query with it, which forces the whole selector to match inside the root. The code is as follows.

<script>
(function () {
    var count = 1;
    this.rootedQuerySelectorAll = function (elem, query) {
        var oldId = elem.id;
        // Give the root a temporary, unique id so the query can be anchored to it.
        elem.id = "rooted" + (count++);
        try {
            return elem.querySelectorAll("#" + elem.id + " " + query);
        } catch (e) {
            throw e;
        } finally {
            // Runs even after the return above: restore the original id.
            elem.id = oldId;
        }
    };
})();
window.onload = function () {
    var b = rootedQuerySelectorAll(document.getElementById("test"), "div b");
    assert(b.length === 0, "The selector is now rooted properly.");
};
</script>

In the above code we need to take note of the following points:

First, we give the root element a globally unique ID, so we need to save its original ID beforehand. The unique ID is then prepended to the query string.

The final step is to restore the original ID and return the query results. An exception may be thrown along the way (mostly because of a selector syntax error or a selector the browser does not support). Therefore, we wrap the API call in a try/catch block and restore the root element's original ID in the finally clause. There is a subtle feature of JavaScript at work here: even though the try block has already executed a return statement, the finally clause still runs before the result value is actually returned to the calling function.
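The return-then-finally behavior can be seen in isolation, without a browser. The following is a minimal sketch in plain JavaScript; the `withTemporaryId` helper and the `target` object are hypothetical stand-ins for the patching function and the root element, not part of any library.

```javascript
// Hypothetical helper: temporarily overwrite target.id, run a callback,
// and restore the original id no matter how the callback exits.
function withTemporaryId(target, tempId, callback) {
  var oldId = target.id;
  target.id = tempId;
  try {
    // The return value is computed here, while the temporary id is set...
    return callback(target);
  } finally {
    // ...but this clause still runs before control reaches the caller.
    target.id = oldId;
  }
}

// A plain object standing in for a DOM element (an assumption of this sketch).
var target = { id: "test" };

// Normal exit: the callback observes the temporary id.
var seen = withTemporaryId(target, "rooted1", function (t) {
  return t.id;
});

// Exceptional exit: the error propagates, yet the id is restored as well.
var threw = false;
try {
  withTemporaryId(target, "rooted2", function () {
    throw new Error("bad selector");
  });
} catch (e) {
  threw = true;
}
```

In both cases `target.id` ends up back at its original value, which is the property the rooted query function depends on.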

The Selectors API is one of the most promising new APIs in the standard. Once the mainstream browsers support CSS3 (or at least most CSS3 selectors), it will save programmers from writing a lot of JavaScript code.

Using XPath to find elements

XPath is a language for querying nodes in a DOM document. It is even more powerful than CSS selectors. Many popular browsers (Firefox, Safari, Opera +, Chrome) provide some implementation of XPath that can be used to query HTML documents. Internet Explorer 6 and earlier versions can only use XPath to query XML documents (not HTML documents).

For complex CSS selectors, XPath expressions can run faster than an engine built on pure DOM operations. However, browser support is a risk we have to consider, and in the case of simple CSS selectors XPath loses its advantage to pure DOM code.

So we consider using a threshold: we use XPath only when doing so is more advantageous. Where the threshold lies depends on the experience of the developer. For example, when looking for an ID or a tag, pure DOM code is always the faster way.

If the user's browser supports XPath expressions, we can use the following code (based on the Prototype library).
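A minimal sketch of such a threshold is shown below. The rule it encodes (a lone `#id` or a lone tag name goes to pure DOM code, anything else goes to XPath when available) is only one possible choice, and the names `SIMPLE_SELECTOR` and `chooseStrategy` are hypothetical.

```javascript
// Hypothetical threshold: a lone "#id" or a lone tag name is considered
// simple enough that pure DOM methods are the faster choice.
var SIMPLE_SELECTOR = /^(?:#[\w-]+|[a-zA-Z][\w-]*)$/;

// hasXPath would typically be: typeof document.evaluate === "function"
function chooseStrategy(selector, hasXPath) {
  if (!hasXPath || SIMPLE_SELECTOR.test(selector)) {
    return "dom";
  }
  return "xpath";
}

var forId = chooseStrategy("#test", true);
var forTag = chooseStrategy("div", true);
var forComplex = chooseStrategy("div.ninja a span", true);
var withoutXPath = chooseStrategy("div.ninja a span", false);
```

A real engine would tune the pattern from measurements rather than fix it up front.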

if (typeof document.evaluate === "function") {
    function getElementsByXPath(expression, parentElement) {
        var results = [];
        // An ordered snapshot keeps the nodes in document order.
        var query = document.evaluate(expression,
            parentElement || document,
            null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
        for (var i = 0, length = query.snapshotLength; i < length; i++)
            results.push(query.snapshotItem(i));
        return results;
    }
}

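Although XPath can express any selector, using it for everything is not a practical option. For a given CSS selector expression, the corresponding XPath expression can be dauntingly complex. A few representative conversions from CSS selector to XPath expression:

CSS selector    XPath expression
*               //*
p               //p
p a             //p//a
p > *           //p/*
#foo            //*[@id='foo']
.foo            //*[contains(concat(' ', @class, ' '), ' foo ')]
[title]         //*[@title]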

When building a CSS selector engine based on regular expressions, we can include XPath as a submodule that converts part of the user's CSS selector expression into an XPath expression and then uses XPath to search the DOM.

The code that implements the XPath part may be as large as the regular-expression part. Many developers choose to drop XPath to reduce the complexity of the CSS selector engine. So you need to weigh the performance gains that XPath brings against the complexity of implementing it.
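To make the conversion step concrete, here is a minimal sketch of such a submodule. It is an illustration under narrow assumptions, not a complete converter: it only understands tag names, `*`, `#id`, `.class`, the descendant combinator, and a child combinator written with surrounding spaces. The function names `partToXPath` and `cssToXPath` are hypothetical.

```javascript
// Convert one simple selector such as "div", "#foo" or "div.ninja".
function partToXPath(part) {
  var m = /^([a-zA-Z][\w-]*|\*)?((?:[#.][\w-]+)*)$/.exec(part);
  if (!m) {
    throw new Error("Unsupported selector part: " + part);
  }
  var xpath = m[1] || "*";                      // no tag name means any element
  var qualifiers = m[2].match(/[#.][\w-]+/g) || [];
  for (var i = 0; i < qualifiers.length; i++) {
    var name = qualifiers[i].slice(1);
    if (qualifiers[i].charAt(0) === "#") {
      xpath += "[@id='" + name + "']";
    } else {
      // Class names are space-separated, so pad both sides before searching.
      xpath += "[contains(concat(' ', @class, ' '), ' " + name + " ')]";
    }
  }
  return xpath;
}

// Convert a whole selector; ">" must be separated by spaces in this sketch.
function cssToXPath(selector) {
  var tokens = selector.replace(/^\s+|\s+$/g, "").split(/\s+/);
  var xpath = "";
  var axis = "//";                              // descendant by default
  for (var i = 0; i < tokens.length; i++) {
    if (tokens[i] === ">") {
      axis = "/";                               // the next part is a direct child
      continue;
    }
    xpath += axis + partToXPath(tokens[i]);
    axis = "//";
  }
  return xpath;
}

var ninjaPath = cssToXPath("div.ninja a span");
```

Even this small subset shows how quickly the class test inflates the XPath expression compared with the CSS it came from.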

Pure DOM implementation

The core of a CSS selector engine is implemented with pure DOM operations. It parses the CSS selector given by the user and then uses the existing DOM methods (such as getElementById and getElementsByTagName) to find the corresponding DOM elements. There are several reasons for using a pure DOM approach:

First, Internet Explorer 6 and 7. Although the querySelectorAll() method is supported in IE 8 and later, the lack of support for XPath and the Selectors API in IE 6 and 7 makes a pure DOM implementation necessary.

Second, backwards compatibility. If you want your code to degrade gracefully in older browsers (such as Safari 2), you should use a pure DOM implementation.

Third, speed. Some CSS selector expressions (for example, a lookup by ID) can be handled faster by a pure DOM core.

Having seen why the pure DOM core matters, we next look at two ways of implementing the selector engine: parsing from the top down, and parsing from the bottom up.

A top-down engine parses the CSS selector expression from left to right, matching elements in the document as it goes, with each part matched relative to the results of the previous part. This is the approach taken by the mainstream JavaScript libraries and, more generally, the preferred way to find elements on a page. Let's take a look at some markup.

<body>
    <div></div>
    <div class="ninja">
        <span>Please </span>
        <a href="/ninja"><span>Click me!</span></a>
    </div>
</body>

If we want to select the element containing "Click me!", we can write the selector expression: div.ninja a span.

Parsing this selector expression with the top-down method works as follows:

The first part of the expression, div.ninja, identifies a subtree in the document. Within that subtree, we look for the subtree matched by the next part of the expression, a. Finally, within that one, the target span node is found.

Note that this is only the simplest case. At any level of the process, more than one subtree may match the expression. There are two principles to keep in mind when implementing the selector engine:

The elements in the returned results should appear in the same order as they do in the document.

The elements in the returned results should not be duplicated (an element must not appear in the results twice).

Avoiding these pitfalls can make the actual implementation a little tricky. The following is a simplified top-down engine that only supports locating elements by tag name.

<div>
    <div>
        <span>Span</span>
    </div>
</div>
<script>
window.onload = function () {
    function find(selector, root) {
        root = root || document;
        var parts = selector.split(" "),
            query = parts[0],
            rest = parts.slice(1).join(" "),
            elems = root.getElementsByTagName(query),
            results = [];
        for (var i = 0; i < elems.length; i++) {
            if (rest) {
                // Recurse into each match with the remainder of the selector.
                results = results.concat(find(rest, elems[i]));
            } else {
                results.push(elems[i]);
            }
        }
        return results;
    }
    var divs = find("div");
    assert(divs.length === 2, "Correct number of divs found.");
    divs = find("div", document.body);
    assert(divs.length === 2, "Correct number of divs found in body.");
    divs = find("body div");
    assert(divs.length === 2, "Correct number of divs found in body.");
    var spans = find("div span");
    assert(spans.length === 2, "A duplicate span was found.");
};
</script>

In the example above, we implemented a simple top-down selector engine that supports finding elements by tag name. The engine can be broken down into several parts: parsing the selector expression, finding elements in the document, filtering elements, and recursing and merging the results at each level.

Parsing selector expressions

In the example above, parsing consists of splitting the CSS selector (for example, "div span") into an array of strings (["div", "span"]). However, the CSS2 and CSS3 standards also support finding elements by attribute value, so a selector may contain extra spaces that make this simple method fail. Even so, the simple approach is sufficient for most situations.

To implement parsing fully, we need a set of parsing rules that can handle any expression given by the user. The following code uses a regular expression to break an expression into small chunks (splitting on commas where needed).
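The limitation of splitting on spaces is easy to reproduce. The sketch below uses only `String.prototype.split`; the example selectors are made up for illustration.

```javascript
// Splitting on spaces works for plain tag selectors...
var simple = "div span".split(" ");

// ...but an attribute value that contains a space is cut in two,
// so the selector falls apart into three meaningless pieces.
var broken = 'a[title="hello world"] span'.split(" ");
```

Here `simple` has the two expected parts, while `broken` has three, the first two being halves of a single attribute selector.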

<script type="text/javascript">
    var selector = "div.class > span:not(:first-child) a[href]";
    var chunker = /((?:\([^\)]+\)|\[[^\]]+\]|[^ ,\(\[]+)+)(\s*,\s*)?/g;
    var parts = [];
    var m, extra;
    // Reset the position of the chunker regexp (start from the beginning)
    chunker.lastIndex = 0;
    // Collect the pieces
    while ((m = chunker.exec(selector)) !== null) {
        parts.push(m[1]);
        // Stop if we've encountered a comma
        if (m[2]) {
            extra = RegExp.rightContext;
            break;
        }
    }
    assert(parts.length == 4, "Our selector is broken into 4 unique parts.");
    assert(parts[0] === "div.class", "div selector");
    assert(parts[1] === ">", "child selector");
    assert(parts[2] === "span:not(:first-child)", "span selector");
    assert(parts[3] === "a[href]", "a selector");
</script>

Obviously, the selectors supported by this code are just one small piece of a big puzzle. We need to define more parsing rules to support the various combinations of expressions a user may enter. Most CSS selector engines use a map that associates regular expressions with handler functions: when a regular expression matches part of the user's expression, the corresponding function handles that part of the selector.
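A minimal sketch of such a map is shown below. The four rules, the token shape, and the names `matchers` and `tokenize` are all hypothetical and chosen for illustration; real engines define many more rules and do real work in the handlers.

```javascript
// Each entry pairs a regular expression with the function that handles a match.
var matchers = [
  { pattern: /^#([\w-]+)/,            handle: function (m) { return { type: "id", value: m[1] }; } },
  { pattern: /^\.([\w-]+)/,           handle: function (m) { return { type: "class", value: m[1] }; } },
  { pattern: /^\[([\w-]+)\]/,         handle: function (m) { return { type: "attr", value: m[1] }; } },
  { pattern: /^([a-zA-Z][\w-]*|\*)/,  handle: function (m) { return { type: "tag", value: m[1] }; } }
];

// Repeatedly match the front of the string and dispatch to the handler.
function tokenize(part) {
  var tokens = [];
  var rest = part;
  while (rest) {
    var matched = false;
    for (var i = 0; i < matchers.length; i++) {
      var m = matchers[i].pattern.exec(rest);
      if (m) {
        tokens.push(matchers[i].handle(m));
        rest = rest.slice(m[0].length);   // consume what was matched
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new Error("Cannot parse: " + rest);
    }
  }
  return tokens;
}

var tokens = tokenize("div.class[id]");
```

For "div.class[id]" this yields a tag token, a class token, and an attribute token, in that order. Adding support for a new kind of selector then means adding one entry to the map.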

Finding elements

There are a number of ways to find the right DOM elements in a page. Which one to use depends on which selectors the browser supports.

The first is the getElementById() method. It exists only on the root node of the HTML document. It finds the first element that matches the specified ID value, so it can be used to resolve an expression such as "#id". Note that in Internet Explorer and Opera, it also matches the first element whose name attribute has that value. Therefore, when looking up by ID, we need an additional validation step to exclude elements that match only by name.

If you need to support finding all elements with a given ID (this is customary in CSS selector expressions, although HTML stipulates that an ID can belong to only one element), there are two ways to do it: the first is to iterate through all the elements and collect those whose ID matches; the second is to use document.all["id"], which returns an array containing the elements that match the ID.

Next is the getElementsByTagName() method, which does what its name says: it finds all the elements that match a given tag name. Note that it has another use: if you pass an asterisk * as the tag name, it returns all elements in the document or under a node. This trick is useful for handling selectors that are not based on a tag name, such as ".class" or "[attr]". Because ".class" does not specify a tag name, we need to list all descendant elements under a node and then check the class name of each in turn.

Using the asterisk * in Internet Explorer has a drawback: it also returns comment nodes (in IE, a comment node has a tag name of "!", so it is returned as well). We therefore need an extra filtering step to exclude comment nodes.

Next is the getElementsByName() method, which has only one function: to find all nodes that match a given name value (for example, an <input> element has a name value). This method can therefore be used to resolve expressions such as "[name=name]".

Finally, there is the getElementsByClassName() method. This method is relatively new and is being implemented by the mainstream browsers (Firefox, Safari, and Chrome). It finds elements by class name. This native browser method greatly speeds up code that looks up elements by class name.

Although there are other techniques for finding elements, the methods above are still the main tools we use. Once all the candidate elements have been found, the next step is to filter them.
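The validation step might look like the following sketch. The `getById` function is hypothetical, and so is `fakeDoc`, a plain object that imitates a browser in which the lookup returns an element matched by name; in a page you would pass the real `document` instead.

```javascript
// Look up an element by id, guarding against a match on the name attribute.
function getById(doc, id) {
  var elem = doc.getElementById(id);
  if (!elem || elem.id === id) {
    return elem;                    // nothing found, or a genuine id match
  }
  // The browser matched on name: fall back to scanning every element.
  var all = doc.getElementsByTagName("*");
  for (var i = 0; i < all.length; i++) {
    if (all[i].id === id) {
      return all[i];
    }
  }
  return null;
}

// Mock objects (assumptions of this sketch) that reproduce the quirk.
var input = { nodeName: "INPUT", name: "user", id: "" };
var div = { nodeName: "DIV", id: "user" };
var fakeDoc = {
  getElementById: function () { return input; },      // wrong element, matched by name
  getElementsByTagName: function () { return [input, div]; }
};

var found = getById(fakeDoc, "user");
```

With the mock above, `found` is the div that really carries the id, not the input that only shares the name.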
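The extra filtering step can be as small as a check on `nodeType`, which is 1 for element nodes and 8 for comment nodes. The sketch below runs on plain objects standing in for the result of `getElementsByTagName("*")`; the function name `elementsOnly` is hypothetical.

```javascript
// Keep only element nodes (nodeType 1), dropping comments (nodeType 8).
function elementsOnly(nodes) {
  var result = [];
  for (var i = 0; i < nodes.length; i++) {
    if (nodes[i].nodeType === 1) {
      result.push(nodes[i]);
    }
  }
  return result;
}

// Mock node list as old IE might return it, with a comment in the middle.
var nodes = [
  { nodeType: 1, nodeName: "DIV" },
  { nodeType: 8, nodeName: "!" },
  { nodeType: 1, nodeName: "SPAN" }
];
var filtered = elementsOnly(nodes);
```

Checking `nodeType` rather than the "!" tag name keeps the filter harmless in browsers that never return comments in the first place.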

Filtering elements

A CSS expression is usually made up of several small, separate parts. For example, the expression "div.class[id]" is made up of three parts: 1. a div element that 2. has the given class name and 3. has an attribute named id.

First we need to resolve the first part of the selector. In the expression above, the first part asks for div elements, so we naturally use the getElementsByTagName() method to find all the <div> elements on the page. Next, we filter those elements so that only the ones with the given class name and an id attribute remain.

Filtering elements is a common part of any selector engine implementation. Filters rely mainly on an element's attribute values or on its relationship to other nodes in the DOM tree.

Filter by attribute: access a DOM attribute of the element (typically with the getAttribute() method) and verify that its value equals the given value. Filtering by class name is a subset of this category (access the className property and check its value).

Filter by position: this arises in expressions that apply ":nth-child(even)" or ":last-child" relative to a parent element. Where the browser supports it, the children property returns a collection of the child elements. In addition, all browsers support childNodes, which returns a collection of child nodes that also includes text nodes and comment nodes. Using either of these, you can filter by the position of elements in the DOM tree.

The element filtering function serves two purposes: first, it can be exposed to the user to test whether an arbitrary element matches a selector; second, it is used internally to check whether an element matches the selector expression given by the user.
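Both kinds of filter can be sketched in a few lines. The code below is an illustration on plain objects rather than real DOM nodes, so that it runs anywhere; `fakeElement`, `append`, and the filter function names are all hypothetical, while `getAttribute`, `className`, `childNodes`, `parentNode`, and `nodeType` mirror the DOM members they stand in for.

```javascript
// Build a plain object that mimics the few DOM members the filters need.
function fakeElement(tag, className, attrs) {
  return {
    nodeType: 1,
    nodeName: tag.toUpperCase(),
    className: className || "",
    parentNode: null,
    childNodes: [],
    getAttribute: function (name) {
      return attrs && attrs.hasOwnProperty(name) ? attrs[name] : null;
    }
  };
}
function append(parent, child) {
  child.parentNode = parent;
  parent.childNodes.push(child);
  return child;
}

// Filter by attribute: present at all, or equal to a given value.
function hasAttribute(elem, name, value) {
  var actual = elem.getAttribute(name);
  if (actual === null) return false;
  return value === undefined || actual === value;
}
// Filter by class: pad with spaces so "nin" does not match "ninja".
function hasClass(elem, name) {
  var padded = (" " + elem.className + " ").replace(/\s+/g, " ");
  return padded.indexOf(" " + name + " ") !== -1;
}

// Filter by position: childNodes includes text and comments, so skip them.
function childElements(parent) {
  var kids = [];
  for (var i = 0; i < parent.childNodes.length; i++) {
    if (parent.childNodes[i].nodeType === 1) kids.push(parent.childNodes[i]);
  }
  return kids;
}
function isLastChild(elem) {
  var kids = childElements(elem.parentNode);
  return kids[kids.length - 1] === elem;
}
function isEvenChild(elem) {        // :nth-child(even) counts from 1
  var kids = childElements(elem.parentNode);
  for (var i = 0; i < kids.length; i++) {
    if (kids[i] === elem) return (i + 1) % 2 === 0;
  }
  return false;
}

// A list with a text node and a comment mixed in between three items.
var list = fakeElement("ul");
append(list, { nodeType: 3, nodeName: "#text" });
var first = append(list, fakeElement("li", "ninja  stealth", { id: "one" }));
append(list, { nodeType: 8, nodeName: "#comment" });
var second = append(list, fakeElement("li", "samurai"));
var third = append(list, fakeElement("li"));
```

With this tree, `first` passes the class and attribute filters, `second` is the even child, and `third` is the last child, even though the raw `childNodes` list has five entries.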

Merging elements

In the simplified top-down engine shown earlier, we saw that the selector engine needs to be able to find elements recursively (to find descendant elements) and merge all the elements that qualify, eventually returning the result set.

However, that initial implementation is too simple. Notice that it ended up finding two <span> elements in a document that contains only one. We therefore need an extra check to ensure that the final result array contains no duplicate elements. Most top-down selector engines use some method of this kind to guarantee the uniqueness of the elements.

<div id="test"><b>Hello</b>, I'm a ninja!</div>
<div id="test2"></div>
<script>
(function () {
    var run = 0;
    this.unique = function (array) {
        var ret = [];
        // Each call gets a new run number, so marks left by earlier calls do not count.
        run++;
        for (var i = 0, length = array.length; i < length; i++) {
            var elem = array[i];
            if (elem.uniqueId !== run) {
                elem.uniqueId = run;
                ret.push(array[i]);
            }
        }
        return ret;
    };
})();
window.onload = function () {
    var divs = unique(document.getElementsByTagName("div"));
    assert(divs.length === 2, "No duplicates removed.");
    var body = unique([document.body, document.body]);
    assert(body.length === 1, "Body duplicate removed.");
};
</script>

The unique() method adds an extra property to every element in the array to mark whether it has already been visited. Once all the elements have been processed, only the non-repeated elements remain. Algorithms similar to this one can be found in most CSS selector engines.

So far, we have sketched out a top-down CSS selector engine. Now, let's look at another approach.
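As a point of comparison, the same de-duplication can be done without writing a marker onto the elements. The sketch below assumes a runtime with the ES2015 `Set`, which the older browsers discussed in this article do not provide; the function name `uniqueBySet` is hypothetical, and plain objects stand in for DOM elements.

```javascript
// Remove duplicates while keeping the first occurrence of each item in place.
function uniqueBySet(array) {
  var seen = new Set();
  var ret = [];
  for (var i = 0; i < array.length; i++) {
    if (!seen.has(array[i])) {
      seen.add(array[i]);       // remember the object itself, by identity
      ret.push(array[i]);
    }
  }
  return ret;
}

var a = { nodeName: "DIV" };
var b = { nodeName: "SPAN" };
var deduped = uniqueBySet([a, b, a, b, a]);
```

The trade-off is the same as with the marker approach: one pass over the array and one membership test per element, but here nothing is left behind on the elements afterwards.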

Bottom-up implementation

If you would rather not deal with the uniqueness of elements, you can implement selector parsing in a bottom-up way. Its process runs in the opposite direction to the top-down approach (recall the top-down parsing process described above). For example, for the expression "div span", you first find all the <span> elements and then, for each candidate, check whether it has a <div> ancestor.

This approach is less popular than the top-down one. Although it handles simple CSS selector expressions well, traversing the ancestors of every candidate element is costly in time and resources.

Building a bottom-up engine is simple. First take the last part of the CSS selector expression and find the elements that match it, then filter out the non-conforming elements according to a series of filtering rules. The following code illustrates this process.

<div>
    <div>
        <span>Span</span>
    </div>
</div>
<script>
window.onload = function () {
    function find(selector, root) {
        root = root || document;
        var parts = selector.split(" "),
            query = parts[parts.length - 1],
            rest = parts.slice(0, -1).join(" ").toUpperCase(),
            elems = root.getElementsByTagName(query),
            results = [];
        for (var i = 0; i < elems.length; i++) {
            if (rest) {
                // Walk up the ancestors looking for one whose tag name matches.
                var parent = elems[i].parentNode;
                while (parent && parent.nodeName != rest) {
                    parent = parent.parentNode;
                }
                if (parent) {
                    results.push(elems[i]);
                }
            } else {
                results.push(elems[i]);
            }
        }
        return results;
    }
    var divs = find("div");
    assert(divs.length === 2, "Correct number of divs found.");
    divs = find("div", document.body);
    assert(divs.length === 2, "Correct number of divs found in body.");
    divs = find("body div");
    assert(divs.length === 2, "Correct number of divs found in body.");
    var spans = find("div span");
    assert(spans.length === 1, "No duplicate span was found.");
};
</script>

Note that the code above handles only one level of ancestor relationship. To handle several levels, the state of the current level needs to be recorded. Consider using two arrays: the first records the elements that will be returned (with entries set to undefined when they fail to match the expression), and the second records the ancestor node that currently needs to be tested for each of them.

As mentioned earlier, the additional ancestor checks in this step lead to more performance overhead. But the bottom-up approach needs no step to remove duplicate elements from the result set, so it has some advantages as well. (In the bottom-up approach each candidate element is distinct from the start, whereas in the top-down approach the subtrees may overlap during recursion, so the results can contain duplicates.)
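The two-array idea can be sketched as follows. This is one possible reading of it, restricted to tag names and the descendant combinator, and it runs on plain objects that mimic `nodeName` and `parentNode`; the names `node` and `filterByAncestors` are hypothetical.

```javascript
// Plain-object stand-in for a DOM node (an assumption of this sketch).
function node(tag, parent) {
  return { nodeName: tag.toUpperCase(), parentNode: parent || null };
}

// candidates: elements matching the last part of the selector.
// ancestorTags: the remaining parts, left to right, e.g. ["div", "p"].
function filterByAncestors(candidates, ancestorTags) {
  // First array: the elements to return; failures become undefined.
  var results = candidates.slice();
  // Second array: for each candidate, the ancestor to test next.
  var current = [];
  for (var c = 0; c < candidates.length; c++) {
    current.push(candidates[c].parentNode);
  }
  // Consume the selector parts from right to left.
  for (var p = ancestorTags.length - 1; p >= 0; p--) {
    var tag = ancestorTags[p].toUpperCase();
    for (var i = 0; i < results.length; i++) {
      if (results[i] === undefined) continue;
      var ancestor = current[i];
      while (ancestor && ancestor.nodeName !== tag) {
        ancestor = ancestor.parentNode;
      }
      if (ancestor) {
        current[i] = ancestor.parentNode;   // the next part must match above here
      } else {
        results[i] = undefined;             // no matching ancestor: drop it
      }
    }
  }
  var kept = [];
  for (var k = 0; k < results.length; k++) {
    if (results[k] !== undefined) kept.push(results[k]);
  }
  return kept;
}

// body > div > p > span (deep) and body > span (shallow)
var body = node("body");
var div = node("div", body);
var para = node("p", div);
var deepSpan = node("span", para);
var shallowSpan = node("span", body);

var matched = filterByAncestors([deepSpan, shallowSpan], ["div", "p"]);
```

For the selector "div p span", only the deep span survives, and because every candidate appears once in the input, the output needs no de-duplication pass.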

Summary

A CSS selector engine implemented in JavaScript is a powerful tool. It lets us easily use a variety of selector syntaxes to find DOM elements on a page. While there are many details to consider when fully implementing a selector engine, the situation has improved greatly thanks to the native methods that browsers now provide.

Take a look back at the points discussed in this article:

    • Modern browsers have begun to implement the standard Selectors API, but there is still a long way to go.
    • Given the performance issues, it is still necessary to implement our own selector engine.
    • To create a selector engine, we can:
        • Use the standard Selectors API
        • Use XPath
        • Use pure DOM operations, for best performance
    • The top-down approach is very popular, but it requires some cleanup, such as ensuring that the returned elements are not duplicated.
    • The bottom-up approach avoids that cleanup effort, but it can lead to more performance overhead.

As browsers gradually adopt the standard Selectors API, the details of engine implementation may become a thing of the past. But for many developers, that day may not come soon.
