Personal summary: This article takes about 15 minutes to read. It introduces the abstract syntax tree (AST) and how JavaScript engines parse source code into one. It also covers lazy parsing: while building the AST, the engine does not parse a function body right away, and only does the full parse when that function actually needs to run, because some functions are declared but never called.
Parsing, abstract syntax trees, and 5 tips for minimizing parsing time
This is the 14th chapter of the How JavaScript Works series.
Overview
We all know that running a large amount of JavaScript can hurt performance badly. The code has to be transferred over the network, parsed, compiled into bytecode, and only then executed. Previous articles covered topics such as the JS engine, the runtime and the call stack, and V8, the engine used by Google Chrome and Node.js. All of them play an important role in how JavaScript runs.
Today's topic is just as important: how most JavaScript engines turn text into code the machine understands, what happens after that conversion, and how developers can put this knowledge to use.
Programming language principles
First, let's review how programming languages work. Whatever language you use, some piece of software has to process the source code so that the computer can understand it. That software is either an interpreter or a compiler. Whether you use an interpreted language (JavaScript, Python, Ruby) or a compiled one (C#, Java, Rust), one step is always the same: parsing the source code, which is plain text, into a data structure called an abstract syntax tree (AST). The AST not only represents the source code in a structured way, it also plays a key role in semantic analysis, where the compiler checks that the program and its language elements are used correctly. After that, the AST is used to generate the actual bytecode or machine code.
AST applications
ASTs are not only used in interpreters and compilers. They have many other applications in the computing world. One of the most common is static code analysis. Static analyzers do not run the code they are given, but they still need to understand its structure. For example, you might want to build a tool that finds common code structures so that you can refactor them and reduce duplication. You could try to do this with string comparison, but such a tool would be quite basic and limited. If you are interested in building a tool like this, you do not have to write your own parser: there are many open source parsers that are fully compatible with the ECMAScript specification. Esprima and Acorn are two well-established ones. There are also many tools that work with the parser's output, namely the AST.
ASTs are also widely used in code transpilers. For example, you might want to implement a transpiler that converts Python code to JavaScript. The general idea is to use a Python parser to generate an AST, and then use that AST to generate JavaScript code.
That may sound hard to believe. The fact is that an AST is just a different representation of a program. Before parsing, the program is text that follows the grammar rules of the language. After parsing, it is a tree structure containing almost exactly the same information as the input text. That is why the reverse step is also possible: going from the tree back to text.
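To make the round trip from tree to text concrete, here is a minimal sketch. The node shapes use ESTree-style names, and `generate` is a hypothetical helper written for this illustration, not part of Esprima or Acorn. It only handles the three node types used in the example.

```javascript
// A hand-written AST for the expression: x + 10 * y
const ast = {
  type: 'BinaryExpression',
  operator: '+',
  left: { type: 'Identifier', name: 'x' },
  right: {
    type: 'BinaryExpression',
    operator: '*',
    left: { type: 'Literal', value: 10 },
    right: { type: 'Identifier', name: 'y' },
  },
};

// Hypothetical mini code generator: walks the tree and emits source text.
function generate(node) {
  switch (node.type) {
    case 'Identifier':
      return node.name;
    case 'Literal':
      return String(node.value);
    case 'BinaryExpression':
      // Parenthesize every operation so precedence survives the round trip.
      return '(' + generate(node.left) + ' ' + node.operator + ' ' + generate(node.right) + ')';
    default:
      throw new Error('Unsupported node type: ' + node.type);
  }
}

console.log(generate(ast)); // prints "(x + (10 * y))"
```

A real transpiler follows the same shape, with one case per node type of the source language and an emitter for the target language.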
JavaScript parsing
Let's look at the structure of the AST. Take the following simple JavaScript function as an example:
function foo(x) {
    if (x > 10) {
        var a = 2;
        return a * x;
    }
    return x + 10;
}
The parser will produce the following AST.
Note that this is a simplified version of what the parser outputs. The actual AST is more complex. The point here is to get a feel for the first thing that happens to source code before it runs. To see a real AST, visit AST Explorer, an online tool where you write JavaScript and the site shows the AST for that code.
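As a rough sketch, the tree for foo could be written as a plain object like the one below. The node names follow the ESTree convention, many fields a real parser emits (positions, ranges, flags) are left out, and `nodeTypes` is a hypothetical helper that shows the kind of question a static analysis tool asks of the tree.

```javascript
// Simplified, ESTree-style tree for the foo function above (many fields omitted).
const fooAst = {
  type: 'FunctionDeclaration',
  id: { type: 'Identifier', name: 'foo' },
  params: [{ type: 'Identifier', name: 'x' }],
  body: {
    type: 'BlockStatement',
    body: [
      {
        type: 'IfStatement',
        test: {
          type: 'BinaryExpression', operator: '>',
          left: { type: 'Identifier', name: 'x' },
          right: { type: 'Literal', value: 10 },
        },
        consequent: {
          type: 'BlockStatement',
          body: [
            {
              type: 'VariableDeclaration', kind: 'var',
              declarations: [{
                type: 'VariableDeclarator',
                id: { type: 'Identifier', name: 'a' },
                init: { type: 'Literal', value: 2 },
              }],
            },
            {
              type: 'ReturnStatement',
              argument: {
                type: 'BinaryExpression', operator: '*',
                left: { type: 'Identifier', name: 'a' },
                right: { type: 'Identifier', name: 'x' },
              },
            },
          ],
        },
        alternate: null,
      },
      {
        type: 'ReturnStatement',
        argument: {
          type: 'BinaryExpression', operator: '+',
          left: { type: 'Identifier', name: 'x' },
          right: { type: 'Literal', value: 10 },
        },
      },
    ],
  },
};

// Hypothetical helper: collect the type of every node, depth first.
function nodeTypes(node, out = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => nodeTypes(child, out));
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') out.push(node.type);
    Object.values(node).forEach((child) => nodeTypes(child, out));
  }
  return out;
}

const returnCount = nodeTypes(fooAst).filter((t) => t === 'ReturnStatement').length;
console.log(returnCount); // prints 2
```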
You may ask why you need to learn how the JavaScript parser works. After all, running JavaScript is the browser's job. You are partly right. The table shows how much time the different phases of JavaScript execution take. Look closely and you may spot something interesting.
Did you spot it? Typically, browsers spend about 15% to 20% of total execution time parsing JavaScript. These numbers are not made up: they come from real-world applications and websites that use JavaScript in one way or another. 15% may not sound like much, but believe me, it is a lot. A typical single-page application loads about 0.4 MB of JavaScript, and the browser needs about 370 ms to parse it. You might again say that this is not much on its own. But keep in mind that this is only the time needed to turn the JavaScript code into ASTs. It does not include execution itself, or anything else that happens during page load, such as CSS and HTML rendering. And this is only on desktop. On mobile, things get worse: parsing typically takes 2 to 5 times as long on a phone as on a desktop.
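The figures above can be turned into a back-of-the-envelope estimate. The sketch below assumes parse time grows linearly with bundle size, which is a simplification, and `estimateParseMs` is a hypothetical helper, not a measurement tool.

```javascript
// Figures quoted in the text: about 0.4 MB of JavaScript parsed in about 370 ms on desktop.
const desktopMsPerMb = 370 / 0.4; // roughly 925 ms per MB

// Hypothetical helper: estimated parse time in ms for a bundle of the given size.
// slowdown = 1 for desktop; the text quotes 2 to 5 for mobile.
function estimateParseMs(sizeMb, slowdown = 1) {
  return Math.round(sizeMb * desktopMsPerMb * slowdown);
}

console.log(estimateParseMs(0.4));    // 370
console.log(estimateParseMs(0.4, 2)); // 740
console.log(estimateParseMs(0.4, 5)); // 1850
```

In other words, the same 0.4 MB bundle could keep a phone busy parsing for somewhere between 0.7 and 1.9 seconds before any of the code runs.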
The table shows the time different mobile and desktop browsers take to parse 1 MB of JavaScript code.
Moreover, web apps keep getting more complex as more and more business logic moves to the front end in pursuit of a more native-like user experience. Web apps are getting heavier, and quickly. It is easy to see how much this affects your own app: open the browser's developer tools and measure the time spent on parsing, compiling, and everything else the browser does until the page is fully loaded.
Unfortunately, mobile browsers have no developer tools for this kind of performance measurement. Not to worry, though: there is DeviceTiming. It helps measure script parsing and execution times in a controlled environment. It wraps local scripts in instrumentation code, so that every time the page is loaded from a different device, parsing and execution times can be measured locally.
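The idea of wrapping a script in timing code can be sketched in a few lines. This is not DeviceTiming's API: `timeScript` is a hypothetical helper, and `parseMs` is only a rough proxy, since engines may defer part of the parsing work until the function is first called.

```javascript
// Hypothetical helper: times compilation and execution of a script body separately.
function timeScript(source) {
  const t0 = performance.now();
  const fn = new Function(source); // the engine parses the source text here
  const t1 = performance.now();
  const result = fn();             // the compiled code runs here
  const t2 = performance.now();
  return { parseMs: t1 - t0, runMs: t2 - t1, result };
}

const timing = timeScript('var total = 0; for (var i = 0; i < 1000; i++) total += i; return total;');
console.log(timing.result); // 499500
console.log(timing.parseMs.toFixed(3), timing.runMs.toFixed(3));
```

The absolute numbers vary from run to run and from device to device, which is exactly why measuring on real devices matters.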
The good news is that JavaScript engines do a lot to avoid redundant work and be more efficient. Here are some of the techniques used by the major browsers.
V8, for example, implements script streaming and code caching. With script streaming, async and deferred scripts are parsed on a separate thread as soon as the download starts. This means parsing is finished almost as soon as the download completes. Pages load about 10% faster as a result.
JavaScript code is usually compiled into bytecode on every page visit. However, that bytecode is thrown away when the user navigates to another page, because compiled code depends heavily on the state and context of the machine at compile time. Chrome 42 introduced bytecode caching to address this. The technique stores compiled code locally, so that when the user returns to the same page, downloading, parsing, and compiling can all be skipped. This saves Chrome about 40% of parsing and compile time. It also saves battery on mobile devices.
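Node.js exposes V8's code cache through its built-in vm module, which makes the idea easy to try outside the browser. The sketch below is an illustration of the concept, not a description of how Chrome stores its cache. The two "visits" are simulated in a single process.

```javascript
const vm = require('vm');

const source = 'function add(a, b) { return a + b; } add(2, 3);';

// First "visit": compile normally, run, then ask V8 for a code cache.
const first = new vm.Script(source);
const firstResult = first.runInThisContext();
const cache = first.createCachedData(); // a Buffer holding V8's serialized output

// Later "visit": hand the cache back so V8 can skip most of the compile work.
const second = new vm.Script(source, { cachedData: cache });
const secondResult = second.runInThisContext();

console.log(firstResult, secondResult); // 5 5
console.log(second.cachedDataRejected); // false when V8 accepted the cache
```

V8 rejects the cache when the source text or the engine version no longer matches, in which case it falls back to a normal compile.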
In Opera, the Carakan engine can reuse the output of a recently compiled program. The code does not even have to come from the same page or the same domain. This caching technique is very effective and can skip the compilation step entirely. It relies on typical user behavior and browsing scenarios: whenever a user follows a particular path through an app or site, the same JavaScript code gets loaded. Carakan, however, has long since been replaced by Google's V8.
The SpiderMonkey engine used by Firefox does not cache everything. It can switch to a monitoring stage where it counts how many times a script is executed. Based on that count, it works out which parts of the code are hot and worth optimizing.
Some vendors choose to do nothing at all. Maciej Stachowiak, a lead developer of Safari, has said that Safari does not cache compiled bytecode. Caching was considered but not implemented, because code generation takes less than 2% of total execution time.
These optimizations do not directly affect the time it takes to parse JavaScript source. Instead, they avoid parsing altogether whenever possible. After all, nothing beats not doing the work at all.
There are many ways to reduce the initial load time of an app. Minimize the amount of JavaScript you ship: less code means less to parse and less to run. To do that, deliver only the code a given route needs instead of loading one big bundle. The PRPL pattern, for example, describes this style of code delivery. Alternatively, audit your dependencies and look for redundant ones that do nothing but bloat the code base. These topics deserve an article of their own, though.
The goal of this article is to show how developers can help the JavaScript parser do its job faster. Modern JavaScript parsers use heuristics to decide whether a given piece of code will run immediately or at some later point. Based on these heuristics, the parser does either an eager (immediate) parse or a lazy parse. Eager parsing handles functions that need to be compiled right away. It does three things: it builds the AST, builds the scope hierarchy, and finds all syntax errors. Lazy parsing handles functions that do not need to be compiled yet. It does not build an AST and does not find every syntax error. It only builds the scope hierarchy, which saves about half the time compared with eager parsing.
This is not a new concept. Even a browser as old as IE9 supports this optimization, although in a rudimentary form compared with how modern parsers work.
Here is an example. Suppose you have the following code snippet:
function foo() {
    function bar(x) {
        return x + 10;
    }
    function baz(x, y) {
        return x + y;
    }
    console.log(baz(100, 200));
}
As in the previous example, the code is fed to the parser, which outputs an AST. The result is along these lines:
- Declaration of the function bar, which accepts one argument (x). It has one return statement, which returns the result of adding x and 10.
- Declaration of the function baz, which accepts two arguments (x and y). It has one return statement, which returns the result of adding x and y.
- A call to baz with the arguments 100 and 200.
- A call to console.log with the return value of the previous call as its argument.
So what happened here? The parser saw the declaration of bar, the declaration of baz, a call to baz, and a call to console.log. However, it also did extra work that is completely irrelevant: it parsed the body of bar. Why is that irrelevant? Because bar is never called (or at least not at that point in time). This is a simple and somewhat contrived example, but in real-world apps many declared functions are never called.
Instead of parsing bar, the parser can simply note that it is declared without recording what it does. The real parsing happens only when needed, just before the function runs. Lazy parsing still has to find the whole function body and register the declaration, but that is all. It does not need a syntax tree, because the body is not going to be processed yet. It also does not allocate memory from the heap, which usually takes up a fair share of system resources. In short, skipping these steps gives a big performance boost.
So in the previous example, the parser will actually do something like the following:
Note that the declaration of bar is only acknowledged. The parser does not go into its body. In this case the body is a single return statement. In most real-world programs, however, a function body can be much larger, with multiple return statements, conditionals, loops, variable declarations, and even nested function declarations. Since the function is never called, parsing all of that would be a waste of time and system resources.
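The skipping step can be sketched with a toy pre-parser. It is a deliberate simplification and not how any real engine is implemented: `preparse` is a hypothetical helper that finds each function declaration, notes where its body starts and ends by counting braces, and builds no tree for the body. It assumes braces never appear inside strings, comments, or regular expressions. The input is the body of foo from the example.

```javascript
// Toy pre-parser: records where each function body starts and ends, nothing more.
function preparse(source) {
  const declarations = [];
  const header = /function\s+(\w+)\s*\([^)]*\)\s*\{/g;
  let match;
  while ((match = header.exec(source)) !== null) {
    const bodyStart = header.lastIndex;      // first character after "{"
    let depth = 1;
    let i = bodyStart;
    while (i < source.length && depth > 0) { // skip ahead to the matching "}"
      if (source[i] === '{') depth++;
      else if (source[i] === '}') depth--;
      i++;
    }
    declarations.push({ name: match[1], bodyStart, bodyEnd: i - 1 });
    header.lastIndex = i;                    // do not look inside the skipped body
  }
  return declarations;
}

const source = `
function bar(x) {
  return x + 10;
}
function baz(x, y) {
  return x + y;
}
console.log(baz(100, 200));
`;

const declarations = preparse(source);
console.log(declarations.map((d) => d.name)); // [ 'bar', 'baz' ]

// Only when a function is about to run is its body looked at in full.
const baz = declarations.find((d) => d.name === 'baz');
console.log(source.slice(baz.bodyStart, baz.bodyEnd).trim()); // return x + y;
```

The body of bar is never inspected beyond finding its closing brace, which is the saving the text describes.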
The concept is fairly simple, but implementing it is far from easy. The example above is only one case. The same approach applies to functions, loops, conditionals, objects, and so on: basically everything that needs to be parsed.
For example, the following is a fairly common pattern for implementing JavaScript modules.
var myModule = (function() {
    // the logic of the entire module
    // return the module object
})();
Most modern JavaScript parsers recognize this pattern and take it as a signal that the code inside should be parsed eagerly.
So why don't parsers parse everything lazily? If code that must run immediately is parsed lazily, it actually gets slower: the parser does one lazy parse and then an eager parse right after it. That is about 50% slower than parsing eagerly in the first place.
Now that we have a general idea of what happens inside the parser, it is time to think about how to help it. We can write code in such a way that functions get parsed at the right time. There is one pattern that most parsers recognize: wrapping a function in parentheses. This tells the parser that the function will be needed right away. If the parser sees an opening parenthesis followed immediately by a function declaration, it parses the function eagerly. We can help the parser by explicitly marking the functions that run immediately.
Suppose there is a function named foo:
function foo(x) { return x * 10; }
Since nothing obviously indicates that the function will run right away, the browser parses it lazily. Suppose, however, that we know it will run right away. We can fix this in two steps.
First, store the function in a variable:
var foo = function foo(x) { return x * 10;};
Note that the function name is kept between the function keyword and the opening parenthesis of the argument list. This is not required, but it is recommended: when an exception is thrown, the stack trace will contain the actual function name rather than an anonymous placeholder.
The parser will still parse the function lazily. One small change solves this: wrap the function in parentheses.
var foo = (function foo(x) { return x * 10; });
The parser now sees the opening parenthesis right before the function keyword and parses the function eagerly.
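The check described above can be sketched as a toy function. `looksEager` is a hypothetical name, and the sketch works on raw text. Real parsers work on tokens and combine this signal with others.

```javascript
// Toy version of the heuristic: is the first "function" keyword directly preceded by "("?
function looksEager(source) {
  const index = source.indexOf('function');
  if (index === -1) return false;
  const before = source.slice(0, index).trimEnd();
  return before.endsWith('(');
}

console.log(looksEager('function foo(x) { return x * 10; }'));              // false
console.log(looksEager('var foo = function foo(x) { return x * 10; };'));   // false
console.log(looksEager('var foo = (function foo(x) { return x * 10; });')); // true
```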
Doing this by hand is hard to manage, because you need to know in which cases the parser will parse lazily and in which eagerly. Developers would also have to spend time deciding whether each function should be parsed eagerly. Nobody wants to go to that trouble. And last but not least, it makes the code harder to read and understand. A tool like Optimize.js can take care of this for us. Its only purpose is to optimize the initial load time of JavaScript source code. It runs static analysis on the code and wraps the functions that need to run immediately in parentheses, so that the browser parses them eagerly and has them ready to run.
So we write code as usual, and somewhere there is a small snippet like this:
(function() { console.log('Hello, World!'); })();
Everything looks fine: the code works as expected, and it is fast because an opening parenthesis comes before the function declaration. Of course, the code has to be minified before going to production. Here is the minifier's output:
!function(){console.log('Hello, World!')}();
It still looks fine, and the code runs as expected. But something is missing. The minifier removed the parentheses wrapping the function and put a single exclamation mark in front of it instead. This means the parser will skip the function and parse it lazily. On top of that, in order to run the function it has to parse it eagerly right after the lazy parse. All of this makes the code slower. Fortunately, Optimize.js handles exactly this case. Passing the minified code through Optimize.js produces the following output:
!(function(){console.log('Hello, World!')})();
Now we get the best of both worlds: the code is minified, and the parser correctly identifies which functions to parse lazily and which to parse eagerly.
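The rewrite itself is mechanical, as the toy below shows. It is not how Optimize.js is implemented: `wrapBangIife` is a hypothetical helper that handles only the first `!function(){...}()` it finds, and it assumes braces never appear inside strings, comments, or regular expressions.

```javascript
// Toy rewrite: turns "!function(){...}()" into "!(function(){...})()".
function wrapBangIife(source) {
  const start = source.indexOf('!function');
  if (start === -1) return source;
  const open = source.indexOf('{', start);
  if (open === -1) return source;
  let depth = 0;
  let close = -1;
  for (let i = open; i < source.length; i++) {
    if (source[i] === '{') depth++;
    else if (source[i] === '}' && --depth === 0) { close = i; break; }
  }
  // Only rewrite when the function is actually invoked right after its body.
  if (close === -1 || source[close + 1] !== '(') return source;
  const fnStart = start + 1; // position of the "function" keyword
  return source.slice(0, fnStart) + '(' + source.slice(fnStart, close + 1) + ')' + source.slice(close + 1);
}

console.log(wrapBangIife("!function(){console.log('Hello, World!')}();"));
// prints: !(function(){console.log('Hello, World!')})();
```

A production tool needs a real parser to decide which functions to wrap, since text matching breaks as soon as braces show up in strings or comments.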
Pre-compilation
But why not do this work on the server side? After all, it is much better to do it once and serve the result to the client than to force every client to repeat the same work every time. There is an ongoing discussion about whether engines should offer a way to execute precompiled scripts, so that this time is not spent in the browser. In essence, the idea is to have a server-side tool generate bytecode, so that only the bytecode has to be transferred and executed on the client. Startup time would then look very different. It sounds tempting, but it is not that simple. It could even be counterproductive, because the bytecode would be larger and would most likely need to be signed and processed for security reasons. The V8 team, for example, has worked internally on avoiding repeated parsing, so precompilation might not be that beneficial after all.
Tips for speeding up your web app
- Check your dependencies and get rid of the ones you do not need.
- Split your code into smaller chunks instead of shipping one big bundle, for example with webpack's code splitting.
- Defer loading JavaScript whenever possible. Load only the code required by the current route, or import a module only when the user clicks an element that needs it.
- Use developer tools and DeviceTiming to find performance bottlenecks.
- Use a tool like Optimize.js to help the parser choose between eager and lazy parsing.
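The deferred-loading tip can be sketched as follows. `lazy` is a hypothetical helper, and the loader below is a stand-in for a real dynamic import such as `import('./chart.js')`. The module shape is made up for the example.

```javascript
// Hypothetical helper: runs the loader on first use only and reuses the same promise afterwards.
function lazy(loader) {
  let pending = null;
  return function load() {
    if (pending === null) pending = loader();
    return pending;
  };
}

// Stand-in for a dynamic import; counts how often the "module" is actually fetched.
let loads = 0;
const loadChart = lazy(() => {
  loads++;
  return Promise.resolve({ render: (data) => 'chart with ' + data.length + ' points' });
});

// In a page this would be wired to an event, for example:
// button.addEventListener('click', () => loadChart().then((chart) => chart.render(data)));
loadChart();
loadChart().then((chart) => console.log(chart.render([1, 2, 3]), loads)); // chart with 3 points 1
```

Until the first call, the module's code is neither downloaded nor parsed, so it costs nothing at startup.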
Addendum
Sometimes, especially in mobile browsers, a page is restored from the browser's cache, for example when you tap the back or forward button. In some scenarios you may not want this behavior. One way to detect it is the following:
window.addEventListener('pageshow', (event) => {
    // Check the back/forward cache: was the page loaded from cache?
    if (event.persisted || (window.performance &&
        window.performance.navigation.type === 2)) {
        // Run whatever logic the page needs in that case
    }
});