[WebKit] JavaScriptCore Analysis - Basics (2): Interpreter Basics and JSC Core Components

Last Update: 2018-12-03 Source: Internet

Author: User



This article mainly describes the basic working process of the interpreter and the implementation of the core components of JSC.

Processing a language works much like everyday human communication: receiving information involves two steps, first understanding it and then acting on it. Understanding is the process of parsing the language; acting is carrying out the behavior that the parse result calls for. In computing, understanding means compiling or interpreting. This stage has been studied thoroughly and is well supported by tools. Execution, which varies far more, is where performance optimization concentrates. Let's look at how JSC understands and executes JavaScript.

Interpreter Working Process

The basic workflow of JavaScriptCore is as follows.

An interpreter must first define the language it supports; JSC supports the ECMAScript (ECMA-262) specification.

Lexical analysis and syntax analysis make up the understanding step: the input text is converted into a form whose meaning can be processed (an abstract syntax tree, or AST), or into intermediate code (bytecode) for later use.

The interpreter executes the output of parsing. Because execution is where optimization matters most, a JIT (just-in-time compiler) is used to improve execution efficiency. According to available material, V8 goes further and optimizes the parser output so that it skips bytecode altogether and executes directly from the AST.

The best-known tools for lexical and syntax analysis are lex/yacc and their successors flex/Bison (see the Lex & Yacc page). They provide language and text parsing for a great deal of software and are both powerful and interesting. JavaScriptCore does not use them and implements its own lexer and parser instead, but the basic ideas are similar.

Lexical analysis (the lexer) is essentially a scanner. Following the language definition, it breaks the source text into tokens that the grammar can recognize, such as keywords, operators, and constants. With tools such as lex, these token rules are written in a rules file.

Syntax analysis (the parser) identifies the intended meaning (the operation to perform) from the grammar, that is, from the order in which tokens are combined.

For example:

i = 3;

The lexer may recognize this as the following tokens:

VARIABLE EQUAL CONSTANT END

After parsing, this is recognized as an assignment operation that assigns the constant 3 to the variable i. The corresponding operation is then invoked to execute it.
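To make the two steps concrete, here is a minimal Python sketch of a hand-written lexer plus a one-rule parser for statements of the form i = 3;. The token names VARIABLE, EQUAL, CONSTANT, and END mirror the example above; the functions tokenize, parse_assignment, and execute are hypothetical illustrations, not JSC code.

```python
import re

# Token rules: (token name, regular expression), tried in order at each position.
TOKEN_RULES = [
    ("CONSTANT", r"\d+"),
    ("VARIABLE", r"[A-Za-z_]\w*"),
    ("EQUAL", r"="),
    ("END", r";"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES))


def tokenize(source):
    """Lexer: scan the source text into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "SKIP":  # whitespace is discarded
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse_assignment(tokens):
    """Parser: recognize VARIABLE EQUAL CONSTANT END as an assignment."""
    kinds = [kind for kind, _ in tokens]
    if kinds != ["VARIABLE", "EQUAL", "CONSTANT", "END"]:
        raise SyntaxError(f"not an assignment: {kinds}")
    return ("assign", tokens[0][1], int(tokens[2][1]))


def execute(operation, env):
    """Carry out the parsed operation against a variable environment."""
    op, name, value = operation
    if op == "assign":
        env[name] = value


env = {}
tokens = tokenize("i = 3;")
print([kind for kind, _ in tokens])  # ['VARIABLE', 'EQUAL', 'CONSTANT', 'END']
execute(parse_assignment(tokens), env)
print(env)                           # {'i': 3}
```

A real lexer handles many more token kinds, but the structure is the same: the lexer only classifies characters, and the parser decides what a sequence of tokens means.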

If you are unfamiliar with lexers and parsers, plenty of material is available; a basic getting-started guide is "Yacc and Lex Quick Start".

The interpreter and the JIT are described in part 3 of this series.

Basic execution environment (register-based VM)

The code JSC produces from parsing is executed on a virtual machine (broadly speaking, the main body of JSC is a virtual machine). JSC uses a register-based VM, as opposed to a stack-based VM. The difference between the two can be understood simply as how the instruction set passes operands: through registers or through a stack.

A register-based VM is generally more efficient than a stack-based one because it avoids frequent pushes and pops and supports three-operand instructions.

In a three-operand instruction such as add,

add dst, src1, src2

src1 and src2 are added and the result is stored in dst; dst, src1, and src2 are all registers.

To compare with the example in the book Understanding the Java Virtual Machine, which computes (a + b) * c with a = 100, b = 200, and c = 300, we had JSC dump its bytecode for the same computation:

[   0] enter
[   1] mov               r0, Cell: 0133FC40(@k0)
[   4] put_by_id         r0, a(@id0), Int32: 100(@k1)
[  13] mov               r0, Cell: 0133FC40(@k0)
[  16] put_by_id         r0, b(@id1), Int32: 200(@k2)
[  25] mov               r0, Cell: 0133FC40(@k0)
[  28] put_by_id         r0, c(@id2), Int32: 300(@k3)
[  37] resolve_global    r0, a(@id0)
[  43] resolve_global    r1, b(@id1)
[  49] add               r0, r0, r1
[  54] resolve_global    r1, c(@id2)
[  60] mul               r0, r0, r1
[  65] ret               r0

* Reference: JSC bytecode specification (the WebKit documentation is not kept up to date, so use it only as a reference; the source code is authoritative).
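A register-based interpreter for this kind of bytecode fits in a few lines. The Python sketch below is a toy model under simplifying assumptions: the opcode names resolve_global, add, mul, and ret follow the listing above, but put_global is a hypothetical stand-in for the mov/put_by_id pair, and operands are plain Python values rather than JSC's encoded constants and identifiers.

```python
def run_register_vm(program, num_registers=2):
    """Execute (opcode, *operands) tuples on a register machine.

    Arithmetic instructions name the destination and both sources
    explicitly, e.g. ("add", 0, 0, 1) means r0 = r0 + r1.
    """
    registers = [None] * num_registers
    global_object = {}
    for opcode, *operands in program:
        if opcode == "put_global":          # hypothetical: global[name] = constant
            name, value = operands
            global_object[name] = value
        elif opcode == "resolve_global":    # r[dst] = global[name]
            dst, name = operands
            registers[dst] = global_object[name]
        elif opcode == "add":               # r[dst] = r[src1] + r[src2]
            dst, src1, src2 = operands
            registers[dst] = registers[src1] + registers[src2]
        elif opcode == "mul":               # r[dst] = r[src1] * r[src2]
            dst, src1, src2 = operands
            registers[dst] = registers[src1] * registers[src2]
        elif opcode == "ret":               # return r[src]
            (src,) = operands
            return registers[src]
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return None


program = [
    ("put_global", "a", 100),
    ("put_global", "b", 200),
    ("put_global", "c", 300),
    ("resolve_global", 0, "a"),
    ("resolve_global", 1, "b"),
    ("add", 0, 0, 1),
    ("resolve_global", 1, "c"),
    ("mul", 0, 0, 1),
    ("ret", 0),
]
print(run_register_vm(program))  # 90000
```

Each add or mul reads both operands and writes its result in a single instruction, with no separate push or pop steps.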

The bytecode that the stack-based JVM generates for the same computation is as follows:

0:  bipush 100
2:  istore_1
3:  sipush 200
6:  istore_2
7:  sipush 300
10: istore_3
11: iload_1
12: iload_2
13: iadd
14: iload_3
15: imul
16: ireturn

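Comparing the two listings makes the difference clear: the register-based JSC code names its operands directly (add r0, r0, r1), while the stack-based JVM code must first push each operand onto the operand stack (iload_1, iload_2) before iadd or imul can pop them.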
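The stack-based listing above can also be traced with a small interpreter. This Python sketch executes the JVM instructions it uses (bipush, sipush, istore_n, iload_n, iadd, imul, ireturn) on an operand stack plus a local-variable array, and counts every push and pop to make the stack traffic visible. It ignores bytecode offsets and 32-bit overflow, so treat it as a toy model rather than a JVM.

```python
def run_stack_vm(program, num_locals=4):
    """Execute JVM-style instructions on an operand stack.

    Returns (result, stack_operations), where stack_operations counts
    every push and pop performed on the operand stack.
    """
    stack = []
    local_vars = [0] * num_locals
    ops = 0

    def push(value):
        nonlocal ops
        ops += 1
        stack.append(value)

    def pop():
        nonlocal ops
        ops += 1
        return stack.pop()

    for instruction in program:
        opcode, *operands = instruction.split()
        if opcode in ("bipush", "sipush"):      # push an immediate constant
            push(int(operands[0]))
        elif opcode.startswith("istore_"):      # pop into local variable n
            local_vars[int(opcode[-1])] = pop()
        elif opcode.startswith("iload_"):       # push local variable n
            push(local_vars[int(opcode[-1])])
        elif opcode == "iadd":                  # pop two, push their sum
            right, left = pop(), pop()
            push(left + right)
        elif opcode == "imul":                  # pop two, push their product
            right, left = pop(), pop()
            push(left * right)
        elif opcode == "ireturn":               # pop the return value
            return pop(), ops
        else:
            raise ValueError(f"unknown opcode {opcode}")
    raise ValueError("program ended without ireturn")


program = [
    "bipush 100", "istore_1",
    "sipush 200", "istore_2",
    "sipush 300", "istore_3",
    "iload_1", "iload_2", "iadd",
    "iload_3", "imul",
    "ireturn",
]
result, stack_ops = run_stack_vm(program)
print(result)     # 90000
print(stack_ops)  # 16
```

Every iadd and imul here costs two pops and a push, and each operand must first be loaded with a separate iload; that per-operand traffic is what the three-operand register form avoids.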

Core Components

* This part is largely a translation of the first half of the JavaScriptCore description on the WebKit website.

JavaScriptCore is an optimizing virtual machine. It consists of the following modules: the lexer, the parser, a start-up interpreter (LLInt), the Baseline JIT, and an optimizing JIT (DFG).

Lexer: performs lexical analysis, breaking the script source into a series of tokens. JavaScriptCore's lexer is hand-written, and most of the code is in parser/Lexer.h and parser/Lexer.cpp.

Parser: performs syntax analysis, consuming the tokens from the lexer and building the corresponding syntax tree. JavaScriptCore uses a hand-written recursive descent parser; the code is in parser/JSParser.h and parser/JSParser.cpp.
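Recursive descent means one function per grammar rule, with each function calling the functions for the rules nested inside it. The following Python sketch parses arithmetic expressions such as (a + b) * c into a small syntax tree. The three-rule grammar, the Parser class, and the tuple-based AST are illustrative assumptions; JSC's actual parser is written in C++ and handles the full ECMAScript grammar.

```python
import re

TOKEN_PATTERN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(source):
    """Split source into number, identifier, and operator tokens."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_PATTERN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """One method per grammar rule; each returns an AST node (nested tuples)."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expect(self, token):
        if self.advance() != token:
            raise SyntaxError(f"expected {token!r}")

    def parse(self):
        node = self.expression()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    # expression := term (("+" | "-") term)*
    def expression(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())  # loop gives left associativity
        return node

    # term := factor (("*" | "/") factor)*
    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    # factor := NUMBER | IDENTIFIER | "(" expression ")"
    def factor(self):
        token = self.advance()
        if token is None:
            raise SyntaxError("unexpected end of input")
        if token == "(":
            node = self.expression()  # recursion into the outermost rule
            self.expect(")")
            return node
        if token.isdigit():
            return ("num", int(token))
        if token[0].isalpha() or token[0] == "_":
            return ("var", token)
        raise SyntaxError(f"unexpected token {token!r}")


print(Parser(tokenize("(a + b) * c")).parse())
# ('*', ('+', ('var', 'a'), ('var', 'b')), ('var', 'c'))
```

Operator precedence falls out of the rule nesting: term sits below expression, so * binds tighter than + without any precedence table.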

LLInt: short for Low Level Interpreter, it executes the bytecode produced by the parser. The code is in the llint/ directory. It is written in a portable assembly language called offlineasm (code in the offlineasm/ directory), which can be compiled to x86 and ARMv7 assembly or to C. Apart from lexing and parsing, LLInt is designed to have essentially no start-up cost, while following the same calling, stack, and register conventions as the JIT compilers. For example, calling an LLInt function works as if the function had been compiled to native code, except that its machine-code entry point is actually a shared LLInt prologue. LLInt also includes optimizations such as inline caching to speed up property access.
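Inline caching works by remembering, at each property-access site, the object shape seen last time (JSC calls a shape a Structure) and where the property lived in that shape. The Python sketch below is a simplified monomorphic cache under that assumption; the Structure, JSObject, and GetByIdSite classes here are hypothetical illustrations, and JSC's real caches live in machine code or bytecode metadata rather than Python attributes.

```python
class Structure:
    """A shared shape: maps property names to slot indices."""

    def __init__(self, property_offsets):
        self.property_offsets = dict(property_offsets)


class JSObject:
    def __init__(self, structure, slots):
        self.structure = structure
        self.slots = list(slots)


class GetByIdSite:
    """One property-access site, like a single get_by_id instruction."""

    def __init__(self, name):
        self.name = name
        self.cached_structure = None
        self.cached_offset = None
        self.hits = 0
        self.misses = 0

    def get(self, obj):
        # Fast path: same structure as last time, so reuse the cached offset.
        if obj.structure is self.cached_structure:
            self.hits += 1
            return obj.slots[self.cached_offset]
        # Slow path: full lookup, then cache the result for the next access.
        self.misses += 1
        offset = obj.structure.property_offsets[self.name]
        self.cached_structure = obj.structure
        self.cached_offset = offset
        return obj.slots[offset]


point_shape = Structure({"x": 0, "y": 1})
points = [JSObject(point_shape, [i, i * 2]) for i in range(5)]
site = GetByIdSite("y")
values = [site.get(p) for p in points]
print(values)                  # [0, 2, 4, 6, 8]
print(site.hits, site.misses)  # 4 1
```

Only the first access pays for the dictionary lookup; every later object with the same Structure takes the fast path.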

Baseline JIT: the Baseline JIT kicks in for a function after it has been called at least 6 times, or after a loop in it has run at least 100 times (or some combination, such as 3 calls with 50 loop iterations in total). These numbers are approximate; the actual heuristics depend on the function's size and the current memory pressure. If LLInt is stuck in a loop, it performs on-stack replacement (OSR) to jump into the JIT-compiled code, and all callers of the function are relinked to point to the compiled code instead of the LLInt prologue. The Baseline JIT also serves as the fallback for functions compiled by the optimizing JIT: if the optimized code hits a case it cannot handle, it exits to the Baseline JIT via OSR. The Baseline JIT code is in jit/, and it performs inline caching for almost all heap accesses.
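The thresholds above can be modeled as a weighted counter in which a function is hot once calls/6 + loop_iterations/100 reaches 1, so 6 calls, 100 iterations, or a mix such as 3 calls plus 50 iterations all qualify. The Python sketch below is only that model: the class name, the weighting scheme, and the fixed thresholds are assumptions for illustration, whereas the real heuristics also depend on function size and memory pressure.

```python
class TierUpCounter:
    """Hypothetical model of a tier-up trigger.

    A function is "hot" once calls/call_threshold + loops/loop_threshold
    reaches 1. Cross-multiplying keeps the arithmetic in integers, so
    there is no floating-point rounding at the boundary.
    """

    def __init__(self, call_threshold, loop_threshold):
        self.call_weight = loop_threshold   # score added per call
        self.loop_weight = call_threshold   # score added per loop iteration
        self.threshold = call_threshold * loop_threshold
        self.score = 0

    def on_call(self):
        self.score += self.call_weight

    def on_loop_iteration(self):
        self.score += self.loop_weight

    def is_hot(self):
        return self.score >= self.threshold


def becomes_hot(calls, loop_iterations, call_threshold=6, loop_threshold=100):
    counter = TierUpCounter(call_threshold, loop_threshold)
    for _ in range(calls):
        counter.on_call()
    for _ in range(loop_iterations):
        counter.on_loop_iteration()
    return counter.is_hot()


print(becomes_hot(6, 0))    # True (6 calls)
print(becomes_hot(0, 100))  # True (100 loop iterations)
print(becomes_hot(3, 50))   # True (3 calls plus 50 iterations)
print(becomes_hot(5, 0))    # False
```

Passing call_threshold=60 and loop_threshold=1000 gives the same model for the DFG trigger described below.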

LLInt and the Baseline JIT collect lightweight profiling information so that the next tier (the DFG) can execute speculatively. The information collected includes recent values loaded into arguments, loaded from the heap, and returned from calls. In addition, all inline caches are designed so that the DFG can use them for type speculation: for example, by inspecting an inline cache's state, one can tell how often a given heap access sees objects of a particular class. This is used to decide what the DFG should speculate on (speculation is a kind of bet: when it holds, the optimized code reaches peak performance; when it fails, execution falls back to a lower tier). JavaScriptCore's type inference is covered in more detail in the type inference article of this series.
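A value profile can be as simple as a record of the types recently seen at one site. The Python sketch below keeps a small ring buffer of recent observations (the buffer size and the use of Python type names are assumptions for illustration) and only proposes a speculative type when every recent value agrees, which is the kind of bet described above.

```python
from collections import deque


class ValueProfile:
    """Hypothetical per-site profile that remembers the types of recent values."""

    def __init__(self, size=8):
        self.recent = deque(maxlen=size)  # oldest entries fall off automatically

    def record(self, value):
        self.recent.append(type(value).__name__)

    def speculated_type(self):
        """Return the single type seen recently, or None if there is no consensus."""
        seen = set(self.recent)
        return seen.pop() if len(seen) == 1 else None


profile = ValueProfile()
for value in [1, 2, 3, 4]:
    profile.record(value)
print(profile.speculated_type())  # int

profile.record(2.5)
print(profile.speculated_type())  # None
```

Because the buffer only holds recent values, a site that once saw a different type can become speculatable again after enough consistent observations.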

DFG JIT: the DFG JIT kicks in for a function after it has been called at least 60 times, or after a loop in it has run at least 1000 times. Again, these numbers are approximate and subject to further heuristics. The DFG performs aggressive type speculation based on the profiling data collected by the Baseline JIT and the interpreter, which lets it forward-propagate type information and eliminate many type checks. The DFG also goes further and speculates on values themselves: for example, to enable inlining, it may speculate that a value loaded from the heap is always a particular known function. If a speculation fails, the DFG deoptimizes, which is also called an "OSR exit". Deoptimization may be synchronous (for example, a branch that checks whether a value has the expected type) or asynchronous (for example, the runtime observes that an object or variable has changed in a way that contradicts the DFG's assumptions); the latter is called "watchpointing". The Baseline JIT and the DFG JIT share a two-way OSR relationship: Baseline code can OSR into the DFG when a function gets hot, and DFG code can OSR back to the Baseline JIT when it deoptimizes. Repeated OSR exits also serve as profiling information: the DFG's OSR exit machinery records the reason for each exit (for example, a failed type speculation) and how often it occurs. If an exit is taken often enough, reoptimization is triggered: callers of the function are relinked to the Baseline JIT, more profiling data is gathered, and the DFG may be invoked again later. Reoptimization uses exponential back-off (the wait grows longer each time) to defend against pathological code. The DFG code is in dfg/.
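The exit counting and back-off policy can be sketched as a small state machine. In this Python model, which is an assumption for illustration (the class name, the exit limit, the warm-up counts, and the doubling factor are not JSC's actual values, and exit reasons are not modeled), a function runs in the DFG until its OSR exits reach a limit; it is then sent back to the Baseline JIT, and the warm-up required before the DFG is tried again doubles each time.

```python
class OptimizationState:
    """Hypothetical model of DFG OSR exits with exponential back-off."""

    def __init__(self, exit_limit=10, initial_warmup=1000):
        self.exit_limit = exit_limit
        self.warmup_required = initial_warmup  # baseline runs before DFG compile
        self.tier = "baseline"
        self.executions = 0
        self.exits = 0
        self.reoptimizations = 0

    def execute(self, speculation_holds):
        """Run once; speculation_holds only matters while in the DFG tier."""
        if self.tier == "baseline":
            self.executions += 1
            if self.executions >= self.warmup_required:
                self.tier = "dfg"            # OSR into optimized code
                self.exits = 0
        elif not speculation_holds:
            self.exits += 1                  # OSR exit for this run
            if self.exits >= self.exit_limit:
                self.jettison()

    def jettison(self):
        """Too many exits: drop DFG code, profile more, and back off."""
        self.tier = "baseline"
        self.executions = 0
        self.reoptimizations += 1
        self.warmup_required *= 2            # exponential back-off


state = OptimizationState(exit_limit=2, initial_warmup=3)
for _ in range(3):
    state.execute(True)
print(state.tier)                          # dfg
state.execute(False)
state.execute(False)
print(state.tier, state.warmup_required)   # baseline 6
```

Doubling the warm-up means code whose speculations keep failing spends less and less time being recompiled, which is the point of the back-off.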

At any given time, functions, eval blocks, and global code may be running in a mix of LLInt, the Baseline JIT, and the DFG. In the extreme case of a recursive function, there may be multiple stack frames: one running in LLInt, another in the Baseline JIT, and yet another in the DFG. Even more extreme, if reoptimization is triggered during execution, one stack frame may be running the old DFG compilation while another runs the new one. The three engines are therefore designed to maintain identical execution semantics, so mixing them should have no visible effect other than on performance.

* To watch them at work, you can enable some log output through JSC::Options in jsc.cpp, the jsc shell sub-project of JavaScriptCore in WebKit.


When reprinting, please credit the source: http://blog.csdn.net/horkychen

Series index:

Basics (1): JSC and WebCore

Basics (2): Interpreter Basics and JSC Core Components

Basics (3): Code Implementation from Script Code to JIT Compilation

Basics (4): Page Parsing and JavaScript Element Execution

Advanced (1): SSA (Static Single Assignment)

Advanced (2): Type Inference

Advanced (3): Register Allocation & Trampoline

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
