In-depth Understanding of the WebKit Kernel, Series 2: A Deep Dive into the JavaScript Engine
HorkeyChen's article "[WebKit] JavaScriptCore Analysis, Basics (III): From Script Code to JIT-Compiled Code" is well written and very instructive. Here I want to fill in some details that Horkey did not cover, such as how the bytecode is generated.
In many respects, the way JSC (JavaScriptCore) processes JavaScript resembles the way WebKit processes CSS. The pipeline consists of the following stages:
(1) Lexical analysis: split the source text into tokens;
(2) Syntax analysis: build an abstract syntax tree (AST);
(3) Traverse the abstract syntax tree and generate bytecode;
(4) Execute the bytecode with the interpreter (LLInt: Low Level Interpreter);
(5) If performance is still not good enough, compile the bytecode into machine code with the Baseline JIT and execute that machine code;
(6) If performance is still not good enough, recompile the bytecode with the DFG JIT to produce better machine code and execute it;
(7) Finally, if even that is not good enough, bring out the heavy artillery, LLVM (Low Level Virtual Machine), to compile the DFG intermediate code into highly optimized machine code and execute it.
In the following series of articles, I will describe this process step by step.
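The escalation in steps (4) to (7) can be pictured as a per-function counter that promotes hot code to progressively more optimizing tiers. The C++ sketch below only illustrates that idea. The thresholds (`kBaselineThreshold` and friends), the `FunctionProfile` type, and the one-step-per-call promotion rule are all hypothetical. JSC's real tier-up heuristics are considerably more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Execution tiers in the order described in steps (4) to (7).
enum class Tier { LLInt, BaselineJIT, DFGJIT, LLVM };

// Hypothetical thresholds: total executions needed to reach each tier.
constexpr std::uint64_t kBaselineThreshold = 100;
constexpr std::uint64_t kDFGThreshold = 1000;
constexpr std::uint64_t kLLVMThreshold = 10000;

// Hypothetical per-function profile: counts executions and records the current tier.
struct FunctionProfile {
    std::uint64_t executionCount = 0;
    Tier tier = Tier::LLInt;

    // Called once per execution; moves up at most one tier per call.
    Tier recordExecution() {
        ++executionCount;
        if (tier == Tier::LLInt && executionCount >= kBaselineThreshold)
            tier = Tier::BaselineJIT;
        else if (tier == Tier::BaselineJIT && executionCount >= kDFGThreshold)
            tier = Tier::DFGJIT;
        else if (tier == Tier::DFGJIT && executionCount >= kLLVMThreshold)
            tier = Tier::LLVM;
        return tier;
    }
};

// Human-readable tier name, e.g. for logging.
std::string tierName(Tier tier) {
    switch (tier) {
    case Tier::LLInt: return "LLInt";
    case Tier::BaselineJIT: return "Baseline JIT";
    case Tier::DFGJIT: return "DFG JIT";
    case Tier::LLVM: return "LLVM";
    }
    return "unknown";
}
```

Each call advances at most one tier, so a function always passes through the intermediate tiers in order. This matches the sequence described above.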
Steps 1 and 2 are similar, and steps 3, 4, and 5 use similar methods; for details, see [1]. I plan to write a series of articles on JSC and, as a newcomer working with the persistence of Yugong moving the mountains, reveal the tip of the JSC iceberg.
This article covers the details of lexical analysis and syntax parsing.
I. Workflow of the JavaScriptCore Lexical Analyzer
The W3C describes the lexical and syntactic analysis workflow as follows:
The tokenizer works as follows: it repeatedly scans the input string for words (tokens). For example, when it finds the consecutive characters "true", it creates a TokTrue token. The tokenizer's code looks like this:
JavaScriptCore/interpreter.cpp:
template <typename CharType>
template <ParserMode mode>
TokenType LiteralParser<CharType>::Lexer::lex(LiteralParserToken<CharType>& token)
{
    // Skip leading whitespace.
    while (m_ptr < m_end && isJSONWhiteSpace(*m_ptr))
        ++m_ptr;

    // End of input: produce an end-of-input token.
    if (m_ptr >= m_end) {
        token.type = TokEnd;
        token.start = token.end = m_ptr;
        return TokEnd;
    }

    token.type = TokError;
    token.start = m_ptr;
    // Decide the token type from the current character.
    switch (*m_ptr) {
    case '[':
        token.type = TokLBracket;
        token.end = ++m_ptr;
        return TokLBracket;
    case ']':
        token.type = TokRBracket;
        token.end = ++m_ptr;
        return TokRBracket;
    case '(':
        token.type = TokLParen;
        token.end = ++m_ptr;
        return TokLParen;
    case ')':
        token.type = TokRParen;
        token.end = ++m_ptr;
        return TokRParen;
    case ',':
        token.type = TokComma;
        token.end = ++m_ptr;
        return TokComma;
    case ':':
        token.type = TokColon;
        token.end = ++m_ptr;
        return TokColon;
    case '"':
        // String literal delimited by double quotes.
        return lexString<mode, '"'>(token);
    case 't':
        // Keyword "true": check that the next three characters are 'r', 'u', 'e'.
        if (m_end - m_ptr >= 4 && m_ptr[1] == 'r' && m_ptr[2] == 'u' && m_ptr[3] == 'e') {
            m_ptr += 4;
            token.type = TokTrue;
            token.end = m_ptr;
            return TokTrue;
        }
        break;
    case '-':
    case '0':
    // ... (the excerpt is cut off here in the original)
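The excerpt uses a simple technique: skip whitespace, switch on the current character, and match a keyword such as `true` by checking the characters that follow. To show that technique in isolation, here is a small self-contained tokenizer for a JSON-like subset. It is a sketch modeled on the excerpt above, not JSC code. The `MiniLexer`, `Token`, and `TokenType` names are hypothetical, and so are the simplifications: numbers are integers only and strings have no escape sequences.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds, loosely modeled on LiteralParser's TokXxx names.
enum class TokenType { LBracket, RBracket, Comma, Colon, True, False, Null, String, Number, End, Error };

struct Token {
    TokenType type;
    std::string text; // the characters this token covers
};

class MiniLexer {
public:
    explicit MiniLexer(std::string input) : m_src(std::move(input)) {}

    Token lex() {
        // Skip whitespace, as the excerpt does with isJSONWhiteSpace().
        while (m_pos < m_src.size() && std::isspace(static_cast<unsigned char>(m_src[m_pos])))
            ++m_pos;
        if (m_pos >= m_src.size())
            return {TokenType::End, ""};

        const std::size_t start = m_pos;
        const char c = m_src[m_pos];
        // Single-character punctuation and strings: dispatch on the current character.
        switch (c) {
        case '[': ++m_pos; return {TokenType::LBracket, "["};
        case ']': ++m_pos; return {TokenType::RBracket, "]"};
        case ',': ++m_pos; return {TokenType::Comma, ","};
        case ':': ++m_pos; return {TokenType::Colon, ":"};
        case '"': {
            // Find the closing quote; escapes are not handled in this sketch.
            const std::size_t close = m_src.find('"', m_pos + 1);
            if (close == std::string::npos) {
                m_pos = m_src.size(); // unterminated string: stop lexing
                return {TokenType::Error, m_src.substr(start)};
            }
            m_pos = close + 1;
            return {TokenType::String, m_src.substr(start, m_pos - start)};
        }
        default:
            break;
        }
        // Keywords: check the following characters, like the 't' case in the excerpt.
        if (matchKeyword("true")) return {TokenType::True, "true"};
        if (matchKeyword("false")) return {TokenType::False, "false"};
        if (matchKeyword("null")) return {TokenType::Null, "null"};
        // Integers with an optional leading '-' (minimal validation).
        if (c == '-' || std::isdigit(static_cast<unsigned char>(c))) {
            ++m_pos;
            while (m_pos < m_src.size() && std::isdigit(static_cast<unsigned char>(m_src[m_pos])))
                ++m_pos;
            return {TokenType::Number, m_src.substr(start, m_pos - start)};
        }
        ++m_pos; // skip the unexpected character so lexing can continue
        return {TokenType::Error, std::string(1, c)};
    }

private:
    bool matchKeyword(const std::string& keyword) {
        if (m_src.compare(m_pos, keyword.size(), keyword) != 0)
            return false;
        m_pos += keyword.size();
        return true;
    }

    std::string m_src;
    std::size_t m_pos = 0;
};

// Convenience helper: lex a whole string and collect the token types.
std::vector<TokenType> tokenTypes(const std::string& input) {
    MiniLexer lexer(input);
    std::vector<TokenType> types;
    for (Token t = lexer.lex(); t.type != TokenType::End; t = lexer.lex())
        types.push_back(t.type);
    return types;
}
```

For example, `tokenTypes("[true, \"hi\", -12]")` yields LBracket, True, Comma, String, Comma, Number, RBracket.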
After this process, the complete set of tokens in the JSC world has been produced. Next, the parser analyzes the syntax and builds an abstract syntax tree.
template <typename LexerType>
UString Parser<LexerType>::parseInner()
{
    UString parseError = UString();
    unsigned oldFunctionCacheSize = m_functionCache ? m_functionCache->byteSize() : 0;
    // Abstract syntax tree builder
    ASTBuilder context(const_cast<JSGlobalData*>(m_globalData), const_cast<SourceCode*>(m_source));
    if (m_lexer->isReparsing())
        m_statementDepth--;
    ScopeRef scope = currentScope();
    // Start parsing: build the top-level node of the syntax tree
    SourceElements* sourceElements = parseSourceElements<CheckForStrictMode>(context);
    if (!sourceElements || !consume(EOFTOK))
    // ... (the rest of the function is omitted in the original)
}
For example, when JSC determines from the token type that the input is a constant declaration, it calls the corresponding template function to create a syntax node (Node) and stores that node in the ASTBuilder.
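The flow described here is ordinary recursive descent. The parser looks at the current token, decides that it starts a constant declaration, parses it, and asks the builder to create the node. The sketch below reduces this to a toy grammar of `const NAME = NUMBER;` statements over input that has already been tokenized. `Parser`, `AstBuilder`, `ConstDeclNode`, and `parseProgram` are hypothetical names for illustration; JSC's actual parser and node classes are far richer. As in `parseInner` above, parsing fails unless every statement parses and the input is consumed up to the end-of-file token.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds for the toy grammar.
enum class Tok { Const, Ident, Equal, Number, Semicolon, Eof };

struct Token {
    Tok type;
    std::string text;
};

// Hypothetical AST node for `const name = value;`.
struct ConstDeclNode {
    std::string name;
    long value;
};

// The builder creates and owns the nodes (a stand-in for ASTBuilder).
class AstBuilder {
public:
    ConstDeclNode* createConstDecl(const std::string& name, long value) {
        m_nodes.push_back(std::make_unique<ConstDeclNode>(ConstDeclNode{name, value}));
        return m_nodes.back().get();
    }
private:
    std::vector<std::unique_ptr<ConstDeclNode>> m_nodes;
};

class Parser {
public:
    explicit Parser(std::vector<Token> tokens) : m_tokens(std::move(tokens)) {}

    // Like parseInner: parse the source elements, then require EOF.
    std::optional<std::vector<ConstDeclNode*>> parseProgram(AstBuilder& builder) {
        std::vector<ConstDeclNode*> elements;
        // Dispatch on the token type: a `const` token starts a constant declaration.
        while (peek() == Tok::Const) {
            ConstDeclNode* decl = parseConstDeclaration(builder);
            if (!decl)
                return std::nullopt; // malformed declaration
            elements.push_back(decl);
        }
        if (!consume(Tok::Eof)) // leftover tokens, cf. `!consume(EOFTOK)`
            return std::nullopt;
        return elements;
    }

private:
    Tok peek() const { return m_pos < m_tokens.size() ? m_tokens[m_pos].type : Tok::Eof; }

    bool consume(Tok expected) {
        if (peek() != expected)
            return false;
        if (m_pos < m_tokens.size())
            ++m_pos;
        return true;
    }

    // const NAME = NUMBER ;
    ConstDeclNode* parseConstDeclaration(AstBuilder& builder) {
        consume(Tok::Const);
        if (peek() != Tok::Ident)
            return nullptr;
        std::string name = m_tokens[m_pos++].text;
        if (!consume(Tok::Equal) || peek() != Tok::Number)
            return nullptr;
        long value = std::stol(m_tokens[m_pos++].text);
        if (!consume(Tok::Semicolon))
            return nullptr;
        return builder.createConstDecl(name, value); // the builder creates the node
    }

    std::vector<Token> m_tokens;
    std::size_t m_pos = 0;
};
```

The parser only decides *what* to build; the builder decides *how* nodes are created and stored. This mirrors the split between the parse functions and the `ASTBuilder context` passed into them.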
Next, BytecodeGenerator::generate is called to produce the bytecode; it will be analyzed in later sections. Here, let's look at how bytecode is generated from individual JavaScript syntax tree nodes:
JavaScriptCore/bytecompiler/NodesCodegen.cpp:
RegisterID* BooleanNode::emitBytecode(BytecodeGenerator& generator, RegisterID* dst)
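The idea behind `emitBytecode` is that every syntax tree node emits the bytecode for itself and returns the register that holds its result. The sketch below imitates that shape for three node kinds: a boolean literal, a number literal, and addition. It uses a made-up register-based instruction set. `Op`, `Instruction`, and the `emitLoadBool`/`emitLoadInt`/`emitAdd` helpers are hypothetical stand-ins, not JSC's actual opcodes or `BytecodeGenerator` API. The small `run` loop exists only so that the emitted code can be checked.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical register-based instruction set.
enum class Op { LoadBool, LoadInt, Add };

struct Instruction {
    Op op;
    int dst; // destination register
    int a;   // immediate value (loads) or first source register (Add)
    int b;   // second source register (Add); unused for loads
};

class BytecodeGenerator {
public:
    int newTemporary() { return m_registerCount++; }
    int emitLoadBool(int dst, bool value) { m_code.push_back({Op::LoadBool, dst, value ? 1 : 0, 0}); return dst; }
    int emitLoadInt(int dst, int value) { m_code.push_back({Op::LoadInt, dst, value, 0}); return dst; }
    int emitAdd(int dst, int lhs, int rhs) { m_code.push_back({Op::Add, dst, lhs, rhs}); return dst; }
    const std::vector<Instruction>& code() const { return m_code; }
    int registerCount() const { return m_registerCount; }
private:
    std::vector<Instruction> m_code;
    int m_registerCount = 0;
};

// Each node emits its own bytecode. A negative dst means "allocate a temporary".
struct Node {
    virtual ~Node() = default;
    virtual int emitBytecode(BytecodeGenerator& gen, int dst) = 0;
};

struct BooleanNode : Node {
    bool value;
    explicit BooleanNode(bool v) : value(v) {}
    int emitBytecode(BytecodeGenerator& gen, int dst) override {
        return gen.emitLoadBool(dst < 0 ? gen.newTemporary() : dst, value);
    }
};

struct NumberNode : Node {
    int value;
    explicit NumberNode(int v) : value(v) {}
    int emitBytecode(BytecodeGenerator& gen, int dst) override {
        return gen.emitLoadInt(dst < 0 ? gen.newTemporary() : dst, value);
    }
};

struct AddNode : Node {
    std::unique_ptr<Node> lhs, rhs;
    AddNode(std::unique_ptr<Node> l, std::unique_ptr<Node> r) : lhs(std::move(l)), rhs(std::move(r)) {}
    int emitBytecode(BytecodeGenerator& gen, int dst) override {
        int l = lhs->emitBytecode(gen, -1); // children emit their code first
        int r = rhs->emitBytecode(gen, -1);
        return gen.emitAdd(dst < 0 ? gen.newTemporary() : dst, l, r);
    }
};

// Tiny interpreter used only to check the emitted bytecode.
std::vector<int> run(const BytecodeGenerator& gen) {
    std::vector<int> regs(gen.registerCount(), 0);
    for (const Instruction& ins : gen.code()) {
        switch (ins.op) {
        case Op::LoadBool:
        case Op::LoadInt: regs[ins.dst] = ins.a; break;
        case Op::Add: regs[ins.dst] = regs[ins.a] + regs[ins.b]; break;
        }
    }
    return regs;
}
```

Code generation here is a post-order walk: the children of `AddNode` emit their instructions first, and the parent then combines their result registers. For `2 + 40` this produces two loads followed by one add, and `run` leaves 42 in the result register.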
To get reminders when the blog is updated and to receive more technical content as early as possible, follow my personal public account, coder_online. Following it lets you:
1. Get direct answers to your WebKit technical questions
2. Be among the first to read technical articles from more than 10 fields in the industry
3. Ask questions about the articles and get prompt, patient answers
4. Become friends with the original authors and expand your professional network
Search for coder_online to get in touch with us online.