MySQL source code and compiler principles: AST, parse tree, and syntax parsing



Using the AST tree
Category: ANTLR, 2013-12-02 22:39

Contents

    1. Chapter 5: Evaluating expressions using an AST intermediate result
    2. Creating ASTs

Chapter 5: Evaluating expressions using an AST intermediate result

We have seen how to implement a "converter" by writing an ANTLR grammar file and adding actions to it. This chapter introduces another way to implement the same functionality, one that relies on an additional tree structure. We will use the same grammar to build an intermediate data structure, replacing some of the actions we added earlier with tree-construction rules. Once we have the tree, we can walk it with a tree parser and perform actions there.

ANTLR generates the tree parser from a grammar file as well. The parser grammar converts the input character stream into a tree structure, and the tree parser then processes that tree and evaluates it.

Although the previous approach was more straightforward, it does not scale well to a full programming language. Adding constructs such as method calls or while loops to the language means that the same code must be executed multiple times, and the parser would have to re-parse a method every time it is called. The previous approach is therefore less flexible than the AST approach, which builds an AST to hold the intermediate result and then walks the tree to perform the relevant operations. Walking a tree repeatedly is clearly more efficient than parsing the input repeatedly.

An intermediate result is usually a tree whose nodes record not only the input symbols but also the relationships between them. For example, the AST for the expression 3+4 has + as its root node, with 3 and 4 as its children.

You will often see trees written in textual form. For example, 3+4 can be written as (+ 3 4). The first symbol after the opening parenthesis is the root node, and the symbols that follow it are its children. The AST of the expression 3+4*5 corresponds to the text (+ 3 (* 4 5)).

As we can see, the structure of the AST strictly encodes operator precedence. Here the multiplication must be performed first, because the addition needs the result of the multiplication as one of its operands.
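The textual form and the evaluation order described above can be tried out with a small Python sketch. The tuple representation and the names to_text and evaluate are assumptions made for illustration only; ANTLR builds its own tree objects.

```python
import operator

# A minimal AST: a leaf is an int, an inner node is (operator, left, right).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def to_text(node):
    """Render a node in the parenthesized form, root first."""
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    return f"({op} {to_text(left)} {to_text(right)})"

def evaluate(node):
    """Evaluate the children first, then apply the operator at the root."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))

tree = ("+", 3, ("*", 4, 5))   # the tree for 3+4*5
print(to_text(tree))           # (+ 3 (* 4 5))
print(evaluate(tree))          # 23
```

Because evaluate descends into the children before applying the root operator, the multiplication nested under + is computed first, which is exactly the precedence that the tree shape encodes.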

An AST is different from a parse tree, which records the grammar rules that were applied during parsing. The parse tree for this example is described below.

The leaf nodes of a parse tree are input symbols, and the non-leaf nodes are rule names. The root node, prog, indicates that 3+4 is a prog. More specifically, it is a stat, and that stat is made up of an expr. The parse tree therefore records how the recognizer matched the input.

In practice, it is very useful to decouple the grammar from the tree it yields, and an AST, unlike a parse tree, provides that decoupling. A change to the grammar usually alters the structure of the parse tree while leaving the AST unaffected, which matters a great deal for the code that processes the tree.
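A small Python sketch can make the difference concrete. The tuple layout, the rule name multExpr, and the helper names leaves and to_ast are assumptions for illustration; the collapse logic only handles the simple shapes shown here.

```python
# Parse tree node: (rule_name, [children]); a leaf is the token text itself.
parse_tree = ("prog", [("stat", [("expr", ["3", "+", "4"])])])

# The same input under a tweaked grammar that adds one more rule level.
tweaked_tree = ("prog", [("stat", [("expr", [("multExpr", ["3"]),
                                             "+",
                                             ("multExpr", ["4"])])])])

def leaves(node):
    """Collect the input symbols found at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    _rule, children = node
    out = []
    for child in children:
        out.extend(leaves(child))
    return out

def to_ast(node):
    """Drop rule-name nodes and keep only operators and operands."""
    if isinstance(node, str):
        return int(node) if node.isdigit() else node
    _rule, children = node
    parts = [to_ast(child) for child in children]
    if len(parts) == 1:            # a rule that merely wraps one child
        return parts[0]
    left, op, right = parts        # binary expression: operand op operand
    return (op, left, right)

print(leaves(parse_tree))          # ['3', '+', '4']
print(to_ast(parse_tree))          # ('+', 3, 4)
print(to_ast(tweaked_tree))        # ('+', 3, 4)
```

The two parse trees differ because the grammar differs, yet both collapse to the same AST, so code written against the AST does not have to change when the grammar is restructured.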

Once the AST has been created, you can walk it in a number of ways to compute the results you want. In general, I recommend using a tree grammar to generate the code that walks the tree.

In the next section, you will learn how to create an AST, how to walk it with a tree grammar, and how to add actions to the tree grammar. The end result is a converter with the same functionality as before.

MySQL source code: From SQL statements to MySQL internal objects

2012-11-2 10:40 | Category: MySQL, code details | Tags: MySQL, source code, SQL parsing

I have been writing less on this blog lately; the previous post dates back to June. It has been a long time since the first article on the MySQL source code, and this post continues along that road.

The optimizer is an important and distinctive part of a relational database, and both its theory and its practice are fairly complex. This series of articles takes MySQL as its subject and, by dissecting the MySQL optimizer, also tries to give a glimpse of the ideas behind relational database optimizers in general. The articles focus on the important data structures and the relationships between them rather than on the code itself ("Bad programmers worry about the code. Good programmers worry about data structures and their relationships.").

Contents

    • 0 Preface
    • 1 SQL statement parsing basics
      • 1.1 Parsing basics / Flex and Bison
      • 1.2 MySQL syntax parsing sample and
    • 2 SQL statements to MySQL internal objects
      • 2.1 The notable Item object
      • 2.2 WHERE in the Bison grammar
      • 2.3 WHERE data structures and their relationships
      • 2.4 Printing a WHERE object through GDB
    • 3 About the Item object
    • References

0 Preface

What this article does: these articles should help you understand the behavior of the MySQL optimizer more easily, and give you the main ideas before you read the MySQL source code yourself.

What this article does not do: teach you how to read source code.

This series will be fairly long and will probably proceed as follows: basic data structures, syntax parsing, the key algorithms for JOIN, join order, and single-table access. Data structures (and their relationships) and algorithmic flow are interleaved throughout.

Suggested reading: the articles and books listed in the references are best read before this article.

1 SQL statement parsing basics

1.1 Parsing basics / Flex and Bison

MySQL wraps its syntax parsing in the function MYSQLparse. Like any other parser, it consists of two modules: lexical analysis (the lexical scanner) and grammar rules (the grammar rule module). Lexical analysis breaks the SQL statement into individual words (tokens); the grammar rule module builds the corresponding data structures according to the grammar rules defined by MySQL and stores them in the thd->lex structure. Finally, the optimizer generates an execution plan from this data and then calls the storage engine interfaces to execute it.

Two mature open-source tools, Flex and Bison, address lexical analysis and grammar rules respectively. For reasons of performance and flexibility, MySQL implements the lexical analysis part itself and uses Bison for the grammar rules part. The two parts communicate through yylex(), the function interface provided by the lexical analyzer: Bison calls yylex() to obtain tokens and then carries out its own parsing. Bison's entry point is yyparse(), which in MySQL is MYSQLparse.
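The hand-off between the two modules can be sketched in a few lines of Python. This is a toy under stated assumptions, not MySQL's code: the Lexer class, the token names, the tiny SELECT grammar, and the lex dictionary are all hypothetical, and a hand-written recursive-descent function stands in for the table-driven parser that Bison generates. Only the calling pattern mirrors the text: the parser asks yylex() for one token at a time and fills in a result structure.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\w+)|(>=|<=|=|>|<|,))")
KEYWORDS = {"select", "from", "where"}

class Lexer:
    """Hand-written scanner: each yylex() call returns the next (type, text)."""
    def __init__(self, sql):
        self.sql, self.pos = sql, 0

    def yylex(self):
        if self.pos >= len(self.sql.rstrip()):
            return ("END", "")
        m = TOKEN_RE.match(self.sql, self.pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {self.pos}")
        self.pos = m.end()
        num, word, op = m.groups()
        if num is not None:
            return ("NUM", num)
        if word is not None:
            if word.lower() in KEYWORDS:
                return (word.upper(), word)
            return ("IDENT", word)
        return ("OP", op)

def yyparse(lexer):
    """Pull tokens on demand and store the result in a lex-like dictionary."""
    lex = {"fields": [], "table": None, "where": None}
    tok = lexer.yylex()

    def expect(kind):
        nonlocal tok
        if tok[0] != kind:
            raise SyntaxError(f"expected {kind}, got {tok}")
        text = tok[1]
        tok = lexer.yylex()          # ask the scanner for the next token
        return text

    expect("SELECT")
    lex["fields"].append(expect("IDENT"))
    while tok == ("OP", ","):
        expect("OP")
        lex["fields"].append(expect("IDENT"))
    expect("FROM")
    lex["table"] = expect("IDENT")
    if tok[0] == "WHERE":
        expect("WHERE")
        lex["where"] = (expect("IDENT"), expect("OP"), int(expect("NUM")))
    expect("END")
    return lex

print(yyparse(Lexer("select c1, c2 from t where c2 > 10")))
```

The parser never sees raw characters, only the tokens the scanner hands over, which is why the scanner can be replaced (as MySQL does with its own implementation) without touching the grammar.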

If lexical analysis and grammar rules are unfamiliar to you, it is recommended to read references [4][5][6] first (see Note 1); otherwise the overall structure will be hard to follow, or at least will feel disjointed. Beyond that, tracing how MySQL stores its data by following the actions in the Bison grammar is very effective.

1.2 MySQL Syntax parsing sample and

[Figure: the simple parsing process]

[Figure: parsing the WHERE part of an SQL statement]

2 SQL statements to MySQL internal objects

After Bison has parsed the statement, the result (a parse tree/AST) is stored in THD::lex. This section examines the result of parsing by looking at the data structures in which the WHERE is stored.

2.1 The notable Item object

Before looking at MySQL's parse tree, we need to get to know an important data structure: Item. It is a base object that appears everywhere in the optimizer code, and it is described separately in the MySQL Internals Manual: The Item Class.

Item is a base class from which many descendants are derived. These subclasses describe essentially every object in an SQL statement, including:

    • A string or numeric literal
    • A column of a table (for example, c1 and c2 in select c1,c2 from dual ...)
    • A comparison, such as c1>10
    • All the information of a WHERE clause
    • ...

As you can see, Item represents essentially every object in an SQL statement. In the parse tree, the Items themselves are arranged as a tree.

[Figure: Items arranged as a tree]

2.2 WHERE in the Bison grammar

Starting from the SELECT rule, the part we care about is the corresponding where_clause.

Let's look at some of the important actions in the Bison grammar (see Note 1):

    where_clause:
              /* empty */ {}
            | WHERE expr
              {
                thd->lex->current_select->where = $2;
              }
            ;

    expr:
              ...
            | expr and expr
              {
                $$ = new (YYTHD->mem_root) Item_cond_and($1, $3);
              }
            | ident comp_op NUM   /* not part of the source code; simplified for ease of understanding */
              {
                $$ = new Item_func_ge(a, b);   /* not part of the source code; simplified for ease of understanding */
              }
            ;

From the Bison grammar here, you can derive the WHERE syntax tree described above. If, like me, you have only just learned about Flex/Bison/ASTs, you will surely find this quite ingenious too!
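To see how such actions assemble a tree bottom-up, here is a minimal Python sketch. It is not how Bison works internally (Bison generates a table-driven parser), and the names ItemFunc, ItemCondAnd, parse_where, and select are hypothetical stand-ins; the comments mark where each step corresponds to an action in the grammar.

```python
from dataclasses import dataclass

@dataclass
class ItemFunc:            # stands in for the Item_func_* comparison nodes
    op: str
    args: tuple

@dataclass
class ItemCondAnd:         # stands in for Item_cond_and
    items: list

def parse_where(tokens):
    """tokens: a flat list such as ['c1', '=', 'orczhou', 'and', 'c2', '>', 10]."""
    pos = 0

    def comparison():
        nonlocal pos
        left, op, right = tokens[pos:pos + 3]
        pos += 3
        return ItemFunc(op, (left, right))   # like: $$ = new Item_func_xx(a, b)

    node = comparison()
    while pos < len(tokens) and tokens[pos] == "and":
        pos += 1
        right = comparison()
        node = ItemCondAnd([node, right])    # like: $$ = new Item_cond_and($1, $3)
    return node

select = {"where": None}
# like: thd->lex->current_select->where = $2
select["where"] = parse_where(["c1", "=", "orczhou", "and", "c2", ">", 10])
print(select["where"])
```

Each comparison becomes a node as soon as its three tokens have been seen, and the AND node is created only after both of its operands exist, which is the same bottom-up order in which the grammar actions run.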

2.3 WHERE data structures and their relationships

[Figure: the WHERE object and the branches of the WHERE parse tree]

Take the condition where c1 = "orczhou" and c2 > 10 as an example. The WHERE itself (lex->select->where) is an Item_cond_and object. This object contains a list of Items; taking the value of each Item in the list and combining the values with AND gives the value of the WHERE.

Here the list in the WHERE holds two Item objects, representing c1 = "orczhou" and c2 > 10 respectively. More precisely, the types of these two objects are Item_func_eq and Item_func_gt.

Now look at the Item_func_gt object (for c2 > 10). It is derived from Item_func (and, traced back to its origin, is of course a descendant of Item), and it has the member Item **args. args stores the Items that the comparison operates on.

For c2 > 10, the inequality involves two Items, representing the field c2 and the integer 10: an Item_field and an Item_int.
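The relationships just described can be modeled in a short Python sketch. This is a simplified illustration, not MySQL's class hierarchy: the Python class names only echo the MySQL ones, ItemConst is a hypothetical stand-in for both Item_int and Item_string, ItemCondAnd derives directly from Item here, and the val(row) method is an assumption used to show how the value of the WHERE follows from the values of its Items.

```python
class Item:
    """Base class: every node of the WHERE tree is an Item."""
    def val(self, row):
        raise NotImplementedError

class ItemField(Item):          # a column reference such as c2
    def __init__(self, name):
        self.name = name
    def val(self, row):
        return row[self.name]

class ItemConst(Item):          # stands in for Item_int / Item_string
    def __init__(self, value):
        self.value = value
    def val(self, row):
        return self.value

class ItemFunc(Item):           # keeps its operands in args, like Item **args
    def __init__(self, *args):
        self.args = list(args)

class ItemFuncEq(ItemFunc):
    def val(self, row):
        return self.args[0].val(row) == self.args[1].val(row)

class ItemFuncGt(ItemFunc):
    def val(self, row):
        return self.args[0].val(row) > self.args[1].val(row)

class ItemCondAnd(Item):        # holds a list of Items that must all be true
    def __init__(self, *items):
        self.list = list(items)
    def val(self, row):
        return all(item.val(row) for item in self.list)

# where c1 = "orczhou" and c2 > 10
where = ItemCondAnd(
    ItemFuncEq(ItemField("c1"), ItemConst("orczhou")),
    ItemFuncGt(ItemField("c2"), ItemConst(10)),
)
print(where.val({"c1": "orczhou", "c2": 11}))   # True
print(where.val({"c1": "orczhou", "c2": 10}))   # False
```

The root holds a list of two comparison nodes, and each comparison holds two operands in args, so the object graph has the same three levels as the description above.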

2.4 Printing a Where object through GDB

The WHERE condition is: where id = 531389273 and reg_date > '2012-02-12 09';

Print the list in the WHERE:

    (gdb) p ((Item_cond *) select_lex->where)->list
    $13 = {<base_list> = {<Sql_alloc> = {<No data fields>}, members of base_list:
        first = 0x7f5bbc005860, last = 0x7f5bbc005870, elements = 2}

Because the WHERE has two conditions, the list here has two elements.

Print the first condition in the list (id = 531389273):

    (gdb) p *(Item_func *) ((Item_cond *) select_lex->where)->list->first->info
    $69 = {<Item_result_field> = {<Item> = {...} next = 0x7f2134005320, ...},
      members of Item_func: args = 0x7f21340054, ...,
      tmp_arg = {0x7f2134005228, 0x7f2134005320}, arg_count = 2, ...}

This equality has two operands (arg_count = 2), which are stored in args as an array.

Print the first operand (i.e. id) of the equality above.

Print the type of the first Item:

    (gdb) p ((Item_func *) ((Item_cond *) select_lex->where)->list->first->info)->args[0]->type()
    $74 = Item::FIELD_ITEM

Cast the first Item to the correct type and print it:

    (gdb) p *(Item_field *) ((Item_func *) ((Item_cond *) select_lex->where)->list->first->info)->args[0]
    $78 = {<Item_ident> = {<Item> = {..., name = 0x7f2134005208 "id", ...}, ...,
      members of Item_ident: orig_field_name = 0x7f2134005208 "id",
      field_name = 0x7f2134005208 "id", ...},
      members of Item_field: field = 0x0, result_field = 0x0, ...}

You can see that the type of the id object here is Item::FIELD_ITEM, that is, it is an Item_field.

3 About the Item object

Let's continue from the Item_cond_and object in which the WHERE is stored:

[Figure: the Item_cond_and object]

The inheritance chain of Item_cond_and: Item_cond -> Item_bool_func -> ... -> Item_result_field -> Item

A very important member function of Item is type(). When it is unclear in GDB what type an Item is, you can call this method to find out:

    (gdb) p (*(*(Item_func *) thd->lex->current_select->where)->tmp_arg[0])->type()
    $42 = Item::FIELD_ITEM
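The same "ask for the type first, then look inside" habit can be shown in a small Python sketch. The Type enum below borrows a few names in the style of Item::FIELD_ITEM purely for illustration; the Item class, its label and children attributes, and the dump helper are hypothetical and do not reflect the MySQL classes.

```python
from enum import Enum

class Type(Enum):               # a small illustrative subset of type tags
    FIELD_ITEM = "field"
    INT_ITEM = "int"
    STRING_ITEM = "string"
    FUNC_ITEM = "func"
    COND_ITEM = "cond"

class Item:
    def __init__(self, kind, label, children=()):
        self._kind = kind
        self.label = label
        self.children = list(children)

    def type(self):
        """Report what kind of node this is, like calling type() in GDB."""
        return self._kind

def dump(item, depth=0):
    """Return one line per node, checking type() before descending."""
    lines = ["  " * depth + f"{item.type().name}: {item.label}"]
    if item.type() in (Type.FUNC_ITEM, Type.COND_ITEM):   # only these have operands
        for child in item.children:
            lines.extend(dump(child, depth + 1))
    return lines

# where id = 531389273 and reg_date > '2012-02-12 09'
where = Item(Type.COND_ITEM, "and", [
    Item(Type.FUNC_ITEM, "=", [Item(Type.FIELD_ITEM, "id"),
                               Item(Type.INT_ITEM, "531389273")]),
    Item(Type.FUNC_ITEM, ">", [Item(Type.FIELD_ITEM, "reg_date"),
                               Item(Type.STRING_ITEM, "2012-02-12 09")]),
])
print("\n".join(dump(where)))
```

In GDB the type tag tells you which cast to apply before printing; in the sketch it decides whether a node has operands worth descending into.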

That is all for this article; I hope to continue the series.


[1] O'Reilly, Understanding MySQL Internals

[2] The Skeleton of the Server Code, MySQL Internals Manual

[3] Exploring the MySQL source code: SQL adventures, by Hoterran

[4] MySQL source analysis: the network connection process, by Timchou

[5] Using Flex and Bison, by Aaron Montgomery (from basic to in-depth, a fairly complete introduction)

[6] O'Reilly, by John Levine (a full introduction)

[7] [email protected] Aaron Myles Landwehr (Simple and concise introduction)

[8] MySQL Source Code

[Note 1] I had never studied compiler principles or syntax parsing before this. If you are in the same position, I recommend reading [4][5][6] in the references. I am a slow learner, and it took me about a week (roughly 20 hours).
