On the TiDB Source Code


Author: @ Shen Shi

This document is intended for developers in the TiDB community and focuses on TiDB's system architecture, code structure, and execution flow. The goal is to give readers a holistic understanding of the TiDB project so that they can participate in it more easily. It first introduces the overall architecture and the layout of the Go packages, then walks through the internal execution flow, and finally describes the two most important components: the optimizer and the executor.

System Architecture

TiDB Server sits between the Load Balancer (or the application) and the underlying storage engine. Its main body is divided into three layers:

MySQL Protocol Layer

Receives requests from the MySQL client, parses the MySQL protocol packets, and converts them into the various commands of a TiDB session; after processing completes, the results are encoded in MySQL protocol format and returned to the client.

SQL Layer

Parses and executes SQL statements, builds and optimizes the query plan, generates the executor, and reads or writes data through the KV layer, finally returning results to the MySQL protocol layer. This layer is the focus of this document and is described in more detail later.

KV Layer

Provides transactional (distributed or stand-alone) storage. Between the KV layer and the SQL layer there is an abstraction layer, so the SQL layer can ignore the differences between the underlying KV stores and see a unified interface.
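
As a rough illustration of what such an abstraction can look like, the hypothetical interfaces below show how SQL-layer code can stay agnostic of the concrete KV engine. These are not the actual definitions in TiDB's kv package, which is described later.

// kvdemo is a hypothetical sketch of the kind of unified interface the SQL
// layer could program against; the real definitions live in TiDB's kv package.
package kvdemo

// Storage abstracts a KV engine (distributed or stand-alone).
type Storage interface {
    Begin() (Transaction, error) // start a new transaction
}

// Transaction provides transactional reads and writes over key-value pairs.
type Transaction interface {
    Get(key []byte) ([]byte, error)
    Set(key, value []byte) error
    Delete(key []byte) error
    Commit() error
    Rollback() error
}

// writeRow shows how SQL-layer code can stay engine-agnostic: it depends only
// on the interfaces above, not on TiKV or a local engine.
func writeRow(store Storage, key, value []byte) error {
    txn, err := store.Begin()
    if err != nil {
        return err
    }
    if err := txn.Set(key, value); err != nil {
        txn.Rollback()
        return err
    }
    return txn.Commit()
}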

Code Structure Overview

First all the packages are listed, then their main features are introduced. This chapter is somewhat scattered and information-dense; it can be read together with the next chapter.

Tidb

This package can be thought of as the interface between the MySQL protocol layer and the SQL layer. It has three main files:

session.go: each Session object corresponds to one MySQL client connection. The MySQL protocol layer is responsible for managing the binding between connections and sessions, and the various MySQL queries/commands are executed by calling the Session interface.

tidb.go: helper functions called by session.go.

bootstrap.go: when TiDB Server starts, if it finds that the system has not been initialized, it performs the initialization process; details are described in a later section.

Docs

Some brief documents; more detailed documentation is available in the Chinese and English documentation.

Executor

The TiDB executors. A SQL statement is eventually translated into a combination of a series of executors (physical operators). The main interface this package exposes is Executor:

type Executor interface {
    // Next returns the next row of data (a nil result means there is no more data).
    Next() (*Row, error)
    // Close closes the current executor and does some cleanup work.
    Close() error
    // Schema returns the schema of this executor's result, including detailed information about each Field.
    Schema() expression.Schema
}

All executors implement this interface. TiDB's execution engine uses the Volcano model: executors interact through the three interfaces above, and each executor only needs to pull data from its child executors via the Next interface and obtain the meta-information of the data through the Schema interface.
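
As a small illustration of how a consumer drives this interface, the helper below (not code from the executor package) keeps calling Next until the executor reports that there is no more data:

// drain pulls all rows out of an executor in Volcano style, using the
// Executor and Row types shown above; a nil row from Next means the
// executor is exhausted.
func drain(e Executor) ([]*Row, error) {
    defer e.Close()
    var rows []*Row
    for {
        row, err := e.Next()
        if err != nil {
            return nil, err
        }
        if row == nil { // no more data
            return rows, nil
        }
        rows = append(rows, row)
    }
}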

Plan

Here is the core of the entire SQL layer: after a SQL statement is parsed into an AST, the query plan is built and optimized in this package, including both logical optimization and physical optimization.

The package also includes the following features:

validator.go: verifies the validity of an AST.

preprocess.go: currently contains only the name-resolve step.

resolver.go: name resolution, which resolves the identifiers (database/table/column/alias) in a SQL statement and binds them to the corresponding column or Field.

typeinferer.go: infers the result type. For a SQL statement, the type of its result can be inferred without executing it.

logical_plan_builder.go: builds the optimized logical query plan.

physical_plan_builder.go: builds the physical query plan based on the logical query plan.

Privilege

Interfaces related to permission control; the concrete implementation is in the privilege/privileges package.

Sessionctx

Stores the session's state information, such as session variables, which can be obtained from the session. It is placed in a separate package mainly to clarify dependencies and avoid circular-dependency problems.

Table

The Table interface, an abstraction of a database table, providing many operations on tables (such as getting column information or reading a row of data); the concrete implementation is in table/tables. There are also abstractions for Column and Index.

Tidb-server

The main.go of the tidb-server program, which is mainly the code that starts the server.

Server

The implementation of the MySQL protocol layer; its main work is to parse the protocol and dispatch commands/queries.

Ast

SQL statements are parsed into an abstract syntax tree (AST), and the definitions of the tree's nodes are in this package. Each node implements an Accept method, so the tree can be traversed simply by defining a visitor and calling Accept on the nodes.
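
The following self-contained sketch illustrates the traversal pattern with toy Node and Visitor types. The real interfaces in the ast package are richer (for example, the visitor is given separate hooks when entering and leaving a node), so treat this only as an illustration of the idea.

package astdemo

import "fmt"

// Node is a simplified syntax-tree node; the real ast.Node interface in TiDB
// carries more detail, such as enter/leave hooks and text positions.
type Node interface {
    Accept(v Visitor)
}

// Visitor is called once per node during traversal.
type Visitor interface {
    Visit(n Node)
}

// SelectStmt is a toy node with child nodes.
type SelectStmt struct {
    Fields []Node
}

func (s *SelectStmt) Accept(v Visitor) {
    v.Visit(s)
    for _, f := range s.Fields {
        f.Accept(v) // recurse into the children
    }
}

// ColumnName is a toy leaf node.
type ColumnName struct{ Name string }

func (c *ColumnName) Accept(v Visitor) { v.Visit(c) }

// printVisitor prints the type of every node it sees.
type printVisitor struct{}

func (printVisitor) Visit(n Node) { fmt.Printf("%T\n", n) }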

Ddl

Code related to asynchronous schema changes, implemented along the lines of the Google F1 paper; a detailed explanation of the algorithm is available separately.

Domain

A Domain can be thought of as a storage space in which databases and tables can be created; databases with the same name can exist in different domains, a bit like a namespace. Information schema data is bound to a Domain.

Expression

Expression definitions. The most important interface is:

type Expression interface {.....}

The expressions that currently implement this interface include:

    • Scalar Function: scalar function expressions

    • Aggregate Function: aggregate function expressions

    • Column: column expressions

    • Constant: constant expressions

Evaluator

Logic related to expression evaluation; all expression evaluation methods are here.

Infoschema

The implementation of InformationSchema, providing information related to db/table/column.

Kv

KV-related interface definitions and some implementations, including Retriever/Mutator/Transaction/Snapshot/Storage/Iterator, etc. This provides a unified abstraction over the various underlying KV stores.

Model

Data structures for the DDL/DML supported by TiDB, including DBInfo/TableInfo/ColumnInfo/IndexInfo, etc.

Parser

The parsing module, which mainly includes lexical analysis (lexer.go) and syntax parsing (parser.y). The main external interface is Parse(), which parses SQL text into an AST.
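
A minimal usage sketch follows. It assumes the parser package in the TiDB repository exposes a New() constructor and a Parse(sql, charset, collation) method returning a list of statement nodes, which matches how the interface looks around this version of the code; check the current source for the exact signature and import path.

package main

import (
    "fmt"

    "github.com/pingcap/tidb/parser"
)

func main() {
    // Assumption: parser.New() and (*Parser).Parse(sql, charset, collation)
    // exist with this shape; see parser/lexer.go and parser/parser.y.
    p := parser.New()
    stmts, err := p.Parse("select c from t where c > 3", "", "")
    if err != nil {
        panic(err)
    }
    for _, stmt := range stmts {
        // Each statement node is the root of one abstract syntax tree.
        fmt.Printf("parsed statement: %T\n", stmt)
    }
}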

Store

The implementations of the underlying KV storage. To plug in a new storage engine, wrap it and place the code under this package. Two kinds of engines are currently supported: a distributed engine (TiKV) and stand-alone engines (localstore/{goleveldb,boltdb}). A new engine needs to implement the interfaces defined in the kv package.

For KV and store, refer to the TIDB Storage Engine Access Guide

Terror

Defines the error system for TiDB.

Context

The Context interface; Session is an implementation of it. The interface is abstracted out here mainly to avoid circular dependencies. The session's various pieces of state information are accessed through this interface.

Inspectkv

A package for auxiliary checks between TiDB's SQL data and the KV data; it will later be used for external access to TiDB's KV layer and will be redefined and extended.

Meta

Constant definitions and common functions related to TiDB's meta information. meta/autoid defines an API used by sessions to generate globally unique auto-increment IDs.

Mysql

MySQL-related constant definitions.

Structure

A layer of encapsulation on top of transactional key-value storage to support richer KV types, including string/list/hash. It is mainly used in asynchronous schema changes.

Util

Some utility packages. One of the more important ones is the types package, which contains the definitions of the various type objects and the operations on them.

Distsql

The distributed SQL execution interface. If the underlying storage engine supports distributed executors, requests can be sent through this interface; this module is described in detail later.

Protocol Layer

The protocol layer is the interface that interacts with applications. Currently TiDB only supports the MySQL protocol, and the related code is in the server package. The primary functions of this layer are to manage client connections, parse MySQL commands, and return execution results. The layer is implemented according to the MySQL protocol; for details see the MySQL client/server protocol documentation.

The entry method for handling commands on a single connection is the dispatch method of the clientConn type, where the protocol is parsed and different handler functions are called.
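
The sketch below shows the general shape of such a dispatch routine. The command byte values come from the MySQL protocol; the connDemo type and its handler methods are made up for illustration and are not the actual clientConn methods in the server package.

package serverdemo

import "fmt"

// MySQL command bytes (values defined by the MySQL client/server protocol).
const (
    comQuit  = 0x01
    comQuery = 0x03
    comPing  = 0x0e
)

// connDemo stands in for the clientConn type in the server package; the
// handler methods below are illustrative only.
type connDemo struct{}

func (cc *connDemo) handleQuery(sql string) error { fmt.Println("query:", sql); return nil }
func (cc *connDemo) writeOK() error               { return nil }
func (cc *connDemo) close() error                 { return nil }

// dispatch switches on the first byte of a command packet and calls the
// matching handler, which is the per-command job of the protocol layer.
func (cc *connDemo) dispatch(data []byte) error {
    cmd, payload := data[0], data[1:]
    switch cmd {
    case comQuery: // COM_QUERY: a SQL statement in text form
        return cc.handleQuery(string(payload))
    case comPing: // COM_PING: liveness check, answer with OK
        return cc.writeOK()
    case comQuit: // COM_QUIT: the client is closing the connection
        return cc.close()
    default:
        return fmt.Errorf("unknown command %d", cmd)
    }
}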

SQL Layer

After the chapters above, readers should have an impression of TiDB's overall framework and the details of each package. This chapter gives a brief introduction to the core question of how TiDB's SQL layer works. It ignores the specifics of the KV layer and focuses only on the SQL layer. Hopefully, after this chapter, readers can see how a SQL statement goes from a piece of text to an execution result set. The detailed procedures of this process are described in the following chapters.

In general, a SQL statement goes through the following series of steps: parse, validate, build the query plan, optimize the query plan, construct the executor from the plan, and finally execute and return the results. The TiDB implementation follows this process.

The entry point of the whole process is in session.go of the tidb package: tidb-server calls the Session.Execute() interface, passing in the SQL statement as text, and the processing is implemented in Session.Execute().

Compile() is called first to parse the SQL statement (tidb.Parse()); parsing yields a list of statements, where each statement is an AST (abstract syntax tree) and each syntax unit is a node of the tree. The node structures are defined in the ast package.

With the AST in hand, the Compiler in the executor package is called: the AST goes in and an executor comes out (Compiler.Compile()). In this process, validity checking, query planning, and query plan optimization are completed.

Inside Compiler.Compile(), plan.Validate() is called first to validate the statement (see plan/validator.go), and then the preprocess step runs. At this stage preprocess only does name resolution, binding the column names or alias names referenced in the SQL statement to the corresponding fields. For example, for "select c from t;" the name c is bound to the corresponding column of table t (see plan/resolver.go for the implementation). After that, optimizer.Optimize() is entered.

In the Optimize() method, the result type of each node in the AST is inferred first. For example, for

select 1, 'xx', c from t;

the select fields are: the first field is 1, whose type is Longlong; the second field is 'xx', whose type is VarString; the third field is c, whose type is the type of column c in table t. Note that besides the type itself, information such as the charset also needs to be inferred; see plan/typeinferer.go for the implementation. After type inference is done, logical optimization (planBuilder.build()) is performed; its main work is to apply equivalent algebraic transformations to the AST, simplifying it. For example,

select c from t where c > 1+1*2;

Can be converted to the equivalent

select c from t where c > 3;
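
The following toy sketch shows the idea of this kind of constant folding over a small expression tree. It is illustrative only and is not code from the expression or plan packages.

package exprdemo

// expr is a toy arithmetic expression: either a constant leaf (op == 0)
// or a binary operator with two children.
type expr struct {
    op          byte  // '+', '*', or 0 for a constant leaf
    val         int64 // value when this node is a constant leaf
    left, right *expr
}

// fold replaces any subtree whose children are constants with a single
// constant node, so 1+1*2 becomes 3 before execution. For example,
// fold applied to the tree for 1+1*2 returns a constant node with val 3.
func fold(e *expr) *expr {
    if e == nil || e.op == 0 {
        return e
    }
    e.left, e.right = fold(e.left), fold(e.right)
    if e.left.op == 0 && e.right.op == 0 {
        switch e.op {
        case '+':
            return &expr{val: e.left.val + e.right.val}
        case '*':
            return &expr{val: e.left.val * e.right.val}
        }
    }
    return e
}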

After logical optimization, physical optimization is performed: the query plan tree is generated and transformed, making use of indexes, according to rules and a cost model, so as to reduce the cost of executing the query. The entry point is the doOptimize() method in plan/optimizer.go.

After the query plan is generated, it is converted into an executor, which is run through the Exec interface to obtain a RecordSet object; calling its Next() method yields the query results.

Optimizer

The optimizer is the core of a database and determines how each statement is executed. If the database is an army, the optimizer is its commander and strategist, planning in the command tent and deciding victories a thousand miles away. As the saying goes, a weak general makes a weak army: for the same statement, different query plans can lead to vastly different run times. Research on optimizers has always been an active field in academia; optimization is endless, and one can hardly spend too much effort on this part.

From the optimization method, can be broadly divided into three categories:

    • Rule-based optimizer: optimizes the plan using heuristic rules

    • Cost-based optimizer: optimizes the plan by calculating the cost of queries

    • History-based optimizer: optimizes the plan using historical query information

A rule-based optimizer is relatively easy to implement: with a few common rules, it works well for most common queries. But its defect is also obvious: it cannot choose the optimal plan according to the actual distribution of the data. For example, for the query "select * from t where c1 = 10 and c2 > 100", when choosing an index, the rules alone would pick the index on c1, but if all the values of c1 in t are 10, that query plan is a poor one. In that situation, information about the data distribution in the table helps to choose a better plan.

A cost-based optimizer is more complex. There are two core problems: one is how to obtain the true distribution information of the data, and the other is how to estimate the cost of a query plan based on that information.
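
To make the first problem concrete, the sketch below estimates the selectivity of a range predicate from a hypothetical equal-depth histogram of a column. This is only an illustration of the idea; TiDB's actual statistics module is more sophisticated.

package statsdemo

// bucket is one bucket of an equal-depth histogram: all values in the bucket
// fall in [lower, upper] and the bucket holds roughly count rows.
type bucket struct {
    lower, upper int64
    count        int64
}

// selectivityGreaterThan estimates the fraction of rows with value > v by
// summing buckets entirely above v and taking a fraction of the bucket that
// straddles v (assuming values are uniform inside a bucket).
func selectivityGreaterThan(hist []bucket, v int64) float64 {
    var total, match float64
    for _, b := range hist {
        total += float64(b.count)
        switch {
        case b.lower > v: // the whole bucket qualifies
            match += float64(b.count)
        case b.upper > v: // the bucket straddles v
            frac := float64(b.upper-v) / float64(b.upper-b.lower+1)
            match += frac * float64(b.count)
        }
    }
    if total == 0 {
        return 0
    }
    return match / total
}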

Query optimizers based on historical information are relatively rare and are generally not found in OLTP databases, so they are not covered here.

The TiDB optimizer-related code is in the plan package. The main task of this package is to convert the AST into a query plan tree, whose nodes are various logical or physical operators. The various optimizations of the query plan are carried out by calling methods on the root node of the tree, which recursively optimize all nodes while the nodes of the tree are continually transformed and pruned.

The most important interfaces are in plan.go, including the following:

    • Plan: the interface for all query plans

    • LogicalPlan: logical query plans; all logical operators need to implement this interface

    • PhysicalPlan: physical query plans; all physical operators need to implement this interface

The entry point of logical optimization is planBuilder.build(): the input is the AST and the output is the logical query plan tree. Logical query optimizations are then performed on this tree:

    • Call LogicalPlan's PredicatePushDown interface to push predicates down as far as possible (a conceptual sketch follows this list)

    • Call LogicalPlan's PruneColumns interface to prune columns that are not needed

    • Call aggPushDownSolver.aggPushDown to push aggregation operators down below the Join
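
As referenced in the first item above, here is a conceptual sketch of predicate pushdown over toy operators. These types are illustrative only; the real LogicalPlan implementations in the plan package handle joins, projections, and many subtleties that are omitted here.

package plandemo

// cond is a toy predicate such as "c > 3".
type cond string

// dataSource is a table scan; conditions pushed into it can later be sent
// to the storage layer together with the scan request.
type dataSource struct {
    table       string
    pushedConds []cond
}

// selection filters the rows produced by its child.
type selection struct {
    conds []cond
    child *dataSource
}

// pushDown moves every condition of the selection into its child data source
// and returns the new plan root: the selection disappears once it is empty.
func pushDown(s *selection) *dataSource {
    s.child.pushedConds = append(s.child.pushedConds, s.conds...)
    s.conds = nil
    return s.child
}

For a plan such as Selection(c > 3) over DataSource(t), pushing the condition into the data source lets the storage layer filter rows before they ever reach the SQL layer.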

After the logically optimized query plan tree is obtained, physical optimization is performed. The entry point is calling convert2PhysicalPlan(&requiredProperty{}) on the root node of the logical query plan tree, where requiredProperty describes the requirements on the ordering of the returned results and the number of rows required from the lower level.

Starting from the root node of the logical query plan tree, the process recurses downward, converting each node from a logical operator into a physical operator, and finds a better query path based on the query cost of each node.

Executor

Although the optimizer is the most central component, without an excellent executor a database still cannot be excellent. Continuing the army analogy, the executors are the soldiers: even a brilliant general cannot win without troops who can fight.

Compared with MySQL, the TiDB executor has two advantages. First, the whole computation framework is an MPP framework: computation is carried out on multiple TiKV and TiDB nodes to maximize efficiency and speed. Second, a single operator is parallelized as much as possible: operators such as Join and Union start multiple threads at the same time, and the whole data computation forms a pipeline, minimizing each operator's waiting time. As a result, TiDB handles large amounts of data better than MySQL.

The most important executor interface is in executor.go:

// Executor executes a query.
type Executor interface {
    Next() (*Row, error)
    Close() error
    Schema() expression.Schema
}

The physical query plan tree obtained from the optimizer is converted into an executor tree; each node in the tree implements this interface, and executors pass data between each other through the Next interface. For example, for "select c1 from t where c2 > 10;", the resulting executor tree is made up of three executors, Projection -> Filter -> TableScan: the topmost Projection keeps calling the Next interface of the level below it, all the way down to the bottom TableScan, which reads data from the table.
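
The toy executors below wire up that same Projection -> Filter -> TableScan shape in Volcano style. The types are illustrative and are not the implementations in the executor package.

package execdemo

// row is a toy row: column values by position.
type row []interface{}

// executorDemo mirrors the shape of the Executor interface shown earlier.
type executorDemo interface {
    Next() (row, error) // a nil row means no more data
}

// tableScan returns the rows of an in-memory "table" one by one.
type tableScan struct {
    rows []row
    pos  int
}

func (t *tableScan) Next() (row, error) {
    if t.pos >= len(t.rows) {
        return nil, nil
    }
    r := t.rows[t.pos]
    t.pos++
    return r, nil
}

// filter keeps only the rows for which pred returns true (e.g. c2 > 10).
type filter struct {
    child executorDemo
    pred  func(row) bool
}

func (f *filter) Next() (row, error) {
    for {
        r, err := f.child.Next()
        if err != nil || r == nil {
            return nil, err
        }
        if f.pred(r) {
            return r, nil
        }
    }
}

// projection keeps only the requested column positions (e.g. just c1).
type projection struct {
    child executorDemo
    cols  []int
}

func (p *projection) Next() (row, error) {
    r, err := p.child.Next()
    if err != nil || r == nil {
        return nil, err
    }
    out := make(row, 0, len(p.cols))
    for _, c := range p.cols {
        out = append(out, r[c])
    }
    return out, nil
}

Wiring a tableScan into a filter and then into a projection, and repeatedly calling Next on the projection, reproduces the pull-based pipeline described above.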

Distributed Executors

As a distributed database, TiDB has a built-in distributed computation framework: when a query executes, it is distributed and parallelized as much as possible. The entry point of this framework is in the distsql package, whose most important parts are the following two interfaces:

The most important external interface provided by distsql is Select(). Its first parameter is kv.Client: as long as a KV engine supports transactions, satisfies the kv interfaces, and implements the Client interface, it can be plugged into TiDB; some other vendors are working with TiDB to run TiDB on their own KV stores and support distributed SQL. The second parameter is SelectRequest. It is constructed by the upper-level executor and carries the computation logic, for example which expressions to sort or aggregate by; all of this information is put into the request, which is a Protobuf structure, passed to the Select interface, and eventually sent to the TiKV region servers that perform the computation.

The main job of the distributed executor on the TiDB side is to dispatch tasks and collect results. The Select interface returns a data structure called SelectResult, which can be regarded as an iterator: since the lower layer has many region servers, the result returned by each node is a PartialResult. SelectResult is an iterator over these partial results, and calling its Next method returns the next PartialResult.

The internal implementation of SelectResult can be thought of as a pipeline. TiDB sends requests to all region servers in parallel and returns results to the upper layer in a certain order, which is determined by the order in which the lower-level results arrive and by the KeepOrder parameter of the Select interface.
The relevant code for this section can be found in the distsql package and in store/tikv/coprocessor.go.
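
The sketch below illustrates the pipeline idea with made-up types: one goroutine per region server, with partial results handed to the consumer through a channel. Ordering (the KeepOrder behaviour mentioned above) is omitted for brevity, and none of this is the actual distsql implementation.

package distsqldemo

import "sync"

// partialResult stands in for the result one region server returns.
type partialResult struct {
    region string
    rows   [][]byte
}

// fetchRegion stands in for one coprocessor request; here it simply
// fabricates a result for the given region.
func fetchRegion(region string) partialResult {
    return partialResult{region: region}
}

// selectResult is an iterator over partial results, fed by one goroutine per
// region so that the requests run in parallel (a pipeline).
type selectResult struct {
    ch chan partialResult
}

// send issues all region requests in parallel and closes the channel when
// the last one finishes.
func send(regions []string) *selectResult {
    sr := &selectResult{ch: make(chan partialResult, len(regions))}
    var wg sync.WaitGroup
    for _, r := range regions {
        wg.Add(1)
        go func(r string) {
            defer wg.Done()
            sr.ch <- fetchRegion(r)
        }(r)
    }
    go func() {
        wg.Wait()
        close(sr.ch)
    }()
    return sr
}

// Next returns the next partial result, or ok=false when all are consumed.
func (sr *selectResult) Next() (partialResult, bool) {
    pr, ok := <-sr.ch
    return pr, ok
}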

Source address: https://github.com/pingcap/tidb
