Open source Database middleware Kingshard was born

Source: Internet
Author: User
Tags mysql connection pool


Kingshard Introduction

Kingshard is a high-performance MySQL proxy project developed by Go, Kingshard is dedicated to simplifying MySQL sub-Library table operations in satisfying basic read/write separation functions, enabling DBAs to easily and smoothly implement MySQL database expansion through Kingshard. The performance of the Kingshard is approximately 80% of the performance of the directly connected MySQL.


Main functions:

    1. Read/write separation.

    2. Cross-node sub-table.

    3. supports a transparent MySQL connection pool without having to create a new connection each time.

    4. Supports client IP access control.

    5. Smooth on-line DB or offline db with no sense of front-end application.

    6. Supports load balancing through weights between multiple slave,slave.

    7. Supports forced read of the main library.

    8. Support for sending SQL to a specific node.

    9. supports the execution of transactions on a single node and does not support the execution of transactions across multiple node.

    10. Supports functions such as Count,sum,max and min across node.

    11. A JOIN operation that supports a single table, which supports a table and another join operation without a table.

    12. Support for operations such as order By,limit.

    13. Support for MySQL's prepare feature in the mainstream language (Java,php,python,c/c++,go) SDK.

    14. The maximum number of connections to the back-end DB is supported.

    15. Supports SQL log and slow log output.


Kingshard's application scenario


Many internet companies are still using MySQL extensively to store various types of relational data. As the amount of traffic and data grows, developers have to consider some new MySQL-related issues:

    1. Read and write separation problems. Because of the increase in front-end application access, single MySQL is not sufficient to support the entire system's write and query operations. At this point, we had to spread some time-consuming query operations across multiple slave.

    2. Single-table capacity issues. If at the beginning of the system design, do not take into account the problem of the table. As the amount of data grows, the single-table capacity is increasing. The author has seen a single-table capacity of 500 million records, then a simple delete operation will cause the system slow log, and may cause the MySQL IO transient surge. Many students might think of adding an index to the fields of the query, but when the amount of data grows to such a large extent, it is not obvious that the index will be added. In the final analysis, the volume of single-table data is so large that MySQL still needs to scan a lot of records even if it is positioned by index.

    3. Operation and maintenance of the database. If you configure the main library and from the library host in your code, the system runs, of course, is no problem. However, this greatly increases the operational pressure, such as: MySQL database io pressure due to the increase in traffic, DBAs need to add a slave, this time will have to modify the code, and then packaged and online. There are many very practical examples, which are not listed here.

    4. Connection pool. Front-end applications are frequently connected to MySQL, and the additional performance costs to MySQL are not negligible. If you cache a certain number of MySQL connections per db by adding a connection pool, when an application needs to connect to the backend MySQL, it can send a SQL request directly from the connection pool, which will greatly speed up the data query. And it can reduce the performance cost of MySQL.

    5. SQL logs. In the event of a problem with the program, we want to get some SQL logs, such as which SQL is sent to which db at what point. Viewing this log can help us locate the problem quickly.


In the face of these problems, we can implement them in the client code. But this will also make the client more and more heavy, less flexible. The author has been engaged in the development of database related work, it is based on the pain point of database development, designed and implemented the Kingshard database middleware. Kingshard has a more appropriate solution for the above 5 types of problems.


Overall Architecture


Kingshard uses go development to take full advantage of the concurrency features of the go language. The go language is very well packaged in terms of concurrency, which greatly simplifies the development of Kingshard. The overall workflow of Kingshard is described below:

    1. Reads the configuration file and starts the listening port set in the configuration file to listen for client requests.

    2. When a client connection request is received, a goroutine is started to process the request separately.

    3. Login verification is preferred, the verification process is fully compatible with the MySQL authentication protocol, and because the user name and password are already set in the configuration file, this information can be used to verify that the connection request is legitimate. When the user name and password are correct, go to the following steps, or return an error message to the client.

    4. After the authentication is passed, the client sends the SQL statement.

    5. Kingshard to the SQL statements sent by the client, lexical and semantic analysis, identification of the type of SQL and the generation of SQL routing plan. If necessary, the SQL is rewritten and then forwarded to the appropriate DB. It is also possible not to do lexical and semantic analysis directly forwarded to the corresponding backend db. If the forward SQL is a sub-table and spans multiple db, each DB corresponds to starting a goroutine sending SQL and receiving the results returned by that DB.

    6. Receives and merges the results and forwards them to the client.


Kingshard the whole process of work can refer to the following picture


650) this.width=650; "Src=" http://mmbiz.qpic.cn/mmbiz/ Tzia4bcy5heklv1owjgnn8uxkktqwibcyzi96b4byqrfbxbtsf8strasnu4cl6youxudsu1nzxicw2xbfpiclfn3ag/640?wx_fmt=jpeg &wxfrom=5&wx_lazy=1 "style=" border:0px none;width:auto;height:auto; "alt=" 640?wx_fmt=jpeg&wxfrom=5 &wx_lazy=1 "/>


Kingshard The overall architecture diagram is as follows


650) this.width=650; "Src=" http://mmbiz.qpic.cn/mmbiz/ Tzia4bcy5heklv1owjgnn8uxkktqwibcyzhgklzmvibpl634aa5jdcaiaebvlejiaeelziiddo6lqgkfarbbichn0yca/640?wx_fmt=jpeg &wxfrom=5&wx_lazy=1 "style=" border:0px none;width:auto;height:auto; "alt=" 640?wx_fmt=jpeg&wxfrom=5 &wx_lazy=1 "/>


lexical and semantic analysis


To make the function of kingshard powerful enough, we have to do the lexical and semantic analysis of SQL. The lexical analysis of an SQL statement is a keyword that divides the SQL statement into one. For example, the SQL statement: select name from Stu where ID < 13 for lexical analysis, the result is: {"select", "Name", "from", "Stu", "where", "id", "<", "13"}. The purpose of this is primarily to generate an abstract syntax tree, also known as the AST (Abstract syntax tree), where semantic analysis is based on the syntax tree. The main purpose of semantic analysis is as follows:

    1. Read and write separation, only the type of SQL statement can be identified, to perform the correct read and write separation operation.

    2. Data shards, parse out table names and query criteria, and route SQL to the correct db.

    3. SQL blacklist, through lexical and semantic analysis, can also quickly identify the need to block the SQL statement. For example, if a DELETE statement is detected without a where operation, it can block the forwarding of the SQL directly.


Kingshard does not consider fully compatible with all MySQL syntax, because full compatibility with MySQL syntax makes lexical and semantic analysis modules extremely complex and inefficient. There is no need for DDL statements to be parsed, as long as they are correctly forwarded to the corresponding DB on the backend.


Kingshard only resolves partial DML statements (Select,update,insert,delete,replace), which can satisfy most of the table operations. For other statements, Kingshard sends it to a default DB, or sends it to the specified db in a kingshard-specific way, for example:/*node2*/insert into Stu (id,name) VALUES (12, ' Xiaoming '), for this annotated SQL statement, Kingshard is able to identify and then send the SQL statement to the NODE2 node's master db.


Load Balancing


A large part of the user's use of MySQL proxy is to reduce the load on a single DB and share the read pressure on multiple db. Kingshard supports multiple slave, different slave can be configured with different read weights, and the larger the weights share the more read requests. Kingshard read Request load Balancing uses the weighted polling scheduling algorithm.


Most systems use this algorithm to dynamically calculate the ordinal of the selected DB when the SQL statement is forwarded. The SQL statement of the read request is then sent to the DB. Careful analysis, it is not necessary to do so. Because the weight of the DB is relatively fixed and does not change very often, it is possible to calculate a fixed polling sequence and then save the sequence in an array. This does not require dynamic computation, and every time you read an array. For example, configure the slave option in the Kingshard node configuration item: Slave:[email protected],[email protected] Kingshard A polling array is generated when the configuration information is initialized to the system : [0,0,1,1,1]. In Kingshard, the array is scrambled to the following order: [0,1,1,0,1]. This avoids the problem of dynamic calculation of DB subscript, which is helpful for performance improvement.


sharding Implementation


The first two concepts need to be introduced:

    1. The child table, in Kingshard, a large table logically composed of several small sub-tables. For example, divide the Stu table into stu_0000,stu_0001,stu_0002,stu_0003. The Stu table in the database does not exist, it is just a logical table. Only four child tables (stu_0000,stu_0001,stu_0002,stu_0003) exist in the database. When the SQL statement is sent, Kingshard identifies the SQL statement that needs to be divided into tables and overwrites the SQL. For example, the SQL statement that the client sends over is: select name from stu where id = 10; Kingshard when the SQL statement is received, it is recognized from the configuration information that the table is a hash type of the sub-table. According to the rules of the table, the SQL is rewritten as: select name from stu_2 where id = 10; It is then sent to the corresponding db.

    2. node, the child table is distributed on each node, each node contains one maser server and several slave servers (the number of slave can be 0). The write request is sent to the master server in that node, and the read request is sent to the slave server in that node.


Kingshard's sharding uses the idea of a two-level mapping, preferring to calculate the sub-table of the SQL statement based on the table condition of the SQL statement, and then find out which node the child table falls on based on the configuration information. The idea of using two-level mapping can be easily supported for MySQL's expansion and shrinking capacity. Currently Kingshard sharding supports INSERT, delete, select, UPDATE, and replace statements, all of which support cross-child tables. But the write operation only supports cross-sub-tables on a single node, and the select operation spans the child tables across node.


For some tables there is no table, and the SQL statements that manipulate the table are sent to default node. Alternatively, the user can choose to precede the SQL statement with a comment, specify the SQL to be sent to the Node,kingshard after receiving the statement, identify the node specified in the note, and then send the SQL to the corresponding node in the appropriate DB. For example, a user sends a/*node1*/select * from member where Id=100,kingshard receives the SQL and sends it to salve on Node1. In this way, Kingshard can be well compatible with the various application scenarios of the sub-table and non-sub-table.


implementation of the transaction


All proxies support Shard after a problem: the branch does not support distributed transactions? For performance and availability reasons, Kingshard only supports transactions on a single DB and does not allow cross-db transactions. Kingshard processes the transaction flow on a single db as follows:

    1. The user sends a BEGIN statement.

    2. Kingshard the state of the connection is set to a transaction after the SQL statement is received.

    3. The user sends a DML statement, Kingshard identifies the DB to which the statement needs to be sent, and then Kingshard creates a new connection to the back-end DB and sends the statement using that connection.

    4. After the result of the SQL statement is received, the connection is put back.

    5. Kingshard received the next SQL statement to determine whether the SQL was sent to the same db, if not the error. If it is destined for the same DB, the statement is sent using the connection.

    6. Receives a commit statement sent by the user, sets the status of the connection to non-transactional, and ends the transaction.


back-end db survival detection


Kingshard Each node initiates a goroutine to detect the status of the back-end master and slave. When the goroutine for a period of time (set by the down_after_noalive parameter in the configuration file) pings the backend db, the state of the DB is set to down, and subsequent kingshard will not send the SQL statement to that DB.


client Whitelist mechanism


Sometimes users want to allow only a few servers to connect Kingshard for security reasons. There is a parameter in the Kingshard configuration file: Allow_ips, which implements the client whitelist mechanism. When the administrator sets this parameter, it means that only the IP specified by allow_ips can connect to Kingshard, and the other IP will be kingshard denied connection. If this parameter is not set, the client connecting to Kingshard is not restricted.


Management end design and implementation


The Kingshard management port is multiplexed with the work port, which is indicated by a specific keyword (admin). Kingshard is the parsing of SQL statements into a Kingshard-aware command by lexical and semantic analysis of the management-specific SQL. currently supports smooth upper and lower line master and slave, and view Kingshard configuration and backend db status. The subsequent intention is to integrate the Web page into the management side so that users can operate on the Web page without entering command-line operations. Significantly reduce the user's threshold for using Kingshard.


Open source Database middleware Kingshard was born

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.