Mapping of relational models to Key-value models

Source: https://pingcap.com/blog-tidb-internal-2-zh

Here we treat the relational model simply as tables plus SQL statements. The question then becomes: how do we store a table in the KV structure, and how do we run SQL statements on top of that KV structure? Suppose we have the following table definition:

    CREATE TABLE User (
        ID   int,
        Name varchar(20),
        Role varchar(20),
        Age  int,
        PRIMARY KEY (ID),
        KEY idxAge (Age)
    );

There is a huge difference between the SQL structure and the KV structure, so how to map between them conveniently and efficiently is an important question. A good mapping scheme must make the required data operations easy, so let's first look at what operations need to be performed on the data and what characteristics that data has.

For a table, the data that needs to be stored consists of three parts: the table's meta-information, the Rows in the table, and the index data.

We will not discuss table meta-information for now; there will be a dedicated section about it. For Rows, we can choose either a row store or a column store, each with its own advantages and disadvantages. TiDB's primary target is OLTP workloads, which need to read, save, modify, and delete a single row of data quickly, so a row store is more appropriate.

For Indexes, TiDB needs to support not only the Primary Index but also Secondary Indexes. An Index serves to speed up queries, improve query performance, and guarantee certain Constraints. There are two query modes. One is a point query, locating a row through an equality condition on the Primary Key or a Unique Key, such as select name from user where id = 1;, which needs to locate a single row of data quickly via the index. The other is a Range query, such as select name from user where age > 30 and age < 35;, which needs to use the idxAge index to find the rows whose age is between 30 and 35. Indexes are also divided into Unique Indexes and non-Unique Indexes, and both need to be supported.

After analyzing the characteristics of the data to be stored, let's look at the operations required on this data, mainly the four statement types Insert/Update/Delete/Select.

For Insert statements, you need to write the Row to KV and set up the index data.

For the UPDATE statement, you need to update the index data (if necessary) while updating the Row.

For DELETE statements, you need to delete the Row as well as the index.

The three statements above are simple to handle. For Select statements, the situation is more complicated. First, we need to be able to read a single row of data simply and quickly, so each row needs an ID (either explicit or implicit). Second, we may need to read multiple rows of data, such as Select * from user;. Finally, there is the need to read data through an index, and the index may be used either for point lookups or for range queries.

The general requirements have now been analyzed, so let's see what we have at hand: a globally ordered, distributed Key-Value engine. Global ordering helps us solve many problems. For example, to fetch a row of data quickly, as long as we can construct one or several Keys that locate this row, we can use the Seek method provided by TiKV to quickly find its position. Likewise, for a full table scan, if the table can be mapped to a Key Range, then scanning from StartKey to EndKey is enough to obtain the whole table's data. Operating on Index data follows the same idea. Next, let's see how TiDB does it.

TiDB assigns a TableID to each table, an IndexID to each index, and a RowID to each row (if the table has an integer-typed Primary Key, the value of the Primary Key is used as the RowID). TableID is unique across the entire cluster, while IndexID and RowID are unique within a table; all of these IDs are int64 values. Each row of data is encoded into a Key-Value pair according to the following rule:

    Key:   tablePrefix_rowPrefix_tableID_rowID
    Value: [col1, col2, col3, col4]

The tablePrefix/rowPrefix parts of the Key are specific string constants used to distinguish this data from other kinds of data within the KV space. Index data is encoded into Key-Value pairs according to the following rule:

    Key:   tablePrefix_idxPrefix_tableID_indexID_indexColumnsValue
    Value: rowID

Index data also needs to handle two cases, Unique Index and non-Unique Index. A Unique Index can follow the encoding rule above directly. For a non-Unique Index, however, this encoding cannot construct a unique Key: the tablePrefix_idxPrefix_tableID_indexID_ part is the same for a given index, and several rows may share the same ColumnsValue. Therefore the encoding of a non-Unique Index is adjusted slightly:

    Key:   tablePrefix_idxPrefix_tableID_indexID_ColumnsValue_rowID
    Value: null

This way a unique Key can be constructed for each row of data in the index. Note that the various xxPrefix parts in the Keys of the encoding rules above are string constants whose role is to separate namespaces, so that different types of data do not conflict. They are defined as follows:

    var (
        tablePrefix     = []byte{'t'}
        recordPrefixSep = []byte("_r")
        indexPrefixSep  = []byte("_i")
    )
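
To make these encoding rules concrete, here is a minimal sketch of how the Row Key and the two kinds of Index Keys could be assembled from these prefixes. It is illustrative only: the function names are made up for this example, and the real TiDB codec encodes IDs and column values into memcomparable byte sequences rather than decimal strings.

    package main

    import "fmt"

    var (
        tablePrefix     = []byte{'t'}
        recordPrefixSep = []byte("_r")
        indexPrefixSep  = []byte("_i")
    )

    // encodeRowKey builds tablePrefix_rowPrefix_tableID_rowID, e.g. "t_r_10_1".
    func encodeRowKey(tableID, rowID int64) []byte {
        return []byte(fmt.Sprintf("%s%s_%d_%d", tablePrefix, recordPrefixSep, tableID, rowID))
    }

    // encodeUniqueIndexKey builds tablePrefix_idxPrefix_tableID_indexID_columnsValue
    // for a Unique Index, whose Value would hold the rowID.
    func encodeUniqueIndexKey(tableID, indexID int64, columnsValue string) []byte {
        return []byte(fmt.Sprintf("%s%s_%d_%d_%s", tablePrefix, indexPrefixSep, tableID, indexID, columnsValue))
    }

    // encodeIndexKey builds tablePrefix_idxPrefix_tableID_indexID_columnsValue_rowID
    // for a non-Unique Index; the trailing rowID keeps the Key unique.
    func encodeIndexKey(tableID, indexID int64, columnsValue string, rowID int64) []byte {
        return []byte(fmt.Sprintf("%s%s_%d_%d_%s_%d", tablePrefix, indexPrefixSep, tableID, indexID, columnsValue, rowID))
    }

    func main() {
        fmt.Println(string(encodeRowKey(10, 1)))               // t_r_10_1
        fmt.Println(string(encodeUniqueIndexKey(10, 1, "30"))) // t_i_10_1_30
        fmt.Println(string(encodeIndexKey(10, 1, "30", 3)))    // t_i_10_1_30_3
    }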

Please also note that in the scheme above, whether it is the Row Key or the Index Key encoding, all Rows of one table share the same prefix, and all the data of one Index also shares the same prefix. Data with the same prefix is arranged together within TiKV's Key space. So as long as we carefully design the suffix part of the encoding, so that the comparison order after encoding is the same as the comparison order before encoding, Row and Index data can be stored in TiKV in order. This property of preserving comparison results through encoding is called memcomparable: for a value of any type, the comparison result of two objects in their original type before encoding must be consistent with the comparison result of the byte arrays after encoding (note that both Key and Value in TiKV are raw byte arrays). Refer to TiDB's codec package for the specific encoding scheme. With this encoding, all Row data of one table is arranged in TiKV's Key space in order of RowID, and the data of a particular Index is arranged sequentially by the Index's ColumnValue.
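
As a concrete illustration of the memcomparable idea (a general technique, not a copy of TiDB's codec package), the sketch below encodes a signed int64 into 8 bytes by flipping the sign bit and writing the result big-endian, so that comparing the encoded byte arrays gives the same order as comparing the original integers:

    package main

    import (
        "bytes"
        "encoding/binary"
        "fmt"
    )

    // encodeInt64 flips the sign bit and writes the value in big-endian order,
    // so that bytes.Compare on the encoded form orders values exactly like the
    // original signed integers (negative < zero < positive).
    func encodeInt64(v int64) []byte {
        var buf [8]byte
        binary.BigEndian.PutUint64(buf[:], uint64(v)^(1<<63))
        return buf[:]
    }

    func main() {
        a, b := encodeInt64(-5), encodeInt64(3)
        fmt.Println(bytes.Compare(a, b)) // -1: the byte order agrees with -5 < 3
    }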

Now let's go back to the requirements listed at the beginning and check whether TiDB's mapping scheme satisfies them. First, the mapping scheme converts both Row and Index data into Key-Value data, and every row and every index entry has a unique Key. Second, the scheme is friendly to both point queries and range queries: we can easily construct the Key for a particular row or index entry, or the Key Range for adjacent rows or adjacent index values. Finally, to guarantee certain Constraints in a table, we can construct the corresponding Key and check whether that Key exists to determine whether the Constraint is satisfied.
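
For instance, a uniqueness Constraint check reduces to a single Key lookup. The sketch below is hypothetical: kvGet stands in for a point lookup against the KV store, and a Unique Index with IndexID 2 is assumed purely for illustration (the example table above only defines the non-unique idxAge index); a real check would also need to run inside a transaction.

    package main

    import "fmt"

    // checkUnique reports whether a value can be inserted into a unique index
    // without violating the constraint. kvGet stands in for a point lookup in
    // the KV store.
    func checkUnique(kvGet func(key []byte) bool, tableID, indexID int64, columnsValue string) bool {
        key := fmt.Sprintf("t_i_%d_%d_%s", tableID, indexID, columnsValue)
        return !kvGet([]byte(key)) // the insert is allowed only if the Key is absent
    }

    func main() {
        // A toy in-memory "store" holding one existing unique index entry.
        store := map[string]bool{"t_i_10_2_TiDB": true}
        kvGet := func(key []byte) bool { return store[string(key)] }
        fmt.Println(checkUnique(kvGet, 10, 2, "TiDB")) // false: value already taken
        fmt.Println(checkUnique(kvGet, 10, 2, "PD"))   // true: insert allowed
    }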

Now that we have described how tables are mapped to KV, here is a simple example to help you understand it, still using the table structure above. Suppose the table contains 3 rows of data:

    1, "TiDB", "SQL Layer", 10
    2, "TiKV", "KV Engine", 20
    3, "PD", "Manager", 30

First, each row of data is mapped to a Key-Value pair. Since the table has an integer-typed Primary Key, the RowID is just the value of that Primary Key. Suppose the table's TableID is 10; then its Row data is:

    t_r_10_1 --> ["TiDB", "SQL Layer", 10]
    t_r_10_2 --> ["TiKV", "KV Engine", 20]
    t_r_10_3 --> ["PD", "Manager", 30]

In addition to the Primary Key, this table also has one index (idxAge). Suppose the IndexID of this index is 1; then its data is:

    t_i_10_1_10_1 --> null
    t_i_10_1_20_2 --> null
    t_i_10_1_30_3 --> null

You can work through this example with the encoding rules above; hopefully it is clear why we chose this mapping scheme and what purpose it serves.

Meta-Information Management

The previous section described how the data and indexes in a table are mapped to KV; this section describes how meta-information is stored. Every Database and Table has meta-information, that is, its definition and attributes, and this information also needs to be persisted, so we store it in TiKV as well. Each Database/Table is assigned a unique ID, which serves as its identifier. When encoded as Key-Value, this ID is encoded into the Key, with an m_ prefix, which yields a Key whose Value stores the serialized meta-information. In addition, a dedicated Key-Value pair stores the current Schema version. TiDB uses Google F1's Online Schema Change algorithm: a background thread constantly checks whether the Schema version stored in TiKV has changed, and guarantees that a version change (if there is one) is picked up within a bounded amount of time. The implementation of this part is described in the article on TiDB's asynchronous schema change implementation.
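
A rough sketch of what this could look like follows. Everything specific in it is an assumption made for illustration: the m_table_/m_schema_version key layout, the JSON serialization, and the version number are all made up; only the m_ prefix and the idea of a dedicated schema-version Key come from the description above.

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // TableInfo is a simplified stand-in for a table's meta-information.
    type TableInfo struct {
        ID      int64    `json:"id"`
        Name    string   `json:"name"`
        Columns []string `json:"columns"`
    }

    func main() {
        tbl := TableInfo{ID: 10, Name: "User", Columns: []string{"ID", "Name", "Role", "Age"}}
        value, _ := json.Marshal(tbl)

        // Hypothetical layout: the table's ID under the m_ prefix maps to the
        // serialized definition; a separate dedicated key tracks the schema version.
        fmt.Printf("m_table_10 -> %s\n", value)
        fmt.Println("m_schema_version -> 42")
    }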

SQL on KV Architecture

The overall architecture of TiDB is shown in the following figure.

[Figure: overall architecture of TiDB]

The main role of the TiKV Cluster is to serve as the KV engine that stores data, which was described in detail in the previous article and is not covered again here. This article mainly introduces the SQL layer, that is, the TiDB Servers layer. The nodes in this layer are stateless: they do not store data themselves, and they are completely equivalent to one another. The most important job of TiDB Server is to handle user requests and execute the SQL operation logic, which we introduce briefly below.

SQL Operation

After understanding the SQL-to-KV mapping scheme, we can understand how relational data is stored; next we need to understand how this data is used to satisfy the user's queries, that is, how a query statement operates on the underlying stored data. The simplest scheme imaginable is to map the SQL query to a KV query using the mapping scheme described in the previous section, fetch the corresponding data through the KV interface, and then perform the various computations. For a statement such as select count(*) from user where name = "TiDB";, we need to read all the data in the table, check whether the Name field is "TiDB", and if so return that row. This translates into the following KV operation flow (sketched in code after the list of drawbacks below):

1. Construct the Key Range: all RowIDs in a table fall within [0, MaxInt64), so using 0 and MaxInt64 with the Row Key encoding rule we can construct a left-closed, right-open interval [StartKey, EndKey).
2. Scan the Key Range: read the data in TiKV according to the Key Range constructed above.
3. Filter the data: for each row read, evaluate the expression name = "TiDB"; if it is true, return the row upwards, otherwise discard it.
4. Compute Count: for each row that satisfies the condition, accumulate it into the Count value.

This scheme certainly works, but it does not work well, for obvious reasons:

- When scanning the data, every row has to be read from TiKV through KV operations, costing at least one RPC round trip; if a lot of data needs to be scanned, this overhead becomes very large.
- Not all rows are useful; rows that do not satisfy the condition need not be read at all.
- The values of the rows that do satisfy the condition are not meaningful either; all that is really needed here is the number of such rows.
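
Below is a minimal sketch of this naive flow, under the assumptions that kvScan stands in for TiKV's range-scan interface and that rows arrive already decoded; the keys are plain strings here, whereas real keys use the memcomparable encoding described earlier (plain decimal strings would not even sort numerically).

    package main

    import (
        "fmt"
        "math"
    )

    // Row is a decoded row; in a real system the Value bytes would be decoded
    // through TiDB's codec.
    type Row map[string]string

    // naiveCount follows the four steps above: build the Key Range for the whole
    // table, scan it, filter on Name, and accumulate the Count. Every row travels
    // from the storage layer to the SQL layer, which is exactly the inefficiency
    // listed above.
    func naiveCount(kvScan func(startKey, endKey string) []Row, tableID int64, name string) int {
        startKey := fmt.Sprintf("t_r_%d_%d", tableID, 0)
        endKey := fmt.Sprintf("t_r_%d_%d", tableID, int64(math.MaxInt64))
        count := 0
        for _, row := range kvScan(startKey, endKey) { // scan [startKey, endKey)
            if row["Name"] == name {
                count++
            }
        }
        return count
    }

    func main() {
        // A fake scan over the 3-row example table from earlier.
        rows := []Row{{"Name": "TiDB"}, {"Name": "TiKV"}, {"Name": "PD"}}
        kvScan := func(startKey, endKey string) []Row { return rows }
        fmt.Println(naiveCount(kvScan, 10, "TiDB")) // 1
    }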

Distributed SQL Operation

How to avoid these flaws is also fairly obvious. First, we need to move the computation as close to the storage nodes as possible, to avoid a large number of RPC calls. Second, we need to push the Filter down to the storage nodes, so that only valid rows are returned and meaningless network transfers are avoided. Finally, we can also push aggregate functions and GroupBy down to the storage nodes for pre-aggregation: each node only needs to return a single Count value, and tidb-server then sums up these Count values.

[Figure: data being aggregated and returned layer by layer]
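
A minimal sketch of the pushed-down version, again with made-up interfaces and toy data: each storage node evaluates the filter locally and returns only a partial count, and tidb-server merges the partial results.

    package main

    import "fmt"

    // regionCount is what each storage node computes once the filter and the
    // count are pushed down: it scans only its own local rows and returns a
    // single partial count instead of the rows themselves.
    func regionCount(localNames []string, name string) int {
        n := 0
        for _, v := range localNames {
            if v == name {
                n++
            }
        }
        return n
    }

    func main() {
        // Pretend the table is split across three storage nodes (toy data).
        nodes := [][]string{
            {"TiDB", "TiKV"},
            {"PD"},
            {"TiDB"},
        }
        total := 0
        for _, rows := range nodes {
            total += regionCount(rows, "TiDB") // each node returns only a count
        }
        fmt.Println(total) // 2: tidb-server just sums the partial counts
    }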

There is a dedicated article describing in detail how TiDB makes SQL statements run faster, which you can refer to.

SQL Layer Architecture

The sections above briefly describe some functions of the SQL layer; I hope you now have a basic understanding of how SQL statements are processed. In reality, TiDB's SQL layer is much more complex, with many modules and layers. The following diagram lists the important modules and their call relationships:

[Figure: modules of the TiDB SQL layer and their call relationships]

A user's SQL request is sent to tidb-server either directly or through a Load Balancer. tidb-server parses the MySQL Protocol Packet to get the content of the request, then parses the SQL, builds and optimizes a query plan, and executes the plan to fetch and process the data. Since all of the data is stored in the TiKV cluster, tidb-server needs to interact with tikv-server during this process to get the data. Finally, tidb-server returns the query result to the user.

Summary

At this point we have seen, from the SQL perspective, how data is stored and how it is used for computation. A more detailed introduction to the SQL layer, such as how the optimizer works and the details of the distributed execution framework, will be given in future articles. In the next article we will introduce some information about PD, which is interesting: much of it is invisible while using TiDB, but it is very important to the cluster as a whole, mainly concerning cluster management and scheduling.
