Interpretation of SHARDING-JDBC and realization of sub-database sub-table middleware

Source: Internet
Author: User
Tags sort table name xmlns
Sub-Library sub-table applicable scenarios

The Library sub-table is used to address the two common scenarios of the current Internet-large data volumes and high concurrency. It is usually divided into two types: vertical split and horizontal split.

Vertical splitting is the splitting of a library (table) into multiple libraries (tables) based on the business. For example, to split frequently and infrequently accessed fields into different libraries or tables. Due to the close relationship with the business, the current sub-database sub-table products are using horizontal splitting method.

Horizontal splitting is the splitting of a library (table) into multiple libraries (tables) based on the Shard algorithm. such as: According to the last digit of the ID to 3, the mantissa is 1 into the 1th Library (table), the mantissa is 2 into the 2nd Library (table) and so on.

Relational databases have a steep drop in retrieval performance when they are larger than a certain amount of data. In the face of the Internet massive data situation, all the data are stored in a table, it is clear that the database table can easily exceed the amount of data threshold. This single-table can withstand the data volume threshold, according to the database and the concurrency of the difference, through the actual test obtained.

Although the simple table can solve the problem of slow retrieval of data, it can't solve the problem that too many concurrent requests access the same library, causing the database to respond slowly. Therefore, the horizontal split is usually used at least in the form of a library to solve the problem of large data volume and high concurrency. This is also part of the open source Shard database middleware support only the reason for the sub-Library.

But the sub-table also has an irreplaceable application scenario. The most common table requirements are transaction issues. The same library does not need to consider distributed transactions, good at using the same library table can effectively avoid the problem of distributed transactions. The current strong consistency of distributed transactions due to performance problems, resulting in the use of not necessarily more than the library sub-table faster. At present, the ultimate consistency of flexible transactions mostly. Another reason for the existence of a sub-table is that too many database instances are not conducive to operations management. In summary, the best practice is to use the Sub-Library + sub-table reasonably.

Transaction principle Link sharding-jdbc Introduction

SHARDING-JDBC is a database horizontal shard framework, which is separated from the relational database module DD-RDB when applying framework Ddframe, and realizes the transparent database sub-table access. SHARDING-JDBC is the 3rd project of the Ddframe series open source following Dubbox and Elastic-job.

SHARDING-JDBC directly encapsulates the JDBC API, which can be understood as an enhanced version of the JDBC driver, and the legacy code migration cost is almost zero: applicable to any Java-based ORM framework such as JPA, Hibernate, Mybatis, Spring JDBC template or use JDBC directly. Can be based on any third-party database connection pools, such as DBCP, C3P0, BONECP, Druid, and so on. It is theoretically possible to support any database that implements the JDBC specification. Although only MySQL is currently supported, there are plans to support databases such as Oracle, SQL Server, and so on.

The SHARDING-JDBC is positioned as a lightweight Java framework that uses a direct-attached database of clients, provides services in the form of a jar package, no proxy layer, no additional deployment, no other dependencies, and no DBA needs to change the original operation mode.

The SHARDING-JDBC shard strategy is flexible and can support multi-dimensional shards such as equals, between, in, or multiple sharding keys.

SQL parsing function is perfect, support aggregation, grouping, sorting, limit, or and other queries, and support binding table and Cartesian product table query. compare with common open source products

In order to show respect for other open source projects, we have no intention of commenting on the items that are still being updated. Here are just a few items that are currently stopping the update, but are still very influential in the database Shard area, see table 1.


Table 1 Database Shard tool Comparison

As you can see from the table above, Cobar belongs to the middle tier scheme and builds a proxy between the application and MySQL. The middle tier between the application and the database, need to do a forwarding, and based on the JDBC protocol and no additional forwarding, directly from the application to connect to the database, performance has a slight advantage. This is not to show that the middle tier is not as direct as the client, in addition to performance, there are many factors to consider, the middle layer is more convenient for monitoring, data migration, connection management and other functions.

Cobar-client, TDDL, and SHARDING-JDBC are all part of the client direct connection program. The benefits of this scenario are portability, compatibility, performance, and minimal impact on DBAs. The implementation of Cobar-client is based on the ORM (Mybatis) framework, which has less compatibility and extensibility than the latter two based on the JDBC protocol. Implementation Principle

The SHARDING-JDBC is a jar file that implements the JDBC protocol, as described in the previous article. The implementation of JDBC-based protocol is slightly different from the middle tier based on MySQL and other database protocols.

Regardless of which architecture, the core logic is very similar, in addition to the protocol implementation layer (JDBC or database protocol), will be divided into the Shard rule configuration, SQL parsing, SQL rewrite, SQL routing, SQL execution and result merging modules.

The overall architecture of the SHARDING-JDBC is shown in Figure 1.


Figure 1 overall architecture diagram of SHARDING-JDBC

Shard Rule Configuration

Sharding-jdbc's Shard logic is very flexible and supports the functions of sharding policy customization, complex Shard key, and multi-operator sharding.

such as: According to the user ID sub-Library, according to the Order ID sub-table such a sub-library sub-table combination of sharding strategy, or according to the annual library, month + User area ID sub-table such as multi-chip key sharding.

SHARDING-JDBC, in addition to supporting the equal sign operator for sharding, also supports In/between operator sharding, which provides more powerful sharding functionality.

SHARDING-JDBC provides a spring namespace for simplified configuration, and a rule engine for simplifying policy authoring. As a result of the current open source Shard Core Logic, these two modules are not open source, waiting for the core stability will be open source other modules. JDBC Specification rewrite

SHARDING-JDBC's approach to the JDBC specification is to encapsulate five core interfaces for DataSource, Connection, Statement, PreparedStatement, and ResultSet, Incorporating multiple real JDBC implementation class collections (e.g. MySQL JDBC implementation/dbcp JDBC implementation, etc.) into the SHARDING-JDBC implementation class management.

SHARDING-JDBC maximize the implementation of the JDBC protocol, including Addbatch, a batch update feature that will be used in JPA. However, after all, Shard JDBC is different from native JDBC, so there are still non-implemented interfaces, including connection cursors, stored procedures and savepoint-related, resultset forward traversal and modification, and other less commonly used functions. In addition, to ensure compatibility, JDBC 4.1 and its post-release interfaces are not implemented (e.g., JDBC 4.1 is not supported in the DBCP 1.x version). SQL parsing

SQL parsing, as the core of the sub-list products, performance and compatibility are the most important metrics. At present, the common SQL parser mainly has fdb/jsqlparser and druid. SHARDING-JDBC uses Druid as the SQL parser, and the Druid parsing speed is dozens of times times that of the other two parsers after being tested.

Currently SHARDING-JDBC supports parsing of complex SQL such as join, aggregation (including AVG), order BY, group by, limit, or even query. The SQL parsing that is not expected to occur in the Shard scene is not supported by union, partial subqueries, intra-function shards, and so on. SQL Overwrite

The SQL rewrite is divided into two parts, in part by replacing the logical table name of the table with the real table name. The other part is to replace some of the incorrect functionality in the Shard environment based on the SQL parsing results. Here is a two example:

The 1th example is AVG calculation. In the context of a shard, it is incorrect to calculate the average value in Avg1 +AVG2+AVG3/3, which needs to be rewritten as (SUM1+SUM2+SUM3)/(count1+count2+ count3). This requires that the SQL that contains the AVG be overwritten with sum and count, and then the average is recalculated when the result is merged.

A 2nd example is paging. Let's say that every 10 data is one page and the 2nd page of data is taken. It is incorrect to obtain the limit 10, 10, and then remove the first 10 data according to the sorting criteria in the Shard environment. The correct approach is to rewrite the sub-conditions to limit 0, 20, remove all the first 2 pages of data, and then combine the sorting criteria to calculate the correct data. You can see that the higher the limit paging efficiency will be lower, but also the more wasted memory. There are a number of ways to avoid paging using limit, such as building a two-level index of record row and row offsets, or using the last-paged data end ID as the next query condition. SQL Routing

SQL routing is based on the Shard rule configuration, which locates SQL to a true data source. It is mainly divided into single-table routing, binding table routing and Cartesian product routing.

Single-table routing is the simplest, but routing results do not necessarily fall into the unique libraries (tables) because they support sharding based on operators such as between and in, so the end result can still fall into multiple libraries (tables).

The binding table can be understood as a master-slave table with fully consistent sub-table rules. Example: The order form and the Order Details table are based on the order ID as the Shard key, at any time the Shard logic is the same. Such association queries and single-table queries are quite difficult and performance-like.

The Cartesian product query is the most complex, because the consistency of the Shard rules cannot be located according to the binding relationship, so the associative query of the non-binding table needs to be disassembled for the Cartesian product combination execution. Query performance is low, and the number of database connections is high, you need to use caution. SQL Execution

After routing to the real data source, SHARDING-JDBC will execute SQL in multi-threaded concurrency and complete the processing of batch methods such as Addbatch. Merge Results

Result merging consists of 4 classes: normal traversal classes, sort classes, aggregation classes, and grouping classes. Each type skips unwanted data first based on pagination results.

The common traversal class is the simplest, simply traversing the collection of resultset in sequence.

The sorting class result sorts the result first and then outputs, because each shard result is sorted according to the respective condition, so the merge sort algorithm is used to integrate the final result.

There are 3 types of aggregation classes, comparative, cumulative, and average. The comparison type includes Max and Min, returning only the maximum (small) result. The additive type includes sum and count, which need to be returned after the result is accumulated. The average value is calculated by the sum and Count of SQL overrides, which are covered in SQL rewriting and are not included in the details.

Grouping classes are the most complex, requiring all resultset results to be put into memory, grouped using the map-reduce algorithm, and related processing based on sorting and aggregation conditions. This is where most of the memory is consumed and the most loss of performance is, consider using limit to restrict the packet size reasonably.

As a result, the merge section does not use pipeline parsing at the moment, and more improvements will be made here. Performance

Performance test report of routing results in library single table:

Query operation: The TPS for SHARDING-JDBC is 99.8% of the TPS for JDBC;
Insert operation: The TPS for SHARDING-JDBC is 90.2% of the TPS for JDBC;
Update operation: The TPS for SHARDING-JDBC is 93.1% of the TPS for JDBC;
As you can see, the SHARDING-JDBC performance loss is very low.

Performance test report for routing results in multi-Library multi-table:

Query operation: TPS dual Kubidanku can increase performance by approximately 94%;
Insert operation: TPS dual Kubidanku can increase performance by approximately 60%;
Update operation: TPS dual Kubidanku can increase performance by approximately 89%;
The results show that SHARDING-JDBC can effectively utilize multi-threading and distributed resources to improve performance greatly.
For more details, see the SHARDING-JDBC Performance test report. Roadmap

At present, SHARDING-JDBC concentrates on the core logic development of the sub-database, and after the stable function, it will be continuously updated according to the following lines: Read/write separation, flexible distributed transaction, distributed primary key generation strategy, SQL rewrite optimization, and further improve performance; SQL Hint, You can specify that a SQL is executed in a specific library table, based on business rules rather than SQL parsing routes;
Small table broadcast; ha correlation; flow control; Database building table tools; Data migration; Complex SQL parsing support such as subqueries, stored procedures, etc. Oracle, SQL Server support;

Configuration Center;

Recently busy project has been a long time no blog, the first 2 articles I gave you to build the foundation Springmvc+mybatis MAVEN Project, this simple framework has been able to deal with the general small projects. But there are many complex scenarios in our actual project, such as how to ensure performance when the data is very large. Today I will introduce to you the database sub-list optimization, this paper introduces MyBatis combined with Dangdang SHARDING-JDBC sub-database sub-table technology (principle here does not introduce)

First, the required dependencies are introduced in the Pom file

<dependency>
            <groupId>com.dangdang</groupId>
            <artifactid>sharding-jdbc-core </artifactId>
            <version>1.4.2</version>
        </dependency>
        <dependency>
            <groupId>com.dangdang</groupId>
            <artifactid>sharding-jdbc-config-spring</ artifactid>
            <version>1.4.0</version>
        </dependency>

Second, create a new Sharding-jdbc.xml file, to achieve the configuration of the sub-database sub-table

<?xml version= "1.0" encoding= "UTF-8"?> <beans xmlns= "Http://www.springframework.org/schema/beans" xmlns: Xsi= "Http://www.w3.org/2001/XMLSchema-instance" xmlns:context= "Http://www.springframework.org/schema/context" XM lns:tx= "Http://www.springframework.org/schema/tx" xmlns:rdb= "Http://www.dangdang.com/schema/ddframe/rdb" xsi: schemalocation= "Http://www.springframework.org/schema/beans http://www.springframework.org/schema/ Beans/spring-beans.xsd Http://www.springframework.org/schema/tx http://w 
                        Ww.springframework.org/schema/tx/spring-tx.xsd Http://www.springframework.org/schema/context Http://www.springframework.org/schema/context/spring-context.xsd Http://www.da
                
    Ngdang.com/schema/ddframe/rdb http://www.dangdang.com/schema/ddframe/rdb/rdb.xsd "> <rdb:strategy id= "TablEshardingstrategy "sharding-columns=" user_id "algorithm-class=" Com.meiren.member.common.sharding.MemberSingleKeyTableShardingAlgorithm "/> <rdb:data-source id=" Shardingda
                Tasource "> <rdb:sharding-rule data-sources=" DataSource "> <rdb:table-rules>  <rdb:table-rule logic-table= "Member_index" actual-tables= "member_index_tbl_${[0,1,2,3,4,5,6,7,8,9]}${0..9}" table-strategy= "Tableshardingstrategy"/> <rdb:table-rule logic-table= "Member_details" actual-tables= "Member_details_tbl_${[0,1,2,3,4,5,6,7,8,9]}${0..9}" table-strategy= "Tableshardingstrategy"/> </rdb:tab Le-rules> </rdb:sharding-rule> </rdb:data-source> <

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.