Using Sharding-JDBC to Implement Table Sharding

Source: Internet
Author: User

Your team used the Spring MVC + Spring + JPA stack to quickly build an impressive system, and after launch customer orders poured in like snowflakes.

Slowly, your mood starts to sour as complaints from customers and the product team become more frequent, the biggest one being that the system keeps getting slower.

1 General Optimizations

You organize your team and perform a series of optimizations.

1.1 Data Table Index optimization

After a preliminary analysis, the bottleneck is found in the database: the CPU of the Web server is fairly idle, while the CPU usage of the database server stays high.

So the DBA on the architecture team is brought in to monitor database access, pick out the time-consuming SQL statements, and run query analysis on them. Based on the results, the table indexes are reorganized, and the database's own parameter settings are tuned as well.

After the optimization, page speed improves noticeably, customer complaints drop, and a period of comfortable days follows.

1.2 Multi-node Deployment + Load Balancing

Slowly, access speed degrades again. This time the Web server is under heavy pressure while the database server is relatively idle. Analysis shows that there are too many concurrent users; a single Web server cannot support so many concurrent requests.

As a result, the architecture moves to a multi-node Web deployment, with Nginx in front for load distribution. One problem that must be addressed at this point is keeping user sessions working across nodes. There are several solutions:

1. Nginx sticky distribution. Because Nginx has no sticky mechanism by default, ip_hash can be used instead.

2. Configuring Tomcat for session replication.

3. Using Spring Session in code, with Redis for session sharing.

The specific steps are not covered here. For the Spring Session approach, refer to my article "Cluster environment CAS problems and solutions".
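The ip_hash idea behind option 1 can be illustrated with a small sketch (illustrative only, not nginx's actual implementation; the backend names are made up): hashing the client IP always selects the same backend, so a user's session stays on one Tomcat without any replication.

```java
import java.util.List;

// Sketch of ip_hash-style stickiness: the same client IP always hashes
// to the same backend, so its session lives on a single Tomcat node.
class IpHashBalancer {
    private final List<String> backends;

    IpHashBalancer(List<String> backends) {
        this.backends = backends;
    }

    String pick(String clientIp) {
        // floorMod keeps the index non-negative even for negative hashes
        int idx = Math.floorMod(clientIp.hashCode(), backends.size());
        return backends.get(idx);
    }
}
```

The trade-off is the same as with nginx's ip_hash: stickiness is lost if a backend goes down, which is why session replication or Spring Session is the more robust choice.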

2 Trying the Sharding-JDBC Framework

After the multi-node deployment, the system ran well for a while; adding more Web nodes basically kept up with customer demand. Slowly, though, adding Web servers stops helping, because the bottleneck has moved back to the database server. SQL execution time keeps growing and can no longer be optimized away. The reason is simple: the amount of data is too large.

Single tables have grown past tens of millions of rows, and database optimization alone no longer meets the speed requirements. Table sharding, long sitting on the schedule, has to be tackled.

Because JPA is used, sharding databases and tables would normally require large changes to the data access layer; the workload and the risk of modification are both too high. Then I came across Dangdang's open-source Sharding-JDBC component. An excerpt from its introduction:

https://github.com/dangdangdotcom/sharding-jdbc

Sharding-JDBC directly encapsulates the JDBC API and can be understood as an enhanced JDBC driver; the cost of migrating old code is almost zero:

    • Applicable to any Java-based ORM framework, such as JPA, Hibernate, MyBatis, Spring JDBC Template, or direct use of JDBC.

    • Works with any third-party database connection pool, such as DBCP, C3P0, BoneCP, Druid, and so on.

    • In theory supports any database that implements the JDBC specification. Currently only MySQL is supported; there are plans to support Oracle, SQL Server, DB2, and so on.

It supports JPA, so table sharding can be implemented with little or no code modification. This framework was therefore chosen for a first attempt at sharding.

As a first, simple trial, only table sharding is done, without database sharding. The table selected is operate_history, which records all operation history and is the largest data table in the whole system.

The plan is to split this table into four data tables: operate_history_0, operate_history_1, operate_history_2, and operate_history_3. Data is then distributed across the four tables, reducing the amount of data in any single table. At the same time, to minimize cross-table queries, the field entity_key was chosen as the sharding basis, so that all operations on the same entity object are recorded in the same table. Each of the split tables keeps the same structure as the original operate_history table.
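The intended routing can be sketched as follows (the class name is mine; the table names and the modulo rule are from the article): the remainder of entity_key divided by the table count picks the table suffix, so every row for the same entity lands in the same table.

```java
// Sketch of the sharding design: entity_key modulo the table count
// selects one of the four actual tables behind the logic table.
class EntityKeyRouter {
    static final int TABLE_COUNT = 4;

    static String tableFor(long entityKey) {
        return "operate_history_" + (entityKey % TABLE_COUNT);
    }
}
```

For example, all history rows with entity_key 10 go to operate_history_2, and a query filtered on that entity_key only needs to touch one actual table.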

3 Implementation Process

The following is the modification process for JPA projects. For other projects, please refer to the official website documentation.

3.1 Modify pom.xml to Add Dependencies

Two JARs need to be added: sharding-jdbc-core and sharding-jdbc-config-spring.

<dependency>
    <groupId>com.dangdang</groupId>
    <artifactId>sharding-jdbc-core</artifactId>
    <version>1.3.0</version>
</dependency>
<dependency>
    <groupId>com.dangdang</groupId>
    <artifactId>sharding-jdbc-config-spring</artifactId>
    <version>1.3.0</version>
</dependency>

3.2 Modifying the configuration of the database section in spring

Original Database Configuration

<bean id="dataSource" class="org.apache.tomcat.jdbc.pool.DataSource" destroy-method="close">
    <property name="driverClassName" value="com.mysql.jdbc.Driver"></property>
    <property name="url" value="jdbc:mysql://localhost:3306/sharding"></property>
    <property name="username" value="root"></property>
    <property name="password" value="sharding"></property>
</bean>

The modified configuration

<bean id="db-node-0" class="org.apache.tomcat.jdbc.pool.DataSource" destroy-method="close">
    <property name="driverClassName" value="com.mysql.jdbc.Driver"></property>
    <property name="url" value="jdbc:mysql://localhost:3306/sharding"></property>
    <property name="username" value="root"></property>
    <property name="password" value="sharding"></property>
</bean>

<rdb:strategy id="historyTableStrategy"
    sharding-columns="entity_key"
    algorithm-class="cn.codestory.sharding.SingleKeyTableShardingAlgorithm"/>

<rdb:data-source id="dataSource">
    <rdb:sharding-rule data-sources="db-node-0" default-data-source="db-node-0">
        <rdb:table-rules>
            <rdb:table-rule logic-table="operate_history"
                actual-tables="operate_history_0,operate_history_1,operate_history_2,operate_history_3"
                table-strategy="historyTableStrategy"/>
        </rdb:table-rules>
    </rdb:sharding-rule>
</rdb:data-source>

3.3 Writing the Class SingleKeyTableShardingAlgorithm

This class determines which table name to use based on the entity_key value. It was adapted from the sample code provided by Sharding-JDBC. The core code is as follows:

public Collection<String> doInSharding(
        Collection<String> availableTargetNames,
        ShardingValue<Long> shardingValue) {
    int targetCount = availableTargetNames.size();
    Collection<String> result = new LinkedHashSet<>(targetCount);
    Collection<Long> values = shardingValue.getValues();
    for (Long value : values) {
        for (String tableName : availableTargetNames) {
            if (tableName.endsWith(value % targetCount + "")) {
                result.add(tableName);
            }
        }
    }
    return result;
}

This is a simple implementation: entity_key is taken modulo the number of tables, and the remainder determines the data table name.

3.4 Modifying the primary key generation method

Because rows are now saved across multiple data tables, primary keys can no longer be generated with the database's identity (auto-increment) mechanism. If the primary key is a string, a UUID could be used, but its query efficiency is relatively low.

If a long primary key is used, some other mechanism is needed to ensure that primary keys never collide across the child tables.
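One common way to do this (a sketch of my own, not part of the article or of Sharding-JDBC) is a snowflake-style generator: a long built from a millisecond timestamp, a per-node id, and a per-millisecond sequence, so ids are unique across all child tables and Web nodes without touching the database.

```java
// Minimal snowflake-style id generator (illustrative sketch):
// [timestamp | nodeId(10 bits) | sequence(12 bits)]
class IdGenerator {
    private final long nodeId;     // unique per web node, 0..1023
    private long lastMillis = -1L;
    private long sequence = 0L;    // resets each millisecond, 0..4095

    IdGenerator(long nodeId) {
        this.nodeId = nodeId & 0x3FF;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastMillis) {
            sequence = (sequence + 1) & 0xFFF;
            if (sequence == 0) {
                // sequence exhausted for this millisecond; wait for the next one
                while (now <= lastMillis) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastMillis = now;
        return (now << 22) | (nodeId << 12) | sequence;
    }
}
```

Production implementations usually subtract a fixed epoch from the timestamp to extend the usable range, but the layout above is enough to show the idea.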

3.5 Processing of historical data

According to the sharding rule, the data already in the original table needs to be migrated into the four data tables. If this is skipped, or rows are migrated to the wrong table, that data will not be found by later queries.
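The table and column names below come from the article, but the backfill statements themselves are an assumption about how one might script the migration; the key point is that each statement must use exactly the same modulo rule as the sharding algorithm.

```java
// Hedged sketch: building one backfill statement per target table,
// reusing the entity_key % tableCount rule from the sharding algorithm.
class MigrationPlan {
    static String backfillSql(int tableIndex, int tableCount) {
        return "INSERT INTO operate_history_" + tableIndex
             + " SELECT * FROM operate_history WHERE entity_key % "
             + tableCount + " = " + tableIndex;
    }
}
```

Running the four generated statements (for indices 0 through 3) moves every historical row into the table the router will later look in.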

At this point, the modification of the project is basically complete. Restart the project and add operate_history data; each newly added row is inserted into one of the data tables according to the sharding rule. Queries can transparently read from multiple actual data tables at the same time.

4 Some Considerations on Table-Sharding Rules

The previous example shards the table by entity_key; other fields, such as the primary key, can also serve as the basis. Here are some sharding rules worth considering:

    • Assign based on primary key

This gives the most even distribution: each newly generated row is saved to the next data table in turn.

    • Assign based on user ID

This approach ensures that all data for the same user is kept in the same data table. If data is frequently queried by user ID, this is an efficient choice.

    • Assign based on the value of a foreign key

This is the approach used in the example above, chosen because the data is frequently queried by this foreign key.

    • Assign based on time

This suits data that is frequently queried by time period: all data in one period is saved in the same data table. For example, an order system that by default queries data within the last month.
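For the time-based rule, routing might look like the following sketch (the monthly suffix format and class name are my assumptions): a "last month" query then only ever touches one or two actual tables.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Sketch of time-based routing: rows go to a monthly table such as
// operate_history_2016_05, keeping period queries on few tables.
class MonthlyTableRouter {
    private static final DateTimeFormatter SUFFIX =
            DateTimeFormatter.ofPattern("yyyy_MM");

    static String tableFor(String logicTable, LocalDate date) {
        return logicTable + "_" + date.format(SUFFIX);
    }
}
```

Unlike the modulo rules, this scheme never needs existing data reshuffled when volume grows; new months simply get new tables, though hot recent months carry most of the load.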
