MySQL Horizontal Split (Reading Notes)

Last Update:2015-06-05 Source: Internet

Author: User

Tags database join

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

MySQL Horizontal Split (Reading Notes)
1. Introduction to horizontal split

In general, a simple horizontal split mainly refers to distributing a table with extremely ordinary access to multiple tables according to certain rules of a field. Each table contains part of the data.

To put it simply, we can split the data horizontally to split some rows in the table into a database, some other rows are split into other databases. Of course, in order to be able to easily determine which database each row of data is split into, segmentation always requires certain rules.
For example, the modulo operation is based on a specific number of numeric fields, the range of a certain time type field, or the hash value of a certain character type field. If most of the core tables in the system can be associated using a certain field, this field is naturally the top choice for horizontal partitioning. Of course, if it is very special and cannot be used, you can only select another one.

2. Advantages and Disadvantages of Horizontal Split

Advantages of horizontal split:
◆ Table association can be completed at the database end;
◆ There will be no bottlenecks in tables with large data volumes and high loads;
◆ The overall architecture of the application is relatively small;
◆ Transaction processing is relatively simple;
◆ As long as the splitting rules can be defined, it is basically difficult to meet scalability restrictions;

Disadvantages of horizontal Splitting:
◆ Splitting rules are more complex and it is difficult to abstract a segmentation rule that can satisfy the entire database;
◆ Data maintenance becomes more difficult in the future, making it more difficult to manually locate data;
◆ The modules of the application system have a high degree of coupling, which may cause some difficulties for subsequent data migration and splitting.

3. sharding Rule 3.1 Horizontal sharding Based on Modulus

Generally, Internet companies, especially e-commerce and game businesses, which are currently very popular on the Internet, are able to relate most of their data through member user information, many core tables may be suitable for horizontal data sharding by member IDs. For example, the Forum community discussion system makes it easier to split data horizontally by Forum number.
After splitting, there is basically no interaction between databases.

Therefore, for our example database, most tables can be horizontally split based on the user ID.
Data related to different users is split and stored in different databases. For example, all user IDs are stored in two different databases through 5 modulo. Each table associated with the user ID can be split in this way. In this way, the data related to each user is basically in the same database. Even if it needs to be associated, it can be very simple
Connected.

3.2 horizontal segmentation by region

For example, China is divided into 10 major zones, Jiangsu, Zhejiang, and Shanghai are regarded as one brother, and qilu is regarded as one. The two lakes are regarded as one, the central plains are counted as one, the southwest is counted as one, the Inner Mongolia is, and the Northeast is counted as one, one in the northwest, one in the north, and one in the southeast.

Servers in North China, southeast China, Jiangsu, Zhejiang, and Shanghai regions with relatively large business volumes can allocate a large amount of server resources, such as cpu, io, and network resources, with relatively high-end configurations.

Servers in northwest China, Qilu, lake, and Northeast China with normal business volume can allocate medium and high-end server resources.
If the business volume is relatively small, servers in southwest China, Inner Mongolia, and central plains can be slightly more general servers.

Of course, the division of these resources cannot be made explicit to the outside, so we can consider it well when doing internal planning, so that we will not be criticized for being biased and not paying attention to it.

PS: This division is not qualitative. You can upgrade the resources of good business areas at any time according to the business.

As shown in:

4. integrated view of horizontal split and Application

5. Problems After horizontal split

Before implementing the data splitting scheme, we still need to analyze some possible problems. Generally,
We may encounter the following problems:
◆ Introduce distributed transactions;
◆ Cross-node Join;
◆ Cross-node merge and sorting paging;

5.1 Introduction of distributed transactions

Once the data is split and stored in multiple MySQL servers, no matter how perfect our splitting rules are (in fact there is no perfect splitting rule ), it is possible that the data involved in some previous transactions is no longer in the same MySQL Server.
In such a scenario, if our applications still follow the old solution, distributed transactions are required. In MySQL versions, only versions after MySQL 5.0 support distributed transactions. Currently, only Innodb supports distributed transactions. In addition, even if we use the MySQL version that supports distributed transactions and the Innodb Storage engine, distributed transactions consume a lot of system resources, the performance itself is not too high. In addition, the introduction of distributed transactions will bring more difficult factors for exception handling.
What should I do? In fact, we can solve this problem through a work und. The first thing to consider is: is the database the only one that can solve the transaction? In fact, this is not the case. We can work together with databases and applications. Each database solves the transactions on its own, and then controls the transactions on multiple databases through applications.
That is to say, as long as we are willing, we can split a distributed transaction across multiple databases into multiple small transactions that are only on a single database, and control each small transaction through the application. Of course, the requirement for this is that our Russian applications must be robust enough, and of course it will bring some technical difficulties to the applications.

5.2 cross-node Join

The preceding section describes the possible introduction of distributed transactions. Now let's look at the cross-node Join problem. After data splitting, some old Join statements may fail to be used, because the data source used for Join may be split into multiple MySQL servers.
What should I do? From the MySQL database perspective, if you have to solve this problem directly on the Database End, I am afraid it can only be solved through a special storage engine Federated of MySQL. The Federated storage engine is a solution for MySQL to solve problems similar to Oracle's DB Link. The main difference with OracleDB Link is that Federated stores a definition of the remote table structure locally. At first glance, Federated is indeed a very good solution for cross-node Join. However, we should also be clear that if the remote table structure is changed, the local table definition information will not change accordingly. If the local Federated table definition information is not updated when the remote table structure is updated, the Query operation may fail and the correct results may not be obtained.
To solve such problems, we recommend that you use an application to obtain the corresponding driver result set in the MySQL Server where the driver table is located, then, the data is retrieved from the MySQL Server where the driver table is located based on the driver result set. Many readers may think that this will have a certain impact on the performance. Yes, it does have a certain negative impact on the performance, but in addition to this method, basically, there are not many other better solutions. In addition, since the database has been well expanded, the load of each MySQL Server can be well controlled, simply for a single Query, the response time may be higher than before splitting, so the negative impact of performance is not too great. What's more, there are not many requirements for cross-node Join, which may be a small part of the overall performance. Therefore, for the sake of overall performance, it is worthwhile to sacrifice a little bit occasionally. After all, system optimization itself is a process of many trade-offs and balances.

5.3 cross-node Merge Sorting Paging

After the data is horizontally split, not only cross-node Join operations may fail, but some data sources of Query statements sorted by pages may also be split to multiple nodes, the direct consequence is that the sorting paging Query cannot continue to run normally. In fact, this is the same as cross-node Join. The data source exists on multiple nodes. To solve this problem through a Query, it is the same operation as cross-node Join. Federated can also be partially resolved, and of course there are also risks.
What should I do with the same problem? I also continue to suggest using applications.
How can this problem be solved? The solution is basically similar to the solution of cross-node Join, but one thing is different from that of Cross-node Join. Join often has a drive-driven relationship, therefore, Data Reading between multiple tables involved in the Join operation usually has an ordered relationship. However, sorting pages are not the same. The data source of sorting pages is basically a table (or a result set) and there is no sequential relationship, therefore, the process of retrieving data from multiple data sources can be completely parallel. In this way, the efficiency of retrieving the paging data is higher than that of Cross-database Join, resulting in a lower performance loss, in some cases, it may be more efficient than in databases where data is not split. Of course, whether it is cross-node Join or cross-node sorting paging, our application server will consume more resources, especially memory resources, because we need to process more data than the original process in the Process of reading access and merging result sets.

After analysis, many of my friends may find that all of the above problems are basically solved through applications. Everyone may be confused, isn't it because I am a DBA, so I threw a lot of things to the application architects and developers?

In fact, this is not the case at all. First, because of its particularity, the application can easily achieve good scalability, but the database is different. It must be expanded in many other ways, in addition, it is difficult to avoid some problems that can be solved in a centralized database but are split into a database cluster. To maximize the overall expansion of the system, we can only let the application do more things to solve the problem that the database cluster cannot better solve.

5.4 summary:

Data splitting technology is used to split a large MySQL Server into multiple small MySQL servers, which not only solves the write performance bottleneck problem, it also improves the scalability of the entire database cluster. Both vertical and horizontal splitting make the system less likely to encounter bottlenecks. Especially when we use the vertical and horizontal splitting methods, we will not encounter expansion bottlenecks theoretically.

6. Case study 6.1 create three database instances

Create multiple instances:

6.2 create databases, tables, and users

Create a database table

create database `hwdb` /*!40100 default characterset utf8 */;create table uc_user(user_id bigint primarykey, uc_name varchar(200), created_time datetime) engine=innodb charset utf8;

Create user

grant insert,update,delete,select on hwdb.*to tim@'192.168.%' identified by 'timgood2013';

Execution Process:

mysql> create table uc_user(user_idbigint primary key, uc_name varchar(200), created_time datetime) engine=innodbcharset utf8;Query OK, 0 rows affected (0.53 sec)mysql>mysql> grant insert,update,delete,selecton hwdb.* to tim@'192.168.%' identified by 'timgood2013';Query OK, 0 rows affected (0.03 sec)mysql>

6.3 create a java code example

Package mysql; import java. math. bigInteger; import java. SQL. connection; import java. SQL. driverManager; import java. SQL. preparedStatement; import java. SQL. resultSet; import java. SQL. SQLException; import java. SQL. statement; import java. util. calendar; public classMySQLTest {public static void main (String [] args) {MySQLTestmt = newMySQLTest (); // BigIntegerbi = newBigInteger ("2015053010401005"); Stringport = mt. g EtDBPort (bi. longValue (); Connection conn = mt. getConn (port); mt. insert (conn, bi, "tim --" + bi. longValue ();} // obtain the database port to be accessed public String getDBPort (long user_id) {Stringport = "3307"; long v_cast = user_id % 3; if (v_cast = 1) {port = "3308";} else if (v_cast = 2) {port = "3309" ;}else {port = "3307 ";} return port;} // obtain the database connection. If it is extended, You can independently provide an interface to the programmer to call it public ConnectiongetConn (String port) {Connectionconn = Null; try {Class. forName ("com. mysql. jdbc. driver ");} catch (ClassNotFoundException e) {// TODO Auto-generated catch block e. printStackTrace ();} Stringurl = "jdbc: mysql: // 192.168.52.130:" + port + "/hwdb"; try {conn = DriverManager. getConnection (url, "tim", "timgood2013");} catch (SQLException e) {// TODO Auto-generated catch block e. printStackTrace ();} System. out. println ("the current db is:" + url ); Return conn;} // obtain the date string public StringgetTimeByCalendar () {Calendar ar cal = Calendar ar. getInstance (); int year = cal. get (Calendar. YEAR); // obtain the YEAR int month = cal. get (Calendar. MONTH); // obtain the MONTH int day = cal. get (Calendar. DATE); // get the day int hour = cal. get (Calendar. HOUR); // HOUR int minute = cal. get (Calendar. MINUTE); // MINUTE int second = cal. get (Calendar. SECOND); // SECOND String strdate = year + "-" + month + "-" + day + "" + hour + ":" + minute + ":" + sec Ond; return strdate;} // start to input public int insert (Connectioncnn, BigInteger user_id, String name) {Stringsql = "insert into hwdb. uc_user (user_id, uc_name, created_time) values (?,?,?) "; Int I = 0; long uid = user_id.longValue (); Connectionconn = cnn; try {PreparedStatement preStmt = conn. prepareStatement (SQL); preStmt. setLong (1, uid); preStmt. setString (2, name); preStmt. setString (3, getTimeByCalendar (); i1_prestmt.exe cuteUpdate ();} catch (SQLException e) {e. printStackTrace () ;}return I; // return the number of affected rows. 1 indicates execution successful }}

6.4 test code

User_id is based on the Registration year, month, day, hour, minute, second, and more than 9999. The idea is that one second can satisfy 9999 concurrent requests, and no. You can set a static global variable for how to uniformly plan the 9999 global records, in addition, this global variable will be saved to a database in time, which basically ensures that there are no duplicates, such as user_id: 2015053010401005, 2015053010401006, 2015053010401007. The test code is as follows:

        MySQLTestmt=newMySQLTest();        //        BigIntegerbi = newBigInteger("2015053010401005");        Stringport=mt.getDBPort(bi.longValue());         Connection conn=mt.getConn(port);        mt.insert(conn,bi, "tim--"+bi.longValue());

1) Get the DB connection through % 3;

More than 0 --> db1 (Port 3307) more than 1 --> db2 (Port 3308) more than 2 --> db3 (Port 3309)

2) view results

After running, you can view the entered data in three instances as follows:
The Id 2015053010401005 should be input to db1 (Port 3307)

[root@data02 ~]# mysql--socket=/usr/local/mysql3307/mysql.sock -e "select * fromhwdb.uc_user;";+------------------+-----------------------+---------------------+| user_id          | uc_name               | created_time        |+------------------+-----------------------+---------------------+| 2015053010401005 | tim--2015053010401005 |2015-04-30 09:27:48 |+------------------+-----------------------+---------------------+[root@data02 ~]#

The Id 2015053010401006 should be input to db2 (Port 3308)

[root@data02 ~]# mysql--socket=/usr/local/mysql3308/mysql.sock -e "select * fromhwdb.uc_user;";+------------------+-----------------------+---------------------+| user_id          | uc_name               | created_time        |+------------------+-----------------------+---------------------+| 2015053010401006 | tim--2015053010401006 |2015-04-30 09:27:57 |+------------------+-----------------------+---------------------+[root@data02 ~]#

Enter the Id 2015053010401007 to db3 (Port 3309)

[root@data02 ~]# mysql--socket=/usr/local/mysql3309/mysql.sock -e "select * fromhwdb.uc_user;";+------------------+-----------------------+---------------------+| user_id          | uc_name               | created_time        |+------------------+-----------------------+---------------------+| 2015053010401007 | tim--2015053010401007| 2015-04-30 09:28:01 |+------------------+-----------------------+---------------------+[root@data02 ~]#

6.5 conclusion

Basically the same as expected, the data is split horizontally into different databases by using the method of MOD id to obtain the remainder. Here is a simple demonstration, the actual production is much more complex than this, so there will be more problems, but the concept of horizontal split is similar. This road is bright, you can rest assured to proceed.

Reference document: MySQL performance tuning and Architecture Design PDF

This article permanently updates the link address:

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More