MySQL Horizontal Split (Reading Notes) and mysql Reading Notes

Source: Internet
Author: User
Tags database join table definition

MySQL Horizontal Split (Reading Notes) and mysql Reading Notes
1. Introduction to horizontal split

In general, a simple horizontal split mainly refers to distributing a table with extremely ordinary access to multiple tables according to certain rules of a field. Each table contains part of the data.

To put it simply, we can understand the horizontal split of data as sharding by data rows, that is, some
The rows are split into one database, and some other rows are split into other databases. Of course, to make it easier
To determine the database to which each row of data is split, the split always needs to follow a specific rule.
For example, the modulo operation is based on a specific number of numeric fields, the range of a certain time type field, or a character class.
Type field hash value. If most of the core tables in the system can be associated using a certain field
A field is naturally the top choice for a horizontal partition. Of course, you can only select another field if it is very special and cannot be used.

2. Advantages and Disadvantages of Horizontal Split

Advantages of horizontal split:
◆ Table association can be completed at the database end;
◆ There will be no bottlenecks in tables with large data volumes and high loads;
◆ The overall architecture of the application is relatively small;
◆ Transaction processing is relatively simple;
◆ As long as the splitting rules can be defined, it is basically difficult to meet scalability restrictions;

Disadvantages of horizontal Splitting:
◆ Splitting rules are more complex and it is difficult to abstract a segmentation rule that can satisfy the entire database;
◆ Data maintenance becomes more difficult in the future, making it more difficult to manually locate data;
◆ The modules of the application system have a high degree of coupling, which may cause some difficulties for subsequent data migration and splitting.

<All Rights Reserved. This document can be reprinted, but the source address must be indicated by link. Otherwise, we will be held legally responsible.>
Original blog address: http://blog.csdn.net/mchdba/article/details/46278687
Original Author: Huang Shan (mchdba)


3. sharding Rule 3.1 Horizontal sharding Based on Modulus

Generally, Internet companies, especially e-commerce and game businesses, which are currently very popular on the Internet, are able to access most of the data.
In association with member user information, many core tables may be very suitable for horizontal data sharding using Member IDs.
For example, the Forum community discussion system makes it easier to split data horizontally by Forum number.
After splitting, there is basically no interaction between databases.

Therefore, for our example database, most tables can be horizontally split based on the user ID.
Data related to different users is split and stored in different databases. For example, all user IDs are modeled through 5
Then they are stored in two different databases. Each table associated with the user ID can be split in this way. In this way,
Basically, the data related to each user is stored in the same database. Even if you need to associate the data, it can be very simple.
Connected.

3.2 horizontal segmentation by region

For example, China is divided into 10 major zones, Jiangsu, Zhejiang, and Shanghai are regarded as one brother, and qilu is regarded as one. The two lakes are regarded as one, the central plains are counted as one, the southwest is counted as one, the Inner Mongolia is, and the Northeast is counted as one, one in the northwest, one in the north, and one in the southeast.

Servers in North China, southeast China, Jiangsu, Zhejiang, and Shanghai regions with relatively large business volumes can allocate a large amount of server resources, such as cpu, io, and network resources, with relatively high-end configurations.

Servers in northwest China, Qilu, lake, and Northeast China with normal business volume can allocate medium and high-end server resources.
If the business volume is relatively small, servers in southwest China, Inner Mongolia, and central plains can be slightly more general servers.

Of course, the division of these resources cannot be made explicit to the outside, so we can consider it well when doing internal planning, so that we will not be criticized for being biased and not paying attention to it.

PS: This division is not qualitative. You can upgrade the resources of good business areas at any time according to the business.

As shown in:


4. integrated view of horizontal split and Application

5. Problems After horizontal split

Before implementing the data splitting scheme, we still need to analyze some possible problems. Generally,
We may encounter the following problems:
◆ Introduce distributed transactions;
◆ Cross-node Join;
◆ Cross-node merge and sorting paging;

5.1 Introduction of distributed transactions

Once the data is split and stored in multiple MySQL servers
(In fact, there is no perfect splitting rule), which may be involved in some previous transactions.
The data is no longer in the same MySQL Server.
In such a scenario, if our applications still follow the old solution, distribution is required.
To solve the problem. In MySQL versions
Distributed transactions are supported, and currently only Innodb supports distributed transactions. Not only that, even if we just
The MySQL version that supports distributed transactions is used, and the Innodb Storage engine and distributed transactions are also used.
The consumption of system resources is very large, and the performance itself is not too high. In addition, distributed transactions are introduced in
Exception Handling brings about more difficult factors to control.
What should I do? In fact, we can solve this problem through a work und. The first thing we need to consider is
The thing is: is the database the only place that can solve the transaction? This is not the case. We can
It can be solved together with databases and applications. Each database solves its own transactions, and then
Use a program to control transactions in multiple databases.
That is to say, we can split a distributed transaction across multiple databases into multiple
Small transactions on a single database, and the application to control each small transaction. Of course, this requirement is
Our Russian applications must be robust enough, which of course brings some technical difficulties to the applications.

5.2 cross-node Join

The preceding section describes the possible introduction of distributed transactions. Now let's look at the cross-node Join problem. Quantity
After data splitting, some old Join statements may not be available, because the data source used for Join may
Split into multiple MySQL servers.
What should I do? From the MySQL database perspective, if you have to solve this problem directly on the Database End,
I'm afraid it can only be solved through a special storage engine Federated of MySQL. Federated storage engine is
MySQL provides a solution similar to Oracle's DB Link. And OracleDB Link
The difference is that Federated stores the definition information of a remote table structure locally. At first glance, Federated does
Is a very good solution for cross-node Join. But we should also be clear that if the remote table
When the structure changes, the local table definition information will not change accordingly. If
If the local Federated table definition information is not updated, the Query operation may fail.
Correct results.
To deal with this type of problem, we recommend that you use an application to handle the problem, first in the MySQL Server where the driver table is located.
And then extract the corresponding driver result set to the MySQL Server where the driver table is located.
Corresponding data. Many readers may think that this will have a certain impact on performance.
Performance has a certain negative impact, but there are basically not many other better solutions except this method. Besides
After the database is expanded, the load of each MySQL Server can be well controlled.
For a single Query, the response time may be higher than before splitting, so the performance is negative.
The impact is not too big. Furthermore, similar requirements for cross-node Join are not too many.
Overall performance may be a small part. Therefore, for the sake of overall performance, the occasional sacrifice
Point is actually worth it. After all, system optimization itself is a process of many trade-offs and balances.

5.3 cross-node Merge Sorting Paging

After the data is horizontally split, not only cross-node Join operations may fail
The data sources of Query statements sorted by pages may also be split into multiple nodes.
These sorting paging queries cannot continue to run normally. In fact, this is the same as cross-node Join. The data source exists.
On multiple nodes, a Query is used to solve the problem, which is the same as cross-node Join. Similarly
Federated can also be partially solved, and of course there are also risks.
What should I do with the same problem? I also continue to suggest using applications.
How can this problem be solved? The solution is basically similar to the Cross-node Join solution, but it is similar to the Cross-node Join solution.
Not quite the same. In most cases, Join has a drive-driven relationship. Therefore, Join involves multiple tables.
There is usually an ordered relationship between reading data. However, sorting pages are not the same.
The source database can be said to be a table (or a result set), and there is no sequential relationship in itself.
Data sources can be retrieved in parallel. In this way, we can compare the efficiency of sorting the number of paging data
Database Join is higher, so the performance loss is relatively lower. In some cases, it may be worse than the original data.
Split databases are more efficient. Of course, whether it is cross-node Join or cross-node sorting paging, it will make us
The application server consumes more resources, especially memory resources, because we are reading and merging
Processes need to process more data than the original.

After analysis, many of my friends may find that all of the above problems are basically solved through applications. Everyone may be confused, isn't it because I am a DBA, so I threw a lot of things to the application architects and developers?

In fact, this is not the case at all. First, because of its particularity, the application can easily achieve good scalability,
It is different from databases. It must be expanded in many other ways, and it is difficult
This avoids some problems that can be solved in a centralized database but become one after being split into a database cluster.
Difficulties. To maximize the overall expansion of the system, we can only let the application do more things,
To solve the problem that the database cluster cannot better solve.

5.4 summary:

Data splitting technology is used to split a large MySQL Server into multiple small MySQL servers.
It also improves the scalability of the entire database cluster. Whether it is through vertical segmentation,
Or horizontal segmentation, which makes the system more likely to encounter bottlenecks. Especially when we use vertical and horizontal integration
In theory, there will be no expansion bottleneck.

6. Case study 6.1 create three database instances

Create multi-instance participation: http://blog.csdn.net/mchdba/article/details/45798139

6.2 create databases, tables, and users

Create a database table

create database `hwdb` /*!40100 default characterset utf8 */;create table uc_user(user_id bigint primarykey, uc_name varchar(200), created_time datetime) engine=innodb charset utf8;

Create user

grant insert,update,delete,select on hwdb.*to tim@'192.168.%' identified by 'timgood2013';

Execution Process:

mysql> create table uc_user(user_idbigint primary key, uc_name varchar(200), created_time datetime) engine=innodbcharset utf8;Query OK, 0 rows affected (0.53 sec)mysql>mysql> grant insert,update,delete,selecton hwdb.* to tim@'192.168.%' identified by 'timgood2013';Query OK, 0 rows affected (0.03 sec)mysql>


6.3 create a java code example
Package mysql; import java. math. bigInteger; import java. SQL. connection; import java. SQL. driverManager; import java. SQL. preparedStatement; import java. SQL. resultSet; import java. SQL. SQLException; import java. SQL. statement; import java. util. calendar; public classMySQLTest {public static void main (String [] args) {MySQLTestmt = newMySQLTest (); // BigIntegerbi = newBigInteger ("2015053010401005"); Stringport = mt. g EtDBPort (bi. longValue (); Connection conn = mt. getConn (port); mt. insert (conn, bi, "tim --" + bi. longValue ();} // obtain the database port to be accessed public String getDBPort (long user_id) {Stringport = "3307"; long v_cast = user_id % 3; if (v_cast = 1) {port = "3308";} else if (v_cast = 2) {port = "3309" ;}else {port = "3307 ";} return port;} // obtain the database connection. If it is extended, You can independently provide an interface to the programmer to call it public ConnectiongetConn (String port) {Connectionconn = Null; try {Class. forName ("com. mysql. jdbc. driver ");} catch (ClassNotFoundException e) {// TODO Auto-generated catch block e. printStackTrace ();} Stringurl = "jdbc: mysql: // 192.168.52.130:" + port + "/hwdb"; try {conn = DriverManager. getConnection (url, "tim", "timgood2013");} catch (SQLException e) {// TODO Auto-generated catch block e. printStackTrace ();} System. out. println ("the current db is:" + url ); Return conn;} // obtain the date string public StringgetTimeByCalendar () {Calendar ar cal = Calendar ar. getInstance (); int year = cal. get (Calendar. YEAR); // obtain the YEAR int month = cal. get (Calendar. MONTH); // obtain the MONTH int day = cal. get (Calendar. DATE); // get the day int hour = cal. get (Calendar. HOUR); // HOUR int minute = cal. get (Calendar. MINUTE); // MINUTE int second = cal. get (Calendar. SECOND); // SECOND String strdate = year + "-" + month + "-" + day + "" + hour + ":" + minute + ":" + sec Ond; return strdate;} // start to input public int insert (Connectioncnn, BigInteger user_id, String name) {Stringsql = "insert into hwdb. uc_user (user_id, uc_name, created_time) values (?,?,?) "; Int I = 0; long uid = user_id.longValue (); Connectionconn = cnn; try {PreparedStatement preStmt = conn. prepareStatement (SQL); preStmt. setLong (1, uid); preStmt. setString (2, name); preStmt. setString (3, getTimeByCalendar (); i1_prestmt.exe cuteUpdate ();} catch (SQLException e) {e. printStackTrace () ;}return I; // return the number of affected rows. 1 indicates execution successful }}


6.4 test code

User_id is based on the Registration year, month, day, hour, minute, second, and more than 9999. The idea is that one second can satisfy 9999 concurrent requests, and no. You can set a static global variable for how to uniformly plan the 9999 global records, in addition, this global variable will be saved to a database in time, which basically ensures that there are no duplicates, such as user_id: 2015053010401005, 2015053010401006, 2015053010401007. The test code is as follows:

        MySQLTestmt=newMySQLTest();        //        BigIntegerbi = newBigInteger("2015053010401005");        Stringport=mt.getDBPort(bi.longValue());         Connection conn=mt.getConn(port);        mt.insert(conn,bi, "tim--"+bi.longValue());
1) Get the DB connection through % 3;
More than 0 --> db1 (Port 3307) more than 1 --> db2 (Port 3308) more than 2 --> db3 (Port 3309)
2) view results

After running, you can view the entered data in three instances as follows:
The Id 2015053010401005 should be input to db1 (Port 3307)

[root@data02 ~]# mysql--socket=/usr/local/mysql3307/mysql.sock -e "select * fromhwdb.uc_user;";+------------------+-----------------------+---------------------+| user_id          | uc_name               | created_time        |+------------------+-----------------------+---------------------+| 2015053010401005 | tim--2015053010401005 |2015-04-30 09:27:48 |+------------------+-----------------------+---------------------+[root@data02 ~]#

The Id 2015053010401006 should be input to db2 (Port 3308)

[root@data02 ~]# mysql--socket=/usr/local/mysql3308/mysql.sock -e "select * fromhwdb.uc_user;";+------------------+-----------------------+---------------------+| user_id          | uc_name               | created_time        |+------------------+-----------------------+---------------------+| 2015053010401006 | tim--2015053010401006 |2015-04-30 09:27:57 |+------------------+-----------------------+---------------------+[root@data02 ~]#


Enter the Id 2015053010401007 to db3 (Port 3309)

[root@data02 ~]# mysql--socket=/usr/local/mysql3309/mysql.sock -e "select * fromhwdb.uc_user;";+------------------+-----------------------+---------------------+| user_id          | uc_name               | created_time        |+------------------+-----------------------+---------------------+| 2015053010401007 | tim--2015053010401007| 2015-04-30 09:28:01 |+------------------+-----------------------+---------------------+[root@data02 ~]#
6.5 conclusion

Basically the same as expected, the data is split horizontally into different databases by using the method of MOD id to obtain the remainder. Here is a simple demonstration, the actual production is much more complex than this, so there will be more problems, but the concept of horizontal split is similar. This road is bright, you can rest assured to proceed.

Reference: MySQL performance tuning and Architecture Design

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.