MySQL Horizontal Split (reading notes collation)

Source: Internet
Author: User
Tags db2 joins table definition

1, the introduction of horizontal splitting

In general, simple horizontal slicing is the main way to spread an extremely mundane table into multiple tables, with a subset of the data in each table, according to some rule of a field.

In simple terms, we can interpret the horizontal segmentation of the data as a segmentation of the data rows, that is, some of the tables in the
Rows are sliced into one database, and some other rows are sliced into other databases. Of course, in order to be easier
Determines which database the rows of data are being sliced into, and the Shard always needs to follow a specific rule.
If a number Type field is based on a specific number of modulo, the range of a Time Type field, or a character class
The hash value of the Type field. If most of the core tables in the system can be associated with a field, the
The field is naturally a great choice for horizontal partitioning, and of course, it is very special to use.

2, the pros and cons of horizontal splitting

Advantages of horizontal splitting:
Table Association is basically able to complete on the database side;
There is no problem that some very large data volumes and high-load tables encounter bottlenecks;
Application-side Overall schema changes are relatively small;
Transaction processing is relatively simple;
As long as the segmentation rules can be defined, it is basically more difficult to meet the extensibility limit;

Disadvantages of horizontal slicing:
The segmentation rules are relatively more complex, and it is difficult to abstract a segmentation rule that satisfies the whole database.
Later data maintenance difficulty has increased, manual positioning of data is more difficult;
The coupling degree of each module in the application system is high, which may cause some difficulties in the migration and splitting of the later data.

< copyright, the article allows reprint, but must be linked to the source address, otherwise investigate legal liability!>
Original Blog Address: http://blog.csdn.net/mchdba/article/details/46278687
Hara Douglas Fir (MCHDBA)


3, split rule 3.1 is split according to the modulus level

In general, internet companies, especially e-commerce and gaming businesses, which are now very popular with the Internet, are largely capable of
Member User Information Association, it is possible that many core tables are well suited for horizontal segmentation of data through membership IDs.
And like the Forum community discussion system, it is easier to slice, it is very easy to follow the forum number for the horizontal segmentation of data.
After slicing, there is basically no interaction between the libraries.

So, for our sample database, most of the tables can be sliced horizontally based on the user ID.
Different user-related data are sliced and stored in different databases. If all user IDs are modulo 5
These are then stored in two different databases. Each table that is associated with a user ID can be sliced this way. Such
Basically, each user-related data, in the same database, even if the need to associate, it can be very simple to close
The joint.

3.2 Horizontal Segmentation by region

For example, the country divided into 10 large areas, Jiangsu, Zhejiang and Shanghai, a brother, Qilu calculate a, Guangxi calculate a, two lakes calculate one, the Central Plains calculate one, southwest counts one, Inner Mongolia one, northeast one, northwest one, north China one, southeast one.

In the large business volume of North China, southeast, Jiangsu, Zhejiang and Shanghai, Guangxi Zone Server can allocate more server resources, such as CPU, IO, network and so on can be used better high-end configuration.

In the traffic normal northwest, Qilu, two lakes, northeast of the server can allocate high-end server resources.
In the business volume is relatively small, southwest, Inner Mongolia, central plains of the server can be slightly general server.

Of course, these resources can not be divided into explicit, we do in the internal planning of the time to consider good, so as to avoid being criticized to say something biased and so on.

PS: This division is not qualitative, according to the business can be at any time to the business of a good slice of the resource upgrade.

as shown in the following:


4. Integrated view of horizontal split and application

5, horizontal splitting of subsequent issues

There are some possible problems that we need to do before we can implement a data-splitting scheme. Generally speaking
some of the issues that we may encounter are the following:
The problem of introducing distributed transaction;
Cross-node join problem;
Cross-node merge sorting paging problem;

5.1 Problems with the introduction of distributed transactions

Once the data is sliced and stored in multiple MySQL servers, regardless of our segmentation rule design
The perfect (and there is no perfect segmentation rule) that could have caused some of the previous firms to be involved in
The data is not already in the same MySQL Server.
In such a scenario, if our application still follows the old solution, then it is bound to introduce a distribution
Type of transaction to resolve. In each version of MySQL, only the versions from MySQL 5.0 began to
Distributed transactions provide support, and only INNODB provides distributed transaction support at this time. Not only that, even though we just
Good use of the MySQL version supporting distributed transactions, but also the use of the InnoDB storage engine, distributed transactions
Its own consumption of system resources is very large, the performance itself is not too high. And the introduction of the distributed transaction itself
Exception handling will bring more difficult to control factors.
What to do? In fact, we can solve this problem by a flexible way, first of all to consider a
The thing is: is the database the only place where you can solve a transaction? In fact, this is not the case, we can completely
Together with both the database and the application. Each database resolves its own transactions, and then
Use a program to control transactions above multiple databases.
That is, if we want to, we can split a distributed transaction across multiple databases into multiple
Small transactions on a single database, and the application is used to master small transactions. Of course, the requirement for this is
Our Russian applications have to be robust enough and, of course, bring some technical difficulties to the application.

5.2 Issues with cross-node joins

The above describes the possible introduction of distributed transactions, and now let's look at the problem of cross-node join. Number
After slicing, it may cause some old join statements to be unusable because the data source used by the join May
have been sliced into multiple MySQL servers.
What to do? This problem from the MySQL database point of view, if it has to be directly resolved on the database side,
I'm afraid it can only be solved by MySQL, a special storage engine, federated. The Federated storage Engine is
MySQL solves solutions like Oracle's DB Link issues. And OracleDB Link's main
The difference is that federated will save a copy of the remote table structure's definition information locally. At first glance, Federated really
is a very good solution for cross-node joins. But we should also be clear that it seems that if the remote table
The structure is changed, and the local table definition information is not changed accordingly. If the remote table structure is updated
When the local federated table definition information is not updated, it is likely to cause query to run out of error and not get
The right results.
To deal with this kind of problem, I still recommend using the application for processing, first in the MySQL Server where the driver table is located
Drive result set and then take it out of the MySQL Server where the driver table is located.
The corresponding data. Many readers may think that this will have a certain impact on performance, yes, it does
Performance has a certain negative impact, but in addition to this method, there are basically not too many other better solutions. Moreover, by
After the database through a good extension, each MySQL Server load can be better controlled, simple needle
For a single query, the response time may be higher than the non-segmentation, so the performance of the negative
The impact of the surface is not too large. What's more, similar to the need for cross-node joins is not too much, relative to
Overall performance is probably just a small part of it. So for the sake of overall performance, the occasional sacrifice.
Point, in fact, is worth it, after all, the system optimization itself is a lot of trade-offs and balance process.

5.3 Cross-node merge sort paging problem

Once the data has been sliced horizontally, it is possible that not only the cross-node join does not work,
The data source for some sort of paged query statement may also be sliced to multiple nodes, which is the direct result of
These sort paged query cannot continue to function correctly. In fact, this and cross-node join is a reason that the data source exists
On multiple nodes, it is the same operation as a cross-node join to resolve through a query. Same
Federated can also be partially solved, and of course there are the same risks.
Or the same problem, how to do? I also continue to recommend that the application be addressed.
How to solve? The solution is broadly similar to that of a cross-node join, but with a point and cross-node join
Not quite the same, joins often have a driver-driven relationship, so the join itself involves multiple tables
There is generally a sequential relationship between data reads. But sort of pagination is not the same, sorting the paging data
The source can basically be said to be a table (or a result set), and there is no sequential relationship in itself, so in multiple
The process of fetching data from a data source can be completely parallel. In this way, the fetch efficiency of sorted paging data we can do more than cross
The library join is higher, so the performance loss is relatively small, in some cases it may be less than the original data
More efficient in the segmented database. Of course, whether it's a cross-node join or a cross-node sort of paging, it will make us
Application server consumes more resources, especially memory resources, because we are reading access and merging the result set
Process requires more data than was originally processed.

analysis Here, may be a lot of friends will find that all these problems above, I give the advice is basically through the application to solve. People may start to whispered, is it because I am a DBA, so many things are thrown to application architects and developers?

In fact, this is not the case, the first application due to its particularity, can be very easy to do very good extensibility, but
Database is different, there must be many other ways to expand, and in this expansion process, it is difficult to
Avoid bringing some of the original in a centralized database can be resolved but cut apart into a database cluster becomes a
A difficult situation. In order to maximize the overall system, we can only let the application do more things,
To solve the problem that the database cluster can not be solved better.

5.4 Summary:

Splitting a large MySQL server into multiple small MySQL servers through data segmentation technology solves both
Write performance bottlenecks, and once again improves the scalability of the entire database cluster. Whether it is through vertical slicing,
or horizontal segmentation, which makes the system less likely to experience bottlenecks. Especially when we use vertical and horizontal combination
Segmentation method, you will theoretically no longer encounter expansion bottlenecks.

6, Case demo 6.1 Creating a Database 3 instances

To create a multi-instance participation: http://blog.csdn.net/mchdba/article/details/45798139

6.2 Creating libraries and tables and users

Create a library table

create database `hwdb` /*!40100 default characterset utf8 */;create table uc_user(user_id bigint primarykey, uc_name varchar(200), created_time datetime) engine=innodb charset utf8;

Create user

grant insert,update,delete,select on hwdb.*to [email protected]‘192.168.%‘ identified by ‘timgood2013‘;

Execution process:

mysql>  CREATE TABLE Uc_user (User_idbigint primary key, Uc_name varchar (200 ), Created_time datetime) Engine=  innodbcharset UTF8; Query OK, 0  rows  affected ( 0.53  sec) mysql>  MySQL >  Grant Insert,update,delete,selecton hwdb *  to  [Email protected]  ' 192.168.% '  identified by   ' timgood2013 ' ; Query OK, 0  rows  affected ( 0.03  sec) mysql>   


6.3 Creating Java code samples
 PackageMySQLImportJava.math.BigInteger;ImportJava.sql.Connection;ImportJava.sql.DriverManager;ImportJava.sql.PreparedStatement;ImportJava.sql.ResultSet;ImportJava.sql.SQLException;ImportJava.sql.Statement;ImportJava.util.Calendar; Publicclassmysqltest { Public Static void Main(string[] args) {mysqltestmt=newmysqltest ();//Bigintegerbi = Newbiginteger ("2015053010401005");         Stringport=mt.getdbport (Bi.longvalue ());        Connection Conn=mt.getconn (port); Mt.insert (Conn,bi,"tim--"+bi.longvalue ()); }//Get the DB port to access     PublicStringGetdbport(LongUSER_ID) {stringport="3307";Longv_cast=user_id%3;if(v_cast==1) {port="3308"; }Else if(v_cast==2) {port="3309"; }Else{port="3307"; }returnPort }//Get database connections, if extended, you can make an interface available to the programmer to call it     Public Connectiongetconn(String Port) {Connectionconn =NULL;Try{Class.forName ("Com.mysql.jdbc.Driver"); }Catch(ClassNotFoundException e) {//TODO auto-generated catch blockE.printstacktrace (); } StringUrl ="jdbc:mysql://192.168.52.130:"+port+"/hwdb";Try{conn= drivermanager.getconnection (URL,"Tim","timgood2013"); }Catch(SQLException e) {//TODO auto-generated catch blockE.printstacktrace (); } System.out.println ("The current db is:"+url);returnConn }//Get date string     Public Stringgettimebycalendar() {Calendar cal = Calendar.getinstance ();intYear = Cal.get (calendar.year);//Get year        intMonth=cal.get (Calendar.month);//Get month        intDay=cal.get (calendar.date);//Acquisition Day        intHour=cal.get (Calendar.hour);//hour        intMinute=cal.get (Calendar.minute);//min        intSecond=cal.get (Calendar.second);//SecString strdate=year+"-"+month+"-"+day+" "+hour+":"+minute+":"+second;returnstrdate; }//Start Data entry     Public int Insert(Connectioncnn,biginteger user_id,string name) {stringsql="Insert Intohwdb.uc_user (User_id,uc_name,created_time) VALUES (?,?,?)";intI=0;LongUID =user_id.longvalue (); CONNECTIONCONN=CNN;Try{PreparedStatement prestmt=conn.preparestatement (SQL); Prestmt.setlong (1, UID); Prestmt.setstring (2, name); Prestmt.setstring (3, Gettimebycalendar ());         I=prestmt.executeupdate (); }Catch(SQLException e)         {E.printstacktrace (); }returnI//Returns the number of rows affected, 1 for successful execution}}


6.4 Test Code

USER_ID in accordance with the registration date of +9999 seconds, the idea is that a second to meet 9,999 concurrency, nor, as to how to unify the planning of this global 9,999, you can set a static global variable, and this global variable will be saved to a DB in a timely manner, so that the basic guarantee is not duplicated, For example, user_id:2015053010401005, 2015053010401006, 2015053010401007, the test code is as follows:

        MySQLTestmt=newMySQLTest();        //        BigIntegerbi = newBigInteger("2015053010401005");        Stringport=mt.getDBPort(bi.longValue());         Connection conn=mt.getConn(port);        mt.insert"tim--"+bi.longValue());
1) Obtain the DB connection through%;
       余0 --> db1(3307端口)       余1 --> db2(3308端口)       余2 --> db3(3309端口)
2) View Results

after the end of the run, you can go to 3 instances to view the input data as follows:
ID 2015053010401005 should be input to DB1 (3307 port)

[[email protected] ~]# mysql--socket=/usr/local/mysql3307/mysql.sock -e "select * fromhwdb.uc_user;";+------------------+-----------------------+---------------------+| user_id          | uc_name               | created_time        |+------------------+-----------------------+---------------------+| 2015053010401005 | tim--2015053010401005 |2015-04-30 09:27:48 |+------------------+-----------------------+---------------------+[[email protected] ~]#

ID 2015053010401006 should be input to DB2 (3308 Port)

[[email protected] ~]# mysql--socket=/usr/local/mysql3308/mysql.sock -e "select * fromhwdb.uc_user;";+------------------+-----------------------+---------------------+| user_id          | uc_name               | created_time        |+------------------+-----------------------+---------------------+| 2015053010401006 | tim--2015053010401006 |2015-04-30 09:27:57 |+------------------+-----------------------+---------------------+[[email protected] ~]#


ID 2015053010401007 should be input to DB3 (3309 port)

[[email protected] ~]# mysql--socket=/usr/local/mysql3309/mysql.sock -e "select * fromhwdb.uc_user;";+------------------+-----------------------+---------------------+| user_id          | uc_name               | created_time        |+------------------+-----------------------+---------------------+| 2015053010401007 | tim--2015053010401007| 2015-04-30 09:28:01 |+------------------+-----------------------+---------------------+[[email protected] ~]#
6.5 Summary

Basic and the same as expected, the data through the module ID to the remainder of the method, split horizontally into different libraries, here is just a simple demonstration, the actual production complexity is much higher than this, so we encounter more problems, but the concept of horizontal split is similar, this road is bright, we can rest assured that go on.

Reference document: MySQL performance tuning and architecture design Book

MySQL Horizontal Split (reading notes collation)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.