A summary of MySQL table-splitting options

Last Update:2014-12-07 Source: Internet

Author: User

Tags mysql manual ticket database sharding


Introduction

A recent project required it, so I have been studying MySQL sharding: I read a fair amount of material and also ran some tests of my own. These notes are a summary for later review. Most of it draws on what others have already published online, which I then tried out in practice to build up my own experience.

Choosing a split strategy

Splitting is actually very flexible. With vertical splitting, one database is split into two or more, and related tables are kept together in the same database. With horizontal splitting, a table holding a large amount of data is split according to some rule. My own feeling is that vertical splitting mainly relieves I/O bottlenecks, while horizontal splitting relieves read and write pressure on one or a few individual tables.
Given our project's needs, we chose horizontal splitting without splitting into multiple databases. The next question was which kind of horizontal splitting to use.
The options covered below are MERGE engine tables, manually split tables, and partitioned tables.

MERGE engine tables

Brief introduction

MERGE tables come first. This method works only with MyISAM; my tables all use InnoDB, so it was ruled out for me from the start, but here is a brief introduction anyway.
The MySQL 5.1 manual says:

An alternative to a MERGE table is a partitioned table, which stores partitions of a single table in separate files. Partitioning enables some operations to be performed more efficiently and is not limited to the MyISAM storage engine.

In other words, a partitioned table can be used in place of a MERGE table. It stores the partitions of a single table in separate files, makes some operations more efficient, and is not tied to the MyISAM storage engine. (My translation is rough; please bear with me.)

That is probably the main thing to weigh when deciding whether to use a MERGE table at all.

Creating and using a MERGE table

To create a MERGE table you first need a set of tables with identical structure, all using the MyISAM storage engine.

Let's first create a few tables:

mysql> CREATE TABLE t1 (
    ->    a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->    message CHAR(20)) ENGINE=MyISAM;
mysql> CREATE TABLE t2 (
    ->    a INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ->    message CHAR(20)) ENGINE=MyISAM;
mysql> INSERT INTO t1 (message) VALUES ('Testing'),('table'),('t1');
mysql> INSERT INTO t2 (message) VALUES ('Testing'),('table'),('t2');
mysql> CREATE TABLE total (
    ->    a INT NOT NULL AUTO_INCREMENT,
    ->    message CHAR(20), INDEX(a))
    ->    ENGINE=MERGE UNION=(t1,t2) INSERT_METHOD=LAST;

Then query it:

mysql> SELECT * FROM total;
+---+---------+
| a | message |
+---+---------+
| 1 | Testing |
| 2 | table   |
| 3 | t1      |
| 1 | Testing |
| 2 | table   |
| 3 | t2      |
+---+---------+

The total table is effectively just a shell over t1 and t2. Note that a primary key or unique index in the underlying tables no longer guarantees uniqueness across the total table, as the repeated values of a in the result show.
You can also use ALTER TABLE tbl_name UNION=(...) to change the set of underlying tables, so the data set can be adjusted dynamically and tables that are no longer needed can be left out.

Usage Scenarios

If your records follow a time pattern, for example logs written every day of which you only need the last month or the last few months, you can create one table per day (or per some other period). When you need to query a certain period, you just create a MERGE table over the tables for that period. That keeps the data set within a manageable range. So easy.
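
The daily-log scenario above can be sketched as a small helper that builds the ALTER TABLE ... UNION=(...) statement for a sliding window of days. This is only an illustration in Python (the project discussed here is Java): the table names log_YYYYMMDD and log_recent are hypothetical, and the helper assumes every daily table already exists as an identically structured MyISAM table.

```python
from datetime import date, timedelta


def merge_union_statement(merge_table: str, prefix: str,
                          last_day: date, days: int) -> str:
    """Build an ALTER TABLE ... UNION=(...) statement that points the MERGE
    table at the daily tables for the `days` days ending on `last_day`."""
    if days <= 0:
        raise ValueError("days must be positive")
    # Oldest table first, newest last, so INSERT_METHOD=LAST writes to the newest.
    names = [
        f"{prefix}_{(last_day - timedelta(days=offset)):%Y%m%d}"
        for offset in range(days - 1, -1, -1)
    ]
    return f"ALTER TABLE {merge_table} UNION=({', '.join(names)})"


if __name__ == "__main__":
    print(merge_union_statement("log_recent", "log", date(2014, 12, 7), 3))
```

Running the statement once a day would slide the window forward; the older daily tables remain on disk and can be archived or dropped separately.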

Splitting tables

The idea of table splitting is simple: as the name suggests, you split an existing table holding a large amount of data into several smaller ones. If the database's performance bottleneck lies in a few key tables, splitting them is worth considering.

Problems encountered

Here are the problems I ran into while experimenting with split tables, and how I solved them.

1. How to split the table
Which strategy should be used to distribute the existing table's data across multiple tables, while also allowing for future growth?
The following discussions, the first from the Dewen Q&A site, are worth borrowing from:

"What are the MySQL table-splitting and database-splitting strategies, and under what circumstances does each apply?"
"Yupoo's split strategy"

The approach described there is to create an index table that maps each user ID to a database ID. (Tables with the same structure are placed in different databases to reduce pressure further, although data synchronization then has to be solved by other means.) In essence this is still splitting tables and databases. The advantage is that later expansion is easy; the cost is some performance, because every access needs an extra lookup query.

My own thought is that the index table could itself become a new bottleneck, unless the number of users stops growing.
I took a different approach: an algorithm computes, from the value of one column and according to a fixed rule, which sub-table a row belongs to, so that the data is spread roughly evenly. For extensibility, the algorithm was written with a future increase in the number of tables in mind.
Which strategy to choose depends on the business characteristics of your own tables. No method is absolutely better; choose according to your needs.
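
The actual rule used in the project is not given here, so the following is only a minimal sketch of the general idea, assuming the simplest possible rule: a non-negative integer column value taken modulo the number of sub-tables. It is written in Python for brevity, and the names shard_table and orders are hypothetical.

```python
def shard_table(base_name: str, key: int, shard_count: int) -> str:
    """Map a numeric column value to one of `shard_count` sub-tables.

    Assumption: plain modulo routing, which spreads sequential keys evenly.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    if key < 0:
        raise ValueError("key must be non-negative")
    return f"{base_name}_{key % shard_count}"


if __name__ == "__main__":
    for user_id in (10, 11, 12, 13):
        print(user_id, "->", shard_table("orders", user_id, 4))
```

One reason modulo routing is often paired with doubling the table count is that, when going from N to 2N tables, a row in table i either stays in table i or moves to table i + N, so each old table only has to be split in two rather than reshuffled across all tables.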

2. Maintaining primary keys after the split
Before the split, the primary key was an auto-incrementing BIGINT. The key format was therefore fixed long ago, which rules out options such as UUIDs.
I also considered writing my own key generator using Java's atomic classes, but that would add work and could become a hidden risk under high concurrency.
Another option is to manage keys in the application layer, for example with Redis's atomic increment.
A better-known strategy on the Internet is Flickr's "Ticket Servers: Distributed Unique Primary Keys on the Cheap".
The general idea is to use a MyISAM table named Tickets64 dedicated to issuing primary keys. It holds only one row, and the next key is fetched with:

REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();

Two such databases are set up in the same way, differing only in their auto-increment settings (in Flickr's article, both use a step of 2, one starting at 1 and the other at 2), so that the service keeps running if one of them goes down.
Other good articles: "Database Sharding Series (II): Global Primary Key Generation Strategies", "About Primary Key Management", and "Global Uniqueness Solutions for Keys in Sharded Databases and Tables".
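
To make the two-server arrangement concrete, here is a small in-memory simulation in Python. It does not talk to MySQL: the TicketServer class is a hypothetical stand-in for one ticket database, with offset and increment playing the role of the auto-increment settings, and the sketch ignores thread safety.

```python
import itertools


class TicketServer:
    """In-memory stand-in for one ticket database (not a real MySQL client)."""

    def __init__(self, offset: int, increment: int) -> None:
        self._next = offset
        self._increment = increment
        self.available = True

    def next_id(self) -> int:
        if not self.available:
            raise ConnectionError("ticket server is down")
        value = self._next
        self._next += self._increment
        return value


class TicketClient:
    """Round-robins over the servers and skips any that are down."""

    def __init__(self, servers) -> None:
        self._servers = list(servers)
        self._cycle = itertools.cycle(self._servers)

    def next_id(self) -> int:
        # Try each server at most once per call.
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            try:
                return server.next_id()
            except ConnectionError:
                continue
        raise RuntimeError("no ticket server available")


if __name__ == "__main__":
    odd = TicketServer(offset=1, increment=2)
    even = TicketServer(offset=2, increment=2)
    client = TicketClient([odd, even])
    print([client.next_id() for _ in range(4)])
    even.available = False  # simulate one server going down
    print([client.next_id() for _ in range(3)])
```

Because one server only ever issues odd numbers and the other only even numbers, the keys never collide even if the two drift apart; the trade-off is that the keys are unique but not strictly ordered by time.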

3. Selecting table names dynamically
Once the table was split, the next problem appeared. Our project uses the MyBatis framework for the database layer, with SQL statements written in XML files, and I now needed to set the table name dynamically.
It turns out that MyBatis itself can solve this through a setting:

statementType: any one of STATEMENT, PREPARED or CALLABLE. This causes MyBatis to use Statement, PreparedStatement or CallableStatement respectively. Default: PREPARED.

Once the statementType attribute is set to STATEMENT, the table name can be passed in as a parameter. The parameter must be written with the dollar syntax, as in ${columnName}. As for the difference between Statement and PreparedStatement, I assume everyone already knows it.

Another solution is the open-source "Shardbatis" plug-in, which implements horizontal data splitting. Have a look if you are interested.
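
Because a ${...} parameter is substituted into the SQL text rather than bound as a value, it may be worth checking the table name against the known set of sub-tables before it reaches the statement. The sketch below shows that check in Python rather than in MyBatis; allowed_tables, build_select and the orders tables are hypothetical names, and the %s placeholder assumes a MySQL driver that uses the "format" parameter style.

```python
def allowed_tables(base_name: str, shard_count: int) -> set:
    """All sub-table names that the routing rule can legitimately produce."""
    return {f"{base_name}_{index}" for index in range(shard_count)}


def build_select(table_name: str, allowed: set) -> str:
    """Return a SELECT with the table name spliced in as text.

    Only the table name is substituted; the id value stays a bound
    parameter (%s) to be supplied separately when the query is executed.
    """
    if table_name not in allowed:
        raise ValueError(f"unexpected table name: {table_name!r}")
    return f"SELECT * FROM `{table_name}` WHERE id = %s"


if __name__ == "__main__":
    tables = allowed_tables("orders", 4)
    print(build_select("orders_1", tables))
```

The same idea carries over to a MyBatis mapper: compute the sub-table name in application code from a fixed rule, never from raw user input, and pass only that computed name into the ${...} placeholder.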

Partitioned tables

Starting with MySQL 5.1, partitioned tables are supported. From the manual:

In effect, different portions of a table are stored as separate tables in different locations.
That is, each portion of a table can be stored as a separate table in a different location. (A rough translation.)

As I understand it, a table whose partitions sit on different disk locations should be cheaper to read in full than one stored in a single place.

Pros and cons of partitioned tables

This is mostly taken from the MySQL manual, with my own translation after each point.

Partitioning makes it possible to store more data in one table than can be held on a single disk or file system partition.
An unpartitioned table can only be stored on a single disk or file system partition; partitioning makes it possible to store more data in one table than that.

Data that loses its usefulness can often be easily removed from a partitioned table by dropping the partition (or partitions) containing only that data. Conversely, the process of adding new data can in some cases be greatly facilitated by adding one or more new partitions for storing specifically that data.
Data that is no longer useful can be removed easily by dropping the partition that contains only that data. Conversely, new data can sometimes be added more easily by creating new partitions for it.

Some queries can be greatly optimized in virtue of the fact that data satisfying a given WHERE clause can be stored only on one or more partitions, which automatically excludes any remaining partitions from the search. Because partitions can be altered after a partitioned table has been created, you can reorganize your data to enhance frequent queries that may not have been often used when the partitioning scheme was first set up. This ability to exclude non-matching partitions (and thus any rows they contain) is often referred to as partition pruning, and was implemented in MySQL 5.1.6.
This passage is hard to translate, so here is the gist. When a table is partitioned on a column and a query's WHERE clause restricts that column to a specific value or range, the query is optimized: only the partitions that can contain matching rows are searched, and the rest are skipped. The latter part says that you can modify the partitioning of an existing partitioned table to make it better suited to your queries.
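
The pruning idea can be illustrated with a toy model in Python. This is not how MySQL implements it; it only mimics RANGE partitioning, where each partition is declared with an exclusive upper bound (VALUES LESS THAN), and works out which partitions could hold rows for a WHERE clause of the form low <= column <= high. The function name and the year bounds are made up for the example.

```python
import bisect


def partitions_to_scan(upper_bounds, low, high):
    """Return the indexes of the RANGE partitions that may contain values
    in the closed interval [low, high].

    upper_bounds: sorted exclusive upper bounds, one per partition, so that
    partition i holds values v with upper_bounds[i-1] <= v < upper_bounds[i].
    """
    if low > high or not upper_bounds:
        return []
    first = bisect.bisect_right(upper_bounds, low)
    if first >= len(upper_bounds):
        return []  # every requested value lies beyond the last partition
    last = min(bisect.bisect_right(upper_bounds, high), len(upper_bounds) - 1)
    return list(range(first, last + 1))


if __name__ == "__main__":
    # Three partitions: year < 2013, year < 2014, year < 2015.
    bounds = [2013, 2014, 2015]
    print(partitions_to_scan(bounds, 2014, 2014))
    print(partitions_to_scan(bounds, 2000, 2013))
```

A query restricted to one year touches a single partition in this model, while a query with no condition on the partitioning column would have to scan all of them.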

Queries involving aggregate functions such as SUM() and COUNT() can easily be parallelized.
Aggregate functions such as SUM() and COUNT() can easily be processed in parallel. (Sounds cool.)

These two articles are well written: "Advantages and disadvantages of MySQL partitioned tables" and "Restrictions on partitioning functions in MySQL partitioned tables".
One more thing to consider when choosing MySQL partitioning: there is a MySQL bug concerning the query cache and partitioned tables, "Partitioning + query cache". Because of it, MySQL disables the query cache for partitioned tables regardless of whether the query cache is turned on. If this matters to you, choose your scheme carefully.

Real-world cases

Here are some good sharding examples from the Internet, shared for your reference:
"Database sharding at Netlog, with MySQL and PHP".
"The design of database sub-Library in the network architecture".
"Amazon's Dynamo".
"Ticket Servers: Distributed Unique Primary Keys on the Cheap".
Some of these can only be read from the other side of the firewall; I trust you have your own ways of getting there.

That concludes this summary of three MySQL splitting schemes. The material was gathered through my own searching, so some of it is bound to be inaccurate. If you spot a mistake, please let me know. I hope we can improve together.

Note: in a follow-up I may write about how to do this at the database layer, including creating the split tables and my experience testing the data import.


This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
