Analysis of MySQL table sharding Technology

Last Update:2018-12-07 Source: Internet

Author: User


Analysis of table sharding Technology in MySQL

* 1. Case Description

In an intelligent transportation project, the number of vehicles in cities grows rapidly, and unbounded growth of the master table will inevitably degrade system performance. To keep the database performing well, we need to adopt a table sharding mechanism.

To understand the benefits of table sharding, first look at how the database executes an SQL statement:

Receive SQL --> put in SQL Execution queue --> Use analyzer to break down SQL --> extract or modify data according to analysis results --> return processing results.

This means that a later SQL statement is not executed until the earlier one has finished, because the data table file must be locked to ensure data integrity. Two types of locks are available: shared locks and exclusive locks. While a shared lock is held, other threads can still read the data file, but modifications are not allowed. An exclusive lock, by contrast, means that the entire file is owned by one thread, and other threads cannot access it.

MyISAM, often described as the fastest storage engine in MySQL, locks at the table level. That is, once a table is locked, the entire data file cannot be accessed by other operations. The next operation is accepted only after the previous one has finished; until then it waits in the queue. This blocking is commonly called "locking the table".

What are the consequences of locking a table? A large number of SQL statements cannot be executed immediately; each must wait until all the statements ahead of it in the queue have finished. These waiting statements return no result, or return it only after a serious delay, which hurts the user experience.
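The effect of table-level locking on queue time can be illustrated with a small model. This is only a sketch under simplifying assumptions: operations on the same table run strictly one after another, operations on different tables run fully in parallel, and the table names and durations are hypothetical.

```python
from collections import defaultdict


def finish_time(operations):
    """Return the time at which the last operation completes.

    operations: list of (table_name, duration_seconds) pairs.
    Model assumption: a table-level lock serializes operations on the
    same table, while different tables do not block each other.
    """
    busy = defaultdict(float)
    for table, duration in operations:
        busy[table] += duration
    return max(busy.values(), default=0.0)


# Eight writes of 0.5 s each against one big table ...
single = [("vehicle", 0.5)] * 8
# ... versus the same writes spread over four hypothetical sub-tables.
sharded = [(f"vehicle_{i % 4}", 0.5) for i in range(8)]

print(finish_time(single))   # 4.0
print(finish_time(sharded))  # 1.0
```

A real server has other shared resources (disk, CPU, connections), so the actual gain is smaller than this idealized model suggests.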

This is especially true for frequently used tables, such as the user information table in an SNS system or the post table in a forum system. These tables receive a large volume of access, so to extract data quickly and return it to users, some technique is needed to solve the problem. That technique is table sharding, the subject of this article.

* 2. Case Analysis and Solution

The following describes the operation methods and basic steps for table sharding:

First, table sharding based on basic tables

The basic idea is to divide the data into a base table and business tables. The base table stores all the basic information. When the business needs specific data, it first looks up the name of the target table in the base table.

Take vehicle passing data in intelligent transportation as an example. Vehicle passing information mainly includes license plate information, image information (image path, image size, and so on), and other information (such as time and violation type). We can therefore design the vehicle information table as the base table, and design the license plate information table and the image information table as business tables.

The relationship among the three tables is as follows: vehicle information table < license plate information table < image information table.

Therefore, we can shard the license plate information table and the image information table. We refine the table design as follows:

passvehicle_tb table (base table)

    Base table ID            int(10)
    Data type name           char(50)
    Sub-table ID             int(10)

plate_tb table (license plate table)

    License plate table ID   int(10)
    Base table ID            int(10)

picture_tb table (image table)

    Image table ID           int(10)
    License plate table ID   int(10)

Suppose passvehicle_tb contains this record:

    Base table ID    License plate type    Sub-table ID
    1                Blue Card             1

Then we can build the sub-table name from the passvehicle_tb record, for example blueplate_1. All blue-card vehicle passing data is then saved to blueplate_1.
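The lookup can be sketched as follows. This is a minimal illustration, not the article's implementation: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the column names (id, type_name, sub_table_id) and the PREFIX mapping from "Blue Card" to "blueplate" are assumptions made for the example.

```python
import sqlite3

# Hypothetical mapping from the data type name to a table-name prefix.
PREFIX = {"Blue Card": "blueplate"}


def sub_table_name(conn, base_id):
    """Look up a base-table row and build the name of its sub-table."""
    row = conn.execute(
        "SELECT type_name, sub_table_id FROM passvehicle_tb WHERE id = ?",
        (base_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no base record with id {base_id}")
    type_name, sub_table_id = row
    return f"{PREFIX[type_name]}_{sub_table_id}"


# In-memory database holding the single example record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE passvehicle_tb ("
    "id INTEGER PRIMARY KEY, type_name TEXT, sub_table_id INTEGER)"
)
conn.execute("INSERT INTO passvehicle_tb VALUES (1, 'Blue Card', 1)")

print(sub_table_name(conn, 1))  # blueplate_1
```

Every access pays for one extra query against the base table, which is the cost of keeping the routing information in the database.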

When there are few license plate types, this approach may bring little benefit, but when there are many data types, insertion and lookup become much faster.

For example, to find information about all blue-card vehicles, we can query directly: SELECT * FROM blueplate_1 WHERE baseinfoid = 1;

When a place has a large number of blue-card vehicles, we can split the table further by the last digit of the plate number, so the sub-table IDs become richer. For example, we divide all blue-card vehicles into 10 tables: blueplate_0, blueplate_1 ... blueplate_9.
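Routing by the last digit might look like the sketch below. It assumes that every plate number ends in a decimal digit; the function name plate_table and the sample plate numbers are hypothetical.

```python
def plate_table(plate_number, prefix="blueplate"):
    """Return the sub-table name for a plate, chosen by its last digit.

    Assumption: the plate number ends in a decimal digit. Plates that
    end in a letter would need a separate rule, which is not covered here.
    """
    last = plate_number[-1:]
    if last == "" or last not in "0123456789":
        raise ValueError(f"plate {plate_number!r} does not end in a digit")
    return f"{prefix}_{last}"


print(plate_table("A12345"))  # blueplate_5
print(plate_table("B88880"))  # blueplate_0
```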

The advantages and disadvantages of this method are as follows:

[Advantage] Adding or deleting nodes is very convenient, which makes future upgrades and maintenance easier.

[Disadvantage] Adding a table or operating on a table always requires going through the database first, which may cause a bottleneck.

Second, table sharding based on a hash algorithm

In a hash table, a special hash algorithm calculates a value from a key, and that calculated value is then used to find the expected entry.

The design principle of this kind of table sharding is similar: the name of the table that stores a record is calculated from the ID or name of the original target using a hash algorithm, and the corresponding table is then accessed.

Take the traffic information table as an example. The ID of each base table record and the ID of each license plate record are fixed and unique, so we can perform some operation on one of these two values to obtain the name of a target table.

For example, if the system allows a maximum of 100 million records and each table stores 1 million records, the system needs no more than 100 tables. On this basis, we hash the ID of the base vehicle passing table to obtain a key value. This value gives us the table name, and we then access the corresponding table.
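The table count in this example is a simple ceiling division. A short sketch, with a hypothetical helper name:

```python
def tables_needed(max_records, records_per_table):
    """Smallest number of tables that can hold max_records (ceiling division)."""
    if records_per_table <= 0:
        raise ValueError("records_per_table must be positive")
    return -(-max_records // records_per_table)


print(tables_needed(100_000_000, 1_000_000))  # 100
print(tables_needed(100_000_001, 1_000_000))  # 101
```

This only holds if records are spread evenly; a skewed hash can fill some tables well before others.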

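We construct a simple hash algorithm (written here as PHP, whose built-in bin2hex, substr, strlen, and str_pad functions it uses):

    <?php
    function get_hash(int $id): int {
        // Hex-encode the decimal digits of the ID, e.g. "1" becomes "31"
        $str = bin2hex((string) $id);
        // Keep only the first four hex characters
        $hash = substr($str, 0, 4);
        if (strlen($hash) < 4) {
            $hash = str_pad($hash, 4, "0"); // fewer than 4 characters: pad on the right with "0"
        }
        return (int) $hash;
    }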

The algorithm takes a base table ID and returns a four-digit number. For example, get_hash(1) returns 3100 and get_hash(23819) returns 3233. Combining this value with a table prefix gives the table to access. When we need the content with ID 1, the combined table names are plate_3100 and picture_3100, so we can access the target tables directly.
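For readers who want to try the mapping, here is a Python port of the same idea. It is a sketch that assumes non-negative integer IDs; table_names is a hypothetical helper that applies the plate_ and picture_ prefixes mentioned above.

```python
def get_hash(record_id: int) -> int:
    """Hex-encode the decimal digits of the ID and keep four characters.

    Assumes record_id is a non-negative integer.
    """
    hex_text = str(record_id).encode("ascii").hex()  # "1" -> "31"
    prefix = hex_text[:4].ljust(4, "0")              # pad on the right with "0"
    return int(prefix)


def table_names(record_id: int):
    """Combine the hash with the table prefixes."""
    suffix = get_hash(record_id)
    return f"plate_{suffix}", f"picture_{suffix}"


print(get_hash(1))      # 3100
print(get_hash(23819))  # 3233
print(table_names(1))   # ('plate_3100', 'picture_3100')
```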

Note that even with the hash algorithm, some records will land in the same table. This differs from a hash table, which tries hard to resolve collisions; here we do not need to. We only need to predict and analyze which tables the data may be stored in, and distribute the data reasonably.
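One way to do that prediction is to run the hash over a range of IDs and count how many records fall into each table. The sketch below assumes IDs 1 to 99999 and re-implements get_hash in Python so that it runs on its own.

```python
from collections import Counter


def get_hash(record_id: int) -> int:
    """Python port of the hash: hex of the decimal digits, first 4 chars."""
    hex_text = str(record_id).encode("ascii").hex()
    return int(hex_text[:4].ljust(4, "0"))


# Count how many IDs in 1..99999 land in each table suffix.
counts = Counter(get_hash(i) for i in range(1, 100_000))

print(len(counts))    # 99 distinct tables
print(counts[3100])   # 1    (only ID 1)
print(counts[3130])   # 1111 (IDs 10, 100-109, 1000-1099, 10000-10999)
```

Because this hash depends only on the first two decimal digits of the ID, the number of tables is capped at about 100 and the load is uneven. Under these assumptions, a different hash would be needed if an even spread matters.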

If more data needs to be stored, the hash operation can be applied to a name instead of an ID, again converting the binary value to hexadecimal as above. Because there are far more Chinese characters than digits and letters, the probability of repetition is lower, but many more tables may result, so other problems must then be considered.

In short, if you use the hash method, you must select a good hash algorithm so that the data is spread over enough tables and can be queried quickly.

[Advantage] The hash algorithm directly yields the target table name, which is highly efficient.

[Disadvantage] Poor scalability. The chosen hash algorithm fixes the amount of data the design can handle. The system can only run within that amount of data and cannot easily grow beyond it.
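The scalability problem can be made concrete by counting how many records would have to move when the number of tables changes. This sketch uses plain modulo routing (id % table_count) rather than the get_hash function above, purely because it makes the table count easy to vary; the table counts and ID range are hypothetical.

```python
def moved_keys(ids, old_count, new_count):
    """Count IDs whose table index changes when the table count changes."""
    return sum(1 for i in ids if i % old_count != i % new_count)


ids = range(1, 30_001)
moved = moved_keys(ids, 100, 150)

print(moved)             # 20000
print(moved / len(ids))  # about 0.67: two thirds of the rows must be relocated
```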

Third, table sharding based on renaming

This technique applies when a field in the primary table of the database, such as time, is used as the basis for sharding. Take the vehicle passing information table as an example. All vehicle passing data falls within months 1 to 12, so we can divide the data into 12 tables, such as vehicle_20151101 and vehicle_20151102... vehicle_20151112.

This works well whether we need to query the data of a certain period or compute statistics for a certain month, because most queries fall within a single month. However, this technique is not suitable when queries often span multiple months.
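The difference between single-month and multi-month queries shows up when mapping a date range to the tables it touches. The sketch below assumes a hypothetical vehicle_YYYYMM naming scheme (one table per calendar month), which is not necessarily the naming used in this article.

```python
from datetime import date


def tables_for_range(start: date, end: date, prefix: str = "vehicle"):
    """List the monthly tables a query over [start, end] has to read."""
    if end < start:
        raise ValueError("end must not be before start")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


# A query inside one month reads a single table ...
print(tables_for_range(date(2015, 11, 3), date(2015, 11, 20)))
# ['vehicle_201511']

# ... while a query spanning months must combine several.
print(tables_for_range(date(2015, 11, 20), date(2016, 1, 5)))
# ['vehicle_201511', 'vehicle_201512', 'vehicle_201601']
```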

At the beginning of each month, a new table is created to hold the data of the previous month. In addition, there is one table whose name never changes and which temporarily stores the data of the current month. You may name it my_table. This table always exists. For example, if it is now February, February's data is temporarily stored in this table. At the start of the next month, you rename my_table:

RENAME TABLE my_table TO table_20151110;

After this statement, the name my_table no longer exists and all of its data is stored in table_20151110. Note that the table is locked during the rename. The advantage of using RENAME is that a large amount of data does not need to be exported and re-imported during the transfer, which would be very inefficient and would put heavy pressure on the database.

After renaming my_table, you need to re-create it:

CREATE TABLE my_table (.......)

In principle, table_20151110 should contain only February's data. However, if the rename is performed on the 3rd of the new month, table_20151110 also contains the data from the 1st to the 3rd of that month. We therefore need to find the data from the 1st to the 3rd in this table and insert it back into my_table, which now stores March's data:

INSERT INTO my_table (field1, field2 ....) SELECT field1, field2 .... FROM table_20151110 WHERE field3 >= "2011-11-01";

In this way, the data from the 1st to the 3rd is saved to my_table, which temporarily stores the data for March. At the start of every new month, this operation is repeated to shard the table. Because the name my_table never changes, the SQL statements that insert data do not need to change. When querying, however, you need to check the time range to find the specific table, because the data has been sharded.
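The rotation steps above can be sketched end to end. This is an illustration only: it uses Python's built-in sqlite3 as a stand-in for MySQL (SQLite spells the rename as ALTER TABLE ... RENAME TO instead of RENAME TABLE), and the columns plate and passed_on, the sample rows, and the cutoff date are hypothetical.

```python
import sqlite3


def rotate(conn, archive_name, new_month_start):
    """Archive my_table under archive_name, recreate it, and copy back
    the rows dated on or after new_month_start.

    archive_name is interpolated into SQL, so it must come from trusted
    code, never from user input.
    """
    conn.execute(f"ALTER TABLE my_table RENAME TO {archive_name}")
    conn.execute("CREATE TABLE my_table (plate TEXT, passed_on TEXT)")
    conn.execute(
        f"INSERT INTO my_table (plate, passed_on) "
        f"SELECT plate, passed_on FROM {archive_name} WHERE passed_on >= ?",
        (new_month_start,),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (plate TEXT, passed_on TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("A1", "2015-10-30"), ("B2", "2015-11-01"), ("C3", "2015-11-03")],
)

rotate(conn, "table_20151110", "2015-11-01")

current = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM table_20151110").fetchone()[0]
print(current, archived)  # 2 3
```

As in the steps described above, the copied rows are not deleted from the archive table, so it still holds the early-month rows after the rotation. ISO-formatted date strings are used so that string comparison matches date order.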

The operation is repeated at the beginning of each month so that sharding proceeds smoothly. In addition, the following operation must be performed when inserting:

ALTER TABLE my_table MAX_ROWS = 10000000 AVG_ROW_LENGTH = 15000;

This operation ensures that the table can store a large amount of data (more than 2 GB). It is not required if the data volume is small.

* 3. Summary

In applications with heavy load, the database has always been an important bottleneck that must be overcome, and customers' requirements keep rising. On the basis of ensuring data stability, this article gives a rough description of three table sharding methods. Of course, the code and ideas in this article have not been rigorously worked out, so the design cannot be guaranteed to be completely accurate and practical. Please forgive any shortcomings!

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
