Mysql database sharding policy, mysql database sharding

Last Update:2017-12-05 Source: Internet

Author: User

Tags database sharding

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Mysql database sharding policy, mysql database sharding

I. First explain why Table sharding is required:

When a piece of data reaches several millions, it takes more time for you to query it at a time. If there is a joint query, it may die there. The purpose of table sharding is to reduce the burden on the database and shorten the query time. During daily development, we often encounter large tables. The so-called large tables are tables that store millions or even tens of millions of records. Such a table is too large, resulting in a long time-consuming database query and insertion, and poor performance. If a joint query is involved, the performance will be worse. The purpose of table sharding and Table Partitioning is to reduce the burden on the database and improve the efficiency of the database. Generally, it is to improve the efficiency of adding, deleting, modifying, and querying tables. The data volume in the database is not necessarily controllable. When database and table sharding is not performed, as time and business grows, the number of tables in the database increases and the data volume in the table increases, correspondingly, the overhead for adding, deleting, modifying, and querying data operations will also increase. In addition, due to the inability to conduct distributed deployment, the resources of one server (CPU, disk, memory, IO, etc) it is limited, and the data volume and data processing capability that the database can carry will all experience bottlenecks.

Mysql executes an SQL statement as follows:

1. Receive SQL statements;

2. Place the SQL statement in the queue;

3. Execute SQL statements;

4. Return the execution result.

What is the most time spent in this execution process? The first is the waiting time in the queue, and the second is the SQL Execution time. In fact, these two are the same thing. While waiting, there must be SQL Execution. Therefore, we need to shorten the SQL Execution time.

Mysql has a mechanism of table locking and row locking. Why does this mechanism occur to ensure data integrity? For example, if two SQL statements need to modify the same data record of the same table, what should we do? Can Both SQL statements modify this data record at the same time? Obviously, mysql handles this situation by locking tables (myisam storage engine) and row (innodb Storage engine ). If the table is locked, neither of you can operate on the table. You must wait for me to complete the operation on the table. The same is true for row locking. Other SQL statements can operate on this data only after I have finished the operation on this data. If there is too much data, the execution time is too long, and the waiting time is longer, which is why we want to split tables.

Ii. Table sharding Solution

1. Clusters

1. When I create a mysql cluster, someone will ask the mysql cluster, what is the relationship between the root and table shards? Although it is not a table sharding in the actual sense, it starts the role of table sharding. What is the meaning of cluster? To reduce the burden on a database, simply reduce the number of SQL statements in the SQL queue. For example, if there are 10 SQL requests in the queue of a database server, it takes a long time to allocate these 10 SQL requests to the queuing queues of five database servers. There are only two queues of one database server, is the waiting time greatly shortened? This is already obvious. So I listed it within the table sharding range. I have done some mysql clusters:

Installation, configuration, and read/write splitting of linux mysql proxy

Mysql replication is mutually active/standby installation and configuration, and Data Synchronization

Advantage: good scalability, no complex operations after multiple table shards (php code)

Disadvantage: The data volume of a single table remains unchanged, the time spent on one operation is still that large, and the hardware overhead is high.

2. Sub-tables

Two table sharding methods:

Split the lecture fields into different tables and split the string fields in the original table into other tables, which can speed up the query of the master table.

2. Vertical Split is divided by field.

A database has million user records. including the field id, user, password, first_name, last_name, email, addr, and Other cross segments. when a user logs on, the user and password fields are required, and the user and password fields need to be searched for are relatively slow. If the user and password fields are used to create a table, the speed will be faster. create a table independently for other fields of the user. this is only an example.

Split data into multiple tables with the same structure.

Level. It means dividing by record. A database has million user records. The processing speed is relatively slow. In this case, we can divide million records into five parts. Each part is million records, which are placed on different machines.

Horizontal table sharding:

A table with a large data volume and frequent access is pre-estimated and divided into several tables. This is a table with a very poor estimation. The table posted in the Forum is, after a long time, this table must be large, with hundreds of thousands or even millions of users. The chat room information table contains dozens of people chatting for one night. After a long time, the data in this table must be large. There are many situations like this. Therefore, for this big data table that can be estimated, We will separate N tables in advance. The N value depends on the actual situation. Take the chat info table as an example:

We will first create 100 such tables, message_00, message_01, message_02 .......... Message_98, message_99. Then, based on the user ID, determine which table the user's chat information is stored in. You can obtain it by using the remainder method.

3. In actual application:

Vertical table sharding and horizontal table sharding must be used in combination. If a database has million users, you can consider vertical split before horizontal split.

Other fields are first split into user_info tables. The user master table only contains key fields such as user ID, password, and user name.

After horizontal split, the user and user information tables are divided into multiple tables with the same structure.

Next, let's take a look at how MYSQL operates when storing data in table shards:

1. Simple MySQL master-slave replication:

The master-slave replication of MySQL solves the read/write splitting of the database and improves the read performance. The figure is as follows:

The master-slave replication process is shown in:

However, master-slave replication also brings about a series of performance bottlenecks:

1. Writing cannot be extended

2. Write cannot be cached

3. Replication latency

4. Increased lock table Rate

5. The table grows and the cache rate decreases.

Then the problem has to be solved, which leads to the following optimization solution. Let's take a look.

2. MySQL vertical partitioning

If the business is cut independently enough, it would be a good solution to put data of different businesses on different database servers, in addition, if one of the services crashes, it will not affect the normal operation of other services, and also play the role of load distribution, greatly improving the database throughput. The database architecture diagram after vertical partitioning is as follows:

However, although the business is already independent enough, some businesses are more or less associated. For example, users are basically associated with each business. In addition, this Partitioning Method, it cannot solve the problem of soaring data volume in a single table, so why not try horizontal segmentation?

3. MySQL horizontal Sharding)

This is a good idea. Users are grouped by certain rules (by id hash), and the data of this group of users is stored in a database Shard, that is, a sharding, in this way, as the number of users increases, you only need to simply configure a server. The schematic diagram is as follows:

How to determine the shard where a user is located? You can create a data table corresponding to the user and shard. Each request first looks for the user's shard id from this table, query related data from the corresponding shard, as shown in:

Single Database, single table

Single Database, single table is the most common database design. For example, if a user table is stored in the database, all users can find it in the user table in the database.

Single Database, multiple tables

As the number of users increases, the data volume of the user table will increase. When the data volume reaches a certain level, the query of the user table will gradually slow down, thus affecting the performance of the entire DB. If mysql is used, another more serious problem is that when a column needs to be added, mysql locks the table, during which all the read/write operations can only wait.

Users can be horizontally split in some way to generate two tables with identical structures, such as user_0000 and user_0001. user_0000 + user_0001 +... The data is just a complete data.

Multi-database, multi-table

As the data volume increases, the storage space of a single database may be insufficient. As the query volume increases, a single database server cannot support it. In this case, you can further differentiate the database horizontally.

Database/table sharding rules

When designing a table, you must determine the database/table sharding rules for the table. For example, when a new user exists, the program must determine the table to which the user information is added. Similarly, when logging on, we must find the corresponding records in the database through the user account, all of these operations must follow a certain rule.

Routing

Find the corresponding table and database through the database/table sharding rule. For example, the database/table sharding rule is user_id mod 4. When a user registers a new account, the account id is 123, we can use id mod 4 to confirm that this account should be saved to the User_0003 table. When logging on to user 123, we use 123 mod 4 and confirm the record in User_0003.

Issues arising from database/table sharding and precautions

1. database/table sharding

If you have purchased a product, you need to save the transaction records. If you want to store the transaction records in the same table according to the user's latitude, therefore, it is very convenient to find the purchase status of a user, but the purchase status of a product is likely to be distributed in multiple tables, which is troublesome to find. On the contrary, you can easily find the purchase status of the product by table sharding by item dimension, but it is troublesome to find the transaction records of the buyer.

Therefore, common solutions include:

A. This method is basically impossible to solve through table scanning, and the efficiency is too low.

B. Record two data copies, one table sharding by user latitude and one table sharding by item dimension.

C. It can be solved through search engines. However, if real-time requirements are high, real-time search is required.

2. Joint query Problems

Joint query is basically impossible, because the associated tables may not be in the same database.

3. Avoid cross-database transactions

Avoid modifying the table in db1 when modifying the table in db0 in a transaction. One is that the operation is more complicated and the efficiency will also be affected.

4. Try to put the same group of data on the same DB server.

For example, if both the product and transaction information of seller a are stored in db0, when db1 fails, the items related to seller a can be used normally. That is to say, to prevent the data in the database from being dependent on the data in another database.

One master, multiple slave

In practical applications, the majority of cases are reading much larger than writing. Mysql provides a read/write splitting mechanism. All write operations must correspond to the Master. Read operations can be performed on the Master and Slave machines. The Slave and Master structures are identical, A Master can have multiple Slave instances or even Slave instances. This method can effectively improve the QPS of the DB cluster.

All write operations are performed on the Master and then synchronously updated to the Slave. Therefore, synchronization from the Master to the Slave has a certain delay. When the system is busy, the latency problem is more serious, and the increase in the number of Slave machines will also make the problem more serious.

In addition, it can be seen that the Master node is the bottleneck of the Cluster. When there are too many write operations, it will seriously affect the stability of the Master node. If the Master node fails, the entire cluster will not work normally.

Therefore, 1. When the read pressure is very high, you can consider adding Slave machine fraction to solve the problem, but when the Slave machine reaches a certain number, you must consider database sharding. 2. When writing pressure is high, database sharding must be performed.

Why database/table sharding for MySQL?

It can be used in MySQL. As long as the data volume is large, a problem will occur immediately.

Here we reference a question: why do we need to split databases and tables? Can't MySQL process large tables?

In my project, the physical file size of a single table is more than 80 GB, and the number of records in a single table is more than 0.5 billion.

It is a very nuclear table: Friend relationship table.

However, this method is not the best method, because there are also many problems in file systems such as the Ext3 file system for processing larger files.

At this level, the xfs file system can be used for replacement. However, when a single MySQL table is too large, one problem cannot be solved: The operation base related to table structure adjustment.

This is not possible. Therefore, the application of database/table sharding is monitored for all major items in use.

From the perspective of Innodb itself, there are only two locks on the Btree of the data file, namely the leaf node lock and the sub-node lock. You can know when page splitting or adding

When a leaf is added, data cannot be written to the table.

Therefore, database/table sharding is a good choice.

How many database/table shards are suitable?

According to the test, there are 10 million records in a single table, and the write and read performance is good. In this way, when the buffer is reserved, all data fonts in a single table are maintained in

The number of records is less than 8 million, and the number of tables with orders is less than 5 million.

If you plan based on 100 database and 100 table, such as user business:

5 million * 100*100 = 500000 million = 500 billion records.

There is a number in mind, and it is easier to plan by business.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More