MySQL data splitting: a summary of database/table sharding
Data storage evolution 1: single database, single table
A single database with a single table is the most common database design. For example, the user table is stored in one database, and every user can be found in that one table.
Data storage evolution 2: single database, multiple tables
As the number of users grows, so does the data volume of the user table. Once the data volume reaches a certain level, queries on the user table gradually slow down and drag down the performance of the entire DB. With MySQL there is a more serious problem: adding a column can lock the table, and all read/write operations have to wait until the change finishes.
The user table can be split horizontally by some rule into several tables with identical structure, such as user_0000, user_0001, and so on. Taken together, user_0000 + user_0001 + ... hold exactly the complete data set.
Data storage evolution 3: multi-database, multi-table
As the data volume increases, the storage space of a single database may become insufficient, and as the query volume increases, a single database server can no longer carry the load. In this case, you can go one step further and split the data horizontally across multiple databases.
MySQL database/table sharding rules
When designing a table, you must decide its database/table sharding rule. For example, when a new user registers, the program must determine which table the user's information is written to. Likewise, when a user logs on, the program must find the corresponding record through the user account. All of these operations must follow the same rule.
Routing
Routing is the process of finding the corresponding table and database through the database/table sharding rule. For example, suppose the rule is user_id mod 4. When a user registers a new account with id 123, 123 mod 4 = 3, so the account is saved to the user_0003 table. When user 123 logs on, we again compute 123 mod 4 and look up the record in user_0003.
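The routing rule above can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the four-digit zero-padded suffix seen in names like user_0000, and the names route_user_table and TABLE_COUNT are made up for illustration.

```python
TABLE_COUNT = 4  # assumption: the "user_id mod 4" rule from the text


def route_user_table(user_id: int, table_count: int = TABLE_COUNT) -> str:
    """Return the name of the shard table that holds this user."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    shard = user_id % table_count
    # Zero-pad to four digits to match names such as user_0000.
    return f"user_{shard:04d}"


# Registration and login both call the same function,
# so they always agree on where account 123 lives.
print(route_user_table(123))  # user_0003
```

Keeping the rule in one function matters more than the rule itself: every read and write path has to compute the same table name.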
Issues arising from database/table sharding, and precautions
1. Sharding dimension problems
Suppose a user buys a product and the transaction record must be saved. If transaction records are sharded by the user dimension, it is very convenient to look up what a given user has bought, but the purchase records of a given product are likely to be spread across multiple tables and are troublesome to find. Conversely, if you shard by the item dimension, it is easy to find the purchase records of a product, but troublesome to find the transaction records of a buyer.
Therefore, common solutions include:
A. Scan all the tables. This is basically not feasible, because the efficiency is too low.
B. Record two copies of the data, one sharded by the user dimension and one sharded by the item dimension.
C. Use a search engine. However, if real-time requirements are high, the search itself must also be real-time.
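Option B can be illustrated with a small sketch. It assumes a mod-4 rule for both dimensions and uses in-memory lists in place of real shard tables. The prefixes trade_by_user and trade_by_item and all function names are hypothetical.

```python
from collections import defaultdict

SHARDS = 4  # assumption: the same mod-4 rule for both dimensions

# Stand-ins for real tables: table name -> list of rows.
tables = defaultdict(list)


def table_for(prefix: str, key: int) -> str:
    return f"{prefix}_{key % SHARDS:04d}"


def save_trade(user_id: int, item_id: int, amount: int) -> None:
    """Write the same record twice, once per sharding dimension."""
    row = {"user_id": user_id, "item_id": item_id, "amount": amount}
    tables[table_for("trade_by_user", user_id)].append(row)
    tables[table_for("trade_by_item", item_id)].append(row)


def trades_of_user(user_id: int) -> list:
    # Only one table has to be read for a given buyer.
    rows = tables[table_for("trade_by_user", user_id)]
    return [r for r in rows if r["user_id"] == user_id]


def trades_of_item(item_id: int) -> list:
    # Only one table has to be read for a given item.
    rows = tables[table_for("trade_by_item", item_id)]
    return [r for r in rows if r["item_id"] == item_id]


save_trade(user_id=123, item_id=10, amount=5)
save_trade(user_id=124, item_id=10, amount=7)
print(len(trades_of_user(123)), len(trades_of_item(10)))  # 1 2
```

The price is double storage and two writes per trade. If the two copies live in different databases, the two writes are not atomic, so the application has to handle the case where only one of them succeeds.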
2. Join query problems
Join queries are basically impossible, because the associated tables may not be in the same database.
3. Avoid cross-database transactions
Avoid modifying a table in db1 while modifying a table in db0 within the same transaction. For one thing the operation is more complicated, and for another the efficiency suffers.
4. Try to put the same group of data on the same DB server.
For example, if both the product and transaction information of seller a are stored in db0, then when db1 fails, everything related to seller a can still be used normally. In other words, avoid making the data in one database depend on the data in another database.
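One way to get this property is to choose the database from the seller id alone, for every table that belongs to a seller. The sketch below assumes two databases named db0 and db1 as in the example. The function names and the seller id 42 are hypothetical.

```python
DB_COUNT = 2  # assumption: two databases, db0 and db1


def db_for_seller(seller_id: int) -> str:
    """Pick the database from the seller id only."""
    return f"db{seller_id % DB_COUNT}"


def locate(table: str, seller_id: int) -> tuple:
    # Every seller-owned table uses the same key, so the product and
    # transaction rows of one seller always share a database.
    return (db_for_seller(seller_id), table)


seller_a = 42  # hypothetical seller id
print(locate("product", seller_a))      # ('db0', 'product')
print(locate("transaction", seller_a))  # ('db0', 'transaction')
```

Because both lookups resolve to db0, a failure of db1 does not touch seller a, and a transaction that changes both tables stays inside one database.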
One master, multiple slaves
In practical applications, reads usually far outnumber writes. MySQL replication makes read/write splitting possible: all write operations must go to the Master, while read operations can be performed on either the Master or a Slave. A Slave has the same structure as the Master. A Master can have multiple Slaves, and a Slave can even have Slaves of its own. This approach can effectively improve the QPS of the DB cluster.
All write operations are performed on the Master first and then replicated to the Slaves, so synchronization from the Master to the Slaves has a certain delay. When the system is busy the delay gets worse, and adding more Slaves makes it worse still.
In addition, the Master is the bottleneck of the cluster. Too many write operations will seriously affect the stability of the Master, and if the Master fails, the entire cluster cannot work normally.
Therefore:
1. When read pressure is very high, consider adding Slaves to solve the problem, but once the number of Slaves reaches a certain point, you must consider database sharding.
2. When write pressure is high, database sharding is required.
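The read/write splitting described in this section can be sketched at the application level as follows. This is a simplified illustration, not how any particular MySQL proxy or driver works: the class name, the server names, and the rule "SELECT goes to a Slave, everything else goes to the Master" are all assumptions.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over the slaves."""

    def __init__(self, master: str, slaves: list):
        self.master = master
        # Round-robin over the slaves; fall back to the master if none.
        self._readers = itertools.cycle(slaves or [master])

    def pick(self, sql: str) -> str:
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        # Only plain SELECTs are treated as reads; anything else
        # (INSERT, UPDATE, DELETE, DDL, ...) is sent to the master.
        if verb == "SELECT":
            return next(self._readers)
        return self.master


router = ReadWriteRouter("master", ["slave1", "slave2"])
print(router.pick("INSERT INTO user_0003 VALUES (123)"))  # master
print(router.pick("SELECT * FROM user_0003"))             # slave1
print(router.pick("SELECT * FROM user_0003"))             # slave2
```

Because of the replication delay, a read that must see a row that was just written may still need to be sent to the Master. Also note that adding Slaves only spreads reads, since every write still lands on the single Master.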
Why database/table sharding for MySQL?
It is fair to say that wherever MySQL is used, once the data volume grows large you immediately run into one problem: database/table sharding.
This raises a question: why shard databases and tables at all? Can't MySQL handle large tables?
In fact, it can. In one project I worked on, the physical file of a single table was more than 80 GB, and the table held more than 500 million records.
It was also a very core table: the friend relationship table.
However, this approach is not the best one, because file systems such as Ext3 also have many problems handling very large files.
At that scale the XFS file system can be used instead. However, when a single MySQL table is too large, one problem remains unsolved: operations that adjust the table structure become basically impossible. Therefore, large projects all end up facing database/table sharding in practice.
From the perspective of InnoDB itself, there are only two locks on the B-tree of the data file, namely the leaf node lock and the sub-node lock. As you can imagine, when a page split occurs or a new leaf is added, data cannot be written to the table.
Therefore, database/table sharding is a good choice.
How many database/table shards are suitable?
According to tests, write and read performance is fairly good when a single table holds fewer than 10 million records. Leaving some buffer, keep a single table whose columns are all numeric under 8 million records, and a single table with character-type columns under 5 million.
If you plan for 100 databases with 100 tables each, then for the user business, for example:
5 million * 100 * 100 = 50,000 million = 50 billion records.
With a number like this in mind, it is easier to plan for each business.
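The same arithmetic can be written out as a short script, which makes it easy to rerun with other assumptions. The 5 million rows per table and the 100 x 100 layout come from the text; the helper tables_needed is a hypothetical addition that turns the question around.

```python
ROWS_PER_TABLE = 5_000_000   # per-table ceiling used in the text
DATABASES = 100
TABLES_PER_DATABASE = 100

total_tables = DATABASES * TABLES_PER_DATABASE
capacity = ROWS_PER_TABLE * total_tables

print(total_tables)      # 10000
print(f"{capacity:,}")   # 50,000,000,000


def tables_needed(expected_rows: int, rows_per_table: int = ROWS_PER_TABLE) -> int:
    """Smallest number of tables that keeps every table under the ceiling."""
    # Integer ceiling division, avoiding floating point.
    return -(-expected_rows // rows_per_table)


# A table of 500 million records would need 100 shards at this ceiling.
print(tables_needed(500_000_000))  # 100
```

So a 100 x 100 layout holds about 50 billion records at 5 million records per table.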