MySQL Database and Table Sharding Scheme
1. Why split tables:
When a table reaches tens of millions of rows, queries take much longer, and if the query involves joins, it can become unbearably slow. The purpose of splitting tables is to reduce the load on the database and shorten query time.
MySQL uses table locks and row locks to guarantee data integrity. A table lock means no one else can operate on the table until the holder has finished with it. A row lock works the same way at row level: other SQL statements must wait until the holder is done with that row before they can modify it.
2. MySQL proxy: Amoeba
Amoeba can be used to build a MySQL cluster.
From the perspective of the Java program on top, there is no need to know which server is the master and which is the slave: the master-slave setup is transparent to the upper layer. This is configured through Amoeba.
3. Split large, frequently accessed tables into several tables
For example, the company table in a website platform's database holds a very large amount of data. Because the data volume can be estimated in advance, we split it into N tables from the start; the value of N depends on the actual situation.
Suppose the site's current data volume has a known upper bound and each table is designed to hold one tenth of it; the data is then split into ten tables.
So how can you tell whether a table is full? Before inserting, the program can count the records in the current table. If the count is below the threshold (500 in this example), insert directly; once the threshold is reached, create a new table in the program (or use one created beforehand) and then perform the insert.
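A minimal sketch of the count-before-insert approach, using Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere. The table names company_0, company_1, ... and the insert_company helper are hypothetical; only the idea of a per-table threshold comes from the text.

```python
import sqlite3

THRESHOLD = 500  # per-table row limit from the example above


def insert_company(conn: sqlite3.Connection, name: str, threshold: int = THRESHOLD) -> str:
    """Insert a row into the first company_N table that is not yet full.

    Tables are created on demand when the previous one reaches the
    threshold. Returns the name of the table that received the row.
    Note: count-then-insert is not atomic; concurrent writers would
    need a lock or transaction around this logic.
    """
    n = 0
    while True:
        table = f"company_{n}"  # generated internally, never from user input
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, name TEXT)"
        )
        (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        if count < threshold:
            conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
            conn.commit()
            return table
        n += 1  # this table is full; try the next one


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    used = [insert_company(conn, f"company-{i}", threshold=2) for i in range(5)]
    print(used)  # ['company_0', 'company_0', 'company_1', 'company_1', 'company_2']
```

Re-counting from company_0 on every insert keeps the sketch short; a real program would remember the index of the current table instead of re-scanning full ones.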
4. Using the MERGE storage engine to split tables
Splitting an existing large table is painful, and the most painful part is changing the code, because the SQL statements in the program have already been written. The MERGE storage engine fits this case better: a MERGE table presents several identical MyISAM tables as a single table, so existing SQL can keep querying one table name.
Example:
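A minimal sketch, assuming a hypothetical company table with two columns: a small Python helper (not part of any library) that generates the MySQL DDL. It creates N identical MyISAM tables plus one MERGE table over them via UNION=(...); INSERT_METHOD=LAST sends inserts on the MERGE table to the last underlying table. MERGE only works with identical MyISAM tables, so it does not apply to InnoDB, and the existing big table would first have to be renamed or migrated into the parts so the MERGE table can take over its name.

```python
def merge_ddl(base: str, columns: str, parts: int) -> list[str]:
    """Return DDL for `parts` identical MyISAM tables plus a MERGE table named `base`."""
    names = [f"{base}_{i}" for i in range(parts)]
    # Every underlying table must have exactly the same columns and indexes.
    stmts = [f"CREATE TABLE {name} ({columns}) ENGINE=MyISAM;" for name in names]
    # The MERGE table presents the parts as one table; INSERT_METHOD=LAST
    # routes INSERTs on the MERGE table to the last table in the UNION list.
    union = ",".join(names)
    stmts.append(
        f"CREATE TABLE {base} ({columns}) ENGINE=MERGE UNION=({union}) INSERT_METHOD=LAST;"
    )
    return stmts


if __name__ == "__main__":
    for stmt in merge_ddl("company", "id INT NOT NULL, name VARCHAR(64)", 3):
        print(stmt)
```

Application queries such as SELECT ... FROM company then read across all parts without code changes.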
---------------------------------------------
Database Architecture
1. Simple MySQL master-slave replication:
MySQL master-slave replication provides read-write separation and considerably improves read performance, as shown in the figure below:
The master-slave replication process is as follows:
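In MySQL replication, the master records each change in its binary log, an I/O thread on the slave copies those events into a local relay log, and an SQL thread on the slave replays them. The toy model below is plain Python, not MySQL code (class and method names are hypothetical), but it makes those three steps, and the window during which the slave is stale, explicit.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)  # ordered change events

    def write(self, key, value):
        self.data[key] = value
        self.binlog.append((key, value))  # every change is logged


@dataclass
class Replica:
    master: Master
    relay_log: list = field(default_factory=list)
    data: dict = field(default_factory=dict)
    fetched: int = 0  # binlog position copied so far (I/O thread)
    applied: int = 0  # relay-log position replayed so far (SQL thread)

    def io_thread_step(self):
        """Copy new binlog events from the master into the relay log."""
        new_events = self.master.binlog[self.fetched:]
        self.relay_log.extend(new_events)
        self.fetched += len(new_events)

    def sql_thread_step(self):
        """Replay pending relay-log events against the replica's data."""
        for key, value in self.relay_log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.relay_log)


if __name__ == "__main__":
    m = Master()
    r = Replica(m)
    m.write("user:1", "alice")
    print(r.data.get("user:1"))  # None: not replicated yet (replication delay)
    r.io_thread_step()
    r.sql_thread_step()
    print(r.data.get("user:1"))  # alice
```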
However, master-slave replication also introduces several other performance bottlenecks:
1. Writes cannot be scaled out
2. Writes cannot be cached
3. Replication delay
4. The table-lock rate rises
5. Tables grow larger and the cache hit rate drops
These problems have to be solved, which leads to the following optimization schemes.
2. MySQL vertical partitioning
If the businesses are independent enough, it is a good idea to put the data of different businesses on different database servers. If one business crashes, the others keep running normally; this also spreads the load and greatly improves database throughput. The database architecture after vertical partitioning is shown below:
However, even when businesses are fairly independent, some of them are still more or less connected; users, for example, are associated with essentially every business. Moreover, this kind of partitioning does not solve the problem of a single table's data exploding, so why not try splitting horizontally?
3. MySQL horizontal sharding
This is a very good idea: group users by some rule (for example, a hash of the user ID) and store each group's data in one database shard. As the number of users grows, you only need to configure an additional server. The schematic is as follows:
To determine which shard a user belongs to, you can build a table that maps users to shards. Each request first looks up the user's shard ID in this table and then queries the relevant data from the corresponding shard, as shown below:
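A minimal sketch of the lookup-table approach, assuming in-memory dicts stand in for both the mapping table and the shard databases. ShardDirectory, save_profile and get_profile are hypothetical names; new users are placed by hashing the ID with zlib.crc32, which is stable across processes (unlike Python's salted string hash).

```python
import zlib


class ShardDirectory:
    """Hypothetical user -> shard mapping table, consulted on every request."""

    def __init__(self, num_shards: int):
        self.num_shards = num_shards
        self.user_to_shard: dict[int, int] = {}

    def assign(self, user_id: int) -> int:
        """Place a new user by hashing the ID and record the choice."""
        if user_id not in self.user_to_shard:
            self.user_to_shard[user_id] = zlib.crc32(str(user_id).encode()) % self.num_shards
        return self.user_to_shard[user_id]

    def lookup(self, user_id: int) -> int:
        """Return the shard ID stored for an existing user."""
        return self.user_to_shard[user_id]


def save_profile(directory: ShardDirectory, shards: list[dict], user_id: int, profile: dict) -> None:
    shards[directory.assign(user_id)][user_id] = profile


def get_profile(directory: ShardDirectory, shards: list[dict], user_id: int) -> dict:
    shard_id = directory.lookup(user_id)  # step 1: consult the mapping table
    return shards[shard_id][user_id]      # step 2: query only that shard
```

Because the mapping is stored rather than recomputed, a user can later be moved to another shard by updating one row, at the cost of an extra lookup per request (usually cached).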
Single database, single table
Single database, single table is the most common database design. For example, a user table is placed in database db, and every user can be found in the user table of db.
Single database, multiple tables
As the number of users grows, the user table becomes larger; once the data reaches a certain volume, queries against it slow down and affect the performance of the entire DB. With MySQL, a more serious problem is that adding a column locks the table, so all reads and writes must wait.
The user table can be split horizontally in some way into tables with exactly the same structure, such as user_0000, user_0001, and so on; user_0000 + user_0001 + ... together make up the complete data set.
Multiple databases, multiple tables
As the data grows, a single DB may run out of storage space, and as the query volume grows, a single database server can no longer keep up. At this point the databases themselves can be split horizontally.
Sharding rules
When designing a table, you need to decide what rule will be used to split it across databases and tables. For example, when a new user registers, the program must decide which table to add the user's information to; likewise, at login we must find the corresponding record through the user's account. All of this has to follow a consistent rule.
Routing
Routing is the process of locating the right database and table by applying the sharding rule. For example, if the rule is user_id mod 4, then when a user registers a new account with ID 123, computing 123 mod 4 = 3 tells us the account should be saved in table user_0003. When user 123 logs in, the same calculation, 123 mod 4, tells us the record is in user_0003.
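The mod-4 rule can be sketched in a few lines of Python. The function names are hypothetical, and route_db_table shows one assumed way of extending the rule to multiple databases (a global table index split into consecutive blocks per database); it is not the only scheme.

```python
def route(user_id: int, tables: int = 4) -> str:
    """Table-only routing: user_id mod `tables`, zero-padded like user_0003."""
    return f"user_{user_id % tables:04d}"


def route_db_table(user_id: int, dbs: int, tables_per_db: int) -> tuple[str, str]:
    """Assumed multi-database variant: user_id mod (total tables) gives a
    global table index; consecutive indexes live in the same database."""
    index = user_id % (dbs * tables_per_db)
    return f"db_{index // tables_per_db}", f"user_{index:04d}"


if __name__ == "__main__":
    print(route(123))  # user_0003, since 123 mod 4 = 3
    print(route_db_table(123, dbs=2, tables_per_db=2))  # ('db_1', 'user_0003')
```

The same function runs at registration and at login, which is what guarantees a user is always looked up where they were stored.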
Problems introduced by sharding, and points to watch
1. Choosing the sharding dimension
Suppose users buy products and we need to save the transaction records. If we shard by user, each user's transactions are stored in the same table, so it is easy to find what a given user has bought; but the purchases of a given product are likely spread across several tables, which makes them harder to find. Conversely, if we shard by product, it is easy to find all purchases of that product, but retrieving a buyer's transaction history becomes harder.
The common solutions are:
A. Scan all the tables: this is inefficient and largely impractical.
B. Store the data twice: one copy sharded by user and one copy sharded by product.
C. Use a search engine; however, if real-time requirements are very strict, real-time search is needed as well.
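Option B can be sketched as a dual write. In the toy version below, two dicts of lists stand in for the two sets of sharded tables; NUM_TABLES, record_trade and the query helpers are hypothetical. In a real system the two writes usually land in different databases, so keeping the copies consistent (for example with asynchronous sync and reconciliation) is the main cost of this approach.

```python
from collections import defaultdict

NUM_TABLES = 4  # hypothetical number of tables per dimension

# Stand-ins for two sets of sharded tables (by buyer and by product).
by_user = defaultdict(list)
by_item = defaultdict(list)


def record_trade(user_id: int, item_id: int, amount: int) -> None:
    trade = {"user_id": user_id, "item_id": item_id, "amount": amount}
    # Copy 1: sharded by buyer, so one user's trades share a table.
    by_user[user_id % NUM_TABLES].append(trade)
    # Copy 2: sharded by product, so one product's trades share a table.
    by_item[item_id % NUM_TABLES].append(trade)


def trades_of_user(user_id: int) -> list[dict]:
    """Touches a single table in the user-sharded copy."""
    return [t for t in by_user[user_id % NUM_TABLES] if t["user_id"] == user_id]


def trades_of_item(item_id: int) -> list[dict]:
    """Touches a single table in the product-sharded copy."""
    return [t for t in by_item[item_id % NUM_TABLES] if t["item_id"] == item_id]
```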
2. Problems with join queries
Join queries are basically impossible, because the tables to be joined may not be in the same database.
3. Avoid cross-database transactions
Avoid modifying tables in db0 and tables in db1 within the same transaction: it is more complex to implement and also has some impact on efficiency.
4. Try to keep related data on the same DB server
For example, put seller A's products and transaction records in db0; then, if db1 goes down, everything related to seller A still works normally. In other words, the data in one database should not depend on data in another database.
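One way to achieve this, sketched under the assumption of two databases named db0 and db1: route both products and transactions by the same key, seller_id, so a seller's rows always land together. The helper names are hypothetical.

```python
NUM_DBS = 2  # hypothetical: db0 and db1


def db_for_seller(seller_id: int) -> str:
    """Single routing key for everything a seller owns."""
    return f"db{seller_id % NUM_DBS}"


def db_for_goods(goods: dict) -> str:
    return db_for_seller(goods["seller_id"])  # products follow their seller


def db_for_transaction(txn: dict) -> str:
    return db_for_seller(txn["seller_id"])  # so do the seller's transactions
```

Because both kinds of record share a shard key, queries joining a seller's products and transactions stay within one database and never need a cross-database transaction.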
One master, multiple slaves
In practice, reads far outnumber writes in the vast majority of applications. MySQL provides a mechanism for read-write separation: all writes must go to the master, while reads can be served by both the master and the slaves. Each slave has exactly the same structure as the master, a master can have multiple slaves, and slaves can even have slaves of their own. This effectively increases the QPS of the DB cluster.
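A minimal sketch of read-write splitting done in application code; the class name and host names are hypothetical, and classifying statements by their first keyword is a simplification. In practice this is often handled by a proxy such as Amoeba, and reads that must see a just-committed write (or SELECT ... FOR UPDATE) should also go to the master because of replication delay.

```python
import itertools


class ReadWriteRouter:
    """Route writes to the master and spread reads over the slaves round-robin."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master: str, slaves: list[str]):
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self._read_targets = itertools.cycle(slaves or [master])

    def target(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS:
            return self.master  # all writes go to the master
        return next(self._read_targets)  # reads rotate across the slaves


if __name__ == "__main__":
    router = ReadWriteRouter("master-db", ["slave-1", "slave-2"])
    for sql in ["INSERT INTO user VALUES (1)", "SELECT * FROM user", "SELECT 1"]:
        print(sql, "->", router.target(sql))
```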
All writes are applied on the master first and then propagated to the slaves, so synchronization from master to slave has a certain delay. When the system is very busy, the delay gets worse, and adding more slaves makes the problem worse as well.
In addition, the master is clearly the bottleneck of the cluster: too many writes can seriously affect its stability, and if the master goes down, the entire cluster stops working properly.
Therefore:
1. When read pressure is very high, consider adding slaves to spread the load; once the number of slaves reaches a certain level, splitting the database should be considered.
2. When write pressure is very high, the database must be split.
---------------------------------------------
Why split tables in MySQL?
Almost wherever MySQL is used, as soon as the data volume grows large you immediately face the need to split databases and tables.
Why split tables at all? Can't MySQL handle a big table?
Yes, it can handle large tables. I have worked on a project where a single table's physical file exceeded 80 GB and held more than 500 million records, and it was a core table: a friend-relationship table.
But this is not the best approach. Very large files also cause many file-system problems, for example on ext3; at this scale you can switch to the XFS file system. However, one problem with an oversized MySQL table is hard to solve: operations that adjust the table structure become practically impossible. That is why large projects in production inevitably end up splitting databases and tables.
Within InnoDB itself, the B-tree on the data file has only two kinds of locks, leaf-node locks and child-node locks; as you can imagine, when a page split occurs or a new leaf is added, writes to the table are blocked.
So splitting databases and tables is the better choice.
So how much splitting is appropriate?
In testing, a single table with 10 million records still had fairly good read and write performance. Leaving some headroom, keep a single table whose fields are all numeric below 8 million records, and a single table containing character fields below 5 million.
If you plan for 100 databases with 100 tables each, for example for the user business:
5 million * 100 * 100 = 50 billion records.
With this number in mind, planning according to the business becomes relatively easy.
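The same arithmetic as two small helpers (hypothetical names), which can also be run in reverse to estimate how many tables a target data volume needs at a given per-table limit:

```python
def total_capacity(rows_per_table: int, dbs: int, tables_per_db: int) -> int:
    """Rows the whole layout can hold if every table stays at its limit."""
    return rows_per_table * dbs * tables_per_db


def tables_needed(total_rows: int, rows_per_table: int) -> int:
    """Smallest number of tables that keeps each table within its limit."""
    return -(-total_rows // rows_per_table)  # integer ceiling division


if __name__ == "__main__":
    print(total_capacity(5_000_000, 100, 100))  # 50000000000, i.e. 50 billion
```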