P: Today, let's talk about data sharding.
S: Sure.
P: Sometimes the data volume grows too large for a single node to support. In that case, we need to split the database.
S: Well, yes. Given hardware costs, we can't shard everything in one shot. Generally, sharding is expanded gradually as data volume grows.
P: Yes, so when choosing a sharding scheme we need to plan for future expansion up front.
S: For example, say we shard by USERID.
There are two methods:
1) use a lookup table in the DB
2) use a HASH algorithm
P: Briefly describe them one by one.
S: Implemented with an RDB lookup table:
record the DBID corresponding to each USERID. At startup, the application loads all the mappings into memcache;
when user data is accessed, look up the DBID for the USERID, then go to that database.
P: Is there a concrete table design? Can you spell it out?
S: Well.
create table dbinfo (dbid int, ip varchar(20), port varchar(20)) comment 'DB info table';
create table user_db (userid int, dbid int) comment 'USER-to-DB mapping table';
create table user_data (userid int, userdata varchar(3000)) comment 'user data table';
P: When a user logs in, how do we find the corresponding database?
S: Look it up in memcache first. On a miss, query the database: select dbid from user_db where userid = ?;
once we have the DBID, we have the data source.
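The cache-aside lookup just described can be sketched as below. This is a minimal illustration, not the actual implementation; `cache` and `query_user_db` are hypothetical interfaces standing in for a memcache client and the `select dbid from user_db where userid = ?` query.

```python
class DbRouter:
    def __init__(self, cache, query_user_db):
        self.cache = cache                  # assumed cache client with get/set
        self.query_user_db = query_user_db  # assumed: runs the user_db lookup query

    def dbid_for(self, userid):
        """Return the DBID for a user: memcache first, then user_db on a miss."""
        key = "user_db:%d" % userid
        dbid = self.cache.get(key)
        if dbid is None:
            # Cache miss: fall back to the user_db mapping table.
            dbid = self.query_user_db(userid)
            if dbid is not None:
                self.cache.set(key, dbid)   # repopulate the cache for next time
        return dbid
```

The second access for the same user is then served entirely from memcache, which is the point of loading the mapping table into cache at startup.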
P: When a new user arrives and there are multiple databases, what rule decides the assignment?
S: Even distribution is the simplest rule.
P: What about scaling up? Say I had two databases and add a third. If new users are still evenly distributed, won't the first two databases still be overloaded once they reach a certain size?
S: We can add a weight field to the dbinfo table. Give a new database a larger weight, so most new users go there. If an old database should take no new users at all, set its weight to 0.
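The weight-based assignment could look like the sketch below. It's an illustration under the assumption that `dbinfo` rows are loaded as `(dbid, weight)` pairs; the function name and signature are hypothetical.

```python
import random

def pick_dbid(dbinfo, rng=random.random):
    """Choose a dbid with probability proportional to its weight.

    dbinfo: list of (dbid, weight) pairs mirroring the dbinfo table
    with its added weight column; weight 0 means "no new users here".
    """
    total = sum(w for _, w in dbinfo)
    if total <= 0:
        raise ValueError("no database is accepting new users")
    r = rng() * total
    for dbid, weight in dbinfo:
        r -= weight
        if r < 0:
            return dbid
    return dbinfo[-1][0]  # guard against floating-point edge cases

# Old databases frozen (weight 0); the new database takes all new users:
dbinfo = [(1, 0), (2, 0), (3, 10)]
```

With the weights above, every new user lands on database 3, which is exactly the "old databases stop taking new users" policy from the dialogue.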
P: And if an old database takes no new users, but its existing users' data keeps growing until a single node can no longer support it, what then?
S: There are two methods:
1) migrate some users' data to other databases and update the mapping in the user_db table;
such migration has to be done by the application user by user, so it is slow;
2) copy and split at the database level:
copy the database, so you have two identical copies of the data;
then the application only needs to switch the DB mapping and refresh the corresponding memcache entries, which is much faster.
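The application-side step of the copy-and-split approach can be sketched as follows, under the assumption that the database copy itself has already been made. `update_user_db` and `cache` are hypothetical interfaces standing in for an `update user_db set dbid = ? where userid = ?` statement and a memcache client.

```python
def split_users(userids, new_dbid, update_user_db, cache):
    """Repoint a batch of users at the new database copy.

    For each user: rewrite the user_db mapping, then invalidate the
    cached entry so the next read repopulates from user_db.
    """
    moved = []
    for uid in userids:
        update_user_db(uid, new_dbid)        # assumed: runs the UPDATE statement
        cache.delete("user_db:%d" % uid)     # drop the stale cached mapping
        moved.append(uid)
    return moved
```

Because the data already exists in both copies, this loop only touches the small mapping table and the cache, which is why it is much faster than moving user data row by row.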
P: Do you have other sharding ideas?
S: The second idea applies a HASH algorithm at the DB level; migration and scaling are fast there too.
P: Well.
S: Suppose an application where each USER's data volume is roughly the same, with no single USER holding an outsized share. Then we can plan by user count: say our evaluation shows that 1024 servers would satisfy this application (I have never seen a MySQL application that large).
P: Well.
S: First, each MACHINE runs a single MySQL instance;
S: we can directly create all 1024 databases up front,
db0001-db1024; at the start, when users are few, all 1024 databases fit on MACHINE1.
P: What do we do when the data gets big?
S: Split.
When the user base grows, we migrate some of the databases on MACHINE1 over to MACHINE2;
for this we also need a table mapping HASH value ranges to machines (and their connection pools):
DB_HASH value (upper bound)    MACHINE NUM
------------------------------------------
  100                           1
  400                           2
  600                           3
 1024                           4
When a user comes in, hash the USER_ID to a database; for example, VALUE = 300 falls in machine 2's range, so the user goes to database db0300 on machine 2.
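The range-table routing above can be sketched as below. The modulo hash (`userid % 1024 + 1`) and the function name are assumptions for illustration; any stable hash onto 1..1024 would do.

```python
import bisect

# Each (upper_bound, machine) pair mirrors a row of the range table:
# hash values up to upper_bound live on that machine.
RANGES = [(100, 1), (400, 2), (600, 3), (1024, 4)]

def route(userid, ranges=RANGES, n_dbs=1024):
    """Map a userid to (hash value, database name, machine number)."""
    value = userid % n_dbs + 1                       # hash value in 1..1024
    bounds = [ub for ub, _ in ranges]
    machine = ranges[bisect.bisect_left(bounds, value)][1]
    return value, "db%04d" % value, machine
```

Splitting a range (say 100-400 into 100-200 and 201-400 when a new machine is added) only means inserting a row into `RANGES` and moving the affected databases; the routing code itself does not change.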
P: Oh, and we can split the 100-400 range into 100-200 and 201-400, right?
S: Correct.
P: And keep splitting until we reach 1024 machines.
S: This way the user counts across db0001-db1024 stay fairly uniform;
to expand, you only need to move some databases onto a new MACHINE, which is convenient.
P: And if we have already split into all 1024 databases, but the user data volume in a single database is still too large for one node to support?
S: Well, that can still happen. With data volumes that large, we can only add a second-level HASH.
P: HASH to table level?
S: Yes.
DBHASH value   TABHASH value   MACHINE_NUM
------------------------------------------
    1               600             1
    1              1024            10
  400                -              2
  600                -              3
 1024                -              4
However, this sharding method does not support migrating a single user: a user's location is fully determined by the hash, so there is no per-user mapping to change.
P: Oh, I understand. Are there any other ideas?
S: I'm sure there are better methods and ideas out there. Whenever you come across a good one, never forget to write it down.