Tens of millions of mysql Data Tables, several suggestions for creating tables and Field Extensions, and several mysql
I. Overview
When designing a system, we need to consider the amount of data in the table after the system runs for a period of time, when the data volume of a certain table is very large (such as a chat message table), we need to create the table. This article will create a table, add data, and expand fields, the following suggestions are provided.
Ii. Create a table
If we need to create a chat message table in the IM project, which has a large data volume and far exceeds the write operation, we all know that mysql's common database engines mainly include innodb, myisam, the main difference between the two database engines is that innodb supports transactions and Foreign keys, and locks are row-level locks (Row-level locks only apply to primary keys, and non-primary keys also lock the entire table ), myisam does not support transactions, does not support foreign key constraints, and locks are table-level locks. From the perspective of performance, myisam is better than innodb, so in the database engine, I choose myisam, in addition, add an index to the message sending user ID and message receiving user ID.
1. Data Type Selection
Considering that the data volume is very large, when selecting the field data type, do not use strings if you can use numbers. Of course, the time type should also be replaced by bigint. text is not recommended, we recommend that you create a default value for the varchar field, for example, default '', because the use of where is null is a full table scan, and the default value must be added for the numeric type, such as num int default 0, if the default value is not added and the insert statement is executed, no value is assigned to this field. When update xxx set num = num + 1 is executed, no SQL error is reported, then the num value is not updated, and the index must be added to the field used as the condition query.
2: Table Partitioning
In the face of big data, in addition to the data type and performance, we can also use Table Partitioning. Table sharding and database sharding are not currently available. Table Partitioning Concept
2.1 Table Partitioning Concept
Range partition: multiple rows are allocated to the partition based on the column values in a given continuous interval.
List partitions: similar to range partitions, the difference is that list partitions are selected based on column values matching a value in a discrete value set.
Hash partition: select a partition based on the return value of a user-defined expression. This expression uses the column values of the rows to be inserted into the table for calculation.
KEY partitioning: similar to HASH partitioning, the difference is that KEY partitioning only supports computing one or more columns, and the MySQL server provides its own HASH function. One or more columns must contain> integer values.
You can use show variables like '% partition %'; to determine the partition type supported by mysql.
Now I use range partition. The partition field is pk. The complete SQL statement is as follows:
CREATE TABLE chatmsg(cid bigint primary key,cMsgSendUserId bigint,cMsgReceiverUserId bigint,cTime bigint,cContent varchar(2000) not null default '',cExt varchar(5000)) ENGINE=MYISAM DEFAULT CHARSET=utf8 COLLATE=utf8_binPARTITION BY RANGE (cid) (PARTITION p0 VALUES LESS THAN (1000000),PARTITION p1 VALUES LESS THAN (5000000),PARTITION p2 VALUES LESS THAN (1000000),PARTITION p3 VALUES LESS THAN MAXVALUE) ;create index senduserid_index on chatmsg(cMsgSendUserId);create index receiverid_index on chatmsg(cMsgReceiverUserId);create index ctime_index on chatmsg(ctime);
3. Add chat records.
The table creation statement shows that foreign keys are not used. Therefore, you need to manually check the integrity of foreign key constraints.
Select count (1) from user where uid = message sender idunion all select count (1) from user where uid = message receiver id
When the returned result of the preceding statement is equal to 2, The add statement can be executed. For more information about optimizing query statements, see my article: how to optimize millions of data records
Iv. Extended Fields
If the table has generated tens of millions of data records, the product manager said, Mr. Wang, the chat record needs to add a read or unread status. If you want to use the environment to alter tableadd column, it can be imagined that this operation is time-consuming. It is possible that the database may crash directly. If the data volume is too large, the database for the alter tableadd column operation has actually crashed. It's not an armist. I still remember that when I was creating a table, we have created a cExt field, which records a json string. In fact, we need to add a version number for the correct method. I didn't add a version number here. The data in the table is as follows:
select cid,cTime,cContent,cext from chatmsg where cMsgSendUserId = 100 and cMsgReceiverUserId = 200union ALLselect cid,cTime,cContent,cext from chatmsg where cMsgSendUserId = 200 and cMsgReceiverUserId = 100
This method can solve most of the extended fields. After cext is queried, convert the value to an object. If the newly added field needs to appear in the where clause, it needs to be analyzed according to the actual situation.
Advantages of cext extended fields:
(1) attributes can be dynamically expanded at any time
(2) the old and new types of data can exist at the same time.
(3) easy data migration: Write a small program to change the old ext version to the new ext version, and modify the version
Insufficient cext extension fields:
(1) fields in cext cannot be indexed.
(2) the key values in cext have a large number of redundancy. We recommend that the key values be shorter.
V. Others
For example, in the early stage of the project, the product manager said, "Mr. Smith, I select any two users. to query the chat records of these two users, I need to return the nicknames of these two users. The product manager Selects two users, we get the IDs of these two users. If we directly join the chat table to the user table, the performance is equally poor. In this case, we can consider using space for time, for example, create a nickname for the receiver and the sender in the chat table. This method means that join is not recommended for big data tables, and the performance is not good. You should use other methods to solve this problem. Of course, in the formal project, the specific situation also needs to be analyzed.
I will also add some ideas. If there is something wrong with the description, I hope to point it out. Thank you. You are welcome to express your thoughts and make common progress.