Text: Table design and optimization (true technical text)
Optimize ①: Create normalized tables to eliminate data redundancy
Database paradigm is a database design method that ensures reasonable database structure, satisfies all kinds of query needs and avoids abnormal database operation. To meet the formal requirements of the table, called the normalization table, the paradigm was generated in the early 1970s, the general table design to meet the first three paradigms can be, here briefly introduce the first three paradigms.
The popular explanation (perhaps not the most scientific, the most accurate understanding)
The first paradigm: the atomic constraint of the attribute (field), which requires that the attribute be atomic and non-divisible;
Second paradigm: A unique constraint on a record that requires a unique identifier for a record, and each record needs to have an attribute to be the unique identity of the entity.
The Third Paradigm: attribute (field) redundancy constraints, that is, any field can not be derived from other fields, in the popular point is: The primary key is not directly related to the data column must be eliminated (the way to eliminate is to create a table to hold them, of course, except the foreign key)
Of course, in fact we often break the third paradigm ... And inevitably, it is to find the right balance between data redundancy and processing speed.
Optimize ②: Appropriate field properties
Let me give you an example:
I've done it before. In the peer-to project, the selection of a field for the type of money flow. Originally the type of capital flow is so more than 10 kinds, basically fixed dead, then we can choose tinyint (4) is completely enough, corresponding to the Java byte. (To be sure, the length of the tinyint is 8 bits, tinyint (1) and tinyint (4) just show the length)
Here are some suggestions for the following fields:
0) The comparison of numeric fields is much more efficient than strings, so the field type uses the smallest and simplest data types possible. such as the IP address can use the int type, as I above example.
1) It is not recommended to use double, not just the length of the storage problem, but also the problem of accuracy.
2) for the storage of integers, in the case of large amounts of data, it is recommended to distinguish the choice of open tinyint/int/bigint (of course, it is very old thing, and now actually not poor this performance)
3) char is a fixed length, so it is handled much faster than varchar, but the drawback is that it wastes storage space and cannot save spaces at the end of a line. In MySQL, MyISAM recommends using a fixed length instead of a variable-length column; InnoDB suggests a varchar type because in InnoDB, the internal row storage format does not differentiate between fixed-length and variable-length.
4) Try not to allow NULL, unless necessary, you can use not null+default instead.
5) text differs from BLOB: BLOB holds binary data, text holds character data, character set. Text and blobs cannot have default values.
The actual scene: Text is the main difference between the Blob is text used to hold character data (such as articles, diaries, etc.), blobs are used to save binary data (such as photos, etc.). Blob and text have a performance problem (resulting in a lot of "holes") when performing a large number of deletions, and it is recommended to periodically optimize table to defragment these tables for improved performance.
6) Self-increment field should be used with caution, not conducive to data migration
7) Strongly oppose the storage of LOB type data in the database, although the database provides such a function, but this is not what he is good at, we should let the right tools to do what he is good at, in order to reach the extreme. (I have encountered LOB type data anyway)
8) As far as possible, the table field is defined as a NOT NULL constraint, because the column containing null values in MySQL is difficult to query optimization, the null value makes the index and the index of the statistics is very complex, you can use 0 or an empty string instead.
9) Use the timestamp type as much as possible, because its storage requires only half of the DATETIME type, and only it can correspond to the actual time zone in the date type. For data types that only need to be accurate to one day, it is recommended to use the date type because his storage space requires only 3 bytes, less than timestamp. (Really technical text, welcome to add)
Optimize ③: Index
Index is an important indicator of table optimization, in the table optimization occupies a very important component, so the previous index optimization is not seen can look first, here no longer redundant.
Optimized ④: Table split (large table split table)
1, vertical split (in fact, is the column split the original table with a lot of columns split into more than one table)
Note: The vertical split should be performed at the beginning of the data table design, and then query the time with Jion key up can be;
Usually we split vertically according to the following principles:
Place infrequently used fields in a single table;
The Text,blob and other large characters are split up and placed in the schedule;
The columns of the frequently combined query are placed in a table;
The disadvantage is also obvious, requiring the use of redundant fields and the need for join operations.
2. Split horizontally (if you find that there are too many records for a table, such as more than 10 million, you want to split the table horizontally.) Horizontal segmentation is done by dividing the table's records horizontally into two tables, with a value of the table's primary key as the line. ）
Of course, we can also use incremental method. such as flowing water such as the data will not change, we use incremental query.
1. Create a daily recharge chart to record the total amount of daily recharge
2. Settle the current top-up record with a timer every day
3. Create monthly recharge list, calculate the total with timer on the last day of each month
4. To query the total, then from the monthly report summary, and then from the Daily Report query day before the data summary, plus today's use of the day water table records today's water, three tables added up, summary. It's a great way to be efficient!
Optimizing ⑤: The legendary ' three-little principle '
①: The fewer tables in the database the better
②: The fewer fields in the table the better
③: A combination of primary keys and fewer composite indexes in a field is as good as possible
Of course, the less is relative, is the reduction of data redundancy of the important design concept.
"mysql optimization topic" Optimization of the road advanced stage-table design and optimization (6)