Author: skate
Time: 2012/05/24
Description of Data Structure Design (table and index creation)
Table Design
1. Data Type Selection
1) number type: it can be divided into integer and floating-point data types. The difference between them is that the value range is different, and the storage space is also different.
Tinyint: a positive integer with a very small size of 1 byte, with the symbol:-128 ~ 127, without symbols: 0 ~ 255
Smallint: 2-byte small integer, with the symbol:-32768 ~ 32767, without symbols: 0 ~ 65535
Mediumint: A 3-byte integer of medium size, with the symbol-8388608 ~ 8388607, without symbols: 0 ~ 16777215
INT: 4-byte standard integer, with the symbol:-2147483648 ~ 2147483647, without symbols: 0 ~ 4294967295
Bigint: 8-byte big integer, with the symbol:-9223372036854775808 ~ 9233372036854775807, without a letter: 0 ~ 18446744073709551615
For databases with a large amount of data, we should differentiate tinyint, Int, and bigint (commonly used types) in detail. To save storage space and improve database processing efficiency,
Select the most suitable data column type based on the value range of application data.
Float (M, d): 4-byte Single-precision floating point number, minimum non-zero value: +-1.175494351e-38, maximum non-zero value: +-3.402823466e + 38
Double (M, d): 8-byte double-precision floating point number, minimum non-zero value: +-2.225074255072014e-308, maximum non-zero value: +-1.7976931348623157e + 308
Decimal (M, d): A floating point number expressed in string form in m + 2 bytes. Its value range is variable, determined by the values of M and D.
For a floating-point data column, the saved value is rounded to the decimal point defined by the column. However, the difference between the decimal type and the float type and the double type is: decimal type.
The value is stored as a string, and its decimal places are fixed. Its advantage is that it does not provide four homes like float and double data columns.
Five errors are produced, so it is suitable for financial computing. Its disadvantage is that because of its different storage formats, the CPU cannot directly perform operations on it, thus affecting
Computing efficiency. Decimal (M, d) occupies m + 2 bytes in total.
Try not to use double, not only the storage length, but also the accuracy. Similarly, it is not recommended to set decimal points with fixed precision.
To use decimal, we recommend that you multiply the number of fixed multiples and convert them to integer storage, which can greatly save storage space without any additional maintenance costs.
Attributes of numeric data columns:
Zerofill: applies to all numeric data column types. If the width of a numeric value is smaller than the defined display width, it is filled with 0 before the value.
Unsigned: a negative number is not allowed in the data column. This attribute is strongly required when determining the negative number of the column Ministry.
Auto_increment: the attribute can generate a unique sequence of numbers. Valid only for integer data columns.
Null and not null: Set whether the data column can be empty. When the application does not need null, we strongly recommend that you set the default value for the column.
Default: You can specify the default value for the data column.
Tinyint> smallint> mediumint> int> bigint> decimal (the storage space increases, but the performance decreases ).
2) character type
Binary (M), char [(m)]: M bytes
Varbinary (M), varchar [(m)]: M bytes L + 1 byte
Tinyblod, tinytext: 2 ^ 8-1 bytes L + 1 byte
Blob, text: 2 ^ 16-1 bytes L + 2
Mediumblob, mediumtext: 2 ^ 24-1 bytes L + 3
Longblob, longtext: 2 ^ 32-1 bytes L + 4
Enum ('value1 ', 'value2',...): 65535 members, one or two bytes.
Set ('value1 ', 'value2',...): 64 members 1, 2, 4, or 8 bytes
L + 1 and L + 2 indicate that the data column is of variable length. the space occupied by the column is changed according to the increase or decrease of the Data row. The total length of the Data row depends on
The length of the data values stored in these data columns. The extra bytes in L + 1 or L + 2 are used to save the length of the data value. In pair Length
When processing the changed data, MySQL needs to save both the data content and data length.
String-type values are stored as a continuous byte sequence and are treated as bytes or characters according to whether they hold binary strings or non-binary characters:
A binary string is considered as a continuous byte sequence and is irrelevant to the character set. MySQL treats BLOB Data columns and data in char and varchar data columns with binary attributes as binary values.
A non-binary string is considered as a continuous character sequence. It depends on the character set. MySQL treats the data in the text column and the char and varchar data columns without the Binary Attribute as binary values.
A non-binary string is a string that is generally called a string. It is compared and sorted by character in the order of character sets. While the binary string
Because it has nothing to do with the character set, it does not sort in character order, but uses the binary values of bytes as the basis for comparison and sorting.
The binary string has nothing to do with the character set. Therefore, the length of the binary string is the same for both Character Calculation and byte calculation. Therefore, varchar (20) does not mean that it can contain up to 20 characters, but rather it can only contain up to 20 characters. For a single-byte character set, each character occupies only one byte, so the length of the two is the same, but for a Multi-byte character set, it can hold less than 20 characters.
Char and varchar are the two most commonly used string types. The difference between them is:
A. Char is of a fixed length, and each value occupies the same byte. MySQL uses space characters on the right side of it to fill up the space.
B. varchar is a type of variable length. Each value occupies the maximum number of bytes and adds a byte used to record its length, that is, L + 1 bytes.
Here are two principles for how to select char and varchar:
A. if the data has the same length, varchar occupies more space because one of them is used to store the length. If the data length is different, varchar can save storage space. Char occupies the same space regardless of the length of characters, and even a null value is no exception.
B. if tables of the MyISAM or isam type are used, char is better than varchar, because tables of the MyISAM and isam types are more efficient in processing rows of a fixed length. After my tests, InnoDB has higher performance in processing char than varchar.
Blob and text
Blob is a binary string, and text is a non-binary string. Both can store large-capacity information and create prefix indexes. So delete
In addition to and modification operations, a large number of fragments are easily generated in the data table. You need to regularly run optimize table to reduce fragments and improve performance.
Enum and set
Both Enum and set are special string data column types, and their value ranges are a pre-defined list. Enum or set data column
The value can only be selected from this list. The main difference between Enum and set is that Enum can only take single values, and set can take multiple values; Enum and set values
It appears as a string, but internally, MySQL stores them as numerical values.
In general, when selecting a type for a character column, do not use the text data type as far as possible. The lob type should be resolutely put out, and its processing method should be determined.
Its performance is lower than that of char or varchar. For a fixed length field, we recommend that you use the char type. For a variable length field, try to use varchar,
Instead of arbitrarily giving a large limit to the maximum length, MySQL will also
There are different storage processing. For the Status field, you can try to use Enum for storage, because it can greatly reduce the storage space, and even if you need
Add a new type. As long as it is added to the end, you do not need to recreate the table data when modifying the structure. What if it is to store pre-defined attribute data? Yes
If you try to use the set type, you can easily save a lot of storage space even if multiple attributes exist.
3) Date and Time data column type
Date :( 1000-01-01 ~ 9999-12-31) 3 bytes (4 bytes before MySQL 3.23) 0000-00-00
Time: (-838: 59: 59 ~ 838: 59: 59) 3 bytes 00:00:00
Datetime: (1000-01-01 00:00:00 ~ 9999-12-31 23:59:59) 8 bytes 0000-00-00 00:00:00
Timestamp: (19700101000000 ~ At a certain time in 2037) 4-byte 00000000000000
Year: (Year (4): 1901 ~ 2155 year (2): 1970 ~ 2069) 1 byte 0000
The Time Value in datatime is different from the time value. The time value in datatime represents the time points, and the time value represents the time spent.
When interpolation is performed on the time data column, a complete time statement is required, for example, 12 minutes and 30 seconds ".
If a null value is inserted into the timestamp column, the data column is automatically set to the current date and time.
When creating and modifying data rows, if you do not explicitly assign values to the timestamp data column, the value is automatically set to the current date and time. If
There are multiple timestamp columns in the row. Only the first one will take the value automatically, and other timestamp columns will not change.
In general, try to use the timestamp type, because its storage space only needs half of the datetime type. For data that only needs to be accurate to a certain day
Type. We recommend that you use the date type because its storage space only needs 3 bytes, which is less than timestamp.
2. Proper splitting/proper Redundancy
A. When a large field similar to text or a large varchar type exists in our table, if most of us access this table
This field is not required, so we need to split it into another independent table to reduce the storage space occupied by common data. This way
One obvious advantage is that the number of data entries that can be stored in each data block can be greatly increased, which reduces the number of physical Io operations and greatly improves
Cache hit rate.
B. independent small fields that are frequently referenced and can only be obtained by joining two (or more) large tables. In this scenario, each join operation only
In order to obtain the value of a small field, the join record is large, resulting in a large number of unnecessary Io, which can be fully exchanged by space for time.
Optimized. However, while redundancy is required, ensure that data consistency is not damaged, and ensure that redundant fields are updated at the same time.
3. Try to use not null
The null type is special, making SQL difficult to optimize. Although there is a difference between the MySQL null type and the Oracle null type, it will enter the index, but if it is a group
If the index is used together, the null field will greatly affect the efficiency of the entire index. In addition, the processing of null in the index is special and will also occupy
Additional storage space.
Eg:
Mysql> Create Table child (
-> Child_id int unsigned not null,
-> Parent_id int unsigned,
-> Primary Key (child_id ),
-> Index (parent_id ));
Query OK, 0 rows affected (0.00 Sec)
Mysql> explain select * from child where parent_id = 42;
+ ---- + ------------- + ------- + ------ + --------------- + ----------- + --------- + ------- + ------ + -------------------------- +
| ID | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |
+ ---- + ------------- + ------- + ------ + --------------- + ----------- + --------- + ------- + ------ + -------------------------- +
| 1 | simple | child | ref | parent_id | 5 | const | 1 | using where; Using index |
+ ---- + ------------- + ------- + ------ + --------------- + ----------- + --------- + ------- + ------ + -------------------------- +
1 row in SET (0.00 Sec)
Mysql>
The parent_id column is empty and the corresponding index key length is 5.
Recommended parent_id columns for MySQL Analysis
Mysql> select parent_id from child procedure analyse (5, 2000) \ G
* *************************** 1. row ***************************
Field_name: Test. Child. parent_id
Min_value: NULL
Max_value: NULL
Min_length: 0
Max_length: 0
Empties_or_zeros: 0
Nulls: 0
Avg_value_or_avg_length: 0.0
STD: 0.0
Optimal_fieldtype: Char (0) not null
1 row in SET (0.00 Sec)
Changing the parent_id column is not empty
Mysql> alter table child modify parent_id int unsigned not null;
Query OK, 0 rows affected (0.01 Sec)
Records: 0 duplicates: 0 Warnings: 0
Mysql> explain select * from child where parent_id = 42;
+ ---- + ------------- + ------- + ------ + --------------- + ----------- + --------- + ------- + ------ + ------------- +
| ID | select_type | table | type | possible_keys | key | key_len | ref | rows | extra |
+ ---- + ------------- + ------- + ------ + --------------- + ----------- + --------- + ------- + ------ + ------------- +
| 1 | simple | child | ref | parent_id | 4 | const | 1 | using index |
+ ---- + ------------- + ------- + ------ + --------------- + ----------- + --------- + ------- + ------ + ------------- +
1 row in SET (0.00 Sec)
At this time, we can see that the index key length of parent_id is 4, so null columns will occupy space in the index.
4. INDEX OPTIMIZATION
Indexing requires additional maintenance, access, and space costs. Therefore, you must be cautious when creating an index so that a single index can cover as many SQL statements as possible and the update frequency is compared.
High tables must control the number of indexes.
A. For a large and newer amount of data, the index maintenance cost will be very high. If there are few retrieval requirements and there is no high requirement on retrieval efficiency,
It is not recommended to create an index or minimize the number of indexes.
B. It is better to directly traverse faster data from a small amount of data to index retrieval, and it is not suitable for indexing.
C. Try to make the search conditions as many as possible in the index, and filter all the data through the index as possible. The back table only retrieves additional data fields.
D. The order of fields plays a vital role in the efficiency of the composite index. Fields with better filtering results must be more advanced.
E. When the amount of data to be read accounts for a large proportion of the total data volume or the index filtering effect is not very good, the use of the index is not necessarily better than the full table scan.
F. In actual use, only one index can be used for one data access. This must be noted during index creation.
Each condition in the WHERE clause has an index.
------ End -----