Zhimeng (DedeCMS) Database Design Rules

Source: Internet
Author: User
Table and Field Naming

Table and field names must follow the same rules as the variable names described above. Tables for content-related models should use the plural form, for example dede_archives and dede_addonimages. A field that stores multiple items, or that represents a count, should also be named in the plural, such as keywords and scores.

If tables share an associated field, make sure its name is the same in every table, such as typeid in dede_arctype and typeid in dede_archives.

Auto-increment ID fields are usually named in one of the following ways:

1. The most common core IDs, or IDs that appear frequently in URLs, should be abbreviated as much as possible, such as tid, pid, and uid;

2. IDs that serve a functional purpose and appear only occasionally in URLs use the full-name form, such as mid;

3. IDs that serve no functional purpose and exist only for management and maintenance convenience can use the full name or simply be named id.

For all table and field names, refer extensively to the existing dedecms field naming to keep the names systematic and consistent.

Field Structure

When a field allows null values, the database must first check whether the value is null during a comparison, and only if it is not null does it compare the values. For efficiency, therefore, no field may allow null: every field must be declared NOT NULL;

Fields that are never expected to hold negative values, such as IDs and article counts, must be set to an unsigned type. An unsigned type can store twice the range of positive integers of the corresponding signed type, so a larger value range is obtained from the same storage;

Fields that store switches and options usually use the signed tinyint(1) type; in rare cases enum() may be used instead. When tinyint is used as a switch field, 1 usually means on, 0 means off, and -1 marks special data such as N/A (not available). A value greater than 1 is a special result or a binary combination of switches (for details, refer to the code in the program);

For memory/heap tables, plan especially carefully to save storage space, because this directly saves memory. For example, store an IP address as four fields of type tinyint(3) unsigned rather than as char(15);
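
The IP address example can be sketched in Python as follows. This is only an illustration of the idea: the function names are hypothetical, and the four returned values stand for four assumed tinyint unsigned columns. Each octet fits in one byte, so the address takes 4 bytes instead of the 15 bytes of char(15) (assuming a single-byte character set).

```python
import ipaddress


def split_ipv4(address):
    """Split a dotted IPv4 string into four integers, each in the range 0-255."""
    # IPv4Address validates the string; .packed yields the 4 raw bytes.
    return tuple(ipaddress.IPv4Address(address).packed)


def join_ipv4(octets):
    """Rebuild the dotted string from the four stored values."""
    return ".".join(str(octet) for octet in octets)
```

Each element of the tuple would go into its own column on write, and join_ipv4 reassembles the address on read.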

For any type of data table, each field should be large enough for its data without wasting space. The value ranges of the numeric field types are shown in the following table:

Field type   Storage (bytes)   Unsigned   Value range
Tinyint      1                 No         -128 ~ 127
Tinyint      1                 Yes        0 ~ 255
Smallint     2                 No         -32768 ~ 32767
Smallint     2                 Yes        0 ~ 65535
Mediumint    3                 No         -8388608 ~ 8388607
Mediumint    3                 Yes        0 ~ 16777215
Int          4                 No         -2147483648 ~ 2147483647
Int          4                 Yes        0 ~ 4294967295
Bigint       8                 No         -9223372036854775808 ~ 9223372036854775807
Bigint       8                 Yes        0 ~ 18446744073709551615
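
The value ranges follow directly from the storage size: a type of n bytes holds 2^(8n) distinct values. The Python sketch below derives the range and picks the smallest type for a given range of values. The helper names are hypothetical and exist only for this illustration.

```python
# Storage size in bytes of each integer type, ordered from smallest to largest.
INT_TYPES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def int_range(storage_bytes, unsigned):
    """Return (minimum, maximum) for an integer type of the given size."""
    bits = storage_bytes * 8
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_type(min_value, max_value):
    """Return (type name, unsigned) for the smallest type that fits the range."""
    unsigned = min_value >= 0  # never negative, so the unsigned variant applies
    for name, size in INT_TYPES.items():
        low, high = int_range(size, unsigned)
        if low <= min_value and max_value <= high:
            return name, unsigned
    raise ValueError("range does not fit in any integer type")
```

For example, smallest_type(0, 70000) selects an unsigned mediumint, because 70000 exceeds the unsigned smallint maximum of 65535.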
SQL statement

In all SQL statements, everything except table names and field names, that is, all keywords and functions, must be written in uppercase. For example, select * From cdb_members; is a non-compliant statement.
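
A rough way to check this rule automatically is sketched below in Python. It is not part of dedecms: the keyword list is deliberately incomplete, the function name is hypothetical, and the check does not understand string literals or fields whose names coincide with keywords, so treat it as a starting point only.

```python
import re

# A small, deliberately incomplete keyword list used only for this sketch.
SQL_KEYWORDS = {
    "SELECT", "FROM", "WHERE", "AND", "OR", "ORDER", "BY", "LIMIT",
    "JOIN", "LEFT", "ON", "INSERT", "INTO", "VALUES", "UPDATE", "SET", "DELETE",
}


def lowercase_keywords(sql):
    """Return the keywords in `sql` that are not written fully in uppercase."""
    words = re.findall(r"[A-Za-z_]+", sql)
    return [w for w in words if w.upper() in SQL_KEYWORDS and w != w.upper()]
```

An empty result means no violation was found among the listed keywords; for the non-compliant example above, the function reports select and From.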

A long SQL statement should be broken into lines at appropriate points, based on keywords such as JOIN, FROM, and ORDER BY.

In general, when a statement operates on multiple tables, give each table an alias of one or two letters to keep the statement concise and readable.

The following statement complies with these rules:

SELECT a.*, c.* FROM #@__archives a, #@__addonarticle c WHERE c.aid=a.id AND a.id='$aid'

Performance and Efficiency

Fixed-Length and Variable-Length Tables

A data table that contains variable-length fields such as varchar or text is a variable-length table; otherwise it is a fixed-length table.

1. In a variable-length table the records differ in size, so many deletions and updates leave more fragmentation in the table, and OPTIMIZE TABLE needs to be run regularly to maintain performance. A fixed-length table does not have this problem;

2. If the table contains variable-length fields, converting them into fixed-length fields can improve performance, because fixed-length records are easier to process. However, before trying to do so, consider the following:

3. Fixed-length columns involve a trade-off: they are faster but take more space. Every value in a char(n) column occupies n bytes (even an empty string), because values shorter than n are padded with spaces on the right when stored;

4. varchar(n) columns take less space, because each value is allocated only the space it needs, plus one byte to record its length. Choosing between char and varchar is therefore a trade-off between time and space;

5. A variable-length table cannot be converted to fixed-length by converting only one variable-length field. All of them must be converted, and a single ALTER TABLE statement must convert them all at once; otherwise the conversion does not take effect;

6. Sometimes a fixed-length type cannot be used even if you want one. For example, there is no fixed-length type for strings longer than 255 characters;

7. When designing the table structure, use fixed-length data types wherever you can, because querying, retrieving, and updating a fixed-length table is very fast. If necessary, split key tables that are accessed frequently into one table with fixed-length data and one with variable-length data, as dedecms does with dede_archives and dede_addonarticle, or dede_member and dede_member_person. Planning the data structure therefore requires a global view;

During table structure design, the data storage scheme should be weighed carefully and optimized repeatedly.
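
The space side of the char/varchar trade-off in items 3 and 4 can be made concrete with a small Python sketch. It assumes a single-byte character set, and the function names are hypothetical. The one-byte length prefix applies to varchar columns of up to 255 bytes; the sketch uses two bytes for longer columns, which is how MySQL stores them.

```python
def char_bytes(n):
    """Bytes used by any value in a char(n) column: always n (space padded)."""
    return n


def varchar_bytes(value, n):
    """Bytes used by `value` in a varchar(n) column: data plus a length prefix."""
    data = value.encode("latin-1")  # assumption: one byte per character
    if len(data) > n:
        raise ValueError("value is longer than the column allows")
    prefix = 1 if n <= 255 else 2  # length prefix grows for long columns
    return len(data) + prefix
```

With these assumptions, the string "abc" takes 20 bytes in a char(20) column but only 4 bytes in a varchar(20) column, while a value that fills the column costs one byte more as varchar than as char.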

Computing and Retrieval

Numeric operations are generally faster than string operations. For example, two numbers can be compared in a single operation, whereas comparing strings involves several byte-by-byte comparisons, and the longer the strings, the more comparisons are needed.

If a string column holds only a limited set of values, use an ordinary integer type or the enum type instead to gain the speed advantage of numeric operations.

Smaller field types are always processed much faster than larger ones. The processing time for a string is directly related to its length. In general, smaller tables are processed faster. For a fixed-length table, choose the smallest type that can store the required range of values. For example, if mediumint is enough, do not choose bigint. Variable-length types can save space in the same way: a text value records its length in 2 bytes, while a longtext value records its length in 4 bytes. If the stored values never exceed 64 KB, using text saves 2 bytes per value.

Structure Optimization and Index Optimization

Indexes speed up queries, and index optimization and query optimization complement each other. You can optimize the indexes to suit the queries, or optimize the queries to suit the existing indexes; which to choose depends on whether modifying the query or the index has the least impact on the existing product architecture and efficiency.

Index optimization and query optimization are built on years of accumulated experience and cannot be covered in detail here, but several basic principles follow.

First, find out which SQL statements are executed most often, based on how the product is actually run and accessed. "Executed most often" and "appears most often in the program" are completely different concepts. The most frequently executed statements can be divided into operations on large tables (with many rows) and on small tables (with few rows). Operations on either kind of table can be further divided into read-heavy (SELECT), write-heavy, or both read-heavy and write-heavy.

For frequently executed SQL statements, pay special attention to the following when they operate on large tables:

1. When there are many write operations, you can usually use a write cache: first cache the data to be written or updated in a file or another table, then write it to the large table in batches at regular intervals. The delayed update of click counts, for example, is based on this principle. At the same time, try to make large tables that are frequently read and written fixed-length, even if they are not fixed-length in the original structure. By changing how the data is stored and read, you can split a large table into a fixed-length table that is read and written frequently and a variable-length table that is read often but written rarely;

2. When there are many read operations, set indexes and joint indexes for the high-frequency SQL statements according to how often each query runs.
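
The write-cache idea in item 1 can be sketched in Python as follows. This is not the dedecms implementation. The class name is hypothetical, the buffer lives in memory rather than in a file or table, and the click column and the %s placeholder style are assumptions made for the illustration.

```python
from collections import Counter


class ClickBuffer:
    """Collects click counts and emits one batched UPDATE per article."""

    def __init__(self):
        self._pending = Counter()

    def record(self, article_id):
        """Count one click without touching the large table."""
        self._pending[article_id] += 1

    def flush(self):
        """Return (statement, parameters) pairs and reset the buffer."""
        statements = [
            # Assumed column name "click"; "#@__" is the table-prefix placeholder.
            ("UPDATE #@__archives SET click = click + %s WHERE id = %s",
             (count, article_id))
            for article_id, count in sorted(self._pending.items())
        ]
        self._pending.clear()
        return statements
```

Calling flush() on a timer turns many single-row writes into one write per article per interval, at the cost of click counts that lag slightly behind reality.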

Second, check whether the conditions and sort fields of the SQL statement are dynamic, that is, whether they vary widely depending on the switches or attributes of different functions. SQL statements that are too dynamic cannot be optimized through indexes. The only option is to cache the data and refresh it periodically, which suits scenarios where the results do not need to be fully up to date.

The commonly used MySQL index types are PRIMARY KEY, INDEX, and UNIQUE. For more information, see the MySQL documentation. In general, when the values in a column are never repeated, PRIMARY KEY and UNIQUE indexes are faster than a plain INDEX. Use them as appropriate.

In effect, an index shifts the cost of conditional queries and sorting from read operations to write operations. The more indexes a table has, the more disk space it occupies and the slower its writes become. Therefore, do not add indexes blindly. The basic considerations for indexing a field are how often the SQL statement is executed, how large the table is, and how frequent the write operations are.

Query Optimization

This section assumes that MySQL does not reorder query conditions itself, so developers need to order the conditions manually in the program. For example, consider the following SQL statement:

SELECT * FROM table WHERE a>'0' AND b<'1' ORDER BY c LIMIT 10;

In fact, whether a>'0' or b<'1' comes first, the result is the same, but the query speed can differ greatly, especially on large tables.

Keep this principle in mind: the condition that appears first must be the one that filters out the most results, the second condition must be the one that filters out the next most, and so on. The distribution of values in each field therefore has a great impact on query speed. The ORDER BY clause is related only to the index and is unaffected by the order of the conditions.

Besides ordering conditions, you can optimize the index structure for fixed or relatively fixed SQL queries to achieve high query speed. The principle is: in most cases, a joint index built in the order of the WHERE conditions followed by the ORDER BY fields is the optimal index structure for that SQL statement. In practice, however, a product cannot consider only one SQL statement, nor can it create too many indexes, because of the space they occupy.

Take the preceding SQL statement as an example. When the table reaches millions or even tens of millions of rows, the speed improvement from index optimization becomes obvious.

Based on the two principles above, condition optimization and index optimization, the optimal scheme can be derived when the table holds the following values:

Field A   Field B   Field C
1         7         11
2         8         10
3         9         13
-1        0         12

Optimal condition: b<'1' AND a>'0'

Optimal index: INDEX abc (b, a, c)

Cause: b<'1' as the first condition filters out 75% of the results first. If a>'0' came first, only 25% of the results would be filtered out first.

Note 1: Field c does not appear in the conditions, so condition ordering does not apply to it.

Note 2: The optimal index is derived from the optimal condition order, not from the SQL statement in the example.

Note 3: An index does not modify the physical order in which data is stored; it is a virtual pointer implemented through physical data at a specific offset.
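
The 75% and 25% figures can be reproduced from the four sample rows. The Python sketch below measures how many rows each condition removes and orders the conditions accordingly. The function names are hypothetical, and in practice the fractions would come from the real value distribution of the table rather than from four rows.

```python
# The four sample rows from the table above.
ROWS = [
    {"a": 1, "b": 7, "c": 11},
    {"a": 2, "b": 8, "c": 10},
    {"a": 3, "b": 9, "c": 13},
    {"a": -1, "b": 0, "c": 12},
]

CONDITIONS = {
    "a>'0'": lambda row: row["a"] > 0,
    "b<'1'": lambda row: row["b"] < 1,
}


def filtered_fraction(rows, predicate):
    """Fraction of rows that the condition removes from the result."""
    removed = sum(1 for row in rows if not predicate(row))
    return removed / len(rows)


def order_conditions(rows, conditions):
    """Order condition names so the one that removes the most rows comes first."""
    return sorted(
        conditions,
        key=lambda name: filtered_fraction(rows, conditions[name]),
        reverse=True,
    )
```

Here b<'1' removes three of the four rows and a>'0' removes one, so order_conditions(ROWS, CONDITIONS) places b<'1' first, matching the optimal condition order given above.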

The EXPLAIN statement is a simple way to check whether an index and a query match well. Run EXPLAIN followed by the query in phpMyAdmin or another MySQL client, for example EXPLAIN SELECT * FROM table WHERE a>'0' AND b<'1' ORDER BY c;. Developers can then verify that the index is reasonable without having to simulate millions of rows. For more information, see the MySQL documentation.

Note that Using filesort is the least desirable case. If EXPLAIN reports it, the database needs an extra sorting pass for this query and may create a temporary file to hold the results, deleting it after the query completes. Hard disk I/O is always a bottleneck, so do your best to avoid Using filesort in SQL statements that are executed frequently. Even so, developers can never guarantee that no SQL statement in the product uses filesort.

This document does not cover every aspect of database optimization, such as the reusability of joint indexes and ordinary indexes, index design for JOIN queries, and memory/heap tables. Database optimization is in fact a constant process of balancing and adjusting among many factors, advantages, and disadvantages. This experience can only be gained through repeated reflection on successes and failures, which is what makes it so valuable.
