Preface
The database is where a program's data lives, and it is also the program's most vulnerable part. Because it is both important and fragile, it needs dedicated management and optimization. In today's networked era, flexibility and efficiency matter even more: an efficient database makes the whole program run faster. Database designs often fall short of the desired results, however, which is why optimization is so important. This series of articles discusses how to make database design more flexible, and how to make data retrieval and modification more efficient, when data volumes are large. The optimization methods are summarized from the author's long-term development experience and from other articles on database optimization. If you disagree with anything, please point it out.
Database optimization takes many forms. Because there are many kinds of SQL objects, you need to consider the performance impact of each one when optimizing. There is no optimal level of optimization, only better: each scheme must be chosen for the environment at hand, and none stays efficient under all conditions. For example, if the system being developed holds little data and is a C/S (client/server) system, there is no need to worry about many of these optimization problems, and the focus can instead be on friendliness, stability, robustness, and flexibility. Database optimization starts at the design stage. The main points are summarized below.
Optimization at the design stage takes considerable experience, and planning ahead is the highest level of skill. With foresight we can avoid many problems and make queries and modifications efficient by design.
I. Normalization
1. Database normalization
When creating a database, first try to satisfy the three normal forms. They are a guiding standard: in complex table relationships, they help balance the relationships between tables. For table structure design, the three normal forms are sufficient. You can also consider adding redundant fields, which sometimes improves retrieval efficiency.
First normal form: each field represents a single, indivisible attribute of the entity. This is the most basic design rule, and most fields comply with it;
Second normal form: eliminate partial dependencies;
Third normal form: eliminate transitive dependencies.
Designing the table structure strictly according to the three normal forms is sometimes very cumbersome. When converting a simple entity into a table structure, you may choose not to apply them fully, because normalization produces more tables with fewer columns each, which is not always conducive to retrieval and modification. The right choice depends on the actual situation.
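As an illustration of removing a transitive dependency, as the third normal form requires, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (orders_flat, customers, orders) are hypothetical and chosen only for this example; SQLite is used simply because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 3NF: customer_city depends on customer_id, not on the key order_id.
    CREATE TABLE orders_flat (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER NOT NULL,
        customer_city TEXT    NOT NULL
    );
    INSERT INTO orders_flat VALUES (1, 10, 'Oslo'), (2, 10, 'Oslo'), (3, 11, 'Rome');

    -- 3NF: the city is stored once per customer.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        city        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    );
    INSERT INTO customers SELECT DISTINCT customer_id, customer_city FROM orders_flat;
    INSERT INTO orders SELECT order_id, customer_id FROM orders_flat;
""")

# A change of city now touches one row instead of every order of that customer.
cur = conn.execute("UPDATE customers SET city = 'Bergen' WHERE customer_id = 10")
rows_touched = cur.rowcount

# The price of normalization: reading the city of an order needs a join.
order_cities = conn.execute("""
    SELECT o.order_id, c.city
    FROM orders AS o JOIN customers AS c USING (customer_id)
    ORDER BY o.order_id
""").fetchall()
print(rows_touched, order_cities)
```

The join in the last query is the retrieval cost that the paragraph above warns about; whether it is acceptable depends on the data volume and the query mix.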
2. Denormalization
Reasonable redundancy
When designing the table structure, consider adding a reasonable amount of redundancy. Well-chosen redundancy can greatly improve retrieval efficiency. Computed fields that are used frequently (such as totals and maximum values) can be stored in the database entity itself. You can add triggers to keep such data consistent, although this is not recommended, because trigger behavior is hard to predict. In addition, if the design produces many tables that have to be joined during retrieval, consider duplicating the needed columns in the database entity.
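The stored computed field described above might look like the following sketch, again with Python's sqlite3 module and hypothetical table names. It uses a trigger only to show the mechanism; given the caution about triggers, updating the total from application code inside the same transaction is an equally valid choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        total    INTEGER NOT NULL DEFAULT 0   -- redundant: sum of the item amounts
    );
    CREATE TABLE order_items (
        item_id  INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        amount   INTEGER NOT NULL
    );
    -- Keeps the stored total in step with inserts only; updates and deletes
    -- on order_items would need their own triggers.
    CREATE TRIGGER order_items_after_insert AFTER INSERT ON order_items
    BEGIN
        UPDATE orders SET total = total + NEW.amount WHERE order_id = NEW.order_id;
    END;
""")
conn.execute("INSERT INTO orders (order_id) VALUES (1)")
conn.executemany(
    "INSERT INTO order_items (order_id, amount) VALUES (1, ?)", [(250,), (100,)]
)

# Reading the redundant column avoids aggregating order_items on every query.
stored_total = conn.execute(
    "SELECT total FROM orders WHERE order_id = 1"
).fetchone()[0]
computed_total = conn.execute(
    "SELECT SUM(amount) FROM order_items WHERE order_id = 1"
).fetchone()[0]
print(stored_total, computed_total)
```

The two values agree here only because every write path is covered; any path that bypasses the trigger would let the redundant column drift, which is the uncertainty mentioned above.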
Table splitting
Redundant fields can improve retrieval efficiency. When processing large data volumes, also consider splitting tables vertically and horizontally.
(1) Vertical split: divides an entity table into two tables by column, separating frequently accessed columns from rarely accessed ones. Each resulting table must carry a copy of the primary key. The resulting tables have fewer columns and lend themselves to parallel processing.
(2) Horizontal split: divides the rows of an entity table into multiple groups. This method suits entity tables that hold large amounts of data. Applications often retain historical records that are rarely used; in that case, the frequently accessed data can be separated from the rarely accessed historical data. Alternatively, back up the historical data and then purge it, so that the data remains safe.
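A horizontal split that moves rarely used historical rows out of the active table could be sketched as follows. The tables (orders, orders_history), the helper archive_before, and the cutoff date are all hypothetical; a production system might use the database's own partitioning features instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT    NOT NULL,   -- ISO 8601 dates compare correctly as text
        amount     INTEGER NOT NULL
    );
    CREATE TABLE orders_history (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT    NOT NULL,
        amount     INTEGER NOT NULL
    );
    INSERT INTO orders VALUES
        (1, '2019-03-01', 10), (2, '2020-07-15', 20), (3, '2024-01-05', 30);
""")

def archive_before(conn, cutoff):
    """Move rows older than `cutoff` to orders_history; return how many moved."""
    with conn:  # one transaction: both statements commit, or neither does
        conn.execute(
            "INSERT INTO orders_history SELECT * FROM orders WHERE order_date < ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
    return cur.rowcount

moved = archive_before(conn, "2023-01-01")
active = [r[0] for r in conn.execute("SELECT order_id FROM orders ORDER BY order_id")]
archived = [
    r[0] for r in conn.execute("SELECT order_id FROM orders_history ORDER BY order_id")
]
print(moved, active, archived)
```

Copying and deleting inside one transaction means a failure partway through cannot lose or duplicate rows.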
II. Generating the physical model
When generating the physical model, keep optimizing, and keep designing on the principle that there is no optimal, only better. Below we discuss the optimization of fields, primary keys, and indexes.
1. Fields
(1) Use numeric types wherever possible. Queries and operations on numeric types are more efficient than on strings.
(2) Provided the type leaves room for the data to grow, choose the smallest type of the same kind. For example, use smallint instead of integer where the value range allows. Index fields can then be read faster, and more rows fit on one data page, which reduces I/O.
(3) Avoid fields that allow null values. Use NOT NULL + DEFAULT instead.
(4) Use the text and image types as little as possible, or not at all, because reading and writing large binary data is slow and the access methods are limited.
(5) Use auto-increment fields with caution, because they complicate data migration.
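The NOT NULL + DEFAULT recommendation in item (3) can be tried out with a short sketch. The products table is hypothetical, and SQLite (via Python's sqlite3 module) is assumed only for convenience; other databases differ in detail but support the same two clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT    NOT NULL,
        stock      INTEGER NOT NULL DEFAULT 0   -- never NULL; 0 when not supplied
    );
""")

# Omitting the column falls back to the default instead of storing NULL.
conn.execute("INSERT INTO products (name) VALUES ('widget')")
default_stock = conn.execute(
    "SELECT stock FROM products WHERE name = 'widget'"
).fetchone()[0]

# An explicit NULL is rejected by the constraint.
try:
    conn.execute("INSERT INTO products (name, stock) VALUES ('gadget', NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(default_stock, null_rejected)
```

With the column guaranteed non-null, queries can filter with plain comparisons such as stock = 0 and need no separate IS NULL branch.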
2. Primary keys
(1) Choose the candidate key with the fewest fields. This reduces the number of fields that must be compared and improves query efficiency.
(2) If you have to use a composite primary key, put the fields with the lowest repetition rate first.
3. Indexes
When using indexes, watch out for some common misconceptions: more indexes are not always better, and an index is not the same thing as a primary key. Keep the following principles in mind:
(1) Decide which tables need indexes based on data volume. If the data volume is small, the primary key alone is enough;
(2) Decide which fields to index based on how often they are used. Fields that frequently appear in join conditions, filter conditions, aggregate queries, and sorting are the index candidates;
(3) Combine fields that are frequently used together into a composite index. As with a composite primary key, field order matters: put the most commonly used fields first, and put fields with a low repetition rate first;
(4) Do not add too many indexes to one table, because every index slows down inserts and updates;
(5) Avoid updating clustered index columns wherever possible, because the order of the clustered index columns is the physical storage order of the table's records. When a value in such a column changes, the affected records have to be moved to keep that order, which can consume considerable resources.
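The effect of field order in a composite index, from principle (3), can be observed with a small sketch. It assumes SQLite through Python's sqlite3 module and a hypothetical orders table and index name; EXPLAIN QUERY PLAN is SQLite-specific, and other databases have their own plan viewers with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_date  TEXT    NOT NULL,
        amount      INTEGER NOT NULL
    );
    CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);
""")

def plan(sql, params=()):
    """Return the 'detail' text of each step SQLite reports for the query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# Filters on the leading column: the composite index can be searched.
leading = plan(
    "SELECT amount FROM orders WHERE customer_id = ? AND order_date = ?",
    (1, "2024-01-05"),
)
# Filters only on the second column: the index order does not help here.
trailing = plan("SELECT amount FROM orders WHERE order_date = ?", ("2024-01-05",))

for label, steps in (("leading", leading), ("trailing", trailing)):
    print(label, steps)
```

The exact wording of the plan varies between SQLite versions, so the sketch prints it rather than assuming it. The point to look for is whether the index name appears in each plan.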
Conclusion
The design stage is the most important step in developing a database program. A sound design prevents many errors in later development. Many other problems remain to be considered; the points above are summarized from practical experience and from other articles, and they should help when designing for access to large data volumes.