Database optimization consists of three parts: database optimization, database table optimization, and program operation optimization. This article is the second part: database table optimization.
Optimization ①: design normalized tables to eliminate data redundancy
Database normal forms are design guidelines that ensure the database structure is sound, meets various query needs, and avoids anomalies during database operations. A table that satisfies the normal-form requirements is called a normalized table. The normal forms were proposed in the early 1970s. In general, a table design that satisfies the first three normal forms is sufficient, so we briefly introduce those three here.
Let's first look at the definitions provided by Baidu Encyclopedia:
First normal form (1NF): no repeating groups
The first normal form (1NF) is a basic requirement on attribute domains in the relational model: all domains must be atomic. That is, each column of a database table holds an indivisible atomic value; it cannot be a set, an array, a record, or any other non-atomic item.
Second normal form (2NF): no partial dependencies
On the basis of 1NF, every non-key attribute must be fully functionally dependent on the candidate key [i.e., eliminate partial dependencies of non-key attributes on the primary key].
Third normal form (3NF): no transitive dependencies
On the basis of 2NF, no non-key attribute may depend on another non-key attribute [i.e., eliminate transitive dependencies].
Let me put this in plainer terms (perhaps not the most rigorous phrasing):
First normal form: attributes (fields) are atomic and cannot be subdivided;
Second normal form: uniqueness of records. Every record must have a unique identifier, i.e., an attribute that uniquely identifies the entity;
Third normal form: no redundant attributes (fields), i.e., no field may be derivable from other fields. Put simply: columns not directly related to the primary key must be removed (the way to remove them is to create another table to store them; foreign keys are the exception).
If a database design is fully normalized, then no data will be duplicated when all tables are joined by their keys. The benefits of normalization are obvious: it avoids data redundancy, which naturally saves space; more importantly, it provides a fundamental guarantee of data consistency, eliminating inconsistent data and improving efficiency.
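As a sketch of what normalization looks like in practice, the hypothetical tables below (all names are illustrative, not from the original article) move department names into their own table so they are never repeated per employee:

```sql
-- Hypothetical example of a 3NF fix. Before the split:
--   Employee(EmpID, EmpName, DeptID, DeptName)
-- DeptName depends on DeptID, a non-key attribute, so EmpID -> DeptID -> DeptName
-- is a transitive dependency that violates 3NF. After the split:
CREATE TABLE Department
(
    DeptID   int          NOT NULL PRIMARY KEY,
    DeptName nvarchar(50) NOT NULL
);

CREATE TABLE Employee
(
    EmpID   int          NOT NULL PRIMARY KEY,
    EmpName nvarchar(50) NOT NULL,
    DeptID  int          NOT NULL REFERENCES Department (DeptID)  -- only the foreign key remains
);
```

Each department name is now stored once, and a join by DeptID recovers the original view of the data.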
Optimization ②: appropriate redundancy, adding computed columns
A practical principle of database design is to strike a proper balance between data redundancy and processing speed.
A table that satisfies the normal forms is a normalized table, but not necessarily the best design. In many cases, to improve operational efficiency, we deliberately relax the normal-form standard: adding appropriate redundancy trades space for time. For example, consider a table with product name, unit price, quantity, and total value. This table does not satisfy the third normal form, because "total value" can be derived by multiplying "unit price" by "quantity", which makes it a redundant field. Nevertheless, adding the redundant "total value" field speeds up query statistics; this is the space-for-time trade-off. Reasonable redundancy can also spread the concurrent load on tables with large amounts of data and accelerate special-purpose queries, and redundant fields can reduce joins between tables, improving efficiency.
"Total value" here is a computed column. A database has two kinds of columns: data columns and computed columns. A data column is assigned manually or by a program; a computed column is calculated from other data in the table, such as the "total value" here.
Creating a computed column in SQL:
```sql
CREATE TABLE table1
(
    Number decimal(18, 4),
    Price  money,
    Amount AS Number * Price  -- this is the computed column
);
```
You can also add or modify computed columns manually in the column properties of the table designer (the original article showed a screenshot here).
Whether the computed column is persisted also deserves attention:
If "Is Persisted" is "No", the column is virtual: it is recomputed on every query, and it cannot be used in CHECK, FOREIGN KEY, or NOT NULL constraints.
If it is persisted, the value is physically stored and does not need to be recomputed on every read, and you can create an index on the column.
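A minimal sketch of the persisted variant (the table and index names are my own, not from the article): marking the computed column PERSISTED stores its value physically and makes it indexable:

```sql
CREATE TABLE table2
(
    Number decimal(18, 4),
    Price  money,
    Amount AS Number * Price PERSISTED  -- stored on disk, recomputed only when Number or Price changes
);

-- Because the computed column is persisted (and deterministic), it can be indexed:
CREATE INDEX IX_table2_Amount ON table2 (Amount);
```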
Optimization ③: Index
Indexing is a major topic in table optimization and plays an extremely important role in it. Creating and optimizing indexes deserves its own discussion, so it is covered in a separate article, "SQL index in one step".
Optimization ④: the necessity of primary keys and foreign keys
Primary key and foreign key design plays an important role in the design of the database as a whole, because the primary key is an abstraction of an entity, and the pairing of primary and foreign keys represents the relationships between entities.
Primary key: according to the second normal form, a field is needed to identify each record, and the primary key is undoubtedly the best such identifier. Not every table strictly requires a primary key, but for tables with large amounts of data that are queried frequently, a primary key is a must: it improves efficiency and prevents duplicates.
The choice of primary key also matters. Generally, a shorter key is faster, and a small key reduces the depth of the primary key's B-tree.
When choosing a composite primary key, pay attention to the order of its fields: the same fields in a different order can perform very differently. In general, put fields with a low repetition rate first, considering both single-field and combined query patterns.
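As an illustration (the table and its statistics are hypothetical), if OrderID has the lower repetition rate and is the column most queries filter on, it should lead the composite key:

```sql
CREATE TABLE OrderDetail
(
    OrderID   int NOT NULL,   -- assumed low repetition rate and frequently queried: listed first
    ProductID int NOT NULL,
    Quantity  int NOT NULL,
    CONSTRAINT PK_OrderDetail PRIMARY KEY (OrderID, ProductID)
);
```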
Foreign key: many people find foreign keys, as database objects, troublesome. In fact, foreign keys are useful in most cases, because a foreign key is the most efficient way to maintain consistency.
Database consistency can be enforced by foreign keys, CHECK constraints, rule constraints, triggers, or client programs. It is generally held that the closer the mechanism is to the data, the more efficient it is.
Exercise caution with cascading deletes and cascading updates. They were new features in SQL Server 2000 and are retained in SQL Server 2005, so they should be dependable. I remain cautious because cascading deletes and updates go beyond the traditional definition of a foreign key, and the feature is a little too powerful: before using it, make sure you understand its scope of effect, or your data may be modified or lost inexplicably. In terms of performance, however, cascading deletes and updates are more efficient than the alternatives.
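A sketch of a foreign key with cascading actions, assuming hypothetical Orders and OrderDetail tables; as warned above, be sure you understand the reach of the cascade before enabling it:

```sql
-- Deleting a row from Orders will silently delete its OrderDetail rows;
-- changing Orders.OrderID will propagate to OrderDetail.OrderID.
ALTER TABLE OrderDetail
ADD CONSTRAINT FK_OrderDetail_Orders
    FOREIGN KEY (OrderID) REFERENCES Orders (OrderID)
    ON DELETE CASCADE
    ON UPDATE CASCADE;
```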
Optimization ⑤: proper use of stored procedures, views, and functions
Many people habitually put complex operations in the application layer, but if you want to optimize data access performance, porting SQL code into the database (using stored procedures, views, functions, and triggers) is a better choice, for the following reasons:
1. Stored procedures reduce the workload of network transmission, processing, and storage; once compiled and optimized they execute quickly; and they are easy to maintain: when the table structure changes, client applications are unaffected.
2. Using stored procedures, views, and functions helps avoid the drawbacks of duplicating SQL across applications, because the SQL is maintained in only one place.
3. Implementing all T-SQL in database objects makes it easier to analyze T-SQL performance problems and lets you manage and refactor T-SQL code centrally.
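As a minimal sketch of point 1 (the procedure and parameter names are hypothetical), a query moved into a stored procedure is compiled on the server and keeps the SQL in one place:

```sql
CREATE PROCEDURE dbo.GetExpensiveItems
    @MinAmount money
AS
BEGIN
    SET NOCOUNT ON;  -- suppress row-count messages to reduce network chatter
    SELECT Number, Price, Amount
    FROM table1
    WHERE Amount >= @MinAmount;
END
```

A client then calls `EXEC dbo.GetExpensiveItems @MinAmount = 100;` instead of embedding the query text.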
Optimization ⑥: the legendary "three less" principle
①: The fewer tables in the database, the better.
②: The fewer fields in the table, the better.
③: The fewer composite primary keys and composite indexes among the fields, the better.
Of course, "less" here is relative; the point is an important design philosophy: reduce data redundancy.
Optimization ⑦: split your tables to reduce table size
If you find that a table has too many records, say more than 10 million, consider splitting the table horizontally: pick a value of the table's primary key as the boundary and split the records horizontally into two tables.
If you find that a table has too many fields, say more than 80, split it vertically into two tables.
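A rough sketch of a horizontal split on a primary-key boundary (the table names and the boundary value are hypothetical):

```sql
-- Move the older half of the data into a second table, using a primary-key
-- value as the boundary, then remove those rows from the hot table.
SELECT *
INTO Orders_History
FROM Orders
WHERE OrderID <= 5000000;

DELETE FROM Orders
WHERE OrderID <= 5000000;
```

Queries against recent data then touch a much smaller Orders table, at the cost of querying both tables when the full history is needed.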
Optimization ⑧: field design principles
Fields are the most basic unit of the database, and their design has a great impact on performance. Note the following:
A. Prefer numeric data types where possible; numeric types are considerably faster to process than character types.
B. Make data types as small as possible, where "small" means the smallest type that satisfies foreseeable future needs.
C. Do not allow NULL unless necessary; use NOT NULL + DEFAULT instead.
D. Use TEXT and IMAGE sparingly. Reading and writing binary fields is slow, there are few ways to read them, and in most cases they are best avoided.
E. Use auto-increment (identity) fields with caution; they complicate data migration.
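The principles above can be sketched in one hypothetical table definition (all names are illustrative):

```sql
CREATE TABLE Customer
(
    CustomerID int           NOT NULL PRIMARY KEY,  -- E: an explicitly assigned key migrates more easily than IDENTITY
    StatusCode tinyint       NOT NULL DEFAULT 0,    -- A/B: smallest numeric type that fits, instead of a character code
    Name       nvarchar(50)  NOT NULL DEFAULT N'',  -- C: NOT NULL + DEFAULT instead of allowing NULL
    Remark     nvarchar(400) NOT NULL DEFAULT N''   -- D: bounded nvarchar instead of TEXT
);
```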