There are many articles on database optimization, but some are misleading and others are impractical. A database can only be tuned for better performance; there is no absolute best. In addition, because actual needs differ, optimization approaches differ as well, and a database should be optimized according to the requirements that matter in practice (speed, storage space, maintainability, and scalability). These goals often conflict with one another. The following summarizes some opinions found on the Internet together with my own.
Improving a system's performance is not only a matter of tuning during trial operation or maintenance, nor only a matter of the development stage; it spans the whole software life cycle. I therefore summarize the points to watch for database performance optimization according to the stages of the software life cycle.
I. Analysis Stage
In the system analysis stage there are usually many things competing for attention. Requirements for functionality, availability, reliability, and security tend to attract most of it. However, performance is a very important non-functional requirement. Its real-time requirements, response-time requirements, and hardware configuration must be determined according to the characteristics of the system, preferably with quantitative targets for each.
In addition, the type of system should be identified during the analysis stage according to its requirements. Broadly, a system is either OLTP (online transaction processing) or OLAP (online analytical processing).
II. Design Stage
The design stage is arguably the stage that determines the system's future performance. It includes the process on which almost all later performance tuning depends: database design.
After the database design is complete, a preliminary index design can be carried out. A good index design guides developers to write efficient code in the coding stage and lays a solid foundation for the performance of the whole system.
Pay attention to the following performance-related points during the design stage:
1. Normalization of the Logical Database Design
Normalization of the logical database design is what we usually call the normal forms. They can be understood simply as follows:
First normal form (1NF): there are no repeating groups or multi-valued columns. This is the minimum requirement for database design.
Second normal form (2NF): every non-key field must depend on the whole primary key and not on only part of a composite primary key; that is, partial dependencies are eliminated. In most cases a database design should reach the second normal form.
Third normal form (3NF): a non-key field must not depend on another non-key field; that is, transitive dependencies are eliminated. Most tables in a system should reach the third normal form, except for tables with special purposes.
Higher normal forms are not covered here. Personally, I think that if all tables reach the second normal form, the system ends up with fewer columns per table and more tables, which reduces data redundancy and also helps performance.
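As a rough illustration of the second normal form, the sketch below shows a table with a partial dependency and one possible decomposition. All table and column names are hypothetical and are not taken from any real system.

```sql
-- Hypothetical order-line table keyed by (order_id, product_id).
-- It violates 2NF: product_name depends only on product_id,
-- which is just one part of the composite primary key.
CREATE TABLE order_item_unnormalized
(
    order_id     int         NOT NULL,
    product_id   int         NOT NULL,
    product_name varchar(50) NOT NULL,
    quantity     int         NOT NULL,
    CONSTRAINT pk_order_item_unnormalized PRIMARY KEY (order_id, product_id)
)
GO

-- One way to reach 2NF: move the partially dependent column to its own table.
CREATE TABLE product
(
    product_id   int         NOT NULL,
    product_name varchar(50) NOT NULL,
    CONSTRAINT pk_product PRIMARY KEY (product_id)
)
GO

CREATE TABLE order_item
(
    order_id   int NOT NULL,
    product_id int NOT NULL,
    quantity   int NOT NULL,
    CONSTRAINT pk_order_item PRIMARY KEY (order_id, product_id),
    CONSTRAINT fk_order_item_product FOREIGN KEY (product_id)
        REFERENCES product (product_id)
)
GO
```

The decomposed design stores each product name once, at the cost of a join whenever the name is needed together with the order lines.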
2. Reasonable Redundancy
A system designed entirely according to normalization is almost impossible unless it is very small. After the normalized design, redundancy usually needs to be added in a planned way.
Redundancy can take the form of a redundant database, a redundant table, or a redundant field. Redundancy at different granularities serves different purposes.
Redundancy may be added for programming convenience or to improve performance. From a performance perspective, redundant databases spread the load across databases, redundant tables spread the concurrent load on tables with large data volumes and can speed up particular queries, and redundant fields reduce table joins and improve efficiency.
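A redundant field might look like the following sketch, in which an order header carries a stored total that could otherwise be derived from its lines. The tables and columns are invented for illustration, and the application (or a trigger) would be responsible for keeping the redundant value in sync.

```sql
-- Hypothetical order header with a redundant, derivable column.
CREATE TABLE sales_order
(
    order_id    int      NOT NULL PRIMARY KEY,
    order_date  datetime NOT NULL,
    order_total money    NOT NULL DEFAULT 0  -- redundant: sum of the order's lines
)
GO

CREATE TABLE sales_order_line
(
    order_id int   NOT NULL REFERENCES sales_order (order_id),
    line_no  int   NOT NULL,
    amount   money NOT NULL,
    PRIMARY KEY (order_id, line_no)
)
GO

-- Without the redundant field: a join plus an aggregate.
SELECT o.order_id, SUM(l.amount) AS order_total
FROM sales_order o
JOIN sales_order_line l ON l.order_id = o.order_id
GROUP BY o.order_id
GO

-- With the redundant field: a single-table read.
SELECT order_id, order_total
FROM sales_order
GO
```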
3. Primary Key Design
A primary key is necessary. In SQL Server a primary key is also a unique index, and in practice we usually choose the smallest key combination as the primary key, so the primary key is often a suitable choice for the table's clustered index. Clustered indexes have a large effect on queries; see the index discussion below.
In a table with several candidate keys, the choice of primary key also matters. A key with a small total length is generally preferable: comparisons on a small key are faster, and a small key reduces the number of levels in the primary key's B-tree.
Also pay attention to the field order of a composite primary key. The same fields in a different order can perform very differently. In general, put first the fields with a low repetition rate or those most likely to be queried alone or in combination.
4. Foreign Key Design
Foreign keys are database objects that many people find troublesome. In fact, foreign keys are useful in most cases, for the following reasons:
Foreign keys are the most efficient way to maintain consistency. In order of preference, database consistency requirements can be met by foreign keys, CHECK constraints, rules, triggers, and client programs. Generally, the closer a mechanism is to the data, the more efficient it is.
Use cascading deletes and cascading updates with caution. They were introduced as new features in SQL Server 2000 and were retained in SQL Server 2005, so they clearly have their uses. My caution comes from the fact that cascading deletes and updates go beyond the traditional definition of a foreign key and are somewhat too powerful: before using them, make sure you fully understand their scope, otherwise your data may be modified or lost without an obvious cause. In terms of performance, cascading deletes and updates are more efficient than the alternatives.
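To make the effect of cascading concrete, here is a minimal sketch with two hypothetical tables. Deleting a parent row silently removes its child rows, which is exactly the behavior that calls for caution.

```sql
-- Hypothetical parent and child tables.
CREATE TABLE department
(
    dept_id   int         NOT NULL PRIMARY KEY,
    dept_name varchar(50) NOT NULL
)
GO

CREATE TABLE employee
(
    emp_id   int         NOT NULL PRIMARY KEY,
    dept_id  int         NOT NULL,
    emp_name varchar(50) NOT NULL,
    CONSTRAINT fk_employee_department FOREIGN KEY (dept_id)
        REFERENCES department (dept_id)
        ON DELETE CASCADE   -- deleting a department deletes its employees
        ON UPDATE CASCADE   -- changing dept_id propagates to employees
)
GO

INSERT INTO department VALUES (1, 'Sales')
INSERT INTO employee VALUES (100, 1, 'Alice')
GO

-- This single statement also removes employee 100 through the cascade.
DELETE FROM department WHERE dept_id = 1
GO

-- No employee rows are left after the cascading delete.
SELECT COUNT(*) AS remaining_employees FROM employee
GO
```

Without the ON DELETE CASCADE clause, the DELETE would instead fail with a foreign key violation, which is often the safer default.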
5. Field Design
Fields are the most basic unit of a database, and their design has a large effect on performance. Note the following:
A. Use numeric data types where possible. Numeric types are much faster than character types.
B. Make data types as small as possible, where "small" means just large enough to meet foreseeable future needs.
C. Do not allow NULL unless necessary. Use NOT NULL with a DEFAULT instead.
D. Use text and image sparingly. Reading and writing these large-object fields is slow, and few access methods are available for them. In most cases it is best to avoid them.
E. Use auto-increment fields with caution, because they make data migration harder.
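The points above could be applied as in the following sketch. The table is hypothetical; the chosen types are only one reasonable reading of "numeric, small, and NOT NULL with a default".

```sql
-- Hypothetical table following the field-design guidelines above.
CREATE TABLE customer_account
(
    account_id  int          NOT NULL PRIMARY KEY,
    status_code tinyint      NOT NULL DEFAULT 0,   -- numeric code instead of a status string
    is_active   bit          NOT NULL DEFAULT 1,   -- smallest type that fits the need
    balance     money        NOT NULL DEFAULT 0,   -- NOT NULL + DEFAULT instead of NULL
    note        varchar(200) NOT NULL DEFAULT ''   -- bounded varchar instead of text
)
GO

-- Only the key is supplied; every other column takes its default.
INSERT INTO customer_account (account_id) VALUES (1)
GO

SELECT account_id, status_code, is_active, balance, note
FROM customer_account
GO
```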
6. Design of Physical Database Storage and Environment
During the design stage, the physical storage of the database, the operating system environment, and the network environment can also be designed, so that the system can later handle many concurrent users and a large data volume.
Pay particular attention to filegroups. Used properly, filegroups distribute I/O operations across different physical disks and improve concurrency.
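A filegroup on a separate disk might be set up roughly as follows. The database name SalesDemo, the filegroup name, the file path, and the table are all assumptions made for this sketch; adjust them to your own environment.

```sql
-- Assumed: a database named SalesDemo already exists,
-- and E:\SqlData is a folder on a different physical disk.
ALTER DATABASE SalesDemo
ADD FILEGROUP fg_history
GO

ALTER DATABASE SalesDemo
ADD FILE
(
    NAME = SalesDemo_history,
    FILENAME = 'E:\SqlData\SalesDemo_history.ndf',
    SIZE = 100MB
)
TO FILEGROUP fg_history
GO

USE SalesDemo
GO

-- Place a large, heavily read table on the new filegroup
-- so that its I/O goes to the other disk.
CREATE TABLE order_history
(
    order_id   int      NOT NULL PRIMARY KEY,
    order_date datetime NOT NULL,
    amount     money    NOT NULL
) ON fg_history
GO
```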
7. System Design
The design of the whole system, especially its architecture, has a large effect on performance. For a typical OLTP system, you can choose between a two-tier client/server (C/S) architecture and a three-tier C/S architecture; different architectures perform differently.
In the system design stage, decide which business logic should be implemented in database programming, which includes stored procedures, triggers, and functions. The advantages of implementing business logic in the database are reduced network traffic and full use of the database's precompilation and caching.
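As a sketch of business logic placed in the database, the hypothetical procedure below performs a two-step transfer in one round trip instead of two separate client calls. The account table and the procedure name are invented; error handling uses @@ERROR so that the example also fits SQL Server 2000.

```sql
-- Hypothetical table used by the procedure.
CREATE TABLE account
(
    account_id int   NOT NULL PRIMARY KEY,
    balance    money NOT NULL DEFAULT 0
)
GO

CREATE PROCEDURE usp_transfer_funds
    @from_account int,
    @to_account   int,
    @amount       money
AS
BEGIN
    SET NOCOUNT ON

    BEGIN TRANSACTION

    -- Step 1: debit the source account.
    UPDATE account SET balance = balance - @amount
    WHERE account_id = @from_account
    IF @@ERROR <> 0
    BEGIN
        ROLLBACK TRANSACTION
        RETURN 1
    END

    -- Step 2: credit the target account.
    UPDATE account SET balance = balance + @amount
    WHERE account_id = @to_account
    IF @@ERROR <> 0
    BEGIN
        ROLLBACK TRANSACTION
        RETURN 1
    END

    COMMIT TRANSACTION
    RETURN 0
END
GO

-- The client sends one call; both updates run on the server.
EXEC usp_transfer_funds @from_account = 1, @to_account = 2, @amount = 50
GO
```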
8. Index Design
In the design stage, a preliminary index design can be made from the functional and performance requirements. At this point the indexes are designed from the expected data volume and queries, which may differ from actual use later.
When selecting indexes, think along the following lines:
A. Decide which tables need indexes according to data volume. For a small table, the primary key alone is enough.
B. Decide which fields to index according to how often they are used. Fields that frequently appear in join conditions, filter conditions, aggregate queries, or sorting are candidates for indexing.
C. Combine fields that frequently appear together into a composite index. Field order in a composite index follows the same rule as for a composite primary key: put the most frequently used fields and the fields with a low repetition rate first.
D. Do not add too many indexes to one table, because indexes slow down inserts and updates.
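The guidelines above might translate into something like this sketch. The table, the index name, and the assumption that customer_id is the most frequently filtered and least repetitive field are all hypothetical.

```sql
-- Hypothetical table: tickets are usually looked up per customer
-- and sorted by the time they were opened.
CREATE TABLE support_ticket
(
    ticket_id   int      NOT NULL PRIMARY KEY,
    customer_id int      NOT NULL,
    opened_at   datetime NOT NULL,
    status_code tinyint  NOT NULL
)
GO

-- Composite index: the most frequently filtered field comes first,
-- followed by the field used for sorting.
CREATE NONCLUSTERED INDEX ix_support_ticket_customer_opened
ON support_ticket (customer_id, opened_at)
GO

-- This query filters on the leading index field and sorts on the second,
-- so the composite index is a natural candidate for it.
SELECT ticket_id, customer_id, opened_at
FROM support_ticket
WHERE customer_id = 42
ORDER BY opened_at
GO
```

Whether the optimizer actually uses the index depends on data volume and statistics, so it is worth checking the execution plan on realistic data.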
-------------------------------------------------------------------------------------------------------------------
When I first started using SQL Server, I never used "Show Execution Plan" to analyze queries. I assumed that every SQL query I submitted was already optimal and ignored query performance, so I paid little attention to the execution plan. Early in my career I was happy just to get the data back, without considering how it was retrieved or what the execution plan was doing with my queries. I thought SQL Server would take care of query performance by itself. A software engineer who has just entered the IT industry, or who is learning a new technology, rarely has time to learn everything he ought to know before writing code. Perhaps this is because the IT industry is so competitive.
As time passed, the database grew. One day a customer, dissatisfied with the application's query performance, came to me angrily and complained that the queries were so slow that his routine work took much longer. At first I suggested upgrading system resources, for example adding hard disk capacity, as a temporary fix. Although hard disks are cheap nowadays, the customer still wanted a permanent solution: reviewing and tuning the queries instead of upgrading resources again and again. Because customer satisfaction is very important in the IT industry, I had to take his suggestion seriously, and I promised to review and tune my code.
How to Get Started?
When I first entered the IT industry, I knew only the basics of SQL Server. To be honest, I had no idea where to begin when I promised the customer to review the system. However, I believed I could find the relevant information through Google and BOL (SQL Server Books Online).
I read some books about SQL Server and searched online, and that is how I learned about "Show Execution Plan". The option can be switched on in Query Analyzer. "Show Execution Plan" is a graphical tool that helps developers and DBAs analyze and optimize queries to improve performance.
Each operation in "Show Execution Plan" has its own icon. In this article I am mainly interested in "table scan", "index scan", "index seek", "clustered index scan", and "clustered index seek". I may cover other operations in the future.
Time was flying by at Formula 1 speed, and I decided it was time to fully understand how "table scan", "index scan", "index seek", "clustered index scan", and "clustered index seek" work.
I was about to start analyzing and optimizing my queries. Before the analysis, I came up with some questions:
- When does SQL Server use "table scan"?
- When does SQL Server use "index scan"?
- When does SQL Server use "index seek"?
- When does SQL Server use "clustered index scan"?
- When does SQL Server use "clustered index seek"?
I focused mainly on analyzing which method SQL Server chooses in a query's execution plan. After a while I had learned a few things that should be helpful to developers and new DBAs, so I decided to write this article and share them to help others understand the execution plan.
If you are interested, read on at your own pace or repeat the experiments below on your own SQL Server.
Start
To explain "table scan", "index scan", "index seek", "clustered index scan", and "clustered index seek" in "Show Execution Plan", first create a new table and add some sample data. The script for creating the table is as follows:
    CREATE TABLE performanceissue
    (
        PRID   uniqueidentifier NOT NULL,
        prcode int              NOT NULL,
        prdesc varchar(100)     NOT NULL
    ) ON [PRIMARY]
After creating the table, add some data. The following script adds 100,000 records. It may take a long time to run, so please wait for it to finish.
    DECLARE @loop int
    DECLARE @PRID uniqueidentifier
    DECLARE @prdesc varchar(100)

    SET @loop = 1
    SET @prdesc = ''

    -- Insert 100,000 rows: a random GUID, a sequential code, and a description
    WHILE @loop <= 100000
    BEGIN
        SET @PRID = NEWID()
        SET @prdesc = 'performanceissue-' + CONVERT(varchar(10), @loop)
        INSERT INTO performanceissue VALUES (@PRID, @loop, @prdesc)
        SET @loop = @loop + 1
    END
After the script is successfully executed, the data is added.
Use the following statement to view the table content:
    SELECT PRID, prcode, prdesc
    FROM performanceissue
    GO
Because the result set is long, the query results are not listed here.
As mentioned earlier, I want to explain when "table scan", "index scan", "index seek", "clustered index scan", and "clustered index seek" occur, and which of them gives the best performance.
When SQL Server returns data, we want to know which access method it used to get that data. Let's look at "table scan" first and find out when it occurs.
To turn on "Show Execution Plan", choose it from the Query menu (Alt+Q opens the menu) or use the shortcut Ctrl+K.
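If a graphical tool is not at hand, the plan can also be inspected as text. The sketch below uses SET SHOWPLAN_TEXT, which makes SQL Server return the estimated plan instead of running the query; it assumes the performanceissue table created above. This is an alternative to the graphical option described in this article, not the method the experiments here were run with.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch.
SET SHOWPLAN_TEXT ON
GO

-- The query is not executed; SQL Server returns its plan as text rows
-- that name the operators used (for example a table scan).
SELECT PRID, prcode, prdesc
FROM performanceissue
GO

SET SHOWPLAN_TEXT OFF
GO
```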
Take a look at the execution plan after executing the following query.
    SELECT PRID, prcode, prdesc
    FROM performanceissue
    GO
In the preceding execution plan, SQL Server uses "table scan". I asked myself why a table scan appears and why SQL Server chose this method. Is it because I asked for all 100,000 records? Then I looked at it from another angle: what should I do when a table scan appears in a query? At that point I did not understand SQL Server's scanning mechanisms well, so how could I optimize the query? In the following SELECT query, only two columns are selected: [PRID, prcode].
    SELECT PRID, prcode
    FROM performanceissue
    GO
After the query is executed, the execution plan is the same as for the first query. So I changed the query to retrieve only one field, [PRID].
    SELECT PRID
    FROM performanceissue
    GO
After the query is executed, the execution plan is still the same as for the first query. You do not need to pay much attention to the "estimated row size" attribute. I then decided to fetch only one record to see what the execution plan would do. The query statement is as follows:
    SELECT PRID, prcode, prdesc
    FROM performanceissue
    WHERE PRID = 'd2017c151-5f74-4c2a-b527-86fef9712955' -- placeholder; the PRID values differ on your machine, so use one from your own table
    GO
After the query is executed, the execution plan shows that it still uses "table scan" to retrieve the data.
So I needed to find another way to avoid the table scan. The first thing to try was adding an index to the table, so I created a non-clustered index on the PRID field. Would the table scan go away after adding the index? This brings us to "index scan" and "index seek".
Index Scan and Index Seek
First, create a non-clustered index on the PRID field.
    CREATE UNIQUE NONCLUSTERED INDEX unc_prid
    ON performanceissue (PRID)
    GO
This article assumes that the reader knows how non-clustered indexes work. For more information about non-clustered indexes, refer to the related topics in BOL or see http://www.sql-server-performance.com/gv_index_data_structures.asp. Next we look at how "index scan" works.
Execute the following statement and view the execution plan.
    SELECT PRID, prcode, prdesc
    FROM performanceissue
    GO
Strangely, "table scan" is still used. Why does SQL Server not use the non-clustered index? So I changed the query to retrieve two fields, [PRID, prcode].
    SELECT PRID, prcode
    FROM performanceissue
    GO
The execution plan is the same as for the previous query. So I modified the query again to retrieve only one field, [PRID].
    SELECT PRID
    FROM performanceissue
    GO
This time the query uses "index scan", which is good. The natural next question is when "index scan" is used. The PRID field has an index, and PRID is the only field selected in the query, so SQL Server scans the index pages and uses "index scan". The earlier queries selected both indexed and non-indexed fields, so SQL Server could not use "index scan" for them. I don't know how SQL Server decides this internally, but from these experiments I conclude that when only indexed fields are selected in a query, SQL Server uses "index scan".
Next, let's see when "index seek" occurs. When I saw the word "seek", my first thought was a query with a condition.
I tried three different queries with a WHERE clause to find out which would use "index seek". The first statement is as follows:
    SELECT PRID, prcode, prdesc
    FROM performanceissue
    WHERE prcode = 8
    GO
The execution plan uses "table scan".
The second statement is as follows:
    SELECT PRID, prcode, prdesc
    FROM performanceissue
    WHERE prdesc = 'performanceissue-8'
    GO
The execution plan still uses "table scan".
The third statement is as follows:
    SELECT PRID, prcode, prdesc
    FROM performanceissue
    WHERE PRID = 'd2017c151-5f74-4c2a-b527-86fef9712955' -- placeholder; the PRID values differ on your machine, so use one from your own table
    GO
This query uses "index seek" and "bookmark lookup". "Index seek" is used because the WHERE clause filters on PRID, the indexed field. "Bookmark lookup" is used because the query also selects fields that are not in the index, so SQL Server has to fetch them from the table rows. If the two non-indexed fields are removed from the select list, the bookmark lookup disappears. Of course, a query that returns only PRID is of little practical use here, because the PRID value is already given in the WHERE clause.
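The effect described above can be tried with a sketch like the following. It assumes the performanceissue table and the unc_prid index created earlier; the variable name is hypothetical and simply avoids hard-coding a GUID that would differ on every machine.

```sql
-- Pick any existing PRID value so the example works on every machine.
DECLARE @some_prid uniqueidentifier
SELECT TOP 1 @some_prid = PRID FROM performanceissue

-- Only the indexed field is selected, so the index alone can answer
-- the query and no bookmark lookup should be needed.
SELECT PRID
FROM performanceissue
WHERE PRID = @some_prid
GO
```

Comparing this plan with the plan of the third statement above should show the same seek on unc_prid, but without the extra lookup step.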
I think "index seek" performs better than "index scan" and "table scan", mainly for the following reasons:
- "Index seek" does not need to scan the whole table or all index pages, whereas "table scan" and "index scan" do.
- "Index seek" uses the WHERE condition to locate the matching rows directly, which is much faster than "index scan" and "table scan".
After I completed these tests, a colleague asked me an interesting question: when does SQL Server use "clustered index scan" and "clustered index seek"? The next experiment covers these two methods.
I decided to create a clustered index on prcode to test "clustered index scan" and "clustered index seek".
Clustered Index Scan and Clustered Index Seek
The following script drops the index on the PRID field and creates a clustered index on the prcode field.
    DROP INDEX performanceissue.unc_prid
    GO
    CREATE UNIQUE CLUSTERED INDEX uc_prcode
    ON performanceissue (prcode)
    GO
The output confirms that the clustered index has been created.
For more information about clustered indexes, see the related topics in Books Online or http://www.sql-server-performance.com/gv_index_data_structures.asp. Next we focus on when "clustered index scan" and "clustered index seek" are used.
Execute the following query statement:
    SELECT PRID, prcode, prdesc
    FROM performanceissue
    GO
After the query is executed, the execution plan shows that "clustered index scan" is used.
The following three WHERE clauses test when SQL Server uses "clustered index seek". The first form is as follows:
    SELECT PRID, prcode, prdesc
    FROM performanceissue
    WHERE prdesc = 'performanceissue-8'
    GO
After the query is executed, the execution plan shows that "clustered index scan" is used.
The second form is as follows:
    SELECT PRID, prcode, prdesc
    FROM performanceissue
    WHERE PRID = 'd2017c151-5f74-4c2a-b527-86fef9712955' -- placeholder; the PRID values differ on your machine, so use one from your own table
    GO
After the query is executed, the execution plan still uses "clustered index scan".
The third form is as follows:
    SELECT PRID, prcode, prdesc
    FROM performanceissue
    WHERE prcode = 8
    GO
This execution plan uses "clustered index seek".
When the WHERE clause filters on the prcode field, "clustered index seek" is used. No "bookmark lookup" appears, even though the selected fields include non-indexed ones, because the leaf level of a clustered index contains the data rows themselves.
In my opinion, "clustered index seek" performs better than "clustered index scan" and "index seek", for the following reasons:
- "Clustered index seek" does not need to scan all the pages of the clustered index.
- Compared with "index seek", "clustered index seek" does not need a "bookmark lookup" when the selected fields include non-indexed ones.
Through these experiments I gained practical experience in reading execution plans. I now know which access methods improve performance, and that is what keeps the customer satisfied.