I. Introduction
Suppose you and your team have successfully developed and deployed a website. In a short period of time it has attracted thousands of registered users, so you have a very satisfied customer. Everyone, including you and your team, management, and the customer, is very happy.
Life is not always smooth sailing. Once the number of users started to grow rapidly every day, problems appeared. The customer sent an email complaining that the website was too slow and was losing users.
You started to investigate and soon found that the system was very slow when reading or updating data. When you looked into the database, you saw that the tables had grown rapidly and some of them held thousands of rows. The test team ran a test against the production database and found that a process that completes in only 2 to 3 seconds on the test server now takes 5 minutes.
This old story has happened to thousands of systems around the world. Almost every developer, including me, runs into it at some point in his or her career. I know why it happens, and I know how to overcome it.
II. Scope of reading
Note that the main focus of this series of articles is "optimizing data access performance in transactional (OLTP) SQL Server databases", but most of the optimization techniques also apply to other database platforms.
The optimization techniques I will discuss are aimed at software developers. As a developer, you need to work through the points I raise and confirm that you have done everything you can to optimize the data access code you have written or are going to write. Database administrators (DBAs) also play an important role in optimizing and tuning performance, but optimization in the DBA's area is outside the scope of this article.
III. Start optimizing a database
When a database-based application slows down, the cause is almost always (I would say 99% of the time) that the system's data access routines are not optimized or do not use the best available methods. Therefore, you need to review and optimize your data access and manipulation routines to improve the overall performance of the system. Let's go through the optimization task step by step.
Step 1: Apply the correct indexes on the table columns
Some may argue about whether proper indexing should be the first step in database optimization. I think it should be, for two reasons:
1. On a production system, it gives you the largest possible performance improvement in the shortest time.
2. Creating database indexes does not require any application changes, so you do not need to recompile or redeploy anything.
If you find that the current database is not properly indexed, create the missing indexes and you will see a quick performance improvement. If the indexing has already been done properly, proceed to the following steps.
What is an index?
I believe you already know what an index is, but I still see many people who are not entirely clear about it. So let's figure out what an index is by taking a look at the following little story.
A long time ago, there was a large library in an ancient city with thousands of books in it, and the books were stored on the shelves in no particular order. Whenever a reader asked the librarian for a book, the librarian had no better way than to check every book until he found the one the reader wanted. It often took several hours to find a book, and in the meantime the reader had to wait.
[This is like a table without a primary key. To find data in such a table, the database engine has to scan all of the rows to locate the relevant records, so the search runs very slowly.]
As the number of readers and books grew every day, the librarian's workload became heavier and heavier. One day, a wise man came to the library and saw how hard the librarian was working. He suggested numbering each book and placing the books on the shelves in numerical order. "What benefit would I get from that?" the librarian asked. The wise man replied, "If a reader asks for a book by its number, you will quickly see which shelf holds that range of numbers, and on that shelf you can quickly find the book."
[Numbering the books is like creating a primary key on a table. When you create a primary key, the system by default creates a clustered index tree, and all data pages containing the rows are physically sorted by the primary key value in the file system. Within each data page, the rows are also sorted by the primary key value. Therefore, when you request a row from the database, the database server first uses the clustered index tree to find the right page (like finding the right shelf), and then searches that page for the row with the requested primary key value (like finding the book on the shelf).]
"This is exactly what I need!" The excited librarian began numbering the books and arranging them on the shelves. It took him a whole day. At the end of that day he ran a test and found that he could find a book in almost no time. The readers were very happy.
[This is exactly what happens after you create a primary key. A clustered index is created, and the data pages are physically sorted by the primary key value in the data file. One thing is easy to see: because the data can be physically sorted by only one key, a table can have only one clustered index, just as the books can be arranged by only one criterion, the book number.]
But wait, the problem was not completely solved. Soon afterwards, a reader came who did not know the number of the book, only its title. How could the librarian find a book by title? The poor librarian could only go through all the numbered books from 1 to N. If the book was on shelf 67, it took him about 20 minutes to find it. Compared with the 2-3 hours it took before the books were sorted, this was indeed an improvement. However, compared with the 30 seconds needed to find a book by its number, 20 minutes was still a long time. "Is there a better way?" he asked the wise man.
[Assume that you have a Products table. If it has only a primary key on ProductID and no other indexes, the same situation occurs. When you search by product name, the database engine has no choice but to scan all of the physically ordered data pages in the file.]
The wise man told the librarian, "Because the books are already arranged by book number, you cannot rearrange them by any other criterion. A better way is to create a catalog, or index, that lists each title together with its book number. In this catalog, sort the titles alphabetically and group them by their first letter. For example, when someone asks for 'Database Management System', you can find the book as follows:"
1. Go to section D of the title catalog and find the entry for the title.
2. Read the book number from that entry and use it to find the book.
"You are truly a genius!" the librarian exclaimed. He immediately spent some time creating the title catalog. A quick test showed that finding a book by title now took only one minute: 30 seconds to look up the book number in the catalog and 30 seconds to find the book by its number.
The librarian realized that readers might also search for books by other criteria, such as the author's name, so he created a similar catalog for authors. With these catalogs in place, he could find a book by any of these criteria within one minute. His hard work had finally paid off. Many more readers came to the library because they could get their books quickly, and the library became very lively.
The librarian lived happily ever after, and the story ends here.
Now I am sure you understand what indexes are, why they are important, and how they work internally. For example, suppose we have a table named Products with a ProductID primary key. It already has a clustered index, because creating a primary key creates a clustered index by default. We should also create a nonclustered index on the ProductName column. Once we do this, the database engine builds an index tree for the nonclustered index (like the title catalog in the story), with the index pages sorted by product name. Each index page contains a range of product names and their corresponding ProductID values. Therefore, when you search by product name, the database engine first searches the nonclustered index tree on ProductName to find the ProductID primary key value of the product. Once it is found, the engine uses that ProductID to search the clustered index tree and return the actual row.
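The Products example can be sketched in T-SQL roughly as follows. The column types, the UnitPrice column, and the object names (PK_Products, NCLIX_Products_ProductName) are assumptions made for illustration; the article only names the table and its ProductID and ProductName columns.

```sql
-- Hypothetical table definition; types and object names are illustrative.
CREATE TABLE dbo.Products
(
    ProductID   INT           NOT NULL,
    ProductName NVARCHAR(100) NOT NULL,
    UnitPrice   MONEY         NULL,
    -- Declaring the primary key as CLUSTERED makes the default explicit.
    CONSTRAINT PK_Products PRIMARY KEY CLUSTERED (ProductID)
);

-- Nonclustered index on the name column: the "title catalog" of the story.
CREATE NONCLUSTERED INDEX NCLIX_Products_ProductName
    ON dbo.Products (ProductName);

-- UnitPrice is not stored in the nonclustered index, so when the optimizer
-- chooses that index it must also use the stored ProductID to fetch the row
-- from the clustered index.
SELECT ProductID, ProductName, UnitPrice
FROM dbo.Products
WHERE ProductName = N'Database Management System';
```

On a very small table the optimizer may still prefer a scan, so the two-step lookup is easiest to observe on a table with a realistic number of rows.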
An index tree works as follows.
It is a B+ tree. Each intermediate node holds a range of values and tells the database engine which node to visit next when it searches for an index value, starting from the root node. In a clustered index tree, the leaf nodes are the physical data pages. In a nonclustered index tree, the leaf nodes contain the index key values together with the corresponding clustered index key values.
Generally, it takes the database engine very little time to find the desired value in the index tree and jump from there to the target data row. Therefore, applying indexes in a database greatly speeds up data retrieval.
Follow these steps to make sure the right indexes exist in your database.
Make sure that each table in the database has a primary key
This ensures that each table has a clustered index, so that the data pages of the table are physically ordered on disk by the primary key values. Any data retrieval operation that uses the primary key, or sorts by the primary key, can then fetch rows quickly.
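A quick way to check this is to query the catalog views. The sketch below assumes SQL Server 2005 or later and lists user tables in the current database that lack a primary key or a clustered index. Both are reported, because a primary key can also be declared NONCLUSTERED, in which case the table has a primary key but no clustered index.

```sql
-- List user tables that have no primary key or no clustered index.
SELECT
    s.name AS SchemaName,
    t.name AS TableName,
    OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') AS HasPrimaryKey,
    OBJECTPROPERTY(t.object_id, 'TableHasClustIndex') AS HasClusteredIndex
FROM sys.tables AS t
INNER JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
WHERE OBJECTPROPERTY(t.object_id, 'TableHasPrimaryKey') = 0
   OR OBJECTPROPERTY(t.object_id, 'TableHasClustIndex') = 0
ORDER BY s.name, t.name;
```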
Create nonclustered indexes on columns that are:
Frequently used as search criteria
Used to join other tables
Used as foreign keys
Used for sorting (ORDER BY)
Highly selective
Of the XML data type (these use dedicated XML indexes)
The following is an example of an index creation command.
CREATE INDEX NCLIX_OrderDetails_ProductID ON dbo.OrderDetails (ProductID);
You can also create an index on the required columns using SQL Server Management Studio.
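Before adding an index, it can help to see which indexes a table already has. The following sketch uses the catalog views available in SQL Server 2005 and later; dbo.OrderDetails is the table from the example above, so substitute your own table name.

```sql
-- Show every index on dbo.OrderDetails with its key and included columns.
SELECT
    i.name                AS IndexName,
    i.type_desc           AS IndexType,      -- CLUSTERED, NONCLUSTERED, ...
    c.name                AS ColumnName,
    ic.key_ordinal        AS KeyPosition,    -- 0 for included columns
    ic.is_included_column AS IsIncluded
FROM sys.indexes AS i
INNER JOIN sys.index_columns AS ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id
INNER JOIN sys.columns AS c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.OrderDetails')
ORDER BY i.index_id, ic.is_included_column, ic.key_ordinal;
```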
Step 2: Create the correct covering indexes
So you have created all the appropriate indexes in the database, right? Assume that you have created an index on the ProductID foreign key column of a Sales table (SalesID, SalesDate, SalesPersonID, ProductID, Qty). If ProductID is a highly selective column, any SELECT query that uses the indexed column (ProductID) in its WHERE clause should run fast, right?
Yes, it does run fast compared with having no index on the foreign key column (in which case all data pages would have to be scanned). But there is still room for further improvement.
Let's assume that the sales table contains 10,000 rows of data, and the following SQL statement selects 400 rows.
SELECT SalesDate, SalesPersonID FROM Sales WHERE ProductID = 112;
First, let's see how the database engine executes this SQL statement:
1. The Sales table has a nonclustered index on the ProductID column. The engine therefore first searches the nonclustered index tree and finds the entries with ProductID = 112.
2. The index pages that contain the ProductID = 112 entries also contain the clustered index key values (the primary key values, that is, SalesID).
3. For each primary key value (400 in total), the database engine searches the clustered index tree to find the location of the actual row.
4. For each primary key value, once the row is located, the database engine reads the SalesDate and SalesPersonID column values from the matched row.
Note that in the steps above, for each of the 400 primary key entries with ProductID = 112, the database engine has to search the clustered index tree, 400 times in total, just to retrieve the additional columns (SalesDate, SalesPersonID).
Now suppose the nonclustered index contained not only the clustered index key (the primary key) but also the values of the two other columns named in the query (SalesDate, SalesPersonID). The database engine would then not need steps 3 and 4 above. It could simply seek into the nonclustered index tree on ProductID and read all three column values directly from the index pages. Wouldn't that be faster?
Fortunately, there is a way to implement this, called a covering index. You create a covering index on a table column by specifying which additional column values should be stored in the nonclustered index along with the index key and the clustered index key. The following is an example of creating a covering index on the ProductID column of the Sales table.
CREATE INDEX NCLIX_Sales_ProductID -- Index name
ON dbo.Sales (ProductID) -- Column on which index is to be created
INCLUDE (SalesDate, SalesPersonID); -- Additional column values to include
Note that a covering index should include only a few columns, namely those that are frequently used in SELECT queries. Including too many columns will not bring much benefit. On the contrary, storing all the included column values takes a considerable amount of memory, which can lead to excessive memory consumption and degraded performance.
When creating covering indexes, try to use the Database Engine Tuning Advisor for help.
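One way to see the effect of the included columns is to compare the logical reads reported for the Sales query before and after creating the index with INCLUDE. This is only a sketch: it assumes the dbo.Sales table from the example exists and contains data, and the actual numbers depend entirely on your data, so none are shown here.

```sql
-- Report page reads for each statement in the Messages output.
SET STATISTICS IO ON;

-- Run this once with the plain index on ProductID and once after
-- creating the index with INCLUDE (SalesDate, SalesPersonID).
SELECT SalesDate, SalesPersonID
FROM dbo.Sales
WHERE ProductID = 112;

SET STATISTICS IO OFF;
```

If the included columns are used, the "logical reads" figure for the Sales table should drop, because the engine no longer visits the clustered index for each matching row.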
We know that when a SQL statement runs, the SQL Server optimizer dynamically generates an execution plan based on factors such as:
Data volume
Statistics
Index changes
Parameter values in the T-SQL
Server load
This means that for a particular SQL statement, the execution plan on the production server may differ from the execution plan on the test server, even if the table and index structures are the same. It also means that an index created on the test server may improve performance there, while the same index on the production server may bring no benefit at all. Why? Because the SQL Server execution plan in the test environment may use the newly created index and give you good performance, whereas the execution plan on the production server may not use it, for a variety of reasons. For example, the indexed column may be highly selective on the test server but not on the production server.
Therefore, when creating an index, we need to make sure that the execution engine will actually use it to speed things up on the production server. But how can we do that?
The answer is to simulate the production server's load on the test server, and then create the indexes and test them there. Only then can a performance improvement measured on the test server be expected to carry over to the production server.
This sounds difficult, but fortunately there are some useful tools for it. Follow the instructions below:
1. Use SQL Profiler to capture a trace on the production server, using the Tuning template. (I know some people advise against running SQL Profiler on a production server, but sometimes you have to when diagnosing performance problems there.) If you are not familiar with the tool, or want to learn more about SQL Profiler, read http://msdn.microsoft.com/en-us/library/ms181091.aspx
2. Using the trace file generated in the previous step, run the Database Engine Tuning Advisor to create a similar load on the test database and collect its recommendations, especially the index recommendations. Because the Tuning Advisor loads the test server with the trace captured on the production server, its index recommendations are likely to be realistic. If you are not familiar with the Tuning Advisor, or want to learn more about using it, read http://msdn.microsoft.com/en-us/library/ms166575.aspx
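After an index has been deployed, one way to check whether the production server actually uses it is the sys.dm_db_index_usage_stats dynamic management view (SQL Server 2005 and later). The sketch below assumes the dbo.Sales table from Step 2. Keep in mind that these counters are reset when the SQL Server service restarts.

```sql
-- Seeks, scans, and lookups recorded for each index on dbo.Sales.
-- NULL counters mean no recorded use since the last service restart.
SELECT
    OBJECT_NAME(i.object_id) AS TableName,
    i.name                   AS IndexName,
    us.user_seeks,
    us.user_scans,
    us.user_lookups,
    us.user_updates          -- maintenance cost: writes to the index
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
    ON us.object_id   = i.object_id
   AND us.index_id    = i.index_id
   AND us.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID(N'dbo.Sales')
ORDER BY i.index_id;
```

An index that shows many updates but no seeks, scans, or lookups over a representative period is a candidate for review.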
Step 3: Defragment indexes if fragmentation has occurred
At this point you may have created all the right indexes on your tables and still not be getting the performance you expect. Why? One possibility is that index fragmentation has occurred.
1. What is index fragmentation?
Index fragmentation is a condition in which index pages have been split because of heavy insert, update, and delete activity on the table. If an index is highly fragmented, one of two things happens: either scanning the index takes much longer, or the index is not used at all by the query. In both cases performance suffers.
There are two types of fragmentation:
Internal fragmentation: inserts and updates leave the data sparsely distributed within the index pages, so the pages contain free space. This increases the number of pages that must be read, which increases query time.
External fragmentation: inserts and updates to index and data pages cause new pages to be allocated that are not contiguous in the file system, so the logical page order no longer matches the physical order. The database server cannot take advantage of read-ahead operations, because the next related page is not adjacent and may be anywhere in the data file.
2. How do I know if index fragmentation has occurred?
Execute the following SQL statement in the database (it works on SQL Server 2005 and later; replace 'AdventureWorks' with the name of your target database):
SELECT object_name(dt.object_id) AS Tablename,
       si.name AS IndexName,
       dt.avg_fragmentation_in_percent AS ExternalFragmentation,
       dt.avg_page_space_used_in_percent AS InternalFragmentation
FROM (
    SELECT object_id, index_id, avg_fragmentation_in_percent, avg_page_space_used_in_percent
    FROM sys.dm_db_index_physical_stats(db_id('AdventureWorks'), NULL, NULL, NULL, 'DETAILED')
    WHERE index_id <> 0
) AS dt
INNER JOIN sys.indexes si
    ON si.object_id = dt.object_id
   AND si.index_id = dt.index_id
   AND dt.avg_fragmentation_in_percent > 10
   AND dt.avg_page_space_used_in_percent < 75
ORDER BY avg_fragmentation_in_percent DESC;
The query returns the index fragmentation information for the AdventureWorks database.
Analyze the results to see where index fragmentation has occurred, using the following rules:
An ExternalFragmentation value greater than 10 indicates that the corresponding index has external fragmentation. An InternalFragmentation value less than 75 indicates that the corresponding index has internal fragmentation.
3. How to defragment indexes
There are two methods:
Index reorganization: execute the following command:
ALTER INDEX ALL ON TableName REORGANIZE;
Index rebuild: execute the following command:
ALTER INDEX ALL ON TableName REBUILD WITH (FILLFACTOR = 90, ONLINE = ON);
By replacing ALL with the name of a specific index, you can reorganize or rebuild a single index. You can also use SQL Server Management Studio to rebuild or reorganize indexes.
4. When should indexes be reorganized, and when rebuilt?
When the external fragmentation value is between 10 and 15 and the internal fragmentation value is between 60 and 75, you should reorganize the index. Otherwise, you should rebuild it.
An important point about rebuilding: while an index on a table is being rebuilt, the table is locked (this does not happen during reorganization). For a large table in a production database this lock is undesirable, because rebuilding the indexes of a large table can take hours. Fortunately, SQL Server 2005 offers a solution: set the ONLINE option to ON when rebuilding the index, and the table remains available for data transactions while the rebuild is in progress.
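The reorganize-or-rebuild rule above can be turned into a small script that proposes a command for each fragmented index in the current database. This is a sketch under stated assumptions: it applies the article's thresholds literally, looks only at the leaf level of in-row data, and merely generates the statements as text so that you can review them before running anything.

```sql
-- Generate one ALTER INDEX command per fragmented index.
-- Thresholds follow the rule in the text: external 10-15 and
-- internal 60-75 -> REORGANIZE, anything else -> REBUILD.
SELECT
    'ALTER INDEX ' + QUOTENAME(si.name)
    + ' ON ' + QUOTENAME(sc.name) + '.' + QUOTENAME(t.name)
    + CASE
          WHEN ps.avg_fragmentation_in_percent BETWEEN 10 AND 15
           AND ps.avg_page_space_used_in_percent BETWEEN 60 AND 75
              THEN ' REORGANIZE;'
          ELSE ' REBUILD;'
      END AS MaintenanceCommand,
    ps.avg_fragmentation_in_percent   AS ExternalFragmentation,
    ps.avg_page_space_used_in_percent AS InternalFragmentation
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'DETAILED') AS ps
INNER JOIN sys.indexes AS si
    ON si.object_id = ps.object_id
   AND si.index_id  = ps.index_id
INNER JOIN sys.tables AS t
    ON t.object_id = ps.object_id
INNER JOIN sys.schemas AS sc
    ON sc.schema_id = t.schema_id
WHERE ps.index_id <> 0                          -- skip heaps
  AND ps.index_level = 0                        -- leaf level only
  AND ps.alloc_unit_type_desc = 'IN_ROW_DATA'
  AND ps.avg_fragmentation_in_percent > 10
  AND ps.avg_page_space_used_in_percent < 75
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

The DETAILED scan mode reads every index page, so on a large production database it is worth running this during a quiet period.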
IV. Data access conclusion
It is tempting to create an index on every column of a table that is eligible for one. However, if you are working with a transactional database, indexing every such column is not always desirable. In fact, creating a large number of indexes on an OLTP system may reduce database performance, because many of the operations are updates, and updating data also means updating the indexes.
A rule of thumb is recommended as follows:
If you are working with a transactional database, do not create more than five indexes per table on average. If you are working with a data warehouse, you can create up to 10 indexes per table on average.
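To see how your own database compares with this rule of thumb, you can count the indexes on each table. The sketch below assumes SQL Server 2005 or later and uses the article's transactional threshold of five; adjust the HAVING clause for a data warehouse.

```sql
-- Tables in the current database with more than five indexes.
SELECT
    s.name   AS SchemaName,
    t.name   AS TableName,
    COUNT(*) AS IndexCount
FROM sys.indexes AS i
INNER JOIN sys.tables AS t
    ON t.object_id = i.object_id
INNER JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
WHERE i.index_id > 0            -- index_id 0 is a heap, not an index
  AND i.is_hypothetical = 0     -- ignore hypothetical indexes left by tuning tools
GROUP BY s.name, t.name
HAVING COUNT(*) > 5
ORDER BY IndexCount DESC;
```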