SQL Server Index Structure and Usage (4)
Author: freedk
I. An in-depth understanding of the index structure
II. Improving SQL statements
III. A general paging display stored procedure for small and massive data volumes
IV. The importance of the clustered index and how to choose one
In the title of the previous section I wrote "a general paging display stored procedure for small and massive data volumes." That is because, after applying this stored procedure to an office-automation system, I found that the third stored procedure showed the following behavior when the data volume was small:
1. The paging speed is generally between 1 second and 3 seconds.
2. When querying the last page, the speed is generally 5 to 8 seconds, regardless of whether the total is 3 pages or 300,000 pages.
Although this paging procedure is very fast on very large tables, on the first few pages that 1 to 3 second response is actually slower than the first, unoptimized paging method. In the users' words, "it isn't even as fast as an Access database," and that perception alone is enough to keep users from adopting the system you developed.
I analyzed the problem, and its crux is simple yet important: the sort field was not the clustered index!
The title of this article is "Query Optimization and Paging Algorithm Solutions." The reason I put the two loosely related topics of "query optimization" and "paging algorithms" together is that both depend on one crucial thing: the clustered index.
As we mentioned earlier, clustered indexes have two major advantages:
1. Narrow the query range as quickly as possible.
2. Sort fields as quickly as possible.
The first advantage is used mainly for query optimization; the second mainly for sorting data during paging.
Only one clustered index can be created per table, which makes it especially precious. Choosing the clustered index is arguably the single most critical factor in both "query optimization" and "efficient paging."
However, the clustered index column must serve both the query columns and the sort columns, and these two needs usually conflict. In my earlier discussion of indexes, I used fariqi, the date on which the user posted the document, as the leading column of the clustered index, with the date accurate to the day. The advantages of this choice were covered earlier: for fast queries over a time range, it beats using the ID primary key column.
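As an illustration (the table and column names follow those used in the earlier installments of this series and may differ from your schema), this is the kind of time-range query that benefits from clustering on fariqi:

```sql
-- With the clustered index leading on fariqi, this range predicate becomes a
-- clustered index seek over physically adjacent rows instead of a table scan.
-- Table name "tgongwen" and columns are taken from the earlier installments.
SELECT gid, fariqi, title
FROM tgongwen
WHERE fariqi BETWEEN '2004-1-1' AND '2004-6-6'
ORDER BY fariqi
```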
However, because the clustered index column then contains duplicate values, MAX and MIN cannot be used as the paging reference point, so the sort cannot be made efficient. And if the ID primary key column is used as the clustered index instead, the clustered index does nothing but sort, which is a waste of a precious resource.
To solve this problem, I later added a separate date column whose default value is GETDATE(). When a user writes a record, this column is automatically filled with the current time, accurate to the millisecond. Even so, to rule out even a tiny chance of duplicates, a UNIQUE constraint must also be created on this column. This date column then serves as the clustered index column.
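A minimal sketch of this change, assuming the tgongwen table from earlier installments (the column and constraint names here are illustrative, not from the original system; note also that SQL Server 2000's datetime type actually rounds to about 3.33 ms):

```sql
-- Add a date column that is filled with the insert time automatically.
-- Existing rows receive the default value because the column is NOT NULL.
ALTER TABLE tgongwen
    ADD date_col datetime NOT NULL DEFAULT GETDATE()

-- Declaring the UNIQUE constraint as CLUSTERED guards against the small
-- chance of two inserts sharing the same timestamp AND makes this column
-- the clustered index in a single step.
ALTER TABLE tgongwen
    ADD CONSTRAINT uq_tgongwen_date UNIQUE CLUSTERED (date_col)
```

Declaring the constraint as UNIQUE CLUSTERED avoids creating two separate indexes (a nonclustered unique index plus a clustered index) on the same column.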
With this time-based clustered index column, the table gains a column that can narrow queries to the time range in which data was inserted and, being unique, can serve as the MAX/MIN reference point for the paging algorithm.
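The MAX-based paging pattern this refers to can be sketched as follows (names are hypothetical and the page size and offset are hard-coded, since TOP in SQL Server 2000 only accepts a literal; the real stored procedure from the earlier installment builds the statement dynamically from its parameters):

```sql
-- Fetch page 5 with a page size of 20, sorted by the unique clustered
-- column date_col: skip the first (5-1)*20 = 80 rows, then take 20.
SELECT TOP 20 *
FROM tgongwen
WHERE date_col >
      (SELECT MAX(date_col)
       FROM (SELECT TOP 80 date_col   -- (pageindex - 1) * pagesize rows
             FROM tgongwen
             ORDER BY date_col) AS t)
ORDER BY date_col
```

Because date_col is unique and clustered, both the inner TOP scan and the outer range predicate are served by the clustered index, which is why the duplicates problem described above disappears.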
After this optimization, I found that paging generally takes tens of milliseconds, or even 0 milliseconds, whether the data volume is large or small, while queries that narrow the range by date segment are no slower than before. The clustered index is this important and this precious, so I conclude that it should always be built on:
1. The most frequently used field to narrow the query scope;
2. The most frequently used field to be sorted.
Conclusion
This article brings together my recent experience in using databases and is an accumulation of practical lessons from building an office-automation system. I hope it helps you not only in your work but also in understanding how to analyze problems. Most of all, I hope it sparks everyone's interest in learning and discussion, so that together we can advance science-and-technology policing and the Golden Shield Project.
Finally, it should be noted that in my experiments I found the biggest influence on database speed is not memory size but the CPU. On my Pentium 4 2.4 GHz machine, Task Manager showed CPU usage frequently pinned at 100%, while memory usage changed little or not at all. Even on our HP ML350 G3 server, CPU usage peaked at 90% and typically ran around 70%.
The experimental data in this article comes from our HP ML350 G3 server. Server configuration: dual Intel Xeon hyper-threaded CPUs at 2.4 GHz, 1 GB of memory, Windows Server 2003 Enterprise Edition, and SQL Server 2000 SP3.
Article reference: http://www.vckbase.com/document/viewdoc?id=1310