4. How to optimize and operate a database with a large data volume (hundreds of thousands of data records) (how to select an aggregate index)

Source: Internet
Author: User

4. Importance of clustered indexes and how to select clustered Indexes

In the title of the previous section, I wrote: The general paging display stored process for small data volumes and massive data. This is because when we apply this stored procedure to the "office automation" system, the author finds that the third stored procedure has the following phenomena when there is a small amount of data:

1. The paging speed is generally between 1 second and 3 seconds.

2. When querying the last page, the speed is generally 5 to 8 seconds, even if the total number of pages is only 3 or 0.3 million pages.

Although the implementation of this paging process is very fast in the case of ultra-large capacity, but in the first few pages, this 1-3 second speed is slower than the first non-optimized paging method. In the user's words, it is "No ACCESS database is faster ", this recognition is sufficient to prevent users from using your developed system.

I have analyzed this. The crux of this problem is so simple, but so important: the sorting field is not a clustered index!

The title of this article is "query optimization and paging algorithm solution ". The author does not put together the topics of "query optimization" and "paging algorithm" because both of them need a very important thing-clustered index.

As we mentioned earlier, clustered indexes have two major advantages:

1. Narrow the query range as quickly as possible.

2. Sort fields as quickly as possible.

1st are mostly used for query optimization, while 2nd are mostly used for data sorting during paging.

Clustered indexes can only be created in one table, which makes clustered indexes more important. The selection of clustered indexes can be said to be the most critical factor for "query optimization" and "efficient paging.

However, clustering index columns must meet both the needs of query columns and the needs of sorting columns. This is usually a contradiction.

In my previous "Index" discussion, I used fariqi, that is, the user's published date as the starting column of the clustered index. The date accuracy is "day ". The advantages of this method have been mentioned earlier. In the quick query of the time range, it is more advantageous than using the ID Primary Key column.

However, because duplicate records exist in the clustered index column during pagination, Max or min cannot be used as the most paging reference object, thus making sorting more efficient. If the id Primary Key column is used as the clustered index, the clustered index is useless except for sorting. In fact, this is a waste of valuable resources.

To solve this problem, I later added a date column, whose default value is getdate (). When a user writes a record, this column automatically writes the current time, accurate to milliseconds. Even so, to avoid a small possibility of overlap, you also need to create a unique constraint on this column. Use this date column as the clustered index column.

With this time-based clustered index column, you can use this column to query a certain period of time when you insert data, and use it as a unique column to implement Max or Min, it is a reference object for paging algorithms.

After such optimization, the author found that the paging speed is usually dozens of milliseconds or even 0 milliseconds in the case of large data volumes or small data volumes. However, the query speed for narrowing the range by date segments is no slower than the original query speed.

Clustered index is so important and precious, so I have summarized that clustered index must be built on:

1. The most frequently used field to narrow the query scope;

2. The most frequently used field to be sorted.

Conclusion:

I hope this article will not only help you in your work, but also help you understand the problem analysis methods. The most important thing is that I hope this article will be helpful, the interest of everyone in learning and discussion can be promoted together.
It should be noted that, in the test, we found that the biggest impact on the database speed is not the memory size, but the CPU when you query large data volumes. When I tried it on my P4 2.4 machine, I checked "Resource Manager". The CPU usage often continued to reach 100%, but the memory usage did not change or changed significantly. Even on our HP ml 350 G3 server, the peak CPU usage reaches 90%, typically around 70%.

The experimental data in this article is from our HP ml 350 server. Server Configuration: Dual-Inter Xeon hyper-threading CPU 2.4 GB, memory 1 GB, operating system Windows Server 2003 Enterprise Edition, Database SQL Server 2000 sp3.

4. Importance of clustered indexes and how to select clustered Indexes

In the title of the previous section, I wrote: The general paging display stored process for small data volumes and massive data. This is because when we apply this stored procedure to the "office automation" system, the author finds that the third stored procedure has the following phenomena when there is a small amount of data:

1. The paging speed is generally between 1 second and 3 seconds.

2. When querying the last page, the speed is generally 5 to 8 seconds, even if the total number of pages is only 3 or 0.3 million pages.

Although the implementation of this paging process is very fast in the case of ultra-large capacity, but in the first few pages, this 1-3 second speed is slower than the first non-optimized paging method. In the user's words, it is "No Access database is faster ", this recognition is sufficient to prevent users from using your developed system.

I have analyzed this. The crux of this problem is so simple, but so important: the sorting field is not a clustered index!

The title of this article is "query optimization and paging algorithm solution ". The author does not put together the topics of "query optimization" and "paging algorithm" because both of them need a very important thing-clustered index.

As we mentioned earlier, clustered indexes have two major advantages:

1. Narrow the query range as quickly as possible.

2. Sort fields as quickly as possible.

1st are mostly used for query optimization, while 2nd are mostly used for data sorting during paging.

Clustered indexes can only be created in one table, which makes clustered indexes more important. The selection of clustered indexes can be said to be the most critical factor for "query optimization" and "efficient paging.

However, clustering index columns must meet both the needs of query columns and the needs of sorting columns. This is usually a contradiction.

In my previous "Index" discussion, I used fariqi, that is, the user's published date as the starting column of the clustered index. The date accuracy is "day ". The advantages of this method have been mentioned earlier. In the quick query of the time range, it is more advantageous than using the ID Primary Key column.

However, because duplicate records exist in the clustered index column during pagination, max or min cannot be used as the most paging reference object, thus making sorting more efficient. If the ID Primary Key column is used as the clustered index, the clustered index is useless except for sorting. In fact, this is a waste of valuable resources.

To solve this problem, I later added a date column, whose default value is getdate (). When a user writes a record, this column automatically writes the current time, accurate to milliseconds. Even so, to avoid a small possibility of overlap, you also need to create a UNIQUE constraint on this column. Use this date column as the clustered index column.

With this time-based clustered index column, you can use this column to query a certain period of time when you insert data, and use it as a unique column to implement max or min, it is a reference object for paging algorithms.

After such optimization, the author found that the paging speed is usually dozens of milliseconds or even 0 milliseconds in the case of large data volumes or small data volumes. However, the query speed for narrowing the range by date segments is no slower than the original query speed.

Clustered index is so important and precious, so I have summarized that clustered index must be built on:

1. The most frequently used field to narrow the query scope;

2. The most frequently used field to be sorted.

Conclusion:

I hope this article will not only help you in your work, but also help you understand the problem analysis methods. The most important thing is that I hope this article will be helpful, the interest of everyone in learning and discussion can be promoted together.
It should be noted that, in the test, we found that the biggest impact on the database speed is not the memory size, but the CPU when you query large data volumes. When I tried it on my P4 2.4 machine, I checked "Resource Manager". The CPU usage often continued to reach 100%, but the memory usage did not change or changed significantly. Even on our hp ml 350 G3 server, the peak CPU usage reaches 90%, typically around 70%.

The experimental data in this article is from our HP ml 350 server. Server Configuration: Dual-Inter Xeon hyper-threading CPU 2.4 GB, memory 1 GB, operating system Windows Server 2003 Enterprise Edition, Database SQL Server 2000 sp3.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.