Use Column Storage index: Highway to build data

Source: Internet
Author: User
Tags execution memory usage microsoft sql server set time

A few months ago, we took you to the Microsoft Next generation database platform SQL Server Denali column storage indexing function. For tables at the data warehouse level, it can improve query performance to a large extent. In the latest Community preview version of CTP3, we are fortunate to have access to the full functionality of the column storage index, so in this article we'll delve into the mysteries.

Unlike our familiar "row store" format, the data in each column index in the new schema is grouped and stored separately, and the column data can be compressed. Additionally, when the DBA runs a query on the column storage index, SQL Server reads only the columns used in the query. The result is less disk I/O and less memory usage, so the query's performance will increase by dozens of times or even 100 times times.

For column storage indexes and row storage indexes, we do a comparative test in this article that contains 85 million rows of data, the author uses different combinations, and the results are exactly the same as those in the Microsoft SQL Server Online Guide:

Column storage indexes tend to be faster than row storage indexes. The SQL Server Online Guide mentions that in some specific cases, rows store indexes have better performance, but in most cases they are definitely column storage indexes;

As expected, the column storage index works better in the use of aggregations and filtered queries;

The query optimizer chooses the column storage index to be faster than the row storage index;

The query execution time may be dozens of times times faster or more than a hundredfold depending on the scenario you are using. The more value you repeat in a column, the higher the compression rate, and therefore the higher the performance. Similarly, this is true for data types;

It takes almost the same time to build a column to store index peers to store indexes. In the CTP2 version, all of the Microsoft MVP trials reflect a longer build time for the column storage index. But in CTP3, Microsoft has improved on this feature.

Consider a special scenario in which a query returns two columns in the result, where value differs a little; The result set contains 850 rows. Using a traditional row store index, the query spends about 4 minutes, and the column stores the index, and the execution time drops to 1.2 seconds, and the speed increases 200 times times. Additionally, the tested Denali server is a virtual machine that has not been optimized for disk I/O.

Using columns to store indexes properly

The column store index is not flawless, and it has its own limitations. Only one column can store the index in each table, and if you use a table partition, you must include the partitioning column in the index. Cannot contain the following data types: binary, variable binary, text, memo, image, varchar (max), nvarchar (max), unique identifier, timestamp, decimal and numeric precision greater than 18 digits, CLR, and XML.

In addition to the data type limitations, the biggest limitation of the SQL Server Denali Column Storage Index is that once you create a column to store the index, the table becomes read-only, so any data modifications cannot be made any more. This is not very good news for any of the frequently updated tables.

Microsoft also has to make a corresponding adjustment, the most direct way is only in the pre-set time to update the table. Delete or revoke the index, and then rebuild after the update.

Or you can use partitions. Use Insert to add a new partition, when you want to update an existing row, disconnect, drop the column to store the index, update the data, recreate the index, and then add it to the table.

The final method is to fragment the data into static data and update it. Keep the columns storing static data in the Index table, and then store the data that needs to be updated in a different index table. Use the UNION ALL statement to select from both tables. With this approach, you can also get some effect: storing indexes in a static datasheet can speed up the query, while the other traditional tables do not increase in speed.

Of course, such a situation is not very common, mainly applicable to data warehouse tables or infrequently used online transaction processing tables. The SQL Server Online Guide reminds us to use column storage indexes cautiously, and this part of the functionality may change in future official editions. So we can speculate that in the official version of SQL Server Denali, we will not be subject to such restrictions.

The SQL Server Denali column Storage indexing feature is evident in specific environments, especially in data warehouses. The improvement in query performance means that the query time in the actual run will be greatly shortened, and we do not need to use the aggregate table of static data and then refresh periodically. Fully consider the company's specific business needs before use, will be able to achieve a multiplier effect.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.