Use Column Storage index: Highway to build data

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

A few months ago, we took you to the Microsoft Next generation database platform SQL Server Denali column storage indexing function. For tables at the data warehouse level, it can improve query performance to a large extent. In the latest Community preview version of CTP3, we are fortunate to have access to the full functionality of the column storage index, so in this article we'll delve into the mysteries.

Unlike our familiar "row store" format, the data in each column index in the new schema is grouped and stored separately, and the column data can be compressed. Additionally, when the DBA runs a query on the column storage index, SQL Server reads only the columns used in the query. The result is less disk I/O and less memory usage, so the query's performance will increase by dozens of times or even 100 times times.

For column storage indexes and row storage indexes, we do a comparative test in this article that contains 85 million rows of data, the author uses different combinations, and the results are exactly the same as those in the Microsoft SQL Server Online Guide:

Column storage indexes tend to be faster than row storage indexes. The SQL Server Online Guide mentions that in some specific cases, rows store indexes have better performance, but in most cases they are definitely column storage indexes;

As expected, the column storage index works better in the use of aggregations and filtered queries;

The query optimizer chooses the column storage index to be faster than the row storage index;

The query execution time may be dozens of times times faster or more than a hundredfold depending on the scenario you are using. The more value you repeat in a column, the higher the compression rate, and therefore the higher the performance. Similarly, this is true for data types;

It takes almost the same time to build a column to store index peers to store indexes. In the CTP2 version, all of the Microsoft MVP trials reflect a longer build time for the column storage index. But in CTP3, Microsoft has improved on this feature.

Consider a special scenario in which a query returns two columns in the result, where value differs a little; The result set contains 850 rows. Using a traditional row store index, the query spends about 4 minutes, and the column stores the index, and the execution time drops to 1.2 seconds, and the speed increases 200 times times. Additionally, the tested Denali server is a virtual machine that has not been optimized for disk I/O.

Using columns to store indexes properly

The column store index is not flawless, and it has its own limitations. Only one column can store the index in each table, and if you use a table partition, you must include the partitioning column in the index. Cannot contain the following data types: binary, variable binary, text, memo, image, varchar (max), nvarchar (max), unique identifier, timestamp, decimal and numeric precision greater than 18 digits, CLR, and XML.

In addition to the data type limitations, the biggest limitation of the SQL Server Denali Column Storage Index is that once you create a column to store the index, the table becomes read-only, so any data modifications cannot be made any more. This is not very good news for any of the frequently updated tables.

Microsoft also has to make a corresponding adjustment, the most direct way is only in the pre-set time to update the table. Delete or revoke the index, and then rebuild after the update.

Or you can use partitions. Use Insert to add a new partition, when you want to update an existing row, disconnect, drop the column to store the index, update the data, recreate the index, and then add it to the table.

The final method is to fragment the data into static data and update it. Keep the columns storing static data in the Index table, and then store the data that needs to be updated in a different index table. Use the UNION ALL statement to select from both tables. With this approach, you can also get some effect: storing indexes in a static datasheet can speed up the query, while the other traditional tables do not increase in speed.

Of course, such a situation is not very common, mainly applicable to data warehouse tables or infrequently used online transaction processing tables. The SQL Server Online Guide reminds us to use column storage indexes cautiously, and this part of the functionality may change in future official editions. So we can speculate that in the official version of SQL Server Denali, we will not be subject to such restrictions.

The SQL Server Denali column Storage indexing feature is evident in specific environments, especially in data warehouses. The improvement in query performance means that the query time in the actual run will be greatly shortened, and we do not need to use the aggregate table of static data and then refresh periodically. Fully consider the company's specific business needs before use, will be able to achieve a multiplier effect.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Use Column Storage index: Highway to build data

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Use Column Storage index: Highway to build data

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support