Discover what new features are in SQL Server 2014 (3): Updatable clustered column storage index

Last Update:2017-01-18 Source: Internet

Author: User

Brief introduction

Column storage (columnstore) indexes already exist in SQL Server 2012, but that version only allows nonclustered column storage indexes. The column index is therefore a copy of data that still lives in the underlying row storage, so it consumes additional storage space. The biggest constraint in 2012, however, is that once a nonclustered column storage index is created on a table, the table becomes read-only, which makes every data update very painful when you use column indexes in a data warehouse. The updatable clustered column storage index in SQL Server 2014 solves this problem.

What is an updatable clustered column storage index?

A clustered column storage index can be compared to a traditional clustered index on row storage: the index is the data itself, only stored by column. There are many benefits to storing data in columns rather than rows.

First, data warehouse queries that aggregate, scan, or group large amounts of data only need to read the selected columns, and star schemas that join several tables benefit in particular. Second, the column index is updatable, and each table needs only one. This is both an advantage and a disadvantage, because no nonclustered index can be built in addition to it. Third, the clustered column index saves a great deal of space. Because the data is stored by column and all values in a column have the same data type, it is easier to achieve a high compression ratio. Finally, a table stored by column consumes less storage space, so queries need less IO.

So what's the downside of column storage indexing?

Row storage is ideal for OLTP operations. Each clustered index key identifies one row, and that row is stored contiguously on disk, so a seek operation can serve a large number of highly selective queries. In a column storage index, the columns of the same row are not physically stored together, and a clustered column storage index has no concept of a key, so there is no seek operation. If the workload contains many OLTP-style queries, performance will suffer.

The column storage index only supports scan operations, as shown in Figure 1.

Figure 1. The column storage index only supports scan operations
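The difference between a seek and a scan can be sketched in a few lines of Python. This is only a toy model, not SQL Server's storage engine: the table, the column names, and both helper functions are made up for illustration. The row store keeps rows sorted by key and finds one row with a binary search, while the column store has to walk the key column and then stitch the row together from every column.

```python
# Toy table: 1,000 rows of (order_id, customer, amount). All names are hypothetical.
rows = [(i, f"cust{i % 7}", i * 10) for i in range(1, 1001)]

# The same data laid out by column instead of by row.
columns = {
    "order_id": [r[0] for r in rows],
    "customer": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def row_store_seek(sorted_rows, key):
    """Binary search on the clustered key; returns (row, rows_examined)."""
    lo, hi, examined = 0, len(sorted_rows) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        examined += 1
        mid_key = sorted_rows[mid][0]
        if mid_key == key:
            return sorted_rows[mid], examined
        if mid_key < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, examined


def column_store_scan(cols, key_column, key):
    """Scan the key column, then rebuild the row from every column."""
    examined = 0
    for position, value in enumerate(cols[key_column]):
        examined += 1
        if value == key:
            row = tuple(cols[name][position] for name in cols)
            return row, examined
    return None, examined


seek_row, seek_examined = row_store_seek(rows, 900)
scan_row, scan_examined = column_store_scan(columns, "order_id", 900)
print(seek_row == scan_row, seek_examined, scan_examined)
```

Both calls return the same row, but the seek examines only a handful of keys while the scan examines every value up to the match, which is why highly selective lookups favour row storage.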

So how is a column index stored?

As the name implies, a column storage index stores data by column. The process can be divided into three stages: first a batch of rows is grouped into what is called a "row group"; then each row group is split by column; finally each column is compressed, as shown in Figure 2.

Figure 2. The process of column storage
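The three stages can be illustrated with a minimal Python sketch. The sample rows, the group size of 3, and the function names are assumptions chosen for readability, and run-length encoding is used here only as a simple stand-in for compression; it is not meant to describe the exact algorithm SQL Server applies.

```python
from itertools import groupby


def to_row_groups(rows, group_size):
    """Stage 1: cut the table into row groups of at most group_size rows."""
    return [rows[i:i + group_size] for i in range(0, len(rows), group_size)]


def to_column_segments(row_group):
    """Stage 2: split one row group by column (transpose rows into columns)."""
    return [list(column) for column in zip(*row_group)]


def run_length_encode(segment):
    """Stage 3: compress one column segment as (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(segment)]


def run_length_decode(encoded):
    """Inverse of run_length_encode, used to check nothing is lost."""
    return [value for value, count in encoded for _ in range(count)]


# Hypothetical sample data: (date, city, amount).
rows = [
    ("2014-01-01", "Beijing", 10),
    ("2014-01-01", "Beijing", 10),
    ("2014-01-01", "Shanghai", 20),
    ("2014-01-02", "Shanghai", 20),
    ("2014-01-02", "Shanghai", 30),
    ("2014-01-02", "Beijing", 30),
]

row_groups = to_row_groups(rows, group_size=3)
compressed = [
    [run_length_encode(segment) for segment in to_column_segments(group)]
    for group in row_groups
]

for group_number, group in enumerate(compressed):
    print(group_number, group)
```

Because each segment holds values of a single column, repeated values sit next to each other and collapse into short runs, which is the intuition behind the higher compression ratio of column storage.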

Notice that some rows are too few to form a row group. That part of the data is simply kept in traditional row storage, which is called the deltastore. Once enough rows accumulate to form a row group, they are grouped and compressed. SQL Server 2014 currently leaves batches of fewer than about 100,000 rows uncompressed.

Both parts of the column storage described above can be observed through a DMV newly introduced in SQL Server 2014 (sys.column_store_row_groups), as shown in Figure 3. In Figure 3, the clustered column index has a compressed row group of 31,465 rows, and 1,000 new rows have been inserted. SQL Server considers these new rows to be fewer than 100,000 and therefore keeps them in the deltastore.

Figure 3. Compressed columns and Deltastore

When we insert another 1,000 rows, the number of rows in the deltastore grows from 1,000 to 2,000, and the data still stays in the deltastore, as shown in Figure 4.

Figure 4. The data inserted again is still in the Deltastore.

Next, I insert a large number of rows and find that most of the data is still stored in the deltastore, as shown in Figure 5.

Figure 5. Data is still not compressed after a large number of rows is inserted

So when will this data be compressed? According to Books Online (BOL), http://msdn.microsoft.com/en-us/library/dn223749(v=sql.120).aspx, a background thread checks periodically and compresses the data. The data is also compressed automatically when the index is rebuilt or reorganized, as shown in Figure 6.

Figure 6. The column storage index is compressed after the index is rebuilt
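The behaviour observed in Figures 3 to 6 can be mimicked with a small Python model. This is a deliberately simplified sketch and not how SQL Server is implemented: the class ToyColumnStore, its method names, and the max_delta_rows value of 100,000 (taken from the threshold mentioned in this article) are all assumptions. Row groups are represented only by their row counts.

```python
class ToyColumnStore:
    """Simplified model: compressed row groups plus a row-based deltastore."""

    def __init__(self, max_delta_rows, compressed=None):
        self.max_delta_rows = max_delta_rows      # assumed threshold
        self.compressed = list(compressed or [])  # row counts of compressed groups
        self.closed = []                          # full delta groups awaiting compression
        self.open_delta = 0                       # rows in the open delta group

    def insert(self, n_rows):
        """New rows always land in the deltastore first."""
        self.open_delta += n_rows
        while self.open_delta >= self.max_delta_rows:
            self.closed.append(self.max_delta_rows)
            self.open_delta -= self.max_delta_rows

    def background_compress(self):
        """Model of the periodic background thread: compress full groups only."""
        self.compressed.extend(self.closed)
        self.closed = []

    def rebuild(self):
        """Model of an index rebuild: every remaining delta row is compressed."""
        self.background_compress()
        if self.open_delta:
            self.compressed.append(self.open_delta)
            self.open_delta = 0

    def delta_rows(self):
        return self.open_delta + sum(self.closed)


store = ToyColumnStore(max_delta_rows=100_000, compressed=[31465])
store.insert(1000)
after_first = store.delta_rows()    # Figure 3: 1,000 rows in the deltastore
store.insert(1000)
after_second = store.delta_rows()   # Figure 4: 2,000 rows in the deltastore
store.insert(50_000)
store.background_compress()         # nothing is full yet, so nothing moves
after_bulk = store.delta_rows()     # Figure 5: still in the deltastore
store.rebuild()                     # Figure 6: everything is compressed
print(after_first, after_second, after_bulk, store.delta_rows(), store.compressed)
```

In this model the background step leaves the partially filled delta group alone, so only the rebuild empties the deltastore, which matches the sequence shown in the figures.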

Space usage comparison

The updatable clustered column storage index offers the highest compression ratio, because values in the same column are usually the same kind of data and therefore compress well. Here I compare only the space used by a traditional clustered index, page compression, row compression, and a column storage index. If we also counted the nonclustered indexes of a traditional table, the row storage table would need even more space. We ran a simple comparison on a little over 30,000 rows of data, as shown in Figure 7.

Figure 7. Space used by the different storage formats

The sample data in Figure 7 is very small, but you can still see that column storage uses almost 2/3 less space than row storage without any nonclustered index, which is a substantial saving.

Simple performance comparison

First, compare a query that selects only some of the columns. With row storage, even a query that needs a single column has to read entire rows, and therefore the whole table. Column storage only needs to read the selected columns, so when a query selects only specific columns, column storage performs much better, as shown in Figure 8.

Figure 8. The updatable clustered column storage index improves performance dramatically
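A rough way to see why a single-column aggregate is cheaper on column storage is to count how many values each layout has to touch. The Python sketch below is only an illustration under simple assumptions: the five-column table, the column names, and the two helper functions are invented, and "values read" is a crude stand-in for IO.

```python
# Hypothetical five-column table with 1,000 rows.
column_names = ["order_id", "customer", "region", "quantity", "amount"]
rows = [
    (i, f"cust{i % 50}", f"region{i % 4}", i % 9 + 1, float(i))
    for i in range(1000)
]

# The same data stored by column.
columns = {
    name: [row[index] for row in rows]
    for index, name in enumerate(column_names)
}


def sum_from_row_store(table_rows, column_index):
    """Row layout: every row is fetched whole, even though one value is used."""
    values_read, total = 0, 0.0
    for row in table_rows:
        values_read += len(row)
        total += row[column_index]
    return total, values_read


def sum_from_column_store(table_columns, name):
    """Column layout: only the requested column is read."""
    segment = table_columns[name]
    return sum(segment), len(segment)


row_total, row_values_read = sum_from_row_store(rows, column_names.index("amount"))
col_total, col_values_read = sum_from_column_store(columns, "amount")
print(row_total == col_total, row_values_read, col_values_read)
```

Both layouts return the same sum, but the row layout touches every value in the table while the column layout touches one fifth of them, and the gap widens as the table gets wider.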

Conversely, when we try a typical OLTP operation that selects all the columns of a single row, the result is the opposite of Figure 8, as shown in Figure 9.

Figure 9. Column storage indexes are very weak for OLTP operations

Summary

This article describes the principles, concepts, usage scenarios, and space usage of the updatable column storage index in SQL Server 2014, and compares performance at the two extremes of OLAP and OLTP. Column storage indexes are a huge leap forward for data warehouses and OLAP-style queries.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
