Uncover new features of SQL Server 2014 (3)-updatable column storage clustered index,

Source: Internet
Author: User

Uncover new features of SQL Server 2014 (3)-updatable column storage clustered index,

Introduction

The column storage index actually exists in SQL Server 2012, but SQL Server 2012 only allows the creation of non-clustered column indexes, this means that the column index references the underlying data on the original Row Storage index, which consumes more storage space, but the biggest limit in 2012 is that once the index of the non-clustered column is created on a table, the table will become read-only, so that even if the column index is used in the data warehouse, every time data is updated, it becomes very painful. The updatable clustered column index in SQL Server 2014 solves this problem.

Can I update the clustered column storage index?

The concept of clustered column storage index is similar to that of traditional Row Storage. Clustered indexes are both data and column storage. Storing data by column instead of by row provides many benefits,

First, you only need to read the selected columns for a large number of data warehouse queries such as aggregation, scanning, and grouping, in scenarios such as the need to Join multiple tables, the performance is improved, especially when the column index can be updated, and only one table is required (this is both an advantage and a disadvantage, because non-clustered indexes cannot be created again) clustered column indexes can be used, which greatly saves space. Column indexes are stored by column, and the data types in the same column are the same, therefore, it is easier to achieve a higher compression ratio. tables stored in columns occupy less storage space, so there is less IO.

What are the disadvantages of column storage indexes?

Row Storage is very suitable for OLTP operations. Because each clustered index key can identify a row and the row is stored continuously on the physical disk, you can use the Seek operation to complete a large number of highly selective queries, the column store indexes do not physically relate to each column in the same row, and the column store clustered index does not have the "primary key" concept, so there is no SEEK operation, if a large number of OLTP queries are performed, the performance may be faulty.

Column storage indexes only support Scan operations, as shown in figure 1.

Figure 1. Column store indexes only support Scan operations

How is column indexes stored?

Column index storage is simply stored by column. This process can be divided into three stages: first grouping a bunch of rows, this is the so-called "Row group". After grouping is complete, sharding by column, and finally compressing the column, 2.

Figure 2. Column storage procedure

We noticed that some of them are not grouping enough, so let the data stay in the form of traditional row storage. This is the so-called Deltastore, before grouping, SQL Server 2014 considers that data smaller than 10 million is not grouped.

The two parts of the column storage above can be observed through the newly introduced DMV 2014, as shown in 3. In Figure 3, we have inserted 31465 rows of new data into the clustered column index of 1000 rows, and SQL Server considers this part of data to be less than rows, therefore, it exists as Deltastore.

Figure 3. Compressed columns and Deltastore

When we insert another 1000 data, we can see that the data in DeltaStore has increased by 1000 to 2000, but it still exists in DeltaStore. 4.

Figure 4. data inserted again is still in DeltaStore

Then, when I insert a large number of rows for observation, we will find that large volumes of data are still stored in DeltaStore mode, 5.

Figure 5. data cannot be compressed after a large amount of data is inserted

So when will the data be compressed, according to the description of the BOL: http://msdn.microsoft.com/en-us/library/dn223749 (v = SQL .120 ). aspx will have a background thread for regular detection. In addition, it can be automatically archived When indexes are rebuilt or sorted, as shown in figure 6.

Figure 6. Archive column storage index after index Reconstruction

Space usage comparison

The compression ratio of updatable column storage clustered indexes is the highest, because the same column is often the same type of data, so this type of data has a better compression ratio. Now I simply compare the space occupied by traditional clustered indexes, page compression, row compression, and column storage indexes. Of course, if we calculate non-clustered indexes of traditional tables, therefore, the row-store table requires more space. We use more than pieces of data for simple comparison, as shown in figure 7.

Figure 7. space occupied by different storage

The sample data in figure 7 is very small, but we can still see that the column storage space is nearly 2/3 less than the Row Storage without non-clustered indexes, which is not a huge improvement.

Simple Performance Comparison

First, store all columns by column. For row store, you must select the entire table to read all columns of data, however, column storage only needs to read the selected columns. Therefore, if only a specific column is selected, the column storage performance is greatly improved, as shown in figure 8.

Figure 8. Improved performance of updatable column storage clustered Indexes

However, if we try a typical OLTP operation and select only all columns in a row, the results in Figure 8 are displayed. 9.

Figure 9. Column-store indexes are very weak for OLTP operations

Summary

This article describes the principles, concepts, applicable scenarios, and space usage of updatable column storage indexes in SQL Server 2014, and provides two extreme examples of OLAP and OLTP for performance comparison. Column storage indexes are a huge leap for data warehouses and OLAP queries.




Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.