A Free Trial That Lets You Build Big!
Start building with 50+ products and up to 12 months usage for Elastic Compute Service
SQL Server has introduced the concept of Columnstore since 2012, and so far 2016 support for Columnstore has been very friendly. Because my side of the online environment is primarily 2014, this article is an introduction to the 2014-based column storage for SQL Server. Below we mainly look at one of the following storage developments and some principles:
The development of Columnstore is to handle very large amounts of data for analytical calculations, so SQL Server 2012 introduced a Columnstore index that significantly provides the performance of high-traditional data warehouse type statements and is further enhanced in SQL Server 2014. Columnstore stores the data of a column separately, so there are two main benefits.
1: The similarity of the data in the same column is higher, so the compression ratio is higher. Disk operation, the disk IO will also be reduced accordingly. Of course, when the compressed data is read into memory, the decompression will require additional CPU.
2: Because the data is stored and read by columns, if some columns are not needed in the access, then the actual operation will also not access these columns, then disk IO will be further reduced.
3: Because data is stored and read in columns, large amounts of data aggregation access is faster than previous row storage.
For Columnstore, the main thing is the use of data warehousing, Microsoft in recent years has been in this area frequently. For the Data Warehouse, CPU, memory, disk may be called performance bottlenecks, but we instruct the disk operation is the slowest compared to memory and CPU performance, and Columnstore is the performance of the IO is very high, columnstore will reduce the IO operation of the disk, improve the efficiency of operations, especially the aggregation of large amounts of data. The Columnstore is not the best choice, of course, if it is an exact search on the line.
for these performance improvements and storage space optimization, mainly and columnstore implementation principle is inseparable (because the non-clustered Columnstore function is more chicken, we do not introduce, because there is a nonclustered columnstore table becomes a read-only table):
1:clustered Columnstore inde– The entire table is organized according to the column storage, directly replaces the traditional heap table or the clustered index, and is free to make the pruning operation.
2: Clustered Columnstore Index Although the structure of the column store is basically the same as the nonclustered Columnstore index, it can be used for pruning operations. The reason is that it has one or more pieces of storage, part of which is called Delta tore.
After the principle is about to be read, the following is a list of SQL Server2014 improvements to Columnstore:
Support for reading and writing data
After breaking the data read-only limit, the range and scenario used by the Columnstore index are greatly increased
Compared to the traditional Ad-hoc operation, in SQL Server2014 it is recommended to use BULK INSERT and partition switching to update large batches of data, more efficient and lower maintenance costs
Support for more data types
Added more data type support: (n) varchar (max), varbinary (max), XML, Spatial, CLR
Basically, the SQL Server2014 Columnstore supports all NON-BLOB data types
The entire table can be established and only one clustered Columnstore index can be established. Traditional row storage requires nonclustered indexes to help improve access efficiency, but column storage is not required. And because there is only one piece of data, storage requires significantly less disk space
Nonclustered column indexes are still supported, and are still read-only structures.
When we have a clustered Columnstore index, we don't need a nonclustered column index, because all of the data is stored in columns at this time. However, if you need to add constraints on the table or if the workload still requires a nonclustered index of b-tree form, then we can only consider using nonclustered Columnstore indexes.
The following improvements are performed on the execution of a statement
0 Vector-based computing methods are modified
0 Support for more syntax
All join methods (including outer, HASH, SEMI (not in, in)
"Mixed Mode" plans
Further improvements to bitmap and spill operations
Improvements to Hash join
In fact, in my practice of SQL Server 2014 columnstore, I also found a few places that were not very friendly.
1:sql Server 2014 clustered Columnstore does not support view functionality, this is still a pit, because the main application of Columnstore is the Data warehouse, there are many views to provide reports or to provide the report Department query permissions, through the view can hide a lot of sensitive information, It is difficult to make a decision without supporting the view, specifically modifying the column storage
2:sql Server 2014 clustered Columnstore does not support AlwaysOn queries from libraries
3:sql Server 2014 12.0.2 has a vulnerability to column storage, AlwaysOn log synchronization is prone to internal lock contention, affecting master-slave synchronization, this feature we may have to upgrade the SP1 patch to resolve, I have not seen this issue since the library upgrade, which is also the SQL A bug in Server 2014 column storage
The above is also considered in the production environment through the pit, because the consideration is not very comprehensive walk a lot of road. I hope we can warning. In addition to the above pits, Columnstore does not support the following features :
The following data types are not available in the Columnstore index:
Binary (n), varbinary (n) (allowed in 2014 and later versions, but excluding varbinary (max)), image, text, ntext, varchar (max), nvarchar (max), Sql_ Variant,xml
Indexes can only be rebuilt by deleting and creating indexes, not by using the ALTER INDEX command
Unable to use Columnstore index in view or indexed view
Columnstore indexes cannot be combined with the following features: distribution, change data capture, change tracking, Filestream
Columnstore index cannot contain more than 1024 columns
The corresponding table cannot contain uniqueness constraints, primary KEY constraints, or FOREIGN KEY constraints
Next we look at a few of the following storage practices:
1: Create a column-stored table
CREATE TABLE maxiangqian( id [int] NOT NULL, age [int] NOT NULL, sex [tinyint] NOT NULL, name varchar(20)); GO CREATE CLUSTERED COLUMNSTORE INDEX cci_Simple ON maxiangqian; GO :
2: Row-clustered index is converted to Columnstore:
CREATE TABLE maxiangqian( id [int] NOT NULL, age [int] NOT NULL, sex [tinyint] NOT NULL, name varchar(20)); GO CREATE CLUSTERED INDEX cl_simple ON maxiangqian (id); GO CREATE CLUSTERED COLUMNSTORE INDEX cl_simple ON maxiangqian WITH (DROP_EXISTING = ON);
Or we can simply delete the clustered index and then
CREATE CLUSTERED COLUMNSTORE INDEX cl_simple ON maxiangqian
The effect is the same.
3 Converting a heap table into a columnstore table:
The first step is to delete the index of the heap performance and then create the clustered Columnstore index:
CREATE TABLE maxiangqian( id [int] NOT NULL, age [int] NOT NULL, sex [tinyint] NOT NULL, name varchar(20)); GO create index pid on maxiagnqian(id) drop index pid on maxiangqian CREATE CLUSTERED COLUMNSTORE INDEX cci_Simple ON maxiangqian; GO :
It basically satisfies some of the features you create for Columnstore, so let's look at how to convert a clustered Columnstore table into a normal table:
CREATE CLUSTERED INDEX pid ON maxiangqian WITH ( DROP EXISTING = ON ); or DROP INDEX cci_Simple ON MyFactTable;
OK, we've basically been able to figure out how to create a Columnstore index.
However, we instruct the column store to delete only when the tag, so that if the column store is frequently updated to delete, the fragmentation will be very large, below we see how to eliminate fragmentation---reconstruction:
CREATE CLUSTERED COLUMNSTORE INDEX cci_Simple ON maxiagnqian WITH ( DROP_EXISTING = ON ); ALTER INDEX cci_Simple ON maxiangqian REBUILD PARTITION = ALL WITH ( DROP_EXISTING = ON );
Both of these methods are achievable.
In fact, for the column storage, sleeper, my side to give me a bigger surprise is the disk space savings, the compression ratio of columnstore can reach 10:1 or even 15:1, and relatively for my data warehouse some large-scale aggregation operation performance improvement. In the case of space saving and high performance, what are the reasons for not using column storage?
SQL Server 2014 Clustered column storage
Start building with 50+ products and up to 12 months usage for Elastic Compute Service