Original address: Http://social.technet.microsoft.com/wiki/contents/articles/4995.sql-server-columnstore-performance-tuning.aspx
SQL Server's Columnstore index is a new version of SQL Server release that improves query performance for the Data Warehouse, and this article explains the performance tuning of Columnstore.
Basic principles of Columnstore Index performance
With the same hard disk and data volume, columnstore can significantly increase the speed of some queries. The factors that result in a high efficiency of columnstore queries include the following:
1. The Columnstore index stores data in a highly compressed form, and each column is stored on a different page grouping. Because most tables in the data Warehouse contain more than 30 columns, a typical query might require only 5, 6 columns, and only the columns that are displayed in the query need to be read from disk, so using Columnstore can greatly reduce the query I/O for the Data warehouse. Columns that are frequently queried are cached in memory, and clustered B-trees or heaps that contain data backups are often used to build column indexes and perform most queries. During the period of the query, he releases the memory space and resources on a regular basis.
2. Batching is a more efficient, vector-based query method that uses Columnstore indexes. A batch is a collection that contains 1000 rows. Batch processing can reduce the CPU consumption by 7% to 40% compared to the old row-based query method. Efficient vector operations allow batch processing to reduce the CPU cost of basic filtering, expression estimation, execution planning, and federated queries.
3. Block splitting can skip most of the data to speed up data scanning. Each section in the Columnstore index is divided into 1000 rows, each with metadata that stores the maximum and minimum values for each column within the block. The storage engine uses metadata to detect filtering conditions. If the engine detects that there are no compliant data in the block, it skips the block data, and the data is not read and written to the disk.
4. The storage engine can remove some of the data before querying, which can improve the response speed of the query.
SQL Server has refined the pattern of Columnstore indexes and batch queries. A partial query can be executed as a batch query pattern, a standard row query pattern, or a mix of batch and standard row query patterns. The key to getting the best performance is to confirm that most of the data in the query process is in batch mode. Even if a large number of queries cannot be executed in batch mode, using a Columnstore index can also achieve significant performance gains by reducing disk I/O and pre-filtering of the storage engine.
With the show execution plan on the interface, place your mouse over the expensive scan operation to see the prompts. It shows whether the estimated and actual execution modes are row mode or batch mode.
What is the use of Columnstore indexes
1. Apply the Columnstore index only to a table with a large amount of data. The Columnstore index should be created on the actual expense table of the Data warehouse, not a definition table. If you have a definition table that contains at least millions of data, you might also consider applying a columnstore index to this definition table.
2. Use the Columnstore index to include each column of the table. If you do not, querying columns that are not in the Columnstore index will not benefit from the efficiency of the Columnstore index.
3. Structured queries, avoid associating two or more large tables, and increase efficiency by associating multiple small definition tables with a large actual expense table.
when to use Columnstore indexes is not recommended
1. Avoid direct string filtering on columns that store the index and association of tables through string columns. String filtering cannot be swept using a column index, and associating a field with a string is less efficient than using a numeric type association. Filtering on numeric types and time types can be scanned using a Columnstore index, and in a large table in the real world, you might consider using shaped encodings instead of string types, and transfer string types to a small definition table, which is associated by shaping the encoding so that queries are more efficient.
2. Avoid using outer joins (out joins) on columns that store indexes on columns, outer joins cannot apply bulk processing of queries, instead, they are handled by a lower-version row query.
3. Avoid using a non-contained statement (not in) on the column that stores the index for the same reason as 2nd.
4. Avoid associating other tables with the UNION ALL statement on columns that store the index. The reason is ditto.
Due to the limited ability of English, translation of the statement may be ambiguous, if not understood, please read the original text.
SQL Server Column Storage performance tuning (translation)