SQL Server Columnstore Performance Tuning

Fundamentals of Columnstore index-based performance

Columnstore indexes can speed up some queries by a factor of 10X to 100X on the same hardware, depending on the query and the data. These key things make columnstore-based query processing so fast:

  • The columnstore index itself stores data in a highly compressed format, with each column kept in a separate group of pages. This greatly reduces I/O for most data warehouse queries, because many data warehouse fact tables contain far more columns than the 5 or 6 that a typical query touches. Only the columns touched by the query must be read from disk, and only the more frequently accessed columns have to take up space in main memory. The clustered B-tree or heap containing the primary copy of the data is normally used only to build the columnstore, and will typically not be accessed for the large majority of query processing. It will be paged out of memory and won't take up main memory resources during normal periods of query processing.
  • There is a highly efficient, vector-based query execution method called "batch processing" that works with the columnstore index. A "batch" is an object that contains a group of rows that are processed together. Each column within the batch is represented internally as a vector. Batch processing can reduce CPU consumption 7X to 40X compared to the older, row-based query execution methods. Efficient vector-based algorithms allow this by dramatically reducing the CPU overhead of basic filter, expression evaluation, projection, and join operations.
  • Segment elimination can skip large chunks of data to speed up scans. Each partition in a columnstore index is broken into one-million-row chunks called segments. Each segment has metadata that stores the minimum and maximum value of each column for the segment. The storage engine checks filter conditions against the metadata. If it can detect that no rows will qualify, it skips the entire segment without even reading it from disk.
  • The storage engine pushes filters down into the scans of data. This eliminates data early during query execution, improving query response time.
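
The segment elimination and filter pushdown described above can be pictured with a small sketch. This is an illustration only, not how SQL Server is implemented: the Segment class, the build_segments and scan_range functions, and the four-row segment size are all assumptions made for the example (the text describes segments of one million rows).

```python
from dataclasses import dataclass
from typing import List, Tuple

SEGMENT_ROWS = 4  # hypothetical; tiny so the example is easy to follow


@dataclass
class Segment:
    values: List[int]  # one column's values for this chunk of rows
    min_value: int     # metadata stored alongside the segment
    max_value: int


def build_segments(column: List[int], segment_rows: int = SEGMENT_ROWS) -> List[Segment]:
    """Split one column into fixed-size segments and record min/max for each."""
    segments = []
    for start in range(0, len(column), segment_rows):
        chunk = column[start:start + segment_rows]
        segments.append(Segment(chunk, min(chunk), max(chunk)))
    return segments


def scan_range(segments: List[Segment], low: int, high: int) -> Tuple[List[int], int]:
    """Return (matching values, segments actually read) for low <= v <= high."""
    matches: List[int] = []
    segments_read = 0
    for seg in segments:
        # Segment elimination: the metadata proves no row can qualify, so skip.
        if seg.max_value < low or seg.min_value > high:
            continue
        segments_read += 1
        # Filter pushed into the scan: rows are discarded as they are read.
        matches.extend(v for v in seg.values if low <= v <= high)
    return matches, segments_read


date_keys = [20120101, 20120102, 20120103, 20120104,
             20120201, 20120202, 20120203, 20120204,
             20120301, 20120302, 20120303, 20120304]
segments = build_segments(date_keys)
rows, read = scan_range(segments, 20120201, 20120203)
print(rows, read)  # [20120201, 20120202, 20120203] 1
```

Only one of the three segments is read here. Elimination works this well only because the column values are roughly ordered, so the min/max ranges of different segments barely overlap.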

The columnstore index and batch query execution mode are deeply integrated into SQL Server. A particular query can be processed entirely in batch mode, entirely in the standard row mode, or with a combination of batch and row-based processing. The key to getting the best performance is to make sure your queries process the large majority of data in batch mode. Even if the bulk of your query can't be executed in batch mode, you can still get significant performance benefits from columnstore indexes through reduced I/O and through pushing down of predicates to the storage engine.
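
To make the difference between row mode and batch mode concrete, here is a rough sketch of the two processing shapes. It assumes a hypothetical batch size of 4 and hypothetical column names (qty, price), and it only shows how work is organized. Python will not reproduce the CPU savings that a native vectorized engine gets.

```python
from typing import Dict, List

BATCH_SIZE = 4  # hypothetical; chosen small so the example is easy to follow


def row_mode_sum(rows: List[Dict[str, int]], min_qty: int) -> int:
    """Row-at-a-time: each row passes through filter and aggregate individually."""
    total = 0
    for row in rows:
        if row["qty"] >= min_qty:
            total += row["qty"] * row["price"]
    return total


def batch_mode_sum(columns: Dict[str, List[int]], min_qty: int) -> int:
    """Batch-at-a-time: each step handles a whole vector of column values."""
    total = 0
    row_count = len(columns["qty"])
    for start in range(0, row_count, BATCH_SIZE):
        # Each column of the batch is a vector of values.
        qty = columns["qty"][start:start + BATCH_SIZE]
        price = columns["price"][start:start + BATCH_SIZE]
        # Filter step: build one selection vector for the whole batch.
        selected = [i for i, q in enumerate(qty) if q >= min_qty]
        # Expression and aggregate steps: touch only the selected positions.
        total += sum(qty[i] * price[i] for i in selected)
    return total


columns = {"qty": [1, 5, 3, 8, 2, 9], "price": [10, 20, 30, 40, 50, 60]}
rows = [{"qty": q, "price": p} for q, p in zip(columns["qty"], columns["price"])]

print(row_mode_sum(rows, 3), batch_mode_sum(columns, 3))  # 1050 1050
```

Both functions return the same answer. The point is that in the batch version the per-row overhead is paid once per batch instead of once per row.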

To tell if the main part of your query is running in batch mode, look at the graphical showplan, hover the mouse pointer over the most expensive scan operator (usually a scan of a large fact table), and check the tooltip. It will say whether the estimated and actual execution mode was Row or Batch. See here for an example.

DOs and DON'Ts for using columnstores effectively

Obeying the following dos and don'ts will help you get the most out of columnstores for your decision support workload.

DOs

  • Put columnstore indexes on large tables only. Typically, you will put them on the fact tables in your data warehouse, and not on the dimension tables. If you have a large dimension table containing more than a few million rows, then you may want to put a columnstore index on it as well.
  • Include every column of the table in the columnstore index. If you don't, then a query that references a column not included in the index will not benefit from the columnstore index much, or at all.
  • Structure your queries as star joins with grouping and aggregation as much as possible. Avoid joining pairs of large tables. Join a single large fact table to one or more smaller dimensions using standard inner joins. Use a dimensional modeling approach for your data as much as possible to allow you to structure your queries this way.
  • Use best practices for statistics management and query design. This is independent of columnstore technology. Use good statistics and avoid query design pitfalls to get the best performance. See the white paper on SQL Server statistics for guidance. In particular, see the section "Best Practices for Managing Statistics."
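
As a sketch of the star join shape recommended above, the following builds a tiny fact table and one dimension and runs a grouped, aggregated inner join. SQLite (through Python's sqlite3 module) is used purely as a stand-in so the example can run anywhere. It has no columnstore indexes or batch mode, and the table and column names (fact_sales, dim_product, and so on) are hypothetical.

```python
import sqlite3

# SQLite is only a stand-in to show the query shape; it has no columnstore
# indexes or batch mode. All table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales  (date_key INTEGER, product_key INTEGER, amount INTEGER);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "Bikes"), (2, "Helmets")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(20120101, 1, 100), (20120101, 2, 20), (20120102, 1, 150)])

# Star join: one large fact table, inner-joined to a small dimension on an
# integer key, followed by grouping and aggregation.
star_join = """
    SELECT p.category, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    INNER JOIN dim_product AS p ON f.product_key = p.product_key
    GROUP BY p.category
    ORDER BY p.category
"""
totals = conn.execute(star_join).fetchall()
print(totals)  # [('Bikes', 250), ('Helmets', 20)]
```

The fact table carries only integer keys and measures, while the descriptive string lives in the dimension. That is the layout the guidance above is steering toward.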

DON'Ts

(Note: we are already working to improve the implementation to eliminate the limitations associated with these "don'ts", and we anticipate fixing them sometime after the SQL Server release. We're not ready to announce a timetable yet.) Later, we'll describe how to work around the limitations.

  • Avoid string joins and string filters directly on columns of columnstore-indexed tables. String filters don't get pushed down to scans on columnstore indexes, and join processing on strings is less efficient than on integers. Filters on number and date types are pushed down. Consider using integer codes (or surrogate keys) instead of strings in columnstore-indexed fact tables. You can move the string values to a dimension table. Joins on the integer columns will normally be processed very efficiently.
  • Avoid use of OUTER JOIN on columnstore-indexed tables. Outer joins don't benefit from batch processing. Instead, SQL Server reverts to row-at-a-time processing.
  • Avoid use of NOT IN on columnstore-indexed tables. NOT IN (<subquery>) (which internally uses an operator called "anti-semi-join") can prevent batch processing and cause the system to revert to row mode. NOT IN (<list of constants>) typically works fine, though.
  • Avoid use of UNION ALL to directly combine columnstore-indexed tables with other tables. Batch processing doesn't get pushed down through UNION ALL. So, for example, creating a view vfact that does a UNION ALL of two tables, one with a columnstore index and one without, and then querying vfact in a star join query, won't use batch processing.
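
The advice in the first item, replacing strings in the fact table with integer codes and moving the strings to a dimension table, can be sketched as follows. The encode_strings function and the region data are hypothetical, and the sketch shows the idea only. In a real warehouse the surrogate keys would be assigned during data loading.

```python
from typing import Dict, List, Tuple


def encode_strings(values: List[str]) -> Tuple[List[int], Dict[int, str]]:
    """Replace each string with an integer code; return (codes, dimension)."""
    code_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(code_of) + 1  # surrogate keys start at 1
        codes.append(code_of[value])
    dimension = {code: value for value, code in code_of.items()}
    return codes, dimension


region_names = ["West", "East", "West", "North", "East"]
region_keys, dim_region = encode_strings(region_names)

# The string filter runs against the small dimension; the large fact column
# is then filtered on integers only.
wanted = {code for code, name in dim_region.items() if name == "West"}
matching_rows = [i for i, key in enumerate(region_keys) if key in wanted]

print(region_keys)    # [1, 2, 1, 3, 2]
print(matching_rows)  # [0, 2]
```

After this change the fact table holds only the integer region_keys column, so a filter on a region name becomes an integer comparison, which is the kind of filter the text says is pushed down.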

Original URL: SQL Server Columnstore Performance Tuning
