--Columnstore Index 1: initial Columnstore Index

Source: Internet
Author: User

2012 later provides an index type that differs from the traditional B-tree structure, which is the memory Columnstore index. This index applies a column-based storage pattern, a new batch mode for query execution, and provides a huge performance boost for specific workloads. How is it built? How does it work? And why is there such a big improvement in performance, then we'll explain it with a concise description and a detailed example.

So what exactly is a columnstore index? Most of the time, Columnstore indexes are described as a function of a data warehouse and a data report. In fact, you are most likely to use such an index in this case. However, even in an OLTP database, you will encounter some reports that you want to get data from a large number of data tables, which are very slow. With proper planning and careful use, even these reports can be used to improve performance with Columnstore indexes. An important premise is that the data is very large and the Columnstore index is used with large data tables. Although there is no clear minimum requirement, as an experience, I recommend that at least 10 million of the row data be in a single table to benefit from the Columnstore index.

For the example in this series, it will be used ContosoRetailDW as a demo database: http://www.microsoft.com/en-us/download/details.aspx?id=18279, which is a 626MB database backup, A database of about 1.2GB size is a bit small for columnstore indexes, but is large enough for presentation. The database itself does not contain any Columnstore indexes, in fact, it is not a bad thing, in order to better reflect the advantages of Columnstore indexes, we will compare the performance of the same query with and without Columnstore indexes. The following example is a typical query from a bi information worker.

With Contosoproductsas (SELECT *    from   dbo. DimProduct    WHERE  brandname                    = ' Contoso ') SELECT     CP. ProductName,           DD. CalendarQuarter,           COUNT (FOS. SalesOrderNumber) as Numorders,           SUM (FOS. salesquantity) as      quantitysoldfrom       dbo. Factonlinesales as         fosinner JOIN dbo. DimDate as                 DD      on   DD. DateKey                   = fos. Datekeyinner JOIN contosoproducts as             CP      on   CP. ProductKey                = fos. Productkeygroup by   CP. ProductName,           DD. Calendarquarterorder by   CP. ProductName,           DD. CalendarQuarter;

Listing 1: A typical bi query

In my notebook, the query took an average of 6.27 seconds to read the data already in the cache, and it would take 8.11 seconds for the data to be read directly from the hard disk. Since FactOnlineSales there are more than 12500000 rows of data in the table, this query must scan the entire clustered index, which is good, but if you face such a query all day long, such a slow response will become a very disgusting thing, Also can think of if the database is 10 times times even a hundredfold size when what kind of performance?

Note that these execution times are based on the use of hardware devices, and if repeated execution of these tests on a high-end device, these queries can be very fast. Of course, if it were on a cheap notebook three years ago, it would be more slowly executed. However, even so, we will see a significant increase in execution efficiency after the Columnstore index is created.

Creating a Columnstore Index

There are two types of Columnstore indexes: clustered and non-clustered. There are many similarities between the two, and there are many differences. One difference is that there is only a nonclustered Columnstore index in 2012. The clustered version was added in 2014. We will create a nonclustered Columnstore index so that the reader can implement it without SQLServer2014.

CREATE nonclustered COLUMNSTORE INDEX nci_factonlinesaleson dbo. Factonlinesales   (Onlinesaleskey,    DateKey,    storekey,    ProductKey,    Promotionkey,    CurrencyKey,    Customerkey,    SalesOrderNumber,    salesorderlinenumber,    salesquantity,    SalesAmount,    returnquantity,    returnamount,    discountquantity,    Discountamount,    TotalCost,    unitcost,    UnitPrice,    etlloadid,    loaddate,    updatedate);

Listing 2: Creating a nonclustered Storage index

Performing this creation will take some time (I have to wait for nearly 43 seconds), but this is a one-off operation that will be done at night in the real Data warehouse. Once the index is created, it increases the efficiency of many queries in SQL Server.

What have we got? Advantages

When we run the code for listing 1 again, the result is the same as before, but the result is almost instantaneous. The entire query took only 0.34 seconds, which was more than 18 times times faster than the Columnstore index was previously added. Of course, even if the Columnstore index slows down, it takes about 1.54 seconds to read from the hard disk, but it's still 5 times times faster than the previous 8.11 seconds.

Disadvantages

This performance gain from nonclustered Columnstore indexes is impressive, but it also needs to be very careful when writing queries. Almost every table query with a Columnstore index can be more efficient, but you have to write your code with a lot of restrictions to get greater performance potential. One such restriction, for example, has to do with external connections.

If the programmer writing the listing 1 code intends to have all the products brandname as "Contoso", even if they have not been sold, are included in the results, then the inner join must be changed to the right Outer join, as follows Listing3 As shown in:

With Contosoproducts
As (SELECT *
FROM dbo. DimProduct
WHERE brandname = ' Contoso ')
SELECT CP. ProductName,
Dd. CalendarQuarter,
COUNT (FOS. SalesOrderNumber) as Numorders,
SUM (FOS. salesquantity) as Quantitysold
FROM dbo. Factonlinesales as Fos
INNER JOIN dbo. DimDate as DD
On DD. DateKey = Fos. DateKey
Right JOIN contosoproducts as CP
On CP. ProductKey = Fos. ProductKey
GROUP by CP. ProductName,
Dd. CalendarQuarter
ORDER by CP. ProductName,
Dd. CalendarQuarter;

Listing 3: Introducing an external link

In the absence of a Columnstore index (or with a hint of impersonation ignoring the Columnstore index), the query ran for 6.71 seconds when the data was already in the cache. Contains the additional cost of the change resulting from the execution plan, which took approximately 0.44 seconds to increase by nearly 7.

When you run the query without prompting in my SQLServer2012, the optimizer immediately selects an execution plan with a Columnstore index, and the result is as fast as expected, approaching 4.24 seconds. Of course, this is still more efficient than the 6.71-second non-columnstore index, but there is no significant change compared to the previous 0.34-second situation, so why is it that the change from inner to outer has been so great when the Columnstore index is applied at the same time?

Batch processing mode

Columnstore indexes are the result of using a pattern called "Batch execution mode" that executes a query in a completely different way, but there are a number of limitations to this pattern in 2012, with only a few operators available to use this pattern, as long as the operators that are no longer in these operators are used. This query is returned to the original query pattern. For example, the outer join is such an operator that will cause the query to return to the row pattern, although some performance gains can be obtained, but not significantly improved from the batch mode.

The quickest way to verify this pattern is to view the query's image in SSMs by executing the plan. Check the two properties "Estimated execution mode" and "Actual execution mode" for an example of query execution plan in batch mode, and two properties are batch.

Figure 1-1: Execution plan appears as batch

Of course, in the 2014 batch processing mode operators increased a lot, of which outer join is one of them, in short, in performance and limitations, 2014 have significantly improved, this is beyond doubt.

Contrast effect.

There is no easy way to predict performance improvements when you create a Columnstore index. The benefits of comparing query performance in real-world environments, or testing comparisons in a test environment as realistic as possible, are now available.

For queries that can run in batch mode, we've seen performance gains of 5 to 70 times times after adding a Columnstore index, compared to row-mode queries, where performance is always smaller and generally 50% to 20 times times higher.

Summarize

You can improve performance by using Columnstore indexes with two factors. One is to save I/O through the new index schema, and the other is batch mode. Unfortunately, only a few operators in SQLServer2012 can use Columnstore indexes, causing many queries to be enforced in row mode, losing the performance of batch mode. The good news, though, is that the vast majority of the restrictions have been perfected in SQL Server 2014.

--Columnstore Index 1: initial Columnstore Index

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.