Data Warehouse-New optimization features for SQL Server 2008

Source: Internet
Author: User
Tags joins

In the absence of a real data warehouse database, all data warehouses are now just a relational database created based on the dimension model, but the Data Warehouse database itself has some differences, such as the unique features of an OLTP database, such as the most significant of which is known as the fact table ( Generally there are millions or billions of data in the center of the connection, which is surrounded by a lot of smaller cardinality called the dimension of the table (probably only hundreds of rows of data), and then the center of the large data volume of the fact table through the foreign key to connect to more than 10 or even dozens of small data volume of the dimension table.

In response to this unique nature of the Data Warehouse, Microsoft introduced an optimization feature specifically for Data Warehouse queries after SQL Server 2008:Star Join optimization and few-outer-row optimization. These two optimizations are built into the engine, and what we need to do is follow some notation to get the benefits from both optimizations.

Star Join optimization

As we said above, the qualitative design pattern of the Data warehouse is the fact table with large data in the middle, and a lot of dimension tables are scattered around the small data, and they are connected with each other by using the foreign key relationship. Queries for the Data warehouse share the same basic query pattern: Select several measures from the fact table, link to one or more dimension tables using foreign keys, and then set the filter conditions and aggregations on the non-primary key columns of the dimension. We call this pattern a star-shaped pattern,

The star join optimization is a data Warehouse query-specific optimization, which uses the exploratory approach to automatically identify a query for the Star model database and identify the fact table, the query optimizer will then build a hash table for each participating dimension table, Bitmap filters are then built on these hash tables and applied to the scan of the fact table. The constructed filter effectively excludes most of the rows that should be removed by the next join operation, so the data rows that need to be processed by the next operator are greatly reduced.

Then the question is, how does the engine detect the star pattern? It uses the following exploratory approach:

    1. In statements with multiple joins, the largest table is considered a fact table
    2. The size of a table that is considered a fact table must be greater than a specified minimum value
    3. Two tables must be a INNER join directly
    4. Joins between two tables must be based on a single column and are equal predicates
Few-outer-row optimization

The optimization is specific to the absconded loop (nested loop join), in some data warehouse queries, the dimension table outside the nested link is typically a parallel scan with filtering. However, if there is only very little data filtered, especially if the data is still on an index page, SQL Server 2005 will select these dimension data in a single thread, which results in all subsequent work being done in a single thread. This can cause imbalances in some environments that provide more than 1 degrees of parallelism.

SQL Server 2008 detects this nested loop join and introduces Exchange operators to distribute a small number of rows of external tables to multiple threads.

Data Warehouse-New optimization features for SQL Server 2008

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.