Data Warehouse-New optimization features for SQL Server 2008

Source: Internet
Author: User

in the absence of a real data warehouse database, now all data warehouses are just a relational database created based on the dimension model, but the Data Warehouse database itself has some differences and OLTP The unique characteristics of the database, such as the most significant is the largest number of data known as the fact of the table ( There are usually millions or billions of data ) The center of the connection, which is surrounded by a number of smaller cardinality tables called Dimensions ( may only have hundreds of rows of data ) and then the center of the large data volume of the fact table is connected to the dimension table by a foreign key to more than 10 or even dozens of small data volumes.

for this unique nature of the Data Warehouse, Microsoft SQL Server has introduced optimized features specifically for Data Warehouse queries:starjoin optimization and few-outer-row optimization. These two optimizations are built into the engine, and what we need to do is follow some notation to get the benefits from both optimizations.

Star JoinOptimized

As we said above, the qualitative design pattern of the Data warehouse is the fact table with large data in the middle, and a lot of dimension tables are scattered around the small data, and they are connected with each other by using the foreign key relationship. Queries for the Data warehouse share the same basic query pattern: Select several measures from the fact table, link to one or more dimension tables using foreign keys, and then set the filter conditions and aggregations on the non-primary key columns of the dimension. We call this pattern a star-shaped pattern,

and Starjoin optimization is a data Warehouse query-specific optimization, it is automatically identified by the exploratory method of a query is a Star model database query and identify the fact table, the query optimizer will then build a hash table for each of the participating dimension tables, Bitmap filters are then built on these hash tables and applied to the scan of the fact table. The constructed filter effectively excludes most of the rows that should be removed by the next join operation, so the data rows that need to be processed by the next operator are greatly reduced.

Then the question is, how does the engine detect the star pattern? It uses the following exploratory approach:

1. in statements with multiple joins , the largest table is considered a fact table

2. The size of a table that is considered a fact table must be greater than a specified minimum value

3. Two tables must be a inner join directly

4. A join between two tables must be based on a single column and is an equality predicate

Few-outer-rowOptimized

the optimization is specific to the absconded loop (Nestedloop join) , in some data warehouse queries, a dimension table that is outside a nested link is typically a parallel scan with filtering. However, if there is very little data filtered, especially if the data is still on an index page,SQL Server 2005 will pick up the dimension data in a single thread, which will cause all subsequent work to be done in a single thread, which in some cases provides greater than 1 degrees of parallelism cause imbalance in the environment.

SQL Server will detect this nested loop join and introduce exchange operators to distribute a small number of rows of external tables to multiple threads.


Data Warehouse-New optimization features for SQL Server 2008

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.