in the absence of a real data warehouse database, now all data warehouses are just a relational database created based on the dimension model, but the Data Warehouse database itself has some differences and OLTP The unique characteristics of the database, such as the most significant is the largest number of data known as the fact of the table ( There are usually millions or billions of data ) The center of the connection, which is surrounded by a number of smaller cardinality tables called Dimensions ( may only have hundreds of rows of data ) and then the center of the large data volume of the fact table is connected to the dimension table by a foreign key to more than 10 or even dozens of small data volumes.
for this unique nature of the Data Warehouse, Microsoft SQL Server has introduced optimized features specifically for Data Warehouse queries:starjoin optimization and few-outer-row optimization. These two optimizations are built into the engine, and what we need to do is follow some notation to get the benefits from both optimizations.
Star JoinOptimized
As we said above, the qualitative design pattern of the Data warehouse is the fact table with large data in the middle, and a lot of dimension tables are scattered around the small data, and they are connected with each other by using the foreign key relationship. Queries for the Data warehouse share the same basic query pattern: Select several measures from the fact table, link to one or more dimension tables using foreign keys, and then set the filter conditions and aggregations on the non-primary key columns of the dimension. We call this pattern a star-shaped pattern,
and Starjoin optimization is a data Warehouse query-specific optimization, it is automatically identified by the exploratory method of a query is a Star model database query and identify the fact table, the query optimizer will then build a hash table for each of the participating dimension tables, Bitmap filters are then built on these hash tables and applied to the scan of the fact table. The constructed filter effectively excludes most of the rows that should be removed by the next join operation, so the data rows that need to be processed by the next operator are greatly reduced.
Then the question is, how does the engine detect the star pattern? It uses the following exploratory approach:
1. in statements with multiple joins , the largest table is considered a fact table
2. The size of a table that is considered a fact table must be greater than a specified minimum value
3. Two tables must be a inner join directly
4. A join between two tables must be based on a single column and is an equality predicate
Few-outer-rowOptimized
the optimization is specific to the absconded loop (Nestedloop join) , in some data warehouse queries, a dimension table that is outside a nested link is typically a parallel scan with filtering. However, if there is very little data filtered, especially if the data is still on an index page,SQL Server 2005 will pick up the dimension data in a single thread, which will cause all subsequent work to be done in a single thread, which in some cases provides greater than 1 degrees of parallelism cause imbalance in the environment.
SQL Server will detect this nested loop join and introduce exchange operators to distribute a small number of rows of external tables to multiple threads.
Data Warehouse-New optimization features for SQL Server 2008