We often encounter this problem in the design of the Data Warehouse: If the dimension has only one attribute in the dimension design, is the choice to create a single dimension for this attribute, or will the attribute of that dimension be placed directly in the fact table as part of the fact table?
Suppose there is a dimension, which is usually designed with at least two columns (Dimkey and Dimattribute properties), and the fact table is associated to this dimension through Dimkey. First of all, in the query phase of a multi-table JOIN relationship comparison of single-table query is certainly less efficient, we look at the following example:
CREATEdim_table (Dim_keyINT PRIMARY KEY IDENTITY(1,1), Dim_attrNVARCHAR( -))CREATEfact_table (Dim_keyINT FOREIGN KEY REFERENCESdim_table (Dim_key), MEASUREDECIMAL( -,2))
The query for a typical star structure is as follows:
SELECT d.dim_attr, SUM as Total from as F INNER JOIN as D on = D.dim_key GROUP by D.dim_attr
If you put this attribute directly in the FACT table, the results and queries are as follows:
create table Fact_table_2 (dim_attr int foreign key references Dim_table (Dim_key), MEASURE decimal (18 , 2 Span style= "color: #0000ff;" >select sum (MEASURE) as Total from fact_table_2 group by dim_attr
Our queries and aggregations are much simpler and more efficient in terms of query efficiency. But why do we usually choose to put this individual attribute or the dimension table, here are a few reasons we need to consider:
1. If the fact table is very large, using the Dim_key INT type 4 Bytes with respect to Dim_attr's NVARCHAR (20) type can significantly reduce the volume of the fact table.
2. If this attribute value changes in the source business system, it means that we want to update all the property values associated with the property in the fact table.
3. It is possible today that the dimension does have only one attribute, but who can ensure that the dimension does not add any other related attributes?
Data Warehouse design is an iterative development process, the development of a year, maintenance for several years, if we can consider the above reasons, we can clearly consider whether in the design phase of the need to select a single attribute as a dimension to design.
For more bi articles, see the Bi-series essay list (SSIS, SSRS, SSAS, MDX, SQL Server) If you feel this article has helped you, please help recommend it to make it easy for others to see these articles quickly in the Biwork blog recommendations Bar.
Dimension design of a property of small knowledge of data warehouse design