Use the SSIS Slowly Changing Dimension Transformation component to manage slowly changing dimensions

Source: Internet
Author: User
Tags ssis

Recently, I tried the Slowly Changing Dimension Transformation component that ships with SSIS to process a slowly changing dimension. I found a detailed article on the subject and translated it while following its steps. The original article is: Managing Slowly Changing Dimension with Slow Changing Transformation in SSIS.

 

Introduction

As a database professional or ETL developer, you will occasionally need to maintain and manage slowly changing dimensions. There are several ways to implement them in SQL Server. The simplest is to use the Slowly Changing Dimension Transformation, a data flow component in SSIS.

In this article, I will use an example to walk through the steps for managing a slowly changing dimension with the SSIS Slowly Changing Dimension Transformation.

 

Understanding slowly changing dimension scenarios

 

Dimension is a term in data management and data warehousing. It refers to logical groupings of data such as geographical location, customer, or product information. Slowly changing dimensions (SCDs) are dimensions whose data changes slowly, rather than on a regular, time-based schedule. ~ Wikipedia

There are several types of slowly changing dimension:

  • SCD Type 0 (Fixed): This type is the least commonly used. The value is fixed after the first insert and does not accept changes, so data is never overwritten once written.
  • SCD Type 1 (Changing): When the data changes, the old value is overwritten by the new value.

Consider the following supplier data:

 

SupplierCode  SupplierName     Address
S0000001      ABC Company      USA
S0000002      XYZ Corporation  USA

 

If the supplier name changes over time, the new value simply overwrites the old one, as you can see below. This is very simple to implement, but the history cannot be traced.

 

SupplierCode  SupplierName      Address
S0000001      ABC Company Ltd.  USA
S0000002      XYZ Corporation   USA
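The Type 1 overwrite above can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database driven from Python's sqlite3 module rather than SQL Server; the helper name apply_type1 is made up for this example.

```python
import sqlite3


def apply_type1(conn, supplier_code, new_name):
    """Overwrite the supplier name in place (SCD Type 1); no history is kept."""
    conn.execute(
        "UPDATE Supplier SET SupplierName = ? WHERE SupplierCode = ?",
        (new_name, supplier_code),
    )


# Assumption: SQLite stands in for SQL Server purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Supplier ("
    "SupplierCode TEXT PRIMARY KEY, SupplierName TEXT, Address TEXT)"
)
conn.executemany(
    "INSERT INTO Supplier VALUES (?, ?, ?)",
    [
        ("S0000001", "ABC Company", "USA"),
        ("S0000002", "XYZ Corporation", "USA"),
    ],
)

apply_type1(conn, "S0000001", "ABC Company Ltd.")

rows = conn.execute("SELECT * FROM Supplier ORDER BY SupplierCode").fetchall()
for row in rows:
    print(row)
```

After the update, the table still holds one row per supplier, and nothing in it records that S0000001 ever had a different name.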

 

  • SCD Type 2 (Historical): When the data changes, a new record is inserted and the old record is marked as expired.

 

SupplierCode  SupplierName      Address  EffectiveDate  ExpirationDate
S0000001      ABC Company       USA      3/2/2013       3/2/2013
S0000002      XYZ Corporation   USA      3/2/2013
S0000001      ABC Company Ltd.  USA      3/3/2013
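The expire-and-insert pattern behind Type 2 can be sketched as follows. This is a minimal illustration under several assumptions: it uses an in-memory SQLite database through Python's sqlite3 module instead of SQL Server, it reads the dates in the table as month/day/year and stores them as ISO strings, and it stamps the change date as the expiration date of the old row. The helper name apply_type2 is hypothetical.

```python
import sqlite3


def apply_type2(conn, supplier_code, new_name, change_date):
    """Expire the active row for supplier_code, then insert a new active row."""
    current = conn.execute(
        "SELECT SupplierName, Address FROM DimSupplier "
        "WHERE SupplierCode = ? AND ExpirationDate IS NULL",
        (supplier_code,),
    ).fetchone()
    if current is None or current[0] == new_name:
        return  # unknown supplier or nothing changed: leave the table alone
    # Step 1: close the old version by stamping its expiration date.
    conn.execute(
        "UPDATE DimSupplier SET ExpirationDate = ? "
        "WHERE SupplierCode = ? AND ExpirationDate IS NULL",
        (change_date, supplier_code),
    )
    # Step 2: insert the new version; a NULL expiration date marks it active.
    conn.execute(
        "INSERT INTO DimSupplier "
        "(SupplierCode, SupplierName, Address, EffectiveDate, ExpirationDate) "
        "VALUES (?, ?, ?, ?, NULL)",
        (supplier_code, new_name, current[1], change_date),
    )


# Assumption: SQLite stands in for SQL Server purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimSupplier ("
    "SupplierId INTEGER PRIMARY KEY AUTOINCREMENT, "
    "SupplierCode TEXT, SupplierName TEXT, Address TEXT, "
    "EffectiveDate TEXT, ExpirationDate TEXT)"
)
conn.executemany(
    "INSERT INTO DimSupplier "
    "(SupplierCode, SupplierName, Address, EffectiveDate) VALUES (?, ?, ?, ?)",
    [
        ("S0000001", "ABC Company", "USA", "2013-03-02"),
        ("S0000002", "XYZ Corporation", "USA", "2013-03-02"),
    ],
)

apply_type2(conn, "S0000001", "ABC Company Ltd.", "2013-03-03")

# Full history, oldest version first.
history = conn.execute(
    "SELECT SupplierId, SupplierCode, SupplierName, EffectiveDate, ExpirationDate "
    "FROM DimSupplier ORDER BY SupplierId"
).fetchall()

# Only the currently active version of each supplier.
active = conn.execute(
    "SELECT SupplierCode, SupplierName FROM DimSupplier "
    "WHERE ExpirationDate IS NULL ORDER BY SupplierCode"
).fetchall()
```

Here history contains three rows, two of them for S0000001, while active contains one row per supplier. Filtering on a NULL expiration date is what lets queries pick the current version.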

 

Different people maintain SCD Type 2 in different ways. One method is to add an effective date and an expiration date to each record; a record whose expiration date is NULL is the active one. Another method is to add a flag column that marks the currently active record. People usually use the first method or a combination of the two.

  • SCD Type 3 (Limited history): This type is not common because only a limited number of changes can be kept. It adds extra columns to the table to hold the old values.

 

SupplierCode  SupplierName     Address  OldSupplierName
S0000001      ABC Company      USA      ABC Company Ltd.
S0000002      XYZ Corporation  USA

 

There are multiple ways to implement slowly changing dimensions in SQL Server. The simplest is to use the Slowly Changing Dimension Transformation, a data flow component in SSIS. It does have some limitations, which are covered at the end of the article.

Before I explain the Slowly Changing Dimension Transformation component, let me explain the surrogate key and why it matters in a data warehouse. We often add a meaningless key, called a surrogate key, to a dimension. The surrogate key is usually an integer that serves only as the unique or primary key of the dimension table and as the target of foreign key constraints from fact tables. It is especially important when managing slowly changing dimensions, because under SCD Type 2 the same business key can appear in several rows.
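To see why the meaningless integer key described above matters, here is a small sketch. It assumes an in-memory SQLite database through Python's sqlite3 module, and the FactPurchase table, its columns, and its amounts are invented purely for this illustration.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; FactPurchase is a made-up fact table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimSupplier ("
    "SupplierId INTEGER PRIMARY KEY, SupplierCode TEXT, SupplierName TEXT)"
)
conn.execute(
    "CREATE TABLE FactPurchase ("
    "PurchaseId INTEGER PRIMARY KEY, "
    "SupplierId INTEGER REFERENCES DimSupplier (SupplierId), "
    "Amount REAL)"
)

# Two versions of S0000001 exist, so SupplierCode alone cannot be the key.
conn.executemany(
    "INSERT INTO DimSupplier VALUES (?, ?, ?)",
    [
        (1, "S0000001", "ABC Company"),
        (2, "S0000002", "XYZ Corporation"),
        (3, "S0000001", "ABC Company Ltd."),
    ],
)

# Each fact row points at the dimension version that was current when it was loaded.
conn.executemany(
    "INSERT INTO FactPurchase VALUES (?, ?, ?)",
    [(1, 1, 100.0), (2, 3, 250.0)],
)

versions = conn.execute(
    "SELECT COUNT(*) FROM DimSupplier WHERE SupplierCode = 'S0000001'"
).fetchone()[0]

report = conn.execute(
    "SELECT f.PurchaseId, d.SupplierName, f.Amount "
    "FROM FactPurchase f JOIN DimSupplier d ON d.SupplierId = f.SupplierId "
    "ORDER BY f.PurchaseId"
).fetchall()
```

Because the fact rows reference SupplierId rather than SupplierCode, each purchase resolves to exactly one version of the supplier, even though the business key S0000001 appears twice in the dimension.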

 

Using the Slowly Changing Dimension Transformation

 

First, create a supplier table and add some data. As you can see, I added the SupplierCode field as the primary key and used it as the business key.

 

USE [AdventureWorks2012]
GO

-- Source table; SupplierCode is the primary key and serves as the business key.
CREATE TABLE [dbo].[Supplier] (
    [SupplierCode] CHAR(8) PRIMARY KEY,
    [SupplierName] [varchar](50) NULL,
    [Address] [varchar](50) NULL
) ON [PRIMARY]
GO

-- Two sample suppliers.
INSERT INTO [dbo].[Supplier] ([SupplierCode], [SupplierName], [Address])
VALUES
    ('S0000001', 'ABC Company', 'USA'),
    ('S0000002', 'XYZ Corporation', 'USA')
GO

SELECT * FROM [dbo].[Supplier]

 

Now we create a dimension table to store the supplier information. Note that I have added the SupplierId column as the surrogate key. The EffectiveDate and ExpirationDate columns are used to track historical changes. In addition, I added the CurrentFlag column to indicate whether a record is the currently active one.

 

USE [AdventureWorks2012]
GO

CREATE TABLE [dbo].[DimSupplier] (
    [SupplierId] [int] IDENTITY(1, 1) NOT NULL,  -- surrogate key
    [SupplierCode] CHAR(8),                      -- business key
    [SupplierName] [varchar](50) NULL,
    [Address] [varchar](50) NULL,
    [EffectiveDate] [date] NULL,
    [ExpirationDate] [date] NULL,
    [CurrentFlag] [char](1) NULL,
    CONSTRAINT [PK_DimSupplier] PRIMARY KEY CLUSTERED ([SupplierId] ASC)
) ON [PRIMARY]
GO

 

So far, so good. Now create an SSIS package, add a Data Flow task, and drag in a source component to read data from the source table. Then add a Slowly Changing Dimension Transformation component and connect it to the source component. Double-click the Slowly Changing Dimension Transformation to configure it. The wizard opens as follows:

 


Slowly Changing Dimension Wizard

 

Click Next to go to the next page. On this page, select the target dimension table and map the fields. Then specify which column from the source table is the business key. In my example, SupplierCode is the primary key of the source table, so I use it as the business key, as shown below:

 


Business key

 

Click the Next button to go to the next page of the wizard. On this page, you specify how each column of the dimension table is handled: as a fixed attribute (SCD Type 0), a changing attribute (SCD Type 1), or a historical attribute (SCD Type 2).

 


Specify each column of the dimension

 

In my example, I set the Address column to be handled as SCD Type 1 and the SupplierName column as SCD Type 2, as shown below:

 


SCD Type 1 and SCD Type 2

 

Click Next to go to the next page of the wizard. Because one column is handled as SCD Type 2, this page asks you to specify the start date column (EffectiveDate) and the end date column (ExpirationDate), and to choose the variable used to set the date values:


Start and End Dates

 

Click Next to go to the next page of the wizard. On this page, specify the inferred dimension member settings.

 


Inferred Dimension Members

 

Click Next to go to the last page of the wizard, then click Finish to complete the configuration. The data flow task now looks like this:

 


Complete the Wizard

 

Based on your selections and configuration, the Slowly Changing Dimension Wizard adds several components to the data flow to manage the slowly changing dimension. In the screenshot above, the "Changing Attribute Updates Output" path updates existing records in place for columns handled as SCD Type 1. The "New Output" path inserts new rows into the dimension table. The "Historical Attribute Inserts Output" path sets the expiration date on the previous version of a record and then inserts the new version, which maintains the history for columns handled as SCD Type 2.
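The way source rows are split across these paths can be sketched roughly as below. This is a simplified illustration, not the component's actual implementation: the function route_row and its arguments are hypothetical, fixed attributes and inferred members are ignored, and the sketch assumes that a change to a historical column takes precedence when several columns change at once.

```python
def route_row(incoming, current, change_types):
    """Return the name of the output path a source row takes in this sketch.

    incoming:     dict of column values from the source row
    current:      dict for the active dimension row with the same business
                  key, or None if the business key is not in the dimension
    change_types: dict mapping column name to "changing" (SCD Type 1)
                  or "historical" (SCD Type 2)
    """
    if current is None:
        return "New Output"
    changed = [col for col in change_types if incoming[col] != current[col]]
    # Assumption: a historical change wins over a changing-attribute change.
    if any(change_types[col] == "historical" for col in changed):
        return "Historical Attribute Inserts Output"
    if changed:
        return "Changing Attribute Updates Output"
    return "Unchanged Output"


# Column configuration used in this article's example.
change_types = {"SupplierName": "historical", "Address": "changing"}
current = {"SupplierName": "ABC Company", "Address": "USA"}

renamed = {"SupplierName": "ABC Company Ltd.", "Address": "USA"}
print(route_row(renamed, current, change_types))
```

With the configuration used in this article, a renamed supplier is routed to the historical path, while a supplier whose business key is not yet in the dimension goes to the new-row path.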

 

When you run the package for the first time, you will notice that the two records in the source table are loaded into the dimension table:

 


The Dimension Table

 

Run the following statement to verify the data in the Supplier dimension table:

 

USE [AdventureWorks2012]
GO

SELECT * FROM [dbo].[DimSupplier]
GO

 

This is the result of executing the script above, and it matches what we expected:

 


Results of executed query

 

Now go back to the source table and use the following script to update a record. I will change the supplier name for SupplierCode = 'S0000001'.

 

USE [AdventureWorks2012]
GO

-- Rename one supplier in the source table.
UPDATE [dbo].[Supplier]
SET [SupplierName] = 'ABC Company Ltd.'
WHERE [SupplierCode] = 'S0000001'
GO

SELECT * FROM [dbo].[Supplier]
GO

 

Now run the package again. You will see that one record (the new one) has been inserted and one record (the old one) has been updated, that is, marked as expired. This is because the changed column is configured as SCD Type 2:

 


One record inserted and one record outdated

 

Run the earlier query again to verify the data. As predicted, SupplierCode = 'S0000001' now has two records. The earlier record has its expiration date set, and the latest record carries the new supplier name:

 

USE [AdventureWorks2012]
GO

SELECT * FROM [dbo].[DimSupplier]
GO

 


Query results

 

Restrictions:

The Slowly Changing Dimension Transformation is designed to be easy to use and is aimed mainly at small dimension tables. As we saw above, it is an out-of-the-box SSIS component that can be configured quickly. However, it is not suitable for every situation, especially when your dimensions are large. The following are some of the reasons:

  • The Slowly Changing Dimension Transformation adds components to the data flow task according to your configuration. If you customize those components and then run the wizard again, your customizations are lost.
  • Performance is poor for large dimensions because no data is cached.
  • It works only with SQL Server.
  • It uses the OLE DB Command transformation, which updates rows one at a time instead of in batches.
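The last limitation is easier to see side by side. The sketch below contrasts one UPDATE statement per row with a single set-based UPDATE driven by a staging table. It is only an illustration: it runs against an in-memory SQLite database through Python's sqlite3 module rather than SQL Server, and the Staging table, the extra supplier S0000003, and the new address values are invented for the example.

```python
import sqlite3


def make_db():
    """Build a small dimension plus a hypothetical staging table of changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DimSupplier (SupplierCode TEXT PRIMARY KEY, Address TEXT)"
    )
    conn.execute(
        "CREATE TABLE Staging (SupplierCode TEXT PRIMARY KEY, Address TEXT)"
    )
    conn.executemany(
        "INSERT INTO DimSupplier VALUES (?, ?)",
        [("S0000001", "USA"), ("S0000002", "USA"), ("S0000003", "USA")],
    )
    conn.executemany(
        "INSERT INTO Staging VALUES (?, ?)",
        [("S0000001", "Canada"), ("S0000003", "Mexico")],
    )
    return conn


def update_row_by_row(conn):
    """Issue one UPDATE per changed row; returns the number of statements."""
    statements = 0
    staged = conn.execute("SELECT SupplierCode, Address FROM Staging").fetchall()
    for code, address in staged:
        conn.execute(
            "UPDATE DimSupplier SET Address = ? WHERE SupplierCode = ?",
            (address, code),
        )
        statements += 1
    return statements


def update_set_based(conn):
    """Apply every staged change with a single UPDATE statement."""
    conn.execute(
        "UPDATE DimSupplier SET Address = ("
        "SELECT s.Address FROM Staging s "
        "WHERE s.SupplierCode = DimSupplier.SupplierCode) "
        "WHERE SupplierCode IN (SELECT SupplierCode FROM Staging)"
    )
    return 1


def addresses(conn):
    return conn.execute(
        "SELECT SupplierCode, Address FROM DimSupplier ORDER BY SupplierCode"
    ).fetchall()


row_conn = make_db()
row_statements = update_row_by_row(row_conn)

set_conn = make_db()
set_statements = update_set_based(set_conn)
```

Both approaches leave the dimension in the same state, but the row-by-row version issues one statement per changed row, so its cost grows with the number of changes, while the set-based version always issues one.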

Conclusion

In this article, I discussed the Slowly Changing Dimension Transformation. This out-of-the-box SSIS component lets you configure and manage small slowly changing dimensions quickly and easily. In the next article, I will discuss some other options for managing larger slowly changing dimensions.

 



