Introduction: The need to validate DataStage jobs
Today, companies are implementing information-centric projects to transform their businesses and achieve cost savings. Many data integration or information integration applications include ETL (extract, transform, load) processes as one of their components.
Typically, an ETL process (unit of work) is designed to perform the following tasks:
Extraction: Extracts data from the source system and collates it.
Transformation: Converts the data to the format required by the next step. Typically, this involves applying core business logic to turn data into information.
Loading: Typically, loads the data into a database table or data warehouse so that the reporting engine can derive insights from the transformed data.
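The three tasks above can be illustrated with a minimal Python sketch. This is not DataStage code: the sample data, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the target warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from a source system.
SOURCE_CSV = """order_id,amount,currency
1,10.50,usd
2,20.00,eur
3,5.25,usd
"""


def extract(text):
    """Extraction: read rows from the source and collate them into a list."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: convert types and apply a simple, made-up business rule
    (amounts become integer cents, currency codes become upper case)."""
    return [
        (int(r["order_id"]), round(float(r["amount"]) * 100), r["currency"].upper())
        for r in rows
    ]


def load(conn, records):
    """Loading: write the transformed records into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount_cents INTEGER, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


# Run the three steps as one unit of work against an in-memory database.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
rows = conn.execute(
    "SELECT order_id, amount_cents, currency FROM orders ORDER BY order_id"
).fetchall()
```

Keeping each step in its own function makes it possible to test the transformation logic separately from the source and the target, which is the property that job validation relies on.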
Jobs in a data integration application commonly go through two life-cycle events:
Porting or migrating a job from an older version to a newer version of the DataStage software, or of the hardware that runs it.
Promoting a job from the development environment to the test environment and then to the production environment.
Both of these use cases require validating a large number of DataStage jobs. Businesses typically want to verify that jobs running on a new version of the software or in a new hardware environment produce the same results as before, so that they can be confident the new system can replace the old one. Similarly, before you deploy a job in a data integration process to a production environment, you must confirm that it behaves as expected in the development, test, and production environments.
This article provides a step-by-step example of how DataStage users can use the IBM InfoSphere Optim Test Data Management Solution to validate the results of an ETL job.
Using the Optim Test Data Management Solution with DataStage
In the validation process for a DataStage job, the Optim Test Data Management Solution can be used to:
Generate test data
Compare the job output with an expected or baseline output
During the validation process, the DataStage job references the generated test data as the input source. After the DataStage job is executed, a comparison step is performed to verify the final output.
The workflow is shown in Figure 1.
Figure 1. Workflow for validating a DataStage job using Optim TDM
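The generate, run, and compare steps of this workflow can be sketched in plain Python. This is only an illustration under stated assumptions: it does not call Optim or DataStage, and `generate_test_data`, `job_under_test`, and `compare_outputs` are hypothetical stand-ins for test data generation, the ETL job, and the comparison step.

```python
import random


def generate_test_data(n, seed=0):
    """Generate reproducible synthetic input rows (stand-in for test data generation)."""
    rng = random.Random(seed)
    return [
        {"id": i, "qty": rng.randint(1, 9), "price_cents": rng.randint(100, 999)}
        for i in range(1, n + 1)
    ]


def job_under_test(rows):
    """Hypothetical stand-in for the ETL job: compute a total for each row."""
    return [{"id": r["id"], "total_cents": r["qty"] * r["price_cents"]} for r in rows]


def compare_outputs(actual, expected, key="id"):
    """Compare two row sets by key and report missing, unexpected, and differing rows."""
    a = {r[key]: r for r in actual}
    e = {r[key]: r for r in expected}
    return {
        "missing": sorted(e.keys() - a.keys()),
        "unexpected": sorted(a.keys() - e.keys()),
        "different": sorted(k for k in a.keys() & e.keys() if a[k] != e[k]),
    }


def is_valid(report):
    """The job passes validation only if the comparison found no discrepancies."""
    return not any(report.values())


# 1. Generate test data. 2. Run the job. 3. Compare with the baseline output.
test_rows = generate_test_data(5)
baseline = job_under_test(test_rows)  # e.g. output captured from the old environment
actual = job_under_test(test_rows)    # e.g. output produced in the new environment
report = compare_outputs(actual, baseline)

# Simulate a regression: one changed value and one dropped row.
broken = [dict(r) for r in actual]
broken[0]["total_cents"] += 1
broken.pop()
bad_report = compare_outputs(broken, baseline)
```

Comparing by key rather than by row position means the check does not depend on output ordering, which may differ between environments even when the data is the same.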
The following sections walk through an example of generating test data for a DataStage job and then comparing the job's final output with the expected results to validate the job.