Implementing data validation and checks in Kettle

Source: Internet
Author: User

In ETL projects, input data is often inconsistent. Kettle provides several steps for data validation and checking. The validation steps (such as Data Validator) check specified fields against rules; the filter steps (such as Filter Rows) route rows based on conditions; and the JavaScript step handles more complex calculations.

 

It is generally useful to make defective data visible in some way. Because most ETL jobs run unattended, the ETL process needs to notify the ETL developer or administrator of these defects. We recommend storing the problematic data rows in a dedicated shared table so that they can be tracked. The table should contain metadata such as the name of the transformation, the validation error, and the error description.
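As a rough illustration of such an error table, the sketch below uses Python's built-in sqlite3 module instead of a real warehouse database. The table name etl_error_log, its column names, and the sample values are assumptions made for this example; the article only states which kinds of metadata the table should hold.

```python
import sqlite3

# Hypothetical table and column names. The article only says the table should
# hold the transformation name, the validation error, and an error description.
DDL = """
CREATE TABLE IF NOT EXISTS etl_error_log (
    id                  INTEGER PRIMARY KEY AUTOINCREMENT,
    transformation_name TEXT NOT NULL,
    error_code          TEXT NOT NULL,
    error_description   TEXT NOT NULL,
    row_data            TEXT NOT NULL,
    logged_at           TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def log_error(conn, transformation_name, error_code, error_description, row_data):
    """Insert one rejected row together with its metadata."""
    conn.execute(
        "INSERT INTO etl_error_log "
        "(transformation_name, error_code, error_description, row_data) "
        "VALUES (?, ?, ?, ?)",
        (transformation_name, error_code, error_description, row_data),
    )
    conn.commit()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(DDL)
log_error(
    conn,
    "validate_visits",                      # assumed transformation name
    "BAD_DATE_FORMAT",                      # assumed error code
    "arrival time is not a valid date",
    "42;Downtown;2010-13-45 09:00;2010-05-01 10:30",
)
```

Keeping the rejected row as a single text column means one table can serve every transformation, whatever its input layout.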

 

A sample file is available for download. The CSV input file records customers arriving at and leaving two fitness sites. The transformation validates the customer ID, the location name, the date format, and the plausibility of the given dates. Correct data is written to an Excel file, and erroneous data is redirected to the error collection steps. Each erroneous row is serialized into a single string field, metadata about the transformation and an error description are attached, and finally the error rows are saved to another Excel file.

In real scenarios, the output step is more likely to be a Table Output step. The "Get System Info" step would collect more data, and it is recommended to package the subsequent error collection steps as a sub-transformation so that they can be reused in other transformations.
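The flow described above (validate each row, pass good rows on, and wrap bad rows with metadata) can be sketched outside Kettle as follows. This is not the sample transformation itself: the column names, the two location names, the date format, the error codes, and the rule that a customer cannot leave before arriving are all assumptions chosen for illustration.

```python
import csv
import io
from datetime import datetime

# Illustrative assumptions, not taken from the sample file:
KNOWN_LOCATIONS = {"Downtown", "Riverside"}
DATE_FORMAT = "%Y-%m-%d %H:%M"
TRANSFORMATION_NAME = "validate_visits"


def parse_date(text):
    """Return a datetime, or None if the text does not match DATE_FORMAT."""
    try:
        return datetime.strptime(text, DATE_FORMAT)
    except ValueError:
        return None


def validate(row):
    """Return (error_code, description) for the first failed check, else None."""
    if not row["customer_id"].isdigit():
        return "BAD_CUSTOMER_ID", "customer_id is not numeric"
    if row["location"] not in KNOWN_LOCATIONS:
        return "BAD_LOCATION", "unknown location name"
    arrived = parse_date(row["arrived"])
    left = parse_date(row["left"])
    if arrived is None or left is None:
        return "BAD_DATE_FORMAT", "date does not match " + DATE_FORMAT
    if left < arrived:
        return "IMPLAUSIBLE_DATE", "customer left before arriving"
    return None


def split_rows(csv_text):
    """Split CSV text into good rows and error records carrying metadata."""
    good, errors = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        problem = validate(row)
        if problem is None:
            good.append(row)
            continue
        code, description = problem
        errors.append({
            "transformation_name": TRANSFORMATION_NAME,
            "error_code": code,
            "error_description": description,
            # The whole row is packed into one string field.
            "row_data": ";".join(str(value) for value in row.values()),
        })
    return good, errors


SAMPLE = """customer_id,location,arrived,left
17,Downtown,2010-05-01 09:00,2010-05-01 10:30
x9,Downtown,2010-05-01 09:00,2010-05-01 10:30
18,Uptown,2010-05-01 09:00,2010-05-01 10:30
19,Riverside,2010-05-01 25:00,2010-05-01 10:30
20,Riverside,2010-05-01 11:00,2010-05-01 10:30
"""

good, errors = split_rows(SAMPLE)
```

Only the first sample row passes every check; each of the other four fails a different one, so every error code appears once.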

 

Storing validation errors in a structured way also makes it possible to monitor data quality in one place. After the ETL process completes, you can simply send the administrator an email that briefly describes the data defects: the error code, the transformation name, the BATCH_ID, and any other metadata you need. If you work on a DWH or BI project, you already have the necessary tools. If you do not want to use the Excel report approach, you can still use a Kettle job to create a short report file and mail it to the administrator.
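A short report of this kind could look roughly like the sketch below, which counts error records per transformation and error code and wraps the result in an email message. The record layout, the BATCH_ID value, and the addresses are assumptions for illustration, and the message is only built, not sent.

```python
from collections import Counter
from email.message import EmailMessage


def build_report(errors, batch_id):
    """Summarize error records into a short plain-text report."""
    counts = Counter((e["transformation_name"], e["error_code"]) for e in errors)
    lines = [
        f"ETL validation report for BATCH_ID {batch_id}",
        f"Rejected rows: {len(errors)}",
    ]
    for (name, code), n in sorted(counts.items()):
        lines.append(f"  {name} / {code}: {n}")
    return "\n".join(lines)


# Hypothetical error records, shaped like rows of a structured error table.
errors = [
    {"transformation_name": "validate_visits", "error_code": "BAD_DATE_FORMAT"},
    {"transformation_name": "validate_visits", "error_code": "BAD_LOCATION"},
    {"transformation_name": "validate_visits", "error_code": "BAD_DATE_FORMAT"},
]

report = build_report(errors, batch_id=1001)

# Build the message; the addresses are placeholders.
message = EmailMessage()
message["Subject"] = "ETL validation report, BATCH_ID 1001"
message["From"] = "etl@example.com"
message["To"] = "admin@example.com"
message.set_content(report)
```

Sending the message would additionally need an SMTP server (for example through smtplib), which is left out here because it depends on the environment.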

In the Kettle graphical interface, data extraction and transformation can be run directly. Why save the work as a .ktr file first and then call it from a Java program?

It can be run either way. Running without the graphical interface saves the resources that the interface consumes, and there are several ways to run a transformation: from the graphical interface, from the command line, from a program, or on a remote machine (cluster execution).
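For the command-line route, Kettle ships the Pan launcher (pan.sh on Linux), which runs a .ktr file without the graphical interface. The sketch below only assembles such a command line; the file path, the parameter name, and the location of pan.sh are assumptions, and the option spelling follows the Linux form of the Pan options (-file, -level, -param).

```python
import shlex


def pan_command(ktr_path, log_level="Basic", params=None, pan="./pan.sh"):
    """Build the argument list for running a .ktr file with Pan."""
    cmd = [pan, f"-file={ktr_path}", f"-level={log_level}"]
    # Named parameters are passed as -param:NAME=VALUE.
    for name, value in sorted((params or {}).items()):
        cmd.append(f"-param:{name}={value}")
    return cmd


# Hypothetical path and parameter, for illustration only.
cmd = pan_command("/opt/etl/validate_visits.ktr", params={"BATCH_ID": "1001"})
print(shlex.join(cmd))

# To actually run the transformation from the Kettle install directory:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

With check=True, a non-zero exit code from Pan raises an exception, which a scheduler or calling program can treat as a failed run.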

After using Kettle to complete a data migration, how can Kettle be used to compare the data in the source database with the migrated data?

Use Kettle's Insert / Update step, which matches rows based on the primary key.
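The idea of matching rows on the primary key can be illustrated outside Kettle with the sketch below. It is not the Insert / Update step, only a minimal comparison of two in-memory row sets; the row layout and key name are assumptions.

```python
def compare_by_key(source_rows, target_rows, key):
    """Compare two row sets on a primary key and report the differences."""
    source = {row[key]: row for row in source_rows}
    target = {row[key]: row for row in target_rows}
    shared = source.keys() & target.keys()
    return {
        # Keys present in the source but never migrated.
        "missing_in_target": sorted(source.keys() - target.keys()),
        # Keys present in the target with no source row.
        "extra_in_target": sorted(target.keys() - source.keys()),
        # Keys on both sides whose other columns differ.
        "changed": sorted(k for k in shared if source[k] != target[k]),
    }


# Hypothetical rows, for illustration only.
source_rows = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Cho"},
]
target_rows = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bobby"},
    {"id": 4, "name": "Dee"},
]

diff = compare_by_key(source_rows, target_rows, key="id")
```

An empty result in all three lists means the migrated data matches the source on every key.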
 
