Implementing data validation and checks in Kettle
In ETL projects, input data is rarely guaranteed to be consistent. Kettle provides several steps for validating or checking data: the validation steps check specified fields against rules or calculations, the filtering steps filter rows, and the JavaScript steps handle more complex calculations.
It is generally useful to have some way of reviewing the defective data. Because most ETL jobs run unattended, the ETL program should notify the ETL developer or administrator of these defects. We recommend storing the problematic rows in a dedicated shared table so that they can be tracked. The table should contain metadata such as the name of the transformation, the validation error, and an error description.
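As a rough illustration of such an error table, here is a minimal Python sketch using the standard sqlite3 module. The table name, column names, and the log_error helper are hypothetical and not taken from the sample transformation; in Kettle itself this would typically be a Table output step writing to your own database.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical table and column names; adapt them to your own conventions.
DDL = """
CREATE TABLE IF NOT EXISTS etl_validation_errors (
    id                  INTEGER PRIMARY KEY AUTOINCREMENT,
    logged_at           TEXT NOT NULL,
    transformation_name TEXT NOT NULL,
    error_code          TEXT NOT NULL,
    error_description   TEXT NOT NULL,
    row_data            TEXT NOT NULL
)
"""


def log_error(conn, transformation_name, error_code, error_description, row_data):
    """Store one rejected row together with its validation metadata."""
    conn.execute(
        "INSERT INTO etl_validation_errors "
        "(logged_at, transformation_name, error_code, error_description, row_data) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),
            transformation_name,
            error_code,
            error_description,
            row_data,
        ),
    )
    conn.commit()


# Usage with an in-memory database and made-up values.
conn = sqlite3.connect(":memory:")
conn.execute(DDL)
log_error(
    conn,
    "load_gym_visits",
    "BAD_DATE",
    "Date does not match the expected format",
    "42;Downtown;2013/13/01",
)
rows = conn.execute(
    "SELECT transformation_name, error_code, row_data FROM etl_validation_errors"
).fetchall()
```

Keeping the original row as a single text column means one table can hold rejects from any transformation, whatever its field layout.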
The sample file is available for download. The CSV input file records customers' arrivals at and departures from two fitness sites. The transformation validates the customer ID, the location name, the date format, and the plausibility of the given date. Correct rows are written to an Excel file, and erroneous rows are redirected to the error collection steps. Each erroneous row is packed into a single string field, some metadata and an error description for the transformation are added, and finally the error rows are saved to another Excel file.
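The flow described above can be sketched outside Kettle in a few lines of Python. This is only an illustration of the logic, not the sample transformation itself: the field names, the list of known locations, the date format, and the plausibility bounds are all assumptions.

```python
from datetime import date, datetime

# All of the following are assumptions made for the sketch.
KNOWN_LOCATIONS = {"Downtown", "Uptown"}   # hypothetical site names
DATE_FORMAT = "%Y-%m-%d"                   # assumed input date format
EARLIEST_DATE = date(2000, 1, 1)           # assumed lower plausibility bound
FIELDS = ("customer_id", "location", "visit_date")


def validate_row(row, today):
    """Return a list of (error_code, description); an empty list means valid."""
    errors = []
    if not row.get("customer_id", "").isdigit():
        errors.append(("BAD_CUSTOMER_ID", "customer_id must consist of digits"))
    if row.get("location") not in KNOWN_LOCATIONS:
        errors.append(("UNKNOWN_LOCATION", "location is not a known site"))
    try:
        visit = datetime.strptime(row.get("visit_date", ""), DATE_FORMAT).date()
    except ValueError:
        errors.append(("BAD_DATE_FORMAT", "visit_date is not a valid date"))
    else:
        # Only check plausibility when the date could be parsed at all.
        if not (EARLIEST_DATE <= visit <= today):
            errors.append(("IMPLAUSIBLE_DATE", "visit_date is outside the expected range"))
    return errors


def split_rows(rows, today):
    """Route valid rows to `good` and one error record per defect to `bad`."""
    good, bad = [], []
    for row in rows:
        errors = validate_row(row, today)
        if not errors:
            good.append(row)
            continue
        # Pack the whole row into a single string field, as the text describes.
        packed = ";".join(str(row.get(name, "")) for name in FIELDS)
        for code, description in errors:
            bad.append(
                {"error_code": code, "error_description": description, "row_data": packed}
            )
    return good, bad


rows = [
    {"customer_id": "17", "location": "Downtown", "visit_date": "2013-05-02"},
    {"customer_id": "x9", "location": "Downtown", "visit_date": "2013-05-02"},
    {"customer_id": "18", "location": "Midtown", "visit_date": "2013-05-02"},
    {"customer_id": "19", "location": "Uptown", "visit_date": "2013-02-30"},
    {"customer_id": "20", "location": "Uptown", "visit_date": "2099-01-01"},
]
good, bad = split_rows(rows, today=date(2013, 6, 1))
```

Here `good` holds the first row only, and `bad` holds one record for each of the other four rows, each carrying its error code, a description, and the packed original row.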
In real-world scenarios, the output step is more likely to be a Table output step. The "Get system info" step can collect additional metadata, and it is advisable to put the subsequent error collection steps into a sub-transformation so that they can be reused in other transformations.
Saving validation errors in a structured manner also makes it easy to monitor data quality in one place. After the ETL process has completed, you can simply send the administrator an email that briefly describes the data defects, including the error code, the transformation name, the batch_id, and any other metadata you need. If you work on DWH or BI, you already have the necessary tools. If you do not want to use the Excel report method, you can still use a Kettle job to create a short report file and mail it to the administrator.
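To give an idea of what such a short report might contain, the following Python sketch summarizes collected error records per error code. The build_report function, the record layout, and the sample values are hypothetical; in Kettle the file would be produced by a transformation and sent by the job's mail entry, which is deliberately left out here.

```python
from collections import Counter


def build_report(transformation_name, batch_id, errors):
    """Build a short plain-text summary of validation errors for one batch."""
    counts = Counter(error["error_code"] for error in errors)
    lines = [
        f"Transformation: {transformation_name}",
        f"Batch: {batch_id}",
        f"Validation errors: {len(errors)}",
    ]
    # One line per error code, sorted so the report is stable between runs.
    for code, count in sorted(counts.items()):
        lines.append(f"  {code}: {count}")
    return "\n".join(lines)


# Made-up error records for illustration.
errors = [
    {"error_code": "BAD_DATE_FORMAT", "row_data": "19;Uptown;2013-02-30"},
    {"error_code": "BAD_CUSTOMER_ID", "row_data": "x9;Downtown;2013-05-02"},
    {"error_code": "BAD_DATE_FORMAT", "row_data": "21;Uptown;02.05.2013"},
]
report = build_report("load_gym_visits", 1042, errors)
print(report)
```

The resulting text can be written to a file or used directly as the body of the notification mail.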