Check for empty rows in the Kettle data stream
During ETL processing, output sometimes has to be generated even when no input arrives, and a stream with no rows can cause problems downstream. ETL data flows are therefore often required to emit one row even when the input is empty. A common case is aggregation, where the result should be 0 when there is no input. This document describes how to detect and handle empty data streams.
Example scenario
Assume that the input data represents sales and has three fields: product (the product name), items_sold (the number of items sold), and turnover (the sales amount). The ETL process needs to calculate the total number of items sold and the total turnover. A straightforward process reads multiple rows from the input file and then uses an aggregation step to produce the expected result.
This approach is flawed: when there is no input data, no output row is generated at all. In the example, you can switch between the two input hops to test the result.
Solution 1: Use the Group by step
If you implement the aggregation with the Group by step, enable the "Always give back a result row" option. The step then returns a result row even when it receives no input.
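The difference the option makes can be sketched in plain Python. This is only a model of the behaviour described above, not Kettle code: the function name total_sales, the always_give_row parameter, and the sample rows are hypothetical, and the zero totals on empty input are an assumption of this sketch. What the real step emits for each aggregate type should be checked against the Kettle documentation.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def total_sales(rows: List[Row], always_give_row: bool = False) -> List[Row]:
    """Sum items_sold and turnover over all rows (no grouping fields)."""
    if not rows and not always_give_row:
        # Plain aggregation: no input rows means no output row.
        return []
    # Assumption of this sketch: totals over an empty input are 0.
    return [{
        "items_sold": sum(r["items_sold"] for r in rows),
        "turnover": sum(r["turnover"] for r in rows),
    }]


# Hypothetical sample data using the three fields of the example scenario.
sales = [
    {"product": "apple", "items_sold": 3, "turnover": 4.5},
    {"product": "pear", "items_sold": 2, "turnover": 3.0},
]

print(total_sales(sales))                     # one row with the totals
print(total_sales([]))                        # no row at all
print(total_sales([], always_give_row=True))  # one row even without input
```

With the flag off, the empty input yields an empty result list, which is the flaw described in the example scenario; with the flag on, exactly one row comes back.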
Solution 2: Use the Detect empty stream step
If the scenario is more complex and has more fields, a general way to detect empty data streams is needed. For this we use the "Detect empty stream" step. Connect the input source (which may be empty) to a Dummy step, and copy the rows from the Dummy step into two branches, one of which leads to the "Detect empty stream" step. That step passes no rows on when its input stream contains data. If no rows arrive, it creates a single row in which all field values are empty. This row indicates that there was no data.
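The behaviour described above can be modelled in a few lines of Python. This is a sketch under stated assumptions rather than Kettle code: detect_empty_stream and with_empty_marker are hypothetical names, rows are modelled as dictionaries, and an "empty" field value is modelled as None.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Optional[Any]]

# Field layout of the example scenario.
FIELDS = ["product", "items_sold", "turnover"]


def detect_empty_stream(rows: List[Row], fields: List[str]) -> List[Row]:
    """Emit nothing if rows arrived, else one row with every field empty."""
    if rows:
        return []
    return [{name: None for name in fields}]


def with_empty_marker(rows: List[Row]) -> List[Row]:
    """Copy the input into two branches and merge their outputs."""
    main_branch = list(rows)                           # rows pass through unchanged
    marker_branch = detect_empty_stream(rows, FIELDS)  # fires only on empty input
    return main_branch + marker_branch


data = [{"product": "apple", "items_sold": 3, "turnover": 4.5}]
print(with_empty_marker(data))  # the original row only
print(with_empty_marker([]))    # a single all-empty marker row
```

Because the marker branch produces output only when the main branch is empty, merging the two branches never mixes a marker row with real data.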
In the example, if the input connection delivers no rows, the "Detect empty stream" step generates an empty row. A JavaScript step then sets product = "NONE", items_sold = 0, and turnover = 0.0 on that row, so the expected output is produced even when the input is indeed empty.
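The replacement logic of that last step might look like the following, written here as plain Python rather than as the script of the actual Kettle step. The function name fill_defaults is hypothetical, the field names follow the field list of the example scenario, and treating a row whose fields are all None as the "no data" marker is an assumption of this sketch.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Optional[Any]]


def fill_defaults(row: Row) -> Row:
    """Replace the all-empty marker row with the default output row."""
    if all(value is None for value in row.values()):
        # Marker row: there was no input data, so emit the defaults.
        return {"product": "NONE", "items_sold": 0, "turnover": 0.0}
    # Ordinary data rows are passed through unchanged.
    return row


empty_row = {"product": None, "items_sold": None, "turnover": None}
real_row = {"product": "apple", "items_sold": 3, "turnover": 4.5}

print(fill_defaults(empty_row))
print(fill_defaults(real_row))
```

Checking that every field is empty, rather than only one of them, keeps a real row with a single missing value from being mistaken for the marker.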