Check for empty data streams in Kettle

Source: Internet
Author: User

During ETL processing, a step sometimes has to produce output even though no input data arrives, and this can cause problems. For this reason, an ETL data stream is often required to generate a single empty row when there is no data. Similarly, when aggregation is involved, the expected result for an empty input is usually 0 rather than no row at all. This article describes how to detect and handle empty data streams in Kettle.

Example scenario

Assume that the input data represents sales and has three fields: product (the product name), items_sold (the number of items sold), and turnover (the sales amount). The ETL process needs to calculate the total items sold and the total turnover. The transformation reads multiple rows from the input file and then uses an aggregation step to produce the expected result.

This approach has a flaw: when there is no input data, no output row is produced at all. In the example transformation, you can switch between the two input hops (one with data, one empty) to compare the results.
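
To make the flaw concrete, here is a minimal Python sketch that models the transformation as rows (dicts) flowing through generator functions. It is not Kettle code: `sum_by_stream` is a hypothetical stand-in for an aggregation step with no group fields, and the sample rows are made up for illustration. The aggregation only emits a totals row after it has seen input, so an empty input produces no output.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def sum_by_stream(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical model of an aggregation step without group fields.

    A totals row is emitted only if at least one input row was seen.
    """
    totals = None
    for row in rows:
        if totals is None:
            totals = {"items_sold": 0, "turnover": 0.0}
        totals["items_sold"] += row["items_sold"]
        totals["turnover"] += row["turnover"]
    if totals is not None:
        yield totals


# Made-up sample rows, for illustration only.
sales = [
    {"product": "apple", "items_sold": 3, "turnover": 4.5},
    {"product": "pear", "items_sold": 2, "turnover": 3.0},
]

print(list(sum_by_stream(sales)))  # [{'items_sold': 5, 'turnover': 7.5}]
print(list(sum_by_stream([])))     # [] (no output row at all)
```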

Solution 1: Use the Group By step

If the aggregation is implemented with the Group By step, you can make the step return a result row even when there is no input by enabling the "Always give back a result row" option.
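
The effect of the option can be sketched in the same kind of Python model. `group_by_totals` and its `always_give_back_row` flag are illustrative names, not PDI APIs; the flag plays the role of the "Always give back a result row" checkbox. The sketch assumes the totals start at zero, which matches the 0 the example expects. It is worth checking how your PDI version fills each aggregate field on an empty stream.

```python
from typing import Dict, Iterable, Iterator

Row = Dict[str, object]


def group_by_totals(rows: Iterable[Row], always_give_back_row: bool = False) -> Iterator[Row]:
    """Hypothetical model of a Group By step with no group fields.

    always_give_back_row mirrors "Always give back a result row": when True,
    a totals row is emitted even if no input rows arrived.
    """
    seen_any = False
    items_sold, turnover = 0, 0.0  # assumed zero starting values
    for row in rows:
        seen_any = True
        items_sold += row["items_sold"]
        turnover += row["turnover"]
    if seen_any or always_give_back_row:
        yield {"items_sold": items_sold, "turnover": turnover}


print(list(group_by_totals([])))                             # []
print(list(group_by_totals([], always_give_back_row=True)))  # [{'items_sold': 0, 'turnover': 0.0}]
```

With the flag off, the model behaves like the flawed transformation; with it on, downstream steps always receive exactly one totals row.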

Solution 2: Use the Detect empty stream step

If the scenario is more complex and involves more fields, a general way to detect an empty data stream is needed. For this, use the Detect empty stream step. Connect the input source (which may be empty) to a Dummy (do nothing) step, and copy the rows from the Dummy step to two branches: the normal processing branch and a branch leading to the Detect empty stream step. When the incoming stream contains rows, the Detect empty stream step outputs nothing. When no rows arrive, it creates a single row with the same fields, all of whose values are empty (null). This row signals that there was no data.
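
The branching and the behavior of Detect empty stream can be modeled in Python as follows. This is a hedged sketch, not PDI's implementation: the hypothetical `split_stream` uses `itertools.tee` to imitate a Dummy step whose two outgoing hops copy every row, and `detect_empty_stream` takes the field names as an explicit argument, whereas the real step takes them from the incoming row metadata.

```python
from itertools import tee
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Optional[object]]
FIELDS = ["product", "items_sold", "turnover"]


def split_stream(rows: Iterable[Row]) -> Tuple[Iterator[Row], Iterator[Row]]:
    """Copy every row to two branches (main flow and empty-stream check)."""
    main_branch, check_branch = tee(rows, 2)
    return main_branch, check_branch


def detect_empty_stream(rows: Iterable[Row], fields: List[str]) -> Iterator[Row]:
    """Output nothing if the stream has rows; one all-null row if it is empty."""
    seen_any = False
    for _ in rows:  # consume the whole stream without passing rows on
        seen_any = True
    if not seen_any:
        yield {name: None for name in fields}


main, check = split_stream(iter([]))
print(list(main))                                # []
print(list(detect_empty_stream(check, FIELDS)))  # [{'product': None, 'items_sold': None, 'turnover': None}]
```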

In the example, when the input delivers no rows, the Detect empty stream step generates a single row with empty fields. A JavaScript step then sets product = "none", items_sold = 0, and turnover = 0.0 on that row, so a genuinely empty input still produces the expected output.
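
The JavaScript step's role can be sketched in Python as well. The hypothetical `set_placeholder_values` is a stand-in, not the article's script. It assumes it sits on the branch after Detect empty stream, so it only ever receives the all-null row, and it overwrites the fields with the values the article uses.

```python
from typing import Dict, Iterable, Iterator, Optional

Row = Dict[str, Optional[object]]


def set_placeholder_values(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical stand-in for the JavaScript step: overwrite the
    all-null row from Detect empty stream with "no data" values."""
    for row in rows:
        out = dict(row)  # copy so the incoming row is left unchanged
        out["product"] = "none"
        out["items_sold"] = 0
        out["turnover"] = 0.0
        yield out


empty_row = {"product": None, "items_sold": None, "turnover": None}
print(list(set_placeholder_values([empty_row])))
# [{'product': 'none', 'items_sold': 0, 'turnover': 0.0}]
print(list(set_placeholder_values([])))  # [] (input had data, so no extra row)
```

When the input has data, this branch carries no rows, so the placeholder row only appears for an empty input. The sketch does not model how this branch is joined back to the main flow.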
