Checking for empty rows in a Kettle data stream

Source: Internet
Author: User


During ETL processing, a transformation sometimes has to produce output even though it receives no input, and the missing rows can cause problems in later steps. ETL data streams are therefore often required to generate at least one row of data. For example, when an aggregation is part of the processing, its value should be 0 when no data is input. This document describes how to detect and handle empty data streams.

Example scenario

Assume that you need to read input data representing sales, with three fields: product (product name), items_sold (sales volume), and turnover (sales amount). The ETL process needs to calculate the total sales volume and the total sales amount. The process is roughly: read multiple rows of data from the input file, then use an aggregation step to generate the expected result.

This approach is flawed because no output row is generated when there is no input data. In this example, you can switch between the two input hops (one with data, one without) to test the result.

Solution 1: Use the Group by step

If you use the Group by step to implement the aggregation, enable its "Always give back a result row" option. The step then returns a result row with the totals even when it receives no input.
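The effect of that option can be modeled outside Kettle. The following C sketch is not Kettle code; it only illustrates an aggregation that returns a row of zeros for an empty input instead of returning nothing. The names `struct sale`, `struct totals`, and `aggregate_sales` are hypothetical and chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One input row: product name, units sold, sales amount. */
struct sale {
    const char *product;
    long items_sold;
    double turnover;
};

/* The aggregated result row. */
struct totals {
    long items_sold;
    double turnover;
    size_t rows_seen;
};

/* Sum all input rows. With n == 0 the caller still receives a result
 * row, filled with zeros, instead of no row at all. */
struct totals aggregate_sales(const struct sale *rows, size_t n)
{
    struct totals t = {0, 0.0, 0};
    size_t i;

    assert(rows != NULL || n == 0);
    for (i = 0; i < n; i++) {
        t.items_sold += rows[i].items_sold;
        t.turnover += rows[i].turnover;
        t.rows_seen++;
    }
    return t;
}
```

Calling `aggregate_sales(NULL, 0)` yields totals of 0 and 0.0, which is the behavior the example scenario needs when the input file is empty.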

 

Solution 2: Use the Detect empty stream step

If the scenario is more complex and has more fields, we need a general way to detect an empty data stream. For this we use the "Detect empty stream" step. Connect the input source (which may be empty) to the Detect empty stream step, copying the source's data to two branches. The Detect empty stream step does not pass on rows when the stream contains data. If no data is input, it creates a single row in which all field values are empty. This row indicates that there was no data.

In the example, if the input hop carries no data, no rows enter the transformation. The "Detect empty stream" step then generates an empty row, and a JavaScript step sets product = "none", items_sold = 0, and turnover = 0.0 on it, so that the empty row is updated to the expected output.
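The two-stage logic described above (emit one all-empty row only when the input is empty, then overwrite it with defaults) can be sketched in C. This is a model of the described behavior under simplifying assumptions, not Kettle's implementation; `struct sale_row`, `detect_empty_stream`, and `fill_defaults` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct sale_row {
    const char *product;   /* NULL stands for an empty field */
    long items_sold;
    double turnover;
    int is_empty_row;      /* 1 when the row was generated for an empty stream */
};

/* Model of the detection step: when the input had rows, nothing is
 * emitted; when it had none, one all-empty row is written to *out.
 * Returns the number of rows written to out (0 or 1). */
size_t detect_empty_stream(size_t n_input_rows, struct sale_row *out)
{
    assert(out != NULL);
    if (n_input_rows > 0)
        return 0;
    out->product = NULL;
    out->items_sold = 0;
    out->turnover = 0.0;
    out->is_empty_row = 1;
    return 1;
}

/* Model of the follow-up scripting step: replace the empty row's
 * fields with the default values used in the example. */
void fill_defaults(struct sale_row *row)
{
    assert(row != NULL);
    if (row->is_empty_row) {
        row->product = "none";
        row->items_sold = 0;
        row->turnover = 0.0;
        row->is_empty_row = 0;
    }
}
```

Keeping detection and default-filling separate mirrors the two steps in the transformation: the detection part stays generic, and only the second part knows the field names.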

 

 

 


Processing data from two data streams in Kettle

With the Merge Join step, use the INNER join type to "filter data on two fields at the same time", as you described, and then output the required fields in the subsequent steps.
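As a rough illustration of what an inner merge join does, the C sketch below joins two inputs on two key fields. It assumes that "two fields" means a two-part join key and that both inputs are already sorted by that key, which a merge-style join relies on. All names (`struct row`, `struct joined`, `merge_join_inner`) are hypothetical, and this is not Kettle's code.

```c
#include <assert.h>
#include <stddef.h>

struct row { int key_a; int key_b; int value; };
struct joined { int key_a; int key_b; int left_value; int right_value; };

/* Compare two rows by (key_a, key_b); returns <0, 0 or >0. */
static int cmp_keys(const struct row *x, const struct row *y)
{
    if (x->key_a != y->key_a)
        return x->key_a < y->key_a ? -1 : 1;
    if (x->key_b != y->key_b)
        return x->key_b < y->key_b ? -1 : 1;
    return 0;
}

/* Inner merge join of two inputs sorted ascending by (key_a, key_b).
 * Only rows whose two key fields match on both sides are written to
 * out; rows beyond capacity cap are dropped. Returns rows written. */
size_t merge_join_inner(const struct row *l, size_t nl,
                        const struct row *r, size_t nr,
                        struct joined *out, size_t cap)
{
    size_t i = 0, j = 0, k = 0;

    assert(out != NULL || cap == 0);
    while (i < nl && j < nr) {
        int c = cmp_keys(&l[i], &r[j]);
        if (c < 0) {
            i++;                      /* left key has no partner */
        } else if (c > 0) {
            j++;                      /* right key has no partner */
        } else {
            /* Pair every left row of this key with every right row of it. */
            const struct row *anchor = &l[i];
            size_t j_start = j;
            while (i < nl && cmp_keys(&l[i], anchor) == 0) {
                for (j = j_start; j < nr && cmp_keys(anchor, &r[j]) == 0; j++) {
                    if (k < cap) {
                        out[k].key_a = anchor->key_a;
                        out[k].key_b = anchor->key_b;
                        out[k].left_value = l[i].value;
                        out[k].right_value = r[j].value;
                        k++;
                    }
                }
                i++;
            }
        }
    }
    return k;
}
```

Rows that appear in only one input are skipped, which is the filtering effect of the INNER join type.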
 
What can I do if multiple CGI commands are submitted?

Writing CGI programs in C
1. CGI Overview
CGI (Common Gateway Interface) specifies the interface standard through which a Web server calls other executable programs (CGI programs). The Web server calls a CGI program to interact with the Web browser: the CGI program receives and processes the information that the browser sent to the Web server, then sends its response back to the Web server, which returns it to the browser. CGI programs generally process form data from Web pages, query databases, and integrate with traditional application systems. CGI programs can be written in any programming language, such as shell script, Perl, Fortran, Pascal, or C. CGI programs written in C, however, execute quickly and offer good security (because C programs are compiled before execution and the deployed binary cannot be readily modified).

The CGI interface standard consists of three parts: standard input, environment variables, and standard output.

1. Standard Input

Like other executable programs, a CGI program can receive input from the Web server through standard input (stdin), for example the data in a form. This is the POST method of passing data to a CGI program. It also means that the CGI program can be run from the operating system's command line for debugging. POST is a commonly used method, and this article uses it as the example for analyzing the methods, process, and techniques of CGI program design.

2. Environment Variables

The operating system provides many environment variables that define a program's execution environment, and applications can read them. The Web server and the CGI interface also set environment variables of their own to pass important parameters to the CGI program. With the GET method, the form data is likewise passed to the CGI program through an environment variable, QUERY_STRING.
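As a minimal sketch of reading these variables, the C fragment below looks up REQUEST_METHOD and QUERY_STRING with the standard `getenv` function and writes them as a plain-text response. The helper names `env_or` and `describe_request` are hypothetical, and the fallback text "(unset)" is an arbitrary choice for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Return the value of an environment variable, or fallback when the
 * variable is not set. */
const char *env_or(const char *name, const char *fallback)
{
    const char *value;

    assert(name != NULL);
    value = getenv(name);
    return value != NULL ? value : fallback;
}

/* Write a plain-text summary of the request to out. A CGI program's
 * main function would call describe_request(stdout). */
void describe_request(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "Content-type: text/plain\n\n");
    fprintf(out, "method=%s\n", env_or("REQUEST_METHOD", "(unset)"));
    fprintf(out, "query=%s\n", env_or("QUERY_STRING", "(unset)"));
}
```

Because `getenv` returns NULL for an unset variable, wrapping it in a helper with a fallback avoids passing NULL to `fprintf` when the program is run from the command line without a Web server.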

3. Standard Output

The CGI program sends its output to the Web server through standard output (stdout). The information sent to the Web server can be in various formats, usually plain text or HTML. Because it goes to stdout, we can also run the CGI program from the command line for debugging and see its output.

The following is a simple CGI program that echoes the form data from an HTML page directly back to the Web browser.


    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int i, n;

        /* MIME header followed by a blank line */
        printf("Content-type: text/plain\n\n");
        n = 0;
        /* CONTENT_LENGTH is set by the Web server for POST requests */
        if (getenv("CONTENT_LENGTH"))
            n = atoi(getenv("CONTENT_LENGTH"));
        /* echo the n characters of form data from stdin to stdout */
        for (i = 0; i < n; i++)
            putchar(getchar());
        putchar('\n');
        fflush(stdout);
        return 0;
    }

The following is a brief analysis of this program.
    printf("Content-type: text/plain\n\n");
This line uses standard output to send the string "Content-type: text/plain\n\n" to the Web server. It is a MIME header telling the Web server that the output that follows is plain ASCII text. Note that the header ends with two newline characters, because the Web server needs to see a blank line before the actual text begins.
    if (getenv("CONTENT_LENGTH"))
        n = atoi(getenv("CONTENT_LENGTH"));
These lines first check whether the CONTENT_LENGTH environment variable exists. The Web server sets this variable when it calls a CGI program with the POST method. Its text value gives the number of characters that the Web server sends to the CGI program, so we use the function atoi (......
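The echo program trusts CONTENT_LENGTH as given and copies exactly that many characters. A somewhat more defensive variant might validate the value and stop at end of input. The sketch below is one possible way to do that, not part of the original article: it uses `strtol`, which (unlike `atoi`) allows error checking. The names `parse_content_length` and `echo_body`, and the idea of an upper limit `max`, are assumptions made for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a CONTENT_LENGTH value. Returns -1 when the text is missing,
 * malformed, negative, or larger than max. */
long parse_content_length(const char *text, long max)
{
    char *end;
    long n;

    if (text == NULL || *text == '\0')
        return -1;
    errno = 0;
    n = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || n < 0 || n > max)
        return -1;
    return n;
}

/* Copy up to n bytes from in to out, stopping early if the input
 * ends. Returns the number of bytes actually copied. */
long echo_body(FILE *in, FILE *out, long n)
{
    long copied = 0;
    int c;

    assert(in != NULL && out != NULL);
    while (copied < n && (c = getc(in)) != EOF) {
        putc(c, out);
        copied++;
    }
    return copied;
}
```

A CGI program's main function could call `parse_content_length(getenv("CONTENT_LENGTH"), limit)` with a limit of its choosing, and then `echo_body(stdin, stdout, n)` when the result is not -1.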
