Kettle (1): Transformations, Steps, and Hops

Last Update:2014-07-31 Source: Internet

Author: User


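Term definitions:

Transformation: the data flow as a whole (sometimes rendered as "conversion" in translation)

Step: a single processing unit within a transformation

Hop: the connection between two steps (sometimes rendered as "jumper" or "skip wiring" in translation)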

A step is the smallest unit of execution in Kettle. Each step implements one specific logical task.

A transformation is a network of steps that together implement a relatively complete task. In effect, the transformation defines the data flow. Consider the following example transformation.

It reads data from text files, filters and sorts the data, and finally loads the result into a table in a relational database. When an error occurs during data filtering, the resulting data flow is empty.

A transformation is essentially a directed graph that describes a set of data transformation logic. In Kettle, transformation files use the .ktr suffix.
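To make the directed-graph view concrete, here is a minimal Python sketch that reads a hand-written, heavily simplified .ktr-style XML fragment and lists which step feeds which. The fragment is an assumption for illustration only: it is modeled on the example above, it is not a complete file saved by Kettle, and the step names, the extra "Dummy" branch, and the helper load_graph are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily simplified .ktr-style fragment (an assumption for
# illustration; files saved by Kettle contain many more elements).
KTR_FRAGMENT = """
<transformation>
  <order>
    <hop><from>Text file input</from><to>Filter rows</to></hop>
    <hop><from>Filter rows</from><to>Sort rows</to></hop>
    <hop><from>Filter rows</from><to>Dummy</to></hop>
    <hop><from>Sort rows</from><to>Table output</to></hop>
  </order>
  <step><name>Text file input</name></step>
  <step><name>Filter rows</name></step>
  <step><name>Sort rows</name></step>
  <step><name>Dummy</name></step>
  <step><name>Table output</name></step>
</transformation>
"""


def load_graph(xml_text):
    """Return {step name: [names of the steps it feeds]} for the fragment."""
    root = ET.fromstring(xml_text.strip())
    # Every step is a node of the directed graph.
    graph = {step.findtext("name"): [] for step in root.findall("step")}
    # Every hop is a directed edge from one step to another.
    for hop in root.iter("hop"):
        graph[hop.findtext("from")].append(hop.findtext("to"))
    return graph


graph = load_graph(KTR_FRAGMENT)
for name, targets in graph.items():
    print(name, "->", targets)
```

Steps become the nodes and hops become the directed edges, so a step with two outgoing hops (here "Filter rows") simply has two entries in its target list.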

The two core components of a transformation are steps and hops:

Step: the building block of a transformation, for example a text file input step or an output step. Kettle provides approximately 140 steps, grouped by function, such as input steps, output steps, and scripting steps. Each step in a transformation performs one specific operation, such as reading input or sorting. You can edit the configuration of each step to meet your needs.

Hop: connects the steps in a transformation and passes the output of the preceding step, together with its metadata, to the next step. In the example it may look as if the steps are executed one after another, but this is not the case. A hop only determines the data flow between steps; it does not determine the order in which the steps execute. When the transformation starts, every step starts in its own thread or threads and begins pushing and receiving data.

[Note]

All steps are started and run in parallel, so the order in which the steps are initialized is not fixed. This is why a variable set in one step cannot be relied on in subsequent steps of the same transformation.
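The parallel execution model can be sketched in Python as follows. This is an analogy under stated assumptions, not Kettle's actual implementation: each step is modeled as one thread, each hop as a queue.Queue, and the function names reader, doubler, and writer are hypothetical. The threads are deliberately started in reverse order to show that the hops, not the start order, define the data flow.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the row stream


def reader(out_q):
    """Input step: emits rows, then the end-of-stream marker."""
    for row in [3, 1, 2]:
        out_q.put(row)
    out_q.put(DONE)


def doubler(in_q, out_q):
    """Middle step: transforms each row as soon as it arrives."""
    while (row := in_q.get()) is not DONE:
        out_q.put(row * 2)
    out_q.put(DONE)


def writer(in_q, sink):
    """Output step: collects rows into a list standing in for a table."""
    while (row := in_q.get()) is not DONE:
        sink.append(row)


hop_1, hop_2 = queue.Queue(), queue.Queue()  # each hop modeled as a queue
table = []

threads = [
    threading.Thread(target=writer, args=(hop_2, table)),
    threading.Thread(target=doubler, args=(hop_1, hop_2)),
    threading.Thread(target=reader, args=(hop_1,)),
]
# Start the steps "backwards": a step that has no input yet simply waits.
for t in threads:
    t.start()
for t in threads:
    t.join()

print(table)  # [6, 2, 4]
```

The writer thread starts first and blocks on its empty queue until rows arrive, which mirrors the point above: all steps are alive at once, and each processes rows whenever its incoming hop delivers them.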

You can double-click a step to edit it, and hold SHIFT while dragging with the left mouse button to connect one step to another.

A step can have multiple hops, so the data streams in a transformation can flow among multiple steps. In the Kettle IDE, a hop is drawn as an arrow, and the direction of the arrow determines the direction of the data flow. When the result set of a step is pushed to multiple steps, the data can be sent in copy mode or in distribution mode. Copying means that each subsequent step receives the complete result set of the previous step. Distribution means that each subsequent step receives only part of the result set of the previous step. For example, if step A distributes data to steps B and C, then B receives records 1, 3, 5, and so on, and C receives records 2, 4, 6, and so on.
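The difference between the two modes can be illustrated with a short Python sketch. The function names copy_rows and distribute_rows are hypothetical and only mimic the behavior described above; the alternating assignment follows the B and C example in the text.

```python
from itertools import cycle


def copy_rows(rows, targets):
    """Copy mode: every target receives the complete result set."""
    return {target: list(rows) for target in targets}


def distribute_rows(rows, targets):
    """Distribution mode: rows are dealt out to the targets in turn."""
    received = {target: [] for target in targets}
    for row, target in zip(rows, cycle(targets)):
        received[target].append(row)
    return received


rows = [1, 2, 3, 4, 5, 6]
copied = copy_rows(rows, ["B", "C"])
distributed = distribute_rows(rows, ["B", "C"])

print(copied)       # {'B': [1, 2, 3, 4, 5, 6], 'C': [1, 2, 3, 4, 5, 6]}
print(distributed)  # {'B': [1, 3, 5], 'C': [2, 4, 6]}
```

In copy mode the total number of rows downstream grows with the number of targets, while in distribution mode it stays equal to the size of the original result set.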


Because the steps in a transformation start in no fixed order and run independently and in parallel in their own threads, a variable set in one step may not be available to the steps that follow it. The output rows of each step, however, are pushed to the next step through the hop.

Therefore, before all the steps have started, you cannot guarantee that a later step sees anything set by an earlier step. Once all the steps are running, the hops ensure that the output of each step reaches the steps that follow it.
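The contrast between variables and rows can be sketched in Python. This is an analogy, not Kettle code: the shared dictionary variables, the name LIMIT, and both step functions are hypothetical. The threading.Event is used only to force one possible start order for the demonstration; in a real run the order is not predictable.

```python
import queue
import threading

variables = {}        # shared variable space (hypothetical)
hop = queue.Queue()   # hop from the upstream step to the downstream step
DONE = object()       # end-of-stream marker
downstream_started = threading.Event()
seen = {}             # what the downstream step observed


def upstream():
    variables["LIMIT"] = 10      # variable set inside a step
    for row in ["a", "b", "c"]:
        hop.put(row)             # rows travel through the hop
    hop.put(DONE)


def downstream():
    seen["LIMIT"] = variables.get("LIMIT")  # read once, at startup
    downstream_started.set()
    seen["rows"] = []
    while (row := hop.get()) is not DONE:
        seen["rows"].append(row)


d = threading.Thread(target=downstream)
u = threading.Thread(target=upstream)
d.start()
downstream_started.wait()  # force one unlucky (but possible) start order
u.start()
d.join()
u.join()

print(seen)  # {'LIMIT': None, 'rows': ['a', 'b', 'c']}
```

The downstream step misses the variable because it looked before the upstream step had run, yet it still receives every row, because the hop holds the rows until they are consumed.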

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
