Term definitions:
Transformation ----- conversion
Step ---------------- step
Hop ----------------- jumper (the connection between two steps)
A step is the smallest unit of execution in Kettle. Each step implements a single, specific logical task.
A transformation is a network of steps that together implement a relatively complete task. In effect, a transformation defines a data flow. Let's take a look at an example:
This example is a transformation. It reads data from text files, filters and sorts the data, and finally loads the result into a table in a relational database. When an error occurs during data filtering, the data flows to a null (do-nothing) target.
A transformation is essentially a directed graph that describes a set of data transformation logic. In Kettle, transformation files have the suffix .ktr.
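The directed-graph view can be sketched in a few lines of Python. This is only a conceptual model under stated assumptions: it is not Kettle's API and not the .ktr file format, and the step names and the helper functions `successors` and `reachable` are hypothetical, loosely following the example above.

```python
# Conceptual model only: NOT Kettle's API and NOT the .ktr format.
# Step names are hypothetical, loosely following the example in the text.
hops = [
    ("Text file input", "Filter rows"),
    ("Filter rows", "Sort rows"),
    ("Sort rows", "Table output"),
]


def successors(hops):
    """Build an adjacency list: step name -> list of directly connected next steps."""
    graph = {}
    for src, dst in hops:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])  # make sure terminal steps appear as nodes too
    return graph


def reachable(graph, start):
    """Return every step that data leaving `start` can reach, in discovery order."""
    seen, stack = [], [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.append(nxt)
                stack.append(nxt)
    return seen


graph = successors(hops)
downstream = reachable(graph, "Text file input")
```

Here the steps are the nodes and the hops are the directed edges; `downstream` lists every step that rows read by the input step can eventually reach.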
The two core components of a transformation are steps and hops:
Step: the building block of a transformation, for example a text file input step or an output step. Kettle provides approximately 140 steps, grouped by function: input steps, output steps, script steps, and so on. Each step in a transformation performs one specific operation, such as reading input or sorting. You can edit each step's configuration to meet your needs.
Hop: connects the steps in a transformation and passes the metadata of the preceding step or steps to the next step. In the example, it looks as if the steps are executed in sequence, but this is not the case. Hops determine how data flows between steps, not the order in which the steps run. When a transformation is started, each step starts in its own thread and pushes and passes data.
[Note]
All steps are started and run in parallel, so the order in which they are initialized is not fixed. This is why a variable set in one step cannot be relied on in a subsequent step of the same transformation.
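The threading model described above can be illustrated with a minimal sketch. It assumes plain Python threads for steps and `queue.Queue` objects for hops. The step functions, the `DONE` sentinel, and `run_pipeline` are hypothetical names for illustration and do not reflect how Kettle is implemented internally.

```python
import queue
import threading

DONE = object()  # hypothetical sentinel marking the end of the row stream


def input_step(out_hop):
    """Produce a few rows and push them onto the outgoing hop."""
    for row in [3, 1, 2]:
        out_hop.put(row)
    out_hop.put(DONE)


def double_step(in_hop, out_hop):
    """Read rows from the incoming hop, double them, and push them onward."""
    while (row := in_hop.get()) is not DONE:
        out_hop.put(row * 2)
    out_hop.put(DONE)


def collect_step(in_hop, results):
    """Final step: gather every row that arrives."""
    while (row := in_hop.get()) is not DONE:
        results.append(row)


def run_pipeline():
    hop_a, hop_b = queue.Queue(), queue.Queue()
    results = []
    threads = [
        # Deliberately started "backwards": start order does not affect the
        # data flow, because each step simply blocks until rows arrive.
        threading.Thread(target=collect_step, args=(hop_b, results)),
        threading.Thread(target=double_step, args=(hop_a, hop_b)),
        threading.Thread(target=input_step, args=(hop_a,)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

All three steps are alive at the same time. Only the queues (the hops) decide which rows reach which step, which is the sense in which hops define data flow rather than execution order.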
You can double-click a step to edit it. To connect two steps, hold SHIFT and drag with the left mouse button from one step to the other.
A step can have multiple connections, so the data streams in a transformation can flow among multiple steps. In the Kettle IDE, a hop is drawn as an arrow, and the direction of the arrow is the direction of the data flow. When the result set of a step is pushed to several steps, the data can be passed on in either copy mode or distribute mode. Copy means that each subsequent step receives the complete result set of the previous step. Distribute means that each subsequent step receives only part of the result set of the previous step. For example, if step A distributes its data to steps B and C, B receives records 1, 3, 5, ... of the result set and C receives records 2, 4, 6, ...
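The difference between the two modes can be shown with a small sketch. The functions `copy_rows` and `distribute_rows` are hypothetical helpers written for illustration, and the sketch assumes that distribution deals rows out to the target steps in turn, as in the B and C example above.

```python
def copy_rows(rows, targets):
    """Copy mode: every target step receives the full result set."""
    return {t: list(rows) for t in targets}


def distribute_rows(rows, targets):
    """Distribute mode: rows are dealt out to the target steps in turn."""
    out = {t: [] for t in targets}
    for i, row in enumerate(rows):
        out[targets[i % len(targets)]].append(row)
    return out


rows = [1, 2, 3, 4, 5, 6]          # record numbers from step A
copied = copy_rows(rows, ["B", "C"])
distributed = distribute_rows(rows, ["B", "C"])
```

With these inputs, `copied` gives both B and C all six records, while `distributed` gives B the records 1, 3, 5 and C the records 2, 4, 6.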
/******************************* Alien Jordan shot time **********************************/
Because the steps in a transformation start in no fixed order and run independently and in parallel, each in its own thread, a variable set in an earlier step may not be visible to later steps. The output of each step, by contrast, is pushed to the next step through the hop. In other words, until all the steps have started, you cannot be sure that something set in an earlier step can be obtained by the subsequent steps. Once all the steps are running, the hops ensure that the output of an earlier step reaches the subsequent steps.