Kettle sub-transformations (mappings)

Source: Internet
Author: User

Sub-transformations are a very useful Kettle feature because they can be reused across transformations. If you find yourself copying and pasting the same steps into several transformations, you can move those steps into a sub-transformation (a mapping) to simplify your ETL program.

A sub-transformation generally receives rows from the parent transformation, processes them, and returns the results to the parent. It therefore needs an input step and an output step that connect it to the parent at run time, and the field structure of the incoming and outgoing rows is defined in these interface steps. To keep the sub-transformation reusable, the parent's row fields are mapped to the sub-transformation's fields when it is called, and after processing the results are mapped back to the parent. This is why sub-transformations are also called mappings.

The following describes how to refactor a calculation into a sub-transformation. The sample code is available for download. For the parity calculation used in the example, refer to another article.

Sample transformation

The example transformation performs a numeric calculation: it receives an input value, calculates its parity bit (derived from the number of 1 bits in the binary representation) and its digit sum (the sum of the digits in the decimal representation), and finally writes the results to Excel.
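The two calculations can be sketched outside Kettle as follows. This is only an illustration, not the code from the sample files: the function names are hypothetical, the input is assumed to be a non-negative integer, and the parity bit is assumed to be 1 when the count of 1 bits is odd (the article does not state which convention the sample uses).

```python
def parity_bit(n: int) -> int:
    """Return 1 if the binary form of n has an odd number of 1 bits, else 0.

    Assumption: n is non-negative and "odd count -> 1" is the convention.
    """
    return bin(n).count("1") % 2


def digit_sum(n: int) -> int:
    """Return the sum of the decimal digits of n."""
    return sum(int(digit) for digit in str(abs(n)))


if __name__ == "__main__":
    for value in (3, 7, 1234):
        print(value, parity_bit(value), digit_sum(value))
```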

The example is fairly simple. Suppose, however, that this calculation is needed in several other transformations in the project. Let's refactor the example so that it can be called from other transformations.

A sub-transformation is stored in a separate file. It uses a Mapping input specification step as its input, the calculation steps in the middle, and a Mapping output specification step as its output. The mapping-related steps are found in the Mapping category.

The input step's configuration specifies that the parent transformation must pass an integer field to the sub-transformation. The declared fields can be used in the subsequent steps. The check box indicates that other, undeclared fields are passed in as well and flow through the sub-transformation unaffected. This is very useful: if the parent transformation has many fields and the sub-transformation needs only some of them, select this option to make sure the other fields are not affected.
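The effect of the pass-through check box can be modeled with a small sketch. This is not Kettle code: `run_sub_transformation`, its parameters, and the sample field names are hypothetical, and rows are modeled as plain dictionaries.

```python
def run_sub_transformation(row, declared_fields, calculate, pass_through=True):
    """Model one row flowing through a sub-transformation.

    row             -- the parent's row, as a dict of field name -> value
    declared_fields -- fields named in the (modeled) input specification
    calculate       -- function that takes the declared fields and
                       returns the newly computed fields
    pass_through    -- models the check box: keep undeclared fields or not
    """
    # Only the declared fields are handed to the calculation.
    inputs = {name: row[name] for name in declared_fields}
    outputs = calculate(inputs)

    # Undeclared fields ride along untouched when pass-through is enabled.
    extras = {}
    if pass_through:
        extras = {k: v for k, v in row.items() if k not in declared_fields}

    return {**extras, **inputs, **outputs}


def add_digit_sum(fields):
    # Hypothetical calculation: digit sum of the "value" field.
    return {"digit_sum": sum(int(d) for d in str(fields["value"]))}


if __name__ == "__main__":
    parent_row = {"id": 7, "label": "x", "value": 12}
    print(run_sub_transformation(parent_row, ["value"], add_digit_sum, True))
    print(run_sub_transformation(parent_row, ["value"], add_digit_sum, False))
```

With pass-through enabled, `id` and `label` come back alongside the computed field; with it disabled, only the declared field and the computed field are returned.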

The calculation steps remain unchanged and output two additional fields: the parity bit and the digit sum. The output step needs no configuration; it passes the fields produced by the previous step back to the parent transformation.

Now that the sub-transformation is designed, it needs to be called from the parent transformation using the Mapping (sub-transformation) step. In the Mapping step you specify which sub-transformation to call, whether to pass in any named parameters, and other general properties of the transformation. The sub-transformation can be specified either by file path or by repository path.

The data stream from the "Generate random integer" step is passed into the sub-transformation, and the stream processed by the sub-transformation is output to the "Excel Output" step, so place the Mapping step between the "Generate random integer" step and the "Excel Output" step and connect them. In the Mapping step configuration, add an input tab and an output tab, and select "Is this the main data path?" on each. This tells Kettle to infer the source and target steps from the hops. In addition, on the input tab you must specify the mapping between the input row fields and the fields of the sub-transformation. The only input field in the example is already named value, so the configuration is simple.

The refactoring is now complete. The main transformation calls the sub-transformation and stores the result in Excel. You can download the example to verify this.

Preserving field names

The field mapping in the preceding example is simple. In real scenarios, the input field names are not necessarily the same as the names defined in the sub-transformation. In that case, you can either let the output carry the field names used inside the sub-transformation, or have the fields renamed back to their original input names. The option "Ask these values to be renamed back on output?" controls this. Suppose you map the input field "foo" to the sub-transformation field "bar". If the option is not selected, the output field coming from the sub-transformation is named bar; if it is selected, the output field keeps its original name, foo. This feature decouples the sub-transformation from the parent transformation.
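The foo/bar behavior described above can be modeled as a rename on the way in and an optional reverse rename on the way out. This is a sketch of the idea only, not Kettle's implementation; `call_mapping`, `double_bar`, and the `doubled` field are hypothetical names.

```python
def call_mapping(row, field_map, sub_transformation, rename_back):
    """Model a Mapping step call with an input field mapping.

    field_map   -- parent field name -> sub-transformation field name
    rename_back -- models "Ask these values to be renamed back on output?"
    """
    # Rename the parent's fields to the names the sub-transformation expects.
    renamed = {field_map.get(name, name): value for name, value in row.items()}
    result = sub_transformation(renamed)

    if rename_back:
        # Restore the parent's original names on the way out.
        reverse = {sub: parent for parent, sub in field_map.items()}
        result = {reverse.get(name, name): value for name, value in result.items()}
    return result


def double_bar(row):
    # Hypothetical sub-transformation that only knows the field "bar".
    return {**row, "doubled": row["bar"] * 2}


if __name__ == "__main__":
    mapping = {"foo": "bar"}
    print(call_mapping({"foo": 5}, mapping, double_bar, rename_back=False))
    print(call_mapping({"foo": 5}, mapping, double_bar, rename_back=True))
```

The sub-transformation never sees the name foo, which is what lets it be reused by parents with different field names.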

In the preceding example, the sub-transformation has only one input and one output, but a sub-transformation can have multiple inputs and outputs. To demonstrate this, we split the calculation into two independent paths: one for the parity bit and one for the digit sum.

The sub-transformation now has two inputs and two outputs. The integer from the parent transformation must be fed to both inputs, and the two outputs are written to different Excel files, so the parity bit is saved in one file and the digit sum in another. The parent transformation also needs to be restructured accordingly.

If you download the example, you will notice that the Mapping step has two inputs and two outputs, and that the "Is this the main data path?" option is not selected for any of them. With more than one input and output, Kettle cannot be expected to infer from the hops which input leads to which output, so both the source step and the target step are defined manually. In fact, the hops between the Mapping step and its input and output steps are then purely symbolic; they can be deleted without affecting execution (don't worry, you can try it). In this example the hops are retained. In addition, the "Generate random integer" step has two subsequent steps (the two input steps of the sub-transformation), so the rows must be copied to both subsequent steps rather than distributed between them.
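The difference between copying and distributing rows can be illustrated with a small sketch. It assumes that distribution hands rows to the target steps in round-robin order; the function names are hypothetical and the rows are plain integers.

```python
from itertools import cycle


def distribute_rows(rows, n_targets):
    """Hand each row to exactly one target, in round-robin order (assumed)."""
    targets = [[] for _ in range(n_targets)]
    for row, target in zip(rows, cycle(targets)):
        target.append(row)
    return targets


def copy_rows(rows, n_targets):
    """Give every target its own full copy of the row stream."""
    return [list(rows) for _ in range(n_targets)]


if __name__ == "__main__":
    rows = [11, 22, 33, 44]
    print(distribute_rows(rows, 2))  # each input would see only half the rows
    print(copy_rows(rows, 2))        # each input sees every row
```

Because both calculation paths need every integer, distributing would leave each path with only part of the data, which is why copying is required here.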

Conclusion

Kettle mappings make transformations reusable and simplify ETL programs. The input and output steps and the parameter configuration are flexible, so almost any complicated transformation can be refactored into reusable sub-transformations. A large transformation can be restructured as a chain of sub-transformations, which also makes the structure of its data flow easier to understand.
