Kettle's User Defined Java Class Step (Part 1)
Zookeeper
The "User Defined Java Class" step in Kettle, also known as the UDJC step, has been available since version 4.0 and is very powerful and versatile. It lets you write arbitrary Java code without sacrificing efficiency. This article explains in detail, with examples, how to use the step in different scenarios. Because there is a lot of material, the content is divided into three parts for ease of reading.
UDJC step Working Mechanism
The user-defined Java class inherits from org.pentaho.di.trans.steps.userdefinedjavaclass.TransformClassBase. You can download the source code and look at the methods and attributes of that class, which helps in understanding how the step works.
When the transformation runs, the code in the UDJC step is compiled into a class that extends TransformClassBase. This base class is a general step plug-in class with some convenient public methods. Your custom code can override or inherit the methods and attributes of the parent class as needed, or declare additional methods and attributes. You can also add import statements at the beginning of the code. The following packages are imported automatically by default:
import org.pentaho.di.trans.steps.userdefinedjavaclass.*;
import org.pentaho.di.trans.step.*;
import org.pentaho.di.core.row.*;
import org.pentaho.di.core.*;
import org.pentaho.di.core.exception.*;
If you want to learn how to access Kettle's internal objects from code, the code snippets on the left side of the UDJC step dialog help you get started quickly.
The following sections describe how to use the UDJC step in different scenarios:
Simple Field Conversion
The first example is a very simple operation: converting a string field to uppercase. Its purpose is to show how to set up the step, how to process rows, and how to access input and output fields. If you have already developed Kettle plug-ins, this will look very familiar. Assume that the row stream contains a field "testfield" and that the UDJC step defines an output field "uppercase" of type String. The following code converts testfield to uppercase and writes the result to the output field.
The code is as follows:
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    // No more input rows: signal completion and stop
    if (r == null) {
        setOutputDone();
        return false;
    }
    if (first) {
        first = false;
    }
    r = createOutputRow(r, data.outputRowMeta.size());
    // Get the value from an input field
    String test_value = get(Fields.In, "testfield").getString(r);
    // Play around with it
    String uppercase_value = test_value.toUpperCase();
    // Set a value in a new output field
    get(Fields.Out, "uppercase").setValue(r, uppercase_value);
    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
    return true;
}
Kettle calls the UDJC step's processRow() method to process one input row. Returning true tells Kettle that the step is ready to process another row. Returning false signals that there is no more data to process.
getRow() is a blocking call. It waits, if necessary, for the previous step to provide a row of data. It returns an array of objects representing the input row, or null when there are no more input rows.
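The calling pattern between processRow() and getRow() can be pictured with a small stand-alone simulation. The sketch below does not use Kettle's classes: RowLoopSketch, its queue-backed getRow(), and the run() driver are hypothetical stand-ins that only mimic the contract described above. The blocking behaviour of the real getRow() is not modelled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a step; this is NOT Kettle's TransformClassBase.
class RowLoopSketch {
    private final Queue<Object[]> input = new ArrayDeque<>();
    final List<Object[]> output = new ArrayList<>();
    boolean outputDone = false;

    RowLoopSketch(List<Object[]> rows) {
        input.addAll(rows);
    }

    // Mimics getRow(): the next input row, or null when the input is exhausted.
    Object[] getRow() {
        return input.poll();
    }

    // Mimics processRow(): true means "call me again", false means "finished".
    boolean processRow() {
        Object[] r = getRow();
        if (r == null) {
            outputDone = true; // plays the role of setOutputDone()
            return false;
        }
        output.add(new Object[] { r[0].toString().toUpperCase() });
        return true;
    }

    // Mimics the engine: keep calling processRow() until it returns false.
    static RowLoopSketch run(List<Object[]> rows) {
        RowLoopSketch step = new RowLoopSketch(rows);
        while (step.processRow()) {
            // each call handles exactly one row
        }
        return step;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] { "a" });
        rows.add(new Object[] { "b" });
        for (Object[] row : run(rows).output) {
            System.out.println(row[0]);
        }
    }
}
```

Each call to processRow() consumes at most one row, which is why the driver is just a loop on the return value.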
The next three lines look trivial and seemingly useless. They involve the boolean field first, inherited from the parent class, which makes it easy to tell whether the first row is being processed. This is useful when some work needs to be done only once. If you do not need it, ignore it.
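As an illustration of why the flag is handy, here is a minimal stand-alone sketch of one-time setup guarded by first. FirstFlagSketch, prefix, and setupCalls are hypothetical names made up for this example. Only the use of the flag mirrors the UDJC code.

```java
// Hypothetical stand-in; only the use of the "first" flag mirrors the text.
class FirstFlagSketch {
    boolean first = true; // plays the role of the inherited field
    int setupCalls = 0;   // counts how often the one-time work ran
    String prefix;

    String processRow(String value) {
        if (first) {
            first = false;
            prefix = "row:"; // one-time setup, e.g. resolving field names
            setupCalls++;
        }
        return prefix + value;
    }

    public static void main(String[] args) {
        FirstFlagSketch step = new FirstFlagSketch();
        System.out.println(step.processRow("a"));
        System.out.println(step.processRow("b"));
        System.out.println("setup ran " + step.setupCalls + " time(s)");
    }
}
```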
Call createOutputRow() to ensure that the row array is large enough to accommodate the added output fields.
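The idea behind resizing the row can be sketched in plain Java. This is an assumption about the general technique, not Kettle's actual implementation: OutputRowSketch and its createOutputRow() are hypothetical and simply grow the array when it is too small.

```java
import java.util.Arrays;

// Hypothetical sketch of the resizing idea; not Kettle's implementation.
class OutputRowSketch {
    // Returns a row with room for at least outputSize values, keeping existing values.
    static Object[] createOutputRow(Object[] inputRow, int outputSize) {
        if (inputRow.length >= outputSize) {
            return inputRow; // already large enough
        }
        return Arrays.copyOf(inputRow, outputSize); // new slots are null
    }

    public static void main(String[] args) {
        Object[] in = { "hello" };
        Object[] out = createOutputRow(in, 2);
        out[1] = ((String) out[0]).toUpperCase();
        System.out.println(Arrays.toString(out)); // prints [hello, HELLO]
    }
}
```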
The get() method gives access to the step's input and output fields by name. You specify the field type (In, Out, or Info) and the field name, and it returns an instance of org.pentaho.di.trans.steps.userdefinedjavaclass.FieldHelper. This object gives access to the field's data. The parent class declares the method as follows: public FieldHelper get(Fields type, String name) throws KettleStepException;
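To make the name-based access more concrete, here is a minimal stand-alone sketch of a helper that resolves a field name to a position once and then reads or writes that position in a row. FieldHelperSketch is a hypothetical class written for this article. It only imitates the idea and is not Kettle's FieldHelper.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the idea behind FieldHelper; not Kettle's class.
class FieldHelperSketch {
    private final int index;

    FieldHelperSketch(List<String> fieldNames, String name) {
        index = fieldNames.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
    }

    // Reads the field from the row as a String (null stays null).
    String getString(Object[] row) {
        Object v = row[index];
        return v == null ? null : v.toString();
    }

    // Writes a value into the field's position in the row.
    void setValue(Object[] row, Object value) {
        row[index] = value;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("testfield", "uppercase");
        Object[] row = { "abc", null };
        FieldHelperSketch in = new FieldHelperSketch(names, "testfield");
        FieldHelperSketch out = new FieldHelperSketch(names, "uppercase");
        out.setValue(row, in.getString(row).toUpperCase());
        System.out.println(Arrays.toString(row)); // prints [abc, ABC]
    }
}
```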
After setting the output field in the row, call putRow() to pass the row on to the next step.
This short example shows how to quickly transform the content of an input field. The example transformation is in the attached file Uppercase.ktr.
The sample code can be downloaded here. For more information, see the second and third parts.
How do you use the User Defined Java Class step in Kettle?
String firstnameField;
String lastnameField;
String nameField;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    // Get the next input row
    //
    Object[] r = getRow();
    // No more input rows: signal completion and return false
    //
    if (r == null) {
        setOutputDone();
        return false;
    }
    // For performance reasons, look up the parameters only once
    //
    if (first) {
        firstnameField = getParameter("FIRSTNAME_FIELD");
        lastnameField = getParameter("LASTNAME_FIELD");
        nameField = getParameter("NAME_FIELD");
        first = false;
    }
    // Use createOutputRow() to ensure that the output array is large enough to hold any new fields
    //
    Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());
    String firstname = get(Fields.In, firstnameField).getString(r);
    String lastname = get(Fields.In, lastnameField).getString(r);
    // Set the value in the output field: first name and last name separated by a space
    //
    String name = firstname + " " + lastname;
    get(Fields.Out, nameField).setValue(outputRow, name);
    // putRow will send the row on to the default output hop.
    //
    putRow(data.outputRowMeta, outputRow);
    return true;
}
How can a Kettle Java class output results to the console?
Start Kettle with java instead of javaw. You can then use System.out.print to write to the console.
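A minimal sketch of such debug output, written as plain Java so that it runs outside Kettle. ConsoleDebugSketch, format(), and debug() are hypothetical helper names. Inside a UDJC step you would make the same System.out call from processRow().

```java
import java.util.Arrays;

// Hypothetical helper for printing a row; not part of Kettle.
class ConsoleDebugSketch {
    // Builds a readable one-line representation of a row.
    static String format(Object[] row) {
        return "row=" + Arrays.toString(row);
    }

    // Writes the row to standard output, visible when the JVM has a console.
    static void debug(Object[] row) {
        System.out.println(format(row));
    }

    public static void main(String[] args) {
        debug(new Object[] { "Jane", "Doe" }); // prints row=[Jane, Doe]
    }
}
```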