Kettle's User Defined Java Class step, part 2
The User Defined Java Class step in Kettle, also known as the UDJC step, has been available since version 4.0. It is a very flexible step: you can write arbitrary Java code in it with little cost in performance. This article shows, through examples, how to use the step in different scenarios. Because there is a lot of material, it is split into three parts for ease of reading.
If you have not read the first part yet, start there.
Step Parameters
If you write a piece of code and want to make it more generic, you can use step parameters. In this example we supply a regular expression and a field name as parameters, and the step checks whether the named field matches the regular expression. The result is 1 if it matches and 0 otherwise.
The code is as follows:
import java.util.regex.Pattern;

private Pattern p = null;
private FieldHelper fieldToTest = null;
private FieldHelper outputField = null;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    // Prepare regex and field helpers
    if (first) {
        first = false;
        String regexString = getParameter("regex");
        p = Pattern.compile(regexString);
        fieldToTest = get(Fields.In, getParameter("test_field"));
        outputField = get(Fields.Out, "result");
    }
    r = createOutputRow(r, data.outputRowMeta.size());
    // Get the value from an input field
    String test_value = fieldToTest.getString(r);
    // Test for match and write result
    if (p.matcher(test_value).matches()) {
        outputField.setValue(r, Long.valueOf(1));
    }
    else {
        outputField.setValue(r, Long.valueOf(0));
    }
    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
    return true;
}
The getParameter() method returns the value of a parameter defined in the step's UI. The value of a parameter may itself be a Kettle variable, and passing variables in as parameters is common practice. You can also find the variables by inspecting the step's XML.
The example transformation is named parameter.ktr.
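The matching logic of this example can be tried outside Kettle. The following is a minimal standalone sketch, not Kettle code: the class name RegexFlagSketch and the method flag are hypothetical, and the regular expression is passed in as a plain argument where the step would read it with getParameter(). It also illustrates that Matcher.matches() succeeds only when the whole value matches the pattern.

```java
import java.util.regex.Pattern;

// Standalone sketch (hypothetical names, no Kettle classes involved).
class RegexFlagSketch {
    private final Pattern pattern;

    // Compile once, as the step does inside its "if (first)" block.
    RegexFlagSketch(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Returns 1 if the whole value matches the pattern, 0 otherwise.
    long flag(String value) {
        return pattern.matcher(value).matches() ? 1L : 0L;
    }

    public static void main(String[] args) {
        RegexFlagSketch digits = new RegexFlagSketch("\\d+");
        System.out.println(digits.flag("12345"));  // the whole value is digits
        System.out.println(digits.flag("123abc")); // only a prefix is digits
    }
}
```

Note that Pattern.matcher() throws a NullPointerException when given null, so a field that can be null needs its own check before the match.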
Using Info Steps
Sometimes a step has to combine several input steps that play different roles, as the Stream Lookup step does. An info step supplies lookup data, and its rows are not returned by the getRow() method. Using info steps in a UDJC step is easy: define them on the Info steps tab of the UDJC step's UI, then read their rows with the getRowFrom() method.
In the example transformation, the info step supplies a set of regular expressions, and a field of each row in the main stream is tested against them. If any expression matches, the result field is set to 1 and the matching expression is written to the matched_by field. If none matches, the result is 0.
The code is as follows:
import java.util.regex.Pattern;
import java.util.*;

private FieldHelper resultField = null;
private FieldHelper matchField = null;
private FieldHelper outputField = null;
private FieldHelper inputField = null;
private ArrayList patterns = new ArrayList(20);
private ArrayList expressions = new ArrayList(20);

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    // Prepare regex and field helpers
    if (first) {
        first = false;
        // Get the input and output fields
        resultField = get(Fields.Out, "result");
        matchField = get(Fields.Out, "matched_by");
        inputField = get(Fields.In, "value");
        // Get all rows from the info stream and compile the regex field to patterns
        FieldHelper regexField = get(Fields.Info, "regex");
        RowSet infoStream = findInfoRowSet("expressions");
        Object[] infoRow = null;
        while ((infoRow = getRowFrom(infoStream)) != null) {
            String regexString = regexField.getString(infoRow);
            expressions.add(regexString);
            patterns.add(Pattern.compile(regexString));
        }
    }
    // Get the value of the field to check
    String value = inputField.getString(r);
    // Check if any pattern matches
    int matchFound = 0;
    String matchExpression = null;
    for (int i = 0; i < patterns.size(); i++) {
        if (((Pattern) patterns.get(i)).matcher(value).matches()) {
            matchFound = 1;
            matchExpression = (String) expressions.get(i);
            break;
        }
    }
    // Write result to stream
    r = createOutputRow(r, data.outputRowMeta.size());
    resultField.setValue(r, Long.valueOf(matchFound));
    matchField.setValue(r, matchExpression);
    // Send the row on to the next step.
    putRow(data.outputRowMeta, r);
    return true;
}
The findInfoRowSet() method returns the row set of the input step registered under the given name on the Info steps tab of the UDJC step. Reading a row from that row set differs from reading a row from the main stream: call getRowFrom() and pass it the row set.
The sample transformation is named info_steps.ktr.
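The "first matching expression wins" lookup can also be exercised on its own. This is a minimal standalone sketch, not Kettle code: the class FirstMatchSketch and its method firstMatch are hypothetical, and a plain list of strings stands in for the rows that the step reads from the info stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Standalone sketch (hypothetical names, no Kettle classes involved).
class FirstMatchSketch {
    private final List<String> expressions = new ArrayList<String>();
    private final List<Pattern> patterns = new ArrayList<Pattern>();

    // Stands in for the loop that drains the info stream on the first row.
    FirstMatchSketch(List<String> regexes) {
        for (String regex : regexes) {
            expressions.add(regex);
            patterns.add(Pattern.compile(regex));
        }
    }

    // Returns the first expression that matches the whole value, or null.
    String firstMatch(String value) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(value).matches()) {
                return expressions.get(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FirstMatchSketch m = new FirstMatchSketch(Arrays.asList("[a-z]+", "\\d+"));
        System.out.println(m.firstMatch("42"));  // matched by the second expression
        System.out.println(m.firstMatch("a-1")); // no expression matches
    }
}
```

The sketch uses generic lists for brevity, whereas the step code above keeps raw ArrayList objects and casts their elements. Because the search stops at the first hit, the order of the expressions decides which one is reported when several match.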
Target Steps
In a UDJC step you may need to send rows to different target steps. To do so, call the putRowTo() method and pass the target row set as a parameter. All possible target steps must be defined on the Target steps tab of the UDJC step's UI. The following example distributes rows randomly between two target steps.
The findTargetRowSet() method returns the row set of a target step defined in the UI, and that row set is then passed to putRowTo(). The sample transformation is named target_steps.ktr.
The code is as follows:
import java.util.regex.Pattern;
import java.util.*;

private RowSet lowProbStream = null;
private RowSet highProbStream = null;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    Object[] r = getRow();
    if (r == null) {
        setOutputDone();
        return false;
    }
    // Look up the target row sets once
    if (first) {
        first = false;
        lowProbStream = findTargetRowSet("low_probability");
        highProbStream = findTargetRowSet("high_probability");
    }
    // Send the row on to one of the target steps.
    if (Math.random() < 0.35) {
        putRowTo(data.outputRowMeta, r, lowProbStream);
    }
    else {
        putRowTo(data.outputRowMeta, r, highProbStream);
    }
    return true;
}
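The routing decision itself is a threshold test on a random draw. The following standalone sketch, not Kettle code, isolates that test so it can be checked in a deterministic way; the class RandomSplitSketch, the method target and the constant LOW_SHARE are hypothetical names, and the returned strings stand in for the two target row sets.

```java
import java.util.Random;

// Standalone sketch (hypothetical names, no Kettle classes involved).
class RandomSplitSketch {
    static final double LOW_SHARE = 0.35;

    // Picks a target name for one draw in [0, 1), mirroring the step's test.
    static String target(double draw) {
        return draw < LOW_SHARE ? "low_probability" : "high_probability";
    }

    public static void main(String[] args) {
        Random random = new Random(42L); // fixed seed so runs are repeatable
        int low = 0;
        int total = 100000;
        for (int i = 0; i < total; i++) {
            if (target(random.nextDouble()).equals("low_probability")) {
                low++;
            }
        }
        // The share should land close to LOW_SHARE for a large sample.
        System.out.println("low share: " + (double) low / total);
    }
}
```

Keeping the decision in a function of the draw makes the boundary easy to verify: a draw of exactly 0.35 goes to the high-probability target because the comparison is strict.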
For more, see part 3.
How do you use the User Defined Java Class step in Kettle?
String firstnameField;
String lastnameField;
String nameField;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    // Get the next input row
    //
    Object[] r = getRow();
    // If there is no more input, signal that output is done and return false
    //
    if (r == null) {
        setOutputDone();
        return false;
    }
    // For performance, look up the parameters only once
    //
    if (first) {
        firstnameField = getParameter("FIRSTNAME_FIELD");
        lastnameField = getParameter("LASTNAME_FIELD");
        nameField = getParameter("NAME_FIELD");
        first = false;
    }
    // Use createOutputRow() to ensure that the output array is large enough to hold any new fields
    //
    Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());
    String firstname = get(Fields.In, firstnameField).getString(r);
    String lastname = get(Fields.In, lastnameField).getString(r);
    // Set the value in the output field
    //
    String name = firstname + "" + lastname;
    get(Fields.Out, nameField).setValue(outputRow, name);
    // putRow will send the row on to the default output hop.
    //
    putRow(data.outputRowMeta, outputRow);
    return true;
}
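One detail worth checking when fields are concatenated this way is null handling: in Java, concatenating a null String produces the text "null". The sketch below is standalone, not Kettle code, and its names (NameJoinSketch, plainJoin, safeJoin) are hypothetical. Whether a field can actually be null in a given transformation is an assumption to verify against your own data.

```java
// Standalone sketch (hypothetical names, no Kettle classes involved).
class NameJoinSketch {
    // Plain concatenation, as in the step code: null becomes the text "null".
    static String plainJoin(String first, String last) {
        return first + "" + last;
    }

    // Null-tolerant variant: treats a missing part as an empty string.
    static String safeJoin(String first, String last) {
        return (first == null ? "" : first) + (last == null ? "" : last);
    }

    public static void main(String[] args) {
        System.out.println(plainJoin("Ada", null)); // prints "Adanull"
        System.out.println(safeJoin("Ada", null));  // prints "Ada"
    }
}
```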
How can a Kettle Java class print results to the console?
Start Kettle with java rather than javaw. Output written with System.out.print then appears in the console.