http://wiki.pentaho.com/display/COM/PDI+Plugin+Loading
svn://source.pentaho.org/svnkettleroot/plugins/s3svinput
============================================================
In ETL work, projects often include special processing tasks that Kettle's built-in steps cannot handle. In that case we need to write custom steps. Custom steps are mainly used for data management, data validation, and extracting data from special file formats. Reading the Kettle source code shows you how to create your own Kettle plug-in.
This article explains how to develop a simple transformation step plug-in for Kettle 4.0. The plug-in accepts any row stream and appends one additional field with a value to every row. You can define the name of that field, so it is very simple. Along the way I will briefly introduce the interfaces that plug-in development requires.
Preparations:
1. Download pdi-ce-4.0.0-stable.zip for desktop testing.
2. Download Eclipse (any version) and install an SVN plug-in; it is used later to download the Kettle source code.
3. Download the standard plug-in source code template project.
Create a plug-in project:
1. Import the downloaded "Standard plug-in source code template project" to your workspace.
2. After the import the project shows errors because its dependencies are missing. Unzip pdi-ce-4.0.0-stable.zip, go to the extracted directory, and add to the project's build path all jar files under lib whose names start with kettle, plus the swt.jar under libswt/win32 (pick the libswt subdirectory that matches your operating system).
3. Rebuild the project; there should be no errors left.
Components of a Kettle transformation step:
A Kettle step consists of four classes, each with its own purpose and role.
TemplateStep: the step class implements the StepInterface interface. While the transformation runs, its instances do the actual data processing. Each execution thread gets its own instance of this class.
TemplateStepData: the data class stores data. During execution each thread has its own instance. It typically holds things such as database connections, file handles, and caches.
TemplateStepMeta: the metadata class implements the StepMetaInterface interface. Its job is to hold and serialize the configuration of a particular step instance. In this example it stores the step name set by the user and the name of the output field.
TemplateStepDialog: the dialog class implements the user interface of the step. It shows a dialog in which the user configures the step. The dialog class is closely tied to the metadata class, because the metadata class keeps track of the user's settings.
Besides these classes there is also a plugin.xml, which holds the plug-in's metadata and defines how the step is displayed in the Kettle graphical workbench. To make things easier to follow, I will use this step to design a transformation and run it. For plug-in development we start with the plugin.xml configuration file, then cover the metadata and dialog classes, and finally the step and data classes.
Write your own plugin.xml:
The following plugin.xml is the one used by our plug-in. It tells Kettle the plug-in's metadata class, its name and description, and the jar files to load. For details, see the article "Plug-in loading".
<?xml version="1.0" encoding="UTF-8"?>
<plugin
   id="TemplatePlugin"
   iconfile="icon.png"
   description="Template Plugin"
   tooltip="Only there for demonstration purposes"
   category="Demonstration"
   classname="plugin.template.TemplateStepMeta">
   <libraries>
      <library name="templatestep.jar"/>
   </libraries>
</plugin>
id: must be globally unique among Kettle plug-ins. Kettle serializes it, so do not change it arbitrarily.
iconfile: the image Kettle displays for the plug-in. It must be a PNG image.
description: the description of the plug-in, shown in the tree menu.
tooltip: the hint shown when the mouse hovers over the entry in the tree menu.
category: the parent folder under which the plug-in is displayed.
classname: the metadata class.
library: a jar file that the plug-in depends on and that must be loaded.
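To make the role of each attribute concrete, the sketch below reads such a plugin.xml with the JDK's DOM parser and pulls out the values described above. It only illustrates the structure of the file and is not how Kettle itself loads plug-ins; the class name PluginXmlReader, its fields, and the SAMPLE constant are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; Kettle's real loader is different.
class PluginXmlReader {
    // The plugin.xml from this article, written as a single string.
    static final String SAMPLE =
          "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<plugin id=\"TemplatePlugin\" iconfile=\"icon.png\""
        + " description=\"Template Plugin\""
        + " tooltip=\"Only there for demonstration purposes\""
        + " category=\"Demonstration\""
        + " classname=\"plugin.template.TemplateStepMeta\">"
        + "<libraries><library name=\"templatestep.jar\"/></libraries>"
        + "</plugin>";

    final String id;
    final String classname;
    final String category;
    final List<String> libraries = new ArrayList<>();

    private PluginXmlReader(Element root) {
        id = root.getAttribute("id");
        classname = root.getAttribute("classname");
        category = root.getAttribute("category");
        // Collect the name attribute of every <library> element.
        NodeList libs = root.getElementsByTagName("library");
        for (int i = 0; i < libs.getLength(); i++) {
            libraries.add(((Element) libs.item(i)).getAttribute("name"));
        }
    }

    // Parses the XML text; any parser failure is reported as IllegalArgumentException.
    static PluginXmlReader parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new PluginXmlReader(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable plugin.xml", e);
        }
    }
}
```

Calling PluginXmlReader.parse(PluginXmlReader.SAMPLE) gives an object whose classname field names the metadata class and whose libraries list contains the jar to load.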
I. Metadata:
The key methods of the metadata class are listed below. Note that the name of the output field is stored in the metadata class in the private member variable outputField.
// Keep track of the step settings
public String getOutputField()
public void setOutputField(...)
public void setDefault()
// Serialize the step settings to and from XML
public String getXML()
public void loadXML(...)
// Serialize the step settings to and from a Kettle repository
public void readRep(...)
public void saveRep(...)
// Provide information about how the step affects the field structure of processed rows
public void getFields(...)
// Perform extended validation checks for the step
public void check(...)
// Provide instances of the step, data and dialog classes to Kettle
public StepInterface getStep(...)
public StepDataInterface getStepData()
public StepDialogInterface getDialog(...)
The TemplateStepMeta class has many other methods, but most of them are implemented by its parent class BaseStepMeta. These default implementations are enough for our metadata class to work well. For more information, see the official Kettle documentation for StepMetaInterface and BaseStepMeta.
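The XML round trip is the part of the metadata class that benefits most from seeing code. The following is a simplified stand-in that uses only the JDK: it does not extend BaseStepMeta, and the method signatures are deliberately reduced (the real loadXML takes Kettle-specific arguments). The class name SketchStepMeta, the element name outputfield, and the default value are assumptions made for this sketch.

```java
// Simplified stand-in for a metadata class; NOT the Kettle API.
class SketchStepMeta {
    private String outputField;

    SketchStepMeta() {
        setDefault();
    }

    // Keep track of the step settings.
    String getOutputField() { return outputField; }
    void setOutputField(String outputField) { this.outputField = outputField; }
    void setDefault() { outputField = "template_outfield"; } // assumed default

    // Serialize the setting to an XML fragment.
    String getXML() {
        return "<outputfield>" + escape(outputField) + "</outputfield>";
    }

    // Restore the setting from an XML fragment produced by getXML().
    void loadXML(String xml) {
        String open = "<outputfield>";
        String close = "</outputfield>";
        int start = xml.indexOf(open);
        int end = xml.indexOf(close);
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("missing <outputfield> element");
        }
        outputField = unescape(xml.substring(start + open.length(), end));
    }

    // Minimal escaping so that field names containing markup survive the round trip.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    private static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }
}
```

The point of the pattern is that whatever getXML() writes, loadXML() must be able to read back, so a saved transformation reopens with the same settings.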
II. Dialog box:
TemplateStepDialog implements the settings dialog of the step. Kettle's user interface is built on the Eclipse SWT framework, so developing a complex dialog requires familiarity with SWT. The SWT documentation is available online through the Help menu in Eclipse. A dialog object is given a metadata object, which tells it where to read the configuration from and where to store the user's changes. In our template step the dialog only sets the name of the output field. A dialog class derived from BaseStepDialog must provide an open() method, which must return the name of the step (when the settings were changed) or null (when the dialog was cancelled).
III. Step:
The step class is where the actual processing and transformation work happens. Because most of the boilerplate code is provided by the parent class BaseStep, most plug-ins only need to deal with the following methods.
// Initialization and teardown
public boolean init(...)
public void dispose(...)
// Processing rows
public void run()
public boolean processRow(...)
The init() method is called by Kettle before the transformation runs. The transformation runs only if all steps initialize successfully. Our template step does nothing special here; the method is shown only to illustrate the mechanism.
The dispose() method is called after the step has finished (not after the whole transformation has finished). It releases resources such as file handles and caches.
The run() method is called when rows are actually processed. It is essentially a small loop that calls processRow() for each row; the loop ends when there is no more data or the transformation is stopped.
The processRow() method is called to process a single row. It usually calls getRow() to obtain the row to work on; getRow() blocks if necessary, for example when the step has to slow down its processing. The rest of processRow() performs the transformation and calls putRow() to pass the processed row to the downstream steps.
Note: your step may change the row structure. To be safe, familiarize yourself with the org.pentaho.di.core.row package, especially RowMetaInterface and RowDataUtil.
The base class BaseStep flags the first row being processed, which is useful for code that must run only once, for example a time-consuming lookup whose result can be cached.
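The interplay of run(), processRow(), getRow(), putRow(), and the first-row flag can be sketched with plain JDK types. This is a simplified model, not the Kettle API: the class SketchStep does not extend BaseStep, its getRow() never blocks, and the setupCount field exists only to make the one-time setup observable.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Simplified stand-in for a step class; NOT the Kettle API.
class SketchStep {
    private final Queue<Object[]> input;              // rows delivered by the upstream step
    final List<Object[]> output = new ArrayList<>();  // rows handed to downstream steps
    private boolean first = true;                     // models the "first row" flag
    int setupCount = 0;                               // how often the one-time setup ran

    SketchStep(List<Object[]> rows) {
        input = new ArrayDeque<>(rows);
    }

    // Returns the next input row, or null when the input is exhausted.
    // (The real getRow() may block; this model never does.)
    Object[] getRow() {
        return input.poll();
    }

    void putRow(Object[] row) {
        output.add(row);
    }

    // Processes one row; returns false when there is nothing left to do.
    boolean processRow() {
        Object[] r = getRow();
        if (r == null) {
            return false;
        }
        if (first) {
            first = false;
            setupCount++;   // expensive one-time work would go here
        }
        putRow(r);          // a real step would transform the row before passing it on
        return true;
    }

    // The small loop described above: keep calling processRow() until it returns false.
    void run() {
        while (processRow()) {
            // nothing else to do per iteration
        }
    }
}
```

Returning false from processRow() is what ends the loop, and the first flag guarantees that the setup code runs once no matter how many rows pass through.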
IV. Data:
Most steps need temporary buffers or temporary data, and the data class is the right place for them. Each execution thread gets its own instance of the data class, so it works in its own space. TemplateStepData is derived from BaseStepData. As a rule of thumb, keep non-constant fields out of the step class and put them in TemplateStepData instead.
Our step uses the data object only to store the output row structure; it uses no other resources such as files.
Summary:
A Kettle step plug-in consists of four classes, each with its own role and responsibilities. The metadata, dialog, step, and data classes work together well, and because much of the boilerplate code and many common methods are already implemented by the parent classes, developing a Kettle plug-in is quite easy. If you would like to discuss it, leave me a message at http://www.ahuoo.com. Thank you! The next section covers debugging a plug-in during development.
========================== Debug ===========================
Debugging a Kettle plug-in looks troublesome because it involves two separate projects, but once you know the method it is actually simple. Recall the usual debugging approaches in Java: linking the source, Java remote debugging, and, if you know Maven, debugging directly through a project dependency. Below I walk through the first two.
Preparations:
1. Use the SVN plug-in in Eclipse to download the Kettle 4.0 source code from http://source.pentaho.org/svnkettleroot/Kettle/branches/4.0.0
After the download completes there are two projects in Eclipse: templatestepplugin and Kettle 4.0.0. If your setup differs, see the article "ETL Tool - Kettle plug-in development (basic)".
2. Download the test.ktr file I have prepared for you.
I. Link source debugging:
1. In the Kettle 4.0 project, go to the plugins -> steps directory. There you will find a dummyplugin directory, an external step plug-in that ships with Kettle. It contains only three files: dpl.png, dummy.jar, and plugin.xml, which is everything a complete plug-in needs.
2. Next to dummyplugin, create a folder named templatestep and copy icon.png and plugin.xml from the templatestepplugin project into it. Do not copy the jar file: if the jar is present, the source code will not be associated during debugging.
3. This is the most important step. Right-click the Kettle 4.0.0 project, select Properties, select Java Build Path on the left, and open the Source tab at the top. Several buttons appear on the right. Click Link Source and set your plug-in's source directory and folder name.
4. Run. Once the source is linked, find Kettle's entry class Spoon.java (the shortcut Ctrl + Shift + R helps to locate it) and start it in debug mode. In the Kettle workbench, open the test file test.ktr and check that it runs normally. You can now set breakpoints in TemplateStep.
II. Kettle remote debugging
The key to remote debugging is to first add the remote debugging parameters to Kettle's startup configuration and then configure the same port in Eclipse.
1. Go to the directory extracted from pdi-ce-4.0.0-stable.zip (see the previous article, "ETL Tool - Kettle plug-in development (basic)") and edit the startup script spoon.bat (spoon.sh on Linux).
Add the following line to the file:
set OPT=-Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8285
2. Run spoon.bat to open the Kettle workbench.
3. In Eclipse, choose Run --> Debug Configurations, select Remote Java Application on the left, and right-click it to create a new remote debugging configuration. The port is 8285, as in the parameter above. When everything is set, click the Debug button at the bottom.
4. In the Kettle workbench, run the test.ktr test file. Eclipse is now attached to the running Kettle process, and you can set breakpoints.
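If Eclipse cannot connect, one way to check whether the debug options actually reached the JVM is to inspect its input arguments. The helper below is a sketch using the standard java.lang.management API; the class name DebugAgentCheck and its methods are hypothetical, and it would have to be called temporarily from code running inside Kettle (for example from the plug-in).

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic helper; not part of Kettle.
class DebugAgentCheck {
    // True if the argument is one of the two common ways to enable the JDWP agent.
    private static boolean isJdwpOption(String arg) {
        return arg.startsWith("-Xrunjdwp") || arg.startsWith("-agentlib:jdwp");
    }

    // True if any JVM argument enables the JDWP debug agent.
    static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (isJdwpOption(arg)) {
                return true;
            }
        }
        return false;
    }

    // Returns the value of address=... from the JDWP option, or null if there is none.
    static String jdwpAddress(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (!isJdwpOption(arg)) {
                continue;
            }
            int start = arg.indexOf("address=");
            if (start < 0) {
                continue;
            }
            start += "address=".length();
            int end = arg.indexOf(',', start);
            return end < 0 ? arg.substring(start) : arg.substring(start, end);
        }
        return null;
    }

    // Checks the arguments of the JVM this code is running in.
    static boolean currentJvmIsDebuggable() {
        return hasJdwpAgent(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

With the options from this section, jdwpAddress should report 8285, the same port that is entered in the Eclipse debug configuration.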
============================================================
// These fragments belong inside a step class, for example in processRow().
// The classes used come from org.pentaho.di.core.row and org.pentaho.di.core (RowMetaAndData).
// Copy the metadata of the input row and use it as the metadata of the output row.
RowMetaInterface outputRowMeta = getInputRowMeta().clone();
// Construct a new output column. Method 1
RowMetaInterface rowMeta = new RowMeta();
Object[] rowData = new Object[1];
int valType = ValueMeta.getType("String");
ValueMetaInterface valueMeta = new ValueMeta("fieldname1", valType);
valueMeta.setLength(-1);
rowMeta.addValueMeta(valueMeta);
RowMetaAndData metaAndData = new RowMetaAndData(rowMeta, rowData);
RowMetaInterface newMeta = metaAndData.getRowMeta();
outputRowMeta.mergeRowMeta(newMeta);
// Construct a new output column. Method 2 (an alternative to method 1; use one or the other)
RowMetaInterface outputRowMeta = getInputRowMeta().clone();
int valType = ValueMeta.getType("String");
ValueMetaInterface valueMeta = new ValueMeta("fieldname1", valType);
valueMeta.setLength(-1);
outputRowMeta.addValueMeta(valueMeta);
// Obtain a data row from the previous step.
Object[] r = getRow();
// Obtain the value of a field in a row of data
String fieldName = "myoldfield";
int fieldIndex = this.getInputRowMeta().indexOfValue(fieldName);
Object value = r[fieldIndex];
// Append the new data after the original row data to form the new output row. (When reprinting, please credit http://pdi.itpub.net)
Object[] values = new Object[1];
values[0] = "new value";
r = RowDataUtil.addRowData(r, getInputRowMeta().size(), values);
// Pass the metadata and data of the output row on so that the next step can read them. Note that the number of metadata entries and the number of data values must be equal.
putRow(outputRowMeta, r);
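The append step is the one that most easily breaks the rule that metadata and data must have the same number of entries, so it helps to see what it does to the array. The sketch below models the idea with plain arrays; it is not the RowDataUtil implementation, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Plain-array model of appending values to a row; NOT the Kettle implementation.
class RowAppendSketch {
    // Keeps the first sourceLength values of row and places extra right after them.
    // sourceLength is the number of fields in the input metadata, which may be
    // smaller than row.length if the array was allocated with spare capacity.
    static Object[] appendValues(Object[] row, int sourceLength, Object[] extra) {
        Object[] result = Arrays.copyOf(row, sourceLength + extra.length);
        System.arraycopy(extra, 0, result, sourceLength, extra.length);
        return result;
    }
}
```

The resulting array has exactly sourceLength + extra.length entries, which matches an output metadata object that has the input fields plus one new value metadata per appended value.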