Example application of kettle variables and custom Java code

Source: Internet
Author: User
Tags ftp connection

1 kettle.properties parameter configuration data source connection and FTP connection

Because configurations such as database connection FTP in the test environment and production environment are changed during deployment, it is pre-defined as a configuration item, modified in the configuration file, so that testing and publishing will be simple, and the database as an example illustrates the use of such configurations.

(1) First to find the configuration file, different operating system path is not the same, I use Win7 for development, the path of the configuration file is "C:\Users\chenpeng\.kettle\kettle.properties", as follows:


(2) The configuration file is configured as follows:


You can also visualize settings:


(3) Examples of specific use

Below is the database connection configuration:




Below is the FTP connection configuration:

1.1.2 kettle.properties parameter settings and paths and working with regular mates

(1) Setting the value of a variable in kettle.properties

(2) The kettle.properties is set as follows (the path under Win7 is: C:\Users\chenpeng\.kettle)


(4) Use the path specified in the parameter when outputting the file:




Note: If the path error, such as the deletion of a folder will be an error, as long as the error, even if the folder is set up will still report the directory error (as if there is memory function), then must be restarted Kettle to run correctly.

l Use parameters and regular expressions to mix the scenes with the output file:


Application of 1.1.3 kettle.properties parameters in Java code

Use variables in 1.1.4 jobs and set variable values with JavaScript

The above example is the application of variables declared in the kettle.properties, which are universal in scope, but a lot of time, the sub-job needs to have the parameters of the internal special variables, you can not use the variables declared in the kettle.properties, you need to declare the variables in the process, and set the scope to when The pre-work is valid. The following scenarios are business: The file name is named as the current date format, so a variable is defined at the job level, but it cannot be assigned a value, if the variable is assigned by a JavaScript script, then the variable is applied in the location of the output file name. The deletion of the subsequent files is the common part of the need to use this variable as an interface parameter to do processing.

(1) The main flow is as follows:


(2) Define variables:

(3) Call JavaScript script to modify value in child procedure:



Date.prototype.Format = function (FMT) {//author:meizz      var o = {         "m+": This.getmonth () + 1,//month          "d+": this. GetDate (),//day          "H +": this.gethours (),//hour           "m+": this.getminutes (),//min         "s+": This.getseconds (),//Sec          "q+": Math.floor ((This.getmonth () + 3)/3),//Quarterly         "S": this.getmilliseconds ()//msec       };      if (/(y+)/.test (FMT)) FMT = Fmt.replace (regexp.$1, (this.getfullyear () + ""). substr (4-regexp.$1.length));     For (var k in O)      if (New RegExp ("(" + K +) "). Test (FMT)) FMT = Fmt.replace (regexp.$1, (regexp.$1.length = = 1)? (O[k]): (("XX" + o[k]). substr (("" + O[k]).));      return FMT;  }  var dateTime = new Date (). Format ("YyyyMMdd"); Gives back today at Yyyy/mm/dd hh:mm:ss.000setvariable ("Curdate", DateTime, "s");


In addition, if you are using the field values from the previous node, it is easier to modify the variable values as follows:



(4) using variables (as used in general):



1.1.5 Java code access variable call jar package and generate validation file

This is a more comprehensive example, first defines a record (probably understood as many variables), and then through the Java code to call the jar package, calculate the number of records, file size, MD5 value, etc., assigned to the corresponding record field, and output to a file, the data file to form a checksum file.

(1) The main flow is as follows:



(2) Generate records

(3) Java processing



Import packages edited in Eclipse, mainly used to calculate the number of file lines, MD5 values import cgb.tools.kettlehelper;import java.io.file;import java.io.IOException; A defined row-handling method in kettle, where each row of records executes a public boolean processrow (Stepmetainterface SMI, Stepdatainterface SDI) throws  kettleexception{//(1) Gets the input line to the previous step object[] r = GetRow (); if (r = = null) {Setoutputdone (); return false;} R = Createoutputrow (R, Data.outputRowMeta.size ());//(2) read out parameter variable value string kettleoutputdir = GetVariable ("Kettleoutputdir ", ""); String Hbprovince_code = getvariable ("Hbprovince_code", "" "); String Item_code = getvariable ("Item_code", "" "); String curdate = getvariable ("Curdate", "" "); String filename = ""; String onlyfilename = ""; String RecordCount = ""; String bytecount = ""; String Md5code = "";//(3) Call jar package to calculate MD5 value, number of rows, number of bytes try {filename = kettleoutputdir + "\ \" + Hbprovince_code + "_" + Item_code + "_day_" + curdate + ". csv"; onlyfilename = Hbprovince_code + "_" + Item_code + "_day_" + curdate + ". csv"; Md5code = Kettleh elper.getfilemd5string (filename); RecordCount = string.valueof (KEttlehelper.getfilerecordcount (filename, true)); bytecount = string.valueof (kettlehelper.getfilebytecount (filename ));} catch (IOException e) {e.printstacktrace ();} (4) Put the calculated value into the output record get (fields.out, "filename"). SetValue (R, Onlyfilename); get (Fields.out, "RecordCount"). SetValue ( R, RecordCount); get (Fields.out, "ByteCount"). SetValue (R, ByteCount), Get (Fields.out, "Md5code"). SetValue (R, Md5code) ;//(5) output to the next node to do processing PutRow (Data.outputrowmeta, R); return true;}

(4) Development and export of jar packages

The jar package, which is called, can optionally be generated with eclipse, as follows:


Package Xxx.tools;import Java.io.bufferedreader;import Java.io.file;import java.io.fileinputstream;import Java.io.filenotfoundexception;import Java.io.filereader;import Java.io.ioexception;import Java.io.InputStream;  Import Java.security.messagedigest;import java.security.NoSuchAlgorithmException; public class Kettlehelper {/** * default password string combination, used to convert bytes to 16 binary representation of the character, Apache verify the correctness of the downloaded file is the default of this combination */Priva  Te static char hexdigits[] = {' 0 ', ' 1 ', ' 2 ', ' 3 ', ' 4 ', ' 5 ', ' 6 ', ' 7 ', ' 8 ', ' 9 ', ' A ', ' B ', ' C ', ' d ', ' e ', ' F '        };      private static MessageDigest messagedigest = null;          static {try {messagedigest = messagedigest.getinstance ("MD5"); } catch (NoSuchAlgorithmException Nsaex) {System.err.println ("MD5Util.class.getName ()" + "initialization failed, messagediges T does not support Md5util.              ");          Nsaex.printstacktrace (); }}/** * Generate MD5 checksum for string * * @param s * @return */private StatiC String Getmd5string (string s) {return getmd5string (S.getbytes ()); }/** * Determines if the MD5 checksum of a string matches a known MD5 code * * @param password The string to validate * @param md5pwdstr known MD5 school Verification Code * @return */public static Boolean Checkpassword (string password, string md5pwdstr) {string s          = getmd5string (password);      Return S.equals (MD5PWDSTR); /** * MD5 checksum of generated files * * @param file * @return * @throws IOException */Public        static string getfilemd5string (String fileName) throws IOException {File File = new file (fileName);          InputStream fis;          FIS = new FileInputStream (file);          byte[] buffer = new byte[1024];          int numread = 0;          while ((Numread = fis.read (buffer)) > 0) {messagedigest.update (buffer, 0, numread);          } fis.close ();      Return Buffertohex (Messagedigest.digest ()); }/** * Gets the number of records in the file * * @param file * @return * @throws IOException */public static int Getfilerecordcount (String filename,b Oolean Hasheadrow) throws IOException {file InFile = new file (fileName),//CSV file read @SuppressWarnings ("UN        Used ") String instring =" ";        int count = 0;            try {BufferedReader reader = new BufferedReader (new FileReader (InFile));             while ((instring = Reader.readline ()) = null) {count++;        } reader.close (); } catch (FileNotFoundException ex) {System.out.println ("No Files found!        "); } catch (IOException ex) {System.out.println ("Error reading/writing file!        ");    } if (Hasheadrow) {count--;     } return count; /** * MD5 checksum of generated files * * @param file * @return * @throws IOException */Public s          Tatic String getfilemd5string (file file) throws IOException {InputStream fis;       FIS = new FileInputStream (file);   byte[] buffer = new byte[1024];          int numread = 0;          while ((Numread = fis.read (buffer)) > 0) {messagedigest.update (buffer, 0, numread);          } fis.close ();      Return Buffertohex (Messagedigest.digest ());          } private static String getmd5string (byte[] bytes) {messagedigest.update (bytes);      Return Buffertohex (Messagedigest.digest ());      } private static String Buffertohex (byte bytes[]) {return Buffertohex (bytes, 0, bytes.length); } private static String Buffertohex (byte bytes[], int m, int n) {stringbuffer stringbuffer = new Stringbu          Ffer (2 * n);          int k = m + N;          for (int l = m; l < K; l++) {Appendhexpair (bytes[l], stringbuffer);      } return stringbuffer.tostring (); } private static void Appendhexpair (Byte bt, StringBuffer stringbuffer) {char C0 = hexdigits[(BT & 0x F0) >> 4];//byte-high 4-bit digital conversion,>>> moves the symbol bit right along the right side of the logic, and there is no difference between the two symbols in char c1 = HEXDIGITS[BT & 0xf];//byte-Low 4-bit digital conversion Stringbu          Ffer.append (C0);      Stringbuffer.append (C1); }/** * Gets the number of bytes in a text * @param fileName * @return * @throws IOException */public static L    Ong getfilebytecount (String filename) throws IOException {File File = new file (fileName);    return File.length (); }    }

Export JAR Package


Put into the Kettle jar directory, it will self-loading, different operating systems, directories will be different, I use the Win7, the directory is as follows:

(5) File output


(6) the results of the resulting file are as follows:


1.1.6SQLUse variables in

The following query statement uses a question mark placeholder when the start date (first? Number) and end date (the second one?) Number) is bound to the SQL question mark placeholder, in the query entry date for a certain period of time for President information:

Selectname,took_office frompresidents Wheretook_officebetween? and?


example, first use the build row step (" generdate Rows ") to generate a row of records with two fields, sequentially instead of table input SQL the placeholder in the statement. In a real-world scenario, you typically use dynamic processing results to produce expected values instead of generating row steps.


Next is the table input step, where the configuration SQL The query statement, which contains a question mark placeholder, is passed in the Insert Data Step , select the previous step in the drop-down box to replace the value of the question mark.

Execute a query multiple times by transferring different values

if you want to loop through the query and replace the placeholders with different values, you need the placeholder production steps to generate multiple rows of data and enter the table with the option " Execute for each row "check. This example filename is called placeholders_in_loop.ktr.


Limitations of placeholders

while it is very efficient to query for placeholder bound values, there are some scenarios that cannot be used, some of which SQL you cannot use placeholders. These examples are very common, but you cannot use placeholders.

You cannot replace a table noun with a placeholder, or the query will not execute.

SELECT Some_fieldfrom?

Instead of using a placeholder for the field name of the query, the following query can bind the parameter successfully, but only as a constant, not as the name of the field.

SELECT? Asmy_field from table

You cannot use placeholders to bind multiple list item values separated by commas; " the ″ to the following query statement, you will get unexpected results.

SELECT * from Testwhere ID in (?)

The result you expect to get is:

SELECT * from Testwhere ID in ("All-in-a-box")

But the result of the operation is this, the transmission of a string, but get three values, and the actual situation is completely uncertain how many values are transferred in.

SELECT * from Testwhere ID in (All-in-a-

in order to solve the problem of these scenarios, you need to use Kettle variable dynamically constructs the query text, which is explained in detail below.

SQL used in Queries Kettle variables

The table input steps support the substitution of variables or parameters in the query, assuming a series of fully structured tables, namely : Mammals, birds, insects ( animals, birds, insects ) , you can use Kettle variable as the name of the table. Suppose we have a variable named:animals_table, assigned to birds, we set "Replace Vaiables"option is selected. If we write the following query:

SELECT name,population From${animals_table}

In the execution must be successfully replaced by:

SELECT name,population from birds

if the value of the set variable is " Mammals "or" Insects , the different tables will be queried dynamically. When placeholders are not competent, using variable techniques can help us solve them. The name of the example is variables.ktr, and the runtime does not forget to assign a value to parameter(named parameter) for testing.


Variables and placeholders used together

if necessary, we can mix the two techniques; In this example, a variable is used as a table noun, and the placeholder is used as the input value for the preceding step. Sample file variables_and_placeholders.ktr.

SELECT name, population from${animals_table}where population >?


Copyright NOTICE: This article for Bo Master original article, without Bo Master permission not reproduced.

Example application of kettle variables and custom Java code

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.