Kettle variables and examples of custom java code, kettlejava

Source: Internet
Author: User
Tags ftp connection

Kettle variables and examples of custom java code, kettlejava
1 kettle. properties parameter configuration data source connection and FTP connection

The configuration of the database connection FTP in the test environment and production environment will be changed during the deployment process, so it is pre-defined as a configuration item and modified in the configuration file, so that the test and release will become simple, the following uses a database as an example to describe how to use this type of configuration.


(1) first, find the configuration file. Different Operating System paths are different. I used win7 for development. The configuration file path is "C: \ Users \ chenpeng \. kettle \ kettle. properties ", as follows:

(2) The specific configuration in the configuration file is as follows:

You can also set it visually:

(3) Example

L below is the database connection Configuration:

L The following is the FTP connection Configuration:


1.1.2 kettle. properties parameter settings and path and use with regular expressions

(1) set the variable value in kettle. properties.


(2) kettle. properties is set as follows (C: \ Users \ chenpeng \. kettle in win7)

(4) use the path specified in the parameter when outputting the file:

Note: if an error occurs in the path, such as deleting a folder, an error will be reported. If an error is reported, a directory error will still be reported even if the folder is created (as if there is a memory function ), at this time, kettle must be restarted to run properly.

L mixed use cases of parameters and regular expressions in output files:

1.1.3 kettle. properties parameter application in java code

1.1.4 use the variable in the job and use javascript to set the variable value

The above examples are kettle. the applications of variables declared in properties are all universal in the global scope. However, for many times, the subjob requires internal dedicated variable parameters, so kettle cannot be used. the variables declared in properties need to be declared in the process and set the scope to the current job. In the following application scenarios, the file name is named in the current date format. Therefore, a variable is defined at the job level, but cannot be assigned a value. In this case, a javascript script is used to assign a value to the variable, then, you can apply the variable at the location of the output file name. The deletion and uploading of the subsequent files all require the variable as an interface parameter for processing.

(1) The main process is as follows:

(2) define variables:


(3) The Subprocess calls the javascript script to modify the value:

Date. prototype. format = function (fmt) {// author: meizz var o = {"M +": this. getMonth () + 1, // month "d +": this. getDate (), // Day "h +": this. getHours (), // Hour "m +": this. getMinutes (), // minute "s +": this. getSeconds (), // second "q +": Math. floor (this. getMonth () + 3)/3), // quarter "S": this. getMilliseconds () // millisecond}; if (/(y + )/. test (fmt) fmt = fmt. replace (RegExp. $1, (this. getFullYear () + ""). substr (4-RegExp. $1. length ); For (var k in o) if (new RegExp ("(" + k + ")"). test (fmt) fmt = fmt. replace (RegExp. $1, (RegExp. $1. length = 1 )? (O [k]): ("00" + o [k]). substr ("" + o [k]). length); return fmt;} var dateTime = new Date (). format ("yyyyMMdd"); // gives back today at yyyy/MM/dd HH: mm: ss.000setVariable ("curdate", dateTime, "s ");

In addition, if the field value of the previous node is used, it is simpler to modify the variable value, as shown below:

(4) use variables (as usual ):

1.1.5 use Java code to access the variable and call the jar package and generate a verification file

This is a comprehensive example. First, we define a record (which may be understood as many variables) and call the jar package through java code, calculate the number of records, file size, and MD5 value, assign values to the corresponding record fields, and output them to the file to form a data file validation file.

(1) The main process is as follows:

(2) generate records


(3) Java processing

// Import the edited package in eclipse, which is mainly used to calculate the number of file lines and the MD5 value import cgb. tools. kettleHelper; import java. io. file; import java. io. IOException; // The defined row Processing Method in kettle. each row of record executes a public boolean processRow (StepMetaInterface smi, StepDataInterface sdi) throws KettleException {// (1) obtain the input row Object [] r = getRow (); if (r = null) {setOutputDone (); return false;} r = createOutputRow (r, data. outputRowMeta. size (); // (2) read and retrieve the parameter variable value String kettleoutputdir = getVariable ("kettleoutputdir", ""); String hbprovince_code = getVariable ("hbprovince_code ", ""); String item_code = getVariable ("item_code", ""); String curdate = getVariable ("curdate", ""); String filename = ""; string onlyfilename = ""; String recordcount = ""; String bytecount = ""; String md5code = ""; // (3) call the jar package to calculate the MD5 value, number of lines, and number of bytes. try {filename = kettleoutputdir + "\" + hbprovince_code + "_" + item_code + "_ day _" + curdate +". csv "; onlyfilename = hbprovince_code +" _ "+ item_code +" _ day _ "+ curdate + ". csv "; md5code = KettleHelper. getFileMD5String (filename); recordcount = String. valueOf (KettleHelper. getFileRecordCount (filename, true); bytecount = String. valueOf (KettleHelper. getFileByteCount (filename);} catch (IOException e) {e. printStackTrace ();} // (4) Put the calculated value into the output record get (Fields. out, "filename "). setValue (r, onlyfilename); get (Fields. out, "recordcount "). setValue (r, recordcount); get (Fields. out, "byteCount "). setValue (r, bytecount); get (Fields. out, "md5code "). setValue (r, md5code); // (5) output to the next node for processing putRow (data. outputRowMeta, r); return true ;}

(4) development and export of Jar packages

You can use eclipse to generate the jar package called as follows:

Package XXX. tools; import java. io. bufferedReader; import java. io. file; import java. io. fileInputStream; import java. io. fileNotFoundException; import java. io. fileReader; import java. io. IOException; import java. io. inputStream; import java. security. messageDigest; import java. security. noSuchAlgorithmException; public class KettleHelper {/*** default password string combination, used to convert bytes into hexadecimal characters, apache verifies the correctness of the downloaded file using the default combination */privat E static char hexDigits [] = {'0', '1', '2', '3', '4', '5', '6', '7 ', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'}; private static MessageDigest messagedigest = null; static {try {messagedigest = MessageDigest. getInstance ("MD5");} catch (NoSuchAlgorithmException ngex) {System. err. println ("MD5Util. class. getName () "+" initialization failed. MessageDigest does not support MD5Util. "); Nsaex. printStackTrace () ;}/ *** generate the md5 check value of the String ** @ param s * @ return */private static String getMD5String (String s) {return getMD5String (s. getBytes ());} /*** determine whether the md5 verification code of the string matches a known md5 code ** @ param password the string to be verified * @ param md5PwdStr the known md5 Verification Code * @ return */ public static boolean checkPassword (String password, string md5PwdStr) {String s = getMD5String (password); return s. equals (md5PwdStr ); }/*** Md5 checksum of the generated file ** @ param file * @ return * @ throws IOException */public static String getFileMD5String (String fileName) throws IOException {File file = new File (fileName); InputStream FCM; FCM = new FileInputStream (file); byte [] buffer = new byte [1024]; int numRead = 0; while (numRead = FS. read (buffer)> 0) {messagedigest. update (buffer, 0, numRead);} FCM. close (); return bufferToHex (me Ssagedigest. digest ();}/*** get the number of records of a file ** @ param file * @ return * @ throws IOException */public static int getFileRecordCount (String fileName, boolean hasHeadRow) throws IOException {File inFile = new File (fileName); // read the CSV File @ SuppressWarnings ("unused") String inString = ""; int count = 0; try {BufferedReader reader = new BufferedReader (new FileReader (inFile); while (inString = reader. readL Ine ())! = Null) {count ++;} reader. close ();} catch (FileNotFoundException ex) {System. out. println ("no file found! ");} Catch (IOException ex) {System. out. println (" An error occurred while reading and writing the file! ") ;}If (hasHeadRow) {count --;} return count ;} /*** generate the md5 checksum of the file ** @ param File * @ return * @ throws IOException */public static String getFileMD5String (file) throws IOException {InputStream FCM; FS = new FileInputStream (file); byte [] buffer = new byte [1024]; int numRead = 0; while (numRead = Fi. read (buffer)> 0) {messagedigest. update (buffer, 0, numRead);} FCM. close (); return bufferToHex (messagedigest. digest ();} private static String getMD5String (byte [] bytes) {messagedigest. update (bytes); return bufferToHex (messagedigest. digest ();} private static String bufferToHex (byte bytes []) {return bufferToHex (bytes, 0, bytes. length);} private static String bufferToHex (byte bytes [], int m, int n) {StringBuffer stringbuffer = new StringBuffer (2 * n); int k = m + n; for (int l = m; l <k; l ++) {appendHexPair (bytes [l], stringbuffer);} return stringbuffer. toString ();} private static void appendHexPair (byte bt, StringBuffer stringbuffer) {char c0 = hexDigits [(bt & 0xf0)> 4]; // convert the numbers with four or more characters in a byte. >>> this parameter shifts the right of the logic and the right of the symbols, the difference between the two symbols is not found here. char c1 = hexDigits [bt & 0xf]; // converts the numbers of the Middle and Lower 4 bits to stringbuffer. append (c0); stringbuffer. append (c1);}/*** get the number of bytes of a document handle * @ param fileName * @ return * @ throws IOException */public static long getFileByteCount (String fileName) throws IOException {File file = new File (fileName); return file. length ();}}

Export jar package

Put it into the kettle jar package Directory, which will be self-loaded, the operating system is different, the directory will be different, I am using win7, the directory is as follows:



(5) file output

(6) The final file generation effect is as follows:

1.1.6 use variables in SQL

The following query statement uses a question mark placeholder. When the start date (the first? Number) and end date (second? The question mark placeholder bound to the SQL statement. When querying the President information during the period of entry date:

SELECTname, took_office FROMpresidents wher%k_officebetween? AND?

In this example, the Generdate Rows step is used to generate a record with two fields in one row, replacing the placeholder in the SQL statement entered in the Table in sequence. In actual scenarios, dynamic processing results are usually used to generate an expected value instead of a line step.

Next is the table input Step, where the SQL query statement is configured, INCLUDING THE QUESTION MARK placeholder. In the "Insert Data Step" drop-down box, select the previous Step to replace the question mark value.

Execute the query multiple times by transmitting different values

If you want to Execute the query cyclically and replace the placeholder with different values, you need the placeholder production step to generate multiple rows of data and select the option "Execute for each row" entered in the table. In this example, the file name is placeholders_in_loop.ktr.

Placeholder limitations

Although it is very effective to bind a value to a placeholder, some scenarios cannot be used. Some of the following SQL statements cannot use placeholders. These examples are very common, but do not use placeholders.

Table nouns cannot be replaced by placeholders. Otherwise, the query is not executed.

SELECT some_fieldFROM?

You cannot use placeholders to replace the name of the queried field. parameters can be successfully bound to the following query, but only as a constant rather than a field name.

SELECT? Asmy_field FROM table

You cannot bind multiple list item values separated by commas (,) with placeholders. If you bind "1, 2, 3" to the following query statement, unexpected results are returned.


The expected result is:

SELECT * FROM testWHERE id IN ("1, 2, 3 ")

However, the running result is as follows. If you transmit a string, you get three values, but the actual situation is completely unknown, there are several values transmitted in.

SELECT * FROM testWHERE id IN (1, 2, 3)

To solve these problems, kettle variables must be used to dynamically construct the query text. The following describes in detail.

Kettle variables are used in SQL queries.

The input steps of the table support replacing the variables or parameters in the query. Assume that there are a series of fully related tables: mammals, birds, insects (animals, birds, and insects ), you can use the kettle variable as the table name. Assume that we have a variable named ANIMALS_TABLE and the value is birds. We set the "Replace Vaiables" option to be selected. If we write the following query:

SELECT name, population FROM $ {ANIMALS_TABLE}

The execution must be successfully replaced:

SELECT name, population FROM birds

If the variable value is set to "mammals" or "insects", different tables are queried dynamically. When Placeholders are not competent, variable technology can help us solve the problem. The sample name is variables. ktr. Do not forget to assign a value to parameter during the runtime for testing.

Use variables and placeholders together

If necessary, we can mix these two technologies. In this example, variables are used as table nouns and Placeholders are used as input values for the previous step. The sample file variables_and_placeholders.ktr.

SELECT name, population FROM $ {ANIMALS_TABLE} WHERE population>?


Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.