Dynamic SQL queries in Kettle
In ETL projects, SQL statements such as data queries often have to be built from input parameters that are only known at run time. This article describes how to run dynamic and parameterized queries with the Table input step in Kettle. The sample transformations use an in-memory database (H2) and can be run directly after downloading, so it is easy to follow along with the examples.
Binding field values to placeholders in an SQL query
The first approach to dynamic queries is the familiar technique of parameterized SQL. You write an SQL query that contains placeholders, bind values to the placeholders to make it a valid query, and execute it. You can bind as many values as needed and execute the query repeatedly. The sample file for this example is placeholders.ktr.
The example first creates the presidents table and fills it with data about US presidents. The table definition is shown below; the fields are name, state, political party, occupation, college, date of taking office, and date of leaving office.
CREATE TABLE presidents (
    name VARCHAR(255),
    state VARCHAR(255),
    party VARCHAR(64),
    occupation VARCHAR(64),
    college VARCHAR(64),
    took_office DATE,
    left_office DATE
);
The following query uses question mark placeholders. Once a start date (the first question mark) and an end date (the second question mark) are bound to the statement, it returns the presidents who took office during that period:
SELECT name, took_office FROM presidents WHERE took_office BETWEEN ? AND ?
In this example, a Generate Rows step produces a single row with two fields, whose values replace the placeholders in the Table input step's SQL statement in order. In real-world scenarios, the values usually come from the results of earlier processing rather than from a Generate Rows step.
Next comes the Table input step, where the SQL query, including the question mark placeholders, is configured. In the "Insert data from step" drop-down box, select the preceding step whose fields supply the values for the question marks.
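The binding described above can also be tried outside Kettle. The following is only a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the H2 database, a reduced presidents table, and three hand-picked sample rows; it is not how Kettle itself is implemented.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the article's H2 database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE presidents (name TEXT, took_office DATE, left_office DATE)"
)
# Sample rows chosen for illustration only.
conn.executemany(
    "INSERT INTO presidents VALUES (?, ?, ?)",
    [
        ("George Washington", "1789-04-30", "1797-03-04"),
        ("Abraham Lincoln", "1861-03-04", "1865-04-15"),
        ("John F. Kennedy", "1961-01-20", "1963-11-22"),
    ],
)

# The query text is fixed; only the two bound values change.
query = "SELECT name, took_office FROM presidents WHERE took_office BETWEEN ? AND ?"

# This tuple plays the role of the single row produced by Generate Rows:
# its fields are bound to the question marks in order.
params = ("1850-01-01", "1900-01-01")
rows = conn.execute(query, params).fetchall()
print(rows)
```

The tuple order matters in the same way as the field order of the incoming row in Kettle: the first value goes to the first question mark, the second value to the second.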
Executing the query multiple times with different values
If you want to execute the query in a loop, binding different values to the placeholders each time, the step that feeds the placeholders must generate multiple rows, and the "Execute for each row" option of the Table input step must be selected. The sample file for this example is placeholders_in_loop.ktr.
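As a rough illustration of running the same query once per incoming row, here is a sketch that again assumes Python's sqlite3 in place of H2 and uses invented date ranges as the parameter rows. The loop only mimics the effect of "Execute for each row"; it does not reproduce Kettle internals.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the article's H2 database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE presidents (name TEXT, took_office DATE)")
conn.executemany(
    "INSERT INTO presidents VALUES (?, ?)",
    [
        ("George Washington", "1789-04-30"),
        ("Abraham Lincoln", "1861-03-04"),
        ("John F. Kennedy", "1961-01-20"),
    ],
)

query = "SELECT name FROM presidents WHERE took_office BETWEEN ? AND ?"

# Each tuple stands for one incoming row (hypothetical ranges, one per century).
parameter_rows = [
    ("1700-01-01", "1799-12-31"),
    ("1800-01-01", "1899-12-31"),
    ("1900-01-01", "1999-12-31"),
]

results = []
for start, end in parameter_rows:
    # The query is executed once per parameter row; here the rows of all
    # executions are collected into one list.
    results.extend(conn.execute(query, (start, end)).fetchall())
print(results)
```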
Placeholder limitations
Although binding values to placeholders is very effective, it cannot be used in every scenario. The following SQL constructs are all common, yet placeholders do not work for them.
Table names cannot be replaced by placeholders; such a query will not execute.
SELECT some_field FROM ?
You cannot use a placeholder in place of the name of a queried field. A parameter can be bound successfully to the following query, but it is treated as a constant value rather than as a field name.
SELECT ? AS my_field FROM table
You cannot bind a comma-separated list of values to a single placeholder. If you bind the string "1, 2, 3" to the following query, the result is not what you might expect.
SELECT * FROM test WHERE id IN (?)
The placeholder is bound as a single string value, so the query that actually runs is equivalent to:
SELECT * FROM test WHERE id IN ('1, 2, 3')
The intended query, however, was the one below. Passing three values requires three placeholders, but in practice the number of values is often not known in advance.
SELECT * FROM test WHERE id IN (1, 2, 3)
To solve these problems, Kettle variables must be used to construct the query text dynamically, as described in the next section.
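The three limitations above can be reproduced with any database API that supports bound parameters. The sketch below assumes Python's sqlite3 rather than Kettle with H2, and a hypothetical test table with three rows. The exact failure mode differs between databases: SQLite simply returns no rows for the IN (?) case, while other databases may report a conversion error instead.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the article's H2 database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, some_field TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# 1. A placeholder in place of a table name is rejected by the SQL parser.
try:
    conn.execute("SELECT some_field FROM ?", ("test",))
    table_placeholder_failed = False
except sqlite3.OperationalError:
    table_placeholder_failed = True

# 2. A placeholder in the select list is bound as a constant, not a column name.
as_constant = conn.execute(
    "SELECT ? AS my_field FROM test", ("some_field",)
).fetchall()

# 3. A single placeholder receives the whole string "1, 2, 3" as one value,
#    which matches none of the integer ids.
as_one_string = conn.execute(
    "SELECT * FROM test WHERE id IN (?)", ("1, 2, 3",)
).fetchall()

# One placeholder per value works, but the number of values must be known
# when the query text is written.
as_three_values = conn.execute(
    "SELECT * FROM test WHERE id IN (?, ?, ?)", (1, 2, 3)
).fetchall()

print(table_placeholder_failed, as_constant, as_one_string, as_three_values)
```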
Using Kettle variables in SQL queries
The Table input step supports substituting variables or parameters into the query text. Suppose there is a set of tables with the same structure: mammals, birds, and insects. A Kettle variable can then be used as the table name. Assume a variable named ANIMALS_TABLE has the value birds and the "Replace variables in script?" option is checked. If we write the following query:
SELECT name, population FROM ${ANIMALS_TABLE}
then the variable is replaced at execution time and the query that runs is:
SELECT name, population FROM birds
If the variable is set to mammals or insects instead, a different table is queried. Where placeholders fall short, variables can solve the problem. The sample file is variables.ktr. Do not forget to assign a value to the parameter when running the transformation for testing.
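Variable replacement is plain text substitution that happens before the database sees the query. The following sketch imitates that idea in Python; the helper substitute_variables, the regular expression, and the sample population figures are all assumptions made for illustration and are not Kettle's actual code.

```python
import re
import sqlite3


def substitute_variables(sql, variables):
    """Replace every ${NAME} in sql with variables[NAME] (hypothetical helper)."""
    return re.sub(r"\$\{(\w+)\}", lambda m: variables[m.group(1)], sql)


# Assumption: in-memory SQLite stands in for the article's H2 database.
conn = sqlite3.connect(":memory:")
for table in ("mammals", "birds", "insects"):
    conn.execute(f"CREATE TABLE {table} (name TEXT, population INTEGER)")
# Invented sample data.
conn.execute("INSERT INTO birds VALUES ('sparrow', 1000)")
conn.execute("INSERT INTO mammals VALUES ('fox', 200)")

template = "SELECT name, population FROM ${ANIMALS_TABLE}"

# The variable value becomes part of the query text itself.
sql = substitute_variables(template, {"ANIMALS_TABLE": "birds"})
print(sql)
rows = conn.execute(sql).fetchall()
print(rows)
```

Because the value is pasted into the query text, it is parsed as SQL. A variable used this way should therefore only take values from a trusted source, such as a fixed list of table names.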
Use variables and placeholders together
If necessary, the two techniques can be combined. In this example, a variable supplies the table name and a placeholder receives its value from the previous step. The sample file is variables_and_placeholders.ktr.
SELECT name, population FROM ${ANIMALS_TABLE} WHERE population > ?
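The order of the two mechanisms matters: the variable is resolved in the query text first, and the placeholder is bound afterwards when the query executes. A small sketch of that two-stage process, assuming Python's sqlite3 in place of H2, an ad hoc regular expression for the ${...} syntax, and invented sample data:

```python
import re
import sqlite3

# Assumption: in-memory SQLite stands in for the article's H2 database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE birds (name TEXT, population INTEGER)")
# Invented sample data.
conn.executemany(
    "INSERT INTO birds VALUES (?, ?)", [("sparrow", 1000), ("eagle", 50)]
)

variables = {"ANIMALS_TABLE": "birds"}
template = "SELECT name, population FROM ${ANIMALS_TABLE} WHERE population > ?"

# Stage 1: text substitution fixes the table name before the query is parsed.
sql = re.sub(r"\$\{(\w+)\}", lambda m: variables[m.group(1)], template)

# Stage 2: the remaining placeholder is bound to a value at execution time.
rows = conn.execute(sql, (100,)).fetchall()
print(sql)
print(rows)
```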
Sample download
You can download the sample files here. All examples have been tested in Kettle 5.1, and the test data uses the H2 in-memory database, so the samples can be run directly after downloading.
How can Kettle 4.0 execute SQL scripts before other tasks?
In a transformation, all steps run concurrently, while the rows in the data stream are processed in order.
How can I dynamically query fields in SQL?
CREATE TABLE #test (id INT, start_date DATE, end_date DATE);
INSERT INTO #test VALUES (1, '2017-01-01', '2017-01-31');
INSERT INTO #test VALUES (1, '2017-03-01', '2017-03-07');
INSERT INTO #test VALUES (2, '2017-01-01', '2017-03-07');
INSERT INTO #test VALUES (2, '2017-02-01', '2017-02-07');
INSERT INTO #test VALUES (2, '2017-03-01', '2017-03-07');
INSERT INTO #test VALUES (2, '2017-04-01', '2017-04-07');
-- Assume Field 1 is id.
-- Field 2 is not fixed: for each record it is determined by the value returned in Field 1.
-- If Field 1 = 1, start_date is returned; otherwise, end_date.
SELECT id AS [Field 1], CASE WHEN id = 1 THEN start_date ELSE end_date END AS [Field 2] FROM #test;

Field 1     Field 2
----------- ----------------
1           2017-01-01
1           2017-03-01
2           2017-03-07
2           2017-02-07
2           2017-03-07
2           2017-04-07
(6 rows affected)