Take a connection as an example to illustrate the difference between kettle and SQL processing data

Source: Internet
Author: User

First of all, I do not understand Java, but from the implementation of the results of speculation kettle processing mechanism, there is no place, please also advise.

As far as I'm guessing: SQL processing of data is processed in batches , and kettle processing of data is done on a line-by- row basis .

First understand Meger (merge) and join (join), assuming a table of two fields, B table two fields

Meger: The result is two fields.

Join: The result is 4 fields.

The following joins and merges from Kettle to illustrate the difference between kettle and SQL. Because kettle is processed on a line-by- row basis, be sure to order the keywords before connecting them.

1. Join Rows (Cartesian product)

A> getting data from multiple data sources

b> specifies a primary data source that specifies the filter criteria for the primary data source

C> takes one piece of data from the main data source. If the filter criteria are not met, the data loops of the other data sources are added after the filter criteria are met.

D> remove a piece of data from the primary data source and repeat the C step until the primary data source data is exhausted.

Similar to SQL (Kettle execution results are consistent with SQL) are as follows:

--No screening
SELECT * from
Test_join_1 Atest_join_2 Btest_join_3 C;
--There are screening
SELECT *from test_join_1 ajoin test_join_2 bjoin test_join_3 C
WHERE
a.ID = ' 123 '
and a.name = ' lio5n ';

2. Merge Join

A> getting data from two data sources

B> two data source data by line (including: Inside, left, right, outside), the keyword to wait for the condition.

Similar to SQL (Kettle execution results are consistent with SQL) are as follows:

SELECT    * From 
test_join_1 AJOIN on= b.id;

3. Merge Rows (diff)

A> getting data from two data sources

B> a data source as a reference (Reference) data source, with an additional external one as a merge (Compare) data source

C> the data from Reference and Compare in the order of the key fields, merging them together, and adding a marker change (identical, delete, new) field.

D> Note: In identical case, the non-critical fields of Compare appear in the result set

SQL is similar to the following (just like, generally not the following result, because kettle is processed by the sorted keyword line-by-row):

SELECTa.id ID, A.name NAME,'deleted'Falgfield fromtest_join_1 A Left JOINTest_join_2 B ona.ID=b.idWHEREb.id is NULLUNIONSELECTb.id ID, B.name NAME,'identical'Falgfield fromtest_join_1 AJOINTest_join_2 B ona.ID=b.idUNIONSELECTb.id ID, B.name NAME,'New'Falgfield fromtest_join_1 A Right JOINTest_join_2 B ona.ID=b.idWHEREa.id is NULL;

4. Multiway Merge Join

A> getting data from multiple data sources

B> and associates (including: inner and outer associations) are records that have equal fields.

Similar to SQL (Kettle execution results are consistent with SQL) are as follows:

SELECT    *  from       test_join_1 Ajoin  test_join_2 B on a.id = b.idjoin  test_ Join_3 C on a.id = C.id;

5. Sorted Merge

A> getting data from multiple data sources

B> extract data from two streams according to the keyword sequence

C> finally merges the data in two streams into a keyword sequence.

Similar to SQL (Kettle execution results are consistent with SQL) are as follows:

SELECT  from test_join_1 UNION  All SELECT  from test_join_2 UNION  All SELECT  from Test_join_3 ORDER  by ID;

6. Append Streams

A> getting data from two data sources

B> take out all the data in the first stream

C> Append data from the second stream to the first stream

Similar to SQL (Kettle execution results are consistent with SQL) are as follows:

SELECT  from test_join_1 UNION  All SELECT  from test_join_2;

7. Prioritize streams

A> getting data from multiple data sources

B> are appended to a stream according to the specified order

Similar to SQL (Kettle execution results are consistent with SQL) are as follows:

SELECT ID, name from test_join_1union allselect ID, name from test_join_2union allselect ID, name from Test_j Oin_3  
;

Take a connection as an example to illustrate the difference between kettle and SQL processing data

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.