Implementing a loop with the Kettle ETL tool

Source: Internet
Author: User
Tags: sqoop

Kettle is an open-source ETL tool written in Java. It runs on Windows, Linux, and Unix. It is portable ("green") software that needs no installation, and its data extraction is efficient and stable.

 

Business scenario: a large table in a relational database is split across an odd database and an even database. Each database has 100 identically structured tables, each table stores 1000 million records, and once a table is full, writes switch to the next table. This data needs to be synchronized to Hive (HDFS), so the tables have to be extracted in a loop. Extraction is incremental, based on an incremental field, but it is not known in advance which table holds each day's incremental data, or whether that table is in the odd or the even database.
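To make the size of the loop concrete, the set of tables that has to be visited can be enumerated as in the sketch below. The database names (`db_odd`, `db_even`), the table prefix (`big_table_`), and the two-digit suffix are hypothetical; the original does not give the real naming scheme.

```javascript
// Hypothetical naming scheme: two databases (odd/even), each with
// 100 identically structured tables named <prefix>00 .. <prefix>99.
function listCandidateTables(databases, tablePrefix, tablesPerDatabase) {
  const result = [];
  for (const db of databases) {
    for (let i = 0; i < tablesPerDatabase; i++) {
      // Zero-pad the table index to two digits, e.g. 7 -> "07".
      result.push(db + "." + tablePrefix + String(i).padStart(2, "0"));
    }
  }
  return result;
}

// Every table is a candidate, because the table that received
// today's incremental data is not known in advance.
const candidates = listCandidateTables(["db_odd", "db_even"], "big_table_", 100);
console.log(candidates.length);
```

Because any of these tables may hold the day's increment, the job has to loop over all of them.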

Option A: run sqoop directly from MySQL to Hive. Some special characters may cause sqoop to terminate abnormally. In addition, looping over a large number of databases on the server in this way puts a lot of pressure on the server and can easily bring it down.

Option B: use Kettle to handle the transformation process. Kettle supports querying data page by page, which reduces the pressure on the server to a certain extent.
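Querying page by page can be sketched as follows. This is a minimal illustration rather than Kettle's own mechanism: the function `buildPagedQuery`, its parameters, and the table and column names are hypothetical, and it assumes MySQL's `LIMIT ... OFFSET ...` syntax and a numeric incremental field.

```javascript
// Build one page of an incremental extraction query (illustrative only).
// table, incrementField: hypothetical identifiers supplied by the caller.
// lastValue: highest incremental value already loaded.
// pageSize, pageIndex: rows per page and zero-based page number.
function buildPagedQuery(table, incrementField, lastValue, pageSize, pageIndex) {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError("pageSize must be a positive integer");
  }
  if (!Number.isInteger(pageIndex) || pageIndex < 0) {
    throw new RangeError("pageIndex must be a non-negative integer");
  }
  const offset = pageIndex * pageSize;
  // A stable ORDER BY keeps pages from overlapping between queries.
  return "SELECT * FROM " + table +
    " WHERE " + incrementField + " > " + Number(lastValue) +
    " ORDER BY " + incrementField +
    " LIMIT " + pageSize + " OFFSET " + offset;
}

const pageQuery = buildPagedQuery("big_table_00", "id", 500, 1000, 2);
console.log(pageQuery);
```

Each query then returns at most `pageSize` rows, so a single request never scans out a whole table at once.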


1. First, look at the overall diagram of the job (the version used below is 5.1)


2. Set Environment Variables


3. JavaScript code

 


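The script content is:

    // Read the loop counter from the parent job variable v_id.
    var count;
    count = parent_job.getVariable("v_id");
    if (count == 10) {
        // Limit reached: the script evaluates to false and the loop ends.
        false;
    } else {
        // Increment the counter and store it back for the next iteration.
        count++;
        parent_job.setVariable("v_id", count);
        // The script evaluates to true, so the job loops again.
        true;
    }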
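The behavior of the script above can be checked outside Kettle with a small simulation. This is only a sketch: `createMockJob` is a hypothetical stand-in for Kettle's `parent_job` object, `shouldContinue` and `runLoop` are names introduced here, and the starting value of 1 for `v_id` is an assumption, since the original does not state the initial value.

```javascript
// Hypothetical stand-in for Kettle's parent_job; it stores variables as strings.
function createMockJob(initial) {
  const vars = Object.assign({}, initial);
  return {
    getVariable: function (name) { return vars[name]; },
    setVariable: function (name, value) { vars[name] = String(value); }
  };
}

// Same decision as the script step: true means "run another iteration".
function shouldContinue(parent_job, limit) {
  let count = parseInt(parent_job.getVariable("v_id"), 10);
  if (count === limit) {
    return false;
  }
  count++;
  parent_job.setVariable("v_id", count);
  return true;
}

// Drive the loop the way the job hops would, counting the iterations.
function runLoop(start, limit) {
  const job = createMockJob({ v_id: String(start) });
  let iterations = 0;
  while (shouldContinue(job, limit)) {
    iterations++;
  }
  return { iterations: iterations, finalValue: job.getVariable("v_id") };
}

const outcome = runLoop(1, 10);
console.log(outcome.iterations, outcome.finalValue);
```

Note that the check is an equality test, so the loop only terminates if `v_id` starts at or below the limit; a starting value above 10 would never match.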

4. Create a new transformation

 

Edit the transformation. The content is as follows:


5. The Dummy entry used for the condition check is left unmodified

 

The important part is setting up the loop logic: the direction and type of each arrow must be correct.


6. Run the job and test the loop.

In addition, the equivalent loop in Kettle version 3.2 is attached.


Set Variables


Set the condition


Transformation: table input to file output


JavaScript condition check

 

 

 
