ETL tool Kettle: implementing a loop

Source: Internet
Author: User
Tags: sqoop


Kettle is an open-source ETL tool written in Java. It runs on Windows, Linux, and Unix, ships as a portable ("green") package that needs no installation, and extracts data efficiently and stably.

Business scenario: a relational database holds a very large data set, stored across an odd database and an even database. Each database has 100 identical tables. Once a table holds 10 million rows, writes switch automatically to the next table. This data has to be synchronized to Hive (HDFS) by extracting it in a loop. Extracting by an incremental field is difficult, because it is not known which table, or whether the odd or the even database, receives each day's new data.
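The following JavaScript sketch makes the layout concrete. It enumerates the tables the loop has to visit. It also shows which table a given row lands in when each table is filled to 10 million rows before writes move on. The database names db_odd and db_even and the table names data_00 to data_99 are hypothetical, because the source does not give the real names.

```javascript
// Sketch: enumerate every source table the extraction loop has to visit.
// ASSUMPTION: hypothetical names db_odd / db_even and data_00 ... data_99.
const ROWS_PER_TABLE = 10_000_000; // each table is filled to 10 million rows
const TABLES_PER_DB = 100;         // each database has 100 identical tables

function listSourceTables(databases, tablesPerDb) {
  const tables = [];
  for (const db of databases) {
    for (let i = 0; i < tablesPerDb; i++) {
      // Zero-padded suffix, e.g. data_07 (naming scheme is an assumption).
      tables.push(`${db}.data_${String(i).padStart(2, "0")}`);
    }
  }
  return tables;
}

// Zero-based index of the table holding a zero-based row number,
// assuming rows fill tables strictly in order.
function tableIndexForRow(rowNumber) {
  return Math.floor(rowNumber / ROWS_PER_TABLE);
}

const tables = listSourceTables(["db_odd", "db_even"], TABLES_PER_DB);
console.log(tables.length);                // 200
console.log(tables[0], tables[199]);       // db_odd.data_00 db_even.data_99
console.log(tableIndexForRow(25_000_000)); // 2
```

With two databases of 100 tables each, a full pass of the loop touches 200 tables. This is why a per-table loop, rather than a single query, is the natural unit of work here.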

A. Use Sqoop to import directly from MySQL into Hive. Some special characters in the data can make Sqoop terminate abnormally. In addition, looping over this many databases and tables puts heavy load on the database server and can easily bring it down.

B. Use Kettle to handle the transformation. Kettle can query data page by page, which reduces the load on the server to some extent.
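The idea of paged queries can be sketched as follows: instead of issuing one huge SELECT, the extraction reads each table in fixed-size pages. This is a generic illustration, not a description of how Kettle implements paging internally. The table and column names are hypothetical, and the SQL uses MySQL LIMIT/OFFSET syntax.

```javascript
// Sketch: split one large table read into fixed-size pages (MySQL syntax).
// Table and column names passed in below are hypothetical.
function pagedQueries(table, orderColumn, totalRows, pageSize) {
  const queries = [];
  for (let offset = 0; offset < totalRows; offset += pageSize) {
    // ORDER BY a unique column so consecutive pages never overlap.
    queries.push(
      `SELECT * FROM ${table} ORDER BY ${orderColumn} LIMIT ${pageSize} OFFSET ${offset}`
    );
  }
  return queries;
}

const qs = pagedQueries("db_odd.data_00", "id", 25, 10);
console.log(qs.length); // 3
console.log(qs[2]);     // SELECT * FROM db_odd.data_00 ORDER BY id LIMIT 10 OFFSET 20
```

With very large offsets, MySQL still has to step over all the skipped rows. Keyset paging (WHERE id > last_seen_id ORDER BY id LIMIT n) is a common alternative when tables reach millions of rows.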


1. First, look at the overall job diagram (the version used below is Kettle 5.1).


2. Set Environment Variables


3. JavaScript code

The script content is:

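// Kettle "JavaScript" job entry: the value of the last evaluated expression
// is the entry's result (true follows the "true" hop, false the "false" hop).
var count = parseInt(parent_job.getVariable("V_ID"), 10);

if (isNaN(count) || count >= 10) {
    // V_ID is missing or the last iteration is done: leave the loop.
    false;
} else {
    count++;
    // Job variables are strings, so store the new value as text.
    parent_job.setVariable("V_ID", String(count));
    true;
}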
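The script above can be exercised outside Kettle with a mock parent_job. In this sketch, makeMockJob and loopCheck are hypothetical helpers. The mock stores variables as strings and returns null for unknown names, which is an assumption about Kettle's behaviour. loopCheck repeats the script's logic as a function that returns true to continue and false to stop.

```javascript
// Hypothetical mock of Kettle's parent_job: values are kept as strings and
// unknown names return null (an assumption about Kettle's behaviour).
function makeMockJob(initialVars) {
  const vars = new Map(Object.entries(initialVars));
  return {
    getVariable: (name) => (vars.has(name) ? vars.get(name) : null),
    setVariable: (name, value) => { vars.set(name, String(value)); },
  };
}

// The corrected script's logic as a function: true = continue, false = stop.
function loopCheck(parent_job) {
  const count = parseInt(parent_job.getVariable("V_ID"), 10);
  if (isNaN(count) || count >= 10) {
    return false; // missing variable or last iteration reached
  }
  parent_job.setVariable("V_ID", String(count + 1));
  return true;
}

// Count how many times the check lets the loop body run, starting from V_ID = 0.
const job = makeMockJob({ V_ID: "0" });
let iterations = 0;
while (loopCheck(job)) {
  iterations++;
}
console.log(iterations, job.getVariable("V_ID")); // 10 10
```

Starting from V_ID = 0, the loop body runs 10 times (V_ID takes the values 1 to 10) before the check returns false. If the comparison had been left as the assignment count = 10, the condition would always be truthy. The very first check would then take the stop branch.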

4. Create a new transformation

Edit the transformation as follows:
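The text does not spell out the transformation's steps. A common way for it to pick up the loop counter, assumed here, is to reference ${V_ID} in the Table input SQL and tick that step's "Replace variables in script?" option. The JavaScript below is a simplified imitation of ${...} substitution for illustration only. It is not Kettle's own code, and the table prefix t_data_ is hypothetical.

```javascript
// Simplified imitation of ${NAME} variable substitution (not Kettle's code).
// Variables that are not defined are left untouched in the SQL text.
function substituteVariables(sql, vars) {
  return sql.replace(/\$\{([A-Za-z0-9_.]+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(vars, name) ? vars[name] : match
  );
}

// Hypothetical Table input query; t_data_ is an illustrative table prefix.
const template = "SELECT * FROM t_data_${V_ID}";
console.log(substituteVariables(template, { V_ID: "7" })); // SELECT * FROM t_data_7
```

Because the substitution happens on the SQL text before the query runs, each pass of the job loop reads a different table without the transformation itself changing.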


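5. Dummy entry for the loop condition (left unchanged)

Setting up the loop logic correctly is important, in particular the direction and the type of each hop (arrow): unconditional, follow when the result is true, or follow when the result is false.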
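The role of hop direction and type can be illustrated with a small JavaScript simulation of the job. The entry names and the exact layout are assumptions based on the steps above. In this layout the JavaScript check leads to the transformation, the transformation leads to the Dummy entry, and the Dummy entry leads back to the check. Hop types are modeled as unconditional, follow-when-true, and follow-when-false.

```javascript
// Simulation of a Kettle-style job loop; entry names and layout are assumptions.
// Hop types: "unconditional", "onTrue" (entry succeeded), "onFalse" (entry failed).
const hops = [
  { from: "START", to: "Set variables", type: "unconditional" },
  { from: "Set variables", to: "JavaScript", type: "unconditional" },
  { from: "JavaScript", to: "Transformation", type: "onTrue" },
  { from: "JavaScript", to: "Success", type: "onFalse" },
  { from: "Transformation", to: "Dummy", type: "onTrue" },
  { from: "Dummy", to: "JavaScript", type: "unconditional" },
];

function hopMatches(hop, result) {
  if (hop.type === "unconditional") return true;
  return hop.type === "onTrue" ? result : !result;
}

// Runs the job and returns the V_ID value seen by each transformation run.
function runJob(maxId) {
  const vars = new Map();
  const seen = [];
  const entries = {
    START: () => true,
    "Set variables": () => { vars.set("V_ID", "0"); return true; },
    JavaScript: () => {
      const count = parseInt(vars.get("V_ID"), 10);
      if (isNaN(count) || count >= maxId) return false;
      vars.set("V_ID", String(count + 1));
      return true;
    },
    Transformation: () => { seen.push(vars.get("V_ID")); return true; },
    Dummy: () => true,
  };

  let current = "START";
  while (current !== "Success") {
    const result = entries[current]();
    // Follow the first outgoing hop whose type matches the entry's result.
    const next = hops.find((h) => h.from === current && hopMatches(h, result));
    if (!next) throw new Error(`No matching hop out of ${current}`);
    current = next.to;
  }
  return seen;
}

console.log(runJob(10).join(",")); // 1,2,3,4,5,6,7,8,9,10
```

In this simulation, making the JavaScript -> Transformation hop unconditional would cause the loop to ignore the false result and never reach Success. The advice about checking arrow type guards against this kind of mistake.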


6. Run the job and check that it loops as expected.

For comparison, the same loop built in Kettle 3.2 is shown below.


Set Variables


Set the loop condition


Transformation: table input to file output


JavaScript check


Connecting the ETL tool Kettle to a domestic (Dameng) DM database

1. Download the latest Kettle release.

2. Copy the DM JDBC driver that matches your JDK version into the libext\JDBC directory of the Kettle installation, for example D:\kettle\pdi-ce-4.4.0-stable\data-integration\libext\JDBC.

3. In the connection dialog, choose "Generic database" as the connection type and fill in the URL and driver class on the right.
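The sketch below builds example values for those two fields in JavaScript. The driver class dm.jdbc.driver.DmDriver, the jdbc:dm://host:port URL form and the default port 5236 are what DM drivers commonly use. Treat them as assumptions and check the documentation that ships with your driver jar.

```javascript
// Example values for Kettle's "Generic database" connection to DM.
// ASSUMPTIONS: driver class, URL form and default port; verify against your driver.
const DM_DRIVER_CLASS = "dm.jdbc.driver.DmDriver";
const DM_DEFAULT_PORT = 5236;

function dmConnection(host, port = DM_DEFAULT_PORT) {
  return {
    driverClass: DM_DRIVER_CLASS,     // goes in the driver class field
    url: `jdbc:dm://${host}:${port}`, // goes in the connection URL field
  };
}

console.log(dmConnection("localhost").url); // jdbc:dm://localhost:5236
```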

Has anyone used Kettle (Spoon), the open-source ETL tool? Is there a detailed tutorial?

There are plenty of resources online. With some background, you can learn by building projects and get started within about a month.

Tools of this type are easy to get started with. To use them well, though, you need a solid database background, some development skills, and a thorough understanding of the project, with an eye to what it will need later.

I suggest searching for a QQ group on the topic. You still need the basics, along with the ability to learn and research on your own.

Kettle and SSIS in SQL Server 2005 are the same type of tool.

Kettle is widely used today and is quite easy to use.
