Kettle (1): Overview

Source: Internet
Author: User

For the last two months I have been working with Kettle. I went from never having heard of it to using it fluently, which confirms that learning driven by a real project is the fastest way to learn. What I know is already more than enough for the project's tasks, but I still want to study Kettle systematically and write up a summary, because my experience has gaps: for example, the jobs I built were relatively small, and I never touched Kettle's cluster mode.


Before discussing Kettle, let us first talk about ETL (Extract-Transform-Load). ETL is a data warehousing technique covering the extraction, transformation, and loading of data from a source (for example, an existing server in the organization) to a destination (the project being built). In other words, when a new project needs data from an earlier project's database, ETL is what solves that problem.
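To make the three stages concrete, here is a minimal Python sketch using the standard `sqlite3` module, with two in-memory databases standing in for the old and new projects. The table names (`legacy_customer`, `customer`) and the cleaning rule are hypothetical and only illustrate the extract, transform, and load stages; they are not how Kettle itself is implemented.

```python
import sqlite3


def extract(src):
    """Extract: read every row from the (hypothetical) legacy table."""
    return src.execute("SELECT id, name, phone FROM legacy_customer").fetchall()


def transform(rows):
    """Transform: trim whitespace and title-case names (an example rule)."""
    return [(rid, name.strip().title(), phone.strip()) for rid, name, phone in rows]


def load(dst, rows):
    """Load: insert the cleaned rows into the new project's table."""
    dst.executemany("INSERT INTO customer (id, name, phone) VALUES (?, ?, ?)", rows)
    dst.commit()


def run_etl(src, dst):
    load(dst, transform(extract(src)))


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")  # stands in for the old project's database
    target = sqlite3.connect(":memory:")  # stands in for the new project's database
    source.execute("CREATE TABLE legacy_customer (id INTEGER, name TEXT, phone TEXT)")
    source.executemany("INSERT INTO legacy_customer VALUES (?, ?, ?)",
                       [(1, "  alice smith ", "010-1234"), (2, "BOB LEE", " 021-5678 ")])
    target.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
    run_etl(source, target)
    print(target.execute("SELECT * FROM customer ORDER BY id").fetchall())
```

Running it prints `[(1, 'Alice Smith', '010-1234'), (2, 'Bob Lee', '021-5678')]`. In Kettle the same flow would be drawn in Spoon as an input step, transformation steps, and an output step rather than written by hand.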

Common concerns when implementing ETL include correctness, integrity, consistency, completeness, validity, timeliness, and accessibility. Whatever tool we use to implement ETL, the result must satisfy all of these requirements to pass quality checks; failing any single one means failing overall.
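A few of these requirements can be checked automatically after a load. The Python sketch below is an illustration under assumptions of my own: it treats completeness as "source and target row counts match", integrity as "required fields are not empty", and validity as "values match an expected pattern". The function and field names are hypothetical. Any single failure marks the whole run as failed, mirroring the all-or-nothing rule above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    passed: bool = True
    failures: list = field(default_factory=list)

    def fail(self, message):
        """Record a failure; one failure is enough to fail the whole run."""
        self.passed = False
        self.failures.append(message)


def check_quality(source_rows, target_rows, required, patterns):
    """Run a few illustrative ETL quality checks on lists of dict rows."""
    report = QualityReport()
    # Completeness: every extracted row should have been loaded.
    if len(source_rows) != len(target_rows):
        report.fail(f"completeness: {len(source_rows)} source rows vs {len(target_rows)} loaded")
    for i, row in enumerate(target_rows):
        # Integrity: required fields must not be null or empty.
        for col in required:
            if row.get(col) in (None, ""):
                report.fail(f"integrity: row {i} has empty {col!r}")
        # Validity: non-null values must match the expected format.
        for col, pattern in patterns.items():
            value = row.get(col)
            if value is not None and not re.fullmatch(pattern, str(value)):
                report.fail(f"validity: row {i} {col!r}={value!r} does not match {pattern}")
    return report
```

For example, `check_quality(rows, loaded, required=["id", "phone"], patterns={"phone": r"\d{3}-\d{4}"})` would flag a loaded phone value of `"bad"` as a validity failure. Timeliness and accessibility depend on scheduling and infrastructure, so they are not modelled here.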


Most of the work in an ETL implementation lies in the transformation stage, which covers the following aspects (as listed in the encyclopedia):

1. NULL handling: capture fields with null values and either load them as-is or replace them with other meaningful data; rows can also be routed to different target databases depending on whether a field is null.

2. Normalizing data formats: define format constraints on fields, so that source data such as times, numbers, and character strings can be loaded in a custom format.

3. Splitting data: fields can be decomposed according to business requirements. For example, the calling number 861082585313-8148 can be split into an area code and a phone number.

4. Verifying data correctness: the Lookup and Split functions can be used to validate data. For example, after the calling number 861082585313-8148 has been split into area code and phone number, a lookup can return the calling area recorded by the calling gateway or switch, which is then used to validate the data.

5. Data substitution: for business reasons, invalid or missing data can be replaced with substitute values.

6. Lookup: to recover lost data, a Lookup performs a subquery that returns the missing fields obtained by other means, guaranteeing that the fields are complete.

7. Establishing primary and foreign key constraints in the ETL process: data that violates the constraints can be replaced or exported to an error data file, ensuring that only records with unique primary keys are loaded.
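Several of the rules above can be combined in one small Python sketch operating on dict rows. Everything here is an assumption for illustration: the field names (`call_id`, `customer_id`, `caller`, `call_date`, `duration`), the `AREA_CODES` table, and the decomposition rule for the calling number (a leading `86` country code, then an area code, then the number, with the extension after `-`). Rejected rows are returned in a list, where a real job would write them to an error data file.

```python
from datetime import datetime

# Hypothetical lookup table (area code -> region), standing in for a Lookup step.
AREA_CODES = {"10": "Beijing", "21": "Shanghai", "571": "Hangzhou"}


def handle_null(value, default):
    """1. NULL handling: replace a null field with a meaningful default."""
    return default if value is None else value


def normalize_date(text, fmt="%Y/%m/%d"):
    """2. Format normalization: parse a source date string, emit ISO 8601."""
    return datetime.strptime(text, fmt).date().isoformat()


def split_calling_number(raw):
    """3./4. Split '861082585313-8148' into (area code, number, extension).

    Assumes a leading '86' country code and '-' before the extension. The area
    code is validated by longest-prefix lookup in AREA_CODES; None means the
    number could not be validated.
    """
    number, _, extension = raw.partition("-")
    if number.startswith("86"):
        number = number[2:]
    for length in (3, 2):  # try longer area codes first
        if number[:length] in AREA_CODES:
            return number[:length], number[length:], extension
    return None


def transform_rows(rows, valid_customer_ids):
    """Apply the rules to dict rows and return (loaded, rejected)."""
    loaded, rejected, seen_ids = [], [], set()
    for row in rows:
        # 7. Key constraints: unknown foreign key or duplicate primary key -> error output.
        if row.get("customer_id") not in valid_customer_ids or row.get("call_id") in seen_ids:
            rejected.append(row)
            continue
        parts = split_calling_number(handle_null(row.get("caller"), ""))
        if parts is None:  # 4. validation failed: unknown or missing area code
            rejected.append(row)
            continue
        area_code, phone, extension = parts
        duration = row.get("duration")
        if not isinstance(duration, int) or duration < 0:
            duration = 0  # 5. substitution for missing or invalid values
        loaded.append({
            "call_id": row["call_id"],
            "customer_id": row["customer_id"],
            "region": AREA_CODES[area_code],  # 6. field filled in by lookup
            "phone": phone,
            "extension": extension or None,
            "call_date": normalize_date(row["call_date"]),
            "duration": duration,
        })
        seen_ids.add(row["call_id"])
    return loaded, rejected
```

With this rule set, `split_calling_number("861082585313-8148")` returns `("10", "82585313", "8148")`. In Kettle each numbered rule would typically be its own step in a transformation rather than one function.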

Kettle is one ETL tool among many; others include Informatica, DataStage, OWB, and Microsoft's DTS. Now for a brief introduction to Kettle.


Kettle is a popular open-source ETL tool written in pure Java (so it integrates well with Java development). It runs on Windows, Linux, and Unix (which helps its popularity now that Linux servers are widespread), and its data extraction is efficient and stable. The name "kettle" reflects the developers' intent: pour all kinds of data into one pot, process it in various ways, and let it flow out in a specific format.


The Kettle family includes Spoon, Pan, Chef, and Kitchen.


Spoon: designs ETL transformations through a graphical interface (the most commonly used component).


Pan: runs transformations designed in Spoon in batch mode (for example, from a time-based task scheduler). Pan is a background program with no graphical interface.


Chef: creates jobs. A job can execute transformations, other jobs, scripts, and so on, which makes it better suited to accomplishing more complex tasks.


Kitchen: runs jobs designed in Chef in batch mode (for example, from a time-based scheduler); like Pan, it is a background program.
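Because Pan and Kitchen are command-line programs, a scheduler only needs to invoke them with the right arguments. The Python sketch below builds those command lines using the `-file`, `-level`, and `-param:NAME=VALUE` options of Pan and Kitchen; the install directory, file names, and the `RUN_DATE` parameter are hypothetical, and the options should be checked against the Kettle version in use. On Windows the scripts are `Pan.bat` and `Kitchen.bat` instead of `pan.sh` and `kitchen.sh`.

```python
import shlex
import subprocess


def kettle_command(script, file_path, level="Basic", params=None):
    """Build an argument list for Pan (.ktr file) or Kitchen (.kjb file).

    Parameters are sorted so the generated command line is stable between runs.
    """
    args = [script, f"-file={file_path}", f"-level={level}"]
    for name, value in sorted((params or {}).items()):
        args.append(f"-param:{name}={value}")
    return args


def run_kettle(args):
    """Run the command and return its exit code (0 means success)."""
    return subprocess.run(args, check=False).returncode


if __name__ == "__main__":
    # Hypothetical install directory and file paths.
    pan = kettle_command("/opt/data-integration/pan.sh", "/etl/load_customers.ktr",
                         params={"RUN_DATE": "2016-03-05"})
    kitchen = kettle_command("/opt/data-integration/kitchen.sh", "/etl/nightly.kjb",
                             level="Detailed")
    print(shlex.join(pan))
    print(shlex.join(kitchen))
```

The printed strings are what a time-based scheduler such as cron would be configured to run; `run_kettle` is only needed when the scheduling itself is done from Python.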

A Kettle design consists of several parts: the repository, database connections, jobs, transformations (trans), and steps. An analogy helps: the repository is like a Java project, a database connection is like the database connection in that project, a job is like a line of execution running through the project, a transformation is like a class, and a step is like a method in that class. Seen this way, Kettle is actually quite simple. What we need to do is build the repository, connect to the database, build transformations (writing each step, like the methods of a class), and connect the transformations together to form a job (although a transformation can also run on its own).
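The analogy can be turned into a small Python model: steps are functions over a stream of rows, a transformation chains steps between a source and a sink, and a job runs transformations in order. This is a conceptual sketch only; the class and step names are hypothetical, it is not Kettle's API, and it leaves out the repository and database connections. One real difference worth noting: Kettle runs the steps of a transformation concurrently, passing rows along hops, whereas this sketch simply chains Python generators.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Step = Callable[[Iterable[Row]], Iterator[Row]]


class Transformation:
    """Like a class in the analogy: a named chain of steps (its "methods")."""

    def __init__(self, name: str, source: Callable[[], Iterable[Row]],
                 steps: List[Step], sink: Callable[[Row], None]):
        self.name, self.source, self.steps, self.sink = name, source, steps, sink

    def run(self) -> int:
        """Push rows from the source through every step into the sink."""
        rows: Iterable[Row] = self.source()
        for step in self.steps:
            rows = step(rows)  # each step consumes the previous step's output
        count = 0
        for row in rows:
            self.sink(row)
            count += 1
        return count


class Job:
    """Runs transformations in order and stops at the first failure."""

    def __init__(self, name: str, entries: List[Transformation]):
        self.name, self.entries = name, entries

    def run(self) -> bool:
        for trans in self.entries:
            try:
                written = trans.run()
            except Exception as exc:  # a failed entry aborts the job
                print(f"job {self.name}: {trans.name} failed: {exc}")
                return False
            print(f"job {self.name}: {trans.name} wrote {written} rows")
        return True


def drop_null_ids(rows: Iterable[Row]) -> Iterator[Row]:
    """Example step: filter out rows without an id."""
    return (row for row in rows if row.get("id") is not None)


def upper_name(rows: Iterable[Row]) -> Iterator[Row]:
    """Example step: upper-case the name field."""
    return ({**row, "name": str(row["name"]).upper()} for row in rows)
```

For instance, a transformation built from `[drop_null_ids, upper_name]` with `target.append` as its sink, wrapped in `Job("nightly", [...]).run()`, leaves only the rows with an id, with names upper-cased, in `target`.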

Finally, when is it appropriate to use Kettle? It makes projects of one kind in particular easy to complete: projects that need to migrate data among a large number of databases.


That concludes the theoretical introduction to Kettle; the following articles cover its basic use. Finally, here is a screenshot of the Kettle interface:

[Figure: screenshot of the Kettle interface]


Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
