Kettle Introduction

1. What is Kettle

Kettle is "kettle e.t.t.l. Envirnonment" initials only, which means it is designed to help you achieve your ETL needs: Extract, transform, load data; Kettle translated into Chinese name should be called Kettle, The origin of the name as MATT, the program's main programmer, said in a forum: I want to put all kinds of data in a pot and then flow out in a specified format.

Kettle is an excellent open-source ETL tool implemented in Java. The code is hosted on GitHub at: https://github.com/pentaho/pentaho-kettle

2. Kettle Installation

Kettle depends on the JDK, so a JDK (version 1.6 or above) must be installed first. There are two ways to obtain a Kettle installation package: compile it from source yourself (this depends on Ant, so Ant must be installed), or download a pre-built package from the official site. The latest installation package can be downloaded from the following link:
http://community.pentaho.com/projects/data-integration/
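A rough sketch of the installation steps on Linux follows; the archive name below comes from one particular release and will vary with the version you download:

    # Confirm a JDK of version 1.6 or above is installed
    java -version

    # Extract the downloaded package; it unpacks into a data-integration directory
    unzip pdi-ce-6.1.0.1-196.zip -d /opt
    ls /opt/data-integration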

The contents of the installation package roughly comprise the launcher scripts for the client tools described in the next section (spoon, kitchen, pan, and carte, each as a .bat file for Windows and a .sh file for Linux) together with the libraries and plugins they depend on.

3. Kettle Client

The Kettle core provides multiple client tools that apply to different stages of ETL development, as described below.

3.1 Spoon

Spoon is the integrated development environment. It offers a graphical user interface for creating and editing job and transformation definitions. Spoon can also be used to execute and debug jobs and transformations, and it includes functionality for performance monitoring. During the development phase of a project you use this tool to design your ETL processes as transformations and jobs.

On Windows, you can open the Spoon graphical tool by double-clicking Spoon.bat in the installation directory.

On a Linux development machine you open Spoon by running the spoon.sh script; make sure the machine has a graphical desktop environment installed and running. Launch commands for both platforms are sketched below.
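A minimal launch sketch, assuming the package was extracted into a directory named data-integration:

    # Windows (from the installation directory): double-click or run Spoon.bat
    # Linux (requires a running desktop session):
    cd data-integration
    sh spoon.sh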

3.2 Kitchen

Kitchen is a Kettle client tool that runs on the command line and executes Kettle jobs (note that it cannot execute transformations). Because it is command-line driven, it integrates well with OS-level scripting: you typically combine it with cron, at, or the Windows Task Scheduler to define recurring jobs. For a small-scale ETL production environment, you can implement periodic ETL runs by writing shell or batch scripts that call the Kitchen tool, as sketched below.
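A hedged sketch of invoking Kitchen and scheduling it with cron; the job file, paths, and log level are illustrative:

    # Execute a job definition stored in a file, logging at the Basic level
    sh kitchen.sh -file=/opt/etl/jobs/daily_load.kjb -level=Basic

    # Sample crontab entry: run the job every day at 02:00
    # 0 2 * * * /opt/data-integration/kitchen.sh -file=/opt/etl/jobs/daily_load.kjb >> /var/log/etl/daily_load.log 2>&1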

3.3 Pan

Pan, like Kitchen, is a command-line executor, but it can only execute transformation definitions, not jobs. Its usage mirrors Kitchen's, as sketched below.
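Correspondingly, a hedged sketch of invoking Pan; the transformation file path is illustrative:

    # Execute a transformation definition stored in a file
    sh pan.sh -file=/opt/etl/transformations/clean_customers.ktr -level=Basic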

3.4 Carte

Carte is a lightweight web server (based on the Jetty HTTP server) that runs in the background and executes Kettle transformations or jobs by listening for requests. Carte is also used to build Kettle clusters, distributing and coordinating the execution of transformations and jobs across a collection of machines. In a production environment you should normally consider using Carte to execute Kettle transformations and jobs.

4. Kettle Building Blocks

4.1 Transformation

A transformation is where the ETL functionality is actually implemented: it consists of a number of steps connected by hops, which together form a complete ETL flow. The steps in a transformation execute in parallel; a step does not wait for the previous step to finish before it starts processing rows.

4.1.1 Step

Steps are the core of a transformation. They are connected by hops, and data (rows) flows from step to step through those hops.

4.1.2 Hop

A hop represents a channel between one step and the next; rows of data are passed between steps along this channel.

4.1.3 RowData

Within the steps of a transformation, data exists as rows and is passed from step to step. A row of data contains one or more fields; the supported field types include String, Number, Integer, BigNumber, Date, Boolean, and Binary.

4.2 Job

A job is the basic unit in which Kettle implements business logic; a job can be composed of multiple transformations to fulfill complex logical requirements. The "steps" in a job (which may be sub-jobs or transformations) execute serially: an entry cannot start until the previous one has finished executing and returned its result. This is different from the steps of a transformation, so pay attention to the distinction.

4.2.1 Job Entry

A job is composed of one or more job entries connected in a certain order by job hops; a job entry is the equivalent, within a job, of a step within a transformation.

4.2.2 Job Hop

Similar to a step hop, a job hop links job entries together to form a complete job. The main difference is that a job hop can carry a condition that decides whether the next job entry is entered. The possible conditions are: unconditional; enter the next entry when the previous result is true; enter the next entry when the previous result is false.

4.2.3 Job Entry Results

A job entry typically produces a job entry result, which mainly consists of the following:

  • A list of result rows: the rows in the result set produced by the job entry.
  • A list of file names: the collection of file names for files the job entry read.
  • The number of rows read, written, input, output, deleted, rejected, and in error by a transformation.

4.3 Database Connections

Kettle supports connections to a variety of relational databases. Connection types include Native (JDBC), ODBC, and JNDI; JDBC is the most commonly used.

If, when creating a new connection in Spoon, you are told that the JDBC driver cannot be found, it is because the Kettle installation package does not ship with the JDBC driver for that database. You can add the driver yourself and restart Spoon.
In Kettle 4 the JDBC driver directory is ${kettle_dir}/libext/jdbc/; in Kettle 5 and Kettle 6 it is ${kettle_dir}/lib/, where ${kettle_dir} denotes the directory Kettle was extracted to. A sketch of adding a driver is shown below.
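For example, a hedged sketch of adding a MySQL driver to a Kettle 5/6 installation; the jar file name is illustrative:

    # Copy the database vendor's JDBC driver jar into the Kettle lib directory (Kettle 5/6)
    cp mysql-connector-java-5.1.39.jar ${kettle_dir}/lib/

    # Restart Spoon so that the driver is picked up
    sh ${kettle_dir}/spoon.sh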

5. Kettle Common Components

5.1 Transformation Components

TODO: enumerate some common transformation components here, for example:

  • Table input
  • Table output
  • Text file input
  • Text file output

5.2 Job Components

TODO: list some common job components here.
