Kettle Usage Tutorial (I)

Source: Internet
Author: User

Kettle has three main components: Spoon, Kitchen, and Pan. Spoon is the graphical interface. On Windows, first set the environment variable PENTAHO_JAVA_HOME to your Java installation directory, for example C:\Program Files\java\jdk1.7.0_25. Java 1.6 or later will work. Then double-click Spoon.bat, and the interface appears as follows:
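
On Linux the same variable can be set in the shell before launching spoon.sh. The following is only a minimal sketch: the JDK path /usr/lib/jvm/java-7-openjdk is an assumed example, and check_java_home is a hypothetical helper that is not part of Kettle.

```shell
#!/bin/sh
# check_java_home is a hypothetical helper, not part of Kettle.
# It succeeds only if the given directory contains an executable bin/java.
check_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Assumed JDK location; replace it with your own installation directory.
PENTAHO_JAVA_HOME="${PENTAHO_JAVA_HOME:-/usr/lib/jvm/java-7-openjdk}"
export PENTAHO_JAVA_HOME

if check_java_home "$PENTAHO_JAVA_HOME"; then
    echo "PENTAHO_JAVA_HOME looks valid: $PENTAHO_JAVA_HOME"
else
    echo "No bin/java found under $PENTAHO_JAVA_HOME" >&2
fi
```

Checking for bin/java before starting Spoon catches a mistyped path early, instead of leaving you to work out why the interface did not open.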

Here I set up a database repository. A repository can also be stored as files, in XML format, but I find a database repository better: jobs and related information are easier to inspect, because database tables are much more readable than XML. To create either a database repository or a file repository, click the small plus sign in the upper right corner, and the following interface appears:


The first option creates a database repository. Then:


Next:

After the connection test succeeds, click OK to return to the previous interface. This time select the "test" database connection, then enter the ID and name of your repository (I think of it as a project). Remember these values, because Kitchen needs them as parameters when you schedule jobs later.


Click "Yes" in the dialog box that pops up next, and the following interface appears:

This step creates many tables under your user, so it is best to create a dedicated user for the repository. That is the approach for Oracle; under MySQL and DB2 it is likewise best to keep the repository separate from your other databases. Check:

SQL> conn wings/wings@prism
Connected.
SQL> select count(1) from r_repository_log;

  COUNT(1)
----------
         0

SQL>

The tables have been created. Back in the initial interface, select the "test" repository and click OK. A login dialog appears; the default username and password are both admin, and you can change them yourself.

The next step is to start using the tool.

For simple database data extraction, you basically need only two things: transformations and jobs. Here's how to create a transformation:

1 Click File > New > Transformation.

2 Select "Main Object Tree" in the tree-like list on the left and create new DB connections. The steps are the same as for the repository above. Create one connection for the source database and one for the target database.

3 Under "Core Objects", drag a "Table input" from the "Input" directory, a "Table output" from the "Output" directory, and a "Select values" (field selection) step from the "Transform" directory, as shown in the figure:


Each object can be double-clicked to modify its properties. The following example extracts the city table of the world database.

Double-click the table input, select the source database as the database connection, then click "Get SQL select statement" and choose the table in the dialog box. It will then look like this:


Click the table output below:


Click on the field selection:


This completes a simple transformation that extracts data. To execute it, click the green Start button at the top.

I am still learning too, and I hope sharing my experience helps other beginners like me.

Here are the supplemental sections:

After a job or a transformation is created, you can set it up as a scheduled task. With DS, the DS client natively supports scheduling, but Kettle has no concept of server and client, so it relies on Linux crontab. A job also supports timing on its own, but then you have to keep the graphical interface open at all times, so that is not as good as crontab. Using Kettle on the command line is simple: jobs are run with Kitchen, and transformations are run with Pan.

The following is a Kitchen command:

bash /home/kettle/data-integration/kitchen.sh /rep kettle_demo /user username /pass passwd /level Minimal /dir /dirname /job JobName

For rep, write the name of your own repository.
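
When the command is run unattended, it helps to keep its output and check its exit status. The sketch below wraps the Kitchen command in a small function. The name run_job, the log file path, and the rule that any non-zero exit status means failure are assumptions made for illustration, not part of Kettle.

```shell
#!/bin/sh
# Path assumed from the example command in this tutorial; adjust to your installation.
KITCHEN="${KITCHEN:-/home/kettle/data-integration/kitchen.sh}"

# run_job REPO USER PASS DIR JOB LOGFILE  (hypothetical helper, not part of Kettle)
# Runs one repository job, appends its output to LOGFILE,
# and returns Kitchen's exit status (non-zero is treated as failure here).
run_job() {
    bash "$KITCHEN" /rep "$1" /user "$2" /pass "$3" /level Minimal \
        /dir "$4" /job "$5" >> "$6" 2>&1
}

# Example call, executed only when Kitchen actually exists at that path.
if [ -f "$KITCHEN" ]; then
    if run_job kettle_demo username passwd /dirname JobName /tmp/JobName.log; then
        echo "JobName finished"
    else
        echo "JobName failed, see /tmp/JobName.log" >&2
    fi
fi
```

Keeping the repository name, user, and job name as arguments means the same function can serve several jobs without copying the long command line.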

A transformation is run the same way, with slight differences:

bash /home/kettle/data-integration/pan.sh /rep kettle_demo /user username /pass passwd /level Minimal /dir /dirname /trans Transname
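
To schedule either command with crontab, one line per task is added to the user's crontab. As a rough sketch, the helper below prints such a line for a daily run. The function cron_line and the wrapper script /home/kettle/run_job.sh (assumed to contain the Kitchen command shown earlier) are hypothetical names used only for illustration.

```shell
#!/bin/sh
# cron_line MINUTE HOUR COMMAND  (hypothetical helper)
# Prints a crontab entry that runs COMMAND every day at HOUR:MINUTE.
cron_line() {
    printf '%s %s * * * %s\n' "$1" "$2" "$3"
}

# Assumed wrapper script and log paths; replace them with your own.
cron_line 30 2 "/home/kettle/run_job.sh >> /home/kettle/run_job.log 2>&1"
```

The printed line can be pasted into the editor opened by crontab -e. The five leading fields are minute, hour, day of month, month, and day of week, so this entry would run the script at 02:30 every day.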

