Kettle itself has three main components: Spoon, Kitchen, and Pan. Spoon is the graphical interface. On Windows, first set the environment variable PENTAHO_JAVA_HOME to your Java installation directory, for example C:\Program Files\Java\jdk1.7.0_25; any version from 1.6 up will do. Then double-click Spoon.bat, and the interface looks as follows:
Here I set up a repository. Your work can in fact also be stored as files, in XML format, but I still think a database repository is better: inspecting jobs and the like is simpler, because database tables are much more readable than XML. To create a database repository or a file repository, just click the small plus sign in the upper right corner, and the following interface appears:
The first option creates a database repository. Next:
Then:
After the connection test succeeds, click OK to return to the initial dialog. This time select the test database connection, then enter the ID and name of your project (as I call it). Remember these, because you will need to pass the name as a parameter when scheduling with Kitchen later.
Click "Yes" in the dialog that pops up next, and the following interface appears:
This step creates many tables under your user, so it is best to create a dedicated user for the repository. That applies to Oracle; under MySQL and DB2 it is best to take the same approach and keep the repository separate from your other databases. To check:
SQL> conn Wings/wings@prism
Connected.
SQL> select count(1) from R_REPOSITORY_LOG;
  COUNT(1)
----------
         0
SQL>
The tables have been created. Back in the initial dialog, select Test and click OK. A login dialog then appears; the default user name and password are both admin, and you can change them yourself.
The next step is to start using the tool.
In fact, for simple database extraction you basically need only two things: transformations and jobs. Here is how to create a transformation:
1 Click File, then New, then Transformation.
2 In the tree on the left, select "Main Object Tree" and create the new DB connections, following the same steps as for the repository above: one for the source database and one for the target database.
3 Under "Core Objects", drag a "Table input" from the "Input" folder, a "Table output" from the "Output" folder, and a field selection ("Select values") from the "Transform" folder onto the canvas, as shown in the figure:
Double-click any object to edit its properties. The following example extracts the city table from the World database.
Double-click the table input, choose the source database as the database connection, and click "Get SQL select statement". Pick the table in the dialog that opens, and the step then looks like this:
Next, click the table output:
Click on the field selection:
That completes a simple transformation that extracts data. To run it, click the green Start button at the top.
I am still learning myself, and I hope sharing my experience helps other beginners like me.
Here is a supplementary section:
Once a job or a transformation has been created, you can set it up as a scheduled task. With DS, the DS client natively supports scheduling, but Kettle has no concept of a server and a client, so it relies on Linux crontab. The job itself does support timed execution, but you have to keep the graphical interface open the whole time, so it is not as good as crontab. Using Kettle on the command line is simple: jobs are scheduled with Kitchen, and transformations with Pan.
The following is a Kitchen scheduling command:
bash /home/kettle/data-integration/kitchen.sh /rep kettle_demo /user username /pass passwd /level Minimal /dir /dirname /job JobName
For rep, supply your own repository name.
Transformations work the same way, with a slightly different command:
bash /home/kettle/data-integration/pan.sh /rep kettle_demo /user username /pass passwd /level Minimal /dir /dirname /trans Transname
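The crontab approach described above could be sketched roughly as follows. This is only an illustration, not something from my own setup: the function name run_job, the KETTLE_USER and KETTLE_PASS variables, the log handling, and the script path in the crontab comment are all assumptions, and it presumes that kitchen.sh returns a non-zero exit status when a job fails.

```shell
#!/usr/bin/env bash
# Sketch of a cron wrapper around kitchen.sh. All values below are placeholders.
KETTLE_DIR="${KETTLE_DIR:-/home/kettle/data-integration}"   # install path used in this article
KETTLE_USER="${KETTLE_USER:-username}"                      # repository user (placeholder)
KETTLE_PASS="${KETTLE_PASS:-passwd}"                        # repository password (placeholder)

# run_job LOGFILE REPOSITORY DIRECTORY JOBNAME
# Runs one repository job through kitchen.sh, appends all output to LOGFILE,
# and returns kitchen.sh's exit status (assumed to be non-zero on failure).
run_job() {
  local log="$1" rep="$2" dir="$3" job="$4" status=0
  echo "$(date '+%Y-%m-%d %H:%M:%S') starting job $job" >> "$log"
  bash "$KETTLE_DIR/kitchen.sh" /rep "$rep" /user "$KETTLE_USER" /pass "$KETTLE_PASS" \
       /level Minimal /dir "$dir" /job "$job" >> "$log" 2>&1 || status=$?
  if [ "$status" -ne 0 ]; then
    echo "job $job failed with exit status $status" >> "$log"
  fi
  return "$status"
}

# Example crontab entry (hypothetical script path), running the job daily at 02:00:
# 0 2 * * * /home/kettle/run_job.sh
```

To use it from cron, one would save this as a script and end it with a call such as run_job /home/kettle/logs/job.log kettle_demo /dirname JobName, where the log path is again a placeholder. Writing everything to a log file matters here because cron runs without a terminal, so errors would otherwise go unnoticed.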