ETL Sharp Weapon Kettle Practical application

Source: Internet
Author: User
Tags constant save file

The main indexes of this article series are as follows:

First, ETL sharp weapon Kettle Practical Application Analysis Series one "Kettle Use introduction"

Second, ETL sharp weapon Kettle Practical Application Analysis Series two "application Scenarios and actual combat demo Download"

Three, ETL sharp weapon Kettle Practical Application Analysis Series three "ETL background Process Execution configuration mode"

The main reading of this article is as follows:

1. Kettle Concept

2. Download and deploy

3, Kettle Environment configuration

4. Kettle Usage and Components introduction

ETL (extract-transform-load abbreviation, that is, data extraction, transformation, loading process), for enterprise or industry applications, we often encounter a variety of data processing, conversion, migration, so understand and master the use of an ETL tool, essential, Here I introduce a I used in the work of 3 years of ETL tools kettle, the spirit of good things not only enjoy the idea of sharing the collision with everyone to exchange a bit. In use I feel this tool is really powerful, support graphical GUI design interface, and then can flow in the form of workflow, in doing some simple or complex data extraction, quality testing, data cleansing, data conversion, data filtering and other aspects have a relatively stable performance, the most important we use it skillfully, It has reduced a lot of research and development work and improved our productivity, but the only regret for my. NET developers is that the tool is written in Java.

1. Kettle Concept

Kettle is a foreign open source ETL tool, written in Java, can run on Windows, Linux, UNIX, green without installation, data extraction efficient and stable.

Kettle Chinese name is the kettle, the project's main programmer, Matt, wants to put all kinds of data into a pot and then flow out in a specified format.

Kettle This ETL toolset, which allows you to manage data from different databases by providing a graphical user environment to describe what you want to do, rather than what you want to do.

There are two script files in Kettle, transformation and job,transformation complete the fundamental transformation of the data, and the job completes the control of the entire workflow.

2. Download and deploy kettle can download Kettle compression package on http://kettle.pentaho.org/website, because Kettle is green software, unzip to any local path

3, Kettle Environment configuration (with the Java environment directly ignore this section)

3, 1 installing Java JDK

1) first to the official Web download corresponding JDK package, JDK1.5 or above version on the line;

2) Install the JDK;

3) Configure the environment variables, with configuration method:

After the installation is complete, it is necessary to configure it to use, first set some environment variables, for Java, the most necessary environment variables are the system path variable path.

(1) To open the Settings window for the environment variable. Right-click My Computer and in the shortcut menu that pops up, choose the Properties option and go to the System Properties dialog box, as shown in the figure. Select the "Advanced" tab, go to the "Advanced" tab, and click the "Environment variables" button to enter the "Environment variables" dialog box, as shown in the figure:

(2) In the Administrator User variables list box, select the variable path and click the Edit button when the row is highlighted, as shown in the figure.

(3) in the "Edit System Variables" dialog box that pops up, set the bin directory path under the JDK installation path to the path variable, as shown in the figure.

After editing, click the "OK" button to save, the environment variable path setting is formally completed.

Note: Setting the path of the path variable must be the bin directory in the JDK installation directory, and sometimes the installation directory of the JRE will be available on the same level as the JDK installation directory, so carefully select the relevant path and avoid setting the path to the bin directory under the JRE directory.

3, 2 Test the JDK configuration is successful

Once you've set up your environment variables, you can test the variables you just set up and see if Java can run.

(1) Click the Start button, select the Run option, and enter the cmd command in the Run dialog box.

(2) After clicking the "OK" button, open the Command Line window.

(3) At the cursor input: Javac command, press the ENTER key to execute, you can see the test results

3, 3 run kettle into the kettle directory, if Kettle deployed in the Windows environment, double-click Run Spoon.bat or Kettle.exe file, the following interface appears: So the configuration environment this piece is basically finished. 4. Kettle Usage and Component Introduction 4.1 kettle uses kettle to provide a way to integrate all the work, but because the resource pool is not easy to transplant, so we choose no resource pool; 1) Create a new transformation and click Save to Local Path , for example, save to D:/etltest, save the file name etltesttrans,kettle default transformation file is saved after the suffix named ktr; 2) Create a new job, click Save to Local path, for example save to D:/etltest , save file name etltestjob,kettle default job file after saving suffix named KJB; 4.2 Component Tree Introduction

The main tree menu lists the basic properties of a transformation that can be viewed through individual nodes.

DB connection: Displays the database connection in the current transformation, and each transformation database connection needs to be configured separately.

Steps: A list of links that are applied to a transformation

Hops: A list of node connections that are applied to a transformation

The Core Objects menu lists the links that can be called in transformation, which can be added by dragging the mouse.

Input: Import link

Output: Export link

Lookup: Query link

Transform: Conversion link

Joins: Connection link

Scripting: Script Link

4.3 Transformation Conversion Introduction Each link can be added to the main window by dragging the mouse. And through the shift+ mouse drag, to achieve the link between links. Introduction to the common link of conversion

Category

Link name

Function description

Input

Text file input

Entering data from a local text file

Table input

Entering data from a database table

Get System Information

Read System Information input data

Output

Text file output

Output processing results to a text file

Table Output

Output processing results to a database table

Insert/Update

Inserts an update on the database table model based on the results of the processing, or an update if no related records exist in the database. Will be judged based on the fields in the query criteria

Update

Update the database according to the processing result, if the data that need to be updated is not recorded in the database table, the error will stop.

Delete

Database records are deleted based on processing results, and if data that needs to be deleted is not recorded in the database table, an error is stopped

Lookup

Database queries

Query the target table according to the set query criteria and return the required result fields

Stream query

Reads the target table into memory and queries the in-memory dataset with query criteria

Call DB Stored Procedure

Calling a database stored procedure

Transform

Field Selection

Select the fields you want, filter out the fields you want, or do database fields

Filter Records

Classify records according to criteria

Sort records

Sort the data according to a condition

Empty operation

No action

Add constant

Increase the required constant fields

Scripting

Modified Java Script Value

Extend functionality, write JavaScript scripts, and process data accordingly

Mapping

Mapping (sub-transformations)

Data mapping

Job

Sat Variables

Setting environment variables

Get Variables

Get Environment variables

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.