ETL Learning Series 1--etl Tool installation

Source: Internet
Author: User

ETL (extract-transform-load abbreviation, that is, data extraction, transformation, loading process), for enterprise or industry applications, we often encounter a variety of data processing, conversion, migration, so understand and master the use of an ETL tool, essential, Here I introduce a I used in the work of 3 years of ETL tools kettle, the spirit of good things not only enjoy the idea of sharing the collision with everyone to exchange a bit! In use I feel this tool is really powerful, support graphical GUI design interface, and then can flow in the form of workflow, in doing some simple or complex data extraction, quality testing, data cleansing, data conversion, data filtering and other aspects have a relatively stable performance, the most important we use it skillfully, It has reduced a lot of research and development work and improved our productivity, but the only regret for my. NET developers is that the tool is written in Java.

1. Kettle Concept

Kettle is a foreign open source ETL tool, written in Java, can run on Windows, Linux, UNIX, green without installation, data extraction efficient and stable.

Kettle Chinese name is the kettle, the project's main programmer, Matt, wants to put all kinds of data into a pot and then flow out in a specified format.

Kettle This ETL toolset, which allows you to manage data from different databases by providing a graphical user environment to describe what you want to do, rather than what you want to do.

There are two script files in Kettle, transformation and job,transformation complete the fundamental transformation of the data, and the job completes the control of the entire workflow.

2. Download and deploy

? Kettle can be downloaded from the http://kettle.pentaho.org/website?? Download the kettle package, as the kettle is green software, unzip to any local path can

3, Kettle Environment configuration (with the Java environment directly ignore this section)

3, 1 installing Java JDK

1) first to the official Web download corresponding JDK package, JDK1.5 or above version on the line;

2) Install the JDK;

3) Configure the environment variables, with configuration method:

After the installation is complete, it is necessary to configure it to use, first set some environment variables, for Java, the most necessary environment variables are the system path variable path.

(1) To open the Settings window for the environment variable. Right-click My Computer and select the Properties option from the popup shortcut menu to go to the System Properties dialog box. Select the "Advanced" tab, go to the "Advanced" tab, click the "Environment variables" button, and go to the "Environment variables" dialog box:

(2) in the "Administrator User variables" list box, select the variable path, and then click the Edit button when the row becomes highlighted.

(3) in the "Edit System Variables" dialog box that pops up, set the bin directory path under the JDK installation path to the PATH variable.

After editing, click the "OK" button to save, the environment variable path setting is formally completed.

Note: Setting the path of the path variable must be the bin directory in the JDK installation directory, and sometimes the installation directory of the JRE will be available on the same level as the JDK installation directory, so carefully select the relevant path and avoid setting the path to the bin directory under the JRE directory.

3, 2 Test the JDK configuration is successful

Once you've set up your environment variables, you can test the variables you just set up and see if Java can run.

(1) Click the Start button, select the Run option, and enter the cmd command in the Run dialog box.

(2) After clicking the "OK" button, open the Command Line window.

(3) At the cursor input: Javac command, press the ENTER key to execute, you can see the test results

3, 3 running Kettle

into the kettle directory, if Kettle is deployed in a Windows environment, double-clicking the run Spoon.bat or Kettle.exe file, the following interface appears: Configure the environment this piece is basically done. 4. Kettle Usage and Components introduction ? 4.1 Kettle UseKettle provides a way to integrate all of the work, but because of the resource pool portability is not convenient, so we choose no resource pool; 1) Create a new transformation, click Save to Local path, for example, save to D:/etltest, Save file named Etltesttrans,kettle default transformation file after saving suffix named ktr;2) create a new job, click Save to Local path, for example save to D:/etltest, save file named Etltestjob, Kettle The default job file is saved with the suffix named KJB; 4.2 Introduction to the component Tree

The main tree menu lists the basic properties of a transformation that can be viewed through individual nodes.

DB connection: Displays the database connection in the current transformation, and each transformation database connection needs to be configured separately.

Steps: A list of links that are applied to a transformation

Hops: A list of node connections that are applied to a transformation

The Core Objects menu lists the links that can be called in transformation, which can be added by dragging the mouse.

Input: Import link

Output: Export link

Lookup: Query link

Transform: Conversion link

Joins: Connection link

Scripting: Script Link

4.3 Transformation Conversion Introduction? Each link can be dragged by the mouse to add links to the main window.? and can be shift+ by dragging the mouse to achieve the link between links. Conversion Common Links Introduction

Category

Link name

Function description

Input

Text file input

Entering data from a local text file

Table input

Entering data from a database table

Get System Information

Read System Information input data

Output

Text file output

Output processing results to a text file

Table Output

Output processing results to a database table

Insert/Update

Inserts an update on the database table model based on the results of the processing, or an update if no related records exist in the database. Will be judged based on the fields in the query criteria

Update

Update the database according to the processing result, if the data that need to be updated is not recorded in the database table, the error will stop.

Delete

Database records are deleted based on processing results, and if data that needs to be deleted is not recorded in the database table, an error is stopped

Lookup

Database queries

Query the target table according to the set query criteria and return the required result fields

Stream query

Reads the target table into memory and queries the in-memory dataset with query criteria

Call DB Stored Procedure

Calling a database stored procedure

Transform

Field Selection

Select the fields you want, filter out the fields you want, or do database fields

Filter Records

Classify records according to criteria

Sort records

Sort the data according to a condition

Empty operation

No action

Add constant

Increase the required constant fields

Scripting

Modified Java Script Value

Extend functionality, write JavaScript scripts, and process data accordingly

Mapping

Mapping (sub-transformations)

Data mapping

Job

Sat Variables

Setting environment variables

Get Variables

Get Environment variables

4.3 Job Task Description

The main tree menu lists the basic properties of a job that can be viewed through individual nodes.

DB connection: Displays the database connection in the current job, and the database connection for each job needs to be configured separately.

Job Entries: A list of links referenced in a job

         

The Job Entries menu lists the list of links that can be called in the job, which can be added by dragging the mouse.

Each link can be dragged through the mouse to add links to the main window.

And through the shift+ mouse drag, to achieve the link between links.

? Introduction of common links

Category

Link name

Function description

Job entries

START

Begin

DUMMY

End

Transformation

Referencing the transformation process

Job

Reference Job Process

Shell

Invoking the shell script

Sql

Execute SQL statement

Ftp

Download via FTP

Table exists

Checks if the target table exists, returns a Boolean value

File exists

Checks if a file exists, returns a Boolean value

Javascript

Executing JavaScript scripts

Create file

Create a file

Delete file

deleting files

Wait for file

Wait for the file, after the file appears, continue to the next link

File Compare

File comparison, returns a Boolean value

Wait for

Wait time, set a period of time, kettle process is waiting state

Zip file

Compress files as Zip packages

Transfer from: http://www.cnblogs.com/limengqiang/archive/2013/01/16/KettleApply1.html

ETL Learning Series 1--etl Tool installation

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.