Kettle Basic Concept Learning

One, development environment and production environment

For example, you design a process under Windows or macOS, then upload the design file to a Linux cluster and execute it there. The work done under Windows is the development environment; the Linux machine where the task is executed is the production environment.

Two, Kettle transformations

A transformation consists of one or more steps, connected by hops. A hop defines a one-way channel that allows data to flow from one step to another. In Kettle, the unit of data is the row, and data flow is the movement of data rows from one step to another.

Step: the basic building block of a transformation, displayed as an icon (for example, Table Input or Text File Output). A step writes its data to one or more output hops connected to it, and the data is then read by the step at the other end of each hop. On the canvas a hop appears as an arrow between two steps, but between those two steps it is actually a rowset, a cache of data rows. (The size of the rowset can be configured in the transformation settings.)

A step's data sending mode can be set to distribute or copy. Distribute: data rows are sent to each output hop in turn. Copy: all data rows are sent to all output hops. (Hold SHIFT and drag with the left mouse button to quickly create a new hop.)

In Kettle, all steps execute concurrently: when the transformation starts, every step starts at the same time, reads data from its input hops, and writes processed data to its output hops until there is no more data on the input hops, at which point the step stops. When all steps have stopped, the entire transformation stops. Data rows: a data row is a collection of zero or more fields.
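A minimal sketch of this execution model, using Python threads and bounded queues to stand in for steps and rowsets. This is only an analogy, not Kettle's actual implementation; the step names and rowset size below are assumptions for illustration.

import threading, queue

ROWSET_SIZE = 4          # analogous to the configurable rowset size
DONE = object()          # sentinel meaning "no more rows on this hop"

def table_input(out_hop):
    """Producer step: emits rows into its output hop, then signals done."""
    for i in range(10):
        out_hop.put({"id": i})      # blocks when the rowset is full
    out_hop.put(DONE)

def text_file_output(in_hop):
    """Consumer step: reads rows from its input hop until no data remains."""
    while True:
        row = in_hop.get()
        if row is DONE:
            break
        print("writing row:", row)

# The hop between the two steps is a bounded buffer (the rowset).
hop = queue.Queue(maxsize=ROWSET_SIZE)

# All steps start at the same time and run concurrently.
steps = [threading.Thread(target=table_input, args=(hop,)),
         threading.Thread(target=text_file_output, args=(hop,))]
for s in steps: s.start()
for s in steps: s.join()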

Three, Kettle jobs

A job consists of one or more job entries, which are executed in a certain order.

Job entry: similar to a step in a transformation, a job entry is also displayed graphically as an icon. A result object can be passed between job entries. The result object contains rows of data, but they are not passed as a stream; instead, a job entry must finish executing before its result is passed to the next job entry. By default, all job entries are executed serially.

Job hops: the connections between job entries are called job hops. The execution result of each job entry determines which execution path the job follows. The result of a job entry is evaluated as follows:

1, Unconditional execution: the next job entry executes regardless of whether the previous job entry succeeded or failed. Marked as a black line with a lock icon.

2, Execute when the result is true: marked as a green line with a check mark icon.

3, Execute when the result is false: marked as a red line with a red stop icon.

Kettle uses a backtracking algorithm to execute all job entries: when a node on a path in the job is executed, all of its child paths are executed in turn until no more sub-paths remain, then execution returns to that node's parent and the process repeats.
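A rough illustration of this depth-first (backtracking) traversal as a small Python sketch, not Kettle's real Java implementation; the job entry names and hop layout below are made-up examples, and the hop conditions correspond to the three cases listed above.

def run_entry(name):
    """Placeholder for actually running a job entry; returns success/failure."""
    print("running", name)
    return True

def execute(entry, hops):
    """Depth-first execution: run this entry, then follow each outgoing hop
    whose condition matches the result, one child path at a time,
    backtracking to this node between children."""
    result = run_entry(entry)
    for child, condition in hops.get(entry, []):
        if (condition == "unconditional"
                or (condition == "on_true" and result)
                or (condition == "on_false" and not result)):
            execute(child, hops)

# Hypothetical job layout: START -> load_data -> (mail_ok | mail_fail)
hops = {
    "START":     [("load_data", "unconditional")],
    "load_data": [("mail_ok", "on_true"), ("mail_fail", "on_false")],
}
execute("START", hops)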

Note: hops defined in a job represent control flow, while hops defined in a transformation represent data flow.

Four, Kettle tools

Spoon: a graphical interface tool for quickly designing and maintaining complex ETL workflows.

Kitchen: a command-line tool for running jobs.

Pan: a command-line tool for running transformations.

Carte: a lightweight web server that can be used to run transformations or jobs remotely.
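As a rough illustration, typical invocations of these command-line tools from the Kettle installation directory look like the lines below. The file paths, log level, and port are placeholders, and the script names vary by platform (.sh on Linux, .bat on Windows).

./pan.sh -file=/path/to/my_transformation.ktr -level=Basic
./kitchen.sh -file=/path/to/my_job.kjb -level=Basic
./carte.sh localhost 8081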

Five, version naming rules

GA (general availability) releases: stable release versions.

Release candidates: candidate versions, with names such as ...-rcxx.

Milestone releases: the latest milestone versions, which include some new features; named such as ...-mxx.

Nightly builds: built each day; the newest and least stable versions.

Summary: Spoon is Kettle's integrated development environment; that is, jobs and transformations are designed in Spoon. Jobs and transformations can also be run from the graphical interface, but only during the development, testing, and debugging phases. Once development is complete, they need to be deployed to the actual running environment, and Spoon is rarely used in that deployment phase.
