One, development environment and production environment
For example, suppose you design a process on Windows or macOS, upload the design file to a Linux cluster, and execute it there. The Windows or macOS machine where the design work is done is the development environment, and the Linux machine where the task actually runs is the production environment.
Two, Kettle transformations
A transformation consists of one or more steps connected by hops. A hop defines a one-way channel that lets data flow from one step to another. In Kettle, the unit of data is the row, and the data flow is the movement of rows from one step to the next.
Step: The basic building block of a transformation, shown graphically as an icon (for example, Table input or Text file output). A step writes data to one or more of the output hops connected to it, and the data is then delivered to the step at the other end of each hop. In the interface a hop is drawn as a line with an arrow between two steps; internally, what sits between the two steps is a cache of data rows called a rowset. (The size of the rowset can be defined in the transformation.)
When a step has several output hops, its rows can be sent in one of two ways. Distribute: rows are sent to the output hops in turn, so each row goes to only one hop. Copy: every row is sent to all output hops. (SHIFT + left mouse button quickly creates a new hop.)
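The two sending modes can be illustrated with a minimal Python sketch. This is not Kettle code; the function names `distribute` and `copy_rows` and the sample rows are made up for the illustration, and "in turn" is assumed to mean simple round-robin.

```python
from itertools import cycle


def distribute(rows, n_hops):
    """Send rows to the output hops in turn: each row lands on exactly one hop."""
    hops = [[] for _ in range(n_hops)]
    for row, target in zip(rows, cycle(range(n_hops))):
        hops[target].append(row)
    return hops


def copy_rows(rows, n_hops):
    """Send every row to every output hop."""
    rows = list(rows)
    return [list(rows) for _ in range(n_hops)]


# Hypothetical sample data: five rows with a single field.
rows = [{"id": i} for i in range(1, 6)]
distributed = distribute(rows, 2)
copied = copy_rows(rows, 2)

print([r["id"] for r in distributed[0]], [r["id"] for r in distributed[1]])
print([len(hop) for hop in copied])
```

With two output hops, distributing splits the five rows between the hops, while copying gives each hop all five rows.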
In Kettle, all steps execute concurrently. When a transformation starts, all of its steps start at the same time; each step reads rows from its input hops and writes the processed rows to its output hops. A step finishes when its input hops have no more data, and the whole transformation finishes when all of its steps have finished.
Data rows: A data row is a collection of zero or more fields.
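The concurrent execution model can be sketched in Python with one thread per step and a bounded queue standing in for each rowset. This is only an analogy under stated assumptions: the step names (`table_input`, `add_double`, `collect_output`), the `DONE` sentinel, and the rowset size of 2 are all invented for the example and do not reflect Kettle's internal implementation.

```python
import queue
import threading

DONE = object()  # sentinel meaning "the upstream step has no more rows"


def table_input(out_hop):
    """Source step: produces rows and then signals the end of data."""
    for i in range(5):
        out_hop.put({"id": i})
    out_hop.put(DONE)


def add_double(in_hop, out_hop):
    """Middle step: reads rows, adds a field, and writes them onward."""
    while True:
        row = in_hop.get()
        if row is DONE:
            out_hop.put(DONE)
            break
        out_hop.put({**row, "double": row["id"] * 2})


def collect_output(in_hop, sink):
    """Final step: stores rows until its input hop runs dry."""
    while True:
        row = in_hop.get()
        if row is DONE:
            break
        sink.append(row)


def run_transformation(rowset_size=2):
    # Each hop is backed by a bounded buffer, playing the role of a rowset.
    hop1 = queue.Queue(maxsize=rowset_size)
    hop2 = queue.Queue(maxsize=rowset_size)
    sink = []
    steps = [
        threading.Thread(target=table_input, args=(hop1,)),
        threading.Thread(target=add_double, args=(hop1, hop2)),
        threading.Thread(target=collect_output, args=(hop2, sink)),
    ]
    for step in steps:  # all steps start at the same time
        step.start()
    for step in steps:  # the transformation ends when every step has ended
        step.join()
    return sink


result_rows = run_transformation()
print(result_rows)
```

Because the queues are bounded, a fast step blocks when the rowset in front of it is full, so no step can run arbitrarily far ahead of the step that consumes its rows.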
Three, Kettle jobs
A job consists of one or more job items, and the job items are executed in some order.
Job item: Like a step in a transformation, a job item is shown graphically as an icon. A result object can be passed between job items. The result object can contain data rows, but they are not passed as a stream; instead, a job item must finish executing before its result is passed to the next job item. By default, all job items are executed serially.
Job hops: The connections between job items are called job hops. The result of each job item determines which execution path the job takes next. A job hop evaluates the result of the preceding job item in one of three ways:
1, unconditional execution: the next job item executes regardless of whether the previous job item succeeded. Shown as a black line with a lock icon.
2, when the run result is true: the next job item executes only if the previous job item succeeded. Shown as a green line with a check mark.
3, when the run result is false: the next job item executes only if the previous job item failed. Shown as a red line with a red stop icon.
Kettle uses a backtracking algorithm to execute the job items. When a node on a path in the job is executed, all of that node's child paths are executed in sequence until no executable sub-path remains; execution then returns to the node's predecessor, and the process repeats.
Note: The hops defined in a job are control flow, while the hops defined in a transformation are data flow.
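The three hop conditions and the backtracking order can be sketched as a small depth-first traversal in Python. This is an illustration, not Kettle's scheduler: the job item names, the `"always"`/`"true"`/`"false"` labels, and the `run_job` function are assumptions made for the example, and the sketch does not handle loops in the job graph.

```python
def run_job(items, hops, start):
    """Run job items depth-first, following only hops whose condition matches.

    items: name -> callable returning True (success) or False (failure)
    hops:  name -> list of (condition, next_name) pairs, where condition is
           "always", "true", or "false"
    """
    executed = []

    def visit(name):
        result = items[name]()  # a job item finishes before anything follows it
        executed.append(name)
        for condition, next_name in hops.get(name, []):
            follow = (
                condition == "always"
                or (condition == "true" and result)
                or (condition == "false" and not result)
            )
            if follow:
                # Follow this path to its end, then come back for the next hop.
                visit(next_name)

    visit(start)
    return executed


# Hypothetical job: load_data fails, so only the "false" hop after it is taken.
items = {
    "START": lambda: True,
    "load_data": lambda: False,
    "send_success_mail": lambda: True,
    "send_failure_mail": lambda: True,
    "cleanup": lambda: True,
}
hops = {
    "START": [("always", "load_data"), ("always", "cleanup")],
    "load_data": [("true", "send_success_mail"), ("false", "send_failure_mail")],
}

order = run_job(items, hops, "START")
print(order)
```

Here the path under `load_data` is exhausted first; only then does execution return to `START` and continue with `cleanup`.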
Four, Kettle tools
Spoon: A graphical interface tool for quickly designing and maintaining complex ETL workflows.
Kitchen: A command-line tool for running jobs.
Pan: A command-line tool for running transformations.
Carte: A lightweight web server that can be used to execute transformations or jobs remotely.
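As a rough sketch of how the command-line tools fit together, the Python helper below picks Kitchen for job files (`.kjb`) and Pan for transformation files (`.ktr`) and builds the matching command line. The helper name `build_command` and the file path are hypothetical. The `-file=` and `-level=` spelling follows the form commonly used with the Linux shell scripts and may differ by Kettle version or platform, so check the documentation for your installation.

```python
import shlex


def build_command(design_file, level="Basic"):
    """Build a Kitchen or Pan command line for a job or transformation file."""
    if design_file.endswith(".kjb"):    # job file -> Kitchen
        tool = "kitchen.sh"
    elif design_file.endswith(".ktr"):  # transformation file -> Pan
        tool = "pan.sh"
    else:
        raise ValueError(f"not a Kettle job or transformation file: {design_file}")
    # Assumed option spelling; verify against your Kettle version.
    return [f"./{tool}", f"-file={design_file}", f"-level={level}"]


# Hypothetical path on the production machine.
job_cmd = build_command("/opt/etl/load_sales.kjb")
print(shlex.join(job_cmd))

# To actually run it from the Kettle installation directory, something like:
#   subprocess.run(job_cmd, cwd="/path/to/data-integration", check=True)
```

Building the command as a list rather than a single string keeps paths with spaces intact if the command is later passed to `subprocess.run`.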
Five, version naming rules
GA (general availability) releases: Stable release versions.
Release candidates: Candidate versions, named like ...-rcxx
Milestone releases: The latest milestone versions, which include some new features, named like ...-mxx
Nightly builds: Versions built each day; the latest and also the most unstable.
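The suffix rules above can be sketched as a small classifier in Python. The function name and the sample version strings are hypothetical. Treating a version with no suffix as GA is an assumption of this sketch, and nightly builds are not recognized because the naming rules give no pattern for them.

```python
import re

# Matches a trailing "-rcNN" or "-mNN" suffix, case-insensitively.
SUFFIX = re.compile(r"-(rc|m)(\d+)$", re.IGNORECASE)


def classify_release(version):
    """Classify a version string by its suffix (rc = candidate, m = milestone)."""
    match = SUFFIX.search(version)
    if match is None:
        # Assumption: no suffix at all means a GA release.
        return "GA" if "-" not in version else "unknown"
    kind = match.group(1).lower()
    return "release candidate" if kind == "rc" else "milestone"


# Hypothetical version strings, for illustration only.
for v in ["1.2.3", "1.2.3-RC1", "1.2.3-M2", "1.2.3-foo"]:
    print(v, "->", classify_release(v))
```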
Summary: Spoon is Kettle's integrated development environment; jobs and transformations are designed in Spoon. They can also be run from its graphical interface, but this is done only during the development, testing, and debugging phases. Once development is complete, the jobs and transformations need to be deployed to the actual runtime environment, and Spoon is rarely used in the deployment phase.
Kettle Basic Concept Learning