Recently, a project required me to start working with Kettle. Below I summarize what I learned over the past two weeks while using Kettle to develop a job, and share it with you.
I. What is Kettle?
Kettle is an ETL tool used to manage data from different data sources and move it through a defined flow. Its most common use is transferring data between different systems, and you build the transformations and jobs that do this work in Kettle. Kettle is written in pure Java, so it runs wherever a Java runtime is available and fits naturally into Java environments.
Kettle consists of four parts: Spoon, Pan, Kitchen, and Chef. This summary mainly covers Spoon and Kitchen, which are the most widely used. Spoon is the core graphical designer: you build a series of data-flow conversions by dragging components onto the canvas and configuring them. Kitchen is the command-line tool for running jobs. It is typically called from a BAT file so that jobs can be run in batches, for example by a Windows scheduled task.
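As a minimal sketch of the Kitchen side, the Python snippet below builds the command line that such a BAT file would contain for a job stored in a database repository. It assumes Kitchen's Windows-style `/option:value` arguments (`/rep`, `/user`, `/pass`, `/dir`, `/job`, `/level`) and the installation path used later in this article. The repository name `my_repo` and job name `daily_sync` are hypothetical placeholders.

```python
import subprocess
from pathlib import PureWindowsPath

# Assumed installation directory (the path used later in this article).
KETTLE_HOME = PureWindowsPath(r"D:\tools\kettle\data-integration")


def kitchen_command(rep, user, password, job, directory="/", level="Basic"):
    """Build the argument list for running a repository job with Kitchen.

    Uses Kitchen's Windows /option:value syntax. All names passed in
    (repository, job, directory) are placeholders for your own setup.
    """
    return [
        str(KETTLE_HOME / "Kitchen.bat"),
        f"/rep:{rep}",          # repository name as configured in Spoon
        f"/user:{user}",        # repository user
        f"/pass:{password}",    # repository password
        f"/dir:{directory}",    # repository directory holding the job
        f"/job:{job}",          # job name inside that directory
        f"/level:{level}",      # logging level
    ]


def run_job(cmd):
    """Run Kitchen and return its exit code (0 means the job ran without errors)."""
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Print the command line so it can be pasted into a BAT file.
    print(" ".join(kitchen_command("my_repo", "admin", "admin", "daily_sync")))
```

Running the script prints `D:\tools\kettle\data-integration\Kitchen.bat /rep:my_repo /user:admin /pass:admin /dir:/ /job:daily_sync /level:Basic`, which is the line a scheduled task's BAT file would contain. Note that the password then sits in plain text in that file, and that a scheduler can check Kitchen's exit code to detect a failed job.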
II. Kettle script files
1. Transformation: performs the basic data conversion.
2. Job: controls the overall workflow.
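To make the difference concrete, here is a conceptual model in Python, not Kettle's API: the transformation streams rows through a chain of steps, while the job only runs its entries in order and stops at the first failure. The step names (`filter_active`, `add_full_name`) are hypothetical. In real Kettle each step of a transformation runs concurrently in its own thread, and a job follows the success/failure hops drawn in Spoon; generators and a simple loop only approximate that.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


# --- Transformation: rows stream through a chain of steps ---
def read_rows(source: Iterable[Row]) -> Iterator[Row]:
    """Input step: emit rows one at a time."""
    for row in source:
        yield row


def filter_active(rows: Iterator[Row]) -> Iterator[Row]:
    """Hypothetical filter step: keep only rows marked active."""
    return (r for r in rows if r["active"])


def add_full_name(rows: Iterator[Row]) -> Iterator[Row]:
    """Hypothetical calculation step: add a derived field to each row."""
    for r in rows:
        yield {**r, "full_name": f'{r["first"]} {r["last"]}'}


def transformation(source: Iterable[Row], sink: List[Row]) -> bool:
    """Push every row through the steps into sink; return True on success."""
    for row in add_full_name(filter_active(read_rows(source))):
        sink.append(row)
    return True


# --- Job: runs entries in order and decides whether to continue ---
def job(entries: List[Callable[[], bool]]) -> bool:
    """Run each entry in turn; stop and report failure at the first False."""
    for entry in entries:
        if not entry():
            return False
    return True
```

A job here might be `job([lambda: transformation(rows, output)])`: the transformation does the row-level work, and the job only decides what runs and in which order.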
III. Repository configuration (based on version 4.4.0)
The repository stores the transformations and jobs created with Kettle.
There are two types of repository:
Kettle database repository
Kettle file repository
The database repository stores transformations and jobs in dedicated tables in a database. During configuration, Kettle generates the SQL statements that create these tables; executing these statements creates the repository. This is the type most people use.
The file repository stores transformations and jobs as files, and is not widely used.
The following describes how to configure a MySQL database repository. The Oracle configuration is simpler and follows basically the same steps; the difference is that creating the repository tables on MySQL runs into a bug, described in step 5.
1. Click the button to open the repository configuration page.
2. Select the first option, the database repository, and click OK. On the page that appears, choose to create a new database repository.
3. Configure the database connection. Kettle does not ship with the MySQL JDBC driver jar, so copy the driver jar manually into the lib directory of the Kettle installation (here D:\tools\kettle\data-integration\lib), then click Test to check that the connection succeeds.
4. If the connection succeeds, click OK to create the database repository tables.
5. Click Create or update. A dialog box appears containing the SQL statements that create the repository tables. Do not execute them from this dialog; instead, copy them into a database client and run them directly against the database. (When Kettle executes the MySQL table-creation statements itself, it first reports an error, but the same statements run without error directly in the database. We have seen this only with MySQL, not with Oracle.)
6. Log on to the repository. The default username and password are both admin.
7. The repository configuration is now complete.
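For step 3, a quick way to confirm the driver is in place before clicking Test is to look for it in the lib directory. The sketch below assumes the driver jar follows MySQL Connector/J's `mysql-connector-java-<version>.jar` naming (newer releases use a different file name) and uses this article's installation path; adjust both to your setup. `find_mysql_driver` is a hypothetical helper name.

```python
from pathlib import Path

# Lib directory from this article; change it to match your installation.
KETTLE_LIB = Path(r"D:\tools\kettle\data-integration\lib")


def find_mysql_driver(lib_dir: Path) -> list:
    """Return the names of MySQL JDBC driver jars found in lib_dir.

    Assumes the Connector/J naming pattern mysql-connector-java*.jar.
    Returns an empty list if the directory does not exist.
    """
    if not lib_dir.is_dir():
        return []
    return sorted(p.name for p in lib_dir.glob("mysql-connector-java*.jar"))


if __name__ == "__main__":
    jars = find_mysql_driver(KETTLE_LIB)
    if jars:
        print("Driver found:", ", ".join(jars))
    else:
        print("No MySQL driver jar in", KETTLE_LIB)
```

Spoon loads the jars in lib when it starts, so restart it after copying the driver in.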
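For step 5, when the copied statements are run by hand, it helps to run them one at a time so that any statement that fails is easy to spot. The helper below is a minimal sketch, not a full SQL parser: it splits the pasted script on semicolons while ignoring semicolons inside single quotes, double quotes, or backticks, and it does not handle comments or backslash escapes. `split_sql` is a hypothetical name.

```python
def split_sql(script: str) -> list:
    """Split a SQL script into statements on ';' outside quoted text.

    Returns the statements stripped of surrounding whitespace and without
    the trailing ';'. Empty statements are dropped.
    """
    statements, current, quote = [], [], None
    for ch in script:
        if quote:
            # Inside a quoted section: copy everything until the closing quote.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            # Opening quote: remember which character closes it.
            quote = ch
            current.append(ch)
        elif ch == ";":
            # Statement boundary outside any quotes.
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
    # Keep a final statement that has no terminating ';'.
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements
```

Each returned statement can then be pasted into the database tool, or passed one at a time to `cursor.execute()` of any Python DB-API driver for MySQL.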