ETL (short for extract, transform, load: the process of extracting, transforming, and loading data) comes up constantly in enterprise and industry applications, where we run into all kinds of data processing, conversion, and migration work. Understanding and mastering an ETL tool is therefore essential. Here I introduce an ETL tool I have used at work for three years:
Kettle
Main content:
I. Introduction to ETL
II. Introduction to Kettle
III. Invoking the Kettle API from Java
I. Introduction to ETL
1. What is ETL?
1). ETL stands for "Extract", "Transform",
Cost: Software costs include the software product, pre-sales training, after-sales consulting, and technical support. Open-source products are free of charge, so the cost is mainly training and consulting, and the overall cost will remain at a low
When data is extracted from the production environment to the warehouse, Chinese characters in the target database are garbled. My environment is from MySQL to MySQL. There are no heterogeneous databases at the moment, and the architecture is
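The excerpt does not say what caused the garbling, so the following is only a minimal sketch of one common cause: text encoded as UTF-8 being decoded with a different character set somewhere between source and target. The class and method names are hypothetical and chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how a charset mismatch garbles Chinese text.
class EncodingMismatchDemo {

    // Encode the text as UTF-8, then decode the bytes with the wrong charset.
    static String decodeWithWrongCharset(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the raw bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u6570\u636e"; // two Chinese characters
        String garbled = decodeWithWrongCharset(original);
        System.out.println("garbled equals original: " + garbled.equals(original));
        System.out.println("repaired equals original: " + repair(garbled).equals(original));
    }
}
```

If a mismatch like this is the cause, the usual remedy is to make both ends agree on the encoding. With MySQL Connector/J that is typically done through the `useUnicode=true` and `characterEncoding=UTF-8` connection parameters, which in Kettle can be entered as options on the database connection. Whether that applies to the environment described here is an assumption.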
This is my first blog post, so I am a little excited; there is surely plenty of room for improvement, so please bear with me. Kettle is the open source software I have studied most deeply and one of my favorites. It can take over a lot of manual work and is widely used for ETL data extraction. I am not very familiar with the various controls used by Kettle, but on
Kettle series tutorials (1): Introduction to Kettle
Kettle is an ETL (Extract, Transform, and Load) tool. It is frequently used in data warehouse projects. Kettle can also be
Multi-threaded Kettle transformations
In ETL projects, performance is usually one of the most important considerations, especially when the tasks in question run frequently or a series of tasks must finish within a fixed time window. This article
ETL is a key link in a data warehouse system. At the large end, ETL is a data integration solution; at the small end, it is a tool for dumping data. Looking back, I have done a lot of data migration and transformation work over the past few years, but it was mostly one-off jobs or small data volumes that could be handled with Access, DTS, or a small program written by hand. However, in the data
This Kettle plug-in works much like Kettle's existing User Defined Java Class plug-in. The User Defined Java Class plug-in mainly supports writing Java code directly in Kettle to implement special custom functions, whereas this control mainly moves the custom code into a JAR package. This means that the implementation of the custom feature has been developed in the
ETL scheduling development (1): writing instructions
Preface:
During database operation and maintenance, files are often transferred between systems to perform operations such as data extraction, conversion, and integration. In addition, statistical scheduling is performed after data integration. Here, I will describe an ETL scheduling developed
ETL scheduling development (5): a subroutine that connects to the database and executes database commands
In ETL scheduling, you need to connect to the database to read and write data. The following subprograms use the input database connection string and database commands (or SQL) to perform the required operations:
#!/usr/bin/bash
#created by lubinsu
data from the central data source into the database tables of each subsidiary. Each subsidiary then only needs to connect to its own database tables when developing reports, which both controls data permissions and keeps each subsidiary's data in that subsidiary's own tables.
III. Project construction plan:
1. Introduction to the tools used: Kettle
Kettle is an open source ETL tool from abroad, written in pure Java,
kettle and then import it to the database.
The UUID generated by default contains '-' separators, so they are replaced with an empty string using the "Replace in string" step.
The Excel step uses named parameters, so you must define the named parameters in the transformation settings.
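For comparison, the same dash removal can be sketched in plain Java. This is only an illustration of what the replacement achieves, not the Kettle step itself; the class and method names are hypothetical.

```java
import java.util.UUID;

// Hypothetical demo class: strips the '-' separators from a UUID string.
class UuidCompactDemo {

    // Return the 32-character hexadecimal form of the UUID, without dashes.
    static String compact(UUID id) {
        return id.toString().replace("-", "");
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(id);          // 36 characters, including 4 dashes
        System.out.println(compact(id)); // 32 hexadecimal characters
    }
}
```

The compact form is convenient when the target column is a fixed-width 32-character key.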
The sample code is as follows:
public class KettleUtil2 { public String RES_DIR = "res"; private String fullFileName; public KettleUtil2(String fileName) { fullFileName = Syst
Dynamic SQL queries in Kettle
In ETL projects, some SQL statements, such as data queries, are usually executed based on input parameters supplied at run time. This article describes dynamic queries and parameterized queries through the Table input step in
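To make the idea of run-time parameters concrete, here is a minimal plain-Java sketch of `${NAME}`-style variable substitution in a SQL string, similar in spirit to what a tool does when it replaces variables in a query before running it. It is not Kettle's implementation; the class name, method name, and sample query are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: replaces ${NAME} placeholders in a SQL string.
class SqlVariableDemo {

    private static final Pattern VARIABLE =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    // Replace each ${NAME} with its value; unknown names are left untouched.
    static String substitute(String sql, Map<String, String> variables) {
        Matcher matcher = VARIABLE.matcher(sql);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = variables.get(matcher.group(1));
            String replacement = (value != null) ? value : matcher.group(0);
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> variables = new HashMap<>();
        variables.put("REGION", "EU");
        String sql = "SELECT * FROM orders WHERE region = '${REGION}' AND year = ${YEAR}";
        System.out.println(substitute(sql, variables));
    }
}
```

Plain text substitution like this is flexible enough to change table names or whole clauses, but it offers no protection against SQL injection. For values that come from data, `?` placeholders bound by the database driver are the safer choice.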
the plug-in and defines how the step appears on the Kettle graphical workbench. For better understanding, I will use this step to design a transformation and execute it. For plug-in development, we will start with the plugin.xml configuration file, then cover the metadata and dialog classes, and finally the step and data classes.
========================================================== ============================
1. What is Kettle
Kettle is an acronym for "Kettle E.T.T.L. Environment", which means it is designed to help you meet your ETL needs: extracting, transforming, and loading data. Translated into Chinese, the name Kettle simply means a kettle,
to make it easier for the administrator to troubleshoot errors. ETL is a key part of a BI project and also a long-term process: only by constantly identifying and solving problems can ETL run more efficiently and supply accurate, timely data for the later stages of the BI project.
Postscript
ETL is a key link in a data warehouse system. At the large end,
I have been working with Kettle for the last two months, going from never having heard of it to using it skillfully. I have to say that project-driven learning is the fastest. Although what I know is more than enough to handle the project's tasks, I still want to study Kettle systematically and summarize what I have learned. For example, the job is relatively small, Kettle cluster m
Tags: ETL, kettle, error, Connect
1 Introduction
The "error occurred while trying to connect to the database" error comes up when developing with Kettle, but a careful look at the log shows that its causes vary. The error looks simple, but sometimes the simpler the error, the less patience one has for fixing it, especially when busy, and accidentally filling in a wrong parameter can cause the
(for example, kettle) = free ETL solution.
The two complement each other. My usual habit is to implement complex business logic in programs and to handle scheduling and simple business logic with tools. Both tools and code have their own application scenarios. For example, tools have advantages in synchronization between heterogeneous data sources, but they are suit