Introduction to kettle
Kettle is an ETL (Extract, Transform, Load) tool. It is frequently used in data warehouse projects. Kettle can also be used in the following scenarios:
Integrate data between different applications or databases
Export data from the database to a text file
Load large volumes of data into the database
Data cleansing
Integrate data in application-related projects
Kettle i
Setting the remote debugging listening port number in Eclipse.
1. Enter the directory extracted from pdi-ce-4.0.0-stable.zip (see the previous article, "ETL tool -- kettle plug-in development (basic)"), and edit the startup configuration file Spoon.bat (spoon.sh on Linux).
Add the following line to the file:
set OPT=-Xdebug -Xnoagent -Djava.compiler=NONE -
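The line above is cut off in the original. For reference, the standard JDWP remote-debugging flags usually look like the sketch below; the listener port (8000 here) is an assumption and must match the port set in the Eclipse "Remote Java Application" debug configuration:

    rem a minimal sketch, assuming Windows (edit spoon.sh on Linux); port 8000 is illustrative
    set OPT=%OPT% -Xdebug -Xnoagent -Djava.compiler=NONE -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8000

After saving the file, start Spoon and attach the Eclipse debugger to that host and port.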
The ETL tool Kettle: a design built in an old version hit a memory overflow error when run with the new version: java heap space or OutOfMemoryError, which means the memory allocated to Kettle is insufficient. Open Spoon.bat, in the Kettle installation path, with a text editor and find: REM ***************************************
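Below that REM banner, Spoon.bat sets the JVM options. A minimal sketch of the usual fix, assuming a 4.x-style Spoon.bat (the exact line varies by version), is to raise the maximum heap size in the set OPT= line:

    rem illustrative: in the existing "set OPT=..." line, raise -Xmx (e.g. 512m to 1024m)
    rem and keep the rest of the original options on that line unchanged
    set OPT=-Xmx1024m -XX:MaxPermSize=256m

Save the file and restart Spoon so the new heap limit takes effect.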
We have recently run into several problems with the data warehouse; they are summarized as follows.
1. Migration from MySQL to Oracle. This is a complicated problem because we have no plans to invest in an ETL tool such as DataStage, so at first I decided to write my own code: export the MySQL data to text files and use sqlldr to load the data into Oracle. The process is not very complicated, but it is annoying because I ran into the following problems:
1.1 N
Original link. 1. Introduction to Kettle Kitchen and Pan. The first two articles were mainly about designing and running transformations with the Kettle GUI, Spoon, and included demos. In practice, however, our deployment may require the server to run ETL tasks as background processes, just as we traditionally use Windows services to process data. How do we do that with Kettle? This will
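As a rough sketch of what that looks like (the file names and paths below are hypothetical), Kettle ships with two command-line launchers: Kitchen runs jobs and Pan runs transformations, so either can be wrapped in a scheduled task or service instead of being driven from the Spoon GUI:

    rem run a job headless; /level controls log verbosity (paths are illustrative)
    kitchen.bat /file:C:\etl\daily_load.kjb /level:Basic /log:C:\etl\logs\daily_load.log
    rem run a single transformation the same way
    pan.bat /file:C:\etl\clean_customers.ktr /level:Basic

On Linux the same calls go through kitchen.sh and pan.sh with the -option=value form, and the script's exit code can be checked by the scheduler to detect failures.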
Kettle sends a POST request
I. Introduction. Kettle is an open-source ETL tool written in Java. It runs on Windows, Linux, and Unix, and its data extraction is efficient and stable. It lets you manage data from different databases and describe what needs to be done through a graphical user environment. The version used here is pdi-ce-5.4.0.1-130; see http://community.pentaho.com/projects/data-integration.
II. Example
1. Requirement
Label: Problem description: 1. Logging in to the Oracle database suddenly becomes very slow; SQL Developer is also slow to connect. 2. When the Kettle Spoon ETL program accesses the database, task execution reports: Database connection IO error: Socket time out. Solution: 1. Use the lsnrctl status command to check the status of the Oracle listener; after the command executes, the results take a long time to appear (norma
Recently, due to project needs, I started working with Kettle. Here I have organized my experience from the past two weeks of using Kettle to develop jobs, and I share it with you.
I. What is kettle?
Kettle is an ETL tool mainly used to manage data from different data sources and move it along in a defined way. Its most common scenario is data transfer between different systems; you can use Kettle to create a con
Will ETL tools be used?
What is ETL?
From Baidu
Function: ETL extracts data from distributed, heterogeneous data sources, such as relational data and flat data files, into a temporary staging layer where it is cleaned, transformed, and integrated. Finally, it loads the data into a data warehouse or data mart, where it becomes the basis for online analytical processing (OLAP) and data mining.
There are three common repository types in Kettle: the database repository, the file repository, and the Pentaho repository. The file repository is a repository defined on a file directory; because Kettle uses a virtual file system (Apache VFS), "file directory" here is a broad concept that includes zip files, web services, and FTP services. The Pentaho repository is a plugin (available in Kettle Enterprise Edition) and is actually a content management system (CMS) that has all the features of an idea
Environment:
Kettle: Kettle Spoon, stable release 4.3.0
MySQL: MySQL Server 5.5.
Database connection information:
Test the database connection.
Error connecting database [MySql-1]: org.pentaho.di.core.exception.KettleDatabaseException:
Error occurred while trying to connect to the database
Exception while loading class
org.gjt.mm.mysql.Driver
org.pentaho.di.core.exception.KettleDatabaseException:
Error occurred while trying to connec
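The snippet is cut off here, but this particular failure usually means the MySQL JDBC driver jar cannot be found on Kettle's classpath. A minimal sketch of the common fix, assuming that is the cause (the jar version and install path below are illustrative):

    rem Kettle 4.x looks for JDBC drivers under libext\JDBC; newer releases use the lib folder
    copy mysql-connector-java-5.1.22-bin.jar "C:\pdi-ce-4.3.0\libext\JDBC\"
    rem restart Spoon and re-test the database connection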
stick to mainstream browsers.
Online Preview 2. Browsera
Browsera provides automated compatibility testing. It automatically highlights the differences in your design across browsers, simplifying the testing process. It also detects JavaScript errors, and the commercial version can test pages behind a subscription or login wall as well as dynamic pages.
Online Preview 3. Browserling
Browserling is an online multi-browser testing tool for websites. It integrates mainstream browsers to help we
Tags: ETL, kettle, variables, parameters. Kettle parameters and variables: in versions earlier than Kettle 3.2, only variables and arguments were available; Kettle 3.2 introduced the parameter concept. A variable is an environment variable (an environment or global variable) and keeps the same value even across different transformations, whereas an argument (positional parameter) and a parameter (named parameter) map to local variables and apply only to a specific transformation, for exa
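As a sketch of how a named parameter is typically supplied (the job path and parameter name below are hypothetical, and the exact option syntax differs slightly between the Windows and Linux launchers), a parameter declared on a job or transformation can be set from the Kitchen command line and then referenced inside steps as a variable:

    rem pass a named parameter to a job; steps can reference it as ${START_DATE}
    kitchen.bat /file:C:\etl\load_orders.kjb "/param:START_DATE=2015-01-01" /level:Basic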
Syn Good son. Source: http://www.cnblogs.com/cssdongl (reprints welcome). Having recently written and summarized a number of Hadoop MapReduce programs, I found that much of the logic is basically the same, and I thought an ETL tool could be used to configure that logic so that the MapReduce code is generated and executed automatically. This simplifies both the existing work and the later parts. Pentaho Kettle is easy to get started with and has been tested for the more
When doing ETL and connecting to MySQL to read a table containing a timestamp column, the following error occurred. According to Google, it is a problem in MySQL itself. The workaround is also simple: in the Spoon database connection, open the options and add a single command-line parameter: zeroDateTimeBehavior=convertToNull. Problem solved. Reprinted from: "Pentaho Spoon (Ket
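Setting that option in the connection dialog is equivalent to appending it to the JDBC URL; a sketch with a hypothetical host and database name:

    jdbc:mysql://localhost:3306/testdb?zeroDateTimeBehavior=convertToNull

With convertToNull, Connector/J returns zero-value dates such as 0000-00-00 00:00:00 as NULL instead of throwing an exception when the row is read.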
executed serially. Job hops: the connections between job entries are called job hops. The different results of each job entry determine the different execution paths through the job. A job entry's result is evaluated in one of three ways: 1. Unconditional execution: the next job entry runs regardless of whether the previous one succeeded; drawn as a black hop line with a lock icon. 2. Run when the result is true: drawn as a green hop line with a check mark. 3.
Business intelligence system feasibility analysis report: Pentaho technical overview. 1. Comparison of business intelligence systems: download (48.72 KB), BI comparison. II. Pentaho Community technology overview. 2.1 Resource address. Full kit download: http://sourceforge.net/projects/pentaho/ 2.2 Kettle ETL solution: Data Integration, suitable for ETL work in various scenarios. It includes several parts:
Kettle memory overflow error
This is original work from the "Deep Blue" blog. You are welcome to reprint it; please cite the source below when reprinting, otherwise legal responsibility for copyright infringement will be pursued.
Deep Blue blog: http://blog.csdn.net/huangyanlong/article/details/42453831
Kettle memory overflow error Solution
Environment:
Source database: Oracle 10g R2
Target database: Oracle 11g R2
Kettle version: 5.0.1-stable
Error: An error is reported w
In the previous section, we crawled nearly 70,000 second-hand housing records using crawler tools. This section preprocesses that data, that is, performs the so-called ETL (extract-transform-load).
I. Necessity of ETL tools
Data cleansing is a prerequisite for data analysis. No matter how sophisticated the algorithm, the moment it encounters bad data it throws an exception and simply dies. Howeve
Original work from the "Deep Blue" blog. You are welcome to reprint it; please be sure to cite the following source, otherwise legal responsibility for copyright infringement will be pursued. Deep Blue blog: http://blog.csdn.net/huangyanlong/article/details/42453831. Kettle memory overflow error resolution. Environment: source database Oracle 10g R2; target database Oracle 11g R2; Kettle version 5.0.1-stable. Error: when extracting data at a large scale, an error occurred; the log information is as follows: 2015/01/05 11:27:42-
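The log itself is cut off above. When the cause really is heap exhaustion, the usual remedy in Kettle 5.x is to give Spoon (and Kitchen/Pan) a larger JVM heap; a minimal sketch for Spoon.bat, assuming the 5.x startup scripts, with sizes that are purely illustrative:

    rem in Spoon.bat (spoon.sh on Linux), set the PDI JVM options before launching
    set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx2048m" "-XX:MaxPermSize=256m"

If the variable is already set inside the script, adjust the existing -Xmx value there instead.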