System platform: Windows; for other operating systems, please refer to other resources. Kettle's built-in scheduler is not stable and requires Kettle to be kept open, so timed jobs are implemented by having the Windows Task Scheduler call Kettle's Kitchen.bat. I found some Kitchen.bat parameters online, but I only have a superficial understanding of them and have not studied them in depth. Kitchen.bat back can b
First, understand the development environment and the production environment. For example, after designing a process under Windows or Mac, the design file is uploaded to a machine in the Linux cluster and executed there. The work done under Windows is the development environment, and the Linux machine that executes the task is the production environment. Two, Kettle transformation. The transformation consists of one or more steps, which are connected by jumping
A recent requirement was to synchronize data from one system to another: once data has been completed in the new system, the results must be synchronized to the other system's tables in real time.
That is, an associated ID is passed dynamically. The old system was written in VB and cannot provide a WebService interface, the synchronization involves more than 10 tables, and the table structures of the two systems are completely different, so Kettle came to mind.
1. Alibaba open-source software: DataX
DataX is an offline synchronization tool for heterogeneous data sources, dedicated to stable and efficient data synchronization between sources including relational databases (MySQL, Oracle, etc.), HDFS, Hive, ODPS, HBase, FTP, and more. (Excerpt from Wikipedia)
2. Apache open-source software: Sqoop
Sqoop (pronunciation: skup) is an open-source tool that is used primarily in Hadoop (Hive) and traditional databases (MySQL, PostgreSQ
Kettle parameters and variables explained in detail
Before Kettle 3.2 there were only variables and arguments; Kettle 3.2 introduced parameters. A variable is an environment variable (or global variable), so it has the same value wherever it is referenced, whereas an argument (positional parameter) and a parameter (named argument) can be mapped to loc
Use Kettle to convert an XML document into a data table structure
Kettle can read and parse XML files with the Get data from XML step and the XML Input Stream (StAX) step. The Get data from XML step parses with DOM, which consumes a lot of memory and is not advisable when the file is large. The XML Input Stream (StAX) step parses large and complex files as a stream and can load data quickly, so it is the recommended step for such files.
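The difference between DOM parsing and streaming parsing can be illustrated outside Kettle. The following is a minimal Python sketch, not Kettle's implementation: it uses the standard library's `xml.etree.ElementTree`, and the sample document and the `row`, `id`, and `name` element names are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = (
    b"<rows>"
    b"<row><id>1</id><name>a</name></row>"
    b"<row><id>2</id><name>b</name></row>"
    b"</rows>"
)


def rows_dom(data):
    """DOM style: build the whole tree in memory, then walk it."""
    root = ET.fromstring(data)
    return [{child.tag: child.text for child in row} for row in root.iter("row")]


def rows_streaming(stream):
    """Streaming style: emit each <row> as soon as its end tag is read."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "row":
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the row's children to keep memory use low


dom_rows = rows_dom(SAMPLE)
stream_rows = list(rows_streaming(io.BytesIO(SAMPLE)))
```

Both functions produce the same rows; the streaming version never needs the complete document in memory at once, which is the property that makes a StAX-style parser suitable for large files.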
If an error occurs during execution, Kettle stops running. In some cases you do not want Kettle to stop, and you can use step error handling. Error handling lets you configure a step so that, when an error occurs, the transformation does not stop and the record with the error is passed to another step. In the step error handling Settings
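The idea of diverting failing records instead of aborting can be sketched in a few lines of Python. This is only an illustration of the concept, not Kettle code: the `run_step` and `parse_amount` functions, the row layout, and the `error_desc` field name are all hypothetical.

```python
def run_step(rows, transform):
    """Apply transform to every row; divert failing rows instead of aborting."""
    ok_rows, error_rows = [], []
    for row in rows:
        try:
            ok_rows.append(transform(row))
        except Exception as exc:
            # Keep the original row and attach a description of the failure,
            # so a downstream "error" step can log or repair it.
            error_rows.append({**row, "error_desc": str(exc)})
    return ok_rows, error_rows


def parse_amount(row):
    """Hypothetical transform: convert the 'amount' field to an integer."""
    return {**row, "amount": int(row["amount"])}


rows = [
    {"id": 1, "amount": "10"},
    {"id": 2, "amount": "oops"},  # this row fails and is diverted
    {"id": 3, "amount": "7"},
]
ok_rows, error_rows = run_step(rows, parse_amount)
```

The good rows continue down the main path while the bad row ends up in `error_rows` together with the reason it failed, which mirrors sending error records to a separate step.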
Use Kettle to insert the content of a text file into a MySQL table on a Linux virtual machine.
I. Decompress the Kettle package
1. Copy the package to Linux.
MySQL driver package
2. Decompress the zip package
Enter the command: unzip /software/pdi-ce-7.0.0.0-25.zip
You can delete the original package.
Enter the command: rm -f pdi-ce-7.0.0.0-25.zip
2. Create databases and tables
3. inse
Implement data validation and checking in Kettle
In ETL projects, input data is often inconsistent. Kettle provides several steps for validating or checking data: the validation steps verify fields based on some calculations, the filter steps filter rows, and the JavaScript steps implement more complex calculations.
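As a rough illustration of validating and then filtering rows, here is a minimal Python sketch. It is not Kettle code; the rules, the `name` and `date` field names, and the `validate` and `split_rows` helpers are assumptions chosen for the example.

```python
import re

# Hypothetical rule: dates must look like YYYY-MM-DD.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate(row):
    """Return the list of rule violations for one row (empty means valid)."""
    problems = []
    if not row.get("name"):
        problems.append("name is empty")
    if not DATE_RE.match(row.get("date") or ""):
        problems.append("date is not YYYY-MM-DD")
    return problems


def split_rows(rows):
    """Filter rows into valid ones and rejected ones with their problems."""
    valid, rejected = [], []
    for row in rows:
        problems = validate(row)
        if problems:
            rejected.append((row, problems))
        else:
            valid.append(row)
    return valid, rejected


rows = [
    {"name": "alice", "date": "2017-03-01"},
    {"name": "", "date": "2017-03-02"},
    {"name": "bob", "date": "03/01/2017"},
]
valid, rejected = split_rows(rows)
```

Keeping the list of violations next to each rejected row makes it easy to inspect why a record was filtered out.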
Generally, it is useful to view the data in a cer
In Kettle, the place where resources such as transformations and jobs are stored is called a repository. There are two kinds: the file repository and the database repository. A transformation or job can belong to a repository or exist as a standalone file.
I. Database repository
1.1 Create a database in MySQL to serve as the database repository
1.2 Create the database repository: Tools > Repository > Connect to Repository > click the plus sign >
The following two articles explain how to use $ and ? in Kettle. What do we do when they cannot meet our needs? "Dynamic SQL Queries in PDI a.k.a. Kettle" and "Implementing dynamic SQL queries in Kettle". These only support a single placeholder; if you want to pass more than one parameter, we want to use the tool. I'm using the first one, and the internal structure is as follows. Note that when using the Multiway Merge Join you must remember to place a sort step before it, here als
Kettle uses JavaScript steps and the fireToDB function to implement custom database queries. Suppose you need to perform non-standard database queries. To discuss this situation, we assume that you need to read regular expressions from the database and then check, for each field in the input row, how many expressions match. Execute database queries in JavaScript steps
not kill people, but looked quite uncomfortable; of course, you are welcome to suggest a better plan),
Here, we take Kettle as an example to analyze how to solve this problem (putting performance aside for now, Kettle really is a good tool)
1) The main flow is roughly as follows
Here, let's take a look at the contents of the first component (get multiple table names), and we'll take a look at the contents of
The steps in a Kettle transformation run in parallel, while the entries in a job run sequentially, so you may run into a situation where a later step should start only after an earlier step has completed. In that case you can use two steps, "Blocking data" and "Blocking data until the completion". "Blocking data": this step only lets the last row of the previous step pass, and is often used together with the "Execute SQL Script" step. "Block d
1 Introduction. "Error occured while trying to connect to the database" is an error that often occurs during Kettle development. However, when you look carefully at the logs, the causes of this error differ. The error seems simple, but sometimes the simpler the error, the less patience one has to fix it. Especially when you are busy, you may accidentally enter a parameter
1. First, download the installation package from the official website; the same package is used on all platforms.
2. Kettle is developed in Java, so you need to configure JAVA_HOME.
3. Unzip the Kettle installation package.
4. Configure the KETTLE_HOME environment variable; this directory is the directory where the kettle configuration files ar
Calling a Kettle transformation file from Java
A transformation can also be run from the command line, and the command line can in turn be invoked from Java, but this does not integrate seamlessly with the Java code logic. This article assumes Kettle 5.1 and explains how to integrate seamlessly with Java code through the API; most of the information on the web targets older versions and cannot be run on Kettle 5.x.
1. What jar files are required
It is necess
I wanted to use Kettle to implement a very simple requirement: import a table's data from MySQL into Oracle and, if the table does not exist in Oracle, create the table first and then import the data. This seems like a simple feature, but it may be a bit confusing for users who have just started with Kettle. There is a "check table exists" step in Kettle transformations and jobs, bu
Accessing Kettle internal components using JavaScript
There are few ETL project requirements that cannot be met with Kettle's standard steps. Suppose every record needs to be marked with the database it came from, and the source database is configured through a DB connection: how do I get these settings (type, host, port, database name, etc.)? There are no standard steps to implement in