ETL Tool Pentaho Kettle's transformation and job integration

Source: Internet
Author: User

1. Kettle

 

1.1. Introduction

Kettle is an open-source ETL tool written in pure Java. It extracts data efficiently and reliably, which also makes it usable as a data migration tool. Kettle has two types of script files: transformations and jobs. A transformation performs the basic data conversion, and a job controls the overall workflow.

2. Integrated Development

 

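2.1. Transformation implementation

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.Result;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

// The calls below can throw KettleException; run them in a method that declares or handles it.
// Initialize the Kettle environment and load the configuration
KettleEnvironment.init();
// Path and name of the transformation file
String filename = "/foo/bar/trans.ktr";
// Parse the transformation file
TransMeta transMeta = new TransMeta(filename);
// Create the transformation from its metadata
Trans trans = new Trans(transMeta);
// Start the transformation; its steps run in their own threads. "null" can be replaced by a String[] of arguments.
trans.execute(null);
// Wait until the transformation has finished
trans.waitUntilFinished();
// Obtain the execution result
Result result = trans.getResult();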
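Because the transformation is loaded from a plain file path, it can help to check the path before handing it to Kettle. The following is a minimal plain-Java sketch of such a check. The class KettleFileCheck and its methods are hypothetical helpers, not part of the Kettle API, and the extensions .ktr (transformation) and .kjb (job) are taken from the file names used in this article.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Kettle API.
final class KettleFileCheck {
    private KettleFileCheck() {}

    // True if the file name ends with the given extension, ignoring case
    static boolean hasExtension(String filename, String extension) {
        return filename != null
            && filename.toLowerCase(Locale.ROOT).endsWith(extension.toLowerCase(Locale.ROOT));
    }

    // True if the name has the expected extension and points to a readable regular file
    static boolean isReadableKettleFile(String filename, String extension) {
        if (!hasExtension(filename, extension)) {
            return false;
        }
        Path path = Paths.get(filename);
        return Files.isRegularFile(path) && Files.isReadable(path);
    }

    public static void main(String[] args) {
        // The default path is the placeholder used in the listing above
        String filename = args.length > 0 ? args[0] : "/foo/bar/trans.ktr";
        System.out.println(filename + " usable: " + isReadableKettleFile(filename, ".ktr"));
    }
}
```

A check like this only catches a wrong path or extension early; whether the file is a valid transformation is still decided when Kettle parses it.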

 

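2.2. Job implementation

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.Result;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

// The calls below can throw KettleException; run them in a method that declares or handles it.
// Initialize the Kettle environment and load the configuration
KettleEnvironment.init();
// Path and name of the job file
String filename = "/foo/bar/jobn.kjb";
// Parse the job file without using a repository
JobMeta jobMeta = new JobMeta(filename, null, null);
// Create the job from its metadata (no repository)
Job job = new Job(null, jobMeta);
// Execute the job in an independent thread
job.start();
// Wait until the job has finished
job.waitUntilFinished();
// Obtain the execution result
Result result = job.getResult();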
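The job listing follows a start, wait, read-result pattern: start() returns immediately while the work runs on another thread, so the caller blocks in waitUntilFinished() before reading the result. The plain-Java sketch below imitates that control flow with a hypothetical MiniJob class built on Thread. It is an illustration only and makes no claim about how Kettle implements Job; the error-count convention is an assumption made for the example.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for illustration only; not a Kettle class.
final class MiniJob extends Thread {
    private final Callable<Integer> work;  // the work returns a number of errors
    private volatile int nrErrors = -1;    // -1 means "not finished yet"

    MiniJob(Callable<Integer> work) {
        this.work = work;
    }

    @Override
    public void run() {
        try {
            nrErrors = work.call();
        } catch (Exception e) {
            nrErrors = 1; // count a thrown exception as one error
        }
    }

    // Blocks the calling thread until the worker thread has ended
    void waitUntilFinished() {
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    int getNrErrors() {
        return nrErrors;
    }

    public static void main(String[] args) {
        MiniJob job = new MiniJob(() -> 0);
        job.start();             // returns immediately
        job.waitUntilFinished(); // blocks until run() has completed
        System.out.println("errors: " + job.getNrErrors()); // prints "errors: 0"
    }
}
```

Reading the result before waitUntilFinished() returns would show the "not finished" value, which is why the listings always wait first.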

 

2.3. Repository-based integration

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.Result;
import org.pentaho.di.core.plugins.PluginRegistry;
import org.pentaho.di.core.plugins.RepositoryPluginType;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;
import org.pentaho.di.repository.RepositoriesMeta;
import org.pentaho.di.repository.Repository;
import org.pentaho.di.repository.RepositoryDirectoryInterface;
import org.pentaho.di.repository.RepositoryMeta;

// Initialize the Kettle environment and load the configuration
KettleEnvironment.init();
// Initialize the plug-in registry, which provides the repository type plug-ins
PluginRegistry.init();
// Read the repository definitions
RepositoriesMeta repositoriesMeta = new RepositoriesMeta();
repositoriesMeta.readData();
// Optionally list the defined repositories
/*
for ( int i = 0; i < repositoriesMeta.nrRepositories(); i++ ) {
    RepositoryMeta rinfo = repositoriesMeta.getRepository( i );
    System.out.println( "#" + ( i + 1 ) + " : " + rinfo.getName() + " [" + rinfo.getDescription() + "]  id=" + rinfo.getId() );
}
*/
// Look up the repository named "1.0"
RepositoryMeta repositoryMeta = repositoriesMeta.findRepository( "1.0" );
// Obtain the PluginRegistry instance
PluginRegistry registry = PluginRegistry.getInstance();
// Load the repository implementation
Repository repository = registry.loadClass(RepositoryPluginType.class, repositoryMeta, Repository.class);
// Initialize the repository
repository.init(repositoryMeta);
// Connect before reading from the repository. "admin"/"admin" are placeholder credentials (an assumption); use your own.
repository.connect("admin", "admin");
// Load the repository directory tree
RepositoryDirectoryInterface directory = repository.loadRepositoryDirectoryTree();
// Load the job named "job2" from the repository
JobMeta jobMeta = repository.loadJob("job2", directory, null, null);
// Create the job from its metadata and the repository
Job job = new Job(repository, jobMeta);
// Execute the job in an independent thread
job.start();
// Wait until the job has finished
job.waitUntilFinished();
// Obtain the execution result
Result result = job.getResult();

 

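2.4. Log integration

Log output preparation:

import org.pentaho.di.core.logging.FileLoggingEventListener;
import org.pentaho.di.core.logging.KettleLogStore;

// Create a listener that writes log events to the file tran.log. true appends to an existing file; false does not append.
FileLoggingEventListener fileLoggingEventListener = new FileLoggingEventListener( "tran.log", true );
// Register the listener with the central log store so that it receives log events
KettleLogStore.getAppender().addLoggingEventListener( fileLoggingEventListener );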
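The second constructor argument decides whether an existing log file is extended or replaced. The sketch below shows the same kind of append flag with the standard java.io.FileWriter, whose constructor also takes a boolean append argument. It illustrates the flag only and says nothing about how FileLoggingEventListener writes its file; the class AppendDemo and the temporary file it uses are hypothetical.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical demo class; uses only the Java standard library.
final class AppendDemo {
    private AppendDemo() {}

    // Writes one line; append=true keeps existing content, append=false truncates the file first
    static void writeLine(Path file, String line, boolean append) {
        try (FileWriter writer = new FileWriter(file.toFile(), append)) {
            writer.write(line + System.lineSeparator());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes three lines with different flags and returns what is left in the file
    static List<String> demo() {
        try {
            Path file = Files.createTempFile("tran", ".log");
            writeLine(file, "stale run", false);  // creates the content
            writeLine(file, "first run", false);  // replaces "stale run"
            writeLine(file, "second run", true);  // added after "first run"
            List<String> lines = Files.readAllLines(file);
            Files.delete(file);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "[first run, second run]"
    }
}
```

Appending suits a log that should keep the history of several runs; starting a fresh file per run is easier to read when only the latest run matters.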

Running result log output:

import java.util.Map;
import org.pentaho.di.core.ResultFile;
import org.pentaho.di.core.logging.LogChannel;
import org.pentaho.di.core.logging.LogChannelInterface;

// Obtain the result files of the run
Map<String, ResultFile> map = result.getResultFiles();
// Create a log channel
LogChannelInterface log = new LogChannel("running result");
// Traverse the result files and log the name of each one
for (ResultFile rf : map.values()) {
    log.logBasic(rf.getFile().getName().toString());
}
