Using timestamps in Kettle to implement incremental database synchronization (Part 1)

Source: Internet
Author: User

The main idea of this experiment is to add an extra field, a timestamp field, when creating a database table. For example, when synchronizing table tt1 and table tt2, we check which table was updated most recently. That table is treated as the new table, and the old table is updated with the data from the new table.

The experiment data is as follows:

MySQL 5.1

test.tt1 (id int primary key, name varchar(50));

mysql.tt2 (id int primary key, name varchar(50));

 

The snapshot table can be stored in the test database. For convenience, it can also be created as a temporary table.

 

Data (figure kettle-1)

========================================================== ================================

 

Main process (figure kettle-2)

 

In the prepare step, add the timestamp field to the tt1 and tt2 tables. Because tt1 and tt2 are located in different databases, create a separate database connection for each.

Prepare (figure kettle-3)

 

After the job is executed, querying the database shows the following fields (figure kettle-4).

Next, perform an insert operation and an update operation on the tt1 table (figure kettle-5).

In the source table, both insert and update operations change the corresponding updateTime.

Whichever of tt1 and tt2 holds the most recent updateTime value is the new table.
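The behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite table stands in for the MySQL table tt1, the helper names (`make_table`, `insert_row`, `update_row`, `latest_update`) and the sample timestamps are made up, and the timestamp is set explicitly because SQLite has no automatic on-update timestamp. In MySQL, a column declared as `TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP` is maintained by the server instead.

```python
import sqlite3

def make_table(conn, name):
    # Same columns as in the experiment, plus the extra timestamp field.
    conn.execute(
        f"CREATE TABLE {name} (id INTEGER PRIMARY KEY, name VARCHAR(50), updateTime TEXT)"
    )

def insert_row(conn, table, row_id, name, now):
    # An insert stamps the new row with the current time (passed in explicitly here).
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?)", (row_id, name, now))

def update_row(conn, table, row_id, name, now):
    # An update refreshes the timestamp of the changed row.
    conn.execute(
        f"UPDATE {table} SET name = ?, updateTime = ? WHERE id = ?",
        (name, now, row_id),
    )

def latest_update(conn, table):
    # Newest timestamp in the table; None if the table is empty.
    return conn.execute(f"SELECT MAX(updateTime) FROM {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
make_table(conn, "tt1")
insert_row(conn, "tt1", 1, "a", "2024-01-01 10:00:00")
update_row(conn, "tt1", 1, "b", "2024-01-01 10:05:00")
print(latest_update(conn, "tt1"))  # 2024-01-01 10:05:00
```

The timestamps are stored as ISO-style strings so that plain string comparison orders them chronologically.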

 

This logic mainly corresponds to main_thread (figure kettle-6).

 

Here is the hierarchy of Main:

Main
    START
    Main.prepare
    Main.main_thread
    {
        START
        Main_thread.create_tempTable
        Main_thread.insert_tempTable
        Main_thread.tt1_tt2_syn
        SUCCESS
    }
    Main.finish
    SUCCESS

 

The process in main_thread is as follows.

As a unit, it repeats in a loop at a fixed interval S. If a row in tt1 or tt2 has been updated or inserted during that interval, the change in the table's updateTime field is detected and the tables are synchronized.

If no update has occurred, the switch takes its default path, which writes to the log, and the loop continues.
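The control flow of this loop can be sketched as follows. It is not how Kettle implements the job, only a Python outline of the same idea; the function `poll` and its callback parameters are hypothetical, and the loop is bounded by `rounds` so that the example terminates.

```python
import time

def poll(detect_change, synchronize, write_log, interval_s, rounds, sleep=time.sleep):
    """Run `rounds` passes of the loop and return the action taken in each pass."""
    actions = []
    for _ in range(rounds):
        source = detect_change()  # name of the newer table, or None if nothing changed
        if source is None:
            # Default path of the switch: only write to the log.
            write_log("no update detected")
            actions.append("log")
        else:
            synchronize(source)
            actions.append(f"sync from {source}")
        sleep(interval_s)  # wait for the interval S before the next pass
    return actions

# Simulated sequence of checks: nothing, then a change in tt1, then nothing.
events = iter([None, "tt1", None])
print(poll(lambda: next(events), lambda t: None, lambda m: None, interval_s=0, rounds=3))
# ['log', 'sync from tt1', 'log']
```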

 

Create a snapshot table, and insert the maximum (latest) timestamp value of tt1 and of tt2 into it.

Then a transformation determines which table has the latest updateTime value, and so decides whether tt1 is used to update tt2 or tt2 is used to update tt1.

 

Main_thread.create_tempTable.JOB:

 

 

Main_thread.insert_tempTable.Job:

PS: The second SQL statement must be changed as follows (it fails otherwise):

set @var1 = (select MAX(updatetime) from tt2);

insert into test.temp values (2, @var1);

This is because conn is a connection to mysql (the database name), whereas both the snapshot table and tt1 are stored in test (the database name), so the temp table has to be qualified as test.temp.

 

In the preceding figure, the statements insert two rows into the temp table.

For the row with id 1, temp.lastTime holds the latest updateTime value selected from tt1. For the row with id 2, temp.lastTime holds the latest updateTime value selected from tt2.

The id serves as the reference for the later switch step: it indicates whether the latest updateTime comes from tt1 or from tt2.

Alternatively, a tableName varchar(50) field could store the name of the table that holds the latest updateTime.
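A rough Python sketch of the snapshot step follows. The assumptions are significant: a single in-memory SQLite database replaces the two MySQL databases, the snapshot table is named `snapshot` rather than test.temp, the function `build_snapshot` and the sample rows are invented for illustration, and `INSERT ... SELECT` is used in place of the MySQL user variable @var1.

```python
import sqlite3

def build_snapshot(conn):
    # Recreate the snapshot table on every pass so it holds exactly two rows.
    conn.execute("DROP TABLE IF EXISTS snapshot")
    conn.execute("CREATE TABLE snapshot (id INTEGER PRIMARY KEY, lastTime TEXT)")
    # id 1 carries the newest updateTime of tt1, id 2 that of tt2.
    conn.execute("INSERT INTO snapshot SELECT 1, MAX(updateTime) FROM tt1")
    conn.execute("INSERT INTO snapshot SELECT 2, MAX(updateTime) FROM tt2")
    return conn.execute("SELECT id, lastTime FROM snapshot ORDER BY id").fetchall()

conn = sqlite3.connect(":memory:")
for table in ("tt1", "tt2"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name VARCHAR(50), updateTime TEXT)"
    )
conn.execute("INSERT INTO tt1 VALUES (1, 'a', '2024-01-01 10:05:00')")
conn.execute("INSERT INTO tt2 VALUES (1, 'a', '2024-01-01 10:00:00')")

print(build_snapshot(conn))
# [(1, '2024-01-01 10:05:00'), (2, '2024-01-01 10:00:00')]
```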

 

Main_thread.tt1_tt2_syn.Transformation:

First, create a connection to the temp table in the test database, then select the id of the record with the latest lastTime value. To do this, sort temp by lastTime in descending order, select the id, and limit the result to one row.

Then the switch chooses a branch based on the value of id.

I was keen to use the SQL Executor step here, but it cannot return the id value. A table input step can return the id, which is then received by the switch.
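The selection and the switch can be sketched in Python like this. The table name `snapshot` (standing in for temp), the SQLite backend, and the functions `newest_source_id` and `choose_direction` are assumptions for illustration only. The text does not show how the original job detects that nothing has changed, so the sketch does not attempt that either; if both timestamps are equal, which row comes first is unspecified.

```python
import sqlite3

def newest_source_id(conn):
    # Sort by lastTime descending and keep one row: its id names the newer table.
    row = conn.execute(
        "SELECT id FROM snapshot ORDER BY lastTime DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None

def choose_direction(source_id):
    # Mirrors the switch: 1 means tt1 updates tt2, 2 means tt2 updates tt1.
    # Any other value returns None (the default path).
    return {1: ("tt1", "tt2"), 2: ("tt2", "tt1")}.get(source_id)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshot (id INTEGER PRIMARY KEY, lastTime TEXT)")
conn.execute("INSERT INTO snapshot VALUES (1, '2024-01-01 10:00:00')")
conn.execute("INSERT INTO snapshot VALUES (2, '2024-01-01 10:05:00')")

source_id = newest_source_id(conn)
print(source_id, choose_direction(source_id))  # 2 ('tt2', 'tt1')
```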

 

 

When the switch receives id = 1, tt1 updates tt2. Pay attention to which data source is chosen as new and which as old for the row comparison, and to the target table selected in Insert/Update.

When the switch receives id = 2, tt2 updates tt1. Again, pay attention to the new and old data sources for the row comparison and to the target table selected in Insert/Update.
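The effect of the two branches can be approximated with a single upsert. The sketch below is an assumption-laden stand-in, not the Kettle steps themselves: both tables live in one in-memory SQLite database, the function `sync` is hypothetical, and SQLite's `INSERT OR REPLACE` takes the place of the row comparison followed by Insert/Update.

```python
import sqlite3

def sync(conn, new_table, old_table):
    # Insert rows missing from the old table and overwrite rows whose id already exists.
    conn.execute(
        f"INSERT OR REPLACE INTO {old_table} (id, name, updateTime) "
        f"SELECT id, name, updateTime FROM {new_table}"
    )
    return conn.execute(f"SELECT id, name FROM {old_table} ORDER BY id").fetchall()

conn = sqlite3.connect(":memory:")
for table in ("tt1", "tt2"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name VARCHAR(50), updateTime TEXT)"
    )
# tt1 is the new table: row 1 was updated and row 2 was inserted.
conn.execute("INSERT INTO tt1 VALUES (1, 'b', '2024-01-01 10:05:00')")
conn.execute("INSERT INTO tt1 VALUES (2, 'c', '2024-01-01 10:06:00')")
conn.execute("INSERT INTO tt2 VALUES (1, 'a', '2024-01-01 10:00:00')")

print(sync(conn, "tt1", "tt2"))  # [(1, 'b'), (2, 'c')]
```

Swapping the two arguments gives the other branch, in which tt2 updates tt1.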

 

 

 

However, the extra column wastes a lot of space. Therefore, after synchronization ends, the finish step deletes the updateTime field. It is the counterpart of the prepare step in Main.

Main.finish

 

 

 

 

At this point, the experiment environment has been set up. The next post will test it with the experiment data.

Triggers are also a good method for synchronization, and I will write about them in a later post.

 

Compared with triggers, the timestamp method is simple and general. However, it has a weakness: the timestamp field cannot capture deletions.

When a row is deleted from a table, that table should be treated as the new table and used to update the other table. However, the deleted row no longer exists to carry a timestamp, so the time of the deletion cannot be recorded.
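This limitation is easy to reproduce in a small sketch. As before, an in-memory SQLite database stands in for MySQL, and the helper `max_time`, the sample rows, and the upsert statement are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("tt1", "tt2"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name VARCHAR(50), updateTime TEXT)"
    )
    conn.execute(f"INSERT INTO {table} VALUES (1, 'a', '2024-01-01 10:00:00')")
    conn.execute(f"INSERT INTO {table} VALUES (2, 'b', '2024-01-01 10:05:00')")

def max_time(table):
    # Latest timestamp currently stored in the table.
    return conn.execute(f"SELECT MAX(updateTime) FROM {table}").fetchone()[0]

before = max_time("tt1")
conn.execute("DELETE FROM tt1 WHERE id = 1")  # the deleted row leaves nothing to stamp
after = max_time("tt1")
print(before == after)  # True: tt1 does not look any newer than before

# Even a forced sync from tt1 to tt2 only inserts and updates; row 1 survives in tt2.
conn.execute("INSERT OR REPLACE INTO tt2 SELECT id, name, updateTime FROM tt1")
remaining = conn.execute("SELECT id FROM tt2 ORDER BY id").fetchall()
print(remaining)  # [(1,), (2,)]
```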

 

 

 

 

 

 

 

 

 

 

 
