Java Learning: Design and Implementation of Incremental Interfaces

Source: Internet
Author: User

Introduction

In Java development we often run into this scenario: another system needs to synchronize data from our system to run its own business logic. When the data volume is small, we can simply provide the full data set. When the volume is very large, providing the full set every time is cumbersome: it is time-consuming, and most of the work is wasted on records that have not changed. At that point we need a mechanism for providing incremental data, one that tells the other party only which data has changed. Incremental data can be provided in roughly two ways: through a message queue (MQ) or through an interface. The advantage of MQ is timeliness; its disadvantages are message loss, duplication, and complicated backtracking (depending on the specific MQ implementation), which are not discussed further here. The interface is simply an RPC or HTTP endpoint, and its advantages and disadvantages are the reverse of MQ's: loss, duplication, and backtracking are easy to handle, but timeliness depends on how often the caller invokes it.

Interface design

Only one parameter, version, is required; other parameters are added according to the actual business scenario. The return value also carries a version, and the caller passes the version from the return value in its next call.
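The contract described above can be sketched roughly as follows. This is only an illustration under assumptions: the names IncrementalSyncService and InterfaceDesignSketch are hypothetical, the version is carried on each returned record (matching the getVersion() call in the usage code of this article), and the toy provider simply uses the numbers 1 to 5 as versions.

```java
import java.util.ArrayList;
import java.util.List;

/** One changed record; it carries the version the caller should resume from. */
final class Data {
    private final long id;
    private final String version;

    Data(long id, String version) {
        this.id = id;
        this.version = version;
    }

    long getId() { return id; }
    String getVersion() { return version; }
}

/** Hypothetical incremental interface: version is the only required parameter. */
interface IncrementalSyncService {
    List<Data> syncData(String version);
}

final class InterfaceDesignSketch {
    /** Toy provider: versions are "1".."5"; returns at most two records newer than the given version. */
    static IncrementalSyncService toyProvider() {
        return version -> {
            long after = Long.parseLong(version);
            List<Data> batch = new ArrayList<>();
            for (long v = after + 1; v <= 5 && batch.size() < 2; v++) {
                batch.add(new Data(100 + v, Long.toString(v)));
            }
            return batch;
        };
    }

    public static void main(String[] args) {
        IncrementalSyncService service = toyProvider();
        String version = "0";
        List<Data> batch;
        while (!(batch = service.syncData(version)).isEmpty()) {
            // The version of the last record becomes the argument of the next call.
            version = batch.get(batch.size() - 1).getVersion();
            System.out.println("fetched " + batch.size() + " records, next version = " + version);
        }
    }
}
```

Keeping version as an opaque String lets the provider change its internal format later without changing the interface signature.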

Interface usage

String lastVersion = getLastVersion(); // get the version number saved by the previous run
try {
    while (true) {
        List<Data> datas = syncData(lastVersion); // remote or local call
        if (datas.isEmpty()) {
            break;
        }
        for (Data data : datas) {
            // 1. Do some business processing
            // ......
            // 2. Temporarily record the version
            lastVersion = data.getVersion();
        }
    }
} finally {
    saveVersion(lastVersion); // save the version number
}

The code above should run in a scheduled task module: the shorter the cycle, the lower the data delay. If a failure or a bug causes a problem, simply set the saved version number back to an earlier value and synchronize again, so backtracking is easy.
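One possible way to run the synchronization loop on a fixed cycle is the JDK's ScheduledExecutorService. The sketch below is an assumption, not the article's scheduling module: SyncSchedulerSketch and start are hypothetical names, and syncOnce stands for the try/finally loop shown above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SyncSchedulerSketch {
    /** Runs syncOnce repeatedly, waiting periodMillis after each run finishes. */
    static ScheduledExecutorService start(Runnable syncOnce, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                syncOnce.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would suppress all later runs,
                // so it is reported here and the next cycle retries.
                System.err.println("sync failed, will retry next cycle: " + e);
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch threeRuns = new CountDownLatch(3);
        ScheduledExecutorService scheduler = start(() -> {
            System.out.println("sync run #" + runs.incrementAndGet());
            threeRuns.countDown();
        }, 50);
        threeRuns.await(5, TimeUnit.SECONDS);
        scheduler.shutdownNow();
    }
}
```

A fixed delay (rather than a fixed rate) is used here so that a slow run never overlaps with the next one.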

Interface implementation

The implementation has to consider the following aspects: memory footprint, version design, and data deletion.

Memory footprint

The incremental interface is likely to be called frequently by other systems, especially when our system holds core data, so the amount of data returned per call must be controlled, for example at most 1000 records at a time (1000 is used as the example from here on). I recommend controlling this amount on the data provider's side rather than the caller's; even if the caller is allowed to choose it, the provider should still enforce a maximum.
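A minimal sketch of the provider-side limit, assuming the caller may optionally pass a requested batch size. BatchLimitSketch, effectiveLimit, and the choice of 1000 as both the default and the maximum are illustrative assumptions based on the example number above.

```java
final class BatchLimitSketch {
    static final int DEFAULT_LIMIT = 1000; // used when the caller does not ask for a size
    static final int MAX_LIMIT = 1000;     // hard cap enforced by the provider

    /** Returns the batch size the provider will actually use for a caller-requested size. */
    static int effectiveLimit(Integer requested) {
        if (requested == null || requested <= 0) {
            return DEFAULT_LIMIT;
        }
        return Math.min(requested, MAX_LIMIT);
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit(null)); // caller gave no size
        System.out.println(effectiveLimit(50));   // small request is honored
        System.out.println(effectiveLimit(5000)); // oversized request is capped
    }
}
```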

Version design

Suppose our data is similar to the following table:

id update_time
2 2017-03-09 23:59:59
68 2017-03-09 23:59:59
26 2017-03-09 23:59:59
71 2017-03-09 23:59:59
17 2017-03-09 23:59:59
14 2017-03-09 23:59:59
11 2017-03-09 23:59:59
8 2017-03-09 23:59:59
5 2017-03-09 23:59:59
65 2017-03-09 23:59:59
66 2017-03-09 23:59:59

For the version, many people's first thought is the data update time, and the SQL might look like this:

SELECT * FROM data
WHERE update_time > #{version}
ORDER BY update_time ASC LIMIT 1000;

This loses data when many rows share the same update_time. For example, if the last row returned by the previous call was id=71 and the version is 2017-03-09 23:59:59, this query skips the remaining rows whose update_time is also 2017-03-09 23:59:59. Someone might then try the following:

SELECT * FROM data
WHERE update_time >= #{version}
ORDER BY update_time ASC LIMIT 1000;

Adding = to the update_time condition no longer loses data, but it returns duplicate data and can even loop forever. For example, suppose the last row of the previous batch was id=71 with version 2017-03-09 23:59:59, and 10,000 rows with update_time=2017-03-09 23:59:59 follow id=71. The interface returns 1000 rows per call, so the caller never gets past this batch. Clearly the update time alone is not enough as a version, so we add an auxiliary item such as the ID. For example, if the last row of the previous batch was id=71, the version looks like 2017-03-09 23:59:59@71, and the SQL becomes the following two-step query:

Step 1: query the rows with update_time = '2017-03-09 23:59:59' and id > 71:
SELECT * FROM data
WHERE update_time = '2017-03-09 23:59:59' AND id > 71
ORDER BY update_time ASC, id ASC LIMIT 1000;
Step 2: query the rows with update_time > '2017-03-09 23:59:59':
SELECT * FROM data
WHERE update_time > '2017-03-09 23:59:59'
ORDER BY update_time ASC, id ASC LIMIT ?;

There are a few details to handle here. If step 1 already returns 1000 rows, step 2 is not needed. If it returns fewer than 1000, step 2 must run, and its LIMIT is 1000 minus the number of rows returned by step 1. Finally, the actual version format is: update time in milliseconds @ data ID; the formatted time was used above only to make the description easier to read.
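The two-step query and the "update time in milliseconds @ data ID" version format can be illustrated with an in-memory simulation. This is a sketch under assumptions, not the article's implementation: CursorPagingSketch, Row, and nextBatch are hypothetical names, a List stands in for the table, and stream filters stand in for the two SQL statements.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class CursorPagingSketch {
    /** One table row: primary key plus update time in epoch milliseconds. */
    static final class Row {
        final long id;
        final long updateTime;

        Row(long id, long updateTime) {
            this.id = id;
            this.updateTime = updateTime;
        }

        /** Version in the format: update time in milliseconds, '@', data ID. */
        String version() { return updateTime + "@" + id; }
    }

    private static final Comparator<Row> BY_TIME_THEN_ID =
            Comparator.comparingLong((Row r) -> r.updateTime).thenComparingLong(r -> r.id);

    /** Two-step query over an in-memory "table"; version must look like "1000@71". */
    static List<Row> nextBatch(List<Row> table, String version, int limit) {
        int at = version.indexOf('@');
        long time = Long.parseLong(version.substring(0, at));
        long lastId = Long.parseLong(version.substring(at + 1));

        // Step 1: same update_time as the version, id greater than the version's id.
        List<Row> batch = table.stream()
                .filter(r -> r.updateTime == time && r.id > lastId)
                .sorted(BY_TIME_THEN_ID)
                .limit(limit)
                .collect(Collectors.toCollection(ArrayList::new));

        // Step 2: only if step 1 did not fill the batch; the limit is the remaining quota.
        int remaining = limit - batch.size();
        if (remaining > 0) {
            table.stream()
                    .filter(r -> r.updateTime > time)
                    .sorted(BY_TIME_THEN_ID)
                    .limit(remaining)
                    .forEachOrdered(batch::add);
        }
        return batch;
    }

    public static void main(String[] args) {
        long t = 1_000L; // arbitrary sample timestamp shared by the first seven rows
        List<Row> table = new ArrayList<>();
        for (long id = 1; id <= 7; id++) {
            table.add(new Row(id, t));
        }
        table.add(new Row(8, t + 1000));

        String version = "0@0"; // start before all data
        int total = 0;
        List<Row> batch;
        while (!(batch = nextBatch(table, version, 3)).isEmpty()) {
            total += batch.size();
            version = batch.get(batch.size() - 1).version();
        }
        System.out.println("rows synced: " + total + ", final version: " + version);
    }
}
```

With a batch size of 3 and seven rows sharing one update time, every row is returned exactly once and the loop terminates, which is the property the plain > and >= queries could not provide.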

However, synchronization based on the data update time can still go wrong under concurrent writes. For example, a row is updated at 2017-03-09 23:59:59, but its transaction is not committed until 2017-03-10 00:00:01. A synchronization that runs between those two moments does not see the row because the transaction has not yet committed, and the next synchronization will not pick it up either, because the version has most likely already moved past 2017-03-09 23:59:59. This problem is also fairly simple to solve: whenever data is updated, also write a record to a data log table, and have a thread periodically clean up expired duplicate records. The version number then becomes the log table's auto-increment primary key ID.
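The log-table idea can be sketched in memory as follows. Everything here is an assumption for illustration: ChangeLogSketch and its methods are hypothetical names, a counter stands in for the auto-increment primary key, and the cleanup is simplified to drop every older duplicate of a data ID regardless of age, whereas the article only cleans up expired ones.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ChangeLogSketch {
    static final class Entry {
        final long logId;  // stands in for the auto-increment primary key; used as the version
        final long dataId; // ID of the business row that changed

        Entry(long logId, long dataId) {
            this.logId = logId;
            this.dataId = dataId;
        }
    }

    private final List<Entry> entries = new ArrayList<>(); // kept in ascending logId order
    private long nextLogId = 1;

    /** Called alongside every data update; returns the log ID assigned to the change. */
    synchronized long record(long dataId) {
        entries.add(new Entry(nextLogId, dataId));
        return nextLogId++;
    }

    /** Returns up to limit entries whose logId is greater than the caller's version. */
    synchronized List<Entry> readAfter(long version, int limit) {
        List<Entry> batch = new ArrayList<>();
        for (Entry e : entries) {
            if (e.logId > version && batch.size() < limit) {
                batch.add(e);
            }
        }
        return batch;
    }

    /** Cleanup job: for each dataId keep only its newest entry; returns how many were removed. */
    synchronized int removeOlderDuplicates() {
        Set<Long> seen = new HashSet<>();
        int before = entries.size();
        for (int i = entries.size() - 1; i >= 0; i--) {
            if (!seen.add(entries.get(i).dataId)) {
                entries.remove(i); // an entry with a higher logId already covers this dataId
            }
        }
        return before - entries.size();
    }

    public static void main(String[] args) {
        ChangeLogSketch log = new ChangeLogSketch();
        log.record(71);
        log.record(68);
        log.record(71); // second change to the same row
        System.out.println("removed duplicates: " + log.removeOlderDuplicates());
        for (Entry e : log.readAfter(0, 1000)) {
            System.out.println("version " + e.logId + " -> data id " + e.dataId);
        }
    }
}
```

Because the version is now a plain number, the caller's query reduces to a single "log ID greater than version" condition with a limit.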

Data deletion

Fetching incremental data depends on the update time, which carries an implicit premise: the data still exists. If a row is physically deleted, its change can no longer be obtained. Therefore, when incremental data is provided through an interface, data cannot be truly deleted; it must be soft-deleted instead (add a status column that marks the row as valid or invalid), which is a disadvantage of this approach.
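A rough sketch of soft deletion as seen from both sides of the interface. The names SoftDeleteSketch, Row, and the valid flag are hypothetical, and a simple counter stands in for whichever version scheme is actually used.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SoftDeleteSketch {
    /** A row as exposed by the incremental interface. */
    static final class Row {
        final long id;
        final long version;  // simple counter standing in for the real version scheme
        final boolean valid; // false means the row was soft-deleted

        Row(long id, long version, boolean valid) {
            this.id = id;
            this.version = version;
            this.valid = valid;
        }
    }

    private final Map<Long, Row> table = new HashMap<>();
    private long counter = 0;

    void save(long id) {
        table.put(id, new Row(id, ++counter, true));
    }

    /** Soft delete: keep the row, flip the flag, and bump the version so callers see the change. */
    void delete(long id) {
        if (table.containsKey(id)) {
            table.put(id, new Row(id, ++counter, false));
        }
    }

    List<Row> changedAfter(long version) {
        List<Row> changes = new ArrayList<>();
        for (Row r : table.values()) {
            if (r.version > version) {
                changes.add(r);
            }
        }
        changes.sort(Comparator.comparingLong((Row r) -> r.version));
        return changes;
    }

    /** Caller side: valid rows are added to the local copy, invalid rows are removed from it. */
    static void apply(Set<Long> localIds, List<Row> changes) {
        for (Row r : changes) {
            if (r.valid) {
                localIds.add(r.id);
            } else {
                localIds.remove(r.id);
            }
        }
    }

    public static void main(String[] args) {
        SoftDeleteSketch provider = new SoftDeleteSketch();
        provider.save(1);
        provider.save(2);
        Set<Long> local = new TreeSet<>();
        apply(local, provider.changedAfter(0));
        provider.delete(1);
        apply(local, provider.changedAfter(2));
        System.out.println("local ids after sync: " + local);
    }
}
```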
