I tested whether Canal can hold a backlog of data on its own. If it can, we do not need a message queue for that.
Findings:
1. If data is not acknowledged, every new connection resumes fetching from the first unacknowledged entry. The inconvenience is that Canal cannot fetch data by time.
2. Because Canal tracks the consumption position for each connected ClientID, a client keeps fetching onward from where it left off, so the data it receives is continuous. As long as you do not acknowledge the data, you get the complete data again when you reconnect.
3. In addition, Canal uses file-instance.xml by default, which persists the consumption position to disk, so even if Canal is killed with kill -9 and restarted, the unacknowledged data can still be fetched.
4. Given the points above, if you like, you could record each batchId together with its time, let several days of data accumulate, and then acknowledge data up to a chosen point in time.
5. Today I heard eLong's talk at the Global Developers Conference on their real-time price search framework. They build a full index every hour, an incremental index every minute, and a real-time index every second (the data is pulled from the database, or captured with Alibaba's Canal or similar software).
6. So, based on the above, if Canal does not fall over after accumulating a large backlog, we could consider building something similar ourselves.
7. In the experiment, once about 100,000 records had accumulated, Canal became noticeably slow when syncing data, and beyond 200,000 it became very slow. While data was being inserted at the same time, a backlog of only 10,000 records made Canal appear hung: the data had to be acknowledged before Canal would continue delivering new changes.
8. In addition, because data accumulates and Canal provides no way to fetch from a point in time, a reconnecting client receives the entire backlog at once, which makes fetching extremely slow or even brings Canal down. (You could extend the source code to support time points, but Canal's ability to hold a backlog really is not sufficient.)
9. The final result confirms that this approach is not practical with Canal. The result was the same whether Canal was given 1 GB or 4 GB of memory. A message queue such as Kafka or RocketMQ is therefore needed to hold the backlog.
10. Large companies have already verified the backlog capacity of Kafka and RocketMQ, but I will verify it again for our own use case.
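Items 1, 2 and 7 describe how unacknowledged data behaves. Below is a toy, in-memory model of that get-without-ack / ack / rollback flow, written under our own assumptions: it is not Canal's client API, and the class name, the capacity cap on unacknowledged entries, the (-1, []) empty-batch marker and the rule that acknowledging a batch also confirms earlier ones are all hypothetical choices for this sketch. It only illustrates the observed behaviour: unacknowledged batches come back after a reconnect, and once the unacknowledged backlog reaches the cap, fetches return nothing until something is acknowledged.

```python
from dataclasses import dataclass, field


@dataclass
class CanalLikeCursor:
    """Toy in-memory model of a per-client cursor with get-without-ack,
    ack and rollback. Hypothetical class, NOT Canal's client API; `log`
    stands in for the binlog entries Canal would deliver."""

    log: list
    capacity: int = 1000  # hypothetical cap on unacknowledged entries
    acked: int = 0        # everything before this index is acknowledged
    fetched: int = 0      # everything before this index has been handed out
    in_flight: dict = field(default_factory=dict)  # batch_id -> end index
    next_batch_id: int = 1

    def get_without_ack(self, batch_size):
        start = self.fetched
        # Room left before the unacknowledged backlog hits the cap.
        room = max(self.capacity - (self.fetched - self.acked), 0)
        end = min(start + batch_size, start + room, len(self.log))
        if end == start:
            return -1, []  # empty batch: nothing new, or the cap is reached
        batch_id = self.next_batch_id
        self.next_batch_id += 1
        self.in_flight[batch_id] = end
        self.fetched = end
        return batch_id, self.log[start:end]

    def ack(self, batch_id):
        # In this model, confirming a batch also confirms earlier batches.
        end = self.in_flight[batch_id]
        for bid in [b for b in self.in_flight if b <= batch_id]:
            del self.in_flight[bid]
        self.acked = max(self.acked, end)

    def rollback(self):
        # Drop all unacknowledged batches; the next get restarts at the last ack.
        self.in_flight.clear()
        self.fetched = self.acked

    def reconnect(self):
        # In this model, reconnecting with the same client id is a rollback.
        self.rollback()


cursor = CanalLikeCursor(log=[f"row-{i}" for i in range(10)])
first_id, first = cursor.get_without_ack(3)   # row-0 .. row-2
cursor.ack(first_id)
cursor.get_without_ack(3)                     # row-3 .. row-5, never acked
cursor.reconnect()
print(cursor.get_without_ack(3)[1])           # ['row-3', 'row-4', 'row-5']
```

With capacity=4, two fetches of 3 hand out four entries and a third fetch returns (-1, []) until a batch is acknowledged, which is one plausible reading of the stall in item 7, not a confirmed explanation of Canal's internals.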
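Item 4 suggests recording each batchId with its time and later acknowledging data up to a point in time. A minimal sketch of that bookkeeping, assuming batches are fetched in time order: BatchTimeLedger and its methods are hypothetical names, not part of Canal. The id it returns would then be passed to the client's acknowledge call (CanalConnector.ack(batchId) in Canal's Java client); whether one ack also confirms all earlier batches should be checked against the Canal version in use.

```python
import bisect
from datetime import datetime, timedelta


class BatchTimeLedger:
    """Hypothetical helper: remembers when each unacknowledged batch was
    fetched so a consumer can later acknowledge "everything up to time T"."""

    def __init__(self):
        self._times = []      # fetch times, ascending
        self._batch_ids = []  # batch ids, same order as _times

    def record(self, batch_id, fetched_at):
        # Batches are fetched in order, so appending keeps both lists sorted.
        if self._times and fetched_at < self._times[-1]:
            raise ValueError("batches must be recorded in time order")
        self._times.append(fetched_at)
        self._batch_ids.append(batch_id)

    def last_batch_at_or_before(self, cutoff):
        # bisect_right gives the index of the first entry strictly after cutoff.
        i = bisect.bisect_right(self._times, cutoff)
        return self._batch_ids[i - 1] if i else None

    def forget_through(self, batch_id):
        # Drop entries once this batch (and everything before it) is acked.
        i = self._batch_ids.index(batch_id) + 1
        del self._times[:i]
        del self._batch_ids[:i]


start = datetime(2024, 1, 1)
ledger = BatchTimeLedger()
for n in range(5):
    ledger.record(batch_id=n + 1, fetched_at=start + timedelta(hours=n))
cutoff = start + timedelta(hours=2, minutes=30)
print(ledger.last_batch_at_or_before(cutoff))  # 3
```

The ledger lives outside Canal, so it does not address the slowdown in items 7 to 9; it only makes the "acknowledge by time" idea concrete.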
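Items 3, 8 and 9 come down to Canal lacking a by-time lookup. Message queues offer one: Kafka, for example, exposes KafkaConsumer#offsetsForTimes, which returns the earliest offset whose timestamp is greater than or equal to a given timestamp. The toy log below models that lookup with binary search so the idea is concrete; it is not Kafka code, TimeIndexedLog is a hypothetical class, and it assumes timestamps are non-decreasing. Reading in bounded pages also avoids pulling the whole backlog at once, the failure described in item 8.

```python
import bisect


class TimeIndexedLog:
    """Toy append-only log searchable by timestamp. Models the kind of lookup
    Kafka's offsetsForTimes provides; this is NOT Kafka code."""

    def __init__(self):
        self._timestamps = []  # non-decreasing append times (ms)
        self._records = []

    def append(self, timestamp_ms, record):
        if self._timestamps and timestamp_ms < self._timestamps[-1]:
            raise ValueError("timestamps must be non-decreasing")
        self._timestamps.append(timestamp_ms)
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def offset_for_time(self, timestamp_ms):
        # Earliest offset whose timestamp is >= the target, or None if the
        # target is later than every record.
        i = bisect.bisect_left(self._timestamps, timestamp_ms)
        return i if i < len(self._timestamps) else None

    def read_from(self, offset, max_records):
        # Read a bounded slice, so a large backlog is consumed in pages.
        return self._records[offset:offset + max_records]


log = TimeIndexedLog()
for i in range(100):
    log.append(timestamp_ms=i * 1000, record=f"change-{i}")
offset = log.offset_for_time(41_500)
print(offset, log.read_from(offset, 2))  # 42 ['change-42', 'change-43']
```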