| Product | Model | API | Guarantee | Fault-tolerance mechanism | State management | Latency | Throughput | Maturity |
|---|---|---|---|---|---|---|---|---|
| Storm | Native | Compositional | At-least-once | Record ACKs | None | Very low | Low | High |
| Trident | Micro-batching | Compositional | Exactly-once | Record ACKs | Operation-based state management | Low | Low | High |
| Spark Streaming | Micro-batching | Declarative | Exactly-once | RDD Checkpoint | DStream-based state management | Low | Low | High |
| Flink | Native | Compositional | Exactly-once | Checkpoint | Operation-based state management | Low | High | Low |
1. Model (streaming model)
Native: each record is processed as soon as it arrives. Micro-batching: incoming records are first grouped into small batches, and each batch is then processed.
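The difference between the two models can be sketched in a few lines of Python. This is only an illustration, not how any of the frameworks above is implemented: the function names are hypothetical, and batches are cut by record count here, whereas real micro-batching engines typically cut them by time interval.

```python
from typing import Callable, Iterable, Iterator, List


def process_native(records: Iterable[int], handle: Callable[[int], int]) -> Iterator[int]:
    """Native streaming: handle each record as soon as it arrives."""
    for record in records:
        yield handle(record)


def process_micro_batch(
    records: Iterable[int], handle: Callable[[int], int], batch_size: int
) -> Iterator[List[int]]:
    """Micro-batching: buffer records into small batches, then process each batch."""
    batch: List[int] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield [handle(r) for r in batch]
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield [handle(r) for r in batch]


def double(x: int) -> int:
    return x * 2


native_out = list(process_native(range(5), double))
batched_out = list(process_micro_batch(range(5), double, batch_size=2))
print(native_out)   # one result per record
print(batched_out)  # results grouped per batch
```

In the native version a result is available right after each record, while in the micro-batch version a record waits until its batch is full, which is where the extra latency comes from.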
2. API style
Compositional: exposes basic, low-level operations that give fine-grained, step-by-step control; the user defines each component and assembles the components into a topology. Declarative: exposes higher-order functions that wrap those operations. Because the framework sees the whole pipeline, it can apply preliminary optimizations and offer advanced features such as window management and state management.
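As a rough illustration of the two API styles, the sketch below expresses the same pipeline both ways. All class and function names (`SplitWords`, `KeepLong`, `run_topology`, `Stream`) are made up for this example and do not correspond to the API of Storm, Spark, or Flink.

```python
from typing import Iterable, List


# Compositional style: the user writes each component and wires the topology by hand.
class SplitWords:
    def process(self, line: str) -> List[str]:
        return line.split()


class KeepLong:
    def __init__(self, min_len: int) -> None:
        self.min_len = min_len

    def process(self, word: str) -> List[str]:
        return [word] if len(word) >= self.min_len else []


def run_topology(source: Iterable[str], stages: list) -> List[str]:
    """Push every item through each stage in order; a stage may emit 0..n items."""
    items = list(source)
    for stage in stages:
        items = [out for item in items for out in stage.process(item)]
    return items


# Declarative style: higher-order functions describe what to compute.
class Stream:
    def __init__(self, items: Iterable) -> None:
        self._items = items

    def flat_map(self, fn) -> "Stream":
        return Stream(out for item in self._items for out in fn(item))

    def filter(self, pred) -> "Stream":
        return Stream(item for item in self._items if pred(item))

    def collect(self) -> list:
        return list(self._items)


lines = ["storm and flink", "spark streaming"]
compositional = run_topology(lines, [SplitWords(), KeepLong(5)])
declarative = Stream(lines).flat_map(str.split).filter(lambda w: len(w) >= 5).collect()
print(compositional)
print(declarative)
```

Both versions produce the same result; the declarative one leaves the execution plan to the library, which is what gives a framework room to optimize.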
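3. Guarantee
At-least-once: every record is processed at least once, so a record may be processed more than once when a failure triggers a replay. Exactly-once: the effect of each record is applied exactly one time, even when failures occur.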
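The practical difference shows up when a record is replayed after a failure. The sketch below is a simplified illustration: the delivery log is invented, and deduplicating by record ID is just one assumed way to get an exactly-once effect, not the mechanism any particular framework uses.

```python
from typing import Dict, List, Set, Tuple

Record = Tuple[int, str]  # (record id, word)

# A delivery log in which record 2 is replayed after a simulated failure.
deliveries: List[Record] = [(1, "a"), (2, "b"), (2, "b"), (3, "a")]


def count_at_least_once(log: List[Record]) -> Dict[str, int]:
    """Count every delivery, so a replayed record is counted again."""
    counts: Dict[str, int] = {}
    for _record_id, word in log:
        counts[word] = counts.get(word, 0) + 1
    return counts


def count_exactly_once(log: List[Record]) -> Dict[str, int]:
    """Skip record ids that were already applied, so replays have no effect."""
    seen: Set[int] = set()
    counts: Dict[str, int] = {}
    for record_id, word in log:
        if record_id in seen:
            continue
        seen.add(record_id)
        counts[word] = counts.get(word, 0) + 1
    return counts


at_least_once = count_at_least_once(deliveries)
exactly_once = count_exactly_once(deliveries)
print(at_least_once)
print(exactly_once)
```

With at-least-once delivery the word "b" is overcounted because of the replay; with the deduplicating version the result matches what a failure-free run would give.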
4. Fault-tolerance mechanism
Record ACKs: each tuple is acknowledged with an ACK after it has been processed. RDD Checkpoint: checkpointing based on RDDs; after a failure only the affected RDDs need to be recomputed. Checkpoint: Flink's checkpoint, which is a snapshot (details to be added).
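Two of these ideas, per-record acknowledgement and snapshot-based recovery, can be sketched as follows. The classes `AckingSource` and `SummingOperator` are hypothetical and heavily simplified; real frameworks track acknowledgements and take snapshots in a distributed way that this example does not attempt to model.

```python
from typing import Dict, List


class AckingSource:
    """Record ACKs: keep every emitted tuple pending until it is acknowledged."""

    def __init__(self, tuples: List[str]) -> None:
        self.pending: Dict[int, str] = dict(enumerate(tuples))

    def ack(self, tuple_id: int) -> None:
        self.pending.pop(tuple_id, None)

    def to_replay(self) -> List[str]:
        return [self.pending[i] for i in sorted(self.pending)]


class SummingOperator:
    """Checkpoint: snapshot the state together with the input position."""

    def __init__(self) -> None:
        self.total = 0
        self.offset = 0  # number of records consumed so far

    def process(self, value: int) -> None:
        self.total += value
        self.offset += 1

    def snapshot(self) -> Dict[str, int]:
        return {"total": self.total, "offset": self.offset}

    def restore(self, snap: Dict[str, int]) -> None:
        self.total = snap["total"]
        self.offset = snap["offset"]


# ACK-based recovery: tuple 1 is never acknowledged, so it must be replayed.
source = AckingSource(["t0", "t1", "t2"])
source.ack(0)
source.ack(2)
replay = source.to_replay()

# Checkpoint-based recovery: roll back to the snapshot, then re-read from its offset.
data = [1, 2, 3, 4, 5]
op = SummingOperator()
for value in data[:2]:
    op.process(value)
snap = op.snapshot()
op.process(data[2])  # progress made after the snapshot is lost in the "crash"
op.restore(snap)
for value in data[op.offset:]:
    op.process(value)
print(replay, op.total)
```

Because the snapshot stores the input offset alongside the state, the record processed after the snapshot is re-read once rather than counted twice.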
5. State management
Operation-based state management: each operation holds its own state. Data-based state management: each piece of data has a corresponding processing state.
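One possible reading of this distinction is sketched below. It assumes that "operation-based" means the operator object owns mutable state, and that "data-based" means the state is itself a dataset derived from the previous state plus the new batch. Both `CountingOperator` and `update_state` are hypothetical names used only for illustration.

```python
from typing import Dict, Iterable


class CountingOperator:
    """Operation-based: the operator owns its state across records."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def process(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]


def update_state(state: Dict[str, int], batch: Iterable[str]) -> Dict[str, int]:
    """Data-based: build a new state dataset from the old state and a new batch."""
    new_state = dict(state)  # the previous state is left untouched
    for key in batch:
        new_state[key] = new_state.get(key, 0) + 1
    return new_state


op = CountingOperator()
running = [op.process(key) for key in ["a", "b", "a"]]

state1 = update_state({}, ["a", "b"])
state2 = update_state(state1, ["a"])
print(running, state1, state2)
```

In the second style every batch yields a new state value, so an earlier state such as `state1` stays available for recomputation.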
6. Latency & Throughput
How to design a test plan for these two metrics is still an open question.
7. Maturity
Product maturity: Flink released version 1.0 in March 2016, so how should its maturity level be defined?
8. Postscript
Twitter later developed Heron; should it be added to the comparison?
A good translated article on this topic: Http://developer.51cto.com/art/201603/507444.htm