This article wraps up the Flink fault tolerance series. Although some details are not covered, the essential implementation points have all been touched on. Looking back over the series, each article covers at least one key concept; let's briefly recap them.
Recovery mechanism implementation
The objects in Flink that typically require state recovery are operators and functions. They take snapshots of their state and restore it in different ways: a function implements the Checkpointed interface, while an operator implements the StreamOperator interface. The behavior of the two interfaces is similar.
Of course, for a data source component (SourceFunction), Flink can only achieve complete failure recovery if the external data provider is able to replay data. Apache Kafka, for example, provides a message offset mechanism for this, which the flink-kafka-connector uses to implement failure recovery of the data source (see FlinkKafkaConsumerBase for the concrete implementation).
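The replay idea can be sketched conceptually as follows. This is not Flink's actual API; the `ReplayableSource` class and its method names are illustrative assumptions that mimic the snapshot/restore hooks described above:

```python
class ReplayableSource:
    """Conceptual sketch of an offset-based replayable source (like Kafka)."""

    def __init__(self, records):
        self.records = records  # the external system retains the records
        self.offset = 0         # position of the next record to consume

    def poll(self):
        record = self.records[self.offset]
        self.offset += 1
        return record

    # -- checkpointing hooks, analogous to snapshot/restore of state --
    def snapshot_state(self):
        return self.offset

    def restore_state(self, offset):
        # after a failure, rewind to the last checkpointed offset
        self.offset = offset


source = ReplayableSource(["a", "b", "c", "d"])
source.poll(); source.poll()        # consume "a", "b"
saved = source.snapshot_state()     # checkpoint: offset = 2
source.poll()                       # consume "c", then a failure occurs
source.restore_state(saved)         # recovery: rewind to offset 2
assert source.poll() == "c"         # "c" is re-consumed, nothing is lost
```

The key property is that the external system, not Flink, retains the data, so rewinding the offset is enough to re-consume everything after the last checkpoint.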
Checkpoint trigger mechanism
Checkpoints are divided by state into:
- PendingCheckpoint: a checkpoint still in progress
- CompletedCheckpoint: a completed checkpoint

A PendingCheckpoint indicates that a checkpoint has been created but has not yet been acknowledged by all tasks. Once every task has acknowledged it, it is converted into a CompletedCheckpoint.
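The pending-to-completed transition can be sketched like this (a conceptual sketch, not Flink's classes; the field and method names are assumptions):

```python
class CompletedCheckpoint:
    """Conceptual sketch: a checkpoint acknowledged by every task."""

    def __init__(self, checkpoint_id):
        self.checkpoint_id = checkpoint_id


class PendingCheckpoint:
    """Conceptual sketch: a checkpoint still waiting for acknowledgements."""

    def __init__(self, checkpoint_id, tasks_to_ack):
        self.checkpoint_id = checkpoint_id
        self.not_yet_acked = set(tasks_to_ack)

    def acknowledge(self, task):
        self.not_yet_acked.discard(task)
        # once every task has acknowledged, promote to a completed checkpoint
        if not self.not_yet_acked:
            return CompletedCheckpoint(self.checkpoint_id)
        return None


pending = PendingCheckpoint(7, ["task-1", "task-2"])
assert pending.acknowledge("task-1") is None       # still waiting for task-2
completed = pending.acknowledge("task-2")
assert isinstance(completed, CompletedCheckpoint)  # all tasks have acked
assert completed.checkpoint_id == 7
```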
Checkpoints are triggered periodically, based on time. The driver that triggers a checkpoint is the JobManager; the actual executor of the checkpoint is the TaskManager.

Triggering a checkpoint requires several conditions to be met; for example, all tasks must satisfy the conditions for triggering a checkpoint. Only then can the checkpoint be executed. If the checkpoint timer fires while the previous checkpoint task has not yet completed, the current scheduled task is queued and waits for the previous one to finish.
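The queueing behavior described above can be sketched as follows (a simplified model, not Flink's scheduler; class and method names are hypothetical):

```python
import collections


class CheckpointScheduler:
    """Conceptual sketch: queue overlapping periodic checkpoint triggers."""

    def __init__(self):
        self.in_progress = False
        self.queued = collections.deque()

    def on_timer(self, checkpoint_id):
        # if the previous checkpoint has not finished, queue this trigger
        if self.in_progress:
            self.queued.append(checkpoint_id)
            return "queued"
        self.in_progress = True
        return "triggered"

    def on_complete(self):
        self.in_progress = False
        # start the next queued checkpoint, if any
        if self.queued:
            return self.on_timer(self.queued.popleft())
        return None


sched = CheckpointScheduler()
assert sched.on_timer(1) == "triggered"
assert sched.on_timer(2) == "queued"       # checkpoint 1 is still running
assert sched.on_complete() == "triggered"  # checkpoint 2 starts afterwards
```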
Coordination mechanism based on Akka messages
The control center of the Flink runtime is the JobManager: checkpoints are initiated by the JobManager, while their real executors are the TaskManagers. Flink's JobManager and TaskManagers communicate via Akka messages, so the coordination of checkpoints is also Akka-based (driven by messages). Flink defines several message objects to drive checkpoint execution, such as TriggerCheckpoint, DeclineCheckpoint, and AcknowledgeCheckpoint.
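The message-driven flow can be sketched conceptually like this. The message names match the ones above, but their fields and the dispatch function are illustrative assumptions, not Flink's actual definitions:

```python
from dataclasses import dataclass


# Conceptual sketches of the checkpoint messages (fields are hypothetical)
@dataclass
class TriggerCheckpoint:      # JobManager -> TaskManager: start a checkpoint
    job_id: str
    checkpoint_id: int


@dataclass
class AcknowledgeCheckpoint:  # TaskManager -> JobManager: snapshot succeeded
    job_id: str
    task_id: str
    checkpoint_id: int


@dataclass
class DeclineCheckpoint:      # TaskManager -> JobManager: snapshot refused/failed
    job_id: str
    checkpoint_id: int


def handle(message):
    """A TaskManager-side message dispatch loop in miniature."""
    if isinstance(message, TriggerCheckpoint):
        # snapshot local state, then answer the JobManager
        return AcknowledgeCheckpoint(message.job_id, "task-1", message.checkpoint_id)
    return None


reply = handle(TriggerCheckpoint("job-A", 42))
assert isinstance(reply, AcknowledgeCheckpoint)
assert reply.checkpoint_id == 42
```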
Zookeeper-based high availability
Flink provides two recovery modes (RecoveryMode):
- STANDALONE: no recovery from JobManager failure
- ZOOKEEPER: JobManager HA (high availability) implemented on top of ZooKeeper

As part of Flink's high-availability mechanism, ZooKeeper is used to generate atomic, monotonically increasing checkpoint IDs and to store completed checkpoints. The checkpoint ID generator, together with the store of completed checkpoints, is called the checkpoint recovery service.
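What ZooKeeper provides here can be sketched in miniature. This is a conceptual stand-in, not Flink's implementation: a lock stands in for ZooKeeper's atomicity, and the class and method names are assumptions:

```python
import itertools
import threading


class CheckpointRecoveryService:
    """Conceptual sketch of the checkpoint recovery service: an atomic,
    monotonically increasing checkpoint-ID counter plus a store of
    completed checkpoints that survives JobManager failover."""

    def __init__(self):
        self._lock = threading.Lock()  # stands in for ZooKeeper's atomicity
        self._counter = itertools.count(1)
        self.completed = {}            # checkpoint_id -> state handle

    def next_checkpoint_id(self):
        with self._lock:
            return next(self._counter)

    def add_completed(self, checkpoint_id, state_handle):
        self.completed[checkpoint_id] = state_handle

    def latest(self):
        # on failover, a new JobManager resumes from the latest completed checkpoint
        return max(self.completed) if self.completed else None


svc = CheckpointRecoveryService()
a, b = svc.next_checkpoint_id(), svc.next_checkpoint_id()
assert (a, b) == (1, 2)              # monotonically increasing
svc.add_completed(1, "state@1")
svc.add_completed(2, "state@2")
assert svc.latest() == 2
```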
Savepoint
A savepoint is in essence a special, user-triggered checkpoint. It is fundamentally a checkpoint, but it differs from a regular checkpoint in two respects:
- it is triggered by the user
- it does not automatically expire when a newer completed checkpoint is created
Since a savepoint is user-triggered, how is it triggered? This relies on the client that Flink provides: the user triggers a savepoint through the client (CLI). After the user triggers the savepoint operation, the client sends a message to the JobManager via Akka, and the JobManager then notifies each TaskManager to trigger a checkpoint. After the checkpoint is triggered, the TaskManager executes a callback that reports the result of the savepoint back to the JobManager (which in turn sends an Akka message back to the client). A savepoint does not automatically expire as new completed checkpoints are produced. Also, unlike a checkpoint, a savepoint does not save state as part of itself: it stores no state at all, but merely points, via a pointer, to the state of the specific checkpoint it refers to.
As for where savepoints are stored: Flink supports two forms of savepoint storage, memory and filesystem. For production environments, filesystem (such as HDFS, which can provide durability guarantees) is recommended, because with the memory-based mechanism savepoints are stored in the JobManager's memory; once the JobManager goes down, the savepoint information cannot be recovered.
State backend
The state types directly supported in Flink are:
- ValueState: single-value state
- ListState: collection state
- FoldingState: folding state, for FoldFunction
- ReducingState: reducing state, for ReduceFunction
However, the state representation that is ultimately stored and restored in combination with the checkpoint mechanism is KvState. It represents a common, user-defined key-value state and can be seen as a container for the directly supported state types. KvStateSnapshot, in turn, is a snapshot of a KvState and is used to restore it. StateHandle is an interface that provides operations on an operator's state; it restores state from its raw representation on the storage medium back to an object representation.
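The snapshot/restore relationship between KvState and its snapshot can be sketched as follows (a conceptual sketch only; the method names and the plain-dict snapshot are assumptions, not Flink's API):

```python
class KvState:
    """Conceptual sketch: a key/value container behind the state types."""

    def __init__(self):
        self._store = {}

    def update(self, key, value):
        self._store[key] = value

    def value(self, key):
        return self._store.get(key)

    def snapshot(self):
        # a KvStateSnapshot-like immutable copy, taken at a checkpoint
        return dict(self._store)

    @classmethod
    def restore(cls, snapshot):
        # rebuild an object representation from the stored snapshot
        state = cls()
        state._store = dict(snapshot)
        return state


kv = KvState()
kv.update("count", 3)
snap = kv.snapshot()                  # taken at a checkpoint
kv.update("count", 99)                # further updates, then a failure
restored = KvState.restore(snap)
assert restored.value("count") == 3   # state exactly as of the checkpoint
```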
A state backend is used to persist state. Flink supports multiple state backends:
- MemoryStateBackend
- FsStateBackend
- RocksDBStateBackend (implemented by third-party developers)
Consistency guarantees based on the barrier mechanism
Flink offers two different consistency guarantees:
- EXACTLY_ONCE: exactly once
- AT_LEAST_ONCE: at least once

EXACTLY_ONCE suits scenarios with high requirements on the accuracy of data processing, but it can sometimes introduce noticeable latency. AT_LEAST_ONCE suits scenarios that require low latency but place lower demands on accuracy.
It is important to note that the consistency guarantee here does not refer to the stream elements flowing through the dataflow themselves, but to the final effect those elements have on operator state (in combination with checkpoints). The consistency guarantee is inseparable from Flink's checkpoint barriers.
From the perspective of a single data stream, a barrier marks a position in the stream (figure omitted). From the perspective of a distributed stream with multiple input channels, barriers arrive independently on each channel (figure omitted). The second figure shows barriers from multiple channels being aligned, but this alignment is required only for EXACTLY_ONCE consistency.
The JobManager instructs the sources to emit barriers. When an operator receives a CheckpointBarrier from one of its inputs, it knows that it is now between the previous checkpoint and the next one. Once an operator has received checkpoint barriers from all of its input channels, it knows the checkpoint is complete: it can trigger the operator-specific checkpoint behavior and broadcast the barrier to downstream operators.
For the two different consistency guarantees, Flink provides two different CheckpointBarrierHandler implementations, with the following correspondence:
- BarrierBuffer: EXACTLY_ONCE
- BarrierTracker: AT_LEAST_ONCE
BarrierBuffer blocks each input channel from which a barrier has already been received and buffers the data that subsequently flows in on those channels. The blocked channels are released only once barriers have been received from all channels (or the checkpoint can no longer be satisfied). This mechanism is called aligning (alignment), and it is what achieves EXACTLY_ONCE consistency: it makes the data contained in a checkpoint precise.
BarrierTracker's implementation is much simpler: it merely tracks the barriers in the data stream, while the elements themselves are released downstream without buffering. As a result, elements belonging to one checkpoint can become mixed with elements belonging to the subsequent checkpoint, which is why it can only provide AT_LEAST_ONCE consistency.
Complete sample checkpoint process
(The figure illustrating the complete checkpoint process is omitted here.)
Summary
This article concludes the Flink fault tolerance series by summarizing and organizing its key concepts and processes.
End of the Apache Flink fault tolerance source analysis series.