implementation of State
Flink uses an asynchronous checkpoint mechanism to provide fault tolerance in stream processing. Put simply, it serializes local state to persistent storage and, when a failure occurs, restores the state from the checkpoint. For a detailed description of the mechanism, see this link. This chapter mainly describes how State is implemented in the Flink source code.
StateBackend
Flink translates the operations in our code into a set of tasks that are executed in the TaskManager. Each task runs in its own thread and contains an AbstractInvokable object; the primary logic of a task is to call AbstractInvokable.invoke(). In streaming, the corresponding implementations all inherit from StreamTask. A StreamTask contains an OperatorChain and defines a number of hook functions that make up its life cycle. The AbstractStateBackend is initialized here.
Flink provides three state backend implementations: MemoryStateBackend, FsStateBackend, and RocksDBStateBackend. MemoryStateBackend is mainly used for development and debugging, while the latter two are suitable for production environments. All three inherit from the AbstractStateBackend class. All operators in the OperatorChain are initialized during StreamTask initialization, and the AbstractKeyedStateBackend is initialized in the same process. There is only one per StreamTask, which makes sense, because operators that follow different keyBy operations must run in different threads.
AbstractStateBackend
The definition of AbstractStateBackend is simple. It requires subclasses to implement three methods:
createStreamFactory: creates a CheckpointStreamFactory for an operator of a job. In practice this is implemented by FsStateBackend; a RocksDBStateBackend needs to be given another AbstractStateBackend, typically an FsStateBackend, for this purpose.
createKeyedStateBackend: creates a keyed state backend to manage keyed state.
createOperatorStateBackend: creates an OperatorStateBackend. AbstractStateBackend provides a default implementation, which is an in-memory map whose key is the state name and whose value is a list state. The reason for using list state is explained under State Type.
FsStateBackend
FsStateBackend writes state to persistent storage, such as HDFS, when a checkpoint is taken. Keyed state is simply kept in memory, so with a large state FsStateBackend can cause severe GC pressure. The snapshot is also synchronous: serializing the state and writing it to the file system happen in the processing path, so an oversized state can block processing.
RocksDBStateBackend
Unlike FsStateBackend, RocksDBStateBackend stores keyed state in RocksDB. This approach has two benefits. First, a large state does not cause GC pressure. Second, because RocksDB supports snapshot operations, the snapshot process is asynchronous and does not block.
There are also several possible drawbacks. First, every update and get on the state goes through serialization and deserialization, so it is less efficient than working directly in memory. Second, RocksDB uses an LSM-tree as its storage structure, and compaction requires a large amount of disk reads and writes, so blocking is still possible; one possible optimization is to use an in-memory file system so that all of the storage lives in RAM. Finally, RocksDB tuning is fairly complex, and how it performs on an ordinary SATA hard drive still needs to be verified.
using the state
State Type
State in Flink can be classified along two dimensions: whether it belongs to a key (key state or operator state) and whether it is managed by Flink (raw state or managed state). Key state is used to save state in a KeyedStream, and operator state is used to save state in ordinary, non-keyed streams. Managed state is state that is managed by Flink. Raw state is managed by the application itself; Flink only invokes the appropriate interface methods to snapshot and restore it.
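To make the first distinction concrete, here is a minimal, Flink-free sketch. The class names are hypothetical and are not Flink APIs; the only assumption is that keyed state is looked up through a "current key" while operator state belongs to the operator instance as a whole.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: these classes are hypothetical and are not Flink APIs.
final class StateScopeSketch {

    /** Keyed state: one value per key; reads and writes follow the current key. */
    static final class KeyedCounter {
        private final Map<String, Long> countsByKey = new HashMap<>();
        private String currentKey;

        // A runtime would call this before handing each record to user code.
        void setCurrentKey(String key) {
            this.currentKey = key;
        }

        // Increments and returns the counter that belongs to the current key.
        long increment() {
            return countsByKey.merge(currentKey, 1L, Long::sum);
        }
    }

    /** Operator state: one list per operator instance, independent of any key. */
    static final class OperatorBuffer {
        private final List<String> buffered = new ArrayList<>();

        void add(String element) {
            buffered.add(element);
        }

        // Returns a copy of the list, which is what a snapshot would persist.
        List<String> snapshot() {
            return new ArrayList<>(buffered);
        }
    }

    public static void main(String[] args) {
        KeyedCounter counter = new KeyedCounter();
        OperatorBuffer buffer = new OperatorBuffer();
        for (String word : new String[] {"a", "b", "a"}) {
            counter.setCurrentKey(word);
            long seen = counter.increment();
            buffer.add(word);
            System.out.println(word + " seen " + seen + " time(s)");
        }
        System.out.println("operator state: " + buffer.snapshot());
    }
}
```

The keyed counter never sees the other keys' values, while the buffer accumulates every record that passes through the instance.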
Since version 1.2.0, Flink has a new feature: dynamically scalable state. Its purpose is to allow a job to recover from the last checkpoint or savepoint even when the parallelism of a Flink operator has changed. To achieve this, key state is organized by key group, an idea similar to pre-sharding. For example, with 128 key groups Flink divides the key state into 128 parts for storage; as long as the operator's parallelism is no greater than 128, each parallel instance is always responsible for the state of some subset of the key groups. Operator state is organized as a list, so that it too can be redistributed when the operator's parallelism changes.
Managed Key State
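Since key state is organized by key group, it may help to see the assignment as code. The following is a minimal sketch with hypothetical names, not Flink's actual implementation: it assumes a key is mapped to a key group by its hash modulo the number of key groups, and that key groups are handed to parallel instances in contiguous ranges. Flink additionally scrambles the hash before taking the modulus, which is omitted here.

```java
// Illustrative only: hypothetical helper, not Flink's actual implementation.
final class KeyGroupSketch {

    /** Maps a key to one of maxParallelism key groups (simplified hashing). */
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    /** Maps a key group to the index of the parallel instance that owns it. */
    static int operatorIndexFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // the number of key groups
        for (int parallelism : new int[] {2, 4}) {
            for (String key : new String[] {"alice", "bob", "carol"}) {
                int group = keyGroupFor(key, maxParallelism);
                int owner = operatorIndexFor(group, maxParallelism, parallelism);
                System.out.println("parallelism=" + parallelism + " key=" + key
                        + " keyGroup=" + group + " operator=" + owner);
            }
        }
    }
}
```

The key group of a key does not depend on the parallelism, only the owner of the group does. That is why rescaling only has to move whole key groups between instances rather than re-hash individual keys.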
Let's start by looking at how Flink handles key state. The first thing to note is that all key state in Flink is managed and is obtained through the getState method of RuntimeContext. If RuntimeContext.getState is called in an ordinary stream that has not gone through keyBy, an exception is thrown. How the different state backends store key state was covered earlier and is not repeated here. In Flink, whenever a new record arrives, the system calls the setCurrentKey method, so that when we access the state, the system knows which key it belongs to.
Managed Operator State
To obtain managed operator state, the user needs to implement the CheckpointedFunction interface and initialize the state in the initializeState method, which is where the list state is obtained. As mentioned earlier, operator state is stored in memory.
Raw Operator State
Users who need to manage operator state themselves can implement the ListCheckpointed interface, which requires the user to provide the state as a list. In the actual implementation, the state is still put into the OperatorStateBackend when a snapshot is taken.
Legacy State
Before Flink 1.2.0, user-defined state had to implement the Checkpointed interface. Because state exposed through this interface cannot be repartitioned, the interface has been marked as deprecated.
snapshot and restore of state
Snapshot
In this section we describe how Flink stores state in persistent storage. With the concepts described above, the snapshot code is fairly intuitive. The main logic is in the StreamTask.performCheckpoint function, which basically calls the snapshot function of each AbstractStreamOperator. Note that although many Futures are used to abstract the different snapshot processes, in practice only the key state of RocksDBStateBackend is snapshotted asynchronously. This is because only RocksDB supports a snapshot operation; the other backends are essentially maps and snapshot synchronously. When the snapshot is finished, the state handle is sent to the JobManager.
Restore
The restore process is relatively straightforward: when the task is initialized, it pulls the file referenced by the state handle and then restores the state from it.
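The round trip of snapshot and restore can be sketched without Flink. In this minimal example all names are hypothetical: the "state handle" is assumed to be nothing more than the path of the file that holds the serialized state, and plain Java serialization stands in for Flink's serializers.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: hypothetical names, not Flink's classes.
final class SnapshotRestoreSketch {

    /** Serializes the in-memory state to a file and returns its path as the handle. */
    static Path snapshot(Map<String, Long> state) throws IOException {
        Path handle = Files.createTempFile("checkpoint-", ".bin");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(handle))) {
            // Write a HashMap copy, which is guaranteed to be Serializable.
            out.writeObject(new HashMap<>(state));
        }
        return handle;
    }

    /** Reads the file referenced by the handle and rebuilds the in-memory state. */
    @SuppressWarnings("unchecked")
    static Map<String, Long> restore(Path handle) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(handle))) {
            return (Map<String, Long>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Long> state = new HashMap<>();
        state.put("a", 2L);
        state.put("b", 1L);

        Path handle = snapshot(state); // synchronous: returns only after the file is written
        state.put("a", 99L);           // an update made after the checkpoint

        Map<String, Long> restored = restore(handle);
        System.out.println("restored: " + restored);
        Files.deleteIfExists(handle);
    }
}
```

The restored map reflects the state at snapshot time, not the later update. The snapshot call here is synchronous, like the map-based backends described above; an asynchronous backend would return quickly and finish the write in the background.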