The Distributed Runtime of Apache Flink
Tasks and Operator Chains
For distributed execution, Flink chains operator subtasks together into tasks, and each task is executed by one thread. Chaining is an effective optimization: it avoids the overhead of thread-to-thread handover and buffering, and increases overall throughput while reducing latency. The chaining behavior can be configured
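As a rough sketch of how the chaining behavior can be influenced from the DataStream API (the source, operators, and job below are illustrative, not from the original text): chaining can be disabled for the whole job, a new chain can be started at an operator, or a single operator can be excluded from chaining.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Option 1: disable chaining for the whole job.
        // env.disableOperatorChaining();

        env.socketTextStream("localhost", 9999)     // hypothetical source
           .filter(line -> !line.isEmpty())
           .startNewChain()                         // begin a new chain at this operator
           .map(line -> line.toUpperCase())
           .disableChaining()                       // keep this operator out of any chain
           .print();

        env.execute("chaining example");
    }
}
```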
Job Managers, Task Managers and Clients
The Flink runtime consists of two types of processes:
① The JobManager, also known as the Master, coordinates the distributed execution: it schedules tasks, coordinates checkpoints, and coordinates recovery on failures. A cluster needs at least one JobManager; a highly available setup can run several JobManagers, one of which is the leader while the others remain in standby.
② The TaskManager, also known as a Worker, executes the tasks (more precisely, the subtasks) of a dataflow and buffers and exchanges the data streams. A cluster needs at least one TaskManager.
The JobManager and TaskManagers can be started in several ways: directly on the machines as a standalone cluster, inside containers, or managed by a resource framework such as Mesos or YARN.
TaskManagers connect to the JobManager, announce themselves as available, and accept the tasks assigned to them
The client is not part of the runtime and program execution; it is only used to prepare the dataflow and send it to the JobManager. After that, the client can disconnect, or it can stay connected to receive progress reports and the output of the program
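The sketch below illustrates the client role, assuming a reachable remote cluster; the host, port, and jar path are placeholders rather than values from the original text.

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RemoteSubmitExample {
    public static void main(String[] args) throws Exception {
        // The client builds the dataflow locally and ships it (plus the user jar)
        // to the JobManager of a running cluster.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
                "jobmanager-host",           // placeholder JobManager address
                8081,                        // placeholder port
                "/path/to/user-code.jar");   // placeholder jar containing the job's classes

        env.fromElements(1, 2, 3)
           .map(n -> n * 2)
           .print();

        // execute() hands the job over to the JobManager; the client may disconnect afterwards.
        env.execute("remote submission example");
    }
}
```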
Task Slots and Resources
Each worker (TaskManager) is a JVM process and may execute one or more subtasks in separate threads
To control how many tasks a worker accepts, the worker has a number of so-called task slots (at least one)
Each task slot represents a fixed subset of the TaskManager's resources. Slots are separated from each other, and this separation means that subtasks do not compete with each other for memory. In the current design, CPU is not isolated; only memory is separated
By adjusting the number of task slots, users define how subtasks are isolated from each other. One slot per TaskManager means each task group runs in a separate JVM; multiple slots mean that several subtasks share the same JVM. Tasks in the same JVM share TCP connections (via multiplexing) and heartbeat messages, and may also share data sets and data structures, which reduces the per-task overhead
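A minimal sketch of configuring the number of task slots, assuming the standard taskmanager.numberOfTaskSlots setting; on a real cluster this is usually set in the TaskManager configuration file rather than in code, and the local environment here is only used to show the option.

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.TaskManagerOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotConfigExample {
    public static void main(String[] args) throws Exception {
        // Equivalent to "taskmanager.numberOfTaskSlots: 4" in the configuration file.
        Configuration conf = new Configuration();
        conf.set(TaskManagerOptions.NUM_TASK_SLOTS, 4);

        // Local in-JVM environment, used here only to demonstrate the option.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(4, conf);

        env.fromElements("a", "b", "c").print();
        env.execute("slot configuration example");
    }
}
```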
By default, Flink allows subtasks to share slots as long as they belong to the same job, so that a single slot may hold an entire pipeline of the job. This has the following benefits
When partitioning slots, a Flink cluster only needs as many task slots as the highest parallelism used in the job
Better resource utilization: without slot sharing, the non-intensive source/map() subtasks would block as many resources as the intensive ones. With slot sharing, as in the following example, we can increase the base parallelism from 2 to 6, making full use of the existing resources while ensuring that the heavy subtasks are distributed fairly among the TaskManagers
The Flink APIs also provide a resource group mechanism that can be used to prevent undesirable slot sharing
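A minimal sketch of that mechanism, assuming the DataStream API's slotSharingGroup() method; the operators and group names are illustrative. Operators placed in a named group will not share slots with operators of other groups (downstream operators inherit the group of their inputs unless they set their own).

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SlotSharingGroupExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.socketTextStream("localhost", 9999)       // hypothetical source, stays in "default"
           .filter(line -> !line.isEmpty())
           .map(line -> line.toUpperCase())
           .slotSharingGroup("heavy")                 // this operator will not share slots
                                                      // with operators of the "default" group
           .print();

        env.execute("slot sharing group example");
    }
}
```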
State Backends
The specific data structures in which the key/value indexes are stored depend on the chosen state backend: one state backend stores the data in an in-memory hash table, another state backend uses RocksDB as the store
In addition to defining the data structure that holds the state, the state backends also implement the logic to take a point-in-time snapshot of the key/value index and to store that snapshot as part of a checkpoint
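A sketch of selecting a state backend programmatically, assuming Flink 1.13+ class names (HashMapStateBackend, EmbeddedRocksDBStateBackend); the checkpoint path is a placeholder.

```java
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Keep working state on the JVM heap (the in-memory hash table variant).
        env.setStateBackend(new HashMapStateBackend());

        // Alternatively, keep working state in RocksDB on local disk
        // (requires the flink-statebackend-rocksdb dependency).
        // env.setStateBackend(new EmbeddedRocksDBStateBackend());

        // Where checkpoint snapshots are written; the path is a placeholder.
        env.getCheckpointConfig().setCheckpointStorage("file:///tmp/flink-checkpoints");

        env.fromElements(1, 2, 3)
           .keyBy(n -> n % 2)
           .reduce((a, b) -> a + b)
           .print();

        env.execute("state backend example");
    }
}
```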
Savepoints
Programs written in the DataStream API can resume execution from a savepoint, which makes it possible to update the program or migrate to another cluster without losing state
A savepoint is similar to a checkpoint, but a checkpoint relies on the periodic checkpointing mechanism: while the program runs, the worker nodes are snapshotted periodically to produce checkpoints so that execution can be resumed after a failure. Only the latest checkpoint is needed for recovery, so once a new checkpoint has been completed, the older ones can be safely discarded
In contrast, savepoints are triggered manually by the user and do not expire automatically
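A small sketch of the checkpointing side of this contrast; the interval and settings are illustrative values, not from the original text. Savepoints, by comparison, are triggered from outside the program, for example with the Flink CLI's savepoint command.

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 60 seconds (illustrative interval).
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        // Allow at most one checkpoint in flight at a time (illustrative setting).
        env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);

        env.socketTextStream("localhost", 9999)     // hypothetical source
           .keyBy(line -> line)
           .reduce((a, b) -> a + b)
           .print();

        env.execute("checkpointing example");
    }
}
```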
[Note] The distributed runtime of Apache Flink