Flink Principles and Implementation: Architecture and Topology Overview

To understand a system, you typically start with its architecture diagram. Our concern here is: which services are started on each node after the system has been successfully deployed, and how these services interact and coordinate. Below is the architecture diagram of a started Flink cluster.

When a Flink cluster starts, one JobManager and one or more TaskManagers are launched first. The Client submits the job to the JobManager, the JobManager dispatches the job's tasks to the TaskManagers, and the TaskManagers report heartbeats and statistics back to the JobManager. Data is transferred between TaskManagers in the form of streams. All three roles are independent JVM processes.

  • The Client is the process that submits the job; it can run on any machine that can connect to the JobManager (see the sketch below). After the job is submitted, the Client can exit (for a streaming job) or stay alive and wait for the results to return.

  • The JobManager is mainly responsible for scheduling jobs and coordinating checkpoints across tasks; its responsibilities resemble those of Storm's Nimbus. After receiving resources such as the job and JAR files from the Client, it generates an optimized execution plan and dispatches it to the individual TaskManagers, with the task as the unit of execution.

  • The TaskManager sets its number of slots at startup. Each slot can run one task, and each task runs as a thread. The TaskManager receives the tasks to be deployed from the JobManager; once a task is deployed and started, it establishes a Netty connection with its upstream and receives and processes data.
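As an illustration of the Client role, here is a minimal sketch (the host name, port, and JAR path are placeholder assumptions; 6123 is the default JobManager RPC port). Any machine that can reach the JobManager can act as a Client and submit a job remotely:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// connect to a remote JobManager and ship the job's JAR along with the submission
StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
    "jobmanager-host", 6123, "/path/to/wordcount.jar");

// build the topology on env as usual; env.execute(...) then submits the job
// to the remote JobManager instead of a local mini-cluster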

You can see that Flink's task scheduling uses a multithreaded model, in which tasks from different jobs can be mixed inside one TaskManager process. Although this approach can effectively improve CPU utilization, I personally dislike the design: it not only lacks a resource isolation mechanism, but is also inconvenient for debugging. A process model like Storm's, in which a JVM runs only the tasks of a single job, seems more reasonable.

Job Example

The example shown in this article is based on Flink 1.0.x.

We use the SocketTextStreamWordCount program from the Flink examples package, which counts the occurrences of each word read from a socket stream.

First, start the local server using Netcat:

$ nc -l 9000

Then submit the Flink program:

$ bin/flink run examples/streaming/SocketTextStreamWordCount.jar \
  --hostname 10.218.130.9 \
  --port 9000

By typing words at the netcat end and monitoring the TaskManager's output, you can see the word-count results.

The specific code of SocketTextStreamWordCount is as follows:

public static void main(String[] args) throws Exception {
  // check the input
  final ParameterTool params = ParameterTool.fromArgs(args);
  ...

  // set up the execution environment
  final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

  // get input data
  DataStream<String> text =
      env.socketTextStream(params.get("hostname"), params.getInt("port"), '\n', 0);

  DataStream<Tuple2<String, Integer>> counts =
      // split up the lines in pairs (2-tuples) containing: (word, 1)
      text.flatMap(new Tokenizer())
          // group by the tuple field "0" and sum up tuple field "1"
          .keyBy(0)
          .sum(1);

  counts.print();

  // execute program
  env.execute("WordCount from SocketTextStream Example");
}
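The Tokenizer used above is not shown in the snippet. Here is a minimal sketch of it, modeled on the standard WordCount examples shipped with Flink (the exact class in your distribution may differ slightly):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public static final class Tokenizer
    implements FlatMapFunction<String, Tuple2<String, Integer>> {

  @Override
  public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
    // normalize the line and split it into words
    String[] tokens = value.toLowerCase().split("\\W+");

    // emit a (word, 1) pair for every non-empty token
    for (String token : tokens) {
      if (token.length() > 0) {
        out.collect(new Tuple2<>(token, 1));
      }
    }
  }
}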

We can replace the last line of code, env.execute(...), with System.out.println(env.getExecutionPlan()); and run the program locally (with the parallelism set to 2) to obtain the JSON string of the topology's logical execution plan. Pasting this JSON string into http://flink.apache.org/visualizer/ visualizes the execution plan.
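Concretely, the replacement looks like this (a minimal sketch; getExecutionPlan() is the StreamExecutionEnvironment method that renders the current topology as a JSON plan):

// instead of executing the job, print its logical execution plan as JSON
env.setParallelism(2);
System.out.println(env.getExecutionPlan());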

However, this is not the final execution graph that runs in Flink; it is only a plan that describes the relationships between topology nodes, and it corresponds to the StreamGraph in Flink. In addition, after submitting the topology (with the parallelism set to 2), you can see another execution plan in the web UI, as shown below, which corresponds to the JobGraph in Flink.

(Figure: the JobGraph of the job as shown in the Flink web UI.)

It may look a little confusing: why are there so many different graphs? In fact, there are even more. The execution graphs in Flink can be divided into four layers: StreamGraph -> JobGraph -> ExecutionGraph -> physical execution graph.

  • StreamGraph: the original graph generated from the code the user writes with the Stream API. It represents the topology of the program.

  • JobGraph: the StreamGraph after optimization; this is the data structure that is submitted to the JobManager. The main optimization is to chain multiple eligible nodes together into a single node, which reduces the serialization, deserialization, and transmission costs of data flowing between nodes.

  • ExecutionGraph: the distributed execution graph that the JobManager generates from the JobGraph. It is the core data structure of the scheduling layer.

  • Physical execution graph: the "graph" formed after the JobManager schedules the job and deploys the tasks on the individual TaskManagers. It is not a concrete data structure.
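The first two layers can be inspected programmatically. Here is a minimal sketch (GraphLayersDemo is a hypothetical class name; getStreamGraph() and StreamGraph.getJobGraph() are streaming-runtime APIs whose signatures may differ across Flink versions):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.runtime.jobgraph.JobVertex;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.graph.StreamGraph;

public class GraphLayersDemo {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(2);

    // a trivial pipeline, just to have something to translate
    env.socketTextStream("localhost", 9000)
       .map(new MapFunction<String, String>() {
         @Override
         public String map(String value) {
           return value.toUpperCase();
         }
       })
       .print();

    StreamGraph streamGraph = env.getStreamGraph(); // layer 1: Stream API -> StreamGraph
    JobGraph jobGraph = streamGraph.getJobGraph();  // layer 2: chaining optimization -> JobGraph

    // after chaining, eligible operators collapse into a single JobVertex
    for (JobVertex vertex : jobGraph.getVertices()) {
      System.out.println(vertex.getName() + ", parallelism = " + vertex.getParallelism());
    }
  }
}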

For example, the evolution of the four-layer execution graphs for the SocketTextStreamWordCount job above (parallelism 2) is shown in the following illustration:

Here is a brief explanation of the terms involved.

StreamGraph: the original graph generated from the code the user writes with the Stream API.

  • StreamNode: the class that represents an operator, carrying all related attributes such as parallelism and incoming and outgoing edges.
  • StreamEdge: an edge that connects two StreamNodes.

JobGraph: the StreamGraph after optimization; the data structure submitted to the JobManager.

  • JobVertex: after optimization, multiple eligible StreamNodes may be chained together to generate one JobVertex, i.e. a JobVertex contains one or more operators. The input of a JobVertex is a JobEdge and its output is an IntermediateDataSet.
  • IntermediateDataSet: represents the output of a JobVertex, i.e. the data set produced by the operators' processing. Its producer is a JobVertex and its consumer is a JobEdge.
  • JobEdge: represents a data transfer channel in the JobGraph. Its source is an IntermediateDataSet and its target is a JobVertex, i.e. data is passed from an IntermediateDataSet through a JobEdge to the target JobVertex.

ExecutionGraph: the distributed execution graph that the JobManager generates from the JobGraph; the core data structure of the scheduling layer.

  • ExecutionJobVertex: corresponds one-to-one to a JobVertex in the JobGraph. Each ExecutionJobVertex has as many ExecutionVertices as its parallelism.
  • ExecutionVertex: represents one of the parallel subtasks of an ExecutionJobVertex. Its input is an ExecutionEdge and its output is an IntermediateResultPartition.
  • IntermediateResult: corresponds one-to-one to an IntermediateDataSet in the JobGraph. Each IntermediateResult has as many IntermediateResultPartitions as the parallelism of the downstream ExecutionJobVertex.
  • IntermediateResultPartition: represents one output partition of an ExecutionVertex. Its producer is an ExecutionVertex and its consumers are a number of ExecutionEdges.
  • ExecutionEdge: represents the input of an ExecutionVertex. Its source is an IntermediateResultPartition and its target is an ExecutionVertex; both source and target are single.
  • Execution: one attempt to execute an ExecutionVertex. When a failure occurs, or when data needs to be recomputed, an ExecutionVertex may have multiple ExecutionAttemptIDs. An Execution is uniquely identified by its ExecutionAttemptID. Messages between the JobManager and TaskManager about task deployment and task status updates use the ExecutionAttemptID to determine the message recipient.

Physical execution graph: the "graph" formed after the JobManager schedules the job according to the ExecutionGraph and deploys the tasks on the individual TaskManagers; it is not a concrete data structure.

  • Task: after an Execution is scheduled, the corresponding Task is started in the assigned TaskManager. A Task wraps an operator that carries the user's execution logic.
  • ResultPartition: represents the data generated by one Task, and corresponds one-to-one to an IntermediateResultPartition in the ExecutionGraph.
  • ResultSubpartition: a subpartition of a ResultPartition. Each ResultPartition contains multiple ResultSubpartitions, whose number is determined by the number of downstream consuming Tasks and the DistributionPattern.
  • InputGate: represents the input encapsulation of a Task, and corresponds one-to-one to a JobEdge in the JobGraph. Each InputGate consumes one or more ResultPartitions.
  • InputChannel: each InputGate contains one or more InputChannels. An InputChannel corresponds one-to-one to an ExecutionEdge in the ExecutionGraph, and is also connected one-to-one to a ResultSubpartition, i.e. one InputChannel receives the output of one ResultSubpartition.
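To see the chaining optimization reflected in the JobGraph layer, here is a minimal sketch (assuming the DataStream API's chaining hints, whose availability may vary by Flink version); it reuses the Tokenizer shown earlier:

// by default the source and the flatMap would be chained into a single
// JobVertex; startNewChain() forces the flatMap to start a new chain
env.socketTextStream("localhost", 9000)
   .flatMap(new Tokenizer())
   .startNewChain()
   .keyBy(0)   // the hash partitioning introduced by keyBy always breaks the chain
   .sum(1)
   .print();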

Subsequent articles will describe in detail how Flink generates these execution graphs. The main topics are: how the StreamGraph is generated, how the JobGraph is generated, how the ExecutionGraph is generated, and how the job is scheduled (how the physical execution graph is generated).

From: http://wuchong.me/blog/2016/05/03/flink-internals-overview/
