Flink Principles and Implementation: Architecture and Topology Overview

To understand a system, you typically start with its architecture diagram. Our concern here is: which services are started on each node after the system has been successfully deployed, and how these services interact and coordinate. Below is the architecture diagram of a started Flink cluster.

When a Flink cluster starts, one JobManager and one or more TaskManagers are launched first. The Client submits the job to the JobManager, the JobManager dispatches the job's tasks to the TaskManagers, and the TaskManagers report heartbeats and statistics back to the JobManager. Data is transferred between TaskManagers in the form of streams. All three roles are independent JVM processes.

  • The Client is the process that submits the job; it can run on any machine that can connect to the JobManager (see the sketch below). After the job is submitted, the Client can exit (for a streaming job) or stay alive and wait for the results to return.

  • The JobManager is mainly responsible for scheduling jobs and coordinating checkpoints across tasks; its responsibilities resemble those of Storm's Nimbus. After receiving resources such as the job and JAR files from the Client, it generates an optimized execution plan and dispatches it to the individual TaskManagers, with the task as the unit of execution.

  • The TaskManager sets its number of slots at startup. Each slot can run one task, and each task runs as a thread. The TaskManager receives the tasks to be deployed from the JobManager; once a task is deployed and started, it establishes a Netty connection with its upstream and receives and processes data.
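As an illustration of the Client role, here is a minimal sketch (the host name, port, and JAR path are placeholder assumptions; 6123 is the default JobManager RPC port). Any machine that can reach the JobManager can act as a Client and submit a job remotely:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// connect to a remote JobManager and ship the job's JAR along with the submission
StreamExecutionEnvironment env = StreamExecutionEnvironment.createRemoteEnvironment(
    "jobmanager-host", 6123, "/path/to/wordcount.jar");

// build the topology on env as usual; env.execute(...) then submits the job
// to the remote JobManager instead of a local mini-cluster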

You can see that Flink's task scheduling uses a multithreaded model, in which tasks from different jobs can be mixed inside one TaskManager process. Although this approach can effectively improve CPU utilization, I personally dislike the design: it not only lacks a resource isolation mechanism, but is also inconvenient for debugging. A process model like Storm's, in which a JVM runs only the tasks of a single job, seems more reasonable.

Job Example

The example shown in this article is based on Flink 1.0.x.

We use the SocketTextStreamWordCount program from the Flink examples package, which counts the occurrences of each word read from a socket stream.

First, start the local server using Netcat:

$ nc -l 9000

Then submit the Flink program:

$ bin/flink run examples/streaming/SocketTextStreamWordCount.jar \
  --hostname 10.218.130.9 \
  --port 9000

By typing words at the netcat end and monitoring the TaskManager's output, you can see the word-count results.

The specific code of SocketTextStreamWordCount is as follows:

public static void main(String[] args) throws Exception {
  // check the input
  final ParameterTool params = ParameterTool.fromArgs(args);
  ...

  // set up the execution environment
  final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

  // get input data
  DataStream<String> text =
      env.socketTextStream(params.get("hostname"), params.getInt("port"), '\n', 0);

  DataStream<Tuple2<String, Integer>> counts =
      // split up the lines in pairs (2-tuples) containing: (word, 1)
      text.flatMap(new Tokenizer())
          // group by the tuple field "0" and sum up tuple field "1"
          .keyBy(0)
          .sum(1);

  counts.print();

  // execute program
  env.execute("WordCount from SocketTextStream Example");
}
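The Tokenizer used above is not shown in the snippet. Here is a minimal sketch of it, modeled on the standard WordCount examples shipped with Flink (the exact class in your distribution may differ slightly):

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public static final class Tokenizer
    implements FlatMapFunction<String, Tuple2<String, Integer>> {

  @Override
  public void flatMap(String value, Collector<Tuple2<String, Integer>> out) {
    // normalize the line and split it into words
    String[] tokens = value.toLowerCase().split("\\W+");

    // emit a (word, 1) pair for every non-empty token
    for (String token : tokens) {
      if (token.length() > 0) {
        out.collect(new Tuple2<>(token, 1));
      }
    }
  }
}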

We can replace the last line of code, env.execute(...), with System.out.println(env.getExecutionPlan()); and run the program locally (with the parallelism set to 2) to obtain the JSON string of the topology's logical execution plan. Pasting this JSON string into http://flink.apache.org/visualizer/ visualizes the execution plan.
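Concretely, the replacement looks like this (a minimal sketch; getExecutionPlan() is the StreamExecutionEnvironment method that renders the current topology as a JSON plan):

// instead of executing the job, print its logical execution plan as JSON
env.setParallelism(2);
System.out.println(env.getExecutionPlan());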

However, this is not the final execution graph that runs in Flink; it is only a plan that describes the relationships between topology nodes, and it corresponds to the StreamGraph in Flink. In addition, after submitting the topology (with the parallelism set to 2), you can see another execution plan in the web UI, as shown below, which corresponds to the JobGraph in Flink.

(Figure: the JobGraph of the job as shown in the Flink web UI.)

It may look a little confusing: why are there so many different graphs? In fact, there are even more. The execution graphs in Flink can be divided into four layers: StreamGraph -> JobGraph -> ExecutionGraph -> physical execution graph.

  • StreamGraph: the original graph generated from the code the user writes with the Stream API. It represents the topology of the program.

  • JobGraph: the StreamGraph after optimization; this is the data structure that is submitted to the JobManager. The main optimization is to chain multiple eligible nodes together into a single node, which reduces the serialization, deserialization, and transmission costs of data flowing between nodes.

  • ExecutionGraph: the distributed execution graph that the JobManager generates from the JobGraph. It is the core data structure of the scheduling layer.

  • Physical execution graph: the "graph" formed after the JobManager schedules the job and deploys the tasks on the individual TaskManagers. It is not a concrete data structure.
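The first two layers can be inspected programmatically. Here is a minimal sketch (GraphLayersDemo is a hypothetical class name; getStreamGraph() and StreamGraph.getJobGraph() are streaming-runtime APIs whose signatures may differ across Flink versions):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.runtime.jobgraph.JobGraph;
import org.apache.flink.runtime.jobgraph.JobVertex;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.graph.StreamGraph;

public class GraphLayersDemo {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    env.setParallelism(2);

    // a trivial pipeline, just to have something to translate
    env.socketTextStream("localhost", 9000)
       .map(new MapFunction<String, String>() {
         @Override
         public String map(String value) {
           return value.toUpperCase();
         }
       })
       .print();

    StreamGraph streamGraph = env.getStreamGraph(); // layer 1: Stream API -> StreamGraph
    JobGraph jobGraph = streamGraph.getJobGraph();  // layer 2: chaining optimization -> JobGraph

    // after chaining, eligible operators collapse into a single JobVertex
    for (JobVertex vertex : jobGraph.getVertices()) {
      System.out.println(vertex.getName() + ", parallelism = " + vertex.getParallelism());
    }
  }
}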

For example, the evolution of the four-layer execution graphs for the SocketTextStreamWordCount job above (parallelism 2) is shown in the following illustration:

Here is a brief explanation of the terms involved.

StreamGraph: the original graph generated from the code the user writes with the Stream API.

  • StreamNode: the class that represents an operator, carrying all related attributes such as parallelism and incoming and outgoing edges.
  • StreamEdge: an edge that connects two StreamNodes.

JobGraph: the StreamGraph after optimization; the data structure submitted to the JobManager.

  • JobVertex: after optimization, multiple eligible StreamNodes may be chained together to generate one JobVertex, i.e. a JobVertex contains one or more operators. The input of a JobVertex is a JobEdge and its output is an IntermediateDataSet.
  • IntermediateDataSet: represents the output of a JobVertex, i.e. the data set produced by the operators' processing. Its producer is a JobVertex and its consumer is a JobEdge.
  • JobEdge: represents a data transfer channel in the JobGraph. Its source is an IntermediateDataSet and its target is a JobVertex, i.e. data is passed from an IntermediateDataSet through a JobEdge to the target JobVertex.

ExecutionGraph: the distributed execution graph that the JobManager generates from the JobGraph; the core data structure of the scheduling layer.

  • ExecutionJobVertex: corresponds one-to-one to a JobVertex in the JobGraph. Each ExecutionJobVertex has as many ExecutionVertices as its parallelism.
  • ExecutionVertex: represents one of the parallel subtasks of an ExecutionJobVertex. Its input is an ExecutionEdge and its output is an IntermediateResultPartition.
  • IntermediateResult: corresponds one-to-one to an IntermediateDataSet in the JobGraph. Each IntermediateResult has as many IntermediateResultPartitions as the parallelism of the downstream ExecutionJobVertex.
  • IntermediateResultPartition: represents one output partition of an ExecutionVertex. Its producer is an ExecutionVertex and its consumers are a number of ExecutionEdges.
  • ExecutionEdge: represents the input of an ExecutionVertex. Its source is an IntermediateResultPartition and its target is an ExecutionVertex; both source and target are single.
  • Execution: one attempt to execute an ExecutionVertex. When a failure occurs, or when data needs to be recomputed, an ExecutionVertex may have multiple ExecutionAttemptIDs. An Execution is uniquely identified by its ExecutionAttemptID. Messages between the JobManager and TaskManager about task deployment and task status updates use the ExecutionAttemptID to determine the message recipient.

Physical execution graph: the "graph" formed after the JobManager schedules the job according to the ExecutionGraph and deploys the tasks on the individual TaskManagers; it is not a concrete data structure.

  • Task: after an Execution is scheduled, the corresponding Task is started in the assigned TaskManager. A Task wraps an operator that carries the user's execution logic.
  • ResultPartition: represents the data generated by one Task, and corresponds one-to-one to an IntermediateResultPartition in the ExecutionGraph.
  • ResultSubpartition: a subpartition of a ResultPartition. Each ResultPartition contains multiple ResultSubpartitions, whose number is determined by the number of downstream consuming Tasks and the DistributionPattern.
  • InputGate: represents the input encapsulation of a Task, and corresponds one-to-one to a JobEdge in the JobGraph. Each InputGate consumes one or more ResultPartitions.
  • InputChannel: each InputGate contains one or more InputChannels. An InputChannel corresponds one-to-one to an ExecutionEdge in the ExecutionGraph, and is also connected one-to-one to a ResultSubpartition, i.e. one InputChannel receives the output of one ResultSubpartition.
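To see the chaining optimization reflected in the JobGraph layer, here is a minimal sketch (assuming the DataStream API's chaining hints, whose availability may vary by Flink version); it reuses the Tokenizer shown earlier:

// by default the source and the flatMap would be chained into a single
// JobVertex; startNewChain() forces the flatMap to start a new chain
env.socketTextStream("localhost", 9000)
   .flatMap(new Tokenizer())
   .startNewChain()
   .keyBy(0)   // the hash partitioning introduced by keyBy always breaks the chain
   .sum(1)
   .print();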

Subsequent articles will describe in detail how Flink generates these execution graphs. The main topics are: how the StreamGraph is generated, how the JobGraph is generated, how the ExecutionGraph is generated, and how the job is scheduled (how the physical execution graph is generated).

From: http://wuchong.me/blog/2016/05/03/flink-internals-overview/
