Spark Introduction Combat Series -- 4. Spark Running Architecture

Source: Internet
Author: User
Tags: shuffle, hadoop, mapreduce


Http://www.cnblogs.com/shishanyuan/archive/2015/08/19/4721326.html


1. Spark Running Architecture

1.1 Term Definitions

• Application: The concept of a Spark Application is similar to that of a Hadoop MapReduce application. It refers to a user-written Spark program, which contains the Driver code and the Executor code that runs on multiple nodes of the cluster;

• Driver: The Driver in Spark runs the main() function of the Application and creates the SparkContext. The SparkContext is created to prepare the running environment of the Spark application; it is responsible for communicating with the Cluster Manager, applying for resources, and assigning and monitoring tasks. When the Executor side has finished running, the Driver is responsible for closing the SparkContext. The SparkContext is usually used to represent the Driver;

• Executor: A process belonging to an Application that runs on a Worker node. It runs tasks and is responsible for keeping data in memory or on disk; each Application has its own separate set of Executors. In Spark on YARN mode the process is named CoarseGrainedExecutorBackend, similar to YarnChild in Hadoop MapReduce. A CoarseGrainedExecutorBackend process has one and only one Executor object, which wraps each Task into a TaskRunner and takes an idle thread from the thread pool to run it. The number of tasks each CoarseGrainedExecutorBackend can run in parallel depends on the number of CPU cores allocated to it;

• Cluster Manager: The external service that provides resources on the cluster; currently there are:

- Standalone: Spark's native resource management, in which the Master is responsible for allocating resources;

- Hadoop YARN: the ResourceManager in YARN is responsible for allocating resources;

• Worker: Any node in the cluster that can run Application code, similar to a NodeManager node in YARN. In Standalone mode these are the worker nodes configured through the slaves file; in Spark on YARN mode it refers to a NodeManager node;

• Job: A parallel computation consisting of multiple Tasks, usually spawned by a Spark action. A job contains multiple RDDs and the various operations acting on them;

• Stage: Each job is split into groups of tasks; each group of tasks is called a Stage, also called a TaskSet. A job is divided into several stages;

• Task: A unit of work that is sent to one Executor. How these terms fit together is illustrated by the sketch after this list;
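
To make these terms concrete, here is a minimal sketch of a complete Spark application (a hedged example: the application name, the sample data, and the local[*] master are illustrative and not from the original article). The whole program is one Application, the process that creates the SparkContext is the Driver, count() triggers one Job, the shuffle introduced by reduceByKey splits that Job into two Stages, and each Stage is a TaskSet with one Task per partition, run by the Executors.

import org.apache.spark.SparkContext._
import org.apache.spark.{SparkConf, SparkContext}

object TermsDemo {
  def main(args: Array[String]): Unit = {
    // Driver side: create the SparkContext (this process is the Driver of one Application).
    val conf = new SparkConf().setAppName("TermsDemo").setMaster("local[*]") // or spark://host:7077, etc.
    val sc = new SparkContext(conf)

    val words = sc.parallelize(Seq("spark", "hadoop", "spark", "yarn"), 2) // 2 partitions

    // reduceByKey needs a shuffle, so the count() action below spawns one Job that the
    // DAGScheduler splits into two Stages; each Stage has one Task per partition.
    val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
    println(counts.count())

    sc.stop() // the Driver closes the SparkContext when the Executors are done
  }
}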

1.2 Basic running process of Spark

The basic running process of Spark is as follows:

1. Build the Spark application running environment (start the SparkContext). The SparkContext registers with the resource manager (which can be Standalone, Mesos, or YARN) and applies for resources to run Executors;

2. The resource manager allocates Executor resources and starts StandaloneExecutorBackend; the Executors' running status is sent to the resource manager along with their heartbeats;

3. The SparkContext builds a DAG graph, decomposes the DAG into Stages, and sends the TaskSets to the Task Scheduler. Executors apply for Tasks from the SparkContext, the Task Scheduler issues Tasks to the Executors, and at the same time the SparkContext distributes the application code to the Executors.

4. Tasks run on the Executors; when they finish, all resources are released.

Features of the Spark running architecture:

• Each Application gets its own dedicated Executor processes, which stay up for the whole lifetime of the Application and run Tasks in a multi-threaded fashion. This application-isolation mechanism has advantages both from the scheduling point of view (each Driver schedules its own tasks) and from the execution point of view (tasks from different Applications run in different JVMs). Of course, it also means that a Spark Application cannot share data with other Applications unless the data is written to an external storage system.

• Spark does not care which resource manager is used, as long as it can obtain Executor processes and keep communicating with them.

• The client that submits the SparkContext should be close to the Worker nodes (the nodes running the Executors), preferably in the same rack, because a great deal of information is exchanged between the SparkContext and the Executors while a Spark Application runs. If you want to run against a remote cluster, it is better to use RPC to submit the SparkContext to the cluster than to run the SparkContext far away from the Workers.

• Tasks are optimized with data locality and speculative execution; both can be tuned through configuration, as in the sketch below.
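
As a hedged sketch of how these two optimizations are exposed, both can be tuned through the standard Spark configuration keys spark.speculation and spark.locality.wait (the values below are only illustrative):

import org.apache.spark.SparkConf

// Speculative execution: slow-running tasks are re-launched speculatively on other Executors.
// Locality wait: how long the scheduler waits for a data-local slot before falling back to a
// less local one (value in milliseconds for Spark 1.x, as used in this article).
val conf = new SparkConf()
  .set("spark.speculation", "true")
  .set("spark.locality.wait", "3000")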

1.2.1 DAGScheduler

The DAGScheduler transforms a Spark job into a DAG (directed acyclic graph) of stages and, based on the relationships between the RDDs and the stages, finds the lowest-cost scheduling. It then submits the stages to the TaskScheduler in the form of TaskSets.

1.2.2 TaskScheduler

The DAGScheduler determines the preferred locations for running each task and passes this information to the lower-level TaskScheduler. In addition, the DAGScheduler handles failures caused by lost shuffle output, which may require stages from earlier in the run to be resubmitted (task failures not caused by lost shuffle data are handled by the TaskScheduler).

The TaskScheduler maintains all the TaskSets. When an Executor sends a heartbeat to the Driver, the TaskScheduler assigns tasks according to the Executor's remaining resources. The TaskScheduler also tracks the running state of all tasks and retries failed tasks.

The concrete TaskScheduler differs by running mode:

• Spark on Standalone mode uses TaskScheduler;

• Yarn-Client mode uses YarnClientClusterScheduler;

• Yarn-Cluster mode uses YarnClusterScheduler.

1.3 How an RDD runs

So how does an RDD run in the Spark architecture? At a high level, there are three main steps:

1. Create the RDD objects;

2. The DAGScheduler module computes the dependencies between the RDDs; these dependencies form a DAG;

3. Each job is divided into multiple stages. One of the main criteria for dividing stages is whether the input of the current computation is deterministic; if it is, the computation is placed in the same stage, avoiding the message-passing overhead between multiple stages.

Take the following example, which counts, for each initial letter from A to Z, how many distinct names start with that letter, to see how an RDD runs.
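
A minimal sketch of that example in spark-shell style (sc is already available there; hdfs://names is the example path used below and is assumed to contain one name per line):

// Read the names, key each name by its initial letter, group by that letter,
// then count the distinct names per letter; collect() is the action that runs the job.
val names   = sc.textFile("hdfs://names")                  // RDD of lines
val pairs   = names.map(name => (name.charAt(0), name))    // (initial letter, name)
val grouped = pairs.groupByKey()                           // the shuffle that splits the job into two stages
val counts  = grouped.mapValues(ns => ns.toSet.size)       // number of distinct names per letter
counts.collect()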

Step 1: Create the RDDs. In the example above, the final collect is an action and does not create an RDD, while each of the first four transformations creates a new RDD. So the first step is to create all the RDDs (each carrying its five items of internal information: partitions, dependencies, compute function, partitioner, and preferred locations).

Step 2: Create an execution plan. Spark pipelines computations as much as possible and divides the plan into stages based on whether the data needs to be reorganized (shuffled); for example, the groupBy() transformation splits this execution plan into two stages. Eventually a DAG (directed acyclic graph) is produced as the logical execution plan.

Step 3: Schedule the tasks. Each stage is divided into different tasks, and each task is a combination of data and computation. All tasks of the current stage must complete before the next stage starts, because the first transformation of the next stage has to rearrange the data, so all the result data of the current stage must be computed before execution can continue.

Assuming there are four blocks under hdfs://names in this example, the HadoopRDD has four partitions corresponding to the four blocks, and preferredLocations indicates the best locations of the four blocks. Four tasks can now be created and dispatched to the appropriate cluster nodes.
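
A small hedged sketch of how the partition count and preferred locations can be inspected from spark-shell (same assumed hdfs://names input):

val names = sc.textFile("hdfs://names")
names.partitions.length                                  // number of partitions, one per HDFS block here
names.partitions.map(p => names.preferredLocations(p))   // the preferred hosts for each partition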

2. The Spark running framework on different clusters

Spark pays attention to building a good ecosystem: it not only supports a variety of external file storage systems, but also provides a variety of cluster running modes. When deployed on a single machine, it can run either locally or in pseudo-distributed mode; when deployed on a distributed cluster, you can choose Standalone mode (Spark's own mode), Yarn-Client mode, or Yarn-Cluster mode according to the actual situation of your cluster. Although the various running modes of Spark differ in how they start, where they run, and how they schedule, their purpose is basically the same: to run and manage tasks in the appropriate place according to the user's configuration and the job's needs.

2.1 Spark on Standalone running process

Standalone mode is a resource-scheduling framework implemented by Spark itself; its main nodes are the Client node, the Master node, and the Worker nodes. The Driver can run either on the Master node or on the local Client side. When you submit a Spark job with the spark-shell interactive tool, the Driver runs on the Master node; when you submit a job with the spark-submit tool, or run a Spark task from a development platform such as Eclipse or IDEA using new SparkConf().setMaster("spark://master:7077"), the Driver runs on the local Client side.
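
A minimal, hedged sketch of the second case, where the Driver stays on the local client and registers with the standalone Master mentioned in the text (spark://master:7077 is that hypothetical address):

import org.apache.spark.{SparkConf, SparkContext}

// The Driver runs in this local process; Executors are started on the Workers by the Master.
val conf = new SparkConf()
  .setAppName("StandaloneClientDemo")
  .setMaster("spark://master:7077")
val sc = new SparkContext(conf)
// ... build RDDs and run actions here ...
sc.stop()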

The process is as follows:

1. The SparkContext connects to the Master, registers with it, and applies for resources (CPU cores and memory);

2. The Master decides on which Workers to allocate resources, based on the SparkContext's resource request and the information reported in the Workers' heartbeats, acquires the resources on those Workers, and then starts StandaloneExecutorBackend;

3. StandaloneExecutorBackend registers with the SparkContext;

4. The SparkContext sends the application code to the StandaloneExecutorBackends. Meanwhile the SparkContext parses the application code, builds the DAG, and submits it to the DAGScheduler, which decomposes it into stages (when an action is encountered, a job is spawned; each job contains one or more stages, and stages are usually produced where external data is read or before a shuffle). The stages (TaskSets) are then submitted to the TaskScheduler, which is responsible for assigning tasks to the corresponding Workers and finally handing them to the StandaloneExecutorBackends for execution;

5. StandaloneExecutorBackend builds an Executor thread pool, starts running tasks, and reports to the SparkContext until the tasks complete;

6. After all tasks have completed, the SparkContext unregisters from the Master and releases the resources.

2.2 Spark on YARN running process

YARN is a unified resource-management mechanism on which multiple computing frameworks can run. In today's big-data world, most companies use Spark for data computation, while other frameworks such as MapReduce and Storm are still used for parts of the business, for historical reasons or for performance considerations. Spark developed the Spark on YARN running mode for this situation. Thanks to YARN's good and flexible resource management, not only is deploying an Application more convenient, but the services and Applications a user runs in a YARN cluster are also completely isolated from one another in terms of resources. An even more practical value is that YARN can manage the multiple services running in the cluster by means of queues.

Depending on where the Driver runs in the cluster, Spark on YARN is divided into two modes: Yarn-Client mode and Yarn-Cluster mode (also called Yarn-Standalone mode).

2.2.1 YARN framework process

Any framework that works with YARN must follow YARN's development model. Before analyzing how Spark on YARN is implemented, it is necessary to look at some fundamentals of the YARN framework.

The basic running flow of the YARN framework is as follows:

The ResourceManager is responsible for allocating the cluster's resources to the individual applications. The basic unit of resource allocation and scheduling is the Container, which encapsulates machine resources such as memory, CPU, disk, and network; each task is assigned a Container and can only execute inside that Container, using the resources it encapsulates. The NodeManager runs on each compute node and is mainly responsible for starting the Containers required by applications, monitoring their resource usage (memory, CPU, disk, network, etc.), and reporting it to the ResourceManager. The ResourceManager and the NodeManagers together constitute the data-computing framework. The ApplicationMaster is specific to each application; it is mainly responsible for negotiating with the ResourceManager to obtain suitable Containers, tracking the status of those Containers, and monitoring their progress.

2.2.2 Yarn-Client

In Yarn-Client mode, the Driver runs locally on the client, which allows the Spark Application to interact with the client. Because the Driver is on the client, its WebUI can be accessed through the Driver, by default at http://hadoop1:4040, while the YARN UI is accessed at http://hadoop1:8088.

The Yarn-client workflow is divided into the following steps:

1. The Spark YARN Client applies to YARN's ResourceManager to start an ApplicationMaster. At the same time, the DAGScheduler and the TaskScheduler are created during SparkContext initialization; since Yarn-Client mode was chosen, the program selects YarnClientClusterScheduler and YarnClientSchedulerBackend;

2. After receiving the request, the ResourceManager selects a NodeManager in the cluster, allocates the first Container for the application, and asks it to start the application's ApplicationMaster in this Container. The difference from Yarn-Cluster is that this ApplicationMaster does not run a SparkContext; it only liaises with the SparkContext to allocate resources;

3. After the SparkContext in the Client is initialized, it establishes communication with the ApplicationMaster, registers with the ResourceManager, and applies to the ResourceManager for resources (Containers) according to the tasks' needs;

4. Once the ApplicationMaster has obtained the resources (i.e. Containers), it communicates with the corresponding NodeManagers, asking them to start CoarseGrainedExecutorBackend in the Containers obtained. After starting, CoarseGrainedExecutorBackend registers with the SparkContext in the Client and applies for tasks;

5. The SparkContext in the Client assigns tasks to the CoarseGrainedExecutorBackends to execute. CoarseGrainedExecutorBackend runs the tasks and reports their status and progress to the Driver, so that the Client can keep track of how each task is running and restart a task if it fails;

6. After the application has finished running, the Client's SparkContext asks the ResourceManager to unregister it and shuts itself down.
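
A hedged sketch of the client side of this flow, assuming Spark 1.x as used in this article (where "yarn-client" is accepted as a master URL; newer versions use master "yarn" with client deploy mode) and that HADOOP_CONF_DIR points at the cluster configuration:

import org.apache.spark.{SparkConf, SparkContext}

// The Driver (and its SparkContext) stays in this client process;
// the ApplicationMaster in YARN only negotiates Containers for the Executors.
val conf = new SparkConf()
  .setAppName("YarnClientDemo")
  .setMaster("yarn-client")
val sc = new SparkContext(conf)
// ... jobs submitted here are executed by Executors running in YARN Containers ...
sc.stop()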

2.2.3 Yarn-Cluster

In Yarn-Cluster mode, when a user submits an application to YARN, YARN runs the application in two stages: the first stage starts Spark's Driver as an ApplicationMaster in the YARN cluster; in the second stage the ApplicationMaster creates the application, requests resources for it from the ResourceManager, and starts Executors to run the tasks, while monitoring the entire run until it completes.

The Yarn-cluster workflow is divided into the following steps:


1. The Spark YARN Client submits the application to YARN, including the ApplicationMaster program, the command to start the ApplicationMaster, the program to run in the Executors, and so on;

2. The ResourceManager receives the request, selects a NodeManager in the cluster, allocates the first Container for the application, and asks it to start the application's ApplicationMaster in this Container; the ApplicationMaster carries out the SparkContext initialization;

3. The ApplicationMaster registers with the ResourceManager, so that the user can view the application's running state directly through the ResourceManager; it then uses polling, via the RPC protocol, to request resources for the tasks and monitors their running state until the run ends;

4. Once the ApplicationMaster has obtained the resources (i.e. Containers), it communicates with the corresponding NodeManagers, asking them to start CoarseGrainedExecutorBackend in the Containers obtained. After starting, CoarseGrainedExecutorBackend registers with the SparkContext in the ApplicationMaster and requests tasks. This is just like the Standalone pattern, except that when the SparkContext is initialized in the Spark Application, CoarseGrainedSchedulerBackend together with YarnClusterScheduler is used for task scheduling; YarnClusterScheduler is just a simple wrapper around TaskSchedulerImpl that adds the logic of waiting for Executors;

5. The SparkContext in the ApplicationMaster assigns tasks to the CoarseGrainedExecutorBackends to execute. CoarseGrainedExecutorBackend runs the tasks and reports their status and progress to the ApplicationMaster, so that the ApplicationMaster can keep track of how each task is running and restart a task if it fails;

6. After the application has finished running, the ApplicationMaster asks the ResourceManager to unregister it and shuts itself down.

2.2.4 The difference between Yarn-Client and Yarn-Cluster

Before going into the deep difference between Yarn-Client and Yarn-Cluster, one concept should be clear: the ApplicationMaster. In YARN, each application instance has an ApplicationMaster process, which runs in the first Container the application starts. It is responsible for dealing with the ResourceManager to request resources and, after obtaining them, telling the NodeManagers to start Containers for it. The difference between the Yarn-Cluster and Yarn-Client modes is, at a deeper level, the difference in the ApplicationMaster process.

In Yarn-Cluster mode, the Driver runs in the AM (ApplicationMaster), which is responsible for requesting resources from YARN and supervising the running of the job. After the user submits the job, the client can be shut down and the job continues running on YARN, so Yarn-Cluster mode is not suitable for interactive jobs;

In Yarn-Client mode, the ApplicationMaster only asks YARN for Executors; the Client communicates with the requested Containers to schedule their work, which means the Client cannot leave.

3. Demos of Spark running on different clusters

The Hadoop and Spark clusters need to be started for the following demos; Hadoop needs to start HDFS and YARN. The startup process can be found in Part 3, "Spark Programming Model (Part 1): Concepts and Shell Tests".

3.1 Standalone running process demo

In the nodes of the Spark cluster, 40% of the memory is used for computation and 60% of the memory is used to cache results. To get an intuitive feel for the speed difference between in-memory and non-in-memory runs, the demo uses the 1 GB data file SogouQ3.txt (its upload is described in Part 3, "Spark Programming Model (Part 1): Concepts and Shell Tests", section 3.2); the gap is shown by comparing the two runs.

3.1.1 View the test file's storage locations

Use the HDFS command to see which nodes store the SogouQ3.txt data:

$ cd /app/hadoop/hadoop-2.2.0/bin
$ hdfs fsck /sogou/SogouQ3.txt -files -blocks -locations

You can see that the file is split into 9 blocks across the cluster.

3.1.2 Start spark-shell

Start spark-shell with the following commands; in this demo, 1 GB of memory is allocated per Executor:

$ cd /app/hadoop/spark-1.1.0/bin
$ ./spark-shell --master spark://hadoop1:7077 --executor-memory 1g

Looking at the Executors page in the Spark monitoring interface, you can observe 1 Driver and 3 Executors: hadoop2 and hadoop3 each start one Executor, and hadoop1 starts one Executor plus the Driver. In this mode the Driver runs the SparkContext, that is, the DAGScheduler and TaskScheduler run on that node, carrying out the allocation and management of stages and tasks.

3.1.3 Running process and result analysis

Step 1: Read the file and count the number of records in the data set, caching the data set with the cache() method during the computation:

val sogou = sc.textFile("hdfs://hadoop1:9000/sogou/SogouQ3.txt")
sogou.cache()
sogou.count()

Through the page monitoring you can see that the job is divided into 8 tasks. One of the tasks reads data from two data blocks, and the other tasks each correspond to one data block; that is, 7 tasks obtain their data node-locally (NODE_LOCAL) and one task obtains its data from any location (ANY).

In the Storage monitoring interface, you can see that the cached data is split into 3 parts, its size is 907.1 MB, and the cache rate is 38%.

The result of the run is a data set of 10 million records, and the count took 352.17 seconds.

Step 2: Count the data set again; this time the computation uses the cached data, for comparison:

sogou.count()

Page monitoring shows that the job is still divided into 8 tasks, of which 3 tasks read their data from memory (PROCESS_LOCAL), 3 tasks read from the local node (NODE_LOCAL), and the other 2 tasks read from any location (ANY). The time spent per task, ordered by locality, is ANY > NODE_LOCAL > PROCESS_LOCAL, which shows that using in-memory data is roughly 2 orders of magnitude faster than using local-node or any-location data.

The entire job ran in 34.14 seconds, an order of magnitude faster than without the cache. Since the data in this example is only partially cached (cache rate 38%), the speed could be improved further with a full cache. From this experience, Spark is very memory-hungry, but it is also fast enough and sharp enough.
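
To reproduce this comparison yourself, a small hypothetical helper (not part of the original demo) can time both counts from spark-shell:

// Times a block of code and returns its result together with the elapsed seconds.
def timed[T](body: => T): (T, Double) = {
  val start = System.nanoTime()
  val result = body
  (result, (System.nanoTime() - start) / 1e9)
}

val (n1, t1) = timed { sogou.count() }  // first pass: read from HDFS and populate the cache
val (n2, t2) = timed { sogou.count() }  // second pass: served (partly) from the in-memory cache
println(s"first: $t1 s, second: $t2 s")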

3.2 Yarn-Client running process demo

3.2.1 Start spark-shell

Start spark-shell with the following commands; in this demo 3 Executors are allocated, each with 1 GB of memory:

$ cd /app/hadoop/spark-1.1.0/bin
$ ./spark-shell --master yarn-client --num-executors 3 --executor-memory 1g

Step 1: The relevant runtime JAR packages are uploaded to HDFS.

In the HDFS browsing interface you can see the application number under /user/hadoop/.sparkStaging/; looking inside it shows these files:

Step 2: Start the ApplicationMaster and register the Executors.

The application applies to the ResourceManager to start the ApplicationMaster. After it starts, Containers are allocated and the information is fed back to the SparkContext; the SparkContext then communicates with the relevant NodeManagers and starts Executors in the Containers. You can see that Executors were started on hadoop1, hadoop2, and hadoop3 respectively.

Step 3: Check the startup result.

In Yarn-Client mode, the Driver runs locally on the client, which allows the Spark Application to interact with the client. Because the Driver is on the client, its WebUI can be accessed through the Driver, by default at http://hadoop1:4040, while the YARN UI is accessed at http://hadoop1:8088.

3.2.2 Running process and result analysis

Step 1: Read the file and count the number of records in the data set, caching the data set with the cache() method during the computation:

val sogou = sc.textFile("hdfs://hadoop1:9000/sogou/SogouQ3.txt")
sogou.cache()
sogou.count()

Through the page monitoring you can see that the job is divided into 8 tasks. One of the tasks reads data from two data blocks, and the other tasks each correspond to one data block; that is, 7 tasks obtain their data node-locally (NODE_LOCAL) and one task obtains its data from the same rack (RACK_LOCAL).

From the running log you can observe that, when all tasks finish, the YARN-side job is wrapped up by YarnClientScheduler: the resources are reclaimed and the SparkContext is closed. The whole process took 108.6 seconds.

Step 2: Check the data cache situation.

From the monitoring interface you can see that, just as in Standalone mode, 38% of the data is already cached in memory.

Step three
