Spark Runtime Architecture

Source: Internet
Author: User

1. Build the runtime environment of the Spark application: create a SparkContext in the driver program (the program that contains the SparkContext is called the driver program). A Spark application runs as a group of independent executor processes on the cluster, coordinated by the SparkContext.

2. The SparkContext applies to the resource manager for the resources needed to run executors and starts StandaloneExecutorBackend; the executors then apply to the SparkContext for tasks. The SparkContext can connect to different cluster managers (standalone, YARN, and Mesos), and the cluster manager allocates resources for the executors that run the application. Once the connection is established, each Spark application obtains executors (processes) on the nodes of the cluster. Each application has its own independent executor processes. An executor is the worker process that actually runs on a worker node; it computes on or stores data for the application.

3. After the SparkContext obtains the executors, the application code is sent to each executor.

4. The SparkContext builds the RDD DAG, breaks it down into a DAG of stages, and submits the stages to the TaskScheduler, which sends the tasks to the executors to run.

5. Tasks run on the executors, and all resources are released once they finish.

Features of the Spark running architecture:

1. Each application obtains its own exclusive executor processes, which stay up for the lifetime of the application and run tasks in multiple threads. This isolates applications from one another both in scheduling (each driver schedules its own tasks) and in execution (tasks from different applications run in different JVMs). It also means that Spark applications cannot share data with each other unless they go through an external storage system, for example Tachyon or SharkServer.

2. Spark does not care which cluster manager runs at the underlying layer. It only needs to be able to obtain executors and keep communicating with them, because the tasks ultimately run on the executors.

3. Keep the driver program as close as possible to the worker nodes (the nodes that run executors), preferably in the same rack, because the SparkContext and the executors exchange a large amount of information while the application runs. To run against a remote cluster, it is better to submit the application to the cluster over RPC than to run the driver far away from the workers.

4. Tasks are optimized through data locality and speculative execution. For details, see
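The stage breakdown in step 4 and the multi-threaded executor in feature 1 can be illustrated with a small toy model. This is only a sketch in plain Python and does not use Spark itself: the names Op, split_into_stages, run_task and run_job are hypothetical, the lineage is assumed to be linear, and the rule "every shuffle operation starts a new stage" is a simplification of what Spark's scheduler does.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One transformation in a linear RDD lineage (hypothetical toy type)."""
    name: str
    wide: bool = False  # True when the op needs a shuffle, e.g. reduceByKey


def split_into_stages(lineage):
    """Cut the lineage into stages: every wide op starts a new stage."""
    stages, current = [], []
    for op in lineage:
        if op.wide and current:
            stages.append(current)
            current = []
        current.append(op)
    if current:
        stages.append(current)
    return stages


def run_task(stage_id, partition, ops):
    """What one executor thread does: run a stage's ops on one partition."""
    pipeline = "->".join(op.name for op in ops)
    return f"stage {stage_id} / partition {partition}: {pipeline}"


def run_job(lineage, num_partitions, executor_threads=2):
    """Toy scheduler: run stages in order, one task per partition."""
    results = []
    # The thread pool stands in for one executor process running tasks
    # in multiple threads.
    with ThreadPoolExecutor(max_workers=executor_threads) as executor:
        for stage_id, ops in enumerate(split_into_stages(lineage)):
            futures = [executor.submit(run_task, stage_id, p, ops)
                       for p in range(num_partitions)]
            # Wait for the whole stage before starting the next one.
            results.extend(f.result() for f in futures)
    return results


# A word-count style lineage; only reduceByKey needs a shuffle.
lineage = [Op("textFile"), Op("flatMap"), Op("map"),
           Op("reduceByKey", wide=True), Op("map")]
stage_names = [[op.name for op in stage]
               for stage in split_into_stages(lineage)]
log = run_job(lineage, num_partitions=2)
for line in log:
    print(line)
```

In this model the five operations collapse into two stages, and each stage is run as one task per partition. In a real cluster the tasks would be shipped to executors on worker nodes rather than to local threads.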
