Application:
An application is a Spark user program that creates a SparkContext instance; it contains the driver program.
For example, spark-shell is an application: when it starts, it creates a SparkContext object named sc.
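A minimal sketch of what spark-shell does at startup, assuming a local master URL for illustration (a real deployment would pass its own cluster URL). This requires a Spark installation to run:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MyApp {
  def main(args: Array[String]): Unit = {
    // Build a configuration and create the SparkContext, the same object
    // spark-shell exposes under the name sc. "local[*]" is an assumption
    // for local testing, not the only possible master.
    val conf = new SparkConf().setAppName("MyApp").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // ... application code builds RDDs through sc ...
    sc.stop()
  }
}
```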
Job:
A job corresponds to a Spark action: each action, such as count or saveAsTextFile, submits a job instance that contains many tasks computed in parallel.
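The action/job correspondence can be sketched as below, assuming an existing SparkContext named sc; the output path is hypothetical. Transformations alone submit no job, while each action submits one:

```scala
val nums = sc.parallelize(1 to 100)     // no job: parallelize is lazy
val doubled = nums.map(_ * 2)           // no job: map is a transformation
val n = doubled.count()                 // job 1: count is an action
doubled.saveAsTextFile("/tmp/doubled")  // job 2: saveAsTextFile is an action
```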
Driver Program:
The program that runs the main function and creates the SparkContext instance.
Cluster Manager:
An external service for cluster resource management. Spark currently supports three cluster resource managers: Standalone, YARN, and Mesos. Spark's own Standalone mode meets the resource-management needs of most Spark computing environments; YARN and Mesos are generally only considered when several computing frameworks share one cluster.
Worker Node:
A node in the cluster that can run application code, equivalent to a slave node in Hadoop.
Executor:
A process started for an application on a worker node. Tasks are assigned to run inside this process, which also stores data in memory or on disk. Note that each application has only one executor on a given worker node, and the application's tasks run concurrently as multiple threads inside that executor.
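The executor sizing described above is controlled when the application is submitted. A hypothetical spark-submit invocation for Standalone mode (host, class, and jar names are placeholders):

```shell
# --executor-memory sizes the single executor each worker starts for this
# application; --total-executor-cores caps the cores it may use cluster-wide.
spark-submit \
  --master spark://master-host:7077 \
  --executor-memory 4g \
  --total-executor-cores 16 \
  --class com.example.MyApp \
  myapp.jar
```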
Task:
A unit of work sent by the driver to an executor. Typically one task processes one split of the data, and each split is usually the size of one HDFS block.
Stage:
A job is split into many tasks, and each group of tasks is called a stage, analogous to the map and reduce tasks in MapReduce. A stage usually begins by reading external data or shuffle data, and it ends either because a shuffle occurs (for example, a reduceByKey operation) or because the whole job ends (for example, by writing the data to a storage system such as HDFS).
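The stage boundary at a shuffle can be sketched as below, assuming an existing SparkContext sc; the HDFS paths are hypothetical and running this requires a Spark installation:

```scala
// Stage 1 starts by reading external data.
val words = sc.textFile("hdfs:///data/input.txt")
  .flatMap(_.split(" "))
  .map(w => (w, 1))
// reduceByKey requires a shuffle: stage 1 ends here, stage 2 begins
// by reading the shuffle data.
val counts = words.reduceByKey(_ + _)
// Stage 2 (and the job) ends by writing the result to HDFS.
counts.saveAsTextFile("hdfs:///data/output")
```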
Spark Kernel Secrets - 01 - Spark Kernel Core Terminology Explained