1. Driver: Runs the main() function of the application and creates the SparkContext (a minimal driver sketch follows this list).
2. Client: The client through which users submit jobs to the cluster.
3. Worker: Any node in the cluster that can run application code; it hosts one or more Executor processes.
4. Executor: A task-execution process running on a Worker. The Executor starts a thread pool to run tasks and is responsible for keeping data in memory or on disk. Each application requests its own Executors.
5. SparkContext: The context of the entire application; it controls the application's life cycle.
6. RDD: Spark's basic unit of computation; a group of RDDs forms a directed acyclic graph (the RDD Graph) that describes the execution.
7. DAGScheduler: Builds a stage-based DAG for the job and submits each stage to the TaskScheduler.
8. TaskScheduler: Distributes tasks to Executors for execution.
9. SparkEnv: A thread-level context that stores references to important runtime components.
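The relationships among these components are easiest to see in a small driver program. The following Scala sketch (the object name and input data are illustrative, and it assumes a local[*] master for testing rather than a real cluster) shows the Driver creating a SparkContext, transformations building up the RDD Graph, and an action handing the job to the DAGScheduler and TaskScheduler:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// A minimal driver program: main() creates the SparkContext,
// which controls the application's life cycle.
object WordCountDriver {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Executor threads inside this JVM; on a cluster the
    // master URL would point at the cluster manager instead.
    val conf = new SparkConf().setAppName("WordCountDriver").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Each transformation adds a node to the RDD lineage (the RDD Graph).
    // reduceByKey introduces a shuffle, so the DAGScheduler splits the
    // job into two stages at that boundary.
    val counts = sc.parallelize(Seq("a b", "b c", "a c"))
      .flatMap(_.split(" "))   // narrow dependency: stays in the same stage
      .map(word => (word, 1))  // narrow dependency: stays in the same stage
      .reduceByKey(_ + _)      // wide dependency: stage boundary

    // collect() is an action: it triggers job submission, after which the
    // TaskScheduler distributes the tasks of each stage to Executors.
    counts.collect().foreach(println)

    sc.stop()  // ends the application's life cycle
  }
}
```

Note that nothing is computed until the action (collect) runs: the transformations only describe the directed acyclic graph, which is why the DAGScheduler can plan all stages before any task is distributed.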
Apache Spark Architecture