## Spark Worker working mechanism ##
[Sleeping Water-hzjs.2016.08.22]
First, process startup on the Worker
1. The startup flow of the Driver and Executor processes
Second, how the Worker starts the Driver
1. In cluster deploy mode, if the Driver fails and supervise is true, the Worker that launched the Driver is responsible for restarting it;
2. DriverRunner starts the Driver process through a ProcessBuilder and then blocks on process.get.waitFor until the process exits.
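
The launch-wait-restart behavior described above can be modeled in a few lines. This is a minimal Python sketch, not Spark's Scala code: `run_supervised`, its `max_attempts` cap, and the `failing_driver` command are hypothetical names introduced only so the example terminates, and `subprocess.Popen(...).wait()` stands in for the ProcessBuilder start followed by waitFor.

```python
import subprocess
import sys


def run_supervised(command, supervise, max_attempts=3):
    """Launch `command`, wait for it, and relaunch on failure if `supervise` is set.

    Returns the list of exit codes, one per launch. `max_attempts` is an
    assumption of this sketch so that the loop always ends.
    """
    exit_codes = []
    for _ in range(max_attempts):
        # Start the child process and block until it exits: the Python
        # counterpart of ProcessBuilder.start() followed by waitFor().
        process = subprocess.Popen(command)
        exit_code = process.wait()
        exit_codes.append(exit_code)
        if exit_code == 0 or not supervise:
            break  # clean exit, or no restart was requested
    return exit_codes


# Hypothetical stand-in for a driver that always fails with exit code 1.
failing_driver = [sys.executable, "-c", "import sys; sys.exit(1)"]

print(run_supervised(failing_driver, supervise=False))  # launched once
print(run_supervised(failing_driver, supervise=True))   # relaunched until the cap
```

The point of the sketch is that the launcher owns the child process: because it blocks on the exit code, it is the natural place to decide whether to restart.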
Third, how the Spark Executor works
1. Pay special attention to this: when CoarseGrainedExecutorBackend starts and registers an Executor with the Driver, what it registers is in essence the ExecutorBackend instance, which has no direct relationship with the Executor instance;
2. CoarseGrainedExecutorBackend is the name of the process in which the Executor runs; the Executor is the object that actually processes Tasks, and internally it uses a thread pool to carry out the Task computation;
3. CoarseGrainedExecutorBackend and Executor correspond one to one;
4. CoarseGrainedExecutorBackend is a message-loop endpoint (it implements ThreadSafeRpcEndpoint); it can send messages to the Driver and receive instructions from the Driver, such as launching a Task.
5. The Driver process contains two critical endpoints:
A. ClientEndpoint: mainly responsible for registering the current application with the Master; it is an internal member of AppClient;
B. DriverEndpoint: the driver of the whole application at run time; it is an internal member of CoarseGrainedSchedulerBackend;
6. In the Driver, the information of each registering ExecutorBackend is wrapped in an ExecutorData object and stored in the Driver's in-memory data structure executorDataMap;
7. At run time, DriverEndpoint writes this information into executorDataMap, which is an in-memory data structure of CoarseGrainedSchedulerBackend, so the registration ultimately lands in CoarseGrainedSchedulerBackend. In other words, CoarseGrainedSchedulerBackend holds every ExecutorBackend process that the Master allocated to the current application. Within each ExecutorBackend instance, an Executor object is responsible for running Tasks. The synchronized keyword is used to keep concurrent writes to executorDataMap safe.
context.reply(RegisteredExecutor(executorAddress.host)) sends a reply to CoarseGrainedExecutorBackend, which handles it as follows:
8. When CoarseGrainedExecutorBackend receives the RegisteredExecutor message sent by DriverEndpoint, it creates the Executor instance, and that Executor instance is responsible for the actual Task computation;
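
The registration handshake in items 6 to 8 can be illustrated with a small model. This is a Python sketch under stated assumptions, not Spark's implementation: the classes only borrow the names used in the text, messages are plain tuples, a `threading.Lock` plays the role of `synchronized`, and the duplicate-ID rejection is a simplification added for illustration.

```python
import threading
from dataclasses import dataclass


@dataclass
class ExecutorData:
    """What the driver remembers about one registered backend."""
    host: str
    cores: int


class Executor:
    """Placeholder for the object that will later run tasks."""
    def __init__(self, executor_id, host):
        self.executor_id = executor_id
        self.host = host


class DriverEndpoint:
    def __init__(self):
        self.executor_data_map = {}
        self._lock = threading.Lock()  # stands in for `synchronized`

    def register_executor(self, executor_id, host, cores):
        """Record the backend and return the reply message for it."""
        with self._lock:
            if executor_id in self.executor_data_map:
                return ("RegisterExecutorFailed", executor_id)
            self.executor_data_map[executor_id] = ExecutorData(host, cores)
        return ("RegisteredExecutor", host)


class ExecutorBackend:
    def __init__(self, executor_id, host, cores):
        self.executor_id, self.host, self.cores = executor_id, host, cores
        self.executor = None  # no Executor exists before registration succeeds

    def start(self, driver):
        reply = driver.register_executor(self.executor_id, self.host, self.cores)
        self.receive(reply)

    def receive(self, message):
        kind, payload = message
        if kind == "RegisteredExecutor":
            # Only now is the Executor object created.
            self.executor = Executor(self.executor_id, payload)


driver = DriverEndpoint()
backends = [ExecutorBackend(f"exec-{i}", "host-a", 2) for i in range(8)]
threads = [threading.Thread(target=b.start, args=(driver,)) for b in backends]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(driver.executor_data_map))                   # 8
print(all(b.executor is not None for b in backends))   # True
```

Note the ordering the sketch makes visible: the backend is what registers, and the Executor comes into existence only after the driver's reply arrives.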
Fourth, how the Executor works in detail
1. When the Driver sends a Task, it actually sends it to the CoarseGrainedExecutorBackend RpcEndpoint rather than directly to the Executor (the Executor is not a message-loop endpoint, so it can never receive remote messages directly);
2. When ExecutorBackend receives the message from the Driver, it calls launchTask to hand the Task to the Executor for execution;
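
The division of labor between the two objects can be sketched as follows. This is a simplified Python model, assuming tuple-shaped messages and a `launch_task` method named after the launchTask call in the text; the real classes carry serialized task descriptions and far more state.

```python
from concurrent.futures import ThreadPoolExecutor


class Executor:
    """Runs tasks on a thread pool; it never sees RPC messages itself."""
    def __init__(self, max_workers=4):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def launch_task(self, task_id, func, *args):
        # Submit the task to the pool and return a future for its result.
        # task_id is kept only to mirror the call shape; the pool ignores it.
        return self._pool.submit(func, *args)

    def shutdown(self):
        self._pool.shutdown(wait=True)


class ExecutorBackend:
    """The endpoint the driver talks to; it forwards work to its Executor."""
    def __init__(self, executor):
        self.executor = executor
        self.running = {}  # task_id -> future

    def receive(self, message):
        kind, task_id, func, args = message
        if kind == "LaunchTask":
            self.running[task_id] = self.executor.launch_task(task_id, func, *args)
        else:
            raise ValueError(f"unexpected message: {kind}")


executor = Executor(max_workers=2)
backend = ExecutorBackend(executor)

# The "driver" sends messages to the backend, never to the executor.
for task_id, chunk in enumerate([[1, 2, 3], [4, 5], [6]]):
    backend.receive(("LaunchTask", task_id, sum, (chunk,)))

results = {tid: fut.result() for tid, fut in backend.running.items()}
executor.shutdown()
print(results)  # {0: 6, 1: 9, 2: 6}
```

Keeping message handling in the backend and computation in the executor means the thread pool never has to know anything about the transport.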