I. Cluster Startup Process: Start the Master
$SPARK_HOME/sbin/start-master.sh
Key content of the start-master.sh script:
spark-daemon.sh start org.apache.spark.deploy.master.Master 1 --ip $SPARK_MASTER_IP --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT
Log location: $SPARK_HOME/logs/
14/07/22 13:41:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:7077]
14/07/22 13:41:33 INFO master.Master: Starting Spark master at spark://hadoop000:7077
14/07/22 13:41:33 INFO server.Server: jetty-8.y.z-SNAPSHOT
14/07/22 13:41:33 INFO server.AbstractConnector: Started [email protected]0.0.0.0:8080
14/07/22 13:41:33 INFO ui.MasterWebUI: Started MasterWebUI at http://hadoop000:8080
14/07/22 13:41:33 INFO master.Master: I have been elected leader! New state: ALIVE
II. Cluster Startup Process: Start the Workers
$SPARK_HOME/sbin/start-slaves.sh
Key content of the start-slaves.sh script:
spark-daemon.sh start org.apache.spark.deploy.worker.Worker master-spark-URL
When a worker starts, it must be given the URL of the master to register with, which here is spark://hadoop000:7077.
After the worker starts, it mainly does two things:
1) Registers itself with the master (RegisterWorker);
2) Sends heartbeat messages to the master periodically.
The worker sends its registration message to the master:
Worker.scala ==>preStart ==>registerWithMaster ==>tryRegisterAllMasters ==> actor ! RegisterWorker(workerId, host, port, cores, memory, webUi.boundPort, publicAddress)
The master receives the RegisterWorker message:
Master.scala ==>
case RegisterWorker(id, workerHost, workerPort, cores, memory, workerUiPort, publicAddress) => {
  val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory, sender, workerUiPort, publicAddress)
  if (registerWorker(worker)) {
    persistenceEngine.addWorker(worker)
    sender ! RegisteredWorker(masterUrl, masterWebUiUrl) // after successful registration, reply to the worker
    schedule()
  }
}
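The registration exchange can be illustrated without actors. The following is only a minimal sketch, not the Master code: RegistrationTable and WorkerRecord are hypothetical names, the message fields are trimmed, and the duplicate-ID rejection (RegisterWorkerFailed) is an assumption that does not appear in the excerpt above.

```scala
import scala.collection.mutable

// Replies modelled on the message names in the trace; fields are trimmed for illustration.
sealed trait RegistrationReply
case class RegisteredWorker(masterUrl: String) extends RegistrationReply
// Assumption: a failure reply for duplicate IDs, not shown in the excerpt above.
case class RegisterWorkerFailed(reason: String) extends RegistrationReply

// Hypothetical, simplified stand-in for the per-worker record kept by the master.
case class WorkerRecord(id: String, host: String, port: Int, cores: Int, memoryMb: Int)

// Hypothetical stand-in for the master's bookkeeping; no actors or persistence involved.
class RegistrationTable(masterUrl: String) {
  private val idToWorker = mutable.HashMap.empty[String, WorkerRecord]

  // Accept a worker once; reject a second registration under the same ID.
  def register(w: WorkerRecord): RegistrationReply =
    if (idToWorker.contains(w.id)) {
      RegisterWorkerFailed("Duplicate worker ID")
    } else {
      idToWorker(w.id) = w
      RegisteredWorker(masterUrl)
    }

  def workerCount: Int = idToWorker.size
}
```

In the real flow the reply travels back to the worker as an actor message; here it is simply the return value of register.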
After receiving the successful registration information from the master, the worker periodically sends heartbeat information to the master.
Worker.scala ==>
case SendHeartbeat =>
  masterLock.synchronized {
    if (connected) { master ! Heartbeat(workerId) }
  }
The master updates the last heartbeat time after receiving the heartbeat information sent by the worker.
Master.scala ==>
case Heartbeat(workerId) => {
  idToWorker.get(workerId) match {
    case Some(workerInfo) => workerInfo.lastHeartbeat = System.currentTimeMillis()
  }
}
The master periodically checks for workers that have not sent a heartbeat within the timeout period and removes them.
Master.scala ==> preStart ==> CheckForWorkerTimeOut ==>
case CheckForWorkerTimeOut => {
  timeOutDeadWorkers() // Check for, and remove, any timed-out workers
}
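The heartbeat bookkeeping described above can be sketched as a small actor-free class. This is an illustration under assumptions: HeartbeatTracker is a hypothetical name, the clock is passed in explicitly so the behaviour is deterministic, and the timeout value is whatever the caller chooses rather than Spark's configured one.

```scala
import scala.collection.mutable

// Hypothetical, actor-free model of the master's heartbeat bookkeeping.
class HeartbeatTracker(timeoutMs: Long) {
  // workerId -> time of the last heartbeat, in milliseconds
  private val lastHeartbeat = mutable.HashMap.empty[String, Long]

  // Called on registration and on every Heartbeat(workerId).
  def heartbeat(workerId: String, nowMs: Long): Unit =
    lastHeartbeat(workerId) = nowMs

  // Called periodically, in the spirit of CheckForWorkerTimeOut.
  // Removes workers whose last heartbeat is older than the timeout and returns their IDs.
  def timeOutDeadWorkers(nowMs: Long): Seq[String] = {
    val dead = lastHeartbeat.collect {
      case (id, last) if last < nowMs - timeoutMs => id
    }.toSeq.sorted
    dead.foreach(id => lastHeartbeat.remove(id))
    dead
  }

  def liveWorkers: Set[String] = lastHeartbeat.keySet.toSet
}
```

A worker that keeps calling heartbeat stays in liveWorkers; one that goes silent is dropped by the next timeOutDeadWorkers call after the timeout has elapsed.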
Log location: $SPARK_HOME/logs/
Some master log information:
14/07/22 13:41:36 INFO master.Master: Registering worker hadoop000:48343 with 1 cores, 2.0 GB RAM
Some worker log information:
14/07/22 13:41:35 INFO Worker: Starting Spark worker hadoop000:48343 with 1 cores, 2.0 GB RAM
14/07/22 13:41:35 INFO Worker: Spark home: /home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0
14/07/22 13:41:35 INFO WorkerWebUI: Started WorkerWebUI at http://hadoop000:8081
14/07/22 13:41:35 INFO Worker: Connecting to master spark://hadoop000:7077...
14/07/22 13:41:36 INFO Worker: Successfully registered with master spark://hadoop000:7077
III. Application Submission Process
A. Submit the application
Run spark-shell: $SPARK_HOME/bin/spark-shell --master spark://hadoop000:7077
Log location: $SPARK_HOME/work
spark-shell is itself an application. When SparkContext starts, its createTaskScheduler method creates a SparkDeploySchedulerBackend, which creates and starts an AppClient:
client = new AppClient(sc.env.actorSystem, masters, appDesc, this, conf)
client.start()
The AppClient then sends a RegisterApplication request to the master.
AppClient.scala ==>preStart ==>registerWithMaster ==>tryRegisterAllMasters ==>actor ! RegisterApplication(appDescription)
B. The master processes the RegisterApplication request
On the master side, the request is handled by the RegisterApplication branch. After receiving it, the master registers the application and calls schedule(). If any workers have registered, the master sends a LaunchExecutor command to the selected workers.
Master.scala ==>
case RegisterApplication(description) => {
  logInfo("Registering app " + description.name)
  val app = createApplication(description, sender)
  registerApplication(app)
  logInfo("Registered app " + description.name + " with ID " + app.id)
  persistenceEngine.addApplication(app)
  sender ! RegisteredApplication(app.id, masterUrl)
  schedule()
}
==> schedule ==> launchExecutor(worker, exec) ==>
worker.addExecutor(exec)
worker.actor ! LaunchExecutor(masterUrl, exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory)
exec.application.driver ! ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory)
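The excerpt does not show how schedule() picks workers, so the following is only a simplified sketch of one plausible policy: spreading the requested cores round-robin over workers that have enough free memory. WorkerSlot, ScheduleSketch and assignCores are hypothetical names, and the policy itself is an assumption rather than a copy of the Master code.

```scala
// Hypothetical view of a worker's free resources.
case class WorkerSlot(id: String, coresFree: Int, memoryFreeMb: Int)

object ScheduleSketch {
  // Returns workerId -> number of cores to give the new executor on that worker.
  // Assumption: cores are handed out one at a time, cycling over usable workers.
  def assignCores(workers: Seq[WorkerSlot],
                  coresWanted: Int,
                  memoryPerExecutorMb: Int): Map[String, Int] = {
    // A worker is usable if it can hold one executor's memory and has a free core.
    val usable = workers
      .filter(w => w.memoryFreeMb >= memoryPerExecutorMb && w.coresFree > 0)
      .sortBy(w => -w.coresFree)
    val assigned = Array.fill(usable.length)(0)
    // Never try to assign more cores than the usable workers have in total.
    var toAssign = math.min(coresWanted, usable.map(_.coresFree).sum)
    var pos = 0
    while (toAssign > 0) {
      if (usable(pos).coresFree - assigned(pos) > 0) {
        assigned(pos) += 1
        toAssign -= 1
      }
      pos = (pos + 1) % usable.length
    }
    usable.zip(assigned).collect { case (w, n) if n > 0 => w.id -> n }.toMap
  }
}
```

Each entry in the result corresponds to one launchExecutor(worker, exec) call in the trace above, which is where the LaunchExecutor message would be sent.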
C. Start the executor
After the worker receives the LaunchExecutor command, it starts the executor process.
Worker.scala ==>
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))
  val manager = new ExecutorRunner(appId, execId, appDesc, cores_, memory_, self, workerId, host,
    appDesc.sparkHome.map(userSparkHome => new File(userSparkHome)).getOrElse(sparkHome),
    workDir, akkaUrl, ExecutorState.RUNNING)
  executors(appId + "/" + execId) = manager
  manager.start()
  coresUsed += cores_
  memoryUsed += memory_
  masterLock.synchronized {
    master ! ExecutorStateChanged(appId, execId, manager.state, None, None)
  }
D. Register the executor
Using the parameters passed to it at startup, the newly started executor process registers itself with the scheduler backend in the driver.
SparkDeploySchedulerBackend.scala ==> preStart (CoarseGrainedSchedulerBackend) ==>
case RegisterExecutor(executorId, hostPort, cores) =>
  logInfo("Registered executor: " + sender + " with ID " + executorId)
  sender ! RegisteredExecutor(sparkProperties)
  executorActor(executorId) = sender
  executorHost(executorId) = Utils.parseHostPort(hostPort)._1
  totalCores(executorId) = cores
  freeCores(executorId) = cores
  executorAddress(executorId) = sender.path.address
  addressToExecutorId(sender.path.address) = executorId
  totalCoreCount.addAndGet(cores)
  makeOffers()
On the executor side, the reply is handled in CoarseGrainedExecutorBackend.scala:
case RegisteredExecutor(sparkProperties) =>
  logInfo("Successfully registered with driver")
  executor = new Executor(executorId, Utils.parseHostPort(hostPort)._1, sparkProperties, false)
Executor log location: the console and $SPARK_HOME/logs
E. Run the task
Sample code:
sc.textFile("hdfs://hadoop000:8020/hello.txt").flatMap(_.split('\t')).map((_,1)).reduceByKey(_+_).collect
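To see what this pipeline computes without a cluster, the same word count can be written over an ordinary Scala collection. This is only a local analogue: WordCountSketch is a hypothetical name, groupBy followed by a sum stands in for reduceByKey, and any input lines are made up because the contents of hello.txt are not shown.

```scala
object WordCountSketch {
  // Local analogue of flatMap(_.split('\t')).map((_, 1)).reduceByKey(_ + _).
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split('\t'))   // one element per tab-separated word
      .map((_, 1))              // pair each word with a count of 1
      .groupBy(_._1)            // gather the pairs for each word
      .map { case (word, pairs) => word -> pairs.map(_._2).sum }
}
```

For example, WordCountSketch.wordCount(Seq("hello\tworld", "hello\tspark")) counts "hello" twice and the other two words once. On the cluster, the per-word summing is what gets split into tasks and run on the executors.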
After the scheduler backend receives an executor's registration message, it offers that executor's free cores to the scheduler. The submitted Spark job is split into individual tasks, and the tasks are sent to the executors for execution through LaunchTask messages.
CoarseGrainedSchedulerBackend.scala ==>
def makeOffers() {
  launchTasks(scheduler.resourceOffers(
    executorHost.toArray.map { case (id, host) => new WorkerOffer(id, host, freeCores(id)) }))
}
==> executorActor(task.executorId) ! LaunchTask(new SerializableBuffer(serializedTask))
CoarseGrainedExecutorBackend.scala ==>
case LaunchTask(data) =>
  if (executor == null) {
    logError("Received LaunchTask command but executor was null")
    System.exit(1)
  } else {
    val ser = SparkEnv.get.closureSerializer.newInstance()
    val taskDesc = ser.deserialize[TaskDescription](data.value)
    logInfo("Got assigned task " + taskDesc.taskId)
    executor.launchTask(this, taskDesc.taskId, taskDesc.serializedTask)
  }
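The offer-based hand-off can be sketched as follows. This is a deliberately simplified model, not the scheduler's logic: Offer, Launch and OfferSketch are hypothetical names, every task is assumed to need exactly one core, and locality preferences are ignored.

```scala
import scala.collection.mutable

// Hypothetical counterpart of a worker offer: an executor and its free cores.
case class Offer(executorId: String, host: String, cores: Int)
// Hypothetical record of "send this task to that executor".
case class Launch(taskId: Long, executorId: String)

object OfferSketch {
  // Hands out pending tasks one core at a time, cycling over the offers
  // until either the tasks or the free cores run out.
  def resourceOffers(offers: Seq[Offer], pendingTaskIds: Seq[Long]): Seq[Launch] = {
    val free = mutable.LinkedHashMap(offers.map(o => o.executorId -> o.cores): _*)
    val pending = mutable.Queue(pendingTaskIds: _*)
    val launched = mutable.ArrayBuffer.empty[Launch]
    var progressed = true
    while (pending.nonEmpty && progressed) {
      progressed = false
      for (id <- free.keys.toList if pending.nonEmpty && free(id) > 0) {
        launched += Launch(pending.dequeue(), id)
        free(id) -= 1 // the core stays busy until the task finishes
        progressed = true
      }
    }
    launched.toList
  }
}
```

Tasks that do not fit stay pending; in this model they would be placed by a later call once cores have been freed.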
Some master log information:
14/07/22 15:25:27 INFO master.Master: Registering app Spark shell
14/07/22 15:25:27 INFO master.Master: Registered app Spark shell with ID app-20140722152527-0001
14/07/22 15:25:27 INFO master.Master: Launching executor app-20140722152527-0001/0 on worker worker-20140722134135-hadoop000-48343
Some worker log information:
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/07/22 15:25:27 INFO Worker: Asked to launch executor app-20140722152527-0001/0 for Spark shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
14/07/22 15:25:28 INFO ExecutorRunner: Launch command: "java" "-cp" "::/home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0/conf:/home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0/lib/spark-assembly-1.0.1-hadoop2.3.0-cdh5.0.0.jar:/home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0/lib/datanucleus-rdbms-3.2.1.jar:/home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0/lib/datanucleus-core-3.2.2.jar:/home/spark/app/spark-1.0.1-bin-2.3.0-cdh5.0.0/lib/datanucleus-api-jdo-3.2.1.jar" "-XX:MaxPermSize=128m" "-Xms1024M" "-Xmx1024M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://[email protected]:50515/user/CoarseGrainedScheduler" "0" "hadoop000" "1" "akka.tcp://[email protected]:48343/user/Worker" "app-20140722152527-0001"
Some log information in the console:
14/07/22 15:25:31 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:45150/user/Executor#-791712793] with ID 0
14/07/22 15:25:31 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
Every time a new application registers with the master, the master calls schedule() to assign the application to workers. Each selected worker starts an executor backend, and the application's tasks ultimately run inside those executor backends.