Source analysis for Spark's master and worker cluster startup


This analysis is based on the Spark 1.3.1 source code.

Spark Master Startup Source Analysis

1. start-master.sh invokes the main method of the Master object:

def main(argStrings: Array[String]) {
  SignalLogger.register(log)
  val conf = new SparkConf
  val args = new MasterArguments(argStrings, conf)
  // start the actor system and the Master actor
  val (actorSystem, _, _, _) = startSystemAndActor(args.host, args.port, args.webUiPort, conf)
  actorSystem.awaitTermination()
}
2. startSystemAndActor starts the actor system and creates the Master actor:

def startSystemAndActor(
    host: String,
    port: Int,
    webUiPort: Int,
    conf: SparkConf): (ActorSystem, Int, Int, Option[Int]) = {
  val securityMgr = new SecurityManager(conf)
  val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port, conf = conf,
    securityManager = securityMgr)
  val actor = actorSystem.actorOf(
    Props(classOf[Master], host, boundPort, webUiPort, securityMgr, conf), actorName)
  val timeout = AkkaUtils.askTimeout(conf)
  val portsRequest = actor.ask(BoundPortsRequest)(timeout)
  val portsResponse = Await.result(portsRequest, timeout).asInstanceOf[BoundPortsResponse]
  (actorSystem, boundPort, portsResponse.webUIPort, portsResponse.restPort)
}
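The blocking Await.result call above is how the caller retrieves the ports the Master actor actually bound. As a minimal illustration of that pattern (a Promise stands in for the actor here; this is not from the Spark source):

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

// A Promise plays the role of the Master actor answering BoundPortsRequest.
val promise = Promise[Int]()
promise.success(8080) // the "actor" replies with its bound web UI port

// Block until the reply arrives, just like the Await.result call above.
val boundWebUiPort = Await.result(promise.future, 5.seconds)
```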
3. AkkaUtils.createActorSystem creates the ActorSystem:

def createActorSystem(
    name: String,
    host: String,
    port: Int,
    conf: SparkConf,
    securityManager: SecurityManager): (ActorSystem, Int) = {
  val startService: Int => (ActorSystem, Int) = { actualPort =>
    doCreateActorSystem(name, host, actualPort, conf, securityManager)
  }
  Utils.startServiceOnPort[ActorSystem](port, startService, conf, name)
}
4. Utils.startServiceOnPort tries to start the service on the given port and, once a port is successfully bound, doCreateActorSystem creates the ActorSystem.
5. After the ActorSystem is created, the Master actor is created.
6. The Master's primary constructor runs, and preStart() is executed.
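The retry behavior of Utils.startServiceOnPort can be sketched as follows. This is a simplified illustration, not Spark's implementation: the real method reads the retry limit from spark.port.maxRetries in the SparkConf and handles port 0 (ephemeral) as a special case.

```scala
// Illustrative sketch of port-retry logic: try startPort, and on a bind
// failure retry on successive ports up to maxRetries times.
def startServiceOnPort[T](
    startPort: Int,
    startService: Int => (T, Int),
    maxRetries: Int = 16): (T, Int) = {
  for (offset <- 0 to maxRetries) {
    val tryPort = if (startPort == 0) 0 else (startPort + offset) % 65536
    try {
      return startService(tryPort)
    } catch {
      case e: Exception if offset < maxRetries =>
        // bind failed; fall through and try the next port
    }
  }
  throw new RuntimeException(s"Failed to start service on port $startPort")
}
```

For example, if ports 7077 and 7078 are already taken, the service ends up bound on 7079.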




Spark Worker Startup Source Analysis

1. start-slaves.sh invokes the main method of the Worker object:

def main(argStrings: Array[String]) {
  SignalLogger.register(log)
  val conf = new SparkConf
  val args = new WorkerArguments(argStrings, conf)
  val (actorSystem, _) = startSystemAndActor(args.host, args.port, args.webUiPort, args.cores,
    args.memory, args.masters, args.workDir)
  actorSystem.awaitTermination()
}
2. startSystemAndActor starts the actor system and creates the Worker actor:

def startSystemAndActor(
    host: String,
    port: Int,
    webUiPort: Int,
    cores: Int,
    memory: Int,
    masterUrls: Array[String],
    workDir: String,
    workerNumber: Option[Int] = None,
    conf: SparkConf = new SparkConf): (ActorSystem, Int) = {

  // The LocalSparkCluster runs multiple local sparkWorkerX actor systems
  val systemName = "sparkWorker" + workerNumber.map(_.toString).getOrElse("")
  val actorName = "Worker"
  val securityMgr = new SecurityManager(conf)
  val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port,
    conf = conf, securityManager = securityMgr)
  val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl(_, AkkaUtils.protocol(actorSystem)))
  actorSystem.actorOf(Props(classOf[Worker], host, boundPort, webUiPort, cores, memory,
    masterAkkaUrls, systemName, actorName, workDir, conf, securityMgr), name = actorName)
  (actorSystem, boundPort)
}
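Master.toAkkaUrl, used above, rewrites a spark:// master URL into the Akka actor path of the Master actor. A minimal sketch of the idea (the real method in Spark 1.3.1 also validates the URL and raises a SparkException on a malformed one):

```scala
import java.net.URI

// Illustrative sketch: turn "spark://host:port" into the Master actor's Akka path.
// "sparkMaster" and "Master" match the system and actor names the Master object uses.
def toAkkaUrl(sparkUrl: String, protocol: String): String = {
  val uri = new URI(sparkUrl)
  s"$protocol://sparkMaster@${uri.getHost}:${uri.getPort}/user/Master"
}
```

For example, toAkkaUrl("spark://node1:7077", "akka.tcp") yields "akka.tcp://sparkMaster@node1:7077/user/Master".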
3. AkkaUtils.createActorSystem creates the ActorSystem (the same code path as on the master side):

def createActorSystem(
    name: String,
    host: String,
    port: Int,
    conf: SparkConf,
    securityManager: SecurityManager): (ActorSystem, Int) = {
  val startService: Int => (ActorSystem, Int) = { actualPort =>
    doCreateActorSystem(name, host, actualPort, conf, securityManager)
  }
  Utils.startServiceOnPort[ActorSystem](port, startService, conf, name)
}
4. After the ActorSystem is created, the Worker's primary constructor runs and the preStart method is executed:

override def preStart() {
  assert(!registered)
  logInfo("Starting Spark worker %s:%d with %d cores, %s RAM".format(
    host, port, cores, Utils.megabytesToString(memory)))
  logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}")
  logInfo("Spark home: " + sparkHome)
  createWorkDir()
  context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
  shuffleService.startIfEnabled()
  webUi = new WorkerWebUI(this, workDir, webUiPort)
  webUi.bind()
  registerWithMaster()

  metricsSystem.registerSource(workerSource)
  metricsSystem.start()
  // Attach the worker metrics servlet handler to the web UI after the metrics system is started.
  metricsSystem.getServletHandlers.foreach(webUi.attachHandler)
}
5. registerWithMaster registers the started worker with the master:

def registerWithMaster() {
  // DisassociatedEvent may be triggered multiple times, so don't attempt registration
  // if there are outstanding registration attempts scheduled.
  registrationRetryTimer match {
    case None =>
      registered = false
      tryRegisterAllMasters()
      connectionAttemptCount = 0
      registrationRetryTimer = Some {
        context.system.scheduler.schedule(INITIAL_REGISTRATION_RETRY_INTERVAL,
          INITIAL_REGISTRATION_RETRY_INTERVAL, self, ReregisterWithMaster)
      }
    case Some(_) =>
      logInfo("Not spawning another attempt to register with the master, since there is an" +
        " attempt scheduled already.")
  }
}
6. tryRegisterAllMasters sends a RegisterWorker message to every master:

private def tryRegisterAllMasters() {
  for (masterAkkaUrl <- masterAkkaUrls) {
    logInfo("Connecting to master " + masterAkkaUrl + "...")
    val actor = context.actorSelection(masterAkkaUrl)
    actor ! RegisterWorker(workerId, host, port, cores, memory, webUi.boundPort, publicAddress)
  }
}
7. The master's receiveWithLogging handles the message:

case RegisterWorker(id, workerHost, workerPort, cores, memory, workerUiPort, publicAddress) =>
{
  logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
    workerHost, workerPort, cores, Utils.megabytesToString(memory)))
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else if (idToWorker.contains(id)) {
    sender ! RegisterWorkerFailed("Duplicate worker ID")
  } else {
    val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
      sender, workerUiPort, publicAddress)
    if (registerWorker(worker)) {
      persistenceEngine.addWorker(worker)
      sender ! RegisteredWorker(masterUrl, masterWebUiUrl)
      schedule()
    } else {
      val workerAddress = worker.actor.path.address
      logWarning("Worker registration failed. Attempted to re-register worker at same " +
        "address: " + workerAddress)
      sender ! RegisterWorkerFailed("Attempted to re-register worker at same address: "
        + workerAddress)
    }
  }
}
8. On failure, the master returns a RegisterWorkerFailed message to the worker; on success, it replies with a RegisteredWorker message carrying the master's URLs.
9. schedule() is called after the reply is sent, but since no application has been submitted yet, no resources are actually allocated at this point.
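The core-spreading idea behind schedule() can be sketched as follows. This is a simplified illustration, not Spark's implementation: the real schedule() in 1.3.1 also places waiting drivers and supports a consolidate mode (spreadOutApps = false) that fills one worker before moving to the next.

```scala
// Illustrative sketch of round-robin core assignment across workers (the
// spreadOut strategy): hand out one core at a time until the application's
// demand or the cluster's free cores are exhausted.
def spreadCores(freeCores: Array[Int], coresWanted: Int): Array[Int] = {
  val assigned = Array.fill(freeCores.length)(0)
  var toAssign = math.min(coresWanted, freeCores.sum)
  var pos = 0
  while (toAssign > 0) {
    if (freeCores(pos) - assigned(pos) > 0) {
      assigned(pos) += 1
      toAssign -= 1
    }
    pos = (pos + 1) % freeCores.length
  }
  assigned
}
```

With three workers offering 4 free cores each and an application wanting 6, every worker ends up running 2 cores instead of two workers running 3 each.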


At this point, the entire Spark cluster has finished starting up.
