Apache Spark-1.0.0 Code Analysis (ii): Spark initialization


In LocalWordCount, you first need to create a SparkConf and configure the master, appName, and other environment parameters; any value not set in the program is read from the system properties. Then the SparkContext is created with the SparkConf as a parameter, which initializes the Spark environment.

val sparkConf = new SparkConf().setMaster("local").setAppName("Local Word Count")
val sc = new SparkContext(sparkConf)
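For context, here is a minimal sketch of what such a LocalWordCount driver program might look like around those two lines; the object name, input path, and word-splitting logic are illustrative rather than taken from the original example.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // needed for reduceByKey on pair RDDs in Spark 1.0

object LocalWordCount {
  def main(args: Array[String]): Unit = {
    // Creating the SparkContext triggers the initialization traced below.
    val sparkConf = new SparkConf().setMaster("local").setAppName("Local Word Count")
    val sc = new SparkContext(sparkConf)

    // Illustrative word count; "input.txt" is a placeholder path.
    val counts = sc.textFile("input.txt")
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    counts.collect().foreach(println)
    sc.stop()
  }
}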

During initialization, the console output shows that the whole process proceeds roughly as follows:

spark.SecurityManager configures authentication; slf4j.Slf4jLogger starts; Remoting starts and begins listening; SparkEnv registers the MapOutputTracker and BlockManagerMaster; storage.DiskBlockManager creates a local directory; storage.MemoryStore allocates its capacity; network.ConnectionManager binds a port; storage.BlockManagerMaster registers the BlockManager; spark.HttpServer starts and server.AbstractConnector brings up the related connectors; broadcast.HttpBroadcast starts the broadcast service; spark.HttpFileServer configures its directory; and finally the SparkUI is started.

15/07/14 13:20:56 INFO spark.SecurityManager: Changing view acls to: Kevin
15/07/14 13:20:56 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Kevin)
15/07/14 13:20:58 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/14 13:20:58 INFO Remoting: Starting remoting
15/07/14 13:20:58 INFO Remoting: Remoting started; listening on addresses: [akka.tcp://[email protected]:50494]
15/07/14 13:20:58 INFO Remoting: Remoting now listens on addresses: [akka.tcp://[email protected]:50494]
15/07/14 13:20:59 INFO spark.SparkEnv: Registering MapOutputTracker
15/07/14 13:20:59 INFO spark.SparkEnv: Registering BlockManagerMaster
15/07/14 13:20:59 INFO storage.DiskBlockManager: Created local directory at C:\Users\Kevin\AppData\Local\Temp\spark-local-20150714132059-e5a3
15/07/14 13:20:59 INFO storage.MemoryStore: MemoryStore started with capacity 2.1 GB.
15/07/14 13:20:59 INFO network.ConnectionManager: Bound socket to port 50497 with id = ConnectionManagerId(Kevin-ThinkPad,50497)
15/07/14 13:20:59 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/14 13:20:59 INFO storage.BlockManagerInfo: Registering block manager Kevin-ThinkPad:50497 with 2.1 GB RAM
15/07/14 13:20:59 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/14 13:20:59 INFO spark.HttpServer: Starting HTTP Server
15/07/14 13:20:59 INFO server.Server: jetty-8.1.14.v20131031
15/07/14 13:20:59 INFO server.AbstractConnector: Started [email protected]:50498
15/07/14 13:20:59 INFO broadcast.HttpBroadcast: Broadcast server started at http://10.88.121.10:50498
15/07/14 13:20:59 INFO spark.HttpFileServer: HTTP File server directory is C:\Users\Kevin\AppData\Local\Temp\spark-105cdf2e-8671-4323-af35-1668fd462f55
15/07/14 13:20:59 INFO spark.HttpServer: Starting HTTP Server
15/07/14 13:20:59 INFO server.Server: jetty-8.1.14.v20131031
15/07/14 13:20:59 INFO server.AbstractConnector: Started [email protected]:50499
15/07/14 13:21:00 INFO server.Server: jetty-8.1.14.v20131031
15/07/14 13:21:00 INFO server.AbstractConnector: Started [email protected]:4040
15/07/14 13:21:00 INFO ui.SparkUI: Started SparkUI at http://Kevin-ThinkPad:4040

Here, the initialization process is complete.
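As a quick sanity check (not part of the original article), the values that show up in these log lines can be read back from the freshly created SparkContext through its public accessors:

// Illustrative checks; the output corresponds to the log above.
println(sc.master)                            // local
println(sc.appName)                           // Local Word Count
println(sc.getConf.get("spark.driver.host"))  // driver host reported by Remoting
println(sc.getConf.get("spark.driver.port"))  // port the driver actor system bound to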

Stepping into the SparkContext source code, the class constructor executes several key statements while the SparkContext is being instantiated:

Instantiate the LiveListenerBus and start it:

private[spark] val listenerBus = new LiveListenerBus
...
listenerBus.start()
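The LiveListenerBus is the queue that SparkListener implementations (including the ones backing the SparkUI) attach to. As a hedged illustration, a user-defined listener can be registered through SparkContext.addSparkListener; the listener class and messages below are made up for the example:

import org.apache.spark.scheduler.{SparkListener, SparkListenerJobEnd, SparkListenerJobStart}

// Hypothetical listener; its events are delivered asynchronously by the LiveListenerBus.
class LoggingListener extends SparkListener {
  override def onJobStart(jobStart: SparkListenerJobStart) {
    println("Job " + jobStart.jobId + " started")
  }
  override def onJobEnd(jobEnd: SparkListenerJobEnd) {
    println("Job " + jobEnd.jobId + " ended with result " + jobEnd.jobResult)
  }
}

sc.addSparkListener(new LoggingListener)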

Create the SparkEnv:

private[spark] val env = SparkEnv.create(
  conf,
  "<driver>",
  conf.get("spark.driver.host"),
  conf.get("spark.driver.port").toInt,
  isDriver = true,
  isLocal = isLocal,
  listenerBus = listenerBus)
SparkEnv.set(env)

SparkEnv bundles a number of key components:

class SparkEnv (
    val executorId: String,
    val actorSystem: ActorSystem,
    val serializer: Serializer,
    val closureSerializer: Serializer,
    val cacheManager: CacheManager,
    val mapOutputTracker: MapOutputTracker,
    val shuffleFetcher: ShuffleFetcher,
    val broadcastManager: BroadcastManager,
    val blockManager: BlockManager,
    val connectionManager: ConnectionManager,
    val securityManager: SecurityManager,
    val httpFileServer: HttpFileServer,
    val sparkFilesDir: String,
    val metricsSystem: MetricsSystem,
    val conf: SparkConf) extends Logging
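Other parts of the codebase (and, on the driver, user code) reach these components through the SparkEnv companion object. A small illustrative example, assuming a running SparkContext:

import org.apache.spark.SparkEnv

// Illustrative lookups; SparkEnv.get returns the environment registered via SparkEnv.set(env).
val env = SparkEnv.get
println(env.executorId)                    // "<driver>" when called on the driver
println(env.blockManager.blockManagerId)   // the id registered with the BlockManagerMaster
println(env.serializer.getClass.getName)   // the configured Serializer implementation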

Create the SparkUI and bind it:

private[spark] val ui = new SparkUI(this)
ui.bind()
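As the log shows, the UI comes up on port 4040 by default. If that needs to change, the port can be set through the spark.ui.port property before the SparkContext is created; a brief illustrative snippet:

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative configuration: run the web UI on a non-default port.
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("Local Word Count")
  .set("spark.ui.port", "4041")   // default is 4040
val sc = new SparkContext(conf)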

Create the TaskScheduler, create the DAGScheduler with this SparkContext as a parameter, and then start the TaskScheduler:

// Create and start the scheduler
private[spark] var taskScheduler = SparkContext.createTaskScheduler(this, master)
@volatile private[spark] var dagScheduler: DAGScheduler = _
try {
  dagScheduler = new DAGScheduler(this)
} catch {
  case e: Exception => throw
    new SparkException("DAGScheduler cannot be initialized due to %s".format(e.getMessage))
}

// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
// constructor
taskScheduler.start()
To create the TaskScheduler, SparkContext.createTaskScheduler matches the different master URL formats with regular expressions and builds the corresponding TaskScheduler and scheduler backend:
private def createTaskScheduler(sc: SparkContext, master: String): TaskScheduler = {
  // Regular expression used for local[N] and local[*] master formats
  val LOCAL_N_REGEX = """local\[([0-9\*]+)\]""".r
  // Regular expression for local[N, maxRetries], used in tests with failing tasks
  val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+)\s*,\s*([0-9]+)\]""".r
  // Regular expression for simulating a Spark cluster of [N, cores, memory] locally
  val LOCAL_CLUSTER_REGEX = """local-cluster\[\s*([0-9]+)\s*,\s*([0-9]+)\s*,\s*([0-9]+)\s*]""".r
  // Regular expression for connecting to Spark deploy clusters
  val SPARK_REGEX = """spark://(.*)""".r
  // Regular expression for connection to Mesos cluster by mesos:// or zk:// url
  val MESOS_REGEX = """(mesos|zk)://.*""".r
  // Regular expression for connection to Simr cluster
  val SIMR_REGEX = """simr://(.*)""".r

  // When running locally, don't try to re-execute tasks on failure.
  val MAX_LOCAL_TASK_FAILURES = 1

  master match {
    case "local" =>
      val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
      val backend = new LocalBackend(scheduler, 1)
      scheduler.initialize(backend)
      scheduler

    case LOCAL_N_REGEX(threads) =>
      def localCpuCount = Runtime.getRuntime.availableProcessors()
      // local[*] estimates the number of cores on the machine; local[N] uses exactly N threads.
      val threadCount = if (threads == "*") localCpuCount else threads.toInt
      val scheduler = new TaskSchedulerImpl(sc, MAX_LOCAL_TASK_FAILURES, isLocal = true)
      val backend = new LocalBackend(scheduler, threadCount)
      scheduler.initialize(backend)
      scheduler

    case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
      val scheduler = new TaskSchedulerImpl(sc, maxFailures.toInt, isLocal = true)
      val backend = new LocalBackend(scheduler, threads.toInt)
      scheduler.initialize(backend)
      scheduler

    case SPARK_REGEX(sparkUrl) =>
      val scheduler = new TaskSchedulerImpl(sc)
      val masterUrls = sparkUrl.split(",").map("spark://" + _)
      val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
      scheduler.initialize(backend)
      scheduler

    case LOCAL_CLUSTER_REGEX(numSlaves, coresPerSlave, memoryPerSlave) =>
      // Check to make sure memory requested <= memoryPerSlave. Otherwise Spark will just hang.
      val memoryPerSlaveInt = memoryPerSlave.toInt
      if (sc.executorMemory > memoryPerSlaveInt) {
        throw new SparkException(
          "Asked to launch cluster with %d MB RAM / worker but requested %d MB/worker".format(
            memoryPerSlaveInt, sc.executorMemory))
      }

      val scheduler = new TaskSchedulerImpl(sc)
      val localCluster = new LocalSparkCluster(
        numSlaves.toInt, coresPerSlave.toInt, memoryPerSlaveInt)
      val masterUrls = localCluster.start()
      val backend = new SparkDeploySchedulerBackend(scheduler, sc, masterUrls)
      scheduler.initialize(backend)
      backend.shutdownCallback = (backend: SparkDeploySchedulerBackend) => {
        localCluster.stop()
      }
      scheduler

    case "yarn-standalone" | "yarn-cluster" =>
      if (master == "yarn-standalone") {
        logWarning(
          "\"yarn-standalone\" is deprecated as of Spark 1.0. Use \"yarn-cluster\" instead.")
      }
      val scheduler = try {
        val clazz = Class.forName("org.apache.spark.scheduler.cluster.YarnClusterScheduler")
        val cons = clazz.getConstructor(classOf[SparkContext])
        cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
      } catch {
        // TODO: Enumerate the exact reasons why it can fail
        // But irrespective of it, it means we cannot proceed !
        case e: Exception => {
          throw new SparkException("YARN mode not available ?", e)
        }
      }
      val backend = new CoarseGrainedSchedulerBackend(scheduler, sc.env.actorSystem)
      scheduler.initialize(backend)
      scheduler

    case "yarn-client" =>
      val scheduler = try {
        val clazz =
          Class.forName("org.apache.spark.scheduler.cluster.YarnClientClusterScheduler")
        val cons = clazz.getConstructor(classOf[SparkContext])
        cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
      } catch {
        case e: Exception => {
          throw new SparkException("YARN mode not available ?", e)
        }
      }

      val backend = try {
        val clazz =
          Class.forName("org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend")
        val cons = clazz.getConstructor(classOf[TaskSchedulerImpl], classOf[SparkContext])
        cons.newInstance(scheduler, sc).asInstanceOf[CoarseGrainedSchedulerBackend]
      } catch {
        case e: Exception => {
          throw new SparkException("YARN mode not available ?", e)
        }
      }

      scheduler.initialize(backend)
      scheduler

    case mesosUrl @ MESOS_REGEX(_) =>
      MesosNativeLibrary.load()
      val scheduler = new TaskSchedulerImpl(sc)
      val coarseGrained = sc.conf.getBoolean("spark.mesos.coarse", false)
      val url = mesosUrl.stripPrefix("mesos://") // strip scheme from raw Mesos URLs
      val backend = if (coarseGrained) {
        new CoarseMesosSchedulerBackend(scheduler, sc, url)
      } else {
        new MesosSchedulerBackend(scheduler, sc, url)
      }
      scheduler.initialize(backend)
      scheduler

    case SIMR_REGEX(simrUrl) =>
      val scheduler = new TaskSchedulerImpl(sc)
      val backend = new SimrSchedulerBackend(scheduler, sc, simrUrl)
      scheduler.initialize(backend)
      scheduler

    case _ =>
      throw new SparkException("Could not parse Master URL: '" + master + "'")
  }
}
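To make the pattern matching concrete, here is a small self-contained sketch (plain Scala, not Spark source) that applies the same local[N] and local[N, maxFailures] regular expressions to a few sample master strings; the object name and printed messages are illustrative:

object MasterUrlDemo extends App {
  // Same patterns as in createTaskScheduler above, reproduced for a standalone demo.
  val LOCAL_N_REGEX = """local\[([0-9\*]+)\]""".r
  val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+)\s*,\s*([0-9]+)\]""".r

  Seq("local", "local[4]", "local[*]", "local[2, 3]", "spark://host:7077").foreach {
    case "local" =>
      println("local -> one thread, at most 1 task failure")
    case LOCAL_N_REGEX(threads) =>
      println("local[" + threads + "] -> " + threads + " threads (\"*\" means all available cores)")
    case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
      println(threads + " threads, up to " + maxFailures + " task failures")
    case other =>
      println(other + " -> handled by one of the cluster scheduler backends")
  }
}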

END
