Remote debugging, especially in cluster mode, is a very convenient way to understand how the code actually runs, and it is the approach most developers prefer.
Although Scala's syntax differs from Java's, Scala runs on the JVM: Scala code is ultimately compiled into bytecode and executed by the JVM, so remote debugging Scala is simply remote debugging the JVM.
On the server side, the JVM to be debugged is started with the JDWP debug options; the client can then attach over a socket and debug the code remotely.
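For example (a minimal sketch, with the host and the port 7000 purely illustrative), the server-side JVM is started with the same JDWP options that appear throughout this article, and the IDE then attaches to that host and port as a remote debugger:

java -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7000,suspend=y <main class>

With suspend=y the JVM waits for the debugger to attach before running any code; with suspend=n it starts immediately and the debugger can attach later.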
1. Debugging the Submit, Master and Worker code
1.1 Debugging submit
The client runs submit; this is not described in detail here, since Spark is usually used by running
spark-submit
to submit a Spark task. Its essence is similar to the following command:
/usr/java/jdk1.8.0_111/bin/java -cp /work/spark-2.1.0-bin-hadoop2.7/conf/:/work/spark-2.1.0-bin-hadoop2.7/jars/* -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7000,suspend=y -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://raintungmaster:7077 --class rfcexample --jars /work/spark-2.1.0-bin-hadoop2.7/examples/jars/scopt_2.11-3.3.0.jar,/work/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar /tmp/machinelearning.jar
It invokes the SparkSubmit class to submit the task; to debug it, just add the debug parameters to the java command as shown above.
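Assuming client deploy mode, the same debug flags can also be passed through spark-submit's --driver-java-options instead of typing the full java command by hand; a sketch, with the jars omitted:

spark-submit --driver-java-options "-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7000,suspend=y" --master spark://raintungmaster:7077 --class rfcexample ... /tmp/machinelearning.jar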
1.2 Setting up debugging of Master and Worker
export SPARK_WORKER_OPTS="-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=8000,suspend=n"
export SPARK_MASTER_OPTS="-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=8001,suspend=n"
Setting these environment variables before starting the Master and Worker is all that is needed.
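For example (a sketch assuming the standard sbin scripts are used), the two exports can simply be placed in conf/spark-env.sh before the daemons are started:

# conf/spark-env.sh, ports 8000/8001 as above
export SPARK_WORKER_OPTS="-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=8000,suspend=n"
export SPARK_MASTER_OPTS="-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=8001,suspend=n"

Then start the daemons with sbin/start-master.sh and sbin/start-slave.sh as usual and attach the debugger to ports 8001 and 8000 respectively.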
2. Debugging the Executor code
After setting the Worker environment parameters, we find that we still cannot step into the code the executor runs. Since the executor runs on the worker, it should of course be remotely debuggable, so why can't the executor be debugged this way?
3. Spark Standalone cluster scheduling
Since the executor cannot be debugged this way, we need to clarify the scheduling relationship among submit, Master and Worker.
3.1 Submit
As just described, submit actually initializes the SparkSubmit class, and the runMain method is called from SparkSubmit's main method:
try {
  mainMethod.invoke(null, childArgs.toArray)
} catch {
  case t: Throwable =>
    findCause(t) match {
      case SparkUserAppException(exitCode) =>
        System.exit(exitCode)
      case t: Throwable =>
        throw t
    }
}
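For context, before the invoke call above, runMain loads the submitted class and looks up its static main method via reflection. A simplified sketch of that mechanism (not the actual SparkSubmit code; the object and method names here are made up):

object ReflectiveMainSketch {
  // Load a class by name and invoke its static main(Array[String]) reflectively,
  // similar in spirit to what SparkSubmit.runMain does with the --class argument.
  def invokeUserMain(mainClassName: String, args: Array[String]): Unit = {
    val mainClass = Class.forName(mainClassName)        // e.g. "rfcexample"
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    mainMethod.invoke(null, args)                       // static method, so the receiver is null
  }
}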
The core of runMain is calling the main method of the class we submitted, which in the example above is the class given by the parameter
--class rfcexample
so the main method of rfcexample is invoked.
Usually, the Spark application class we write initializes the Spark context:
val sc = new SparkContext(conf)
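A minimal driver looks roughly like the sketch below; the object name, app name and word-count logic are only illustrative, the point being that creating the SparkContext is what kicks off the scheduling described next:

import org.apache.spark.{SparkConf, SparkContext}

object RfcExampleSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("rfc-example-sketch")
    val sc = new SparkContext(conf)   // this constructor triggers task scheduler creation
    val counts = sc.parallelize(Seq("a", "b", "a")).map((_, 1)).reduceByKey(_ + _)
    counts.collect().foreach(println)
    sc.stop()
  }
}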
When SparkContext is initialized, it creates and starts the task scheduler:
// Create and start the scheduler
val (sched, ts) = SparkContext.createTaskScheduler(this, master, deployMode)
_schedulerBackend = sched
_taskScheduler = ts
_dagScheduler = new DAGScheduler(this)
_heartbeatReceiver.ask[Boolean](TaskSchedulerIsSet)

// start TaskScheduler after taskScheduler sets DAGScheduler reference in DAGScheduler's
// constructor
_taskScheduler.start()
In standalone mode, this finally calls the start method of StandaloneSchedulerBackend.scala:
val appDesc = new ApplicationDescription(sc.appName, maxCores, sc.executorMemory, command,
  appUIAddress, sc.eventLogDir, sc.eventLogCodec, coresPerExecutor, initialExecutorLimit)
client = new StandaloneAppClient(sc.env.rpcEnv, masters, appDesc, this, conf)
client.start()
launcherBackend.setState(SparkAppHandle.State.SUBMITTED)
waitForRegistration()
launcherBackend.setState(SparkAppHandle.State.RUNNING)
It builds the application description and starts a StandaloneAppClient to connect to the Master.
3.2 Master assigns the task
Submit creates a client that builds an ApplicationDescription and registers the application with the Master. The dispatcher in the Master then receives the RegisterApplication message:
case RegisterApplication(description, driver) =>
  // TODO Prevent repeated registrations from some driver
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, driver)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    persistenceEngine.addApplication(app)
    driver.send(RegisteredApplication(app.id, self))
    schedule()
  }
The Master creates a new application ID and registers the application; an application is bound to a single client port, that is, the same client ip:port can register only one application. In schedule(), the Master computes the application's memory and core requirements and assigns executors to workers that can satisfy them.
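Conceptually the selection looks like the sketch below; this is a simplified illustration only, not the Master's actual schedule() code, and the case class and field names are made up:

// Simplified sketch: keep only workers with enough free memory and cores for one executor,
// preferring the least loaded ones. The real logic lives in Master.schedule().
case class WorkerResources(id: String, freeCores: Int, freeMemoryMb: Int)

def usableWorkers(workers: Seq[WorkerResources],
                  memoryPerExecutorMb: Int,
                  coresPerExecutor: Int): Seq[WorkerResources] =
  workers
    .filter(w => w.freeMemoryMb >= memoryPerExecutorMb && w.freeCores >= coresPerExecutor)
    .sortBy(w => -w.freeCores)

The executor is then actually launched by launchExecutor: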
private def launchExecutor(worker: WorkerInfo, exec: ExecutorDesc): Unit = {
  logInfo("Launching executor " + exec.fullId + " on worker " + worker.id)
  worker.addExecutor(exec)
  worker.endpoint.send(LaunchExecutor(masterUrl,
    exec.application.id, exec.id, exec.application.desc, exec.cores, exec.memory))
  exec.application.driver.send(
    ExecutorAdded(exec.id, worker.id, worker.hostPort, exec.cores, exec.memory))
}
It sends a serialized LaunchExecutor message to the worker's endpoint.
3.3 The Worker launches the executor
The dispatcher in the Worker receives the LaunchExecutor message in Worker.scala:
case LaunchExecutor(masterUrl, appId, execId, appDesc, cores_, memory_) =>
  if (masterUrl != activeMasterUrl) {
    logWarning("Invalid Master (" + masterUrl + ") attempted to launch executor.")
  } else {
    try {
      logInfo("Asked to launch executor %s/%d for %s".format(appId, execId, appDesc.name))

      // Create the executor's working directory
      val executorDir = new File(workDir, appId + "/" + execId)
      if (!executorDir.mkdirs()) {
        throw new IOException("Failed to create directory " + executorDir)
      }

      // Create local dirs for the executor. These are passed to the executor via the
      // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
      // application finishes.
      val appLocalDirs = appDirectories.getOrElse(appId,
        Utils.getOrCreateLocalRootDirs(conf).map { dir =>
          val appDir = Utils.createDirectory(dir, namePrefix = "executor")
          Utils.chmod700(appDir)
          appDir.getAbsolutePath()
        }.toSeq)
      appDirectories(appId) = appLocalDirs
      val manager = new ExecutorRunner(
        appId,
        execId,
        appDesc.copy(command = Worker.maybeUpdateSSLSettings(appDesc.command, conf)),
        cores_,
        memory_,
        self,
        workerId,
        host,
        webUi.boundPort,
        publicAddress,
        sparkHome,
        executorDir,
        workerUri,
        conf,
        appLocalDirs, ExecutorState.RUNNING)
      executors(appId + "/" + execId) = manager
      manager.start()
      coresUsed += cores_
      memoryUsed += memory_
      sendToMaster(ExecutorStateChanged(appId, execId, manager.state, None, None))
    } catch {
      case e: Exception =>
        logError(s"Failed to launch executor $appId/$execId for ${appDesc.name}.", e)
        if (executors.contains(appId + "/" + execId)) {
          executors(appId + "/" + execId).kill()
          executors -= appId + "/" + execId
        }
        sendToMaster(ExecutorStateChanged(appId, execId, ExecutorState.FAILED,
          Some(e.toString), None))
    }
  }
It creates a working directory and starts an ExecutorRunner.
private[worker] def start() {
  workerThread = new Thread("ExecutorRunner for " + fullId) {
    override def run() { fetchAndRunExecutor() }
  }
  workerThread.start()
  // Shutdown hook that kills actors on shutdown.
  shutdownHook = ShutdownHookManager.addShutdownHook { () =>
    // It's possible that we arrive here before calling `fetchAndRunExecutor`, then `state` will
    // be `ExecutorState.RUNNING`. In this case, we should set `state` to `FAILED`.
    if (state == ExecutorState.RUNNING) {
      state = ExecutorState.FAILED
    }
    killProcess(Some("Worker shutting down"))
  }
}
The start method in ExecutorRunner.scala starts a thread named "ExecutorRunner for XXX" to run the executor. So is the application's code running in this thread?
private def fetchAndRunExecutor() {
  try {
    // Launch the process
    val builder = CommandUtils.buildProcessBuilder(appDesc.command, new SecurityManager(conf),
      memory, sparkHome.getAbsolutePath, substituteVariables)
    val command = builder.command()
    val formattedCommand = command.asScala.mkString("\"", "\" \"", "\"")
    ......
    process = builder.start()
    ......
    val exitCode = process.waitFor()
    state = ExecutorState.EXITED
    val message = "Command exited with code " + exitCode
    worker.send(ExecutorStateChanged(appId, execId, state, Some(message), Some(exitCode)))
  } catch {
    ......
  }
}
Looking at fetchAndRunExecutor, we see builder.start(), where builder is a ProcessBuilder. In other words, the current thread launches a child process to run the command.
That is why we cannot debug the executor by debugging the worker: the executor is a separate process.
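To make this concrete, here is a tiny standalone sketch (the command is purely illustrative) of what builder.start() does: the child gets a brand-new JVM, so any -Xdebug/-Xrunjdwp flags of the parent do not apply to it, and debug flags must appear on the child's own command line.

import scala.collection.JavaConverters._

object ChildProcessSketch {
  def main(args: Array[String]): Unit = {
    // Launch a child JVM the same way ExecutorRunner does, via ProcessBuilder.
    // The debug options are part of THIS command line; they are not inherited from the parent.
    val cmd = Seq("java",
      "-Xdebug", "-Xrunjdwp:server=y,transport=dt_socket,address=7001,suspend=n",
      "-version")
    val process = new ProcessBuilder(cmd.asJava).inheritIO().start()
    val exitCode = process.waitFor()
    println(s"child JVM exited with code $exitCode")
  }
}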
4. Debugging the executor process
We have just followed the code all the way from the Master receiving the RegisterApplication message to it dispatching the LaunchExecutor message to the Worker, and nothing along the way reworks the command: the command the child process finally runs comes from the ApplicationDescription, and we know from section 3.1 that the ApplicationDescription is created at submit time. So let's go back to the start method of StandaloneSchedulerBackend.scala:
val driverUrl = RpcEndpointAddress(
  sc.conf.get("spark.driver.host"),
  sc.conf.get("spark.driver.port").toInt,
  CoarseGrainedSchedulerBackend.ENDPOINT_NAME).toString
val args = Seq(
  "--driver-url", driverUrl,
  "--executor-id", "{{EXECUTOR_ID}}",
  "--hostname", "{{HOSTNAME}}",
  "--cores", "{{CORES}}",
  "--app-id", "{{APP_ID}}",
  "--worker-url", "{{WORKER_URL}}")
val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
  .map(Utils.splitCommandString).getOrElse(Seq.empty)
val classPathEntries = sc.conf.getOption("spark.executor.extraClassPath")
  .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)
val libraryPathEntries = sc.conf.getOption("spark.executor.extraLibraryPath")
  .map(_.split(java.io.File.pathSeparator).toSeq).getOrElse(Nil)

// When testing, expose the parent class path to the child. This is processed by
// compute-classpath.{cmd,sh} and makes all needed jars available to child processes
// when the assembly is built with the "*-provided" profiles enabled.
val testingClassPath =
  if (sys.props.contains("spark.testing")) {
    sys.props("java.class.path").split(java.io.File.pathSeparator).toSeq
  } else {
    Nil
  }

// Start executors with a few necessary configs for registering with the scheduler
val sparkJavaOpts = Utils.sparkJavaOpts(conf, SparkConf.isExecutorStartupConf)
val javaOpts = sparkJavaOpts ++ extraJavaOpts
We can see that the executor's JVM parameters are controlled by javaOpts, which includes
val extraJavaOpts = sc.conf.getOption("spark.executor.extraJavaOptions")
So it is the spark.executor.extraJavaOptions parameter that controls them. Turning to the Spark documentation (a little late, admittedly):
spark.executor.extraJavaOptions (default: none)
A string of extra JVM options to pass to executors. For instance, GC settings or other logging. Note that it is illegal to set Spark properties or maximum heap size (-Xmx) settings with this option. Spark properties should be set using a SparkConf object or the spark-defaults.conf file used with the spark-submit script. Maximum heap size settings can be set with spark.executor.memory.
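As the documentation says, the property can also be set programmatically on a SparkConf (or in spark-defaults.conf); a minimal sketch, with the port 7001 matching the command below and the app name made up:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("rfc-example")
  // Executor JVMs will start with these options; only one process per machine can bind the port.
  .set("spark.executor.extraJavaOptions",
    "-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7001,suspend=y")
val sc = new SparkContext(conf)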
According to the documentation, we can also set the executor's JVM parameters by passing --conf to spark-submit:
--conf "Spark.executor.extrajavaoptions=-xdebug-xrunjdwp:server=y,transport=dt_socket,address=7001,suspend=y"
The full command for the submit process then becomes:
/usr/java/jdk1.8.0_111/bin/java -cp /work/spark-2.1.0-bin-hadoop2.7/conf/:/work/spark-2.1.0-bin-hadoop2.7/jars/* -Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7000,suspend=y -Xmx1g org.apache.spark.deploy.SparkSubmit --master spark://raintungmaster:7077 --class rfcexample --jars /work/spark-2.1.0-bin-hadoop2.7/examples/jars/scopt_2.11-3.3.0.jar,/work/spark-2.1.0-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.1.0.jar --conf "spark.executor.extraJavaOptions=-Xdebug -Xrunjdwp:server=y,transport=dt_socket,address=7001,suspend=y" /tmp/machinelearning.jar
Note: be careful if your worker launches more than one executor, because a given debug listening port can only be bound by one process on a machine.
Big Data: Spark Standalone cluster scheduling (1): starting from remote debugging to see how an application is created