This article describes how the application main class is started in standalone mode, from the bin/spark-submit script through to the SparkSubmit class.
1 Calling Flowchart
2 Startup Scripts
2.1 bin/spark-submit
# For client mode, the driver will be launched in the same JVM that launches
# SparkSubmit, so we may need to read the properties file for any extra class
# paths, library paths, java options and memory early on. Otherwise, it will
# be too late by the time the driver JVM has started.

if [[ "$SPARK_SUBMIT_DEPLOY_MODE" = "client" && -f "$SPARK_SUBMIT_PROPERTIES_FILE" ]]; then
  # Parse the properties file only if the special configs exist
  contains_special_configs=$(
    grep -e "spark.driver.extra*\|spark.driver.memory" "$SPARK_SUBMIT_PROPERTIES_FILE" | \
    grep -v "^[[:space:]]*#"
  )
  if [ -n "$contains_special_configs" ]; then
    export SPARK_SUBMIT_BOOTSTRAP_DRIVER=1
  fi
fi

exec "$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.SparkSubmit "${ORIG_ARGS[@]}"
(1) In client mode, if the Spark properties file contains spark.driver.* configuration, export SPARK_SUBMIT_BOOTSTRAP_DRIVER=1; (2) call bin/spark-class.
2.2 bin/spark-class
# In Spark submit client mode, the driver is launched in the same JVM as Spark submit itself.
# Here we must parse the properties file for relevant "spark.driver.*" configs before launching
# the driver JVM itself. Instead of handling this complexity in Bash, we launch a separate JVM
# to prepare the launch environment of this driver JVM.

if [ -n "$SPARK_SUBMIT_BOOTSTRAP_DRIVER" ]; then
  # This is used only if the properties file actually contains these special configs
  # Export the environment variables needed by SparkSubmitDriverBootstrapper
  export RUNNER
  export CLASSPATH
  export JAVA_OPTS
  export OUR_JAVA_MEM
  export SPARK_CLASS=1
  shift # Ignore main class (org.apache.spark.deploy.SparkSubmit) and use our own
  exec "$RUNNER" org.apache.spark.deploy.SparkSubmitDriverBootstrapper "$@"
else
  # Note: The format of this command is closely echoed in SparkSubmitDriverBootstrapper.scala
  if [ -n "$SPARK_PRINT_LAUNCH_COMMAND" ]; then
    echo -n "Spark Command: " 1>&2
    echo "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@" 1>&2
    echo -e "========================================\n" 1>&2
  fi
  exec "$RUNNER" -cp "$CLASSPATH" $JAVA_OPTS "$@"
fi
(1) If $SPARK_SUBMIT_BOOTSTRAP_DRIVER is non-empty, drop org.apache.spark.deploy.SparkSubmit from the argument list and start SparkSubmitDriverBootstrapper, which eventually calls SparkSubmit; (2) otherwise, start SparkSubmit directly.
3 How the Application Main Class Starts
The application main class can be started in three ways: (1) SparkSubmit launches it directly; (2) a new DriverWrapper process is created on a worker node, which then launches the application main class; (3) a StandaloneRestClient sends a message to the StandaloneRestServer, after which startup proceeds as in (2) (this method was added in Spark 1.3). Whichever path is taken, what ultimately runs is the static main method of the user's application class; see the sketch below.
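For orientation, the target of all three paths is just an ordinary application main class. A minimal sketch of such a class (the object name and job logic are illustrative, not taken from the Spark source):

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative application main class (hypothetical); SparkSubmit (path 1) or
// DriverWrapper (paths 2 and 3) ends up calling this main method via reflection.
object MyApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("my-app")
    val sc = new SparkContext(conf)
    // Trivial job so the sketch does something observable.
    val evenCount = sc.parallelize(1 to 100).filter(_ % 2 == 0).count()
    println(s"even numbers: $evenCount")
    sc.stop()
  }
}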
4 SparkSubmit
4.1 main
def main(args: Array[String]): Unit = {
  val appArgs = new SparkSubmitArguments(args)
  if (appArgs.verbose) {
    printStream.println(appArgs)
  }
  appArgs.action match {
    case SparkSubmitAction.SUBMIT => submit(appArgs)
    case SparkSubmitAction.KILL => kill(appArgs)
    case SparkSubmitAction.REQUEST_STATUS => requestStatus(appArgs)
  }
}
(1) Create a SparkSubmitArguments object and parse the arguments to initialize its members; (2) only the SUBMIT path is analyzed here.
4.2 SparkSubmitArguments
This class encapsulates the Spark parameters.
// Set parameters from command line arguments
parseOpts(args.toList)
// Populate `sparkProperties` map from properties file
mergeDefaultSparkProperties()
// Use `sparkProperties` map along with env vars to fill in any missing parameters
loadEnvironmentArguments()
validateArguments()
(1) Parse the command-line arguments; (2) merge the parameters configured in the properties file (spark-defaults.conf by default); (3) the loadEnvironmentArguments method sets the action parameter, which defaults to SUBMIT:
// Action should be SUBMIT unless otherwise specified
action = Option(action).getOrElse(SUBMIT)
(4) Validate the arguments; each action has its own validation method:
/** Ensure that required fields exist. Call this only once all defaults are loaded. */
private def validateArguments(): Unit = {
  action match {
    case SUBMIT => validateSubmitArguments()
    case KILL => validateKillArguments()
    case REQUEST_STATUS => validateStatusRequestArguments()
  }
}
4.3 Spark Property Precedence
Spark provides several ways to set properties (from highest to lowest priority): (1) properties set in the application code; (2) parameters set on the command line; (3) parameters set in the configuration file (spark-defaults.conf by default); (4) default property values provided by Spark. A sketch of how this plays out follows.
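A minimal sketch of the precedence rules, assuming the same key is set at several levels (the key values and app name here are illustrative):

import org.apache.spark.SparkConf

object PrecedenceSketch {
  def main(args: Array[String]): Unit = {
    // (1) Set in the application: highest priority.
    val conf = new SparkConf()
      .setAppName("precedence-sketch")
      .set("spark.executor.memory", "2g")

    // If the same key is also supplied as
    //   spark-submit --conf spark.executor.memory=4g      (2) command line
    // or in conf/spark-defaults.conf as
    //   spark.executor.memory   8g                        (3) properties file
    // the application's "2g" is what the driver ultimately sees.
    // (4) The second argument to get() stands in for Spark's built-in default.
    println(conf.get("spark.executor.memory", "1g"))
  }
}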
4.4 SparkSubmit.submit
val (childArgs, childClasspath, sysProps, childMainClass) = prepareSubmitEnvironment(args)

def doRunMain(): Unit = {
  if (args.proxyUser != null) {
    ...
  } else {
    runMain(childArgs, childClasspath, sysProps, childMainClass, args.verbose)
  }
}

// In standalone cluster mode, there are two submission gateways:
//   (1) The traditional Akka gateway using o.a.s.deploy.Client as a wrapper
//   (2) The new REST-based gateway introduced in Spark 1.3
// The latter is the default behavior as of Spark 1.3, but Spark submit will fail over
// to use the legacy gateway if the master endpoint turns out to be not a REST server.
if (args.isStandaloneCluster && args.useRest) {
  try {
    printStream.println("Running Spark using the REST application submission protocol.")
    doRunMain()
  } catch {
    // Fail over to use the legacy submission gateway
    case e: SubmitRestConnectionException =>
      printWarning(s"Master endpoint ${args.master} was not a REST server. " +
        "Falling back to legacy submission gateway instead.")
      args.useRest = false
      submit(args)
  }
// In all other modes, just run the main class as prepared
} else {
  doRunMain()
}
(1) One responsibility of prepareSubmitEnvironment is to set childMainClass, which determines how the application main class is invoked; (2) submit calls the internal doRunMain method, which in turn calls runMain.
4.4.1 SparkSubmit.prepareSubmitEnvironment
(0) Set the application deployment mode:
// Set the deploy mode; default is client mode
var deployMode: Int = args.deployMode match {
  case "client" | null => CLIENT
  case "cluster" => CLUSTER
  case _ => printErrorAndExit("Deploy mode must be either client or cluster"); -1
}
(1) In client mode, set childMainClass to the application main class name:
// In client mode, launch the application main class directly
// In addition, add the main application jar and any added jars (if any) to the classpath
if (deployMode == CLIENT) {
  childMainClass = args.mainClass
  if (isUserJar(args.primaryResource)) {
    childClasspath += args.primaryResource
  }
  if (args.jars != null) { childClasspath ++= args.jars.split(",") }
  if (args.childArgs != null) { childArgs ++= args.childArgs }
}
(2) In standalone cluster mode, set childMainClass to StandaloneRestClient or Client:
// In standalone cluster mode, use the REST client to submit the application (Spark 1.3+).
// All Spark parameters are expected to be passed to the client through system properties.
if (args.isStandaloneCluster) {
  if (args.useRest) {
    childMainClass = "org.apache.spark.deploy.rest.StandaloneRestClient"
    childArgs += (args.primaryResource, args.mainClass)
  } else {
    // In legacy standalone cluster mode, use Client as a wrapper around the user class
    childMainClass = "org.apache.spark.deploy.Client"
    if (args.supervise) { childArgs += "--supervise" }
    Option(args.driverMemory).foreach { m => childArgs += ("--memory", m) }
    Option(args.driverCores).foreach { c => childArgs += ("--cores", c) }
    childArgs += "launch"
    childArgs += (args.master, args.primaryResource, args.mainClass)
  }
  if (args.childArgs != null) {
    childArgs ++= args.childArgs
  }
}
(3) In yarn-cluster mode, set childMainClass to org.apache.spark.deploy.yarn.Client:
// In yarn-cluster mode, use yarn.Client as a wrapper around the user class
if (isYarnCluster) {
  childMainClass = "org.apache.spark.deploy.yarn.Client"
  ...
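To summarize the three cases above, the choice of childMainClass can be condensed into the following illustrative function (a simplified digest, not the actual prepareSubmitEnvironment code; Mesos and other resource managers are omitted):

// Simplified digest of how prepareSubmitEnvironment picks childMainClass.
def chooseChildMainClass(
    isClientMode: Boolean,
    isStandaloneCluster: Boolean,
    isYarnCluster: Boolean,
    useRest: Boolean,
    appMainClass: String): String = {
  if (isClientMode) {
    appMainClass                                          // launched directly in the SparkSubmit JVM
  } else if (isStandaloneCluster && useRest) {
    "org.apache.spark.deploy.rest.StandaloneRestClient"   // REST gateway (Spark 1.3+)
  } else if (isStandaloneCluster) {
    "org.apache.spark.deploy.Client"                      // legacy Akka gateway
  } else if (isYarnCluster) {
    "org.apache.spark.deploy.yarn.Client"                 // yarn-cluster mode
  } else {
    appMainClass                                          // other client-like modes
  }
}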
4.4.2 SparkSubmit.runMain
(0) Method signature:
private def runMain(
    childArgs: Seq[String],
    childClasspath: Seq[String],
    sysProps: Map[String, String],
    childMainClass: String,
    verbose: Boolean): Unit
(1) The verbose flag can be set on the command line to print the runMain arguments to the console:
if (verbose) {
  printStream.println(s"Main class:\n$childMainClass")
  printStream.println(s"Arguments:\n${childArgs.mkString("\n")}")
  printStream.println(s"System properties:\n${sysProps.mkString("\n")}")
  printStream.println(s"Classpath elements:\n${childClasspath.mkString("\n")}")
  printStream.println("\n")
}
(2) Add the jars to the classpath:
for (jar <- childClasspath) {
  addJarToClasspath(jar, loader)
}
(3) Copy the Spark properties into the JVM system properties (many places read parameters from system properties):
for ((key, value) <- sysProps) {
  System.setProperty(key, value)
}
(4) Load the childMainClass class:
try {
  mainClass = Class.forName(childMainClass, true, loader)
} catch {
  ...
}
(5) Get the main method object:
val mainMethod = mainClass.getMethod("main", new Array[String](0).getClass)
(6) Invoke the main method (a standalone sketch of this reflective launch follows the snippet):
try {
  mainMethod.invoke(null, childArgs.toArray)
} catch {
  case t: Throwable =>
    throw findCause(t)
}
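Steps (4) to (6) amount to a plain reflective call of a static main method. A self-contained sketch of the same pattern (DemoApp and its arguments are made up for illustration; in the real flow the class name comes from childMainClass):

// Hypothetical target class; in the real flow this is the class named by childMainClass.
object DemoApp {
  def main(args: Array[String]): Unit =
    println(s"DemoApp started with: ${args.mkString(" ")}")
}

object ReflectiveLaunch {
  def main(args: Array[String]): Unit = {
    val loader = Thread.currentThread().getContextClassLoader
    // Load the class by name, as runMain does with childMainClass.
    val mainClass = Class.forName("DemoApp", true, loader)
    // Look up the static main(Array[String]) method.
    val mainMethod = mainClass.getMethod("main", classOf[Array[String]])
    // Static method, so the receiver is null; the String array is the single argument.
    val childArgs = Seq("--input", "/tmp/data")
    mainMethod.invoke(null, childArgs.toArray)
  }
}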
At this point, the childMainClass set by the prepareSubmitEnvironment method has been invoked. The possible childMainClass values are: the application main class name; org.apache.spark.deploy.rest.StandaloneRestClient; org.apache.spark.deploy.Client; org.apache.spark.deploy.yarn.Client. The next step is to start the DriverWrapper process via the Master and Worker, which then launches the application main class.