When submitting a job with spark-submit, the jar built with sbt package runs fine in client mode, but in cluster mode it always fails with: Exception in thread "main" java.lang.ClassNotFoundException. (In cluster mode the driver runs on a worker node, so dependencies that exist only on the submitting machine are not on its classpath.) I decided to use the sbt-assembly plugin to bundle all dependencies into a single jar.
My project structure:

myproject/build.sbt
myproject/project/assembly.sbt
myproject/src/main/scala/com/lasclocker/java/SparkGopProcess.java

The com/lasclocker/java part of the path above is the package name of the Java source program.
Content of build.sbt:

lazy val root = (project in file(".")).
  settings(
    name := "my-project",
    version := "1.0",
    scalaVersion := "2.11.7",
    mainClass in Compile := Some("com.lasclocker.java.SparkGopProcess")  // here is the main class name
  )

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"  // here is the spark-core dependency, marked as provided

unmanagedBase := baseDirectory.value / "custom_spark_lib"  // third-party dependency jars, which I placed directly under myproject's custom_spark_lib directory

// META-INF discarding
mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
  {
    case PathList("META-INF", xs @ _*) => MergeStrategy.discard
    case x => MergeStrategy.first
  }
}
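As an aside: the <<= style above matches sbt-assembly 0.12. On newer sbt-assembly releases (0.14 and later) that operator is deprecated, and the same merge behavior would be written with the assemblyMergeStrategy key, roughly like this sketch:

// equivalent merge settings for newer sbt-assembly versions
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x => MergeStrategy.first
}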
The jar packages in the custom_spark_lib directory are: guava-10.0.1.jar and HadoopCustomInputFormat.jar.
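If you would rather not repoint unmanagedBase away from sbt's default lib directory, an alternative (a sketch using sbt 0.13's path-finder API) is to add only that directory's jars to the unmanaged classpath:

// alternative to overriding unmanagedBase: add every jar found under
// custom_spark_lib to the unmanaged compile classpath
unmanagedJars in Compile ++= ((baseDirectory.value / "custom_spark_lib") ** "*.jar").classpath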
Content of project/assembly.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")
Under the myproject directory, execute:

sbt clean assembly

This finally builds the fat jar: target/scala-2.11/my-project-assembly-1.0.jar.
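Before submitting, it can be worth checking that the fat jar really contains the main class and a Main-Class manifest entry, since a missing entry is exactly what produces the ClassNotFoundException above. A minimal sketch of such a check in the Scala REPL (the jar path is the one produced by the build):

import java.util.jar.JarFile

// open the assembled fat jar
val jar = new JarFile("target/scala-2.11/my-project-assembly-1.0.jar")

// the manifest's Main-Class should be com.lasclocker.java.SparkGopProcess
println(jar.getManifest.getMainAttributes.getValue("Main-Class"))

// the compiled main class itself should be present in the jar
println(jar.getJarEntry("com/lasclocker/java/SparkGopProcess.class") != null)

jar.close()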
Finally, I enclose my spark-submit cluster-mode shell script (the IP addresses in the script are masked as xx):
inPath=/lpr
outPath=/output
minPartitionNum=4
sparkURL=spark://xx.xx.xx.xx:7077
hdfsFile=hdfs://xx.xx.xx.xx:9000/user/root
ldLib=/opt/hadoop/lib   # here are some dynamic libraries, such as the .so files used by JNI

spark-submit --class ${yourAppClass} \
  --master ${sparkURL} \
  --driver-library-path ${ldLib} \
  --deploy-mode cluster \
  ${hdfsFile}/my-project-assembly-1.0.jar ${inPath} ${outPath} ${minPartitionNum}
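For context on where the three trailing arguments end up: the jar's main class receives them as args(0) through args(2). The real program here is a Java class whose logic isn't shown, but a hypothetical, minimal Scala equivalent against the Spark 1.4 API could look like the following (SparkGopProcessSketch and its body are illustrative assumptions, not the actual code):

package com.lasclocker.java

import org.apache.spark.{SparkConf, SparkContext}

object SparkGopProcessSketch {
  def main(args: Array[String]): Unit = {
    // $inPath, $outPath, $minPartitionNum from the submit script, in order
    val Array(inPath, outPath, minPartitionNum) = args

    // in cluster mode the master URL comes from --master on the command line
    val sc = new SparkContext(new SparkConf().setAppName("my-project"))

    // hypothetical processing: read the input with the requested number of
    // partitions and write the result back out
    val lines = sc.textFile(inPath, minPartitionNum.toInt)
    lines.saveAsTextFile(outPath)

    sc.stop()
  }
}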
Reference: sbt-assembly; "How to build an Uber JAR (Fat JAR) using SBT within IntelliJ IDEA?"