Standalone application (self-contained applications)
Next, we write a simple standalone application, SimpleApp, using the Spark API.
Programs written in Scala are compiled and packaged with sbt, Java programs with Maven, and Python programs can be submitted directly via spark-submit.
PS: Spark 2.0 apparently supports Datasets in addition to RDDs, and with them the performance of Python processing improves greatly, becoming almost comparable to Scala.
cd ~                                  # enter the user's home directory
mkdir ./sparkapp                      # create the application root directory
mkdir -p ./sparkapp/src/main/scala    # create the required directory structure
Create a file named SimpleApp.scala under ./sparkapp/src/main/scala:
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "file:///usr/local/spark/README.md" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
The program counts the number of lines in /usr/local/spark/README.md that contain "a" and the number of lines that contain "b".
The program depends on the Spark API, so we need to compile and package it with sbt.
vim ./sparkapp/simple.sbt
Add the following content:
name := "Simple Project"
version := "1.0"
scalaVersion := "2.10.5"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"
The simple.sbt file needs to declare the versions of Spark and Scala; Spark 1.6.1 is built against Scala 2.10, which is why scalaVersion is set to 2.10.5 here.
Both versions are printed in the startup messages when you launch the Spark shell.
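If you prefer to check from inside the shell itself, the following two expressions (a small illustrative snippet; the values printed depend on your installation) report the Spark and Scala versions respectively:

sc.version                       // Spark version
util.Properties.versionString    // Scala version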
Installing SBT
sudo mkdir /usr/local/sbt
sudo chown -R hadoop /usr/local/sbt
cd /usr/local/sbt
cp /home/yuan/Downloads/sbt-launch\ \(1\).jar /usr/local/sbt/sbt-launch.jar
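The chmod and ./sbt commands below assume that a small launcher script named sbt exists in /usr/local/sbt next to sbt-launch.jar. A typical sketch of such a script (the JVM memory options are illustrative defaults, not taken from the original notes) is:

#!/bin/bash
# launcher script: run the sbt-launch.jar that sits in the same directory
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java $SBT_OPTS -jar `dirname $0`/sbt-launch.jar "$@"

Save it as /usr/local/sbt/sbt before running the next two commands.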
chmod u+x ./sbt
./sbt sbt-version
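With sbt installed, the application can be compiled, packaged, and submitted to Spark. A sketch of the usual steps follows (the jar path is what sbt produces by default for name "Simple Project", version 1.0 and Scala 2.10; adjust it if your output differs):

cd ~/sparkapp
/usr/local/sbt/sbt package
/usr/local/spark/bin/spark-submit --class "SimpleApp" ~/sparkapp/target/scala-2.10/simple-project_2.10-1.0.jar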