Spark Configuration (5): Standalone Application


Standalone application (self-contained application)

Now, building on the earlier simple example, we will write a standalone application using the Spark API.

Programs written in Scala are compiled and packaged with sbt, Java programs are packaged with Maven, and Python programs are submitted directly via spark-submit.

PS: It seems that Spark 2.0 supports Datasets in addition to RDDs, and that with the new API the performance of Python processing improves greatly, becoming almost comparable to Scala's.
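As an aside, here is a minimal sketch of the same line counting written against the Spark 2.0+ Dataset API (the object name SimpleAppDS is hypothetical; the build below still targets Spark 1.6, which has no SparkSession):

  /* Sketch only: requires Spark 2.0+, where SparkSession and Dataset exist. */
  import org.apache.spark.sql.SparkSession

  object SimpleAppDS {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().appName("Simple Application (Dataset)").getOrCreate()
      // read.textFile returns a Dataset[String] rather than an RDD[String]
      val logData = spark.read.textFile("file:///usr/local/spark/README.md").cache()
      val numAs = logData.filter(line => line.contains("a")).count()
      val numBs = logData.filter(line => line.contains("b")).count()
      println(s"Lines with a: $numAs, Lines with b: $numBs")
      spark.stop()
    }
  }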

 
   
  
  cd ~                                   # enter the user's home directory
  mkdir ./sparkapp                       # create the application root directory
  mkdir -p ./sparkapp/src/main/scala     # create the required folder structure

Create a file named SimpleApp.scala under ./sparkapp/src/main/scala:

  
 
  /* SimpleApp.scala */
  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._
  import org.apache.spark.SparkConf

  object SimpleApp {
    def main(args: Array[String]) {
      val logFile = "file:///usr/local/spark/README.md"  // should be some file on your system
      val conf = new SparkConf().setAppName("Simple Application")
      val sc = new SparkContext(conf)
      val logData = sc.textFile(logFile, 2).cache()
      val numAs = logData.filter(line => line.contains("a")).count()
      val numBs = logData.filter(line => line.contains("b")).count()
      println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
      sc.stop()  // release resources when the job is done
    }
  }

The program counts how many lines of /usr/local/spark/README.md contain "a" and how many lines contain "b".

Because the program depends on the Spark API, we need to compile and package it with sbt.

 
   
  
  vim ./sparkapp/simple.sbt

Add the following:

  
 
  name := "Simple Project"
  version := "1.0"
  scalaVersion := "2.10.5"
  libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"

The simple.sbt file must declare the versions of Spark and Scala; the %% operator automatically appends the Scala binary version (here 2.10) to the artifact name.

When you start the Spark shell, the startup banner shows both the Spark version and the Scala version; those are the values to use here.
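If the banner has scrolled away, both versions can also be queried from inside the shell:

  sc.version                       // Spark version of the running shell, e.g. "1.6.1"
  util.Properties.versionString    // Scala version the shell was built with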


Installing SBT

 
   
  
  sudo mkdir /usr/local/sbt
  sudo chown -R hadoop /usr/local/sbt    # give ownership to your user (here: hadoop)
  cd /usr/local/sbt

  
 
  cp /home/yuan/Downloads/sbt-launch\ \(1\).jar /usr/local/sbt/sbt-launch.jar

Next, create a launcher script named sbt in /usr/local/sbt; a minimal sketch follows.
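The wrapper simply forwards its arguments to sbt-launch.jar; the JVM memory settings shown are common defaults, not required values:

  #!/bin/bash
  # Run the sbt launcher jar that sits next to this script.
  # The -Xms/-Xmx/-Xss values are assumed defaults; tune them for your machine.
  SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
  java $SBT_OPTS -jar `dirname $0`/sbt-launch.jar "$@"

Then make the script executable:

  chmod u+x ./sbt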


 
   
  
Verify the installation by printing the sbt version (the first run downloads sbt's own dependencies and can take a while):

  ./sbt sbt-version
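With sbt working, the remaining steps are to package the application and submit it to Spark. A sketch, assuming the directory layout above (sbt derives the jar name simple-project_2.10-1.0.jar from the name, Scala version, and version declared in simple.sbt):

  cd ~/sparkapp
  /usr/local/sbt/sbt package      # builds ./target/scala-2.10/simple-project_2.10-1.0.jar
  /usr/local/spark/bin/spark-submit \
    --class "SimpleApp" \
    --master local[2] \
    ./target/scala-2.10/simple-project_2.10-1.0.jar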





