IDE Development Spark Program


You can develop with IntelliJ IDEA or Eclipse; this guide uses the Scala IDE for Eclipse.

Download Scala and install it on Windows with the scala.msi installer.
Scala environment variable configuration
(1) Set the SCALA_HOME variable: click New, enter SCALA_HOME in the "Variable name" field and the Scala installation directory, e.g. D:\Program Files\scala, in the "Variable value" field. Adjust the drive letter to your own installation; if Scala is installed on the E: drive, change "D" to "E".
(2) Set the PATH variable: find "Path" under the system variables and click Edit. Append the following to the "Variable value" field: %SCALA_HOME%\bin;%SCALA_HOME%\jre\bin; (do not omit the trailing semicolon).
(3) Set the CLASSPATH variable: find "CLASSPATH" under the system variables and click Edit; if it does not exist, click New and enter "Variable name": CLASSPATH, "Variable value": .;%SCALA_HOME%\bin;%SCALA_HOME%\lib\dt.jar;%SCALA_HOME%\lib\tools.jar; (do not omit the leading ".;"). Finally, click OK.
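
To verify the setup, you can open a new command prompt, start the Scala REPL by typing scala, and evaluate a line such as the following (a minimal sketch; scala.util.Properties reports the version of the Scala library that is actually on the classpath):

// Run inside the Scala REPL (started with the `scala` command) to confirm
// that the installation and PATH configuration work.
println("Scala version: " + scala.util.Properties.versionNumberString)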

Download the Scala IDE: scala-SDK-4.4.1-vfinal-2.11-win32.win32.x86_64.zip


After downloading, unzip the archive and run eclipse.exe.

Step 1: Change the project's Scala version to 2.10.x (the default is 2.11.7 and must be changed, since the prebuilt Spark 1.6.x binaries are compiled against Scala 2.10). This is typically done under the project's Properties > Scala Compiler settings.


Step 2: Add the Spark 1.6.x jar file dependency.
Download the corresponding Spark package from the Spark downloads page (item 4, "Download Spark"): spark-1.6.1-bin-hadoop2.6.tgz.



After downloading and extracting Spark, the required dependency jar is in its lib directory (for Spark 1.x this is the spark-assembly-*.jar).

Step 3: Locate the Spark dependency jar and import it into the Eclipse project's build path.
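
This article adds the assembly jar to the build path by hand; purely for reference, the same dependency expressed with a build tool would look roughly like the sbt sketch below (the project name and Scala patch version are assumptions, not part of the original setup):

// build.sbt -- a minimal sketch, not part of the original Eclipse workflow
name := "spark-wordcount"

version := "1.0"

scalaVersion := "2.10.6" // Spark 1.6.x prebuilt binaries target Scala 2.10

// Pulls in spark-core instead of importing the assembly jar manually
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1"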


Step 4: Create a package for the Spark project under src.

Step 5: Create the Scala entry class.

Step 6: Change class to object and write the main entry method.
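
At this point the entry point looks roughly like the following sketch (the package and object names match the full listings further below):

package com.test

// The entry point is a Scala object (not a class) with a main method,
// so that Eclipse can launch it directly via Run As > Scala Application.
object WordCount {
  def main(args: Array[String]): Unit = {
    // The Spark logic from the listings below goes here.
  }
}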

There are two ways to run the program during development: a local run and a cluster run.
(Optional) Adjust the editor font in the Eclipse preferences.

package com.test

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object WordCount {
  def main(args: Array[String]): Unit = {
    /**
     * Step 1: Create a SparkConf object to hold the runtime configuration of the Spark program.
     * For example, setMaster sets the URL of the master of the Spark cluster the program connects to;
     * if it is set to "local", the program runs locally, which is especially suitable for beginners
     * with very limited machines (e.g. only 1 GB of memory).
     */
    val conf = new SparkConf() // Create the SparkConf object; no factory method is needed because there is only one SparkConf globally
    conf.setAppName("Wow,my first spark app") // Set the application name, visible in the program's monitoring UI
    conf.setMaster("local") // Run locally this time, so no Spark cluster installation is required

    /**
     * Step 2: Create a SparkContext object.
     * SparkContext is the sole entry point for all Spark functionality; every program, whether written
     * in Scala, Java, Python or R, must have one.
     * Core role of SparkContext: initialize the core components the Spark application needs to run,
     * including DAGScheduler, TaskScheduler and SchedulerBackend, and register the program with the master.
     * SparkContext is one of the most critical objects in the entire Spark application.
     */
    val sc = new SparkContext(conf) // Create the SparkContext; passing in the SparkConf instance customizes the specific runtime parameters of the Spark job

    /**
     * Step 3: Use the SparkContext to create an RDD from a concrete data source (HDFS, HBase, local file system, DB, S3, ...).
     * There are basically three ways to create an RDD: (1) from an external data source (such as HDFS),
     * (2) from a Scala collection, (3) from another RDD. The data is divided into a series of partitions,
     * and the data assigned to each partition is handled by one task.
     */
    // Read a local file and put it into a single partition
    val lines = sc.textFile("D://spark-1.6.1-bin-hadoop2.6//README.md", 1) // The first parameter is the local file path; the second, minPartitions, is the minimum parallelism, set to 1 here
    // Type inference is used above; with an explicit type it could also be written as:
    // val lines: RDD[String] = sc.textFile("D://spark-1.6.1-bin-hadoop2.6//README.md", 1)

    /**
     * Step 4: Apply transformations to the initial RDD, using higher-order functions
     * such as map and filter, to perform the actual computation.
     * Step 4.1: Split each line into individual words.
     */
    // Split each line into words and merge the results of all lines into one large collection with flatMap
    val words = lines.flatMap { line => line.split(" ") }

    /**
     * Step 4.2: On the basis of the word split, count each word instance as 1, i.e. produce (word, 1) tuples.
     */
    val pairs = words.map { word => (word, 1) }

    /**
     * Step 4.3: Based on each word instance counting as 1, compute the total number of occurrences of each word in the text.
     */
    // Add up the values for the same key (this includes both the map-side (local) reduce and the shuffle-level reduce)
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the results
    wordCounts.foreach(wordNumberPair => println(wordNumberPair._1 + ":" + wordNumberPair._2))

    // Release resources
    sc.stop()
  }
}

Run As -> Scala Application
Run results

package com.test

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object WordCountCluster {
  def main(args: Array[String]): Unit = {
    /**
     * Step 1: Create a SparkConf object to hold the runtime configuration of the Spark program.
     * For example, setMaster sets the URL of the master of the Spark cluster the program connects to;
     * if it is set to "local", the program runs locally, which is especially suitable for beginners
     * with very limited machines (e.g. only 1 GB of memory).
     */
    val conf = new SparkConf() // Create the SparkConf object; no factory method is needed because there is only one SparkConf globally
    conf.setAppName("Wow,my first spark app") // Set the application name, visible in the program's monitoring UI
    // conf.setMaster("spark://master:7077") // This time the program runs on the Spark cluster

    /**
     * Step 2: Create a SparkContext object.
     * SparkContext is the sole entry point for all Spark functionality; every program, whether written
     * in Scala, Java, Python or R, must have one.
     * Core role of SparkContext: initialize the core components the Spark application needs to run,
     * including DAGScheduler, TaskScheduler and SchedulerBackend, and register the program with the master.
     * SparkContext is one of the most critical objects in the entire Spark application.
     */
    val sc = new SparkContext(conf) // Create the SparkContext; passing in the SparkConf instance customizes the specific runtime parameters of the Spark job

    /**
     * Step 3: Use the SparkContext to create an RDD from a concrete data source (HDFS, HBase, local file system, DB, S3, ...).
     * There are basically three ways to create an RDD: (1) from an external data source (such as HDFS),
     * (2) from a Scala collection, (3) from another RDD. The data is divided into a series of partitions,
     * and the data assigned to each partition is handled by one task.
     */
    // Read the HDFS file and split it into different partitions
    val lines = sc.textFile("hdfs://master:9000/library/wordcount/input/data")
    // Type inference is used above; with an explicit type it could also be written as:
    // val lines: RDD[String] = sc.textFile("hdfs://master:9000/library/wordcount/input/data")

    /**
     * Step 4: Apply transformations to the initial RDD, using higher-order functions
     * such as map and filter, to perform the actual computation.
     * Step 4.1: Split each line into individual words.
     */
    // Split each line into words and merge the results of all lines into one large collection with flatMap
    val words = lines.flatMap { line => line.split(" ") }

    /**
     * Step 4.2: On the basis of the word split, count each word instance as 1, i.e. produce (word, 1) tuples.
     */
    val pairs = words.map { word => (word, 1) }

    /**
     * Step 4.3: Based on each word instance counting as 1, compute the total number of occurrences of each word in the text.
     */
    // Add up the values for the same key (this includes both the map-side (local) reduce and the shuffle-level reduce)
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the results
    wordCounts.foreach(wordNumberPair => println(wordNumberPair._1 + ":" + wordNumberPair._2))

    // Release resources
    sc.stop()
  }
}
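
Note that in a cluster run the foreach(println) above executes on the executors, so the printed counts end up in the executors' stdout logs rather than in the driver console. A common alternative, sketched below against the wordCounts RDD from the listing above, is to write the result to HDFS or to collect a small result back to the driver (the output path is an assumed example, not taken from the article):

// Instead of printing on the executors, write the result back to HDFS.
// The output directory must not already exist, or saveAsTextFile will fail.
// The path below is assumed for illustration only.
wordCounts.saveAsTextFile("hdfs://master:9000/library/wordcount/output")

// Alternatively, for small results, collect() brings the data to the driver,
// so the counts appear in the driver's console / spark-submit output.
wordCounts.collect().foreach { case (word, count) => println(word + ":" + count) }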

Packaging
Right-click the project, then choose Export > Java > JAR file, and copy the exported jar to the cluster.

Make the submit script executable and run it (wordcount.sh is presumably a wrapper around a spark-submit command that references the exported jar):
chmod 777 wordcount.sh
./wordcount.sh

On a CDH cluster, spark-submit is found in the Spark bin directory of the parcel:
cd /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/spark/bin
