Developing a Spark Program in an IDE (Eclipse with the Scala IDE)
Download Scala
Scala.msi
Scala environment variable configuration
(1) Set the SCALA_HOME variable: under System variables, click New; in the Variable name field enter SCALA_HOME, and in the Variable value field enter D:\Program Files\scala (the Scala installation directory; adjust it to your own setup, e.g. if Scala is installed on the E: drive, change "D" to "E").
(2) Set the PATH variable: find "Path" under System variables and click Edit. Append the following to the Variable value field: %SCALA_HOME%\bin;%SCALA_HOME%\jre\bin; Note: do not leave out the trailing semicolon.
(3) Set the CLASSPATH variable: find "CLASSPATH" under System variables and click Edit; if it does not exist, click New.
Variable name: CLASSPATH. Variable value:
.;%SCALA_HOME%\bin;%SCALA_HOME%\lib\dt.jar;%SCALA_HOME%\lib\tools.jar; Note: do not leave out the ".;" at the start of the Variable value. Finally, click OK.
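After the variables are set, open a new Command Prompt and run scala (or scala -version) to confirm that the PATH change took effect; the REPL should start. As a suggested sanity check, not part of the original setup, you can run inside the REPL:

  println(util.Properties.versionString)   // prints the installed Scala version, e.g. "version 2.11.x"
  println(util.Properties.javaVersion)     // prints the JVM version picked up from the PATH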
Download the Scala IDE: scala-SDK-4.4.1-vfinal-2.11-win32.win32.x86_64.zip
After downloading, unzip the archive and run eclipse.exe.
First step: Change the project's Scala library version to Scala 2.10.x (the default is 2.11.7 and must be changed to match the Scala version that the prebuilt Spark 1.6 binaries use).
Second step: Add the Spark 1.6.x jar file dependency.
Download the corresponding Spark package: on the Apache Spark download page, click the download link in step 4 to get spark-1.6.1-bin-hadoop2.6.tgz.
After downloading Spark, the dependency jar can be found in its lib directory.
Third step: Locate the Spark dependency jar and import it into Eclipse as a jar dependency.
Fourth step: Create the Spark project package under src (the listings below use the package com.test).
Fifth step: Create the Scala entry class.
Sixth step: Change class to object and write the main entry method.
A program can be developed and run in two modes: local run and cluster run; the only difference is how the master is set, as in the sketch below.
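A minimal sketch of the program skeleton for both modes (the package and object names follow the full listings below; only the master setting differs):

package com.test

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Wow,my first spark app")
    conf.setMaster("local") // local run: the master is hard-coded in the program
    // For a cluster run, remove or comment out setMaster and pass
    // --master spark://master:7077 to spark-submit when the packaged jar is submitted.
    val sc = new SparkContext(conf)
    // ... transformations and actions go here ...
    sc.stop()
  }
}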
Modify Font
package com.test

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object WordCount {
  def main(args: Array[String]) {
    /**
     * First step: Create a Spark configuration object, SparkConf, and set the runtime configuration
     * information for the Spark program. For example, setMaster sets the URL of the Master of the
     * Spark cluster the program connects to; if it is set to "local", the Spark program runs locally,
     * which is especially suitable for beginners with very limited machines (e.g., only 1 GB of memory).
     */
    val conf = new SparkConf() // Create the SparkConf object; no factory method is needed because there is only one SparkConf globally
    conf.setAppName("Wow,my first spark app") // Set the name of the application, which can be seen in the program's monitoring interface
    conf.setMaster("local") // This time the program runs locally, without needing to install a Spark cluster

    /**
     * Step two: Create a SparkContext object.
     * SparkContext is the only entry point for all the functionality of a Spark program; whether the
     * program is written in Scala, Java, Python or R, it must have a SparkContext.
     * SparkContext's core role is to initialize the core components required for the Spark application
     * to run, including the DAGScheduler, TaskScheduler and SchedulerBackend; it is also responsible for
     * registering the Spark program with the Master. SparkContext is one of the most critical objects in
     * the entire Spark application.
     */
    val sc = new SparkContext(conf) // Create the SparkContext object, customizing the specific runtime configuration of this Spark run by passing in the SparkConf instance

    /**
     * Step three: Create an RDD through the SparkContext, based on a specific data source
     * (HDFS, HBase, local file system, DB, S3, ...).
     * There are basically three ways to create an RDD: (1) from an external data source (such as HDFS);
     * (2) from a Scala collection; (3) from another RDD. The data is divided into a series of partitions,
     * and the data assigned to each partition belongs to the processing scope of one task.
     */
    // Read the local file and put it into a single partition
    val lines = sc.textFile("D://spark-1.6.1-bin-hadoop2.6//README.md", 1) // The first parameter is the local file path; the second, minPartitions, is the minimum degree of parallelism, here set to 1
    // The type is inferred; it could also be written as:
    // val lines: RDD[String] = sc.textFile("D://spark-1.6.1-bin-hadoop2.6//README.md", 1)

    /**
     * Step four: Apply transformation-level processing to the initial RDD, using higher-order functions
     * such as map and filter, to do the actual data computation.
     * Step 4.1: Split the string of each line into individual words.
     */
    // Split the string of each row into words and merge the results of all rows into one large collection via flatMap
    val words = lines.flatMap { line => line.split(" ") }

    /**
     * Step 4.2: On the basis of the word split, count each word instance as 1, i.e. produce word => (word, 1) tuples.
     */
    val pairs = words.map { word => (word, 1) }

    /**
     * Step 4.3: Count the total number of occurrences of each word in the text, based on each word
     * instance counting as 1.
     */
    // Add up the values for the same key (this includes both the local, map-side reduce and the reducer-level reduce)
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the results
    wordCounts.foreach(wordNumberPair => println(wordNumberPair._1 + ":" + wordNumberPair._2))

    // Release resources
    sc.stop()
  }
}
Run As -> Scala Application
Run results
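The listing above creates its RDD from an external file, which is way (1) of the three RDD-creation methods mentioned in its comments. As a quick sanity check that needs no input file, the same pipeline can also be driven from an in-memory Scala collection via sc.parallelize (way (2)); a minimal sketch with made-up sample lines, to be run inside main after sc is created:

// Quick test of the same word-count pipeline on an in-memory collection
val sampleLines = Seq("to be or not to be", "that is the question") // made-up sample data
val testCounts = sc.parallelize(sampleLines, 1)
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
testCounts.collect().foreach { case (word, count) => println(word + ":" + count) }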
package com.test

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object WordCountCluster {
  def main(args: Array[String]) {
    /**
     * First step: Create a Spark configuration object, SparkConf, and set the runtime configuration
     * information for the Spark program. For example, setMaster sets the URL of the Master of the
     * Spark cluster the program connects to; if it is set to "local", the Spark program runs locally,
     * which is especially suitable for beginners with very limited machines (e.g., only 1 GB of memory).
     */
    val conf = new SparkConf() // Create the SparkConf object; no factory method is needed because there is only one SparkConf globally
    conf.setAppName("Wow,my first spark app") // Set the name of the application, which can be seen in the program's monitoring interface
    // conf.setMaster("spark://master:7077") // This time the program runs on the Spark cluster

    /**
     * Step two: Create a SparkContext object.
     * SparkContext is the only entry point for all the functionality of a Spark program; whether the
     * program is written in Scala, Java, Python or R, it must have a SparkContext.
     * SparkContext's core role is to initialize the core components required for the Spark application
     * to run, including the DAGScheduler, TaskScheduler and SchedulerBackend; it is also responsible for
     * registering the Spark program with the Master. SparkContext is one of the most critical objects in
     * the entire Spark application.
     */
    val sc = new SparkContext(conf) // Create the SparkContext object, customizing the specific runtime configuration of this Spark run by passing in the SparkConf instance

    /**
     * Step three: Create an RDD through the SparkContext, based on a specific data source
     * (HDFS, HBase, local file system, DB, S3, ...).
     * There are basically three ways to create an RDD: (1) from an external data source (such as HDFS);
     * (2) from a Scala collection; (3) from another RDD. The data is divided into a series of partitions,
     * and the data assigned to each partition belongs to the processing scope of one task.
     */
    // Read the HDFS file and split it into different partitions
    val lines = sc.textFile("hdfs://master:9000/library/wordcount/input/data")
    // The type is inferred; it could also be written as:
    // val lines: RDD[String] = sc.textFile("hdfs://master:9000/library/wordcount/input/data")

    /**
     * Step four: Apply transformation-level processing to the initial RDD, using higher-order functions
     * such as map and filter, to do the actual data computation.
     * Step 4.1: Split the string of each line into individual words.
     */
    // Split the string of each row into words and merge the results of all rows into one large collection via flatMap
    val words = lines.flatMap { line => line.split(" ") }

    /**
     * Step 4.2: On the basis of the word split, count each word instance as 1, i.e. produce word => (word, 1) tuples.
     */
    val pairs = words.map { word => (word, 1) }

    /**
     * Step 4.3: Count the total number of occurrences of each word in the text, based on each word
     * instance counting as 1.
     */
    // Add up the values for the same key (this includes both the local, map-side reduce and the reducer-level reduce)
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the results
    wordCounts.foreach(wordNumberPair => println(wordNumberPair._1 + ":" + wordNumberPair._2))

    // Release resources
    sc.stop()
  }
}
Packaging
Right-click the project and choose Export > Java > JAR File.
chmod 777 wordcount.sh
./wordcount.sh
cd /opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/spark/bin
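The contents of wordcount.sh are not shown above. A minimal sketch of what such a script could contain, assuming the exported jar is named wordcount.jar (its path below is a placeholder), the main class is com.test.WordCountCluster from the cluster listing, and spark-submit is taken from the CDH directory above:

#!/bin/bash
# Sketch of wordcount.sh: submit the exported jar to the cluster with spark-submit.
# The jar name and path are assumptions; adjust them to wherever the jar was exported.
/opt/cloudera/parcels/CDH-5.6.0-1.cdh5.6.0.p0.45/lib/spark/bin/spark-submit \
  --class com.test.WordCountCluster \
  --master spark://master:7077 \
  /path/to/wordcount.jar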