The main content of this lecture: environment installation and configuration, local mode, cluster mode, automation scripts, and web status monitoring.
========== Stand-alone (local mode) ============
Development tool: Eclipse with the Scala IDE.
Download the latest version of the Scala IDE for Eclipse.
1. Create the project and change the Scala compiler version.
2. Add the Spark 1.6.0 jar file dependency.
Download http://apache.opencas.org/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
spark-assembly-1.6.0-hadoop2.6.0.jar
3. Locate the Spark assembly jar in the download and add it to the project's jar dependencies in Eclipse.
4. Create the package com.dt.spark under src.
5. Create a Scala entry class
650) this.width=650; "src="/e/u261/themes/default/images/spacer.gif "style=" Background:url ("/e/u261/lang/zh-cn/ Images/localimage.png ") no-repeat center;border:1px solid #ddd;" alt= "Spacer.gif"/>
6. Change the class to an object and write the main entry method.
If a class cannot be found, use Ctrl+Shift+O to organize the imports.
Example code:
package com.dt.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
 * A Spark WordCount program developed in Scala for local testing.
 * @author DT Big Data Dream Factory
 * Sina Weibo: http://weibo.com/ilovepains
 */
object WordCount {
  def main(args: Array[String]): Unit = {
    /**
     * 1. Create the Spark configuration object SparkConf and set the runtime configuration of the program.
     *    For example, setMaster specifies the URL of the Spark cluster the program connects to.
     *    Setting it to "local" runs the Spark program locally, which is especially useful for
     *    beginners whose machines have very limited resources (e.g. only 1 GB of memory).
     */
    val conf = new SparkConf() // create the SparkConf object
    conf.setAppName("My first Spark app!") // set the application name, visible in the monitoring UI while the program runs
    conf.setMaster("local") // run locally; no Spark cluster installation is required

    /**
     * 2. Create the SparkContext object.
     *    SparkContext is the single entry point to all Spark functionality, whether the program is
     *    written in Scala, Java, Python, R, etc.
     *    Its core role is to initialize the components the Spark application needs at runtime,
     *    including DAGScheduler, TaskScheduler and SchedulerBackend, and it also registers the
     *    program with the Master.
     *    SparkContext is one of the most important objects in the entire Spark application.
     */
    val sc = new SparkContext(conf) // pass in the SparkConf instance to customize the parameters and configuration of this run

    /**
     * 3. Use SparkContext to create an RDD from a concrete data source (HDFS, HBase, the local
     *    file system, a database, S3, etc.).
     *    There are essentially three ways to create an RDD: from an external data source (e.g. HDFS),
     *    from a Scala collection, or by transforming another RDD.
     *    The RDD splits the data into a series of partitions; the data assigned to each partition
     *    is processed by one task.
     *
     *    sc.textFile(path, minPartitions) returns RDD[String]; type inference tells us that
     *    lines is an RDD of String.
     */
    val lines = sc.textFile("F:/installation file/operating system/spark-1.6.0-bin-hadoop2.6/README.md", 1)

    /**
     * 4. Apply transformation-level processing to the initial RDD, i.e. higher-order functions such
     *    as map and filter, to perform the actual computation.
     * 4.1 Split the string of each line into individual words.
     */
    val words = lines.flatMap { line => line.split(" ") } // split each line and flatten the results of all lines into one big collection of words

    /**
     * 4.2 On the basis of the word split, count each word instance as 1, i.e. word => (word, 1).
     */
    val pairs = words.map { word => (word, 1) } // produces the tuple (word, 1)

    /**
     * 4.3 On the basis of counting each word instance as 1, count the total number of occurrences
     *     of each word in the file.
     */
    val wordCounts = pairs.reduceByKey(_ + _) // for the same key, accumulate the values (reduces both locally and at the reducer level)
    wordCounts.foreach(wordNumberPair => println(wordNumberPair._1 + " : " + wordNumberPair._2))

    /**
     * 5. Release the related resources.
     */
    sc.stop()
  }
}
My output does not yet match the teacher's; I will leave that for now.
Also, an error is reported at startup because Spark looks for Hadoop and cannot find it. This is expected: it is not a program error and does not affect any of our functionality.
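The comments in step 3 above mention that an RDD can also be created from a Scala collection, not only from an external file. Below is a minimal sketch of that route in local mode, using sc.parallelize; the object name WordCountFromCollection and the sample lines are my own illustrative choices, not part of the lecture.

package com.dt.spark

import org.apache.spark.{SparkConf, SparkContext}

object WordCountFromCollection {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount from a Scala collection").setMaster("local")
    val sc = new SparkContext(conf)
    // second RDD-creation route: parallelize a local Scala collection instead of reading a file
    val lines = sc.parallelize(Seq("Spark is fast", "Spark is easy to use"), 1)
    val wordCounts = lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.foreach(pair => println(pair._1 + " : " + pair._2))
    sc.stop()
  }
}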
========== Cluster ============
package com.dt.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
 * A Spark WordCount program developed in Scala for cluster testing.
 * @author DT Big Data Dream Factory
 * Sina Weibo: http://weibo.com/ilovepains
 */
object WordCount_Cluster {
  def main(args: Array[String]): Unit = {
    /**
     * 1. Create the Spark configuration object SparkConf and set the runtime configuration of the program.
     *    For example, setMaster specifies the URL of the Spark cluster the program connects to.
     *    (Setting it to "local" instead would run the program locally, which is useful for
     *    beginners whose machines have very limited resources, e.g. only 1 GB of memory.)
     */
    val conf = new SparkConf() // create the SparkConf object
    conf.setAppName("My first Spark app!") // set the application name, visible in the monitoring UI while the program runs
    conf.setMaster("spark://master:7077") // the program must run against the Master at spark://master:7077; alternatively, the master can be supplied manually at submit time

    /**
     * 2. Create the SparkContext object.
     *    SparkContext is the single entry point to all Spark functionality, whether the program is
     *    written in Scala, Java, Python, R, etc.
     *    Its core role is to initialize the components the Spark application needs at runtime,
     *    including DAGScheduler, TaskScheduler and SchedulerBackend, and it also registers the
     *    program with the Master.
     *    SparkContext is one of the most important objects in the entire Spark application.
     */
    val sc = new SparkContext(conf) // pass in the SparkConf instance to customize the parameters and configuration of this run

    /**
     * 3. Use SparkContext to create an RDD from a concrete data source (HDFS, HBase, the local
     *    file system, a database, S3, etc.).
     *    There are essentially three ways to create an RDD: from an external data source (e.g. HDFS),
     *    from a Scala collection, or by transforming another RDD.
     *    The RDD splits the data into a series of partitions; the data assigned to each partition
     *    is processed by one task.
     *
     *    sc.textFile(path, minPartitions) returns RDD[String]; type inference tells us that
     *    lines is an RDD of String.
     */
    val lines = sc.textFile("/library/wordcount/input/data", 1) // read the HDFS file and cut it into partitions
    // val lines = sc.textFile("/historyserverforspark/README.md", 1)

    /**
     * 4. Apply transformation-level processing to the initial RDD, i.e. higher-order functions such
     *    as map and filter, to perform the actual computation.
     * 4.1 Split the string of each line into individual words.
     */
    val words = lines.flatMap { line => line.split(" ") } // split each line and flatten the results of all lines into one big collection of words

    /**
     * 4.2 On the basis of the word split, count each word instance as 1, i.e. word => (word, 1).
     */
    val pairs = words.map { word => (word, 1) } // produces the tuple (word, 1)

    /**
     * 4.3 On the basis of counting each word instance as 1, count the total number of occurrences
     *     of each word in the file.
     */
    val wordCounts = pairs.reduceByKey(_ + _) // for the same key, accumulate the values (reduces both locally and at the reducer level)
    // on a cluster, collect the results to the driver before printing
    wordCounts.collect.foreach(wordNumberPair => println(wordNumberPair._1 + " : " + wordNumberPair._2))

    /**
     * 5. Release the related resources.
     */
    sc.stop()
  }
}
Package the program as a jar: WordCount.jar
Http://spark.apache.org/docs/latest/submitting-applications.html
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  <application-jar> \
  [application-arguments]

./spark-submit --class com.dt.spark.WordCount_Cluster --master spark://master:7077 /root/documents/sparkapps/WordCount.jar
hadoop fs -put /usr/local/spark-1.6.0-bin-hadoop2.6/README.md hdfs://192.168.145.131:9000/historyserverforspark/
The file uploads successfully, but I still cannot use it??
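The setMaster comment in the cluster program notes that the master URL can instead be supplied at submit time. Here is a minimal sketch of that variant (the object name WordCount_Submit is my own, hypothetical): leave setMaster out of the code, and let the --master flag on the spark-submit command above decide where the job runs.

package com.dt.spark

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical variant: no setMaster call, so --master on spark-submit
// (e.g. --master spark://master:7077) controls where the job runs.
object WordCount_Submit {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("My first Spark app!")
    val sc = new SparkContext(conf)
    val lines = sc.textFile("/library/wordcount/input/data", 1)
    val wordCounts = lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    wordCounts.collect.foreach(pair => println(pair._1 + " : " + pair._2))
    sc.stop()
  }
}

It would be packaged and submitted the same way as above, e.g. ./spark-submit --class com.dt.spark.WordCount_Submit --master spark://master:7077 /root/documents/sparkapps/WordCount.jar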
Homework: write an ad-click ranking program in Eclipse and test it.
Teacher Liaoliang's contact card:
Known as "China's number one Spark expert"
Sina Weibo: http://weibo.com/ilovepains
WeChat public account: DT_Spark
Blog: http://blog.sina.com.cn/ilovepains
Mobile: 18610086859
QQ: 1740415547
Email: [email protected]