Developing Spark Applications in Scala under Eclipse (DT Big Data Dream Factory)


The main content of this lecture: environment installation and configuration, local mode, cluster mode, automation scripts, and web status monitoring.

========== Local (stand-alone) mode ============

Development tool

Download the latest version of Scala IDE for Eclipse.

1. Create the project and change the project's Scala compiler version (the Spark 1.6.0 prebuilt binaries are built against Scala 2.10, so select a 2.10.x compiler).


2. Add the Spark 1.6.0 jar file dependency.

Download http://apache.opencas.org/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz

The jar to depend on is spark-assembly-1.6.0-hadoop2.6.0.jar (found in the lib directory of the unpacked distribution).


3. Locate the Spark assembly jar and add it to the project's build path in Eclipse (a build-tool alternative is sketched after these steps).

4. Create the package com.dt.spark under src.

5. Create a Scala entry class


6. Turn the class into an object (only objects can define main) and write the main entry method.

Tip: if a class cannot be resolved, press Ctrl+Shift+O to organize the imports.
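As an alternative to manually importing the assembly jar (steps 2 and 3 above), the same dependency can be declared through a build tool and the Eclipse project generated from it. Below is a minimal sbt sketch; the project name and version are illustrative placeholders, and only the Spark coordinates and the Scala 2.10 requirement come from the Spark 1.6.0 distribution itself.

// build.sbt -- minimal sketch; name and version are placeholders
name := "SparkWordCount"

version := "0.1"

// Spark 1.6.0 prebuilt binaries are built against Scala 2.10
scalaVersion := "2.10.5"

// "provided" keeps Spark's classes out of the packaged jar, since the cluster already supplies them
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"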

Example code:

package com.dt.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
 * A Spark WordCount program developed in Scala for local testing.
 * @author DT Big Data Dream Factory
 * Sina Weibo: http://weibo.com/ilovepains
 */
object WordCount {

  def main(args: Array[String]) {

    /**
     * 1. Create the Spark configuration object SparkConf and set the runtime configuration for the Spark program.
     * For example, setMaster sets the URL of the Spark cluster the program connects to.
     * If it is set to "local", the Spark program runs locally, which suits beginners with very limited machines (e.g. only 1 GB of memory).
     */
    val conf = new SparkConf()                 // Create the SparkConf object
    conf.setAppName("My first Spark app!")     // Set the application name; it is visible in the monitoring UI while the program runs
    conf.setMaster("local")                    // This time the program runs locally, so there is no need to install a Spark cluster

    /**
     * 2. Create the SparkContext object.
     * SparkContext is the sole entry point to all Spark functionality; every program, whether written in Scala, Java, Python or R, must have one.
     * Its core role is to initialize the components needed to run the Spark application, including DAGScheduler, TaskScheduler and SchedulerBackend.
     * It is also responsible for registering the program with the Master.
     * SparkContext is the most important object in the entire Spark application.
     */
    val sc = new SparkContext(conf)            // Create the SparkContext object; the SparkConf instance passed in customizes the Spark runtime parameters and configuration

    /**
     * 3. Use SparkContext to create an RDD from a concrete data source (HDFS, HBase, local FS, DB, S3, etc.).
     * There are basically three ways to create an RDD: from an external data source (e.g. HDFS), from a Scala collection, or from another RDD.
     * The RDD splits the data into a series of partitions; the data assigned to each partition is processed by one task.
     *
     * sc.textFile(path, minPartitions); by type inference, lines is an RDD[String].
     */
    val lines = sc.textFile("F:/installation file/operating system/spark-1.6.0-bin-hadoop2.6/README.md", 1)

    /**
     * 4. Apply transformation-level processing to the initial RDD, i.e. program with higher-order functions such as map and filter to perform the actual computation.
     *
     * 4.1 Split the string of each line into individual words.
     */
    val words = lines.flatMap { line => line.split(" ") }   // Split each line into words and flatten the per-line results into one large collection of words

    /**
     * 4.2 Based on the word split, count each word instance as 1, i.e. word => (word, 1).
     */
    val pairs = words.map { word => (word, 1) }             // Each word becomes the tuple (word, 1)

    /**
     * 4.3 Count the total number of occurrences of each word in the file, starting from the per-instance count of 1.
     */
    val wordCounts = pairs.reduceByKey(_ + _)               // For identical keys, accumulate the values (the reduce happens both locally and at the reducer level)

    wordCounts.foreach(wordNumberPair => println(wordNumberPair._1 + " : " + wordNumberPair._2))

    /**
     * 5. Release the related resources.
     */
    sc.stop()
  }
}

My results do not yet match the teacher's; I will leave that for now.

In addition, errors are printed at the start of the run. These errors are expected: when the program starts, Spark looks for a Hadoop installation and does not find one. This is not a program error and does not affect any of our functionality.
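If that startup error on Windows is distracting, one common workaround (not from the lecture) is to point Hadoop's home directory at a folder containing bin\winutils.exe before the SparkContext is created. A minimal sketch, assuming winutils.exe has been downloaded into the illustrative folder C:\hadoop\bin:

import org.apache.spark.{SparkConf, SparkContext}

object WordCountWithoutWinutilsError {
  def main(args: Array[String]) {
    // Assumption: winutils.exe sits in C:\hadoop\bin (path is illustrative).
    // Setting hadoop.home.dir before creating the SparkContext stops Hadoop's shell
    // utilities from reporting that winutils.exe cannot be located.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    val conf = new SparkConf().setAppName("My first Spark app!").setMaster("local")
    val sc = new SparkContext(conf)
    // ... the WordCount logic from the example above goes here ...
    sc.stop()
  }
}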

========== Cluster mode ============

package com.dt.spark

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

/**
 * A Spark WordCount program developed in Scala for testing on a cluster.
 * @author DT Big Data Dream Factory
 * Sina Weibo: http://weibo.com/ilovepains
 */
object WordCount_Cluster {

  def main(args: Array[String]) {

    /**
     * 1. Create the Spark configuration object SparkConf and set the runtime configuration for the Spark program.
     * For example, setMaster sets the URL of the Spark cluster the program connects to.
     * If it is set to "local", the Spark program runs locally, which suits beginners with very limited machines (e.g. only 1 GB of memory).
     */
    val conf = new SparkConf()                 // Create the SparkConf object
    conf.setAppName("My first Spark app!")     // Set the application name; it is visible in the monitoring UI while the program runs
    conf.setMaster("spark://master:7077")      // The program must now run against the cluster at master:7077; the master can also be supplied manually at submit time

    /**
     * 2. Create the SparkContext object.
     * SparkContext is the sole entry point to all Spark functionality; every program, whether written in Scala, Java, Python or R, must have one.
     * Its core role is to initialize the components needed to run the Spark application, including DAGScheduler, TaskScheduler and SchedulerBackend.
     * It is also responsible for registering the program with the Master.
     * SparkContext is the most important object in the entire Spark application.
     */
    val sc = new SparkContext(conf)            // Create the SparkContext object; the SparkConf instance passed in customizes the Spark runtime parameters and configuration

    /**
     * 3. Use SparkContext to create an RDD from a concrete data source (HDFS, HBase, local FS, DB, S3, etc.).
     * There are basically three ways to create an RDD: from an external data source (e.g. HDFS), from a Scala collection, or from another RDD.
     * The RDD splits the data into a series of partitions; the data assigned to each partition is processed by one task.
     *
     * sc.textFile(path, minPartitions); by type inference, lines is an RDD[String].
     */
    // val lines = sc.textFile("/library/wordcount/input/data", 1)   // Read an HDFS file and split it into different partitions
    val lines = sc.textFile("/historyserverforspark/README.md", 1)

    /**
     * 4. Apply transformation-level processing to the initial RDD, i.e. program with higher-order functions such as map and filter to perform the actual computation.
     *
     * 4.1 Split the string of each line into individual words.
     */
    val words = lines.flatMap { line => line.split(" ") }   // Split each line into words and flatten the per-line results into one large collection of words

    /**
     * 4.2 Based on the word split, count each word instance as 1, i.e. word => (word, 1).
     */
    val pairs = words.map { word => (word, 1) }             // Each word becomes the tuple (word, 1)

    /**
     * 4.3 Count the total number of occurrences of each word in the file, starting from the per-instance count of 1.
     */
    val wordCounts = pairs.reduceByKey(_ + _)               // For identical keys, accumulate the values (the reduce happens both locally and at the reducer level)

    // On the cluster, collect the results back to the driver before printing them.
    wordCounts.collect.foreach(wordNumberPair => println(wordNumberPair._1 + " : " + wordNumberPair._2))

    /**
     * 5. Release the related resources.
     */
    sc.stop()
  }
}

Package the program as a jar (in Eclipse: File > Export > JAR file):

WordCount.jar

Http://spark.apache.org/docs/latest/submitting-applications.html

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]

./spark-submit --class com.dt.spark.WordCount_Cluster --master spark://master:7077 /root/documents/sparkapps/WordCount.jar

hadoop fs -put /usr/local/spark-1.6.0-bin-hadoop2.6/README.md hdfs://192.168.145.131:9000/historyserverforspark/ -- the file uploads successfully but still cannot be used??
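One possible cause (my guess, not from the lecture) is that the bare path passed to textFile is resolved against a different default file system than the HDFS instance the file was uploaded to. A small sketch that sidesteps this by passing the full HDFS URI, assuming the NameNode really is at 192.168.145.131:9000 and reusing the sc from the cluster example above:

// Read the uploaded file through its full HDFS URI instead of a bare path,
// so the lookup does not depend on the cluster's fs.defaultFS setting.
val lines = sc.textFile("hdfs://192.168.145.131:9000/historyserverforspark/README.md", 1)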

Homework: write an ad click ranking program in Eclipse and test it (a rough sketch follows).
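A rough sketch of what such a program could look like, assuming a hypothetical input file in which each line is simply the name of the ad that was clicked (file path and input format are illustrative, not from the lecture):

import org.apache.spark.{SparkConf, SparkContext}

object AdClickRanking {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("Ad click ranking").setMaster("local")
    val sc = new SparkContext(conf)

    // Hypothetical input: one clicked ad name per line.
    val clicks = sc.textFile("F:/data/adclicks.txt", 1)

    // Count clicks per ad, then rank by click count in descending order.
    val ranking = clicks.map(ad => (ad, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2, ascending = false)

    ranking.take(10).foreach { case (ad, count) => println(ad + " : " + count) }

    sc.stop()
  }
}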


Teacher Liaoliang's business card:

The first person of Spark in China

Sina Weibo: http://weibo.com/ilovepains

WeChat public account: DT_Spark

Blog: http://blog.sina.com.cn/ilovepains

Mobile: 18610086859

QQ: 1740415547

Email: [Email protected]


This article is from the "A Flower Proud of the Cold" blog; reprinting is declined.

