Build a Scala + Spark Development Environment with Eclipse and IDEA


Install JDK 1.7.0_60 and Scala 2.10.4 on the development machine and configure the relevant environment variables. Plenty of guides cover this online, so the installation steps are omitted here. Eclipse Luna 4.4.1 and IntelliJ IDEA 14.0.2 are used below.
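For reference, a minimal sketch of the environment variables involved, assuming a Unix-like shell and purely illustrative install paths (on Windows, set the same variables through System Properties):

# Hypothetical install locations; adjust to the actual paths
export JAVA_HOME=/usr/local/jdk1.7.0_60
export SCALA_HOME=/usr/local/scala-2.10.4
export PATH=$JAVA_HOME/bin:$SCALA_HOME/bin:$PATH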

1. Eclipse Development Environment Setup

1.1. Install the Scala Plugin

Install the Scala IDE plugin for Eclipse, available at http://scala-ide.org/download/prev-stable.html

After unzipping the download, copy its plugins and features directories into the Eclipse installation directory and restart Eclipse.

If Window -> Open Perspective -> Other... now lists Scala, the installation was successful.

1.2. Create a Maven Project

Open File -> New -> Other... and select Maven Project:

Click Next to enter the project storage path:

Click Next and select the org.scala-tools.archetypes archetype:

Click Next and fill in the artifact information:

Click Finish. The project directory structure created by default is as follows:
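(The original screenshot is not reproduced here. As a rough sketch, the scala-archetype-simple archetype typically generates a layout along these lines, with the package path following the groupId entered; the test stubs vary by archetype version:)

sparktest/
    pom.xml
    src/main/scala/com/ccb/App.scala
    src/test/scala/com/ccb/ (generated test stubs)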

Modify the pom.xml file:
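(The screenshot of the modified pom.xml is missing from the source. As a minimal sketch, the usual change is to pin the archetype's scala.version property to the version installed above, e.g.:)

<properties>
    <scala.version>2.10.4</scala.version>
</properties>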

At this point, a default Scala project is complete.

2. IDEA Development Environment Setup

2.1. Install the Scala Plugin

The IDEA version used on the development machine is IntelliJ IDEA 14.0.2. To enable Scala development in IDEA, install the Scala plugin.

After the plugin is installed, IntelliJ IDEA will ask for a restart.

2.2. Create a Maven Project

Click Create New Project and select the JDK installation directory as the Project SDK (it is recommended that the JDK version in the development environment match the JDK version on the Spark cluster). Click Maven on the left, tick Create from archetype, and select org.scala-tools.archetypes:scala-archetype-simple:

After clicking Next, fill in the GroupId, ArtifactId, and Version as needed (make sure Maven is already installed). When you click Finish, Maven automatically generates pom.xml and downloads the dependencies. The Scala version in pom.xml needs to be changed in the same way as in section 1.2 for the Eclipse Maven project.

At this point, a default Scala project has been created in IDEA.

3. WordCount Example Program

3.1. Modify the pom File

Add the Spark and Hadoop dependencies to the pom file:

<!-- Spark -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.1.0</version>
</dependency>
<!-- Spark Streaming -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.1.0</version>
</dependency>
<!-- HDFS -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0</version>
</dependency>
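As an aside not in the original walkthrough: because the Spark classes are already present on the cluster's classpath at runtime, the Spark dependencies are often given <scope>provided</scope> so the assembled jar stays small; bundling them as above also works but yields a much larger jar.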

Use the maven-assembly-plugin in the <build> section so that dependencies are packaged into the jar as well:

<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <version>2.5.5</version>
    <configuration>
        <appendAssemblyId>false</appendAssemblyId>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
            <manifest>
                <mainClass>com.ccb.WordCount</mainClass>
            </manifest>
        </archive>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>
3.2. WordCount Example

WordCount counts the occurrences of every word in the input files. Reference code:

package com.ccb

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._

/** Counts the total number of occurrences of all words in the input directory. */
object WordCount {
  def main(args: Array[String]) {
    val dirIn = "hdfs://192.168.62.129:9000/user/vm/count_in"
    val dirOut = "hdfs://192.168.62.129:9000/user/vm/count_out"

    val conf = new SparkConf()
    val sc = new SparkContext(conf)

    val line = sc.textFile(dirIn)
    val cnt = line.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _) // split each line on spaces and count the words
    val sortedCnt = cnt.map(x => (x._2, x._1)).sortByKey(ascending = false).map(x => (x._2, x._1)) // sort by occurrence count, descending

    sortedCnt.collect().foreach(println) // print to the console
    sortedCnt.saveAsTextFile(dirOut)     // write to a text file
    sc.stop()
  }
}
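As a hedged aside, not part of the original walkthrough: for a quick test without a cluster, the master can be set programmatically instead of via spark-submit (setAppName and setMaster are standard SparkConf methods; local[2] is just an illustrative thread count):

val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]") // run locally with 2 worker threads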
3.3. Submit Spark Execution

Run mvn package to produce sparktest-1.0-SNAPSHOT.jar, then submit it to the Spark cluster to run.

Example command:

./spark-submit --name WordCountDemo --class com.ccb.WordCount sparktest-1.0-SNAPSHOT.jar
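Depending on how the cluster is set up, a --master flag may also be required; --master is a standard spark-submit option, and the standalone master URL below is hypothetical:

./spark-submit --master spark://192.168.62.129:7077 --name WordCountDemo --class com.ccb.WordCount sparktest-1.0-SNAPSHOT.jar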

This produces the word-count statistics.

