http://dongxicheng.org/framework-on-yarn/spark-eclipse-ide/
The previous article "Apache Spark Learning: Deploying Spark to Hadoop 2.2.0" described how to use Maven to build a Spark jar package that runs directly on Hadoop 2.2.0. Building on that, this article describes how to set up a Spark integrated development environment with Eclipse. Note that Eclipse is not recommended for developing Spark programs or reading the source code; IntelliJ IDEA is recommended instead. For details, see the article: Apache Spark Quest: Build the development environment with IntelliJ IDEA.
(1) Preparatory work
Before starting, make the following software and hardware preparations:
Software Preparation:
Eclipse Juno (version 4.2), which can be downloaded directly here: Eclipse 4.2
Scala 2.9.3; the Windows installer can be downloaded directly here: Scala 2.9.3
The Eclipse Scala IDE plugin can be downloaded directly here: Scala IDE (for Scala 2.9.x and Eclipse Juno)
Hardware Preparation
A machine running Linux or Windows
(2) Building the Spark integrated development environment
I am working on Windows; the process is as follows:
Step 1: Install Scala 2.9.3 by running the installer directly.
Step 2: Copy all files from the features and plugins directories of the Eclipse Scala IDE plugin into the corresponding directories of the unpacked Eclipse installation.
Step 3: Restart Eclipse, click the perspective button in the upper right corner of Eclipse, as shown below, expand it, and click "Other ..." to check whether a "Scala" perspective appears. If so, click Open; otherwise proceed to step 4.
Step 4: In Eclipse, select "Help" –> "Install New Software ...", fill in http://download.scala-ide.org/sdk/e38/scala29/stable/site in the "Work with" field, and press Enter. You will see the following content; select the first two items to install. (Since step 2 already copied the jar packages into Eclipse, the installation completes quickly.) Once the installation is complete, repeat step 3.
(3) Developing spark programs using the Scala language
In Eclipse, select "File" –> "New" –> "Other ..." –> "Scala Wizard" –> "Scala Project", create a Scala project, and name it "Sparkscala".
Right-click on the "Sparkscala" project and select "Properties". In the pop-up dialog, select "Java Build Path" –> "Libraries" –> "Add External JARs ...", as shown in the following image, and import spark-assembly-0.8.1-incubating-hadoop2.2.0.jar from the assembly/target/scala-2.9.3/ directory built in the article "Apache Spark Learning: Deploying Spark to Hadoop 2.2.0". This jar package can also be generated by compiling Spark yourself; it is placed in the assembly/target/scala-2.9.3/ directory under the Spark source tree.
Similar to creating the Scala project, add a Scala class to the project named WordCount. The overall project structure is as follows:
WordCount is the classic word-frequency program: it counts the total number of occurrences of each word in the input directory. The Scala code is as follows:
import org.apache.spark._
import SparkContext._

object WordCount {
  def main(args: Array[String]) {
    if (args.length != 3) {
      println("usage is org.test.WordCount <master> <input> <output>")
      return
    }
    val sc = new SparkContext(args(0), "WordCount",
      System.getenv("SPARK_HOME"), Seq(System.getenv("SPARK_TEST_JAR")))
    val textFile = sc.textFile(args(1))
    val result = textFile.flatMap(line => line.split("\\s+"))
      .map(word => (word, 1)).reduceByKey(_ + _)
    result.saveAsTextFile(args(2))
  }
}
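Before packaging the program and submitting it to a cluster, the flatMap / map / reduceByKey counting logic can be sanity-checked with plain Scala collections, without Spark on the classpath. The sketch below is a minimal illustration I added (the object name WordCountLocal and the sample lines are hypothetical, not from the original article); groupBy plays the role of reduceByKey here:

```scala
// Local, Spark-free check of the word-count logic using Scala collections.
object WordCountLocal {
  def main(args: Array[String]): Unit = {
    // Hypothetical sample input standing in for the lines of the input directory.
    val lines = Seq("hello spark", "hello hadoop hello scala")

    val counts: Map[String, Int] = lines
      .flatMap(_.split("\\s+"))            // split each line into words
      .groupBy(identity)                    // group identical words together
      .map { case (word, ws) => (word, ws.size) } // count each group

    // "hello" appears 3 times; every other word appears once.
    counts.toSeq.sortBy(-_._2).foreach(println)
  }
}
```

If the counts look right here, the same pipeline expressed with Spark's RDD operations in WordCount above will produce the same per-word totals, only distributed across the cluster.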