This article first describes how to set up a Maven + Scala development environment in Eclipse, then how to configure Spark to run locally, and finally runs a Spark program written in Scala successfully.
To begin with, my Eclipse + Maven environment was already set up.
System: Win7
Eclipse version: Luna release (4.4.0)
Maven was installed from the Eclipse Marketplace, as shown in Figure 1.
When I first set up the Eclipse + Maven environment, only the first entry was installed.
There is no need to rush to install Maven here: a Maven integration for Eclipse is also offered later, when you install the Maven integration for Scala.
Figure 1-Eclipse's installed m2e plug-ins
First, configure the Eclipse + Maven + Scala environment
1. Install the Scala IDE from the Eclipse Marketplace
Figure 2-Installing the Scala IDE in Eclipse
2. Install M2e-scala
As shown in Figure 3, the URL in the figure is: http://alchim31.free.fr/m2e-scala/update-site/
As you can see from the plug-in names in Figure 3, m2e, the Maven plug-in that Eclipse needs, is also offered here. If Eclipse does not yet have the Maven plug-in, you can select all of the items for installation; if it already does, you can install just the third item, Maven Integration for Scala IDE, on its own.
Once Maven Integration for Scala IDE has been installed, entering the URL above again no longer shows it in the installation list.
(PS: Here I uninstalled Maven Integration for Scala IDE after taking the screenshot.)
(PS: Looking at Figure 1 again, there is a Maven Integration for Eclipse (Luna) 1.5.0 in addition to the first Maven Integration for Eclipse (Luna and newer) 1.5. This is because when I installed m2e-scala from the URL above, I did not notice that it also includes Maven Integration for Eclipse, which resulted in two versions of Maven Integration for Eclipse being installed.)
(PS: Although I installed Maven Integration for Eclipse from the URL above and did not uninstall it, the option still appears in Figure 3 because a newer version is available. You can see there that the latest version is 1.5.1, so continuing to install Maven Integration for Eclipse would simply update it.)
(PS: In Figure 1 there is also a Maven Integration for Eclipse WTP (Juno) 1.0.1; at the moment I do not remember how it got installed.)
Figure 3-Installing M2e-scala
Second, test the Eclipse + Maven + Scala environment
1. First, test Eclipse + Scala
Create a new Scala project named Scala, then right-click it and add a Scala Object named Test with the following code:
package test

object Test {
  def main(args: Array[String]) {
    println("Hello World")
  }
}
The final example is shown in Figures 4 and 5.
Figure 4-New Scala project
Figure 5-Scala project directory structure
Right-click Test.scala, Run As... -> Scala Application; Hello World is printed successfully in the console.
As you can see from Figure 5, the Scala version that ships with the Scala IDE we installed is 2.11.5.
(PS: If you do not need to use Scala from the command line, it seems you can skip downloading the Scala distribution separately and setting its environment variables.)
2. Next, test Eclipse + Scala + Maven.
The process of creating a new Scala + Maven project can go as shown in Figure 6.
Create a new Maven project, do not tick 'Create a simple project', and choose a Scala-related archetype.
A Maven archetype is a project template: it generates the directory structure and related files (such as pom.xml) according to a given pattern, here that of a Scala Maven project. If you choose the 1.2-version Scala-related archetype in Figure 6, the new Maven project gets the directory layout of a Scala Maven project, pom.xml is already configured, and a few Scala source files are generated.
However, there are some errors and the project does not compile. I think this is mainly a Scala version problem: the project's pom.xml shows that it is built against Scala 2.7.0, while the Scala IDE we installed is based on Scala 2.11.5.
Figure 6-New Scala maven project
Newer Scala versions do not seem to be very compatible with older ones. You can fix the pom.xml yourself, but the generated code will probably also need to be modified.
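If you do want to repair the archetype-generated project by hand, the key change is to align the Scala version in pom.xml with the Scala IDE you installed. A minimal sketch only: the property name and layout below follow common Scala Maven conventions and are my assumption, the archetype's actual pom may look different.

<properties>
  <scala.version>2.11.5</scala.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
</dependencies>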
Instead, I downloaded a ready-made Maven project based on Scala 2.11.5 from Git.
Git URL: https://github.com/scala/scala-module-dependency-sample
After cloning it with Git, import the Maven project (maven-sample) into Eclipse.
As can be seen from its pom.xml, it is based on Scala 2.11.5. There is only one source file, XmlHelloWorld.scala. As long as the dependencies declared in pom.xml can be pulled down successfully, you can directly right-click XmlHelloWorld.scala and choose Run As -> Scala Application.
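(I have not reproduced the sample's source here. As a rough idea of what such a smoke test can look like, a hello-world that exercises the scala-xml dependency might be written as below; the object name and XML content are my own illustration, not the actual file contents.)

import scala.xml.Elem

object XmlHelloWorldSketch {
  def main(args: Array[String]): Unit = {
    // Build an XML literal; this needs the scala-xml module that the sample's pom declares
    val greeting: Elem = <hello>Hello, World!</hello>
    println(greeting)
  }
}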
At this point, Eclipse + Scala + Maven is working. Next, configure the Spark local running environment.
Third, configure Spark to run locally
1. Configure the required dependencies
Here I configure Spark on top of the maven-sample project.
Add spark-core to the pom.xml; the _2.11 suffix in the artifact name indicates a build for Scala 2.11, which matches the Scala IDE we installed.
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>1.2.1</version>
</dependency>
Add a Scala Object named SimpleApp to the default package. The code is as follows:
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "Test.txt" // Should be some file on your system
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
At this point the code compiles, but if you Run As -> Scala Application, you get NoClassDefFoundError exceptions.
This is because spark-core needs many other jar packages at run time, but those packages are neither inside the spark-core jar nor on our classpath.
We can easily find the spark-core package in the online Maven repository.
URL: http://search.maven.org/
Figure 7-spark-core
Click the pom link there to open spark-core's own pom.xml file; it depends on a large number of packages.
Copy all of those dependencies into the pom.xml of our maven-sample project. When the project is rebuilt, Maven needs to download a lot of packages, which can take quite a long time.
(PS: From spark-core's pom.xml you can see quite clearly that we already have the org.scala-lang:scala-library artifact, just in a slightly different version; you can delete that entry or leave it.)
In fact, you could also add the dependencies one by one, following each NoClassDefFoundError, but you would need to add a great many of them.
Moreover, because spark-core's dependencies have dependencies of their own, you may find that some of the missing classes belong to packages that do not appear in spark-core's pom.xml at all: they live in the dependencies of spark-core's dependencies. These transitive dependencies can also conflict in version, so you may run into method-not-found (NoSuchMethodError) errors while adding packages by hand.
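If you do take the manual route and hit such a conflict, Maven's standard <exclusions> mechanism can keep an unwanted transitive version off the classpath. This is a sketch only; the excluded artifact below is a placeholder, not an actual spark-core conflict.

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>1.2.1</version>
  <exclusions>
    <exclusion>
      <!-- placeholder: exclude whichever transitive artifact is pulled in with a conflicting version -->
      <groupId>some.conflicting.group</groupId>
      <artifactId>conflicting-artifact</artifactId>
    </exclusion>
  </exclusions>
</dependency>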
The easiest way is therefore to copy all the dependencies.
2. Test run
Now, first create the Test.txt file in the project directory.
Figure 8-Add Test.txt file
The contents of the document are as follows:
a
b
c
ab
abab
d
Right-click SimpleApp.scala, Run As -> Scala Application, and a lot of log output appears.
The only line I can really make sense of is near the end:
Lines with a: 3, Lines with b: 3
which is the correct output of the Spark program (three lines contain "a": a, ab, abab; and three contain "b": b, ab, abab).
However, you can see that there is still an exception in the log.
Figure 9-hadoop exception
This exception does not seem to affect the normal operation of the program, but let's try to resolve it first.
For this exception, refer to:
http://www.srccodes.com/p/article/39/error-util-shell-failed-locate-winutils-binary-hadoop-binary-path
The main content of that page is very short: the Hadoop 2.2.0 release does not ship Windows binaries. The page provides a link to "Build, Install, Configure and Run Apache Hadoop 2.2.0 in Microsoft Windows OS", as well as a ready-made compiled package.
I downloaded the compiled package directly, created a null/bin directory under the project directory, and copied all the files from the download into null/bin.
Figure 10-bin Directory
Next, run the SimpleApp program again; the exception no longer appears.
However, I do not know whether this resolves the exception at its root, because I found that even a manually created, empty null/bin/winutils.exe file eliminates the Hadoop 2.2.0 exception above, as long as the path and file name match.
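(PS: An alternative I have not verified in this setup: the Hadoop code that raises this exception locates winutils.exe through the hadoop.home.dir system property, or the HADOOP_HOME environment variable, so you may be able to point it at the directory that contains bin\winutils.exe before creating the SparkContext. The path below is only an example.)

object SimpleAppWindowsSketch {
  def main(args: Array[String]) {
    // Assumed workaround: tell Hadoop where bin\winutils.exe lives before Spark starts
    System.setProperty("hadoop.home.dir", "C:\\hadoop")
    // ... then build the SparkConf and SparkContext exactly as in SimpleApp above ...
  }
}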