Spark Local Development Environment Setup
Most Spark development environment guides on the Internet are based on IntelliJ IDEA. I am used to Eclipse, so here the environment is built with Eclipse instead. Preparation: download Scala IDE for Eclipse from http://scala-ide.org/.
Scala project approach
This works much like creating a Java project:
Create a Scala project.
Remove the built-in Scala library from the project.
Add the Spark library spark-assembly-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar.
Change the Scala compiler version used by the project:
Right-click --> Scala --> Set the Scala Installation
or
Right-click the project --> Properties --> Scala Compiler --> Use Project Settings, and select the Scala version matching Spark; here that is the Latest 2.10 bundle.
Write the Scala code.
Export the jar package.
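Before exporting the jar, it helps to smoke-test the setup with a minimal program (names below are made up for illustration; once the spark-assembly jar is on the build path, a SparkContext can be created in the same way):

```scala
// Minimal program to confirm the Eclipse Scala project compiles and runs.
object Smoke {
  def greet(name: String): String = s"Hello, $name"

  def main(args: Array[String]): Unit =
    println(greet("Spark")) // prints "Hello, Spark"
}
```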
Maven project approach
Projects in the community are mostly managed with Maven, so it is worth learning to develop this way. Before creating a Spark project, let's first see how to create a plain Scala project with Maven.
Scala Maven Project
Create a Maven project and modify pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.wankun.scala</groupId>
  <artifactId>test3</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
  <name>test3</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven-compiler-plugin.version>3.1</maven-compiler-plugin.version>
    <org.scala-tools.version>2.3.2</org.scala-tools.version>
    <scala.library.version>2.10.4</scala.library.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.library.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.10</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>${maven-compiler-plugin.version}</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
          <encoding>UTF-8</encoding>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <version>${org.scala-tools.version}</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <sourceDir>src/main/scala</sourceDir>
          <jvmArgs>
            <jvmArg>-Xms64m</jvmArg>
            <jvmArg>-Xmx1024m</jvmArg>
          </jvmArgs>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
The main changes to the pom are:
- Add the scala-tools plugin repository
- Add the maven-scala-plugin plugin
- Add the scala-tools repository
- Add the scala-library dependency
Other Maven plugins can be added as needed, and further Scala class libraries can be added when necessary:
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-compiler</artifactId>
  <version>${scala.library.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-reflect</artifactId>
  <version>${scala.library.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>jline</artifactId>
  <version>${scala.library.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>${scala.library.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-actors</artifactId>
  <version>${scala.library.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scalap</artifactId>
  <version>${scala.library.version}</version>
</dependency>
Add the src/main/scala source directory and develop Scala programs under it.
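For instance, a first program (hypothetical name) could live at src/main/scala/com/wankun/Hello.scala; maven-scala-plugin compiles everything under src/main/scala:

```scala
package com.wankun

// Hypothetical first program for the Maven Scala project.
object Hello {
  def square(x: Int): Int = x * x

  def main(args: Array[String]): Unit =
    println(s"square(7) = ${square(7)}") // prints "square(7) = 49"
}
```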
Spark Maven Project
Create a Scala Maven project as above and add the Spark dependency. Since I have long used the Cloudera release, the Cloudera repository is added here; define a spark.version property matching your cluster (here 1.1.0-cdh5.2.0):
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/cloudera/cloudera-repos</url>
</repository>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>
Note: during testing I found that scala.math.random could not be imported normally. Because I had declared package com.wankun, the import was resolved as com.wankun.scala.math. This should be caused by the difference between the package nesting mechanism in Scala and the flat package mechanism in Java.
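The shadowing can be reproduced in plain Scala; all names below (the com.wankun.scala.math object and its value) are made up to stand in for the conflicting packages:

```scala
// In Scala, packages nest: inside com.wankun, the bare name `scala`
// resolves to com.wankun.scala, shadowing the root scala package.
package com {
  package wankun {
    package scala {
      object math { val random = -1.0 } // stand-in for the conflicting path
    }
    object Demo {
      val shadowed = scala.math.random        // picks com.wankun.scala.math.random
      val real     = _root_.scala.math.random // _root_ forces the standard library
    }
  }
}
```

Prefixing the import with `_root_` (e.g. `_root_.scala.math.random`) is the standard way to bypass the shadowing; Java's flat package lookup never has this problem.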