Spark Local Development Environment setup

Source: Internet
Author: User

Most Spark development environment tutorials on the Internet are based on IntelliJ IDEA. I am used to Eclipse, so this guide builds the development environment with Eclipse. Preparation: download Scala IDE for Eclipse from http://scala-ide.org/

Scala project version

This approach is similar to setting up an ordinary Java project.

Create a Scala project

Remove the built-in Scala library from the project

Add the Spark library spark-assembly-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar

Set the Scala compiler version for the project

Right-click the project --> Scala --> Set the Scala Installation

Alternatively:

Right-click the project --> Properties --> Scala Compiler --> Use Project Settings, then select the Scala version that matches Spark. Here, select the Latest 2.10 bundle.

Write Scala code

Export the jar package
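
For the "Write Scala code" step, a minimal sketch of a Spark program may help. It assumes the spark-assembly jar above is on the build path and runs in local mode; the object name WordCount, the helper countWords, the local[2] master, and the sample input are all illustrative choices, not part of the original setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Pair-RDD implicits such as reduceByKey; required on Spark 1.1, harmless later.
import org.apache.spark.SparkContext._

// Hypothetical example object; the name and sample data are assumptions.
object WordCount {
  // Counts how often each whitespace-separated word occurs, using the RDD API.
  def countWords(sc: SparkContext, lines: Seq[String]): Map[String, Int] =
    sc.parallelize(lines)
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    // local[2] runs Spark inside the IDE with two worker threads (assumption).
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      countWords(sc, Seq("hello spark", "hello scala")).foreach(println)
    } finally {
      sc.stop() // always release the local Spark context
    }
  }
}
```

Running the object as a Scala application from Eclipse should be enough to confirm that the library and compiler versions agree before exporting the jar.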

Maven project version

Most community projects are now managed with Maven, so it is worth learning the Maven-based workflow as well.

Before creating a Spark project, let's first see how to create a Scala project with Maven.

Scala Maven Project

Create a Maven project and modify pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.wankun.scala</groupId>
  <artifactId>test3</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>

  <name>test3</name>
  <url>http://maven.apache.org</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven-compiler-plugin.version>3.1</maven-compiler-plugin.version>
    <org.scala-tools.version>2.3.2</org.scala-tools.version>
    <scala.library.version>2.10.4</scala.library.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.library.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.10</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>
  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>${maven-compiler-plugin.version}</version>
        <configuration>
          <source>1.6</source>
          <target>1.6</target>
          <encoding>UTF-8</encoding>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <sourceDir>src/main/scala</sourceDir>
          <jvmArgs>
            <jvmArg>-Xms64m</jvmArg>
            <jvmArg>-Xmx1024m</jvmArg>
          </jvmArgs>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

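The main changes to the pom are as follows:

  • Add the scala-tools plugin repository
  • Add the maven-scala-plugin plugin
  • Add the scala-tools repository
  • Add the scala-library dependency

Other Maven plugins can be added as needed.

Other Scala libraries can be added when necessary. The entries below reference ${scala.version}, so define that property (or reuse scala.library.version from the pom above) for the versions to resolve.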

<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-compiler</artifactId>
  <version>${scala.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-reflect</artifactId>
  <version>${scala.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>jline</artifactId>
  <version>${scala.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>${scala.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-actors</artifactId>
  <version>${scala.version}</version>
</dependency>
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scalap</artifactId>
  <version>${scala.version}</version>
</dependency>

Add the src/main/scala source directory and develop Scala programs there.
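
As a quick sanity check that Maven compiles sources from src/main/scala against the intended Scala library, a small program along these lines could be placed in that directory. This is only a sketch: the object name BuildCheck and the file location src/main/scala/BuildCheck.scala are hypothetical.

```scala
// Hypothetical file: src/main/scala/BuildCheck.scala
object BuildCheck {
  // Version of the Scala library found on the classpath at run time, e.g. "2.10.4".
  def scalaVersion: String = scala.util.Properties.versionNumberString

  def main(args: Array[String]): Unit =
    println("Running on Scala " + scalaVersion)
}
```

If the printed version differs from the scala-library version declared in the pom, the project is probably still picking up the IDE's bundled Scala library.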

Spark Maven Project

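Create a Scala Maven project as described above and add the Spark dependency. Because I have long used the Cloudera distribution, the Cloudera repository is imported here. Put the repository entry under repositories and the dependency under dependencies, and define spark.version in properties to match the Spark release you target.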

<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/cloudera/cloudera-repos</url>
</repository>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${spark.version}</version>
</dependency>
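
The _2.10 suffix in spark-core_2.10 is the Scala binary version the artifact was built for, so it should agree with the scala-library version used by the project. The sketch below derives the expected artifact name from a full Scala 2.x version string; the object and method names are hypothetical helpers, not part of Spark or Maven.

```scala
// Hypothetical helper for checking the artifact suffix against the Scala version.
object ArtifactSuffix {
  // "2.10.4" -> "2.10" (Scala 2.x binary version: major.minor).
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // "2.10.4" -> "spark-core_2.10"
  def sparkCoreArtifact(fullVersion: String): String =
    "spark-core_" + binaryVersion(fullVersion)

  def main(args: Array[String]): Unit =
    // Prints the artifactId matching the Scala library on the classpath.
    println(sparkCoreArtifact(scala.util.Properties.versionNumberString))
}
```

With scala.library.version set to 2.10.4 as in the pom above, the helper yields spark-core_2.10, which matches the dependency shown here.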

Note: during testing I found that scala.math.random could not be imported normally, because my code was declared in the package com.wankun.scala. As a result, the import was resolved as com.wankun.scala.math. This is most likely caused by the difference between the package mechanism in Scala and the one in Java: Scala resolves package names relative to the enclosing package.
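
One common way around this kind of clash is the _root_ prefix, which makes Scala resolve a name from the top of the package hierarchy. The sketch below reproduces the situation with nested package clauses; the object RootImportDemo and its method are hypothetical, and sqrt stands in for random so the result is deterministic (the same prefix applies to scala.math.random).

```scala
// Nested package clauses: inside com.wankun, the bare name `scala`
// refers to com.wankun.scala rather than the standard library package.
package com.wankun {
  package scala {
    // Hypothetical demo object; only the _root_ prefix is the point here.
    object RootImportDemo {
      // _root_ forces resolution from the root package, so this is the
      // standard library's scala.math, not com.wankun.scala.math.
      def squareRoot(x: Double): Double = _root_.scala.math.sqrt(x)
    }
  }
}
```

Renaming the last package segment to something other than scala avoids the problem altogether and is probably the simpler fix for new projects.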
