To build Spark under Windows 10, you need to install the Java JDK, Scala, Spark, and Hadoop.
I. Install and configure the JDK
Download the JDK; the version used here is jdk-8u151-windows-x64.exe
Add two environment variables:
JAVA_HOME  E:\java\jdk1.8.0_151  (note that the JDK here is installed outside the default C-drive directory; the path must not contain spaces)
CLASSPATH  %JAVA_HOME%\lib;%JAVA_HOME%\lib\tools.jar
In the system environment variable Path, append: %JAVA_HOME%\bin
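After adding the variables, open a new cmd window (existing windows do not pick up environment changes) and check that the JDK is found:

```bat
java -version
echo %JAVA_HOME%
```

If the Path entry is correct, `java -version` should report 1.8.0_151.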
II. Install and configure Scala
Visit the official download page http://www.scala-lang.org/download/2.11.8.html
Download: scala-2.11.8.msi
In the system environment variable Path, add: C:\Program Files (x86)\scala\bin
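A new cmd window should now find Scala on the Path; a quick check:

```bat
scala -version
```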
III. Install and configure Spark
1. Download Spark
Visit the official address http://spark.apache.org/downloads.html
Download file: spark-2.2.0-bin-hadoop2.7.tgz
2. Unzip the tgz file
I extracted the files to the directory: D:\spark-2.2.0-bin-hadoop2.7
In this directory you will find bin and several other folders.
3. Configuration
Add a system environment variable:
SPARK_HOME  D:\spark-2.2.0-bin-hadoop2.7
In the system environment variable Path, append: %SPARK_HOME%\bin
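With %SPARK_HOME%\bin on the Path, you can sanity-check the Spark installation from a fresh cmd window before moving on (spark-submit supports a --version flag that prints the version banner):

```bat
spark-submit --version
```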
IV. Install and configure Hadoop
1. Download Hadoop
Visit the official http://hadoop.apache.org/releases.html
You can download the binary distribution of version 2.7.6.
In my case, however, I simply searched online for a Hadoop 2.7.1 archive instead.
Its bin directory contains hadoop.dll and winutils.exe; these two files are all that is needed here.
Then unzip it to: D:\hadoop2.7.1
2. Configuration
Add a system environment variable:
HADOOP_HOME  D:\hadoop2.7.1
In the system environment variable Path, append: %HADOOP_HOME%\bin
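A quick way to confirm that winutils.exe is usable is to run one of its subcommands, for example ls (it will fail with a missing-DLL error if hadoop.dll or the required Visual C++ runtime is absent):

```bat
%HADOOP_HOME%\bin\winutils.exe ls D:\
```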
3. Download winutils
Download path: https://github.com/steveloughran/winutils
V. Configure PySpark
Anaconda (which includes Python) was installed before setting up the Spark environment. To use PySpark:
1. Copy the directory D:\spark-2.2.0-bin-hadoop2.7\python to E:\Anaconda3\Lib\site-packages.
2. Install py4j via pip install py4j.
3. Fix permissions with winutils.exe: chmod 777 D:\tmp\Hive (create the directory D:\tmp\Hive before running the command).
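Step 3 above amounts to the following two commands (winutils.exe comes from the Hadoop bin directory configured earlier):

```bat
mkdir D:\tmp\Hive
%HADOOP_HOME%\bin\winutils.exe chmod 777 D:\tmp\Hive
```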
4. Configuration
Add a system environment variable: PYTHONPATH  %SPARK_HOME%\python\lib\py4j;%SPARK_HOME%\python\lib\pyspark;E:\Anaconda3;
In the system environment variable Path, add: E:\Anaconda3
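As an alternative to copying the python directory into site-packages (step 1), a script can put Spark's Python bindings on sys.path itself. This is only a sketch; the fallback path below is the install directory used in this guide, so adjust it to your own layout:

```python
import os
import sys

# Locate Spark via SPARK_HOME; fall back to the install path used in this guide.
spark_home = os.environ.get("SPARK_HOME", r"D:\spark-2.2.0-bin-hadoop2.7")

# Entries listed in PYTHONPATH end up on sys.path at interpreter startup;
# appending here achieves the same effect, but for this process only.
spark_python = os.path.join(spark_home, "python")
sys.path.append(spark_python)

print(spark_python in sys.path)  # -> True
```

This keeps the Anaconda installation untouched, at the cost of repeating the setup in every entry-point script.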
VI. Verification
Open cmd and run: pyspark
or run: spark-shell
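If pyspark starts correctly, its prompt already provides a SparkContext as sc, and a one-line job confirms that jobs actually execute (a sketch of an interactive session):

```
>>> sc.parallelize([1, 2, 3, 4]).map(lambda x: x * 2).collect()
[2, 4, 6, 8]
```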