Build Spark under Windows 10

Source: Internet
Author: User

To build Spark under Windows 10, you need to install the Java JDK, Scala, Spark, and Hadoop.

I. Install and Configure the JDK

Download the JDK installer: jdk-8u151-windows-x64.exe

Add two environment variables:

JAVA_HOME = E:\java\jdk1.8.0_151 (note: the JDK here is installed somewhere other than the default directory on the C drive; the path must contain no spaces)

CLASSPATH = %JAVA_HOME%\lib;%JAVA_HOME%\lib\tools.jar

Then append %JAVA_HOME%\bin to the existing system environment variable Path.
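Equivalently, the two variables can be set from a command prompt with setx (a sketch; E:\java\jdk1.8.0_151 is this article's example install directory — substitute your own). Note that setx values only appear in newly opened consoles, and that Path itself is safest to edit through the System Properties → Environment Variables dialog, since setx truncates long values.

```shell
:: Set JAVA_HOME and CLASSPATH for the current user (takes effect in new consoles)
setx JAVA_HOME "E:\java\jdk1.8.0_151"
setx CLASSPATH "E:\java\jdk1.8.0_151\lib;E:\java\jdk1.8.0_151\lib\tools.jar"
```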

II. Install and Configure Scala

Visit the official address http://www.scala-lang.org/download/2.11.8.html

Download: scala-2.11.8.msi

In the system environment variable Path, add: C:\Program Files (x86)\scala\bin
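With the Path entry in place, a freshly opened console should be able to find Scala; checking the version confirms the install (the exact banner text varies by build):

```shell
:: Should report version 2.11.8 in a newly opened cmd window
scala -version
```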

III. Install and Configure Spark

1. Download Spark

Visit the official address http://spark.apache.org/downloads.html

Download file: spark-2.2.0-bin-hadoop2.7.tgz

2. Unzip the tgz file

I extracted the files to the directory D:\spark-2.2.0-bin-hadoop2.7.

This directory contains bin and several other folders.

3. Configuration

Add a system environment variable:

SPARK_HOME = D:\spark-2.2.0-bin-hadoop2.7

Then append %SPARK_HOME%\bin to the system environment variable Path.
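As with JAVA_HOME, the variable can be set from cmd (a sketch; add %SPARK_HOME%\bin to Path through the Environment Variables dialog afterwards):

```shell
:: New consoles opened after this will see SPARK_HOME
setx SPARK_HOME "D:\spark-2.2.0-bin-hadoop2.7"
```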

IV. Install and Configure Hadoop

1. Download Hadoop

Visit the official http://hadoop.apache.org/releases.html

Binary files for version 2.7.6 can be downloaded there.

However, when I installed, I searched Baidu directly for a hadoop 2.7.1 archive.

Its bin directory contains hadoop.dll and winutils.exe; these two files are enough.

Unzip it to: D:\hadoop2.7.1

2. Configuration

Add a system environment variable:

HADOOP_HOME = D:\hadoop2.7.1

Then append %HADOOP_HOME%\bin to the system environment variable Path.

3. Download winutils

Download path: https://github.com/steveloughran/winutils
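The steps of this section amount to the following (a sketch; hadoop.dll and winutils.exe from the winutils repository must end up in %HADOOP_HOME%\bin so Spark can find them):

```shell
:: Point HADOOP_HOME at the extracted directory; its bin\ subfolder
:: must contain hadoop.dll and winutils.exe
setx HADOOP_HOME "D:\hadoop2.7.1"
```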

V. Configure PySpark

Anaconda (which includes Python) was installed before setting up the Spark environment. To use PySpark:

1. Copy the D:\spark-2.2.0-bin-hadoop2.7\python folder to the E:\Anaconda3\Lib\site-packages path.

2. Install Py4J via pip install py4j.

3. Grant permissions by running winutils.exe chmod 777 D:\tmp\Hive (create the directory D:\tmp\Hive before running the command).

4. Add a system environment variable: PYTHONPATH = %SPARK_HOME%\python\lib\py4j;%SPARK_HOME%\python\lib\pyspark;E:\Anaconda3

In the system environment variable Path, add: E:\Anaconda3
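Steps 2 and 3 above as console commands (a sketch; it assumes %HADOOP_HOME%\bin is already on Path so that winutils.exe resolves):

```shell
:: Install the Py4J bridge that PySpark uses to talk to the JVM
pip install py4j
:: Create the Hive scratch directory, then open up its permissions
mkdir D:\tmp\Hive
winutils.exe chmod 777 D:\tmp\Hive
```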

VI. Verification

Start cmd and enter: pyspark

or enter: spark-shell

