To build Spark under Windows 10, you need to install the Java JDK, Scala, Spark, and Hadoop.
I. Install and configure the JDK
Download the JDK; the version used here is jdk-8u151-windows-x64.exe
Add two environment variables:
JAVA_HOME  E:\java\jdk1.8.0_151  (note that the JDK here is installed outside the default C-drive directory; the path must not contain spaces)
CLASSPATH  %JAVA_HOME%\lib;%JAVA_HOME%\lib\tools.jar
In the system environment variable Path, append: %JAVA_HOME%\bin
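After adding the variables, open a new cmd window (existing windows do not pick up environment changes) and check that the JDK is found:

```bat
java -version
echo %JAVA_HOME%
```

If the Path entry is correct, `java -version` should report 1.8.0_151.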
II. Install and configure Scala
Visit the official download page http://www.scala-lang.org/download/2.11.8.html
Download: scala-2.11.8.msi
In the system environment variable Path, add: C:\Program Files (x86)\scala\bin
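A new cmd window should now find Scala on the Path; a quick check:

```bat
scala -version
```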
III. Install and configure Spark
1. Download Spark
Visit the official address http://spark.apache.org/downloads.html
Download file: spark-2.2.0-bin-hadoop2.7.tgz
2. Unzip the tgz file
I extracted the files to the directory: D:\spark-2.2.0-bin-hadoop2.7
In this directory you will find bin and several other folders.
3. Configuration
Add a system environment variable:
SPARK_HOME  D:\spark-2.2.0-bin-hadoop2.7
In the system environment variable Path, append: %SPARK_HOME%\bin
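With %SPARK_HOME%\bin on the Path, you can sanity-check the Spark installation from a fresh cmd window before moving on (spark-submit supports a --version flag that prints the version banner):

```bat
spark-submit --version
```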
IV. Install and configure Hadoop
1. Download Hadoop
Visit the official http://hadoop.apache.org/releases.html
You can download the binary distribution of version 2.7.6.
In my case, however, I simply searched online for a Hadoop 2.7.1 archive instead.
Its bin directory contains hadoop.dll and winutils.exe; these two files are all that is needed here.
Then unzip it to: D:\hadoop2.7.1
2. Configuration
Add a system environment variable:
HADOOP_HOME  D:\hadoop2.7.1
In the system environment variable Path, append: %HADOOP_HOME%\bin
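A quick way to confirm that winutils.exe is usable is to run one of its subcommands, for example ls (it will fail with a missing-DLL error if hadoop.dll or the required Visual C++ runtime is absent):

```bat
%HADOOP_HOME%\bin\winutils.exe ls D:\
```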
3. Download winutils
Download path: https://github.com/steveloughran/winutils
V. Configure PySpark
Anaconda (which includes Python) was installed before setting up the Spark environment. To use PySpark:
1. Copy the directory D:\spark-2.2.0-bin-hadoop2.7\python to E:\Anaconda3\Lib\site-packages.
2. Install py4j via pip install py4j.
3. Fix permissions with winutils.exe: chmod 777 D:\tmp\Hive (create the directory D:\tmp\Hive before running the command).
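Step 3 above amounts to the following two commands (winutils.exe comes from the Hadoop bin directory configured earlier):

```bat
mkdir D:\tmp\Hive
%HADOOP_HOME%\bin\winutils.exe chmod 777 D:\tmp\Hive
```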
4. Configuration
Add a system environment variable: PYTHONPATH  %SPARK_HOME%\python\lib\py4j;%SPARK_HOME%\python\lib\pyspark;E:\Anaconda3;
In the system environment variable Path, add: E:\Anaconda3
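As an alternative to copying the python directory into site-packages (step 1), a script can put Spark's Python bindings on sys.path itself. This is only a sketch; the fallback path below is the install directory used in this guide, so adjust it to your own layout:

```python
import os
import sys

# Locate Spark via SPARK_HOME; fall back to the install path used in this guide.
spark_home = os.environ.get("SPARK_HOME", r"D:\spark-2.2.0-bin-hadoop2.7")

# Entries listed in PYTHONPATH end up on sys.path at interpreter startup;
# appending here achieves the same effect, but for this process only.
spark_python = os.path.join(spark_home, "python")
sys.path.append(spark_python)

print(spark_python in sys.path)  # -> True
```

This keeps the Anaconda installation untouched, at the cost of repeating the setup in every entry-point script.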
VI. Verification
Open cmd and run: pyspark
or run: spark-shell
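If pyspark starts correctly, its prompt already provides a SparkContext as sc, and a one-line job confirms that jobs actually execute (a sketch of an interactive session):

```
>>> sc.parallelize([1, 2, 3, 4]).map(lambda x: x * 2).collect()
[2, 4, 6, 8]
```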