Spark Learning Note: Spark Environment under Windows


I. JDK installation

1.1 Download the JDK

First you need to install the JDK and configure its environment variables; if you have already done this, you can skip this step. The JDK (full name: Java(TM) Platform, Standard Edition Development Kit) is downloaded from the Oracle website, under Java SE Downloads.

The two places marked in red on the download page are clickable, and clicking through shows more detailed information about the latest version.

After downloading, we can install the JDK directly. Installing the JDK on Windows is very simple: as with any ordinary software, double-click the downloaded EXE file and then choose your own installation directory (this directory will be needed when setting the environment variables).

1.2 JDK environment variable settings

Next, set the corresponding environment variable. Right-click "Computer" on the desktop, choose "Properties" > "Advanced system settings", then on the "Advanced" tab of System Properties click "Environment Variables". Find the "Path" variable among the system variables and click "Edit"; in the dialog that appears, append the path of the bin folder under the JDK installation directory from the previous step. My bin folder path here is C:\Program Files\java\jre1.8.0_92\bin, so I add that to Path, separated from the existing entries by an English semicolon ";".
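
If you prefer the command line to the dialogs, a minimal sketch of the same change (assuming the bin path above) is:

rem append the JDK bin directory to the user-level Path; takes effect in new CMD windows only
rem (beware: setx truncates values longer than 1024 characters)
setx PATH "%PATH%;C:\Program Files\java\jre1.8.0_92\bin"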

Once this is set, open a CMD command-line window in any directory and run the following command to see whether the setting succeeded:

java -version

If the command prints the relevant Java version information, the JDK installation step is complete.
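
For reference, the output should look roughly like this (the exact build strings depend on the version you installed):

java version "1.8.0_92"
Java(TM) SE Runtime Environment (build 1.8.0_92-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.92-b14, mixed mode)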

II. Scala installation

We download Scala from the official website: http://www.scala-lang.org/. At the time of writing, the latest version is 2.12.3.

Since we are working in a Windows environment (which is the point of this article), we choose the Windows version to download.

Once you have downloaded the Scala MSI file, double-click it to run the installation. After a successful installation, the Scala bin directory is added to the Path system variable by default (if not, add the bin directory under the Scala installation directory to the Path system variable, just as in the JDK installation step above). To verify the installation, open a new CMD window, type scala, and press Enter; if you enter the Scala interactive command environment, the installation succeeded.
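
A quick sketch of the check (the version string assumes the 2.12.3 installer from above):

scala -version
rem prints something like: Scala code runner version 2.12.3 ...

And inside the interactive environment, any trivial expression confirms the REPL evaluates:

scala> 1 + 1
res0: Int = 2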

Note: If no version information is displayed and you do not enter Scala's interactive command line, there are usually two possibilities:
1. The path of the bin folder under the Scala installation directory was not added correctly to the Path system variable; add it as described in the JDK installation.
2. Scala was not installed correctly; repeat the steps above.

III. Spark installation

We download Spark from the official website: http://spark.apache.org/, choosing a Spark build that comes with a Hadoop version:

The downloaded file is about 200 MB: spark-2.2.0-bin-hadoop2.7.

This is a pre-built version, meaning it has already been compiled and can be used directly after downloading. Spark's source code can also be downloaded, but it must be compiled manually before use. After the download completes, extract the file (it may need to be extracted twice), preferably to a disk's root directory, and rename the folder to Spark; this keeps things simple and avoids errors. It is also important that Spark's directory path contains no spaces; folder names like "Program Files" will not work. Here we create a new Spark folder on the C drive to hold it.
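
After extraction and renaming, the top of the directory tree should look roughly like this (only a few entries shown; paths assume the C:\Spark location just described):

C:\Spark\bin\     spark-shell.cmd, spark-submit.cmd, ...
C:\Spark\conf\    configuration templates such as spark-defaults.conf.template
C:\Spark\jars\    the Spark and dependency JARs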

After extraction, you can basically run Spark from the cmd command line. But at this point, every time you run spark-shell (Spark's command-line interactive window), you must first cd into Spark's installation directory, which is cumbersome, so you can add Spark's bin directory to the Path system variable. For example, my Spark bin directory here is D:\Spark\bin; add this path to the Path system variable, using the same method as the environment variable settings in the JDK installation. Once the system variable is set, you can execute the spark-shell command directly from the cmd command line in any directory to open Spark's interactive command-line mode.
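
To confirm the Path change took effect, a new CMD window should be able to locate the launcher script; a sketch (path assumed from above):

where spark-shell

which should print something like D:\Spark\bin\spark-shell.cmd.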

After the system variable is set, you can run spark-shell from cmd in any directory, but at this point you may encounter various errors, mainly because Spark is based on Hadoop, so a Hadoop runtime environment also needs to be configured.

Next, we also need to install Hadoop.

IV. Hadoop installation

The various historical versions of Hadoop can be found under Hadoop releases. Because the Spark we downloaded is built against Hadoop 2.7 (in the Spark download step above we chose Pre-built for Hadoop 2.7), I choose version 2.7.1 here. Select the appropriate version and click through to the detailed download page.

Select the link marked in red in the image to download. The src version here is the source code; download the corresponding src file if you need to modify Hadoop or want to compile it yourself. What I download here is the compiled version, i.e., the "hadoop-2.7.1.tar.gz" file shown in the figure.

Download it and unzip it to a chosen directory; mine here is C:\Hadoop.

Then go to the environment variables section and set HADOOP_HOME to the Hadoop extraction directory.

Then add the bin directory under that directory to the system Path variable; mine here is C:\Hadoop\bin. If the HADOOP_HOME system variable has already been added, you can also use %HADOOP_HOME%\bin to specify the bin folder path. Once these two system variables are set, open a new CMD window and enter the spark-shell command directly.
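
The same two settings can be made from the command line; a sketch (the extraction directory is an assumption, matching the paths used later in this article):

rem set HADOOP_HOME for new CMD windows
setx HADOOP_HOME "C:\Hadoop\hadoop-2.7.1"
rem setx does not update the current window, so spell the bin path out rather than using %HADOOP_HOME%
setx PATH "%PATH%;C:\Hadoop\hadoop-2.7.1\bin"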

Under normal circumstances, you can now run successfully and enter Spark's command-line environment, but some users may encounter a null pointer error. This is mainly because the winutils.exe file is missing from Hadoop's bin directory. The solution is as follows.

Go to https://github.com/steveloughran/winutils, select the Hadoop version number you installed, then go into its bin directory and find the winutils.exe file. To download it, click the winutils.exe file; on the page that opens there is a Download button in the upper right, which you click to download the file.

After winutils.exe is downloaded, put the file into Hadoop's bin directory; mine here is C:\Hadoop\hadoop-2.7.1\bin.
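
If you would rather fetch the file from the command line, a sketch using PowerShell from cmd (the URL pattern is an assumption based on the repository layout; substitute the hadoop-x.y.z folder matching your version):

powershell -Command "Invoke-WebRequest https://github.com/steveloughran/winutils/raw/master/hadoop-2.7.1/bin/winutils.exe -OutFile C:\Hadoop\hadoop-2.7.1\bin\winutils.exe"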


Then, in an open cmd window, run the following to modify the permissions on /tmp/hive (777 grants all permissions):

C:\Hadoop\hadoop-2.7.1\bin\winutils.exe chmod 777 /tmp/hive
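
You can optionally verify the result with winutils' own ls subcommand (a sketch, using the same path as above):

C:\Hadoop\hadoop-2.7.1\bin\winutils.exe ls /tmp/hive

The mode bits at the start of the output should now read drwxrwxrwx.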

However, you may find that some other errors are still reported (this error can also occur on Linux):

<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql

The reason is that Spark does not have permission to write its metastore_db directory.

How to handle it: grant 777 permissions.

In a Linux environment, operate as root (or with sudo):

sudo chmod 777 /home/hadoop/spark

# for convenience, this grants everyone write permission
sudo chmod a+w /home/hadoop/spark

In a Windows environment:

The folder where Spark is stored must not be set to read-only or hidden.

Grant the folder Full Control permissions (via the folder's Properties > Security tab).
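
The same grant can be scripted; a sketch using icacls (assuming Spark lives in D:\Spark as above; granting your own account instead of Everyone is stricter):

rem /T applies the grant recursively to all subfolders and files
icacls D:\Spark /grant Everyone:F /T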

After these steps, open a new CMD window again; if everything is normal, you should now be able to start Spark by entering spark-shell directly.
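
Roughly, a healthy startup looks like the following sketch (version strings assume the spark-2.2.0-bin-hadoop2.7 build used above; log lines elided):

spark-shell
...
Spark context available as 'sc' (master = local[*], app id = local-...).
Spark session available as 'spark'.

scala> spark.version
res0: String = 2.2.0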
