Ubuntu Spark Environment Setup

Before installing Spark, we need to install the JDK and Scala on our system. Both can be downloaded from the official sites:
JDK: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Scala: http://www.scala-lang.org/download/
After the downloads complete, you will have two compressed packages.

Installing the JDK

First we'll install the JDK:

sudo mkdir /usr/lib/jdk

This command creates the JDK installation directory; here we plan to install under /usr/lib/jdk.
Then switch to the directory where the JDK package is located. For example, if we put the package in the ~/Desktop directory:

cd ~/Desktop

Run the extract command to unpack the package into the /usr/lib/jdk directory:

sudo tar -zxvf jdk-8u91-linux-x64.tar.gz -C /usr/lib/jdk

Note that this requires root privileges (hence the sudo); otherwise there is no way to write into the /usr/lib/jdk directory.
Next we need to configure the PATH environment variable so that the JDK commands can be executed from any directory:

sudo vim /etc/profile

This opens the configuration file. Some tutorials have you edit the ~/.bashrc file in your home directory instead; changes to ~/.bashrc affect only the current user, while changes to /etc/profile take effect for all users after their next login.
At the end of the configuration file, add:

export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_91
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

The JAVA_HOME path should match the directory you actually extracted to.
A word about vim: if you don't use vim, you can replace vim with gedit wherever it appears in these commands; the same applies below.

source /etc/profile

This reloads the /etc/profile configuration in the current terminal. Then execute:

java

If a long usage listing is printed, congratulations, your JDK is installed successfully; otherwise there is most likely a problem with your environment configuration, so check it carefully (java -version is another quick test).

Install Scala

Next we need to install Scala. The installation process is similar to the JDK's.
First, create the installation directory:

sudo mkdir /usr/lib/scala

Then unpack the package:

sudo tar -zxvf scala-2.11.8.tgz -C /usr/lib/scala

Finally, open /etc/profile and add the following configuration at the end:

export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export PATH=${SCALA_HOME}/bin:$PATH

After saving and exiting, run source /etc/profile again.
Then execute scala; if the Scala REPL prompt appears, the installation was successful.

Here you can enter
:quit
to exit Scala.

Install Spark

After installing the above, we can install today's protagonist, Spark. First download the package we need from:
http://spark.apache.org/downloads.html
Note that under "Choose a package type:" you should select "Pre-built for Hadoop 2.6".

Then click the Download Spark link below it to start the download; when it finishes you will have the spark-1.6.1-bin-hadoop2.6.tgz file.
As before, we give Spark an installation directory:

sudo mkdir /usr/lib/spark

Unpack the file:

sudo tar -zxvf spark-1.6.1-bin-hadoop2.6.tgz -C /usr/lib/spark

Then configure /etc/profile again:

export SPARK_HOME=/usr/lib/spark/spark-1.6.1-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:$PATH

source /etc/profile

After that, execute

pyspark

If the PySpark interactive shell starts, the installation is complete, and you can enter Python code here to run operations.
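
As a quick sanity check, you can run a small job directly in the shell (a minimal sketch; inside the pyspark shell the SparkContext is already provided as sc):

rdd = sc.parallelize([1, 2, 3, 4, 5])
print(rdd.map(lambda x: x * x).collect())

This should print [1, 4, 9, 16, 25], confirming that Spark can schedule and run a local job.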

Using PySpark in Python

Of course, we won't be developing inside an interpreter like this later on, so the next step is to let Python load the Spark library.

To do that we add PySpark to Python's module search path by editing /etc/profile once more and adding at the end:

export PYTHONPATH=/usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python:/usr/bin/python

This adds the Python library under the Spark directory to Python's search path (remember to run source /etc/profile afterwards).
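
You can verify that the path was picked up (a small check, assuming the export above has been sourced into the current shell):

import sys
print([p for p in sys.path if "spark" in p])

The Spark python directory should appear in the output.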

But since Python needs to call into the Java libraries, PySpark also requires py4j: a py4j folder must exist under the /usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python path. In the /usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python/lib directory there is a py4j-0.9-src.zip package; unzip it (with unzip, for example) so that the extracted py4j folder sits directly under the /usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python/ directory.

Of course, this operation needs to be done with root privileges.

Now start Python from any directory and enter:

import pyspark

This checks whether pyspark can be imported correctly; if no error message appears, pyspark imports normally.

This way, you can write .py files anywhere and import pyspark wherever you need it, as in the sketch below.
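
For example, a minimal standalone script might look like this (a sketch; the file name and the numbers are illustrative choices, not from the original tutorial):

# spark_demo.py: a hypothetical example script
import pyspark

conf = pyspark.SparkConf().setAppName("sparkDemo").setMaster("local")
sc = pyspark.SparkContext(conf=conf)
print(sc.parallelize(range(10)).sum())  # sums 0..9, should print 45
sc.stop()  # release the context when done

Running python spark_demo.py from any directory should print 45 along with Spark's startup log.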

Importing PySpark in PyCharm

Of course, some users prefer to write Python in PyCharm, so here is a note on using PySpark from PyCharm as well.

First we need to click the drop-down box in the top-right corner and choose Edit Configurations...

Then, in the dialog box that pops up, click the edit button to the right of Environment variables:.

Click the plus sign to add two new entries,
PYTHONPATH and
SPARK_HOME,
with the same values as the corresponding entries in /etc/profile.

Then test it with the following code:

import pyspark
conf = pyspark.SparkConf().setAppName("sparkDemo").setMaster("local")
sc = pyspark.SparkContext(conf=conf)

If the Spark startup output appears without errors, PyCharm can also load pyspark normally.

Reposted from: http://blog.csdn.net/u010171031/article/details/51849562
