Ubuntu Spark Environment Setup


Before installing Spark, we need to install the JDK and Scala on our system.
You can download them from the official sites:
JDK: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Scala: http://www.scala-lang.org/download/
After the downloads complete, you will have two compressed packages.
Installing the JDK

First we'll install the JDK.

sudo mkdir /usr/lib/jdk

This command creates the JDK installation directory; here we plan to install it under /usr/lib/jdk.
Then switch to the directory containing the JDK package; for example, suppose we put it in ~/Desktop

cd ~/Desktop

Run the extraction command to unpack the package into the /usr/lib/jdk directory

sudo tar -zxvf jdk-8u91-linux-x64.tar.gz -C /usr/lib/jdk

Note that this requires root access; otherwise there is no way to write to the /usr/lib/jdk directory.
Next we need to configure the PATH so that the JDK commands can be run from any directory

sudo vim /etc/profile

Open the configuration file. Some tutorials will have you edit the ~/.bashrc file in your home directory; changes to .bashrc only affect the current user, while changes to /etc/profile take effect for all users after a restart.
At the end of the configuration file, add

export JAVA_HOME=/usr/lib/jdk/jdk1.8.0_91
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

Set JAVA_HOME according to the directory you actually extracted to.
A note on vim: if you do not use vim, you can replace vim with gedit wherever it appears in the commands; the same applies below.

source /etc/profile

This makes the current terminal reload the /etc/profile configuration. Then execute

java

If a long list of usage information is printed, congratulations, your JDK is installed successfully; otherwise there is likely a problem with your environment configuration, so please check it carefully.

Install Scala

Next we need to install Scala. The installation process is similar to the JDK.
First, create the installation directory

sudo mkdir /usr/lib/scala

Then extract the package into it

sudo tar -zxvf scala-2.11.8.tgz -C /usr/lib/scala

Finally, open /etc/profile and add the following configuration at the end

export SCALA_HOME=/usr/lib/scala/scala-2.11.8
export PATH=${SCALA_HOME}/bin:$PATH

After saving and exiting, run source /etc/profile again.
Then execute scala; if the Scala REPL prompt appears, the installation was successful.

Here you can enter
:quit
to exit Scala.

Install Spark

With all of the above installed, we can now install the main subject of this tutorial, Spark. First download the package we need from
http://spark.apache.org/downloads.html
This is the download page; note that under "Choose a package type" we select "Pre-built for Hadoop 2.6".

Then click the Download Spark link below it to start the download.
When it finishes, you will have the file spark-1.6.1-bin-hadoop2.6.tgz.
We also need to give Spark an installation directory

sudo mkdir /usr/lib/spark

Extract the file

sudo tar -zxvf spark-1.6.1-bin-hadoop2.6.tgz -C /usr/lib/spark

Configure the following in /etc/profile

export SPARK_HOME=/usr/lib/spark/spark-1.6.1-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:$PATH

source /etc/profile
After that, execute
pyspark

If the PySpark prompt appears, the installation is complete, and you can enter Python code here to run operations.
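For example, a minimal sketch of what you might type at the pyspark prompt (the shell pre-defines a SparkContext named sc; the surrounding log output will vary):

>>> data = sc.parallelize(range(10))
>>> data.map(lambda x: x * x).collect()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]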

Using PySpark in Python

Of course, we are not going to do all of our later development inside this interpreter, so the next step is to let Python load the Spark library.

So we need to add pyspark to Python's search path, which means editing the /etc/profile file once more and adding at the end

export PYTHONPATH=/usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python:$PYTHONPATH

This adds the Python library under the Spark directory to Python's search path.

However, since pyspark needs to call the Java libraries through Py4J, we also need a py4j folder under the /usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python path. In the /usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python/lib directory there is a py4j-0.9-src.zip archive; unzip it and put the resulting py4j folder
into the /usr/lib/spark/spark-1.6.1-bin-hadoop2.6/python/ directory.

Of course, these operations need to be done with root privileges.
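As an aside, if you would rather not edit /etc/profile or unzip py4j, a minimal sketch of adding the paths at runtime from inside a script looks like this (the paths assume the install locations used in this tutorial; adjust them to your own):

import sys

# Assumed install location from this tutorial; change it to match your system.
spark_home = "/usr/lib/spark/spark-1.6.1-bin-hadoop2.6"
sys.path.insert(0, spark_home + "/python")
# Python can import directly from the py4j zip, so unpacking it is optional here.
sys.path.insert(0, spark_home + "/python/lib/py4j-0.9-src.zip")

import pyspark  # should now import without touching /etc/profile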

Now start python from any directory

and then enter

import pyspark

to see whether pyspark can be imported correctly; if there are no error messages, pyspark imports normally.

In this way, you can write a .py file anywhere and import pyspark wherever you need it, for example:
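Here is a rough sketch of such a standalone script (the file name, app name, and sample data are only illustrative), assuming the configuration above:

# wordcount_demo.py - illustrative example, not part of the original tutorial
import pyspark

conf = pyspark.SparkConf().setAppName("WordCountDemo").setMaster("local")
sc = pyspark.SparkContext(conf=conf)

# Count words across a few in-memory lines
lines = sc.parallelize(["spark on ubuntu", "pyspark in python", "spark demo"])
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
print(counts.collect())

sc.stop()

You can then run it from any directory with python wordcount_demo.py.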

Importing PySpark in PyCharm

Of course, some users prefer to write Python in PyCharm, so here is a note on using pyspark from PyCharm as well.

First, click the drop-down box in the top-right corner and choose Edit Configurations...

Then, in the dialog that pops up, click the Edit button to the right of Environment Variables:

Click the plus sign to add two new entries,
PYTHONPATH and
SPARK_HOME
with the same values as configured in /etc/profile.

Then test it with the following code

import pyspark

conf = pyspark.SparkConf().setAppName("SparkDemo").setMaster("local")
sc = pyspark.SparkContext(conf=conf)
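As an optional extra check (these two lines are an illustrative addition, not part of the original test), you could append:

print(sc.parallelize([1, 2, 3, 4]).sum())
sc.stop()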

If this runs without errors, PyCharm can also load pyspark normally.
