Build a Spark development environment in Ubuntu


Configure Ubuntu to use Python to develop Spark applications

Basic environment configuration on 64-bit Ubuntu

Install the JDK: download jdk-8u45-linux-x64.tar.gz and decompress it to /opt/jdk1.8.0_45.

Address: http://www.oracle.com/technetwork/java/javase/downloads/index.html
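For reference, a typical extraction and version check might look like this (assuming the archive was downloaded to the current directory):

python@ubuntu:~$ sudo tar -zxvf jdk-8u45-linux-x64.tar.gz -C /opt
python@ubuntu:~$ /opt/jdk1.8.0_45/bin/java -version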

Install Scala: download scala-2.11.6.tgz and unzip it to /opt/scala-2.11.6.

Address: http://www.scala-lang.org/
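The same pattern works for Scala; the second command should report version 2.11.6:

python@ubuntu:~$ sudo tar -zxvf scala-2.11.6.tgz -C /opt
python@ubuntu:~$ /opt/scala-2.11.6/bin/scala -version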

Install Spark: download spark-1.3.1-bin-hadoop2.6.tgz and unzip it to /opt/spark-hadoop.

Address: http://spark.apache.org/downloads.html
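The Spark archive unpacks into a versioned directory, so renaming it produces the /opt/spark-hadoop path used throughout this article:

python@ubuntu:~$ sudo tar -zxvf spark-1.3.1-bin-hadoop2.6.tgz -C /opt
python@ubuntu:~$ sudo mv /opt/spark-1.3.1-bin-hadoop2.6 /opt/spark-hadoop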

Configure the environment variables: edit /etc/profile with the following command:

python@ubuntu:~$ sudo gedit /etc/profile

Add the following at the end of the file:

# Setting JDK environment variables
export JAVA_HOME=/opt/jdk1.8.0_45
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${JRE_HOME}/bin:$PATH

# Setting Scala environment variables
export SCALA_HOME=/opt/scala-2.11.6
export PATH=${SCALA_HOME}/bin:$PATH

# Setting Spark environment variables
export SPARK_HOME=/opt/spark-hadoop/

# PYTHONPATH: add the pyspark module shipped with Spark to the Python search path
export PYTHONPATH=/opt/spark-hadoop/python

Restart the computer to make /etc/profile take effect permanently. To apply it only temporarily, open a command window and run source /etc/profile, which makes the settings effective in the current session.
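To confirm the variables are set in the current session (the variable checked here is just an example):

python@ubuntu:~$ source /etc/profile
python@ubuntu:~$ echo $SPARK_HOME
/opt/spark-hadoop/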

  • Test the installation

    • Open the command window and switch to the Spark root directory.

    • Run ./bin/spark-shell to open a Scala shell connected to Spark.

If no error messages appear during startup and the scala> prompt is displayed, the startup is successful.

    • Run ./bin/pyspark to open a Python shell connected to Spark.

If no errors occur during startup and the >>> prompt appears, the startup is successful; a quick sanity check follows this list.

    • Access the Spark web UI through a browser to test that Spark is available; while a shell session is running, the UI is served at http://localhost:4040 by default.
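As a quick sanity check inside the pyspark shell (sc is the SparkContext that the shell creates automatically), a trivial job such as the following should run without errors:

>>> sc.parallelize(range(10)).count()
10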

  • Develop Spark applications in Python

    • PYTHONPATH was already set above, so the pyspark module shipped with Spark is on Python's search path.

    • Open the Spark installation directory and copy the py4j folder under python/build into the python directory, so that py4j sits next to pyspark on the PYTHONPATH.

    • Open a command-line window and enter python. The Python version here is 2.7.6; note that this version of Spark does not support Python 3.

    • Enter import pyspark; if no error is reported, the setup required before development is complete.

    • Create a project in PyCharm and test it with a short script such as the sketch below:
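A minimal sketch of such a test script, assuming the PYTHONPATH and py4j setup above (the file name test_spark.py and the app name are illustrative):

# test_spark.py -- smoke test for the local Spark installation
from pyspark import SparkConf, SparkContext

# Run Spark locally in a single process; the app name is arbitrary
conf = SparkConf().setMaster("local").setAppName("SparkTest")
sc = SparkContext(conf=conf)

# Distribute the numbers 1..100 and sum them with a Spark action
data = sc.parallelize(range(1, 101))
print(data.reduce(lambda a, b: a + b))  # prints 5050 if everything works

sc.stop()

Running the script with python test_spark.py (or with $SPARK_HOME/bin/spark-submit test_spark.py) and seeing 5050 confirms that pyspark, py4j, and the Spark runtime are wired together correctly.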

For more Spark tutorials, see the following:

Install and configure Spark in CentOS 7.0

Spark1.0.0 Deployment Guide

Install Spark0.8.0 in CentOS 6.2 (64-bit)

Introduction to Spark and its installation and use in Ubuntu

Install the Spark cluster (on CentOS)

Hadoop vs Spark Performance Comparison

Spark installation and learning

Spark Parallel Computing Model

