Deploy an Apache Spark Cluster on Ubuntu
1. Software Environment
This article describes how to deploy an Apache Spark standalone cluster on Ubuntu. The required software is as follows:
- Ubuntu 15.10 x64
- Apache Spark 1.5.1
2. Install the Required Software
# sudo apt-get install git -y
# sudo apt-add-repository ppa:webupd8team/java -y
# sudo apt-get update -y
# sudo apt-get install oracle-java8-installer -y
# sudo apt-get install oracle-java8-set-default
# sudo apt-get install maven gradle -y
# sudo apt-get install sbt -y
# sudo wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
# sudo tar -xvf spark*.tgz
# sudo chmod 755 spark*
# sudo apt-get update
# sudo apt-get install -y openjdk-7-jdk
# sudo apt-get install -y autoconf libtool
# sudo apt-get -y install build-essential python-dev python-boto libcurl4-nss-dev libsasl2-dev maven libapr1-dev libsvn-dev
# sudo apt-key adv --keyserver keyserver.ubuntu.com --recv E56151BF
# DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')
# CODENAME=$(lsb_release -cs)
Add the Mesosphere repository and install Mesos:
# echo "deb http://repos.mesosphere.io/${DISTRO} ${CODENAME} main" | sudo tee /etc/apt/sources.list.d/mesosphere.list
# sudo apt-get -y update
# sudo apt-get -y install mesos
Apache Mesos is also installed so that the Spark cluster can later be moved from standalone mode to running on Mesos.
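Before moving on, it can help to confirm that the tools installed above actually ended up on the PATH. The sketch below is a minimal, hypothetical helper (check_tools is not part of any package); it relies only on the shell builtin command -v and prints each command it cannot find.

```shell
# Hypothetical helper: report which of the given commands are missing from PATH.
# Prints "missing: NAME" for each one and returns 1 if any are missing, 0 otherwise.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage, after defining the function in the same shell:
# check_tools git java mvn gradle sbt || echo "install the missing tools before continuing"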
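The spark-1.5.1-bin-hadoop2.6 build is used for the Spark standalone cluster. On each node, set SPARK_LOCAL_IP in conf/spark-env.sh to that node's IP address (MYIP is a placeholder):
conf/spark-env.sh:
#!/usr/bin/env bash
export SPARK_LOCAL_IP=MYIP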
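Editing this file by hand on every node is easy to get wrong. A minimal sketch, assuming the tarball was unpacked into a spark-1.5.1-bin-hadoop2.6 directory and that the first address reported by hostname -I is the one Spark should bind to; write_spark_env is a hypothetical helper name:

```shell
# Hypothetical helper: write SPARK_HOME/conf/spark-env.sh so that this node
# binds Spark to LOCAL_IP via SPARK_LOCAL_IP (overwrites any existing file).
write_spark_env() {
  local spark_home=${1:-} local_ip=${2:-}
  if [ -z "$spark_home" ] || [ -z "$local_ip" ]; then
    echo "usage: write_spark_env SPARK_HOME LOCAL_IP" >&2
    return 2
  fi
  mkdir -p "$spark_home/conf"
  cat > "$spark_home/conf/spark-env.sh" <<EOF
#!/usr/bin/env bash
export SPARK_LOCAL_IP=$local_ip
EOF
  chmod 755 "$spark_home/conf/spark-env.sh"
}
```

Usage, run from the directory that contains the unpacked Spark build:
# write_spark_env "$PWD/spark-1.5.1-bin-hadoop2.6" "$(hostname -I | awk '{print $1}')"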
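3. Start a Node
Start the master on the master node first, then start a worker on each worker node, passing it the master URL in the form spark://HOST:PORT (the standalone master listens on port 7077 by default). Run both commands from the Spark directory:
# sbin/start-master.sh
# sbin/start-slave.sh spark://masterIP:7077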
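Because the worker rejects a master address without the spark:// scheme, a small wrapper can normalize whatever address is at hand. The sketch below uses hypothetical helper names (spark_master_url and start_worker are not part of Spark); it adds the scheme, defaults the port to 7077, and with DRY_RUN=1 only prints the command it would run.

```shell
# Hypothetical helper: normalize a master address to spark://HOST:PORT,
# defaulting to the standalone master port 7077.
spark_master_url() {
  local addr=${1#spark://}   # drop an existing scheme so it is not doubled
  case $addr in
    *:*) ;;                  # port already given
    *) addr="$addr:7077" ;;  # default master port
  esac
  echo "spark://$addr"
}

# Hypothetical wrapper: start a worker against the master.
# With DRY_RUN=1 it prints the command instead of running it.
start_worker() {
  local url
  url=$(spark_master_url "$1") || return
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "sbin/start-slave.sh $url"
  else
    sbin/start-slave.sh "$url"
  fi
}
```

Usage: DRY_RUN=1 start_worker masterIP prints sbin/start-slave.sh spark://masterIP:7077; drop DRY_RUN=1 to actually start the worker from the Spark directory.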
For more information, see:
- http://spark.apache.org/docs/latest/running-on-mesos.html
- https://mesosphere.com/downloads/
- https://spark.apache.org/downloads.html
4. Install Other Tools and Servers
1) Install MongoDB 3.0.4
# sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
# echo "deb http://repo.mongodb.org/apt/ubuntu "$(lsb_release -sc)"/mongodb-org/3.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.0.list
# sudo apt-get update
# sudo apt-get install -y mongodb-org
# sudo apt-get install -y mongodb-org=3.0.4 mongodb-org-server=3.0.4 mongodb-org-shell=3.0.4 mongodb-org-mongos=3.0.4 mongodb-org-tools=3.0.4
# sudo service mongod start
# sudo tail -5000 /var/log/mongodb/mongod.log
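Typing the same version for five packages invites mistakes, and a later apt-get upgrade could still move them past 3.0.4. A minimal sketch with hypothetical helper names that builds the pinned package list from a single version string and emits "hold" lines in the format dpkg --set-selections reads:

```shell
# Package names from the pinned install command above.
MONGODB_PKGS=(mongodb-org mongodb-org-server mongodb-org-shell mongodb-org-mongos mongodb-org-tools)

# Hypothetical helper: print "pkg=VERSION" for every MongoDB package, space-separated.
mongodb_pinned_pkgs() {
  local version=$1 pkg out=()
  for pkg in "${MONGODB_PKGS[@]}"; do
    out+=("$pkg=$version")
  done
  echo "${out[*]}"
}

# Hypothetical helper: print one "PACKAGE hold" line per MongoDB package.
mongodb_hold_selections() {
  local pkg
  for pkg in "${MONGODB_PKGS[@]}"; do
    echo "$pkg hold"
  done
}
```

Usage:
# sudo apt-get install -y $(mongodb_pinned_pkgs 3.0.4)
# mongodb_hold_selections | sudo dpkg --set-selections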
2) Install PostgreSQL
For more information, see:
https://www.digitalocean.com/community/tutorials/how-to-install-and-use-postgresql-on-ubuntu-14-04
# sudo apt-get update
# sudo apt-get install postgresql postgresql-contrib
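The database service can take a moment to accept connections after installation, so scripts that use it right away benefit from a retry loop. A minimal sketch of a generic, hypothetical helper (wait_until), shown here with psql run as the postgres system user that the Ubuntu packages create:

```shell
# Hypothetical helper: run a check command up to ATTEMPTS times, sleeping DELAY
# seconds between tries. Returns 0 as soon as the check succeeds, 1 otherwise.
wait_until() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

Usage:
# wait_until 10 1 sudo -u postgres psql -qc 'SELECT 1' >/dev/null && echo "PostgreSQL is up"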
3) Install Redis
For more information, see:
https://www.digitalocean.com/community/tutorials/how-to-install-and-use-redis
# sudo apt-get install build-essential
# sudo apt-get install tcl8.5
# sudo wget http://download.redis.io/releases/redis-stable.tar.gz
# sudo tar xzf redis-stable.tar.gz
# cd redis-stable
# make
# make test
# sudo make install
# cd utils
# sudo ./install_server.sh
# sudo service redis_6379 start
# redis-cli
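redis-cli ping answers PONG when the server is reachable, which makes a simple scripted health check. A minimal sketch (redis_alive is a hypothetical name) that defaults to the local host and to port 6379, the port of the redis_6379 service started above:

```shell
# Hypothetical check: succeed only if redis-cli gets "PONG" back from the server.
redis_alive() {
  local host=${1:-127.0.0.1} port=${2:-6379} reply
  reply=$(redis-cli -h "$host" -p "$port" ping 2>/dev/null) || return 1
  [ "$reply" = "PONG" ]
}
```

Usage:
# redis_alive && echo "Redis is up"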
4) Install Scala 2.11.7
For more information, see:
- http://blog.prabeeshk.com/blog/2014/10/31/install-apache-spark-on-ubuntu-14-dot-04/
- http://www.scala-lang.org/download/2.11.7.html
Run the following commands:
# sudo wget http://downloads.typesafe.com/scala/2.11.7/scala-2.11.7.deb
# sudo dpkg -i scala-2.11.7.deb
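To confirm that the package installed the expected release, the version can be read back from scala -version. A minimal sketch with a hypothetical helper; it assumes the banner looks like "Scala code runner version 2.11.7 -- ..." and the usage below redirects stderr because the runner may print the banner there:

```shell
# Hypothetical helper: extract the first "version X.Y.Z" number from a banner string.
# Prints nothing if no version number is found.
scala_version_from() {
  sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' <<<"$1" | head -n 1
}
```

Usage:
# v=$(scala_version_from "$(scala -version 2>&1)"); [ "$v" = "2.11.7" ] && echo "Scala $v installed"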
Next, install sbt and Gradle. For more information on installing sbt, see:
http://www.scala-sbt.org/0.13/tutorial/Installing-sbt-on-Linux.html
# echo "deb http://dl.bintray.com/sbt/debian /" | sudo tee -a /etc/apt/sources.list.d/sbt.list
# sudo apt-get update
# sudo apt-get install sbt
# sudo apt-get install unzip
# curl -s get.gvmtool.net | bash
# source "/root/.gvm/bin/gvm-init.sh"
# gvm install gradle