CentOS 6.4 + Hadoop 2.2.0 + Spark Pseudo-Distributed Installation
Hadoop version: 2.2.0 (the stable release).
Spark version: spark-0.9.1-bin-hadoop2, downloaded from http://spark.apache.org/downloads.html
Spark offers three prebuilt packages:
For Hadoop 1 (HDP1, CDH3): find an Apache mirror or direct file download
For CDH4: find an Apache mirror or direct file download
For Hadoop 2 (HDP2, CDH5): find an Apache mirror or direct file download
My Hadoop version is 2.2.0, so I downloaded the package for Hadoop 2.
For an introduction to Spark, see http://spark.apache.org/
Apache Spark is a fast and general engine for large-scale data processing.
Spark requires a Scala environment at runtime; download the latest Scala from http://www.scala-lang.org/
Scala ("scalable language") is a multi-paradigm programming language, similar to Java, designed to integrate the features of object-oriented and functional programming. Scala runs on the JVM: it is a pure object-oriented language that seamlessly blends imperative and functional programming styles.
OK, let's start configuring Spark.
Everything is installed under the hadoop user's home directory, so we edit /home/hadoop/.bashrc directly:
[hadoop@localhost ~]$ cat .bashrc
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# User specific aliases and functions
export HADOOP_HOME=/home/hadoop
export HBASE_HOME=/home/hadoop/hbase
export HIVE_HOME=/home/hadoop/hive
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=/home/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/home/hadoop/scala
export SPARK_HOME=/home/hadoop/spark
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$HBASE_HOME/lib
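After editing, a quick check confirms the variables resolve as intended (assuming the paths above match your layout):
[hadoop@localhost ~]$ source .bashrc
[hadoop@localhost ~]$ echo $SPARK_HOME $SCALA_HOME
/home/hadoop/spark /home/hadoop/scala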
1. Scala installation:
Unpack Scala into the hadoop user's home directory and create a soft link:
ln -s scala-2.11.0 scala # create soft link
lrwxrwxrwx. 1 hadoop 12 May 21 09:15 scala -> scala-2.11.0
drwxrwxr-x. 6 hadoop 4096 Apr 17 scala-2.11.0
Edit .bashrc and add:
export SCALA_HOME=/home/hadoop/scala
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
Save, then run source .bashrc to make the environment variables take effect.
Verify the installation:
[hadoop@localhost ~]$ scala -version
Scala code runner version 2.11.0 -- Copyright 2002-2013, LAMP/EPFL
The version prints normally, which indicates the installation succeeded.
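A one-liner can also confirm that the runner actually compiles and executes code, not just that it prints its version (a minimal smoke test using scala -e):
[hadoop@localhost ~]$ scala -e 'println((1 to 10).sum)'
55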
2. Spark configuration:
tar -xzvf spark-0.9.1-bin-hadoop2.tgz
ln -s spark-0.9.1-bin-hadoop2 spark
Configure .bashrc:
export SPARK_HOME=/home/hadoop/spark
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
After editing, run source .bashrc to make the environment variables take effect.
spark-env.sh configuration:
spark-env.sh does not exist by default; generate it from the template with cat spark-env.sh.template > spark-env.sh
Then edit spark-env.sh and add:
export SCALA_HOME=/home/hadoop/scala
export JAVA_HOME=/usr/java/jdk
export SPARK_MASTER=localhost
export SPARK_LOCAL_IP=localhost
export HADOOP_HOME=/home/hadoop
export SPARK_HOME=/home/hadoop/spark
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
Save and exit.
3. Start Spark
As with Hadoop's directory layout, the shell scripts for starting and stopping daemons live under sbin in the Spark home:
-rwxrwxr-x. 1 hadoop 2504 Mar 27 slaves.sh
-rwxrwxr-x. 1 hadoop 1403 Mar 27 spark-config.sh
-rwxrwxr-x. 1 hadoop 4503 Mar 27 spark-daemon.sh
-rwxrwxr-x. 1 hadoop 1176 Mar 27 spark-daemons.sh
-rwxrwxr-x. 1 hadoop  965 Mar 27 spark-executor
-rwxrwxr-x. 1 hadoop 1263 Mar 27 start-all.sh
-rwxrwxr-x. 1 hadoop 2384 Mar 27 start-master.sh
-rwxrwxr-x. 1 hadoop 1520 Mar 27 start-slave.sh
-rwxrwxr-x. 1 hadoop 2258 Mar 27 start-slaves.sh
-rwxrwxr-x. 1 hadoop 1047 Mar 27 stop-all.sh
-rwxrwxr-x. 1 hadoop 1124 Mar 27 stop-master.sh
-rwxrwxr-x. 1 hadoop 1427 Mar 27 stop-slaves.sh
[hadoop@localhost sbin]$ pwd
/home/hadoop/spark/sbin
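If you ever need to bring up one side at a time (for example, to debug the master alone), the individual scripts work too; a sketch under this same layout:
[hadoop@localhost sbin]$ ./start-master.sh   # master only; web UI on port 8080
[hadoop@localhost sbin]$ ./start-slaves.sh   # workers on the hosts listed in conf/slaves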
For this pseudo-distributed setup, you only need to run start-all.sh:
[hadoop@localhost sbin]$ ./start-all.sh
rsync from localhost
rsync: change_dir "/home/hadoop/spark-0.9.1-bin-hadoop2/sbin/localhost" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-localhost.out
localhost: rsync from localhost
localhost: rsync: change_dir "/home/hadoop/spark-0.9.1-bin-hadoop2/localhost" failed: No such file or directory (2)
localhost: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-localhost.out
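The rsync lines look alarming, but both daemons still start. They appear to come from spark-daemon.sh, which, when SPARK_MASTER is set, tries to rsync the Spark installation from that host before launching each daemon (an assumption from reading the script, not official documentation). If the warnings bother you, removing that variable from conf/spark-env.sh should silence them:
# in conf/spark-env.sh -- assumed fix; the daemons start either way
# export SPARK_MASTER=localhost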
Check whether the startup succeeded with jps:
[hadoop@localhost sbin]$ jps
4706 Jps
3692 DataNode
3876 SecondaryNameNode
4637 Worker
4137 NodeManager
4517 Master
4026 ResourceManager
3587 NameNode
The Master and Worker processes are both present, which indicates the startup succeeded.
You can view the Spark cluster status at http://localhost:8080/.
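On a headless box, the UI can also be probed from the shell (a quick sanity check; the exact HTML is version-dependent):
[hadoop@localhost sbin]$ curl -s http://localhost:8080/ | grep -i worker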
4. Run a Spark program
First, enter the bin directory under the Spark home:
[hadoop@localhost sbin]$ ll ../bin/
total 56
-rw-r--r--. 1 hadoop 2601 Mar 27 compute-classpath.cmd
-rwxrwxr-x. 1 hadoop 3330 Mar 27 compute-classpath.sh
-rwxrwxr-x. 1 hadoop 2070 Mar 27 pyspark
-rw-r--r--. 1 hadoop 1827 Mar 27 pyspark2.cmd
-rw-r--r--. 1 hadoop 1000 Mar 27 pyspark.cmd
-rwxrwxr-x. 1 hadoop 3055 Mar 27 run-example
-rw-r--r--. 1 hadoop 2046 Mar 27 run-example2.cmd
-rw-r--r--. 1 hadoop 1012 Mar 27 run-example.cmd
-rwxrwxr-x. 1 hadoop 5151 Mar 27 spark-class
-rwxrwxr-x. 1 hadoop 3212 Mar 27 spark-class2.cmd
-rw-r--r--. 1 hadoop 1010 Mar 27 spark-class.cmd
-rwxrwxr-x. 1 hadoop 3184 Mar 27 spark-shell
-rwxrwxr-x. 1 hadoop  941 Mar 27 spark-shell.cmd
Run the bundled examples against the standalone master:
./run-example org.apache.spark.examples.SparkLR spark://localhost:7077
./run-example org.apache.spark.examples.SparkPi spark://localhost:7077
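Besides the packaged examples, you can attach an interactive shell to the running master. In Spark 0.9 the shell reads the master URL from the MASTER environment variable (a minimal session; output abbreviated):
[hadoop@localhost spark]$ MASTER=spark://localhost:7077 ./bin/spark-shell
scala> sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
res0: Long = 500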