CentOS 6.4 + Hadoop2.2.0 Spark pseudo-distributed Installation

Hadoop version: 2.2.0 (the stable release).
Spark version: spark-0.9.1-bin-hadoop2, from http://spark.apache.org/downloads.html
Spark offers three prebuilt packages:

For Hadoop 1 (HDP1, CDH3): find an Apache mirror or direct file download
For CDH4: find an Apache mirror or direct file download
For Hadoop 2 (HDP2, CDH5): find an Apache mirror or direct file download
My Hadoop version is 2.2.0, so I downloaded the package built for Hadoop 2.
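For reference, fetching the package might look like this (the archive URL below is illustrative; use whichever mirror the downloads page points you to):
wget http://archive.apache.org/dist/spark/spark-0.9.1/spark-0.9.1-bin-hadoop2.tgz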

For an introduction to Spark, see http://spark.apache.org/
Apache Spark is a fast and general engine for large-scale data processing.

Spark requires a Scala runtime environment, so download the latest Scala from http://www.scala-lang.org/
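For example (the archive URL below is illustrative; the Scala downloads page lists the current links):
wget http://www.scala-lang.org/files/archive/scala-2.11.0.tgz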

Scala ("scalable language") is a multi-paradigm programming language that runs on the JVM. Its syntax is similar to Java's, and it is designed to combine the features of object-oriented and functional programming: a pure object-oriented language that seamlessly integrates imperative and functional styles.

OK, let's start configuring Spark:

I installed everything under the hadoop user's home directory, so here I edit /home/hadoop/.bashrc directly.

[hadoop@localhost ~]$ cat .bashrc
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# User specific aliases and functions
export HADOOP_HOME=/home/hadoop
export HBASE_HOME=/home/hadoop/hbase
export HIVE_HOME=/home/hadoop/hive
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=/home/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/home/hadoop/scala
export SPARK_HOME=/home/hadoop/spark

export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$HBASE_HOME/lib
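After editing, reload the file and spot-check one of the variables (a quick sanity check; the output is simply what the settings above should produce):
[hadoop@localhost ~]$ source .bashrc
[hadoop@localhost ~]$ echo $SPARK_HOME
/home/hadoop/spark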

1. Scala installation:
Decompress Scala into the hadoop home directory and create a soft link:
ln -s scala-2.11.0 scala    # create soft link
lrwxrwxrwx. 1 hadoop   12 May 21 09:15 scala -> scala-2.11.0
drwxrwxr-x. 6 hadoop 4096 Apr 17 scala-2.11.0

Edit .bashrc to add:
export SCALA_HOME=/home/hadoop/scala
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
Save, then run source .bashrc to make the environment variables take effect.
Verify the installation:
[hadoop@localhost ~]$ scala -version
Scala code runner version 2.11.0 -- Copyright 2002-2013, LAMP/EPFL
The version is displayed correctly, which indicates the installation succeeded.

2. Spark configuration:
tar -xzvf spark-0.9.1-bin-hadoop2.tgz
ln -s spark-0.9.1-bin-hadoop2 spark
Configure .bashrc:
export SPARK_HOME=/home/hadoop/spark
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin

Run source .bashrc to make the environment variables take effect.

spark-env.sh configuration:
spark-env.sh does not exist by default; generate it from the template in the conf directory: cat spark-env.sh.template > spark-env.sh

Then edit spark-env.sh

and add the following content:
export SCALA_HOME=/home/hadoop/scala
export JAVA_HOME=/usr/java/jdk
export SPARK_MASTER=localhost
export SPARK_LOCAL_IP=localhost
export HADOOP_HOME=/home/hadoop
export SPARK_HOME=/home/hadoop/spark
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Save and exit
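Before starting anything, it is worth confirming that the paths referenced above actually exist on your machine; in particular, /usr/java/jdk should point at your real JDK location (the command below simply reuses the paths from this example):
[hadoop@localhost ~]$ ls -d /usr/java/jdk /home/hadoop/scala /home/hadoop/spark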

3. Start Spark
Similar to Hadoop's directory layout, the startup and shutdown shell scripts live in Spark's sbin directory.
-rwxrwxr-x. 1 hadoop 2504 Mar 27 slaves.sh
-rwxrwxr-x. 1 hadoop 1403 Mar 27 spark-config.sh
-rwxrwxr-x. 1 hadoop 4503 Mar 27 spark-daemon.sh
-rwxrwxr-x. 1 hadoop 1176 Mar 27 spark-daemons.sh
-rwxrwxr-x. 1 hadoop  965 Mar 27 spark-executor
-rwxrwxr-x. 1 hadoop 1263 Mar 27 start-all.sh
-rwxrwxr-x. 1 hadoop 2384 Mar 27 start-master.sh
-rwxrwxr-x. 1 hadoop 1520 Mar 27 start-slave.sh
-rwxrwxr-x. 1 hadoop 2258 Mar 27 start-slaves.sh
-rwxrwxr-x. 1 hadoop 1047 Mar 27 stop-all.sh
-rwxrwxr-x. 1 hadoop 1124 Mar 27 stop-master.sh
-rwxrwxr-x. 1 hadoop 1427 Mar 27 stop-slaves.sh
[hadoop@localhost sbin]$ pwd
/home/hadoop/spark/sbin

Here you only need to run start-all.sh:
[hadoop@localhost sbin]$ ./start-all.sh
rsync from localhost
rsync: change_dir "/home/hadoop/spark-0.9.1-bin-hadoop2/sbin/localhost" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-localhost.out
localhost: rsync from localhost
localhost: rsync: change_dir "/home/hadoop/spark-0.9.1-bin-hadoop2/localhost" failed: No such file or directory (2)
localhost: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-localhost.out

The rsync errors seem to be triggered by setting SPARK_MASTER in spark-env.sh, which makes spark-daemon.sh try to sync the Spark directory from that address; the Master and Worker still start, so they can be ignored here.

Check whether the startup succeeded with jps:
[hadoop@localhost sbin]$ jps
4706 Jps
3692 DataNode
3876 SecondaryNameNode
4637 Worker
4137 NodeManager
4517 Master
4026 ResourceManager
3587 NameNode

You can see the Master and Worker processes, which indicates the startup succeeded.
You can view the Spark cluster status at http://localhost:8080/.

4. Run a Spark program
First, go to the bin directory under spark:
[hadoop@localhost sbin]$ ll ../bin/
total 56
-rw-r--r--. 1 hadoop 2601 Mar 27 compute-classpath.cmd
-rwxrwxr-x. 1 hadoop 3330 Mar 27 compute-classpath.sh
-rwxrwxr-x. 1 hadoop 2070 Mar 27 pyspark
-rw-r--r--. 1 hadoop 1827 Mar 27 pyspark2.cmd
-rw-r--r--. 1 hadoop 1000 Mar 27 pyspark.cmd
-rwxrwxr-x. 1 hadoop 3055 Mar 27 run-example
-rw-r--r--. 1 hadoop 2046 Mar 27 run-example2.cmd
-rw-r--r--. 1 hadoop 1012 Mar 27 run-example.cmd
-rwxrwxr-x. 1 hadoop 5151 Mar 27 spark-class
-rwxrwxr-x. 1 hadoop 3212 Mar 27 spark-class2.cmd
-rw-r--r--. 1 hadoop 1010 Mar 27 spark-class.cmd
-rwxrwxr-x. 1 hadoop 3184 Mar 27 spark-shell
-rwxrwxr-x. 1 hadoop  941 Mar 27 spark-shell.cmd

run-example org.apache.spark.examples.SparkLR spark://localhost:7077

run-example org.apache.spark.examples.SparkPi spark://localhost:7077
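You can also test interactively with spark-shell. Here is a minimal sketch; the MASTER variable usage and the sample expression are illustrative, not taken from the original write-up:

[hadoop@localhost bin]$ MASTER=spark://localhost:7077 ./spark-shell
scala> sc.parallelize(1 to 1000).count()
res0: Long = 1000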
