CentOS 6.4 + Hadoop 2.2.0 + Spark Pseudo-Distributed Installation
Hadoop version: 2.2.0 (the stable release).
Spark version: spark-0.9.1-bin-hadoop2, downloaded from http://spark.apache.org/downloads.html
Spark offers three prebuilt packages:
For Hadoop 1 (HDP1, CDH3): find an Apache mirror or direct file download
For CDH4: find an Apache mirror or direct file download
For Hadoop 2 (HDP2, CDH5): find an Apache mirror or direct file download
My Hadoop version is 2.2.0, so I downloaded the package for Hadoop 2.
For an introduction to Spark, see http://spark.apache.org/
Apache Spark is a fast and general engine for large-scale data processing.
Spark requires a Scala environment at runtime; download the latest Scala from http://www.scala-lang.org/
Scala ("scalable language") is a multi-paradigm programming language, similar to Java, designed to integrate the features of object-oriented and functional programming. Scala runs on the JVM: it is a pure object-oriented language that seamlessly blends imperative and functional programming styles.
OK, let's start configuring Spark.
Everything is installed under the hadoop user's home directory, so we edit /home/hadoop/.bashrc directly:
[hadoop@localhost ~]$ cat .bashrc
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi

# User specific aliases and functions
export HADOOP_HOME=/home/hadoop
export HBASE_HOME=/home/hadoop/hbase
export HIVE_HOME=/home/hadoop/hive
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_HOME=/home/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SCALA_HOME=/home/hadoop/scala
export SPARK_HOME=/home/hadoop/spark
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$HBASE_HOME/lib
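After editing, a quick check confirms the variables resolve as intended (assuming the paths above match your layout):
[hadoop@localhost ~]$ source .bashrc
[hadoop@localhost ~]$ echo $SPARK_HOME $SCALA_HOME
/home/hadoop/spark /home/hadoop/scala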
1. Scala installation:
Unpack Scala into the hadoop user's home directory and create a soft link:
ln -s scala-2.11.0 scala # create soft link
lrwxrwxrwx. 1 hadoop 12 May 21 09:15 scala -> scala-2.11.0
drwxrwxr-x. 6 hadoop 4096 Apr 17 scala-2.11.0
Edit .bashrc and add:
export SCALA_HOME=/home/hadoop/scala
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
Save, then run source .bashrc to make the environment variables take effect.
Verify the installation:
[hadoop@localhost ~]$ scala -version
Scala code runner version 2.11.0 -- Copyright 2002-2013, LAMP/EPFL
The version prints normally, which indicates the installation succeeded.
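A one-liner can also confirm that the runner actually compiles and executes code, not just that it prints its version (a minimal smoke test using scala -e):
[hadoop@localhost ~]$ scala -e 'println((1 to 10).sum)'
55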
2. Spark configuration:
tar -xzvf spark-0.9.1-bin-hadoop2.tgz
ln -s spark-0.9.1-bin-hadoop2 spark
Configure .bashrc:
export SPARK_HOME=/home/hadoop/spark
export PATH=${PATH}:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HIVE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
After editing, run source .bashrc to make the environment variables take effect.
spark-env.sh configuration:
spark-env.sh does not exist by default; generate it from the template with cat spark-env.sh.template > spark-env.sh
Then edit spark-env.sh and add:
export SCALA_HOME=/home/hadoop/scala
export JAVA_HOME=/usr/java/jdk
export SPARK_MASTER=localhost
export SPARK_LOCAL_IP=localhost
export HADOOP_HOME=/home/hadoop
export SPARK_HOME=/home/hadoop/spark
export SPARK_LIBRARY_PATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$HADOOP_HOME/lib/native
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
Save and exit.
3. Start Spark
As with Hadoop's directory layout, the shell scripts for starting and stopping daemons live under sbin in the Spark home:
-rwxrwxr-x. 1 hadoop 2504 Mar 27 slaves.sh
-rwxrwxr-x. 1 hadoop 1403 Mar 27 spark-config.sh
-rwxrwxr-x. 1 hadoop 4503 Mar 27 spark-daemon.sh
-rwxrwxr-x. 1 hadoop 1176 Mar 27 spark-daemons.sh
-rwxrwxr-x. 1 hadoop  965 Mar 27 spark-executor
-rwxrwxr-x. 1 hadoop 1263 Mar 27 start-all.sh
-rwxrwxr-x. 1 hadoop 2384 Mar 27 start-master.sh
-rwxrwxr-x. 1 hadoop 1520 Mar 27 start-slave.sh
-rwxrwxr-x. 1 hadoop 2258 Mar 27 start-slaves.sh
-rwxrwxr-x. 1 hadoop 1047 Mar 27 stop-all.sh
-rwxrwxr-x. 1 hadoop 1124 Mar 27 stop-master.sh
-rwxrwxr-x. 1 hadoop 1427 Mar 27 stop-slaves.sh
[hadoop@localhost sbin]$ pwd
/home/hadoop/spark/sbin
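If you ever need to bring up one side at a time (for example, to debug the master alone), the individual scripts work too; a sketch under this same layout:
[hadoop@localhost sbin]$ ./start-master.sh   # master only; web UI on port 8080
[hadoop@localhost sbin]$ ./start-slaves.sh   # workers on the hosts listed in conf/slaves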
For this pseudo-distributed setup, you only need to run start-all.sh:
[hadoop@localhost sbin]$ ./start-all.sh
rsync from localhost
rsync: change_dir "/home/hadoop/spark-0.9.1-bin-hadoop2/sbin/localhost" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-localhost.out
localhost: rsync from localhost
localhost: rsync: change_dir "/home/hadoop/spark-0.9.1-bin-hadoop2/localhost" failed: No such file or directory (2)
localhost: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-localhost.out
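The rsync lines look alarming, but both daemons still start. They appear to come from spark-daemon.sh, which, when SPARK_MASTER is set, tries to rsync the Spark installation from that host before launching each daemon (an assumption from reading the script, not official documentation). If the warnings bother you, removing that variable from conf/spark-env.sh should silence them:
# in conf/spark-env.sh -- assumed fix; the daemons start either way
# export SPARK_MASTER=localhost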
Check whether the startup succeeded with jps:
[hadoop@localhost sbin]$ jps
4706 Jps
3692 DataNode
3876 SecondaryNameNode
4637 Worker
4137 NodeManager
4517 Master
4026 ResourceManager
3587 NameNode
The Master and Worker processes are both present, which indicates the startup succeeded.
You can view the Spark cluster status at http://localhost:8080/.
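On a headless box, the UI can also be probed from the shell (a quick sanity check; the exact HTML is version-dependent):
[hadoop@localhost sbin]$ curl -s http://localhost:8080/ | grep -i worker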
4. Run a Spark program
First, enter the bin directory under the Spark home:
[hadoop@localhost sbin]$ ll ../bin/
total 56
-rw-r--r--. 1 hadoop 2601 Mar 27 compute-classpath.cmd
-rwxrwxr-x. 1 hadoop 3330 Mar 27 compute-classpath.sh
-rwxrwxr-x. 1 hadoop 2070 Mar 27 pyspark
-rw-r--r--. 1 hadoop 1827 Mar 27 pyspark2.cmd
-rw-r--r--. 1 hadoop 1000 Mar 27 pyspark.cmd
-rwxrwxr-x. 1 hadoop 3055 Mar 27 run-example
-rw-r--r--. 1 hadoop 2046 Mar 27 run-example2.cmd
-rw-r--r--. 1 hadoop 1012 Mar 27 run-example.cmd
-rwxrwxr-x. 1 hadoop 5151 Mar 27 spark-class
-rwxrwxr-x. 1 hadoop 3212 Mar 27 spark-class2.cmd
-rw-r--r--. 1 hadoop 1010 Mar 27 spark-class.cmd
-rwxrwxr-x. 1 hadoop 3184 Mar 27 spark-shell
-rwxrwxr-x. 1 hadoop  941 Mar 27 spark-shell.cmd
Run the bundled examples against the standalone master:
./run-example org.apache.spark.examples.SparkLR spark://localhost:7077
./run-example org.apache.spark.examples.SparkPi spark://localhost:7077
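Besides the packaged examples, you can attach an interactive shell to the running master. In Spark 0.9 the shell reads the master URL from the MASTER environment variable (a minimal session; output abbreviated):
[hadoop@localhost spark]$ MASTER=spark://localhost:7077 ./bin/spark-shell
scala> sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
res0: Long = 500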