Spark 1.2 Cluster Environment (Standalone + HA): 5 Nodes on 4 GB of Memory Is Really Pushing It


Preparatory work:

1. Notebook with 4 GB of memory, running Windows 7

2. Tool: VMware Workstation

3. Virtual machines: five CentOS 6.4 instances

4. A working Hadoop cluster (so that Spark can read files from HDFS for testing)

Lab Environment:

Hadoop HA cluster:

IP               Hostname   Role
192.168.249.130  SY-0130    Active NameNode
192.168.249.131  SY-0131    Standby NameNode
192.168.249.132  SY-0132    DataNode1
192.168.249.133  SY-0133    DataNode2

Spark HA cluster:

IP               Hostname   Role
192.168.249.134  SY-0134    Master
192.168.249.130  SY-0130    Standby Master
192.168.249.131  SY-0131    Worker
192.168.249.132  SY-0132    Worker
192.168.249.133  SY-0133    Worker

This experimental environment is only for learning; with just 4 GB of memory the resources are really stretched. Next week I will switch to a few desktops for the cluster.

SY-0134 above is a newly cloned virtual machine that acts as the Master in the Spark environment, while the four nodes originally belonging to the Hadoop cluster take on the Standby Master and Worker roles.

For virtual machine settings, network configuration and the Hadoop cluster setup, see "Hadoop 2.6 Cluster Environment Building".

This article focuses on a simple build of the Spark 1.2 and ZooKeeper environments, intended only as a learning and experimental prototype; it does not go into much theory.

Software Installation:

(Note: logged in to SY-0134 as user hadoop)

1. On node SY-0134, in the hadoop user's home directory, create a toolkit folder to hold all of the software installation packages, and a labsp folder to serve as the experiment environment directory.

[hadoop@SY-0134 ~]$ mkdir labsp

[hadoop@SY-0134 ~]$ mkdir toolkit

I store the downloaded packages in toolkit, as follows:

[hadoop@SY-0134 toolkit]$ ls

hadoop-2.5.2.tar.gz  hadoop-2.6.0.tar.gz  jdk-7u71-linux-i586.gz  scala-2.10.3.tgz  spark-1.2.0-bin-hadoop2.3.tgz  zookeeper-3.4.6.tar.gz

2. For this experiment I downloaded spark-1.2.0-bin-hadoop2.3.tgz; the Scala version is 2.10.3 and ZooKeeper is 3.4.6. Note that Spark and Scala versions must correspond; the Scala version supported by each Spark release is listed on the Spark website.

3. JDK installation and environment variable settings

[hadoop@SY-0134 ~]$ mkdir lab

# I installed JDK 7 under the lab directory

[hadoop@SY-0134 jdk1.7.0_71]$ pwd

/home/hadoop/lab/jdk1.7.0_71

# Environment variable settings:

[hadoop@SY-0134 ~]$ vi .bash_profile

# User specific environment and startup programs
export JAVA_HOME=/home/hadoop/lab/jdk1.7.0_71
PATH=$JAVA_HOME/bin:$PATH:$HOME/bin
export PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# Apply the settings

[hadoop@SY-0134 ~]$ source .bash_profile
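To double-check that the JDK on the PATH is the one just installed, a quick version check helps; the output line below is what this JDK release typically reports and is shown only as a guide:

[hadoop@SY-0134 ~]$ java -version

java version "1.7.0_71"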

4. Scala installation and environment variable settings

I unzipped Scala to /home/hadoop/labsp/scala-2.10.3.
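For completeness, the extraction step would look roughly like this, assuming the archive is still sitting in the toolkit directory:

[hadoop@SY-0134 ~]$ tar -zxf ~/toolkit/scala-2.10.3.tgz -C ~/labsp/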

Modify the .bash_profile file:

Added: export SCALA_HOME=/home/hadoop/labsp/scala-2.10.3

Modified: PATH=$JAVA_HOME/bin:$PATH:$HOME/bin:$SCALA_HOME/bin

# Apply the settings

[hadoop@SY-0134 ~]$ source .bash_profile

Verify that Scala is installed:

[hadoop@SY-0134 ~]$ scala

Welcome to Scala version 2.10.3 (Java HotSpot(TM) Client VM, Java 1.7.0_71).

The output above shows that the installation succeeded.

5. Spark Installation and Environment configuration

I unzipped Spark to /home/hadoop/labsp/spark1.2_hadoop2.3. The downloaded package is a precompiled build.
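The archive extracts to a directory named spark-1.2.0-bin-hadoop2.3, so I assume it was renamed afterwards; roughly:

[hadoop@SY-0134 ~]$ tar -zxf ~/toolkit/spark-1.2.0-bin-hadoop2.3.tgz -C ~/labsp/

[hadoop@SY-0134 ~]$ mv ~/labsp/spark-1.2.0-bin-hadoop2.3 ~/labsp/spark1.2_hadoop2.3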

Modify the .bash_profile file:

Added: export SPARK_HOME=/home/hadoop/labsp/spark1.2_hadoop2.3

Modified: PATH=$JAVA_HOME/bin:$PATH:$HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin

# Apply the settings

[hadoop@SY-0134 ~]$ source .bash_profile

# Modify spark-env.sh

[hadoop@SY-0134 conf]$ pwd

/home/hadoop/labsp/spark1.2_hadoop2.3/conf

[hadoop@SY-0134 conf]$ vi spark-env.sh

Core configuration:

export JAVA_HOME=/home/hadoop/lab/jdk1.7.0_71

export SCALA_HOME=/home/hadoop/labsp/scala-2.10.3

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=SY-0134:2181,SY-0130:2181,SY-0131:2181,SY-0132:2181,SY-0133:2181 -Dspark.deploy.zookeeper.dir=/spark"
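In addition, for sbin/start-all.sh to start the Workers on the other nodes, the conf/slaves file needs to list them; the contents below are my assumption for this cluster (one Worker hostname per line). On 4 GB machines it can also help to cap Worker memory in spark-env.sh; the 512m value is only an illustration:

# conf/slaves (assumed contents for this cluster)
SY-0131
SY-0132
SY-0133

# optional line in spark-env.sh to limit memory per Worker (illustrative value)
export SPARK_WORKER_MEMORY=512m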

At this point the JDK, Scala and Spark installation and environment variable settings are complete; of course, the configuration steps above could also have been made in a single edit of .bash_profile.

6. ZooKeeper installation

I unzipped ZooKeeper to /home/hadoop/labsp/zookeeper-3.4.6.

# Configure the zoo.cfg file

[hadoop@SY-0134 zookeeper-3.4.6]$ pwd

/home/hadoop/labsp/zookeeper-3.4.6

[hadoop@SY-0134 zookeeper-3.4.6]$ mkdir data

[hadoop@SY-0134 zookeeper-3.4.6]$ mkdir datalog

[hadoop@SY-0134 zookeeper-3.4.6]$ cd conf

[hadoop@SY-0134 conf]$ cp zoo_sample.cfg zoo.cfg

[hadoop@SY-0134 conf]$ vi zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between sending a request and getting an acknowledgement
syncLimit=5
# The directory where the snapshot is stored.
# Do not use /tmp for storage, /tmp here is just an example.
dataDir=/home/hadoop/labsp/zookeeper-3.4.6/data
dataLogDir=/home/hadoop/labsp/zookeeper-3.4.6/datalog
# The port at which the clients will connect
clientPort=2181
# The maximum number of client connections.
# Increase this if you need to handle more clients
#maxClientCnxns=60
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=SY-0134:2888:3888
server.2=SY-0130:2888:3888
server.3=SY-0131:2888:3888
server.4=SY-0132:2888:3888
server.5=SY-0133:2888:3888

# Configure the myid file

[hadoop@SY-0134 data]$ pwd

/home/hadoop/labsp/zookeeper-3.4.6/data

Write 1 into the myid file of the ZooKeeper on SY-0134:

echo "1" > /home/hadoop/labsp/zookeeper-3.4.6/data/myid

7. Password-free SSH login

In the Hadoop cluster, SY-0130 can already log in to SY-0131, SY-0132 and SY-0133 without a password.

In this Spark cluster, however, the Master is SY-0134, and it needs to be able to log in to SY-0130, SY-0131, SY-0132 and SY-0133 without a password.

# First generate the key pair on SY-0134.

[hadoop@SY-0134 ~]$ ssh-keygen -t rsa

[hadoop@SY-0134 ~]$ cd .ssh

[hadoop@SY-0134 .ssh]$ ls

id_rsa  id_rsa.pub  known_hosts

# Copy the id_rsa.pub file to SY-0130

[hadoop@SY-0134 .ssh]$ scp id_rsa.pub hadoop@SY-0130:~/.ssh/authorized_keys

# On SY-0130, generate its own key pair.

[hadoop@SY-0130 ~]$ ssh-keygen -t rsa

[hadoop@SY-0130 ~]$ cd .ssh

[hadoop@SY-0130 .ssh]$ ls

authorized_keys  id_rsa  id_rsa.pub  known_hosts

# Append the contents of id_rsa.pub to authorized_keys. This step is slightly different from the previous one.

[hadoop@SY-0130 .ssh]$ cat id_rsa.pub >> authorized_keys

# Copy the authorized_keys file on SY-0130 to SY-0131, SY-0132 and SY-0133 using the scp command.
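A quick way to verify the password-free login is to run a remote command from SY-0134; if the hostname comes back without a password prompt, the keys are in place (illustrative check):

[hadoop@SY-0134 ~]$ ssh SY-0130 hostname

SY-0130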

8. Spark, Scala and ZooKeeper installation on the other nodes

The seven steps above only complete the installation on SY-0134. The three installation directories for Spark, Scala and ZooKeeper must be copied to SY-0130, SY-0131, SY-0132 and SY-0133 with the scp command, and the same environment variables must be set on each node (see the sketch after the listing below).

[hadoop@SY-0134 labsp]$ ls

scala-2.10.3 spark1.2_hadoop2.3 zookeeper-3.4.6
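As a sketch, the copy to one of the nodes could look like the commands below (hostnames and paths mirror the setup above; repeat for SY-0131, SY-0132 and SY-0133, and add the same export lines to each node's .bash_profile):

[hadoop@SY-0134 ~]$ ssh SY-0130 "mkdir -p ~/labsp"

[hadoop@SY-0134 ~]$ scp -r ~/labsp/scala-2.10.3 ~/labsp/spark1.2_hadoop2.3 ~/labsp/zookeeper-3.4.6 hadoop@SY-0130:~/labsp/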

One more point: the content of the myid file is different on each node's ZooKeeper server, matching the server.N entries in zoo.cfg.

echo "1" > Home/hadoop/labsp/zookeeper-3.4.6/data/myid #SY-0134

echo "2" > Home/hadoop/labsp/zookeeper-3.4.6/data/myid #SY-0130

echo "3" > Home/hadoop/labsp/zookeeper-3.4.6/data/myid #SY-0131

echo "4" > Home/hadoop/labsp/zookeeper-3.4.6/data/myid #SY-0132

echo "5" > Home/hadoop/labsp/zookeeper-3.4.6/data/myid #SY-0133

Cluster boot test:

1. Start ZooKeeper on each of the five nodes.

[hadoop@SY-0134 zookeeper-3.4.6]$ bin/zkServer.sh start

[hadoop@SY-0130 zookeeper-3.4.6]$ bin/zkServer.sh start

[hadoop@SY-0131 zookeeper-3.4.6]$ bin/zkServer.sh start

[hadoop@SY-0132 zookeeper-3.4.6]$ bin/zkServer.sh start

[hadoop@SY-0133 zookeeper-3.4.6]$ bin/zkServer.sh start
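Once all five servers are up, each node's role can be checked; with five servers, one should report itself as leader and the rest as follower. An illustrative check (the full output also prints the config file path):

[hadoop@SY-0134 zookeeper-3.4.6]$ bin/zkServer.sh status

Mode: follower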

2. Start the Spark Master on SY-0134

[hadoop@SY-0134 spark1.2_hadoop2.3]$ sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/labsp/spark1.2_hadoop2.3/sbin/../logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-SY-0134.out
SY-0133: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/labsp/spark1.2_hadoop2.3/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-SY-0133.out
SY-0132: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/labsp/spark1.2_hadoop2.3/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-SY-0132.out
SY-0131: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/labsp/spark1.2_hadoop2.3/sbin/../logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-SY-0131.out

3. Start the standby Spark Master on SY-0130

[hadoop@SY-0130 spark1.2_hadoop2.3]$ sbin/start-master.sh

starting org.apache.spark.deploy.master.Master, logging to /lab/labsp/spark1.2_hadoop2.3/sbin/../logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-SY-0130.out
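To confirm the HA setup, the Master web UIs (port 8080 by default) on SY-0134 and SY-0130 should show status ALIVE and STANDBY respectively, and a shell can be attached to both masters at once. A minimal smoke test follows; the result line is only a guide, and once the Hadoop cluster is running an HDFS file can be counted the same way with sc.textFile:

[hadoop@SY-0134 spark1.2_hadoop2.3]$ bin/spark-shell --master spark://SY-0134:7077,SY-0130:7077

scala> sc.parallelize(1 to 1000).count()

res0: Long = 1000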

With this experimental environment in place, you can go on to learn more about Spark's runtime architecture, Spark SQL and more.

This blog post, except where specifically stated, is entirely original.

It may be reproduced, but only with a hyperlink to the original article along with the author information and this copyright notice.

Respect the original; when reproducing, please note: reproduced from Jackyken (http://www.cnblogs.com/xiejin).

Original article link: "Spark 1.2 Cluster Environment (Standalone + HA): 5 Nodes on 4 GB of Memory Is Really Pushing It" (http://www.cnblogs.com/xiejin/p/4213082.html)

