Build a ZooKeeper-based Spark cluster from scratch


Build a Spark cluster entirely from scratch.

Note: these steps assume everything is done as root. A formal environment should use proper permission management; a separate tutorial on that will follow after further experiments.

1. Install each piece of software and set the environment variables (each package must be downloaded separately):

export JAVA_HOME=/usr/java/jdk1.8.0_71
export JAVA_BIN=/usr/java/jdk1.8.0_71/bin
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME JAVA_BIN PATH CLASSPATH

export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH

export SCALA_HOME=/usr/local/scala-2.10.4
export PATH=${SCALA_HOME}/bin:$PATH

export SPARK_HOME=/usr/local/spark/spark-1.6.0-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:${SPARK_HOME}/sbin:$PATH

export ZOOKEEPER_HOME=/usr/local/zookeeper-3.4.6
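
A minimal way to apply and verify these settings, assuming you appended the exports above to /etc/profile (adjust if you use ~/.bashrc instead):

source /etc/profile

java -version      # should report 1.8.0_71
hadoop version     # should report Hadoop 2.6.0 once Hadoop is unpacked into /usr/local
scala -version     # should report Scala 2.10.4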

2. SSH settings

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa           # generate a key pair at ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys    # append the public key to authorized_keys
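
The cluster start scripts used later (start-dfs.sh, start-all.sh) need passwordless SSH from Master to every worker, so the Master's public key also has to end up in each worker's authorized_keys. A minimal sketch, assuming root and the worker host names used later in this guide:

ssh-copy-id -i ~/.ssh/id_dsa.pub root@Worker1    # repeat for Worker2 and Worker3
ssh Worker1 hostname                             # should print Worker1 without asking for a password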

3. Host name and hosts file settings

vi /etc/hostname     # change to Master, or Worker1, Worker2, Worker3, Worker4 on the respective machines

vim /etc/hosts       # map each machine's IP address to its host name
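
For example, /etc/hosts on every machine might contain entries like these (the IP addresses below are placeholders; substitute your own):

192.168.1.100  Master
192.168.1.101  Worker1
192.168.1.102  Worker2
192.168.1.103  Worker3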

4. Configuration of Hadoop

1) Under cd $HADOOP_HOME/etc/hadoop/, edit core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/tmp</value>
  </property>
  <property>
    <name>hadoop.native.lib</name>
    <value>true</value>
    <description>Should native Hadoop libraries, if present, be used</description>
  </property>
</configuration>

2) Still under cd $HADOOP_HOME/etc/hadoop/, edit hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Master:50090</value>
    <description>The secondary namenode HTTP server address and port</description>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/hadoop-2.6.0/dfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///usr/local/hadoop/hadoop-2.6.0/dfs/namesecondary</value>
    <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
  </property>
</configuration>

3) Still under cd $HADOOP_HOME/etc/hadoop/, edit mapred-site.xml (if the file does not exist, copy it from mapred-site.xml.template):

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

4) Still under cd $HADOOP_HOME/etc/hadoop/, edit yarn-site.xml:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

5) Still under cd $HADOOP_HOME/etc/hadoop/, edit hadoop-env.sh:

Set export JAVA_HOME=/usr/java/jdk1.8.0_71 to the JDK directory installed above.

You can also add Master itself as a worker node, but if the machine does not have enough resources to run the master, the driver, and other programs such as web queries at the same time, running a worker on the master machine is not recommended.

======= Once one machine is configured to this point, clone it to the other machines and then proceed with the following operations =======

6) First, as in step 3 above, change the host name and hosts file on each machine.

7) Still under cd $HADOOP_HOME/etc/hadoop/, edit the slaves file.

List the host names of all your slave machines, for example:

Worker1

Worker2

Worker3

Then copy it to the other machines:

scp slaves root@Worker1:/usr/local/hadoop-2.6.0/etc/hadoop/slaves

scp slaves root@Worker2:/usr/local/hadoop-2.6.0/etc/hadoop/slaves

scp slaves root@Worker3:/usr/local/hadoop-2.6.0/etc/hadoop/slaves

8) Still under cd $HADOOP_HOME/etc/hadoop/, edit the master file; its content is simply Master.

If the master is not set up as an HA cluster, you need to copy this master file to each machine; in fact it should be copied in any case, so that things can still run even if you do not start the HA cluster.

If the master is to run as an HA cluster, that is handled by ZooKeeper and configured in the ZooKeeper section later.

scp master root@Worker1:/usr/local/hadoop-2.6.0/etc/hadoop/master

scp master root@Worker2:/usr/local/hadoop-2.6.0/etc/hadoop/master

scp master root@Worker3:/usr/local/hadoop-2.6.0/etc/hadoop/master

9) Format the file system on Master

mkdir /usr/local/hadoop/hadoop-2.6.0/tmp    # if it already exists, delete it first

hdfs namenode -format

10) Start DFS

cd $HADOOP_HOME/sbin

./start-dfs.sh

Then open http://Master:50070/dfshealth.html to see the DFS status.

If you cannot see it, or the configured capacity shows only 0 B, try turning off the firewall on each machine:

systemctl stop firewalld.service

systemctl disable firewalld.service

This is only suitable for development machines; in a real production environment you need to look carefully at which ports must be opened instead.
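
A quick sanity check from Master once DFS is up, using standard Hadoop commands:

jps                       # on Master this should list NameNode and SecondaryNameNode
hdfs dfsadmin -report     # should list the DataNodes with non-zero configured capacity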

**************** If you are only setting up for Spark, the Hadoop part is done at this point ****************

5. Configuration of Spark

1) spark-env.sh

cd $SPARK_HOME/conf

cp spark-env.sh.template spark-env.sh

export JAVA_HOME=/usr/java/jdk1.8.0_71
export SCALA_HOME=/usr/local/scala-2.10.4
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_CONF_DIR=/usr/local/hadoop-2.6.0/etc/hadoop
#export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/ojdbc-14.jar:$SPARK_HOME/lib/jieyi-tools-1.2.0.7.release.jar
#export SPARK_MASTER_IP=Master
export SPARK_WORKER_MEMORY=2g
export SPARK_EXECUTOR_MEMORY=2g
export SPARK_DRIVER_MEMORY=2g
export SPARK_WORKER_CORES=8
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 -Dspark.deploy.zookeeper.dir=/spark"

Explanation of the parameters:

export JAVA_HOME=/usr/java/jdk1.8.0_71                       # JDK directory
export SCALA_HOME=/usr/local/scala-2.10.4                    # Scala directory
export HADOOP_HOME=/usr/local/hadoop-2.6.0                   # Hadoop directory
export HADOOP_CONF_DIR=/usr/local/hadoop-2.6.0/etc/hadoop    # required in order to run in YARN mode
export SPARK_MASTER_IP=Master                                # host running the Spark master
export SPARK_WORKER_MEMORY=2g                                # memory per worker, depends on the machine
export SPARK_EXECUTOR_MEMORY=2g                              # memory per executor, depends on the computation
export SPARK_DRIVER_MEMORY=2g                                # driver memory
export SPARK_WORKER_CORES=8                                  # number of cores (concurrent threads) per worker

SPARK_MASTER_IP=Master is used when running with a single standalone master, while SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 -Dspark.deploy.zookeeper.dir=/spark" is the setting for the ZooKeeper-backed HA cluster.

After the change is complete, sync:

scp spark-env.sh root@Worker1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh

scp spark-env.sh root@Worker2:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh

scp spark-env.sh root@Worker3:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh

2) slaves

cd $SPARK_HOME/conf

cp slaves.template slaves

The contents are as follows:

Worker1

Worker2

Worker3

After the change is complete, sync:

scp slaves root@Worker1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/slaves

scp slaves root@Worker2:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/slaves

scp slaves root@Worker3:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/slaves

3) spark-defaults.conf

cd $SPARK_HOME/conf

cp spark-defaults.conf.template spark-defaults.conf

spark.executor.extraJavaOptions    -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.eventLog.enabled             true
spark.eventLog.dir                 hdfs://Master:9000/historyserverforspark1
spark.yarn.historyServer.address   Master:18080
spark.history.fs.logDirectory      hdfs://Master:9000/historyserverforspark1

After the change is complete, sync:

scp spark-defaults.conf root@Worker1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-defaults.conf

scp spark-defaults.conf root@Worker2:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-defaults.conf

scp spark-defaults.conf root@Worker3:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-defaults.conf

Or, instead of the three sync steps above, copy the whole Spark directory at once:

cd /usr/local/spark

scp -r ./spark-1.6.0-bin-hadoop2.6/ root@Worker1:/usr/local/spark    # and likewise for Worker2 and Worker3

4) Create the history directory in HDFS (must be done on a first installation)

hadoop dfs -rmr /historyserverforspark      # only needed if the directory already exists

hadoop dfs -mkdir /historyserverforspark

The directory name must match the spark.eventLog.dir and spark.history.fs.logDirectory values configured above.

(The original post shows a screenshot of the result here.)

5) Start Spark

cd $SPARK_HOME/sbin

./start-all.sh

Then check the web console at http://Master:8080/.

6) Start the history server

cd $SPARK_HOME/sbin

./start-history-server.sh

7) Test with the Pi example:

./spark-submit --class org.apache.spark.examples.SparkPi --master spark://Master:7077 ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
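
If the job runs, the driver output ends with a line like "Pi is roughly 3.14...", and the completed application should also appear in the history server UI at http://Master:18080 (given the event-log settings above). A quick check that the event log was written, assuming the directory configured in spark.eventLog.dir:

hadoop dfs -ls /historyserverforspark1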

Start your magical journey to spark!

**************** If a standalone Spark master is enough for you, you can stop here; the ZooKeeper steps below turn it into an HA cluster ****************

6. ZooKeeper installation for the HA cluster

1) First, on the first machine, extract ZooKeeper into the directory matching the ZOOKEEPER_HOME environment variable set at the beginning.

Go into the ZooKeeper directory and create the data and logs directories:

root@Master:/usr/local/zookeeper-3.4.6# mkdir data

root@Master:/usr/local/zookeeper-3.4.6# mkdir logs

2) Copy zoo_sample.cfg to zoo.cfg and configure it:

root@Master:/usr/local/zookeeper-3.4.6/conf# cp zoo_sample.cfg zoo.cfg

root@Master:/usr/local/zookeeper-3.4.6/conf# vi zoo.cfg

Modify (to make a cluster of 3 machines):

dataDir=/usr/local/zookeeper-3.4.6/data
dataLogDir=/usr/local/zookeeper-3.4.6/logs
server.0=Master:2888:3888
server.1=Worker1:2888:3888
server.2=Worker2:2888:3888
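
For reference, a minimal zoo.cfg after these edits might look like the following; tickTime, initLimit, syncLimit, and clientPort keep their defaults from zoo_sample.cfg, and clientPort=2181 is the port referenced by spark.deploy.zookeeper.url above:

tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/usr/local/zookeeper-3.4.6/data
dataLogDir=/usr/local/zookeeper-3.4.6/logs
server.0=Master:2888:3888
server.1=Worker1:2888:3888
server.2=Worker2:2888:3888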

3) Number the machine in the data directory

root@Master:/usr/local/zookeeper-3.4.6/conf# cd ../data/

Write the machine's number into a file named myid (note the space before >; echo 0>myid would create an empty file):

root@Master:/usr/local/zookeeper-3.4.6/data# echo 0 > myid

root@Master:/usr/local/zookeeper-3.4.6/data# cat myid

0

At this point, one machine has been configured.

4) Copy to the other two machines and change myid

root@Master:/usr/local# scp -r ./zookeeper-3.4.6 root@Worker1:/usr/local

root@Master:/usr/local# scp -r ./zookeeper-3.4.6 root@Worker2:/usr/local

Then log in to Worker1 and Worker2 and change myid to 1 and 2 respectively.

At this point, ZooKeeper is configured on all 3 machines.
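
A quick way to check the IDs from Master, assuming the passwordless SSH set up in step 2:

for h in Master Worker1 Worker2; do ssh $h cat /usr/local/zookeeper-3.4.6/data/myid; done
# expected output: 0, 1 and 2, one per line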

5) Next, make Spark use ZooKeeper for HA

Configure it in spark-env.sh:

root@Master:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# vi spark-env.sh

The state of the entire cluster is maintained and recovered through ZooKeeper; switching masters within the cluster depends on this setting (it is the line already shown above):

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 -Dspark.deploy.zookeeper.dir=/spark"

Since the master is now a cluster, also make sure this line stays commented out:

#export SPARK_MASTER_IP=Master

root@Master:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh root@Worker1:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh

spark-env.sh                              100%  0.5KB/s  00:00

root@Master:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh root@Worker2:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh

spark-env.sh                              100%  0.5KB/s  00:00

At this point, Spark is configured on all 3 machines. Next, start everything up.

6) Overall start-up steps

Start Hadoop HDFS:

cd $HADOOP_HOME/sbin

./start-dfs.sh

On each of the three ZooKeeper machines, start ZooKeeper separately:

cd $ZOOKEEPER_HOME/bin

./zkServer.sh start
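
Once all three have started, each node's role can be checked with the standard status command (one node should report "leader", the other two "follower"):

./zkServer.sh status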

Start Spark.

On Master:

cd $SPARK_HOME/sbin

./start-all.sh

./start-history-server.sh

On the other two machines (Worker1 and Worker2), start the standby masters:

cd $SPARK_HOME/sbin

./start-master.sh

Use jps to check the processes on each of the three machines, or watch the web console.
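
Roughly, given the configuration in this guide, jps should show something like the following (the exact lists depend on what you started where):

Master:   NameNode, SecondaryNameNode, QuorumPeerMain, Master, HistoryServer
Worker1:  DataNode, QuorumPeerMain, Worker, Master (standby)
Worker2:  DataNode, QuorumPeerMain, Worker, Master (standby)
Worker3:  DataNode, Worker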

The entire cluster is up and ready.

7) To experiment with the failover behaviour

Start a shell against all three masters: ./spark-shell --master spark://Master:7077,Worker1:7077,Worker2:7077

Then stop the active master with ./stop-master.sh; after a while (a few seconds to a few minutes, depending on the machine) the cluster automatically switches over to another master.
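
A minimal way to watch the switch-over (the Status field at the top of each standalone master's web UI shows ALIVE or STANDBY):

cd $SPARK_HOME/sbin
./stop-master.sh      # run this on the machine whose master is currently ALIVE
# then refresh http://Worker1:8080 and http://Worker2:8080 until one of them shows ALIVE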


This article is from the "a flower proud of the Cold" blog, please be sure to keep this source http://feiweihy.blog.51cto.com/6389397/1744024

