Build a Spark cluster entirely from scratch
Note: These steps are only suitable for building as root; a production environment should have proper permission management, which will be covered in a later experiment and tutorial.
1. Install each piece of software and set the environment variables (each package must be downloaded separately):
export JAVA_HOME=/usr/java/jdk1.8.0_71
export JAVA_BIN=/usr/java/jdk1.8.0_71/bin
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME JAVA_BIN PATH CLASSPATH
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib"
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
export SCALA_HOME=/usr/local/scala-2.10.4
export PATH=${SCALA_HOME}/bin:$PATH
export SPARK_HOME=/usr/local/spark/spark-1.6.0-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:${SPARK_HOME}/sbin:$PATH
export ZOOKEEPER_HOME=/usr/local/zookeeper-3.4.6
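To make these variables available in every new shell, they can be appended to /etc/profile (or ~/.bashrc) and sourced; a minimal sketch, assuming the install paths above:
# persist the environment variables for login shells (root-only setup)
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_71
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export SCALA_HOME=/usr/local/scala-2.10.4
export SPARK_HOME=/usr/local/spark/spark-1.6.0-bin-hadoop2.6
export ZOOKEEPER_HOME=/usr/local/zookeeper-3.4.6
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
EOF
source /etc/profile    # reload in the current shell
java -version          # quick sanity checks
hadoop version
scala -version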
2. SSH settings
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa    # generate the key pair under ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys    # append the public key to authorized_keys
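The same public key also has to end up on the workers so that Master can SSH to them without a password; a minimal sketch, assuming root and the Master/Worker1-3 hostnames used throughout this tutorial:
for host in Worker1 Worker2 Worker3; do
  ssh-copy-id -i ~/.ssh/id_dsa.pub root@$host    # appends the key to that host's authorized_keys (asks for the password once)
done
ssh Worker1 hostname    # should print Worker1 without prompting for a password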
3. Host name and domain name settings
vi /etc/hostname    # change it to Master or Worker1, 2, 3, ... on each machine
vim /etc/hosts      # map each machine's IP to its hostname
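For reference, a sketch of what /etc/hosts might look like on every node (the IP addresses here are placeholders; use your own):
# /etc/hosts (identical on every machine)
192.168.1.100   Master
192.168.1.101   Worker1
192.168.1.102   Worker2
192.168.1.103   Worker3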
4. Configuration of Hadoop
1) In $HADOOP_HOME/etc/hadoop/, edit core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/hadoop-2.6.0/tmp</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>true</value>
<description>Should native Hadoop libraries, if present, be used</description>
</property>
</configuration>
2) In $HADOOP_HOME/etc/hadoop/, edit hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:50090</value>
<description>the Secondary Namenode HTTP server address and port</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/hadoop-2.6.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/hadoop-2.6.0/dfs/data</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///usr/local/hadoop/hadoop-2.6.0/dfs/namesecondary</value>
<description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
</property>
</configuration>
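The local directories referenced above can be created up front (the NameNode format and the daemons will otherwise try to create them themselves); a minimal sketch, using the paths configured above:
mkdir -p /usr/local/hadoop/hadoop-2.6.0/dfs/name             # NameNode metadata (on Master)
mkdir -p /usr/local/hadoop/hadoop-2.6.0/dfs/data             # DataNode blocks (on each worker)
mkdir -p /usr/local/hadoop/hadoop-2.6.0/dfs/namesecondary    # SecondaryNameNode checkpoints (on Master)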
3) In $HADOOP_HOME/etc/hadoop/, edit mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4) In $HADOOP_HOME/etc/hadoop/, edit yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
5) In $HADOOP_HOME/etc/hadoop/, edit hadoop-env.sh
Set export JAVA_HOME=/usr/java/jdk1.8.0_71 to the corresponding JDK directory.
If you want to use the Master machine as a worker node as well, you can add Master to the list; but when machines are scarce and Master also runs the driver and other programs such as web queries, it is not recommended to run a worker on Master.
======= Once one machine is configured to this point, clone it to the other machines and then continue with the operations below =============
6) First, as described in step 3, change the hostname and /etc/hosts entries on every machine.
7) In $HADOOP_HOME/etc/hadoop/, edit slaves
List however many slave machines you have by adding their hostnames, for example:
Worker1
Worker2
Worker3
Then replicate it to the other machines:
scp slaves [email protected]:/usr/local/hadoop-2.6.0/etc/hadoop/slaves
scp slaves [email protected]:/usr/local/hadoop-2.6.0/etc/hadoop/slaves
scp slaves [email protected]:/usr/local/hadoop-2.6.0/etc/hadoop/slaves
8) In $HADOOP_HOME/etc/hadoop/, edit master; its content is Master
Even if the master is not run as a cluster, the file still needs to be copied to every machine (in fact it should always be copied, so that the setup also runs when the master cluster is not started).
If the master is a ZooKeeper-backed cluster, configure it in ZooKeeper instead.
scp master [email protected]:/usr/local/hadoop-2.6.0/etc/hadoop/master
scp master [email protected]:/usr/local/hadoop-2.6.0/etc/hadoop/master
scp master [email protected]:/usr/local/hadoop-2.6.0/etc/hadoop/master
9) Format the filesystem on Master
mkdir /usr/local/hadoop/hadoop-2.6.0/tmp    # if it already exists, delete it first
hdfs namenode -format
10) Start DFS
cd $HADOOP_HOME/sbin
./start-dfs.sh
Then
http://Master:50070/dfshealth.html shows the DFS file status.
If it does not look right, for example the configured capacity is only 0 B, try turning off the firewall on every machine:
systemctl stop firewalld.service
systemctl disable firewalld.service
But this is only suitable for development machines; in a real production environment, check carefully which ports need to be opened instead.
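Before moving on to Spark, it is worth confirming that HDFS really came up; a quick check from Master (hostnames as configured above):
jps                      # Master should show NameNode and SecondaryNameNode
ssh Worker1 jps          # each worker should show a DataNode (jps must be on the remote PATH)
hdfs dfsadmin -report    # the configured capacity should now be non-zero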
**************** If you only need Spark, the Hadoop part ends here ****************
5. Configuration of Spark
1) spark-env.sh
cd $SPARK_HOME/conf
cp spark-env.sh.template spark-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_71
export SCALA_HOME=/usr/local/scala-2.10.4
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_CONF_DIR=/usr/local/hadoop-2.6.0/etc/hadoop
#export SPARK_CLASSPATH=$SPARK_CLASSPATH:$SPARK_HOME/lib/ojdbc-14.jar:$SPARK_HOME/lib/jieyi-tools-1.2.0.7.release.jar
#export SPARK_MASTER_IP=Master
export SPARK_WORKER_MEMORY=2G
export SPARK_EXECUTOR_MEMORY=2G
export SPARK_DRIVER_MEMORY=2G
export SPARK_WORKER_CORES=8
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 -Dspark.deploy.zookeeper.dir=/spark"
Explanation of the parameter meaning:
export JAVA_HOME=/usr/java/jdk1.8.0_71
export SCALA_HOME=/usr/local/scala-2.10.4
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_CONF_DIR=/usr/local/hadoop-2.6.0/etc/hadoop    # must be set to run in YARN mode
export SPARK_MASTER_IP=Master    # IP/hostname the Spark master runs on
export SPARK_WORKER_MEMORY=2G    # depends on the specific machine
export SPARK_EXECUTOR_MEMORY=2G    # memory for the actual computation
export SPARK_DRIVER_MEMORY=2G
export SPARK_WORKER_CORES=8    # number of cores (concurrent task threads) a worker may use
Here export SPARK_MASTER_IP=Master is used when running a single standalone master, while export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 -Dspark.deploy.zookeeper.dir=/spark" is the configuration for a master cluster.
After the change is complete, sync the file:
scp spark-env.sh [email protected]:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh
scp spark-env.sh [email protected]:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh
scp spark-env.sh [email protected]:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh
2) slaves
cd $SPARK_HOME/conf
cp slaves.template slaves
The contents are as follows:
Worker1
Worker2
Worker3
After the change is complete, sync the file:
scp slaves [email protected]:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/slaves
scp slaves [email protected]:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/slaves
scp slaves [email protected]:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/slaves
3) spark-defaults.conf
cd $SPARK_HOME/conf
cp spark-defaults.conf.template spark-defaults.conf
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.eventLog.enabled true
spark.eventLog.dir hdfs://Master:9000/historyserverforspark1
spark.yarn.historyServer.address Master:18080
spark.history.fs.logDirectory hdfs://Master:9000/historyserverforspark1
After the change is complete, sync:
scp spark-defaults.conf [email protected]:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-defaults.conf
scp spark-defaults.conf [email protected]:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-defaults.conf
scp spark-defaults.conf [email protected]:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-defaults.conf
Or do the above three steps in one go:
cd $SPARK_HOME
scp -r ./spark-1.6.0-bin-hadoop2.6/ [email protected]:/usr/local/spark
4) Create the history directory (required on first installation)
hadoop dfs -rmr /historyserverforspark
hadoop dfs -mkdir /historyserverforspark
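To double-check that the directory exists, and that its name matches spark.eventLog.dir and spark.history.fs.logDirectory in spark-defaults.conf above, a quick look (a sketch):
hadoop dfs -ls /                     # the history directory should appear in the listing
hdfs dfs -ls hdfs://Master:9000/     # same check with the non-deprecated client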
5) Start spark
cd $SPARK_HOME/sbin
./start-all.sh
Look at the web console:
http://Master:8080/
6) Start the history server
cd $SPARK_HOME/sbin
./start-history-server.sh
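Once the history server is running, it listens on port 18080 as set in spark-defaults.conf; a quick check (a sketch, assumes curl is installed):
jps | grep HistoryServer                                         # the HistoryServer process should be listed
curl -s -o /dev/null -w "%{http_code}\n" http://Master:18080/    # expect 200 once the UI is reachable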
7) Run the Pi example as a test (from $SPARK_HOME/bin):
./spark-submit --class org.apache.spark.examples.SparkPi --master spark://Master:7077 ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 100
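If the job succeeds, the driver output contains a line like "Pi is roughly 3.14..."; to surface just that line, the same command can be piped through grep (a sketch):
./spark-submit --class org.apache.spark.examples.SparkPi --master spark://Master:7077 ../lib/spark-examples-1.6.0-hadoop2.6.0.jar 100 2>&1 | grep "Pi is roughly"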
Start your magical journey to spark!
**************** If a single Spark master is enough, you can stop here; what follows adds ZooKeeper so the masters form a cluster ****************
6. ZooKeeper installation for the master cluster
1) First, extract ZooKeeper on the first machine; the target directory just has to match the environment variable set at the beginning.
Go into the ZooKeeper directory and create two directories, data and logs:
[email protected]:/usr/local/zookeeper-3.4.6# mkdir data
[email protected]:/usr/local/zookeeper-3.4.6# mkdir logs
2) Copy zoo_sample.cfg to zoo.cfg and edit it
[email protected]:/usr/local/zookeeper-3.4.6/conf# cp zoo_sample.cfg zoo.cfg
[email protected]:/usr/local/zookeeper-3.4.6/conf# vi zoo.cfg
Modify it (to make a cluster of 3 machines):
dataDir=/usr/local/zookeeper-3.4.6/data
dataLogDir=/usr/local/zookeeper-3.4.6/logs
server.0=Master:2888:3888
server.1=Worker1:2888:3888
server.2=Worker2:2888:3888
3) Number the machine under data
[email protected]:/usr/local/zookeeper-3.4.6/conf# cd ../data/
Number the machine:
[email protected]:/usr/local/zookeeper-3.4.6/data# echo 0>myid
[email protected]:/usr/local/zookeeper-3.4.6/data# echo 0>>myid
[email protected]:/usr/local/zookeeper-3.4.6/data# ls
myid
[email protected]:/usr/local/zookeeper-3.4.6/data# cat myid
[email protected]:/usr/local/zookeeper-3.4.6/data# vi myid    # write a single 0 here (note that echo 0>myid left the file empty, because the shell parses 0> as a file-descriptor redirection; echo 0 > myid, with spaces, would have worked)
[email protected]:/usr/local/zookeeper-3.4.6/data# cat myid
0
At this point, one machine has been configured.
4) Copy it to the other two machines and change myid
[email protected]:/usr/local# scp -r ./zookeeper-3.4.6 [email protected]:/usr/local
[email protected]:/usr/local# scp -r ./zookeeper-3.4.6 [email protected]:/usr/local
Then log in to Worker1 and Worker2 and change myid to 1 and 2 respectively.
At this point, ZooKeeper is configured on all 3 machines.
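Setting the other two ids can also be done remotely from Master; a minimal sketch, assuming root SSH access and the paths above:
ssh root@Worker1 'echo 1 > /usr/local/zookeeper-3.4.6/data/myid'    # must match server.1 in zoo.cfg
ssh root@Worker2 'echo 2 > /usr/local/zookeeper-3.4.6/data/myid'    # must match server.2 in zoo.cfg
ssh root@Worker1 'cat /usr/local/zookeeper-3.4.6/data/myid'         # verify: should print 1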
5) Next, make Spark support ZooKeeper HA
Configure it in spark-env.sh:
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# vi spark-env.sh
The state of the whole cluster is maintained in, and recovered from, ZooKeeper (this is the line already shown above; master failover within the cluster depends on it):
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=Master:2181,Worker1:2181,Worker2:2181 -Dspark.deploy.zookeeper.dir=/spark"
Since the masters now form a cluster, also comment out:
#export SPARK_MASTER_IP=Master
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh [email protected]:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh
spark-env.sh 100% 0.5KB/s 00:00
[email protected]:/usr/local/spark-1.6.0-bin-hadoop2.6/conf# scp spark-env.sh [email protected]:/usr/local/spark/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh
spark-env.sh 100% 0.5KB/s 00:00
At this point, Spark is configured on all 3 machines; next comes startup.
6) Overall startup steps
Start Hadoop HDFS:
cd $HADOOP_HOME/sbin
./start-dfs.sh
Start ZooKeeper separately on each of the three ZooKeeper machines:
cd $ZOOKEEPER_HOME/bin
./zkServer.sh start
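Once started on all three machines, one node should report itself as leader and the other two as followers; a quick check (a sketch; the nc line assumes netcat is installed):
./zkServer.sh status         # prints "Mode: leader" on one node and "Mode: follower" on the others
echo ruok | nc Master 2181   # ZooKeeper's four-letter health check; a healthy server answers "imok"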
Start Spark
On Master, start:
cd $SPARK_HOME/sbin
./start-all.sh
./start-history-server.sh
On the other two machines, start only the master:
cd $SPARK_HOME/sbin
./start-master.sh
Check the processes on the three machines with jps,
or watch the web console.
The entire cluster is now up and ready.
7) To try out the cluster failover,
start ./spark-shell --master spark://Master:7077,Worker1:7077,Worker2:7077
Then stop the active master process with ./stop-master.sh; after a while (from a few seconds to a few minutes, depending on the machine) it automatically switches over to another master.
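Which master is currently active can also be read off the web UIs; a sketch using curl (assumes curl is installed and that the status string appears in the page, as it does in Spark 1.6's master UI):
curl -s http://Master:8080/  | grep -oE 'ALIVE|STANDBY'    # the active master shows ALIVE
curl -s http://Worker1:8080/ | grep -oE 'ALIVE|STANDBY'    # standby masters show STANDBY
curl -s http://Worker2:8080/ | grep -oE 'ALIVE|STANDBY'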
This article is from the "a flower proud of the Cold" blog, please be sure to keep this source http://feiweihy.blog.51cto.com/6389397/1744024