Hadoop 2.6 + HA + ZooKeeper 3.4.6 + HBase 1.0.0 Cluster Installation
A detailed record of the cluster installation process for Hadoop 2.6 + HA + ZooKeeper 3.4.6 + HBase 1.0.0.
Install Hadoop 2.6 + HA
1. Prepare a CentOS 6.4 system
2. Five servers in the CentOS 6.4 environment
Machine name  IP address     Installed software        Running processes
master1       192.168.3.141  Hadoop, ZooKeeper, HBase  NN, RM, DFSZKFC, JournalNode, HMaster, QuorumPeerMain
master2       192.168.3.142  Hadoop, ZooKeeper, HBase  NN, RM, DFSZKFC, JournalNode, HRegionServer, QuorumPeerMain
slave1        192.168.3.143  Hadoop, ZooKeeper, HBase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain
slave2        192.168.3.144  Hadoop, ZooKeeper, HBase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain
slave3        192.168.3.145  Hadoop, ZooKeeper, HBase  DN, NM, JournalNode, HRegionServer, QuorumPeerMain
(NN = NameNode, RM = ResourceManager, DFSZKFC = DFSZKFailoverController, DN = DataNode, NM = NodeManager)
3. Configure the first machine fully; the other machines will be cloned from it later.
4. Modify the /etc/hosts file
Modify /etc/sysconfig/network
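For example, following the table above, /etc/hosts on every node would contain:

    192.168.3.141 master1
    192.168.3.142 master2
    192.168.3.143 slave1
    192.168.3.144 slave2
    192.168.3.145 slave3

and /etc/sysconfig/network on master1 would set:

    NETWORKING=yes
    HOSTNAME=master1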
5. Restart the machine so the new hostname takes effect,
or
temporarily modify the hostname.
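The temporary change (lasts until the next reboot) is simply:

    hostname master1    # replace with the node's own name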
6. Install the JDK
Decompress the JDK
Edit /etc/profile to add the JDK path
Save and exit
Set the Java priority in CentOS
The JDK is now installed.
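A sketch of these steps, assuming the JDK archive is unpacked under /soft like the rest of the software in this article (the exact JDK version and file name are assumptions):

    tar -zxvf jdk-7u79-linux-x64.tar.gz -C /soft    # decompress the JDK
    # append to /etc/profile:
    export JAVA_HOME=/soft/jdk1.7.0_79
    export PATH=$JAVA_HOME/bin:$PATH
    # reload the profile:
    source /etc/profile
    # give this JDK priority over the bundled OpenJDK:
    alternatives --install /usr/bin/java java /soft/jdk1.7.0_79/bin/java 300
    alternatives --config java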
7. Decompress Hadoop and modify the environment variables
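A sketch of this step, assuming Hadoop 2.6.0 is also unpacked under /soft (the path is an assumption):

    tar -zxvf hadoop-2.6.0.tar.gz -C /soft
    # append to /etc/profile:
    export HADOOP_HOME=/soft/hadoop-2.6.0
    export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
    source /etc/profile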
8. Modify the configuration files
8.1 Modify the $HADOOP_HOME/etc/hadoop/slaves file
Add the hostnames of all slave nodes.
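Going by the cluster table above, the slaves file contains:

    slave1
    slave2
    slave3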
8.2 Modify the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file
Modify the JAVA_HOME path
8.3 Modify the $HADOOP_HOME/etc/hadoop/yarn-env.sh file
Modify the JAVA_HOME path
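Both files get the same change; assuming the JDK path used earlier:

    export JAVA_HOME=/soft/jdk1.7.0_79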
8.4 Modify the $HADOOP_HOME/etc/hadoop/core-site.xml file (see the attachment)
8.5 Modify the $HADOOP_HOME/etc/hadoop/hdfs-site.xml file
For more information, see the attachment.
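The attachments hold the full files; a minimal sketch of the HA-related properties they need, assuming the nameservice is called mycluster and the NameNode RPC port is 9000 (both are assumptions), might look like:

core-site.xml:

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>
    <property>
      <name>ha.zookeeper.quorum</name>
      <value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value>
    </property>

hdfs-site.xml:

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>master1:9000</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>master2:9000</value>
    </property>
    <!-- shared edit log on the five JournalNodes -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://master1:8485;master2:8485;slave1:8485;slave2:8485;slave3:8485/mycluster</value>
    </property>
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/soft/hadoop-2.6.0/mydata/journal</value>
    </property>
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>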
8.6 Modify the $HADOOP_HOME/etc/hadoop/mapred-site.xml file (see the attachment)
8.7 Modify the $HADOOP_HOME/etc/hadoop/yarn-site.xml file
For details, see the attachment (the value of the yarn.resourcemanager.ha.id property must be changed to rm2 on master2).
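Again the attachment is authoritative; a sketch of the key properties (the cluster-id value is an assumption):

mapred-site.xml:

    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>

yarn-site.xml:

    <property>
      <name>yarn.resourcemanager.ha.enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.resourcemanager.cluster-id</name>
      <value>yarn-cluster</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.rm-ids</name>
      <value>rm1,rm2</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm1</name>
      <value>master1</value>
    </property>
    <property>
      <name>yarn.resourcemanager.hostname.rm2</name>
      <value>master2</value>
    </property>
    <property>
      <name>yarn.resourcemanager.zk-address</name>
      <value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value>
    </property>
    <property>
      <name>yarn.resourcemanager.ha.id</name>
      <value>rm1</value>  <!-- rm2 on master2 -->
    </property>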
8.8 Add the $HADOOP_HOME/etc/hadoop/fairscheduler.xml file
For more information, see the attachment.
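As an illustration only (the queue name and resources are assumptions, and the attachment may differ), a minimal fair-scheduler allocation file looks like this. It only takes effect if yarn-site.xml selects the FairScheduler and points yarn.scheduler.fair.allocation.file at this file.

    <?xml version="1.0"?>
    <allocations>
      <queue name="default">
        <minResources>1024 mb, 1 vcores</minResources>
        <weight>1.0</weight>
      </queue>
    </allocations>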
8.9 Create the related folders
Create the folders referenced by the XML configuration files.
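Judging from the folders mentioned later in this article (name, data, and journal under mydata, plus a yarn folder), something like:

    mkdir -p /soft/hadoop-2.6.0/mydata/{name,data,journal,yarn}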
At this point the Hadoop + HA configuration files are done; what remains is the SSH password-free login and the formatting of the Hadoop system.
After installing all the software (ZooKeeper + HBase), clone the machine and then set up SSH password-free login and format Hadoop. After cloning, you also need to change the hostname in /etc/sysconfig/network on each node, and on master2 change the yarn.resourcemanager.ha.id property in the $HADOOP_HOME/etc/hadoop/yarn-site.xml file to rm2.
Install ZooKeeper 3.4.6
1. Decompress ZooKeeper
2. Configure the ZooKeeper environment variables
Add the ZooKeeper path
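Using the install path from the rest of this article:

    # append to /etc/profile:
    export ZOOKEEPER_HOME=/soft/zookeeper-3.4.6
    export PATH=$ZOOKEEPER_HOME/bin:$PATH
    # then reload:
    source /etc/profile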
3. Change the configuration file
Copy conf/zoo_sample.cfg to conf/zoo.cfg,
then modify zoo.cfg.
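A zoo.cfg sketch matching the paths and five servers used in this article. The tick/limit values are the stock defaults, and the server numbering is an assumption; whatever numbering is used must match each node's myid file (step 4).

    cp conf/zoo_sample.cfg conf/zoo.cfg
    # zoo.cfg:
    tickTime=2000
    initLimit=10
    syncLimit=5
    clientPort=2181
    dataDir=/soft/zookeeper-3.4.6/var/data
    dataLogDir=/soft/zookeeper-3.4.6/var/datalog
    server.1=master1:2888:3888
    server.2=master2:2888:3888
    server.3=slave1:2888:3888
    server.4=slave2:2888:3888
    server.5=slave3:2888:3888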
4. Create a myid file in the dataDir path
First create the /soft/zookeeper-3.4.6/var folder,
then, per the dataLogDir path in the configuration file, create the /soft/zookeeper-3.4.6/var/datalog folder,
then the /soft/zookeeper-3.4.6/var/data folder,
and finally the /soft/zookeeper-3.4.6/var/data/myid file.
Write the number 1 into it (matching the number after "server." for this host in zoo.cfg).
After cloning, each of the other nodes must change this value to match its own entry in zoo.cfg.
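The folder and file creation as commands:

    mkdir -p /soft/zookeeper-3.4.6/var/data /soft/zookeeper-3.4.6/var/datalog
    echo 1 > /soft/zookeeper-3.4.6/var/data/myid   # 1 on this node; each clone writes its own number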
Install HBase 1.0.0
1. Modify the local configuration
2. Modify the configuration file
For conf/hbase-site.xml, see the attachment (note that the value of hbase.rootdir is written with an IP address).
Create a folder matching the hbase.tmp.dir value in conf/hbase-site.xml.
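A sketch of the properties involved (the NameNode port and the tmp path are assumptions; the attachment is authoritative):

    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://192.168.3.141:9000/hbase</value>  <!-- an IP address, per the note above -->
    </property>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>master1,master2,slave1,slave2,slave3</value>
    </property>
    <property>
      <name>hbase.tmp.dir</name>
      <value>/soft/hbase-1.0.0/tmp</value>
    </property>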
3. Create a link
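The original does not spell this step out; presumably it means linking Hadoop's HDFS configuration into HBase's conf directory, for example:

    ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml /soft/hbase-1.0.0/conf/hdfs-site.xml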
4. Edit the regionservers file
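Per the process table at the top, HRegionServer runs on these hosts, so conf/regionservers would list:

    master2
    slave1
    slave2
    slave3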
5. Overwrite the hadoop*.jar files in HBase's lib folder.
(Because of errors later in the HBase installation, I copied all of the JAR packages from Hadoop into hbase/lib.)
If the zookeeper JAR in HBase's lib folder does not match the JAR shipped with your ZooKeeper, replace it as well.
HBase is now installed. After cloning, you still need to synchronize the time across the cluster nodes.
Clone the machine
1. At this point, clone the machine four times.
Change the IP addresses respectively; for how to change the IP address in the GUI, see the attachment.
2. SSH password-free login
2.1 Change the /etc/sysconfig/network file (set each node's own hostname)
2.2 Generate a key
Press Enter at every prompt.
2.3 Append the key to the public key file
2.4 Copy this machine's public key to every machine you want to log on to remotely
Enter yes and the password when prompted.
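A sketch of steps 2.2 through 2.4 (ssh-copy-id is one common way to push the key; the original may have used scp plus cat instead):

    ssh-keygen -t rsa                                # press Enter at every prompt
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # trust our own key
    ssh-copy-id root@master2                         # repeat for each target machine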
2.5 Configure two-way SSH login.
At this point master1 can log on to the other machines through SSH. Next, we configure all machines to log on to each other without a password.
We take slave3 as an example; the other machines (master2, slave1, slave2) all perform the same operation, which is not repeated here.
Upload slave3's key to all the other machines,
entering yes and the password as prompted.
Each machine (master2, slave1, slave2, slave3) performs this operation against all the other machines.
Then append the key to the end of the public key file on the other machine.
After this, you can log on without a password.
You can inspect the /root/.ssh/authorized_keys file to find each machine's key.
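Using slave3 as in the text, one hedged way to do the upload-and-append (the remote file name is arbitrary):

    # on slave3, for each other machine:
    scp ~/.ssh/id_rsa.pub root@master1:/root/id_rsa_slave3.pub
    # then on master1:
    cat /root/id_rsa_slave3.pub >> /root/.ssh/authorized_keys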
3. Modify the Hadoop configuration file
Change the $HADOOP_HOME/etc/hadoop/yarn-site.xml file on master2:
set the yarn.resourcemanager.ha.id property value to rm2.
4. Change the ZooKeeper file
Change the $ZOOKEEPER_HOME/var/data/myid file:
the value is changed to match this node's id in $ZOOKEEPER_HOME/conf/zoo.cfg.
5. Synchronize time
First, we make master1 the time server, then configure the other nodes to synchronize time with it.
5.1 Operations on the master1 time server:
Check whether the time service is installed
Change the related configuration files
Start the service
Check whether master1 is synchronized with itself
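A sketch of 5.1, assuming a standard local-clock ntpd setup on CentOS 6:

    rpm -qa | grep ntp                     # check whether ntp is installed
    # /etc/ntp.conf additions:
    restrict 192.168.3.0 mask 255.255.255.0 nomodify notrap   # allow the cluster subnet
    server 127.127.1.0                     # serve the local clock
    fudge 127.127.1.0 stratum 10
    service ntpd start                     # start the service
    ntpstat                                # check whether master1 is synchronized with itself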
5.2 Synchronize the other machines with master1
Once the master1 time server has been running for 3 to 5 minutes, synchronize the other nodes' time with it.
Change the configuration in the /etc/ntp.conf file
Send master2's /etc/ntp.conf to the other machines
Enable the time service on the other nodes
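A sketch of the client side (configured on master2, then pushed to the slaves):

    # /etc/ntp.conf on the non-server nodes:
    server 192.168.3.141                   # sync from master1
    # push the file out (likewise slave2, slave3):
    scp /etc/ntp.conf root@slave1:/etc/ntp.conf
    # enable the time service on each node:
    service ntpd start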
5.3 Set the time service to start on boot
Run this command on master1, master2, slave1, slave2, and slave3
5.4 Open the port on the time server
Enter the commands according to your IP addresses
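For 5.3 and 5.4, assuming the stock iptables firewall (NTP uses UDP port 123):

    chkconfig ntpd on                      # run on all five nodes
    # on the time server master1:
    iptables -I INPUT -s 192.168.3.0/24 -p udp --dport 123 -j ACCEPT
    service iptables save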
First deployment and startup of Hadoop + ZooKeeper + HBase
After installing Hadoop, you still need to initialize it before starting. The following steps are required only the first time; you do not need to perform them again later.
1. Start ZooKeeper
Run zkServer.sh start on each machine, or run ./zkServer.sh start from the $ZOOKEEPER_HOME/bin directory. You can then run jps to see the QuorumPeerMain process started by ZooKeeper.
You can run zkServer.sh status to check ZooKeeper's state. Normally, exactly one machine is the leader and all the others are followers.
2. Format the ZooKeeper cluster
The purpose is to create the HA node on the ZooKeeper cluster.
Execute the command on master1.
It initializes based on the ha.zookeeper.quorum value in the $HADOOP_HOME/etc/hadoop/core-site.xml file.
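The standard command for this step is:

    hdfs zkfc -formatZK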
3. Start the journalnode processes
Run the command on each node individually,
or execute it once from master1, covering slave1, slave2, and slave3.
The second method is recommended; note that it does not start the JournalNodes on master1 and master2, since hadoop-daemons.sh only targets the hosts in the slaves file, so start those two separately.
After startup, there is an additional JournalNode process on every node.
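The two ways to start JournalNodes (both scripts live in $HADOOP_HOME/sbin):

    hadoop-daemon.sh start journalnode     # method 1: run on each node individually
    hadoop-daemons.sh start journalnode    # method 2: run once on master1; starts on the hosts in slaves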
4. Format the namenode
Execute the command on master1.
Some folders and files (name, data, and journal) will be created under the mydata folder.
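The format command:

    hdfs namenode -format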
5. Start the namenode
Execute the command on master1.
After execution, the NameNode process appears on master1.
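The start command on master1:

    hadoop-daemon.sh start namenode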
6. Synchronize the formatted namenode information to the standby namenode
Execute the command on master2.
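The standby-synchronization command:

    hdfs namenode -bootstrapStandby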
7. Start the namenode on master2 (same start command as in step 5)
After execution, the NameNode process appears on master2.
8. Start all datanodes
Execute the command on master1.
After execution, the DataNode process appears on each datanode node.
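Run from master1; this starts a DataNode on every host in the slaves file:

    hadoop-daemons.sh start datanode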
9. Start YARN
Execute the command on master1.
This adds a ResourceManager process on master1 and NodeManager processes on slave1, slave2, and slave3.
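The start command. Note that start-yarn.sh only starts the ResourceManager on the node where it is run; presumably the standby on master2 is brought up separately (the original text does not show that step):

    start-yarn.sh
    # on master2, if the standby ResourceManager is wanted:
    yarn-daemon.sh start resourcemanager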
10. Start ZKFC
Start zkfc on both master1 and master2.
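On each of the two masters:

    hadoop-daemon.sh start zkfc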
11. Hadoop has started successfully
After startup there are two master nodes, one active and one standby.
12. Start HBase
Execute the command on master1.
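The start command (from $HBASE_HOME/bin):

    start-hbase.sh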
Subsequent starts
1. For HA, ZooKeeper must be started first
zkServer.sh start (on each node)
2. Start Hadoop
start-dfs.sh and start-yarn.sh (on master1)
3. Start HBase
start-hbase.sh (on master1)
Issues to be resolved in the future
I hope an expert can offer some advice.
Question 1: How should hbase-site.xml be configured so that HBase follows the active master under the HA framework?
Question 2: After the services are stopped, HBase cannot be started on restart. I don't know why; all I can do is clear everything and reformat Hadoop.
Steps:
1. Delete all folders under hadoop/mydata and create a new yarn folder.
2. Delete all files in the hadoop logs folder.
3. Delete everything except myid in zookeeper/var/data.
4. Delete all folders under zookeeper/var/datalog.
5. Delete the data files under hbase (the hbase.tmp.dir contents).
6. Delete all log files under hbase.
7. Reformat Hadoop.
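These steps as commands, assuming the install paths used throughout this article:

    rm -rf /soft/hadoop-2.6.0/mydata/* && mkdir /soft/hadoop-2.6.0/mydata/yarn
    rm -rf /soft/hadoop-2.6.0/logs/*
    cd /soft/zookeeper-3.4.6/var/data && ls | grep -v myid | xargs rm -rf
    rm -rf /soft/zookeeper-3.4.6/var/datalog/*
    rm -rf /soft/hbase-1.0.0/tmp/* /soft/hbase-1.0.0/logs/*
    hdfs namenode -format                  # then redo the first-deployment steps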
All attachments used in this article:
------------------------------------------ Split line ------------------------------------------
FTP address: ftp://ftp1.bkjia.com
Username: ftp1.bkjia.com
Password: www.bkjia.com
The installation package is in the directory LinuxIDC.com\2015\Hadoop2.6 + HA + Zookeeper3.4.6 + HBase1.0.0 cluster Installation.
------------------------------------------ Split line ------------------------------------------