Hadoop2.6 + HA + Zookeeper3.4.6 + HBase1.0.0 cluster Installation

A detailed record of the cluster installation process for Hadoop2.6 + HA + Zookeeper3.4.6 + HBase1.0.0.

Install Hadoop2.6 + HA

1. Prepare a CentOS6.4 system

2. Five servers running CentOS6.4

Machine name   IP address      Installed software         Running processes
master1        192.168.3.141   hadoop, Zookeeper, hbase   NN, RM, DFSZKFC, journalNode, HMaster, QuorumPeerMain
master2        192.168.3.142   hadoop, Zookeeper, hbase   NN, RM, DFSZKFC, journalNode, HRegionServer, QuorumPeerMain
slave1         192.168.3.143   hadoop, Zookeeper, hbase   DN, NM, journalNode, HRegionServer, QuorumPeerMain
slave2         192.168.3.144   hadoop, Zookeeper, hbase   DN, NM, journalNode, HRegionServer, QuorumPeerMain
slave3         192.168.3.145   hadoop, Zookeeper, hbase   DN, NM, journalNode, HRegionServer, QuorumPeerMain

3. Configure the first machine; it will be cloned later.

4. Modify the /etc/hosts file

Modify /etc/sysconfig/network

5. Restart the machine

Or

Temporarily change the hostname
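The original commands for steps 4 and 5 are not preserved here, so the following is only a sketch of what the two files might contain. It assumes the hostnames are the lowercase machine names from the table and that the CentOS 6 format of /etc/sysconfig/network is used; the function names are made up for illustration. It prints the content instead of editing system files.

```shell
# Print the /etc/hosts entries for the five nodes in the table.
# Assumption: hostnames are the lowercase machine names.
cluster_hosts() {
  cat <<'EOF'
192.168.3.141 master1
192.168.3.142 master2
192.168.3.143 slave1
192.168.3.144 slave2
192.168.3.145 slave3
EOF
}

# Print the /etc/sysconfig/network content for one node (CentOS 6 format).
network_file() {
  printf 'NETWORKING=yes\nHOSTNAME=%s\n' "$1"
}

cluster_hosts          # review, then as root: cluster_hosts >> /etc/hosts
network_file master1   # review, then write to /etc/sysconfig/network

# To change the hostname until the next reboot (as root): hostname master1
```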

6. Install JDK

Decompress the JDK

Edit /etc/profile to add the JDK path

Save and exit

Make this JDK the preferred java on CentOS

JDK has been installed.
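As a rough illustration of the JDK step, the sketch below prints the lines that could be appended to /etc/profile. The JDK location /soft/jdk is a placeholder (the article does not state the JDK version or path), and the function name is hypothetical.

```shell
# Print the /etc/profile lines for a JDK unpacked at the given path.
# The path passed in is a placeholder; use the real JDK directory.
jdk_profile_lines() {
  printf 'export JAVA_HOME=%s\n' "$1"
  printf 'export PATH=$JAVA_HOME/bin:$PATH\n'
}

jdk_profile_lines /soft/jdk   # review, then append the output to /etc/profile

# One way to give this JDK priority on CentOS (run as root after sourcing
# /etc/profile); the priority value 300 is arbitrary:
#   alternatives --install /usr/bin/java java "$JAVA_HOME/bin/java" 300
#   alternatives --config java
```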

7. Decompress Hadoop and modify the environment variables

8. Modify the configuration files

8.1 Modify the $HADOOP_HOME/etc/hadoop/slaves file

Add the hostnames of all slave nodes.

8.2 Modify the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file

Modify the JAVA_HOME path

8.3 Modify the $HADOOP_HOME/etc/hadoop/yarn-env.sh file

Modify the JAVA_HOME path
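Steps 8.1 to 8.3 could be scripted roughly as follows. This is a sketch under assumptions: the slave hostnames are slave1 to slave3 as in the table, GNU sed is available (as on CentOS), and the JDK path is a placeholder. The helper names are invented for illustration.

```shell
# Write the slaves file (one hostname per line) into a Hadoop conf directory.
write_slaves() {
  printf '%s\n' slave1 slave2 slave3 > "$1/slaves"
}

# Point the (possibly commented-out) "export JAVA_HOME=" line of an env
# script at the given JDK. Arguments: file, JDK path. Requires GNU sed.
set_java_home() {
  sed -i "s|^#* *export JAVA_HOME=.*|export JAVA_HOME=$2|" "$1"
}

# Example usage (the JDK path is a placeholder):
#   write_slaves  "$HADOOP_HOME/etc/hadoop"
#   set_java_home "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" /soft/jdk
#   set_java_home "$HADOOP_HOME/etc/hadoop/yarn-env.sh"   /soft/jdk
```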

8.4 Modify the $HADOOP_HOME/etc/hadoop/core-site.xml file

For more information, see the attachment.

8.5 Modify the $HADOOP_HOME/etc/hadoop/hdfs-site.xml file

For more information, see the attachment.

8.6 Modify the $HADOOP_HOME/etc/hadoop/mapred-site.xml file

For more information, see the attachment.

8.7 Modify the $HADOOP_HOME/etc/hadoop/yarn-site.xml file

For details, see the attachment (the value of the yarn.resourcemanager.ha.id property must be changed to rm2 on master2).

8.8 Add the $HADOOP_HOME/etc/hadoop/fairscheduler.xml file

For more information, see the attachment.

8.9 Create the related folders

Create the folders named in the xml configuration files.

The Hadoop + HA configuration files are now complete; only passwordless ssh login and formatting of the Hadoop file system remain.

After installing all the software (Zookeeper + hbase), clone the machine, then set up passwordless ssh login and format Hadoop. After cloning, you also need to change the hostname in /etc/sysconfig/network on each node, and on master2 change the yarn.resourcemanager.ha.id property in $HADOOP_HOME/etc/hadoop/yarn-site.xml to rm2.

Install Zookeeper3.4.6

1. Decompress Zookeeper

2. Configure the Zookeeper environment variables

Add the Zookeeper path

3. Change the configuration file

Rename conf/zoo_sample.cfg to conf/zoo.cfg.

Then modify zoo.cfg.

4. Create a myid file in the dataDir path

Create the folders named by dataDir and dataLogDir in the configuration file:

Create the /soft/zookeeper-3.4.6/var folder

Create the /soft/zookeeper-3.4.6/var/datalog folder

Create the /soft/zookeeper-3.4.6/var/data folder

Create the /soft/zookeeper-3.4.6/var/data/myid file

Write the number 1 into it (the number after "server." for this machine in zoo.cfg).

After cloning, change myid on the other nodes to their corresponding values in zoo.cfg.
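The zoo.cfg from the attachment is not reproduced in this article, so the following is a hedged sketch of a configuration consistent with the paths above. The tickTime, initLimit, syncLimit and clientPort values are the zoo_sample.cfg defaults; the ports 2888:3888 and the numbering of servers 2 to 5 are assumptions (only master1 = 1 is stated). The function name is hypothetical.

```shell
# Create the data folders, zoo.cfg and myid under a Zookeeper home.
# Arguments: Zookeeper home directory, this node's id (1 on master1).
write_zookeeper_config() {
  zk_home=$1
  mkdir -p "$zk_home/conf" "$zk_home/var/data" "$zk_home/var/datalog"
  cat > "$zk_home/conf/zoo.cfg" <<EOF
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=$zk_home/var/data
dataLogDir=$zk_home/var/datalog
server.1=master1:2888:3888
server.2=master2:2888:3888
server.3=slave1:2888:3888
server.4=slave2:2888:3888
server.5=slave3:2888:3888
EOF
  # myid holds only the number N from this host's "server.N" line.
  echo "$2" > "$zk_home/var/data/myid"
}

# Example usage on master1:
#   write_zookeeper_config /soft/zookeeper-3.4.6 1
```

On a cloned node only the myid value changes, for example 3 on slave1 under the numbering assumed here.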

Install HBase1.0.0

1. Modify the local configuration

2. Modify the configuration file

For conf/hbase-site.xml, see the attachment (note that the value of hbase.rootdir is written with an IP address).

Create a folder matching the hbase.tmp.dir value in conf/hbase-site.xml.

3. Create a link

4. Edit the regionservers file
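The commands for steps 3 and 4 are missing from this copy. One plausible reading, sketched below, is that step 3 links Hadoop's hdfs-site.xml into the HBase conf directory (a common practice, but an assumption here) and that step 4 lists the HRegionServer hosts from the table. The function name is made up for illustration.

```shell
# Arguments: HBase home directory, Hadoop home directory.
configure_hbase() {
  mkdir -p "$1/conf"
  # Hosts that run HRegionServer according to the machine table.
  printf '%s\n' master2 slave1 slave2 slave3 > "$1/conf/regionservers"
  # Assumed meaning of "create a link": let HBase see the HDFS HA settings.
  ln -sf "$2/etc/hadoop/hdfs-site.xml" "$1/conf/hdfs-site.xml"
}

# Example usage:
#   configure_hbase "$HBASE_HOME" "$HADOOP_HOME"
```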

5. Overwrite the hadoop*.jar files in the lib folder of hbase.

(Because of an error later in the hbase installation, I copied all the jar packages under hadoop into hbase/lib.)

If the zookeeper jar in the lib folder of hbase does not match the jar shipped with Zookeeper, replace it.

hbase is now installed. After cloning, you need to synchronize the time across the cluster nodes.

Clone the machine

1. Clone the machine four times.

Change the IP address of each clone. For details about changing the IP address in the GUI, see the attachment.

2. Passwordless ssh login

2.1 Change the /etc/sysconfig/network file

2.2 Generate a key

Press Enter at each prompt.

2.3 Copy the public key into authorized_keys

2.4 Copy this machine's public key to each machine you want to log on to remotely

Enter yes and the password when prompted.

2.5 Configure two-way ssh login

At this point master1 can log on to the other machines through ssh without a password. Next, configure all machines to log on to each other without a password.

slave3 is used as the example here. The other machines (master2, slave1, slave2) perform the same operation, which is not repeated.

Upload the slave3 key to all other machines.

Enter yes and the password when prompted.

Each machine (master2, slave1, slave2, slave3) performs this operation against all other machines.

Then append the key to the end of the public key file on the other machine.

You can now log on without a password.

You can view the /root/.ssh/authorized_keys file to find the key of each machine.
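The key exchange above might be scripted as in the sketch below, run once on every machine. It assumes the root account (the article refers to /root/.ssh), an RSA key without a passphrase, and that ssh-copy-id is installed; the APPLY switch and function names are illustrative. By default the script only prints the commands.

```shell
CLUSTER_HOSTS="master1 master2 slave1 slave2 slave3"

# Print each command; execute it only when APPLY=1 is set.
run() { echo "+ $*"; if [ "${APPLY:-0}" = 1 ]; then "$@"; fi; }

setup_ssh_keys() {
  # Generate a key pair with an empty passphrase unless one already exists.
  [ -f "$HOME/.ssh/id_rsa" ] || run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  # Append this machine's public key to authorized_keys on every node,
  # including itself. Each call asks for "yes" and the password once.
  for h in $CLUSTER_HOSTS; do
    run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@$h"
  done
}

setup_ssh_keys   # dry run; use APPLY=1 to actually execute
```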

3. Modify the Hadoop configuration file

In the $HADOOP_HOME/etc/hadoop/yarn-site.xml file on master2:

Change the value of the yarn.resourcemanager.ha.id property to rm2.
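A possible way to make this change without opening an editor is sketched below. It assumes GNU sed and that the <value> element sits on the line directly after the <name> element, as in a conventionally formatted yarn-site.xml; the function name is hypothetical.

```shell
# Set yarn.resourcemanager.ha.id in a yarn-site.xml.
# Arguments: file path, new id. Requires GNU sed; assumes <value> is on
# the line immediately following the matching <name> line.
set_rm_id() {
  sed -i "/<name>yarn\.resourcemanager\.ha\.id<\/name>/{n;s|<value>[^<]*</value>|<value>$2</value>|;}" "$1"
}

# Example usage on master2:
#   set_rm_id "$HADOOP_HOME/etc/hadoop/yarn-site.xml" rm2
```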

4. Change the Zookeeper file

Change the $ZOOKEEPER_HOME/var/data/myid file

Set the value to this machine's id in $ZOOKEEPER_HOME/conf/zoo.cfg.

5. Synchronize time

First, set up master1 as the time server, then configure the other nodes to synchronize their time with master1.

5.1 Operations on the master1 time server:

Check whether the time service is installed

Change the related configuration files

Start the service

Check whether master1 is synchronized with itself

5.2 Synchronize the other machines with master1

Wait 3-5 minutes after the time server on master1 starts, then synchronize the other nodes with master1.

Change the configuration in the /etc/ntp.conf file

Send the /etc/ntp.conf of master2 to the other machines

Start the time service on the other nodes

5.3 Set the time service to start at boot

Run this command on master1, master2, slave1, slave2 and slave3

5.4 Open the port on the time server

Enter the commands according to the IP addresses
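The NTP configuration files and commands are not included in this copy. The sketch below shows one common arrangement consistent with the description: master1 serves its local clock to the 192.168.3.0/24 network and the other nodes use master1 as their server. The directives are standard ntp.conf syntax, but the exact lines the author used are unknown, and the function name is invented.

```shell
# Print the ntp.conf lines to add for a given role: server or client.
ntp_conf_lines() {
  case $1 in
    server)
      # master1: allow the cluster subnet and fall back to the local clock.
      printf '%s\n' \
        'restrict 192.168.3.0 mask 255.255.255.0 nomodify notrap' \
        'server 127.127.1.0' \
        'fudge 127.127.1.0 stratum 10' ;;
    client)
      # All other nodes: synchronize with master1.
      printf '%s\n' 'server master1' ;;
    *)
      echo "usage: ntp_conf_lines server|client" >&2
      return 1 ;;
  esac
}

ntp_conf_lines server
ntp_conf_lines client

# Related CentOS 6 commands (run as root), listed for orientation:
#   rpm -q ntp              # 5.1 check whether the time service is installed
#   service ntpd start      # 5.1 / 5.2 start the service
#   ntpstat                 # 5.1 check the synchronization state
#   chkconfig ntpd on       # 5.3 start at boot
#   iptables -I INPUT -p udp --dport 123 -s 192.168.3.0/24 -j ACCEPT   # 5.4
#   service iptables save
```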

First deployment + start of hadoop + zookeeper + hbase

After installing Hadoop, you need to deploy it before starting it. The following steps are required only the first time; you do not need to repeat them later.

1. Start Zookeeper

Run zkServer.sh start on each machine, or run ./zkServer.sh start in the $ZOOKEEPER_HOME/bin directory. You can then run jps to see the QuorumPeerMain process started by Zookeeper.

You can run zkServer.sh status to view the Zookeeper status. Normally exactly one machine is the leader and all the others are followers.
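To check the "one leader, the rest followers" condition across the cluster, the status output of all nodes can be summarized. This is a small sketch: it relies on zkServer.sh status printing a line of the form "Mode: leader" or "Mode: follower", and the function name is made up.

```shell
# Read concatenated `zkServer.sh status` output on stdin and count modes.
summarize_modes() {
  awk '/^Mode:/ { n[$2]++ }
       END { printf "leader=%d follower=%d\n", n["leader"], n["follower"] }'
}

# Example with sample status lines:
printf 'Mode: follower\nMode: leader\nMode: follower\n' | summarize_modes
# prints: leader=1 follower=2

# Against the real cluster (assumes passwordless ssh and the same
# $ZOOKEEPER_HOME on every node):
#   for h in master1 master2 slave1 slave2 slave3; do
#     ssh "$h" "$ZOOKEEPER_HOME/bin/zkServer.sh status" 2>/dev/null
#   done | summarize_modes
```

With all five nodes healthy the expected summary would be one leader and four followers.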

2. Format the ZooKeeper cluster

The purpose is to create the HA node on the ZooKeeper cluster.

Execute the command on master1

It initializes based on the ha.zookeeper.quorum value in the $HADOOP_HOME/etc/hadoop/core-site.xml file.

3. Start the journalnode processes

Run

Or execute

slave1 slave2 slave3

The second method is recommended. The journalnode of master1 and master2 cannot be started.

After startup, each node shows an additional JournalNode process.

4. Format the namenode

Execute the command on master1

Some folders and files (name, data, and journal) are created under the mydata folder.

5. Start the namenode

Execute the command on master1

A NameNode process appears on master1.

6. Synchronize the formatted namenode information to the standby namenode

Execute the command on master2

7. Start the namenode on master2

A NameNode process appears on master2.

8. Start all datanodes

Execute the command on master1

After execution, a DataNode process appears on the datanode nodes.

9. Start yarn

Execute the command on master1

A ResourceManager process appears on master1, and NodeManager processes appear on slave1, slave2 and slave3.

10. Start ZKFC

Start zkfc on master1 and master2

11. Hadoop started successfully

After startup there are two master nodes.
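The commands for steps 2 to 10 did not survive in this copy. The sketch below prints a plan using the standard Hadoop 2.x commands that match each step description; it is not necessarily what the author typed. The journal node host list is an assumption and must match dfs.namenode.shared.edits.dir in hdfs-site.xml. The script only prints host and command pairs and executes nothing.

```shell
# Journal node hosts: an assumption, override to match hdfs-site.xml.
JOURNAL_NODES=${JOURNAL_NODES:-"slave1 slave2 slave3"}

# Print one "host  command" line of the plan.
step() { printf '%-8s %s\n' "$1" "$2"; }

first_start_plan() {
  step master1 'hdfs zkfc -formatZK'                  # step 2
  for h in $JOURNAL_NODES; do
    step "$h" 'hadoop-daemon.sh start journalnode'    # step 3
  done
  step master1 'hdfs namenode -format'                # step 4
  step master1 'hadoop-daemon.sh start namenode'      # step 5
  step master2 'hdfs namenode -bootstrapStandby'      # step 6
  step master2 'hadoop-daemon.sh start namenode'      # step 7
  step master1 'hadoop-daemons.sh start datanode'     # step 8
  step master1 'start-yarn.sh'                        # step 9
  step master1 'hadoop-daemon.sh start zkfc'          # step 10
  step master2 'hadoop-daemon.sh start zkfc'
}

first_start_plan
```

The order matters: the journalnodes must be running before the namenode is formatted, and the first namenode must be up before master2 runs bootstrapStandby.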

12. Start hbase

Execute commands on master1

Subsequent starts

1. For HA, you must start Zookeeper first

zkServer.sh start (each node)

2. Start Hadoop

start-dfs.sh, then start-yarn.sh (master1 node)

3. Start hbase

start-hbase.sh (master1 node)
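The three steps above could be wrapped in a small script run from master1, sketched here. It assumes passwordless ssh, the Zookeeper path /soft/zookeeper-3.4.6 on every node, and the Hadoop and HBase scripts on the PATH; the APPLY switch and function names are illustrative. By default it only prints the commands.

```shell
CLUSTER_HOSTS="master1 master2 slave1 slave2 slave3"
ZOOKEEPER_HOME=${ZOOKEEPER_HOME:-/soft/zookeeper-3.4.6}

# Print each command; execute it only when APPLY=1 is set.
run() { echo "+ $*"; if [ "${APPLY:-0}" = 1 ]; then "$@"; fi; }

start_cluster() {
  # 1. Zookeeper first, on every node.
  for h in $CLUSTER_HOSTS; do
    run ssh "$h" "$ZOOKEEPER_HOME/bin/zkServer.sh start"
  done
  # 2. Hadoop (HDFS, then YARN) from master1.
  run start-dfs.sh
  run start-yarn.sh
  # 3. HBase from master1.
  run start-hbase.sh
}

start_cluster   # dry run; use APPLY=1 to actually execute
```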

Issues to be resolved in the future

I hope an expert can offer some advice.

Question 1: How should hbase-site.xml be configured so that HBase follows whichever master is active in the HA framework?

Question 2: After the services are stopped and restarted, hbase cannot be started, and I don't know why. The only fix I have found is to clear everything and reformat Hadoop.

Steps:

1. Delete all folders under hadoop/mydata and create a new yarn folder.

2. Delete all files in the hadoop/log folder.

3. Delete all files except myid in zookeeper/var/data

4. Delete all folders under Zookeeper/var/datalog

5. delete the file in hbase: file

6. Delete all log files under hbase

7. Reformat Hadoop
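For reference, steps 1 to 4 and 6 of this cleanup might look like the sketch below. It is destructive and only mirrors the author's workaround. It assumes the log directories are named logs (the Hadoop and HBase defaults) and GNU find; step 5 is left out because its path is unclear, and step 7 (reformatting) is not included. The function name is hypothetical.

```shell
# DESTRUCTIVE: wipes cluster data. Arguments: Hadoop home, Zookeeper home,
# HBase home. Run on every node only if you intend to reformat everything.
reset_cluster_data() {
  [ $# -eq 3 ] || { echo "usage: reset_cluster_data HADOOP ZK HBASE" >&2; return 1; }
  # 1. Empty hadoop/mydata and recreate the yarn folder.
  find "$1/mydata" -mindepth 1 -delete
  mkdir -p "$1/mydata/yarn"
  # 2. Delete the Hadoop logs (directory name assumed to be "logs").
  find "$1/logs" -mindepth 1 -delete
  # 3. Delete everything in zookeeper/var/data except myid.
  find "$2/var/data" -mindepth 1 ! -name myid -delete
  # 4. Delete everything under zookeeper/var/datalog.
  find "$2/var/datalog" -mindepth 1 -delete
  # 6. Delete the HBase logs (directory name assumed to be "logs").
  find "$3/logs" -mindepth 1 -delete
}

# Example usage:
#   reset_cluster_data "$HADOOP_HOME" "$ZOOKEEPER_HOME" "$HBASE_HOME"
```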

All attachments used in this article:

------------------------------------------ Split line ------------------------------------------

FTP address: ftp://ftp1.bkjia.com

Username: ftp1.bkjia.com

Password: www.bkjia.com

Install the package in the LinuxIDC.com \ 2015 \ Hadoop2.6 + HA + Zookeeper3.4.6 + HBase1.0.0 cluster on April 9, August

For the download method, see
