Installation and configuration of a fully distributed Hadoop cluster (4 nodes)

Source: Internet
Author: User

Hadoop version: hadoop-2.5.1-x64.tar.gz

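Building on a reference tutorial that sets up a two-node Hadoop cluster, I used VirtualBox to run four Ubuntu 15.10 virtual machines, built a four-node distributed Hadoop cluster on them, and configured HA (high availability). This write-up is meant as a reference for readers building such a cluster for the first time. The node design is as follows:

master: NameNode, ZooKeeper, ZKFC
slave1: NameNode, ZooKeeper, ZKFC, DataNode, JournalNode
slave2: ZooKeeper, DataNode, JournalNode
slave3: DataNode, JournalNode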

I. Preparation

The basic setup is done on the master node first: creating the hadoop user, installing and configuring SSH, and installing and configuring the Java environment. The reference tutorial by "Force star" covers this stage in great detail and explains every step.

1. Create the hadoop user

It is best to choose hadoop as the username when installing the virtual machine. If you did not, add a hadoop user:

sudo useradd -m hadoop -s /bin/bash

Set a password for the hadoop user. It is recommended to set every password to hadoop:

sudo passwd hadoop

Give the hadoop user administrator privileges:

sudo adduser hadoop sudo

Finally, log out of the current user (click the gear in the upper-right corner of the screen and choose Log Out) and log in as the hadoop user you just created. Once logged in as hadoop, update apt so that the software below can be installed:

sudo apt-get update

Install vim for text editing:

sudo apt-get install vim
2. Install and configure SSH

Both cluster mode and single-node mode need SSH login (similar to remote login: you log on to a Linux host and run commands on it). Ubuntu installs the SSH client by default; the SSH server has to be installed as well:

sudo apt-get install openssh-server

Passwordless SSH login is configured later, in step 5.

3. Install and configure the Java environment

Install OpenJDK 7 from the command line:

sudo apt-get install openjdk-7-jre openjdk-7-jdk

After installing OpenJDK, find its installation path, which is needed for the JAVA_HOME environment variable. Run:

dpkg -L openjdk-7-jdk | grep '/bin/javac'

The command prints a path. Remove the trailing "/bin/javac" and the rest is the path you need. For example, if the output is /usr/lib/jvm/java-7-openjdk-amd64/bin/javac, the path is /usr/lib/jvm/java-7-openjdk-amd64. This path is then used to set JAVA_HOME; for convenience we set it in ~/.bashrc.
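
Trimming the suffix can also be left to the shell. The following is a minimal sketch, assuming the javac path has already been obtained; the function name java_home_from_javac is made up for this example and is not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: strip the trailing "/bin/javac" from a javac path
# to get the directory that JAVA_HOME should point to.
java_home_from_javac() {
    printf '%s\n' "${1%/bin/javac}"
}

# Example with the path mentioned in the text.
java_home_from_javac /usr/lib/jvm/java-7-openjdk-amd64/bin/javac
# prints /usr/lib/jvm/java-7-openjdk-amd64
```

On a real system the argument could come from the dpkg command above, or possibly from readlink -f "$(which javac)" if javac is reached through symbolic links.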


Open ~/.bashrc in an editor (for example, vim ~/.bashrc) and add the following line at the top of the file, replacing "JDK installation path" with the path found above (there must be no spaces around the = sign). Then save the file:

export JAVA_HOME=JDK installation path

Reload the file so that the variable takes effect:

source ~/.bashrc    # apply the variable settings

Note: on Linux, ~ stands for the user's home directory, /home/<username>. If your username is hadoop, ~ means /home/hadoop/.

4. Clone the nodes

I cloned the master node three times in VirtualBox, named the clones slave1, slave2 and slave3, and set the network mode of every virtual machine to bridged. Two things then have to be done on every node. First, change the hostname to the name of the node (on Ubuntu it is stored in /etc/hostname; the change takes effect after a reboot).

Second, add the hostnames and IP addresses of all nodes to /etc/hosts (run ifconfig to see the IP address of the local machine).


Once all the nodes are done, check that they can ping one another:

ping slave1 -c 3   # ping the slave1 node; -c 3 sends only 3 packets (without it, press Ctrl+C to stop)
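
The hosts entries can be checked before pinging. The sketch below writes example entries to a temporary file and reports any node name that has no entry. The 192.168.1.x addresses are placeholders, not the addresses of the original setup, and missing_hosts is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: example /etc/hosts entries and a check that every node name has one.
# The 192.168.1.x addresses are placeholders; use the ones ifconfig reports.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.1.101 master
192.168.1.102 slave1
192.168.1.103 slave2
192.168.1.104 slave3
EOF

# Hypothetical helper: print each name that has no entry in the given hosts file.
missing_hosts() {
    file=$1; shift
    for name in "$@"; do
        awk -v n="$name" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" || echo "$name"
    done
}

# Prints nothing when every name is present.
missing_hosts "$hosts_file" master slave1 slave2 slave3
```

Against the real file the call would be missing_hosts /etc/hosts master slave1 slave2 slave3, followed by a ping of each name.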
5. Configure passwordless SSH login

On the master node, run:

cd ~/.ssh/                             # if this directory does not exist, run ssh localhost once, then exit
ssh-keygen -t rsa                      # press Enter at every prompt
cat ./id_rsa.pub >> ./authorized_keys  # authorize the key

After this, ssh localhost logs in directly without asking for a password.
Then, on the master node, copy the public key to the slave1 node:

scp ~/.ssh/id_rsa.pub hadoop@slave1:/home/hadoop/

Next, on the slave1 node, add the public key to the authorized keys:

mkdir ~/.ssh       # create the folder if it does not exist; skip this if it already exists
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub    # the copy can be deleted once it has been added
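
With three slave nodes, the copy, append and delete sequence has to be repeated three times. As an alternative (not what the steps above use), ssh-copy-id from the OpenSSH client package does the copy and the append in one command. The sketch below assumes the default key file ~/.ssh/id_rsa.pub and the host names slave1 to slave3. The helper names run and distribute_key are made up for this example, and with DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Sketch: distribute the master's public key to every slave with ssh-copy-id.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: one ssh-copy-id call per host given as an argument.
distribute_key() {
    for host in "$@"; do
        run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "hadoop@$host"
    done
}

distribute_key slave1 slave2 slave3
```

Setting DRY_RUN=0 runs the commands for real; each one asks for the hadoop password of the target node once.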

Repeat the copy and authorize steps for the other slave nodes.

The master node can then SSH to every slave node without a password.

Note: scp is short for secure copy. It copies files between Linux machines and works like cp, except that cp only copies within the local machine. If scp fails with "permission denied", change the owner of the target path:

sudo chown -R hadoop path

If that command itself fails with an error such as "/etc/sudoers is owned by uid 1000, should be 0" or "no valid sudoers sources found", run it without sudo.

II. Install and configure Hadoop

1. Install and configure Hadoop on the master node

I am using the 64-bit build of 2.5.1. The commands below use the archive name hadoop-2.6.0.tar.gz; substitute the name of the file you actually downloaded. With the installation package in ~/download, run on the master node:

sudo tar -zxf ~/download/hadoop-2.6.0.tar.gz -C /usr/local    # extract into /usr/local
cd /usr/local/
sudo mv ./hadoop-2.6.0/ ./hadoop     # rename the folder to hadoop
sudo chown -R hadoop ./hadoop        # make the hadoop user the owner

Note: ./ refers to the current directory and / to the root directory. For example, running cd ./bin from /usr/local/hadoop enters /usr/local/hadoop/bin. Also, cd .. goes up one directory level.
After unpacking, edit the configuration. The Hadoop configuration files are in /usr/local/hadoop/etc/hadoop/. On the master node, change three files in that directory: core-site.xml, hdfs-site.xml and slaves.

vim core-site.xml

Add the following (myhadoop is a name you choose yourself; any name works):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://myhadoop</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <!-- assumed value: the three ZooKeeper nodes on the default client port 2181 -->
    <value>master:2181,slave1:2181,slave2:2181</value>
  </property>
</configuration>
Then edit the next file:

vim hdfs-site.xml

Add the following content:

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>myhadoop</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.myhadoop</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhadoop.nn1</name>
    <value>master:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhadoop.nn2</name>
    <value>slave1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.myhadoop.nn1</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.myhadoop.nn2</name>
    <value>slave1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://slave1:8485;slave2:8485;slave3:8485/myhadoop</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.myhadoop</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/jn/data</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
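
A typo in a property name is easy to miss in a file this long. The sketch below reads a value back from a Hadoop-style XML file, assuming each <name> and <value> tag sits on its own line in the file. get_prop is a hypothetical helper, and matching text with awk is only a stand-in for a real XML parser.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop-style XML file. Plain text matching, not a real XML parser.
get_prop() {
    awk -v key="$2" '
        /<name>/ {
            n = $0
            sub(/.*<name>[ \t]*/, "", n)
            sub(/[ \t]*<\/name>.*/, "", n)
            found = (n == key)
        }
        /<value>/ && found {
            v = $0
            sub(/.*<value>[ \t]*/, "", v)
            sub(/[ \t]*<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$1"
}

# Demo on a small temporary file containing two of the properties from the text.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>myhadoop</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.myhadoop</name>
    <value>nn1,nn2</value>
  </property>
</configuration>
EOF

get_prop "$demo" dfs.ha.namenodes.myhadoop   # prints nn1,nn2
```

Once Hadoop is on the PATH, hdfs getconf -confKey dfs.nameservices should report the value that Hadoop itself has loaded, which is the more reliable check.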

Finally, edit the slaves file:

vim slaves

Add the hostnames of the nodes that run a DataNode, one per line. In this setup these are:

slave1
slave2
slave3

2. Copy the configured Hadoop to the other nodes

cd /usr/local                # the relative paths below assume this directory
sudo rm -r ./hadoop/tmp      # remove Hadoop temporary files
sudo rm -r ./hadoop/logs/*   # delete log files
tar -zcf ~/hadoop.master.tar.gz ./hadoop     # compress first, then copy
cd ~
scp ./hadoop.master.tar.gz slave1:/home/hadoop
scp ./hadoop.master.tar.gz slave2:/home/hadoop
scp ./hadoop.master.tar.gz slave3:/home/hadoop

Then run the following on every slave node:

sudo rm -r /usr/local/hadoop    # remove the old copy (if present)
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
sudo chown -R hadoop /usr/local/hadoop

III. Configure HA and start the cluster

1. ZooKeeper

Download the ZooKeeper package (I used version 3.4.6), extract it to /usr/local and rename the folder to the simpler name zookeeper (the same procedure as for the Hadoop package). In its conf directory, copy zoo_sample.cfg to zoo.cfg in the same directory:

cp -a zoo_sample.cfg zoo.cfg

Then edit zoo.cfg:

vim zoo.cfg

Change the dataDir path to /opt/zookeeper (a zookeeper directory will be created under /opt for this), then add three server lines at the end of the file, one for each of the three nodes that run ZooKeeper (master, slave1 and slave2).


Then create the zookeeper directory under /opt and create a myid file inside it:

vim myid

Put the number 1 in it. Then copy the zookeeper directory to the same path on the other ZooKeeper nodes (slave1 and slave2) and change the myid there: 2 on slave1 and 3 on slave2.
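
The server lines in zoo.cfg and the myid values follow one pattern: server.N=host:peerPort:electionPort, where N is the number stored in that host's myid file. The sketch below assumes the conventional ports 2888 and 3888, which are an assumption and may differ from the values used in the original setup. zk_server_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print one zoo.cfg "server.N=..." line per host.
# N is the host's position in the argument list and must match its myid file.
# Ports 2888 (peer) and 3888 (leader election) are conventional values, assumed here.
zk_server_lines() {
    id=1
    for host in "$@"; do
        echo "server.$id=$host:2888:3888"
        id=$((id + 1))
    done
}

zk_server_lines master slave1 slave2
# prints:
# server.1=master:2888:3888
# server.2=slave1:2888:3888
# server.3=slave2:2888:3888
```

The order of the arguments matters: master is first because its myid is 1, slave1 is second (myid 2) and slave2 is third (myid 3).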

Then add ZooKeeper to the PATH variable. This has to be done on each of the three ZooKeeper nodes. Open your shell startup file (for example ~/.bashrc, as used for JAVA_HOME earlier) and add this line:

export PATH=$PATH:/usr/local/zookeeper/bin

Then reload the file (for example with source ~/.bashrc) so that the change takes effect.

ZooKeeper can now be started from any directory, which is why it was added to PATH:

zkServer.sh start

Run jps and look for QuorumPeerMain. If it is listed, ZooKeeper started successfully.

Note: if the firewall is still on, errors may occur. The command to turn the firewall off: service iptables stop

2. JournalNode

Start the JournalNode on all slave nodes:

hadoop-daemon.sh start journalnode

3. Configure the PATH variable for Hadoop

Many Hadoop commands live under the installation directory. Adding the Hadoop bin and sbin directories to PATH lets you run commands such as hdfs from any directory. If this has not been configured yet, configure it on the master node. Run vim ~/.bashrc and add this line:

export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
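
Each time ~/.bashrc is sourced again, a plain export line appends the same directories once more. An optional guard, sketched here with the hypothetical helper path_append, adds a directory only when it is not in PATH yet. This is a variation, not part of the original setup.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH only if it is not already there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$PATH:$1" ;;
    esac
}

path_append /usr/local/hadoop/bin
path_append /usr/local/hadoop/sbin
path_append /usr/local/hadoop/bin    # the second call changes nothing
export PATH
```

The same helper would work for /usr/local/zookeeper/bin.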

After saving, run source ~/.bashrc to apply the change.

4. Active and standby NameNode

Before Hadoop is started for the first time, the NameNode has to be formatted on the master node:

hdfs namenode -format       # needed only for the first run

slave1 is the standby NameNode, so it has to copy the metadata of the master node (stored in /opt/hadoop2/dfs/name/current on master). Start the NameNode on the master node before copying:

hadoop-daemon.sh start namenode

Then run on the slave1 node:

hdfs namenode -bootstrapStandby

After that, the NameNode on slave1 can be started.

Before starting everything, format the ZKFC state in ZooKeeper:

hdfs zkfc -formatZK

If there are no errors, all the daemons can be started, for example with start-dfs.sh.

Run jps on every node. The master node should show four processes: Jps, NameNode, QuorumPeerMain and DFSZKFailoverController. The slave1 node should show those four plus DataNode and JournalNode, six in total. The slave2 node should show four: Jps, QuorumPeerMain, DataNode and JournalNode. The slave3 node should show three: Jps, DataNode and JournalNode.
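
Comparing the jps output with the expected list by eye is error-prone on four nodes. The sketch below prints every expected daemon that is missing from a given jps output. missing_daemons is a hypothetical helper, and the sample output uses made-up process IDs.

```shell
#!/bin/sh
# Hypothetical helper: given jps output and a list of expected daemon names,
# print every expected daemon that does not appear in the output.
missing_daemons() {
    output=$1; shift
    for daemon in "$@"; do
        printf '%s\n' "$output" \
            | awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }' \
            || echo "$daemon"
    done
}

# Sample output in the "PID Name" format that jps prints (PIDs are made up).
sample='2101 NameNode
2245 QuorumPeerMain
2390 Jps'

missing_daemons "$sample" NameNode QuorumPeerMain DFSZKFailoverController
# prints DFSZKFailoverController
```

On the master node the real call would be missing_daemons "$(jps)" NameNode QuorumPeerMain DFSZKFailoverController; an empty result means all expected daemons are running.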

Note: if none of the DataNodes start while everything else starts normally, delete the data directory under /opt/hadoop2/dfs/ on each slave node and start again.

6. Upload files

hdfs dfs -mkdir -p /usr/file               # create a directory in HDFS
hdfs dfs -put /home/hadoop/test /usr/file  # upload with put

The result can be viewed in a browser at http://Master:50070 or http://Slave1:50070.
IV. Configure YARN

My configuration here differs from the one in the "Force star" tutorial. I recommend using the configuration below; otherwise nodes may drop offline.

Open mapred-site.xml (in the same directory as core-site.xml) and add the following:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

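Then open yarn-site.xml (in the same directory as core-site.xml) and add the following:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <!-- assumed value: the standard shuffle service name on Hadoop 2.x -->
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <!-- assumed value: the standard shuffle handler class -->
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>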
