Installation and configuration of a fully distributed Hadoop cluster (4 nodes)


Hadoop version: hadoop-2.5.1-x64.tar.gz

This guide references the two-node Hadoop cluster tutorial at http://www.powerxing.com/install-hadoop-cluster/. I used VirtualBox to create four Ubuntu 15.10 virtual machines, built a four-node distributed Hadoop cluster on them, and configured HA (high availability). It should be a useful reference for anyone building a cluster for the first time. The node design is as follows (the roles match the process list given near the end of the article):

Master: NameNode (active), DFSZKFailoverController, ZooKeeper
Slave1: NameNode (standby), DFSZKFailoverController, ZooKeeper, DataNode, JournalNode
Slave2: ZooKeeper, DataNode, JournalNode
Slave3: DataNode, JournalNode

First, preparation

Complete the basic work on the Master node: create the hadoop user, install and configure SSH, and install and configure the Java environment. This stage is written up in great detail in the powerxing tutorial linked above, with every step explained.

1. Create a hadoop user. It is recommended to use hadoop as the username when installing the virtual machine; if you did not, add a hadoop user:

sudo useradd -m hadoop -s /bin/bash

Set a password for the hadoop user; it is recommended to set all the passwords to hadoop:

sudo passwd hadoop
Add administrator privileges to the hadoop user:
sudo adduser hadoop sudo

Finally, log out of the current user (click the gear in the upper-right corner of the screen and select Log Out) and log in with the hadoop user you just created. After logging in as the hadoop user, update apt to make subsequent software installation easier:

sudo apt-get update

Install vim for text editing:

sudo apt-get install vim
2. Install and configure SSH

Both cluster and single-node modes require SSH login (similar to remote login: you can log in to a Linux host and run commands on it). Ubuntu has the SSH client installed by default; the SSH server also needs to be installed:

sudo apt-get install openssh-server

Passwordless SSH login will be configured later.

3. Install and configure the Java environment

Install OpenJDK 7 directly with the following command:

sudo apt-get install openjdk-7-jre openjdk-7-jdk

After installing OpenJDK, you need to find its installation path, which will be used to configure the JAVA_HOME environment variable. Execute the following command:

dpkg -L openjdk-7-jdk | grep '/bin/javac'

The command outputs a path; remove the "/bin/javac" at the end, and what remains is the path we need. For example, if the output is /usr/lib/jvm/java-7-openjdk-amd64/bin/javac, then the path we need is /usr/lib/jvm/java-7-openjdk-amd64. Next, configure the JAVA_HOME environment variable; for convenience, we set it in ~/.bashrc:

vim ~/.bashrc

Add a separate line at the beginning of the file (note that there must be no spaces before or after the = sign), replacing "JDK installation path" with the path obtained from the command above, then save:

export JAVA_HOME=JDK installation path
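For example, with the path found in the previous step, the line would be (adjust it if your path differs):

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64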

Then make the environment variable take effect by executing:

source ~/.bashrc    # make the variable setting take effect
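To verify the setting (a quick optional check, not part of the original steps):

echo $JAVA_HOME     # should print the JDK installation path
java -version       # should report the OpenJDK 7 version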

"In a Linux system, ~ represents the user's home folder, the"/home/User name "directory, such as your user name is Hadoop, then ~ represents"/home/hadoop/". " 4, clone node

I cloned the Master node 3 times in VirtualBox, naming the clones Slave1, Slave2, and Slave3, and set the network connection mode to bridged. Do the following two things on all nodes. First, modify the hostname so that it matches the node name we designed (takes effect after a reboot):

sudo vim /etc/hostname

Second, add the hostnames and corresponding IP addresses of all nodes (use ifconfig to check each machine's IP):

sudo vim /etc/hosts
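For example, the added lines might look like this (the IP addresses below are placeholders; use the bridged addresses that ifconfig actually reports on your machines):

192.168.1.101 Master
192.168.1.102 Slave1
192.168.1.103 Slave2
192.168.1.104 Slave3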

Once all the nodes have been configured, test whether they can ping each other:

ping Slave1 -c 3   # ping the Slave1 node; -c 3 means ping only 3 times, press Ctrl+C to interrupt the command
5. Configure passwordless SSH login

On the Master node, execute:
cd ~/.ssh/                              # if this directory does not exist, run ssh localhost first, then exit
ssh-keygen -t rsa                       # you will see several prompts; just press Enter for each
cat ./id_rsa.pub >> ./authorized_keys   # add the key to the authorized list
After this, the ssh localhost command will log you in directly without asking for a password.
Then, on the Master node, transfer the public key to the Slave1 node:
scp ~/.ssh/id_rsa.pub hadoop@Slave1:/home/hadoop/

Then, on the Slave1 node, add the SSH public key to the authorized keys:

mkdir ~/.ssh       # create the folder if it does not exist; skip this if it already exists
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub    # the copied key file can be deleted once it has been added

The same transfer-and-authorize steps must also be done on the other slave nodes, as sketched below.
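For example, for Slave2 (and likewise Slave3) the steps would be, assuming the same usernames and paths as above:

On Master:
scp ~/.ssh/id_rsa.pub hadoop@Slave2:/home/hadoop/

On Slave2:
mkdir ~/.ssh       # only if it does not exist yet
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
rm ~/id_rsa.pub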

This way, you can SSH from the Master node to each Slave node without a password.

"SCP is a shorthand for secure copy for remote copy files under Linux, similar to the CP command, but the CP can only be copied in this machine." If the file times in the SCP error, appear permission denied, to the target file modification permissions on the line:

sudo chown–r Hadoop path

If changing the ownership itself reports an error such as "/etc/sudoers is owned by uid 1000, should be 0" or "no valid sudoers sources found", just run the chown command above without sudo.

Second, install and configure Hadoop

1. Install and configure Hadoop on the Master node

I am using the 64-bit 2.5.1 release. With the installation package placed in the ~/download directory, execute on the Master node:

sudo tar -zxf ~/download/hadoop-2.5.1-x64.tar.gz -C /usr/local    # extract into /usr/local
cd /usr/local/
sudo mv ./hadoop-2.5.1 ./hadoop            # change the folder name to hadoop
sudo chown -R hadoop ./hadoop              # change the file ownership
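Optionally, you can confirm that Hadoop is usable by printing its version (a quick check, not part of the original steps):

cd /usr/local/hadoop
./bin/hadoop version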
"./is the file under the current path, and/is the root directory, such as I am executing the CD under the/usr/local/hadoop path./bin is entering the/usr/local/hadoop/bin/hadoop directory. PS:CD. is to go back to the previous level directory "
Next, modify the configuration files. The Hadoop configuration files are located in /usr/local/hadoop/etc/hadoop/. On the Master node, modify the three files core-site.xml, hdfs-site.xml, and slaves in that directory:
vim core-site.xml
Add the following (myhadoop is the nameservice name; you can pick any name you like):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://myhadoop</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>Master:2181,Slave1:2181,Slave2:2181</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop2</value>
  </property>
</configuration>
Then modify the next file:
vim hdfs-site.xml

Add the following content:

<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>myhadoop</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.myhadoop</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhadoop.nn1</name>
    <value>Master:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.myhadoop.nn2</name>
    <value>Slave1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.myhadoop.nn1</name>
    <value>Master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.myhadoop.nn2</name>
    <value>Slave1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://Slave1:8485;Slave2:8485;Slave3:8485/myhadoop</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.myhadoop</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/jn/data</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>

Finally, modify the slaves file:

vim slaves

Add the following content (these hosts will act as the DataNodes):

Slave1
Slave2
Slave3
2. Copy the configured Hadoop to the other nodes
cd /usr/local
sudo rm -r ./hadoop/tmp      # remove the Hadoop temporary files
sudo rm -r ./hadoop/logs/*   # delete the log files
tar -zcf ~/hadoop.master.tar.gz ./hadoop     # compress first, then copy
cd ~
scp ./hadoop.master.tar.gz Slave1:/home/hadoop
scp ./hadoop.master.tar.gz Slave2:/home/hadoop
scp ./hadoop.master.tar.gz Slave3:/home/hadoop

Then execute on all slave nodes:

sudo rm -r /usr/local/hadoop    # delete the old one (if present)
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
sudo chown -R hadoop /usr/local/hadoop
Third, configure HA and start up

1. ZooKeeper

Download the ZooKeeper package (I used version 3.4.6), extract it to /usr/local, and rename the directory to the simpler name zookeeper (the process is the same as for installing the Hadoop package above; a sketch follows the next command). Then, in the conf directory under that directory, copy the zoo_sample.cfg file and save it as zoo.cfg in the same directory. The command is as follows:

cp -a zoo_sample.cfg zoo.cfg
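For reference, the extraction and renaming mentioned above might look like this (a sketch assuming the tarball is named zookeeper-3.4.6.tar.gz and was downloaded to ~/download; adjust the filename and path to yours):

sudo tar -zxf ~/download/zookeeper-3.4.6.tar.gz -C /usr/local
cd /usr/local
sudo mv ./zookeeper-3.4.6 ./zookeeper
sudo chown -R hadoop ./zookeeper
cd ./zookeeper/conf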

Then edit zoo.cfg. The command is as follows:

vim zoo.cfg

Change the dataDir path to /opt/zookeeper (a zookeeper directory will be created under /opt in a moment), then add the following three lines at the end (that is, ZooKeeper will run on these three nodes):

server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888

Then create the zookeeper directory under /opt, and create a new myid file inside it:

sudo mkdir /opt/zookeeper
sudo chown hadoop /opt/zookeeper    # so the hadoop user can write to it
cd /opt/zookeeper/
vim myid

Write the number 1 in it. Then copy the /opt/zookeeper directory to the same location on the other ZK nodes (that is, Slave1 and Slave2) and modify their myid files (change Slave1's to 2 and Slave2's to 3), for example as sketched below.
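A minimal sketch of that copy step, assuming passwordless SSH is set up as above and that /opt on the target nodes is writable by the hadoop user:

scp -r /opt/zookeeper Slave1:/opt/
scp -r /opt/zookeeper Slave2:/opt/
ssh Slave1 'echo 2 > /opt/zookeeper/myid'
ssh Slave2 'echo 3 > /opt/zookeeper/myid'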

Then add ZooKeeper to the PATH variable (do this on every ZK node, i.e. three times):

sudo vim /etc/profile

Add this line:

export PATH=$PATH:/usr/local/zookeeper/bin

Then bring it into effect:

source /etc/profile

You can then start ZooKeeper from any directory (which is why we added it to PATH). Do this on each of the three ZK nodes:

zkServer.sh start

Run the jps command; if a QuorumPeerMain process appears, ZooKeeper has started successfully.
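Besides jps, you can also check each ZK node's role (leader or follower) with ZooKeeper's built-in status command:

zkServer.sh status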

"If the firewall is not turned off, may error, shut down the firewall command: Service iptables stop" 2, Journalnode

Start the JournalNode on all slave nodes:

hadoop-daemon.sh start journalnode
3. Configure the PATH variable for Hadoop

Since many Hadoop commands are executed from the installation directory, we can add the Hadoop installation directory to the PATH variable so that commands such as hdfs can be used directly from any directory. If this is not configured yet, configure it on the Master node: first execute vim ~/.bashrc and add the line:

export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin

After saving, execute source ~/.bashrc to make the configuration take effect.

4. Active and standby NameNode

The first time you start Hadoop you need to perform NameNode formatting on the Master node first:

hdfs namenode -format       # initialization is only needed on the first run, not afterwards

Because Slave1 is the standby NameNode, it needs to copy the Master node's metadata (stored under /opt/hadoop2/dfs/name/current on Master). Before copying, start the NameNode on the Master node:

hadoop-daemon.sh start namenode

Then execute on the Slave1 node:

hdfs namenode -bootstrapStandby
Then the NameNode can be started on Slave1 as well.
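That is, run the same daemon command on Slave1 that was used on Master:

hadoop-daemon.sh start namenode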
5. ZKFC

Before starting everything, format the HA state in ZooKeeper:

hdfs zkfc -formatZK

If there are no errors, you can now start everything:

start-dfs.sh

Execute jps on each node. The Master node should show four processes: Jps, NameNode, QuorumPeerMain, and DFSZKFailoverController. The Slave1 node should show those four plus DataNode and JournalNode, six in total. The Slave2 node should show four: Jps, QuorumPeerMain, DataNode, and JournalNode. The Slave3 node should show three: Jps, DataNode, and JournalNode.

"If all the Datanode nodes do not start, the other normal startup situation, the/opt/hadoop2/dfs/directory of each of your slave nodes to delete the data file, and then open the test. " 6, upload files

hdfs dfs -mkdir -p /usr/file                 # create a new directory in HDFS
hdfs dfs -put /home/hadoop/test /usr/file    # upload the file with put
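You can also confirm the upload from the command line (an optional check):

hdfs dfs -ls /usr/file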
The file can also be viewed in a browser at http://Master:50070 or http://Slave1:50070.
Fourth, configure YARN

My configuration here differs from the powerxing tutorial; it is recommended to use the configuration below, otherwise nodes may drop offline.
Locate the mapred-site.xml file (in the same directory as core-site.xml) and add the following (if the file does not exist yet, see the note after the block):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
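Note: Hadoop 2.x normally ships only a mapred-site.xml.template in this directory. If mapred-site.xml does not exist, copy the template first (run this in /usr/local/hadoop/etc/hadoop/):

cp mapred-site.xml.template mapred-site.xml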

Then find the yarn-site.xml file (also in the same directory as core-site.xml) and add the following:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
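After copying these two files to the same directory on every node, YARN can be started from the Master node. That step is not spelled out above, so the commands below are simply the standard Hadoop scripts for it:

start-yarn.sh     # starts the ResourceManager here and a NodeManager on each host listed in slaves
jps               # ResourceManager (on Master) and NodeManager (on the slaves) should now appear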
