Add a datanode node to a Hadoop Cluster



As the business expands, the company's original three nodes are no longer sufficient, so a new datanode must be added. Follow these steps to add a datanode to the cluster:

1. On the new datanode, create the user that runs Hadoop (here, grid) and set its password. Then configure a static IP address, bind the host name, update the hosts file, and disable the firewall.

I am using CentOS 7, so the static IP address is configured in: vi /etc/sysconfig/network-scripts/ifcfg-eno16777736
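For reference, a minimal static-IP configuration might look like the following (the addresses are placeholders for illustration, not from the original):

```
TYPE=Ethernet
BOOTPROTO=static
NAME=eno16777736
DEVICE=eno16777736
ONBOOT=yes
IPADDR=192.168.1.104
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
```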

Then run service network restart to restart the network.

Bind the host name: vi /etc/hostname and change it to the desired host name.

Disable the firewall: CentOS 7 replaces iptables with firewalld, so the commands are:

systemctl status firewalld    # view the firewall status

systemctl stop firewalld      # stop the firewall

systemctl disable firewalld   # prevent the firewall from starting at boot (this removes the two systemd symlinks that enable the service)

Modify the hosts file:

vi /etc/hosts
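The hosts file maps every cluster node's IP address to its host name; a sketch using placeholder addresses and the host names referenced later in this article:

```
192.168.1.101   namenode_hostname
192.168.1.102   datanode_hostname1
192.168.1.103   datanode_hostname2
192.168.1.104   new_datanode_hostname
```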

Finally, run shutdown -r now to restart the machine.

2. (1) Log on to the new datanode as grid and generate an SSH key pair: ssh-keygen -t rsa

(2) Back up the generated public key id_rsa.pub: cp id_rsa.pub id_rsa.pub.x, where x is the number of the newly added node.

(3) Then send id_rsa.pub.x to the master node:

scp id_rsa.pub.x grid@namenode_hostname:/home/grid/.ssh/

3. On the master node:

(1) Append id_rsa.pub.x to authorized_keys: cat id_rsa.pub.x >> authorized_keys (note the >>; a single > would overwrite keys already collected from the other nodes)
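Appending with >> matters because authorized_keys already holds the keys of the other nodes; a quick sketch with throwaway files (the key strings are placeholders):

```shell
# Demonstrate appending a new public key without clobbering existing ones
tmp=$(mktemp -d)
echo "ssh-rsa AAAA...existing grid@datanode_hostname1" > "$tmp/authorized_keys"
echo "ssh-rsa AAAA...newnode grid@new_datanode" > "$tmp/id_rsa.pub.4"
cat "$tmp/id_rsa.pub.4" >> "$tmp/authorized_keys"   # >> appends; > would overwrite
wc -l < "$tmp/authorized_keys"                      # both keys are now present
```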

(2) The master node distributes authorized_keys to all nodes, including the new datanode.

scp authorized_keys grid@datanode_hostname1:/home/grid/.ssh/

scp authorized_keys grid@datanode_hostname2:/home/grid/.ssh/

...
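The per-node scp commands above can be generated in a loop; a sketch that only prints the commands as a dry run (the host names are the placeholders used above):

```shell
# Dry run: print the distribution command for each node instead of executing it
nodes="datanode_hostname1 datanode_hostname2 new_datanode_hostname"
for host in $nodes; do
  echo scp /home/grid/.ssh/authorized_keys "grid@$host:/home/grid/.ssh/"
done
```

Dropping the echo would perform the actual transfers, which succeed without a password once the shared authorized_keys is in place.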

(3) On the master node, add the new datanode's IP address:

vi /etc/hosts    # add the IP address and host name of the new node

Then distribute the updated hosts file from the master node to all nodes (including the newly added datanode).

(4) Distribute the JDK installation, the Hadoop installation, and the environment variable file /etc/profile from the master node to the newly added datanode.
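A sketch of these transfers, again printed as a dry run (the JDK and Hadoop paths here are assumptions for illustration, not from the original):

```shell
# Dry run: print the copy commands for the JDK, Hadoop, and /etc/profile
NEW_NODE=new_datanode_hostname   # placeholder host name
for path in /usr/java/jdk1.7.0 /home/grid/hadoop /etc/profile; do
  echo scp -r "$path" "grid@$NEW_NODE:$path"
done
```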

(5) Modify the Hadoop slaves file on the master node and add the host name of the new datanode.

(6) Execute source /etc/profile on the newly added node.

(7) Run the following commands on the new node:
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker

(8) Rebalance the blocks across the datanodes by executing the following on the new node:
start-balancer.sh

This can take a long time.
1) If the cluster is not rebalanced, new data will be concentrated on the new node, which reduces MapReduce efficiency.
2) You can set the balance threshold; the default is 10%. The lower the value, the more evenly balanced the nodes, but the longer the rebalancing takes.
[root@slave-004 hadoop]# start-balancer.sh -threshold 5
3) You can also raise the balance bandwidth; the default is only 1 MB/s. Modify hdfs-site.xml:

<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>1048576</value>
</property>

        

(9) Safe mode
After a restart the NameNode may remain in safe mode. There are two ways to exit it:
1) Lower dfs.safemode.threshold.pct; the default value is 0.999.
dfs.safemode.threshold.pct (default: 0.999f)
When HDFS starts, the NameNode leaves safe mode once the DataNodes have reported at least 0.999 of the blocks recorded in the metadata; until then, the file system stays read-only. If the value is set to 1, HDFS remains in safe mode permanently.
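If you choose to lower the threshold, the setting goes in hdfs-site.xml; 0.95 here is an arbitrary illustrative value, not a recommendation from the original:

```
<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>0.95</value>
</property>
```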

2) Force an exit with the hadoop dfsadmin -safemode leave command.
The -safemode parameter accepts the following values:
enter - enter safe mode
leave - force the NameNode to exit safe mode
get   - return whether safe mode is enabled
wait  - wait until safe mode ends

