Add a datanode node to a Hadoop Cluster



As the business expands, the company's original three nodes are no longer sufficient, so a new datanode must be added. Follow these steps to add a datanode to the cluster:

1. On the new datanode, create the user that runs Hadoop (here, grid) and set its password. Then configure a static IP address, bind the host name, update the hosts file, and disable the firewall.

I am using CentOS 7, so the static IP address is configured in: vi /etc/sysconfig/network-scripts/ifcfg-eno16777736
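For reference, a minimal static-IP configuration might look like the following (the addresses are placeholders for illustration, not from the original):

```
TYPE=Ethernet
BOOTPROTO=static
NAME=eno16777736
DEVICE=eno16777736
ONBOOT=yes
IPADDR=192.168.1.104
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
```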

Then run service network restart to restart the network.

Bind the host name: vi /etc/hostname and change it to the desired host name.

Disable the firewall: CentOS 7 replaces iptables with firewalld, so the commands are:

systemctl status firewalld    # view the firewall status

systemctl stop firewalld      # stop the firewall

systemctl disable firewalld   # prevent the firewall from starting at boot (this removes the two systemd symlinks that enable the service)

Modify the hosts file:

vi /etc/hosts
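The hosts file maps every cluster node's IP address to its host name; a sketch using placeholder addresses and the host names referenced later in this article:

```
192.168.1.101   namenode_hostname
192.168.1.102   datanode_hostname1
192.168.1.103   datanode_hostname2
192.168.1.104   new_datanode_hostname
```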

Finally, run shutdown -r now to restart the machine.

2. (1) Log on to the new datanode as grid and generate an SSH key pair: ssh-keygen -t rsa

(2) Back up the generated public key id_rsa.pub: cp id_rsa.pub id_rsa.pub.x, where x is the number of the newly added node.

(3) Then send id_rsa.pub.x to the master node:

scp id_rsa.pub.x grid@namenode_hostname:/home/grid/.ssh/

3. On the master node:

(1) Append id_rsa.pub.x to authorized_keys: cat id_rsa.pub.x >> authorized_keys (note the >>; a single > would overwrite keys already collected from the other nodes)
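Appending with >> matters because authorized_keys already holds the keys of the other nodes; a quick sketch with throwaway files (the key strings are placeholders):

```shell
# Demonstrate appending a new public key without clobbering existing ones
tmp=$(mktemp -d)
echo "ssh-rsa AAAA...existing grid@datanode_hostname1" > "$tmp/authorized_keys"
echo "ssh-rsa AAAA...newnode grid@new_datanode" > "$tmp/id_rsa.pub.4"
cat "$tmp/id_rsa.pub.4" >> "$tmp/authorized_keys"   # >> appends; > would overwrite
wc -l < "$tmp/authorized_keys"                      # both keys are now present
```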

(2) The master node distributes authorized_keys to all nodes, including the new datanode.

scp authorized_keys grid@datanode_hostname1:/home/grid/.ssh/

scp authorized_keys grid@datanode_hostname2:/home/grid/.ssh/

...
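The per-node scp commands above can be generated in a loop; a sketch that only prints the commands as a dry run (the host names are the placeholders used above):

```shell
# Dry run: print the distribution command for each node instead of executing it
nodes="datanode_hostname1 datanode_hostname2 new_datanode_hostname"
for host in $nodes; do
  echo scp /home/grid/.ssh/authorized_keys "grid@$host:/home/grid/.ssh/"
done
```

Dropping the echo would perform the actual transfers, which succeed without a password once the shared authorized_keys is in place.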

(3) On the master node, add the new datanode's IP address:

vi /etc/hosts    # add the IP address and host name of the new node

Then distribute the updated hosts file from the master node to all nodes (including the newly added datanode).

(4) Distribute the JDK installation, the Hadoop installation, and the environment variable file /etc/profile from the master node to the newly added datanode.
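A sketch of these transfers, again printed as a dry run (the JDK and Hadoop paths here are assumptions for illustration, not from the original):

```shell
# Dry run: print the copy commands for the JDK, Hadoop, and /etc/profile
NEW_NODE=new_datanode_hostname   # placeholder host name
for path in /usr/java/jdk1.7.0 /home/grid/hadoop /etc/profile; do
  echo scp -r "$path" "grid@$NEW_NODE:$path"
done
```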

(5) Modify the Hadoop slaves file on the master node and add the host name of the new datanode.

(6) Execute source /etc/profile on the newly added node.

(7) Run the following commands on the new node:
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker

(8) Rebalance the blocks across the datanodes by executing the following on the new node:
start-balancer.sh

This can take a long time.
1) If the cluster is not rebalanced, new data will be concentrated on the new node, which reduces MapReduce efficiency.
2) You can set the balance threshold; the default is 10%. The lower the value, the more evenly balanced the nodes, but the longer the rebalancing takes.
[root@slave-004 hadoop]# start-balancer.sh -threshold 5
3) You can also raise the balance bandwidth; the default is only 1 MB/s. Modify hdfs-site.xml:

<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>1048576</value>
</property>

        

(9) Safe mode
After a restart the NameNode may remain in safe mode. There are two ways to exit it:
1) Lower dfs.safemode.threshold.pct; the default value is 0.999.
dfs.safemode.threshold.pct (default: 0.999f)
When HDFS starts, the NameNode leaves safe mode once the DataNodes have reported at least 0.999 of the blocks recorded in the metadata; until then, the file system stays read-only. If the value is set to 1, HDFS remains in safe mode permanently.
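If you choose to lower the threshold, the setting goes in hdfs-site.xml; 0.95 here is an arbitrary illustrative value, not a recommendation from the original:

```
<property>
  <name>dfs.safemode.threshold.pct</name>
  <value>0.95</value>
</property>
```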

2) Force an exit with the hadoop dfsadmin -safemode leave command.
The -safemode parameter accepts the following values:
enter - enter safe mode
leave - force the NameNode to exit safe mode
get   - return whether safe mode is enabled
wait  - wait until safe mode ends

