Dynamically Adding and Removing Hadoop Nodes (DataNode and TaskTracker)


In general, the correct approach is to edit the configuration files first, and then start or stop the corresponding process on the specific machine.

Some guides on the web additionally recommend using host names rather than IP addresses in the configuration files.

The procedures for adding/removing DataNodes and TaskTrackers are very similar; they differ only in the configuration items and commands involved.


1. DataNode

1.0 Configuration files

Edit the configuration file conf/hdfs-site.xml on the master/NameNode. The key parameters are dfs.hosts and dfs.hosts.exclude.
Note: the layout of the configuration files is not consistent across Hadoop versions!

See the Cluster Setup section of the official Hadoop documentation for the details of your version: go to http://hadoop.apache.org/docs/ and click on the same or a similar version number.

The statements above apply to Hadoop 1.x, and the examples below follow that version. In Hadoop 0.x the same configuration lived in the file conf/hadoop-site.xml. Hadoop 2.x changed things considerably: the file is conf/hdfs-site.xml and the parameters are dfs.namenode.hosts and dfs.namenode.hosts.exclude.
Roles: dfs.hosts is the list of hosts allowed to connect as DataNodes; if it is not configured, or the file it points to is empty, every host is allowed to become a DataNode. dfs.hosts.exclude is the list of hosts denied as DataNodes. A host that appears in both lists is rejected.

Their essential role is to deny DataNode connections from certain nodes, not to orchestrate starting and stopping the DataNode process on those nodes.

Usage example: edit conf/hdfs-site.xml and add:

<property>
  <name>dfs.hosts</name>
  <value>/opt/hadoop/conf/datanode-allow.list</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/hadoop/conf/datanode-deny.list</value>
</property>
If you do not need the allow list (or the deny list), simply do not create the corresponding property.

Then create the file(s) named by each value, writing one host name per line.
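As a minimal sketch of that last step, the following creates the two list files referenced by dfs.hosts and dfs.hosts.exclude (the path /tmp/hadoop-conf and the hostnames slave1..slave3 are illustrative stand-ins, not values from the article):

```shell
CONF=/tmp/hadoop-conf                      # stand-in for /opt/hadoop/conf
mkdir -p "$CONF"
# one host name per line in the allow list
printf '%s\n' slave1 slave2 slave3 > "$CONF/datanode-allow.list"
# start with an empty deny list
: > "$CONF/datanode-deny.list"
cat "$CONF/datanode-allow.list"
```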



1.1 Join

1. Configure Hadoop on the new slave.

2. Add the new slave to the slaves list on the master (not required, but convenient when restarting the whole cluster later).

3. If an allow list is used, add the new slave to datanode-allow.list.

4. Start the DataNode process on the slave:

hadoop-daemon.sh start datanode

PS: You can use the jps command to view the PID and name of each Java process on the machine.
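Steps 2-4 above can be sketched as follows (the hostname slave4 and the /tmp/hadoop-conf path are illustrative; the cluster commands themselves are shown as comments because they must run on the actual machines):

```shell
CONF=/tmp/hadoop-conf                      # stand-in for /opt/hadoop/conf
NEW_SLAVE=slave4                           # hypothetical new host
mkdir -p "$CONF"
echo "$NEW_SLAVE" >> "$CONF/slaves"                 # step 2: slaves list on master
echo "$NEW_SLAVE" >> "$CONF/datanode-allow.list"    # step 3: allow list, if used
# step 4, on the new slave:
#   hadoop-daemon.sh start datanode
# then verify with jps, which prints the PID and name of each Java process
grep "$NEW_SLAVE" "$CONF/slaves"
```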


1.2 Delete

It is strongly discouraged to shut down a DataNode directly on the slave with the hadoop-daemon.sh stop datanode command: doing so causes missing blocks to appear in HDFS.



1. Edit datanode-deny.list on the master and add the corresponding host.
2. Refresh the node configuration on the master: hadoop dfsadmin -refreshNodes
At this point the web UI immediately shows the node in the decommissioning state; after a while it becomes dead. The same information is available from the hadoop dfsadmin -report command.
3. Stop the DataNode process on the slave (not required): hadoop-daemon.sh stop datanode
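The decommission steps can be sketched as follows (the host slave4 and the /tmp path are illustrative; the commands that need the live cluster appear as comments):

```shell
DENY=/tmp/hadoop-conf/datanode-deny.list   # stand-in for /opt/hadoop/conf/...
mkdir -p "$(dirname "$DENY")"
echo slave4 >> "$DENY"                     # step 1: deny list on the master
# step 2, on the master:
#   hadoop dfsadmin -refreshNodes
# watch the node go decommissioning -> dead in the web UI, or via:
#   hadoop dfsadmin -report
# step 3 (optional), on the slave:
#   hadoop-daemon.sh stop datanode
grep slave4 "$DENY"
```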

1.2.1 Re-adding a deleted node

1. Remove the corresponding host from datanode-deny.list on the master.
2. Refresh the node configuration on the master: hadoop dfsadmin -refreshNodes
3. Restart the DataNode process on the slave: hadoop-daemon.sh start datanode

PS: If the DataNode process on the slave was never stopped, you need to stop it and start it again.
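A sketch of re-adding a node: drop it from the deny list, refresh, restart (the hostnames, the path, and the starting state of the file are all fabricated for illustration):

```shell
DENY=/tmp/hadoop-conf/datanode-deny.list
mkdir -p "$(dirname "$DENY")"
printf 'slave4\nslave5\n' > "$DENY"        # example starting state of the deny list
# step 1: remove the host being re-added
grep -v '^slave4$' "$DENY" > "$DENY.tmp" && mv "$DENY.tmp" "$DENY"
# step 2, on the master:   hadoop dfsadmin -refreshNodes
# step 3, on the slave:    hadoop-daemon.sh start datanode
#   (stop it first if the process was never shut down during decommissioning)
cat "$DENY"
```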
2. TaskTracker

2.0 Configuration files

In Hadoop 1.x, edit the configuration file conf/mapred-site.xml on the master/NameNode.

The key parameters are mapred.hosts and mapred.hosts.exclude.


In Hadoop 0.x the file to change is conf/hadoop-site.xml; the situation in Hadoop 2.x is unclear and is not covered here.

Roles: the same as the corresponding DataNode parameters.

Usage example: edit conf/mapred-site.xml and add:

<property>
  <name>mapred.hosts</name>
  <value>/opt/hadoop/conf/tasktracker-allow.list</value>
</property>
<property>
  <name>mapred.hosts.exclude</name>
  <value>/opt/hadoop/conf/tasktracker-deny.list</value>
</property>
If you do not need the allow list, do not create the corresponding property.

Then create the file(s) named by each value, writing one host name per line.
2.1 Join

1. Configure Hadoop on the new slave.

2. Add the new slave to the slaves list on the master (not required, but convenient when restarting the whole cluster later).

3. If an allow list is used, add the new slave to tasktracker-allow.list.

4. Start the TaskTracker process on the slave:

hadoop-daemon.sh start tasktracker

PS: You can use the jps command to view the PID and name of each Java process on the machine.


2.2 Delete

It is not recommended to shut down a TaskTracker directly on the slave with the hadoop-daemon.sh stop tasktracker command: the JobTracker will treat those machines as only temporarily lost, and within the timeout window (by default 10 min + 30 s) it will still schedule tasks to them as if they were healthy.

1. Edit tasktracker-deny.list on the master and add the corresponding host.
2. Refresh the node configuration on the master: hadoop mradmin -refreshNodes
At this point the web UI immediately shows the node count reduced and the excluded-node count increased; you can click through for details.

3. Stop the TaskTracker process on the slave (not required): hadoop-daemon.sh stop tasktracker
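The same pattern applies as for DataNodes, but with the list named by mapred.hosts.exclude and mradmin instead of dfsadmin (host and path are illustrative; cluster commands appear as comments):

```shell
DENY=/tmp/hadoop-conf/tasktracker-deny.list   # stand-in for /opt/hadoop/conf/...
mkdir -p "$(dirname "$DENY")"
echo slave4 >> "$DENY"                     # step 1: deny list on the master
# step 2, on the master:
#   hadoop mradmin -refreshNodes
# step 3 (optional), on the slave:
#   hadoop-daemon.sh stop tasktracker
grep slave4 "$DENY"
```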

2.2.1 Re-adding a deleted node

1. Remove the corresponding host from tasktracker-deny.list on the master.
2. Refresh the node configuration on the master: hadoop mradmin -refreshNodes
3. Restart the TaskTracker process on the slave: hadoop-daemon.sh start tasktracker

PS: If the TaskTracker process on the slave was never stopped, you need to stop it and start it again.


Originally published at http://blog.csdn.net/yanxiangtianji

Please credit the source when reprinting. Copyright notice: this is an original article by the blog author and may not be reproduced without consent.
