Hadoop: Dynamically Adding/Removing Nodes (Datanode and Tasktracker)


In general, the correct approach is to update the configuration files first, and then start or stop the corresponding process on the specific machine.

Some sources on the web also suggest that, when adjusting the configuration files, you should use host names rather than IP addresses.

The procedures for adding and removing a Datanode and a Tasktracker are very similar; only the configuration items and the commands involved differ slightly.


1. Datanode

1.0 Configuration file

On the master (NameNode), modify the configuration file conf/hdfs-site.xml. The key parameters are dfs.hosts and dfs.hosts.exclude.
Note: The layout of the configuration files differs between Hadoop versions. Refer to the Cluster Setup section of the official Hadoop documentation for your version: go to http://hadoop.apache.org/docs/ and pick the same or a similar version.

The statements above apply to Hadoop 1.x, which is the version used in the examples that follow. In Hadoop 0.x the same configuration lives in conf/hadoop-site.xml. In Hadoop 2.x things changed considerably: the file is conf/hdfs-site.xml, and the parameters are named dfs.namenode.hosts and dfs.namenode.hosts.exclude.
Parameter meanings: dfs.hosts points to a file listing the machines allowed to connect as Datanodes; if it is not configured, or the listed file is empty, all hosts are allowed to become Datanodes by default. dfs.hosts.exclude points to a file listing the machines denied as Datanodes; if a machine appears in both lists, it is rejected. The essential role of these parameters is to accept or refuse Datanode connections from particular nodes; they do not start or stop the Datanode processes on those nodes.
Usage example: modify conf/hdfs-site.xml and add:

	<property>
		<name>dfs.hosts</name>
		<value>/opt/hadoop/conf/datanode-allow.list</value>
	</property>
	<property>
		<name>dfs.hosts.exclude</name>
		<value>/opt/hadoop/conf/datanode-deny.list</value>
	</property>
If you do not need an allow list, simply omit the corresponding property. Then create the files referenced in the value elements, writing one host name per line.
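For illustration, here is a minimal sketch of preparing the two list files on the master. The host names slave1 and slave2 are hypothetical, and the /opt/hadoop/conf paths simply follow the values used above.

	# Hypothetical host names; use the names the Datanodes actually register with.
	printf 'slave1\nslave2\n' > /opt/hadoop/conf/datanode-allow.list

	# Keep the deny list empty for now; hosts are added to it only when decommissioning.
	touch /opt/hadoop/conf/datanode-deny.list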

1.1 Add

1. Configure Hadoop on the new slave.

2. Add the slave to the slaves file (conf/slaves) on the master (not required, but convenient for restarting the whole cluster later).

3. (If an allow list is used) add the slave to datanode-allow.list.

4. Start the Datanode process on the slave:

Run: hadoop-daemon.sh start datanode

PS: You can use the jps command to view the PIDs and names of the Java processes on the machine.
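Put together, steps 2 through 4 might look roughly like the following sketch. It assumes a hypothetical new host named slave3 and the list paths configured earlier; step 1, installing and configuring Hadoop on the new slave, is not shown.

	# On the master: record the new slave for future full-cluster restarts (optional)
	# and whitelist it if an allow list is in use.
	echo slave3 >> /opt/hadoop/conf/slaves
	echo slave3 >> /opt/hadoop/conf/datanode-allow.list
	hadoop dfsadmin -refreshNodes    # re-read the allow/deny lists if the NameNode is
	                                 # already running (not listed as a separate step above)

	# On the new slave: start the Datanode process and verify it is running.
	hadoop-daemon.sh start datanode
	jps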


1.2 Delete

It is strongly discouraged to stop the Datanode directly on the slave with the command hadoop-daemon.sh stop datanode; doing so causes missing blocks to appear in HDFS.

1. On the master, add the corresponding machine to datanode-deny.list.

2. Refresh the node configuration on the master: hadoop dfsadmin -refreshNodes

At this point the web UI immediately shows the node entering the decommissioning state, and later it becomes dead. The same information can be viewed with the command: hadoop dfsadmin -report

3. Stop the Datanode process on the slave (not required): run hadoop-daemon.sh stop datanode
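A sketch of the same decommissioning sequence, again using the hypothetical host slave3:

	# On the master: deny the node, then make the NameNode re-read the lists.
	echo slave3 >> /opt/hadoop/conf/datanode-deny.list
	hadoop dfsadmin -refreshNodes

	# Check progress from the command line (also visible on the NameNode web UI).
	hadoop dfsadmin -report

	# On the slave, once the node shows as decommissioned/dead (optional):
	hadoop-daemon.sh stop datanode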

1.2.1 Rejoin a deleted node

1. On the master, remove the corresponding machine from datanode-deny.list.

2. Refresh the node configuration on the master: hadoop dfsadmin -refreshNodes

3. Restart the Datanode process on the slave: hadoop-daemon.sh start datanode

PS: If the Datanode process on the slave was never stopped, you need to stop it and then restart it.
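As a sketch, rejoining the hypothetical host slave3 reverses the previous steps (the sed -i form assumes GNU sed and that the host name sits on its own line):

	# On the master: remove the host from the deny list, then refresh the configuration.
	sed -i '/^slave3$/d' /opt/hadoop/conf/datanode-deny.list
	hadoop dfsadmin -refreshNodes

	# On the slave: stop the Datanode first if it was never stopped, then start it.
	hadoop-daemon.sh stop datanode
	hadoop-daemon.sh start datanode
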
2. Tasktracker

2.0 Configuration file

On the master, modify the configuration file: for Hadoop 1.x it is conf/mapred-site.xml. The key parameters are mapred.hosts and mapred.hosts.exclude.
For Hadoop 0.x the file to modify is conf/hadoop-site.xml; the situation for Hadoop 2.x is unclear, so it is not covered here.
Parameter meanings: the same as their Datanode counterparts.
Usage example: modify conf/mapred-site.xml and add:

	<property>
		<name>mapred.hosts</name>
		<value>/opt/hadoop/conf/tasktracker-allow.list</value>
	</property>
	<property>
		<name>mapred.hosts.exclude</name>
		<value>/opt/hadoop/conf/tasktracker-deny.list</value>
	</property>
If you do not need an allow list, simply omit the corresponding property. Then create the files referenced in the value elements, writing one host name per line.

2.1 Add

1. Configure Hadoop on the new slave.

2. Add the slave to the slaves file (conf/slaves) on the master (not required, but convenient for restarting the whole cluster later).

3. (If an allow list is used) add the slave to tasktracker-allow.list.

4. Start the Tasktracker process on the slave:

Run: hadoop-daemon.sh start tasktracker

PS: You can use the jps command to view the PIDs and names of the Java processes on the machine.
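A sketch of steps 2 through 4, mirroring the Datanode case and again assuming a hypothetical host slave3:

	# On the master: optional slaves entry, plus the allow list if one is used.
	echo slave3 >> /opt/hadoop/conf/slaves
	echo slave3 >> /opt/hadoop/conf/tasktracker-allow.list
	hadoop mradmin -refreshNodes     # re-read the lists if the JobTracker is already
	                                 # running (not listed as a separate step above)

	# On the new slave: start the Tasktracker and verify it is running.
	hadoop-daemon.sh start tasktracker
	jps
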
2.2 Delete

It is not recommended to stop the Tasktracker directly on the slave with the command hadoop-daemon.sh stop tasktracker; the JobTracker then only considers these machines temporarily lost and, within a timeout period (default 10 min + 30 s), still assumes they are healthy and continues to assign tasks to them.

1. On the master, add the corresponding machine to tasktracker-deny.list.

2. Refresh the node configuration on the master: hadoop mradmin -refreshNodes

On the web UI you can now see that the number of nodes has decreased and the number of excluded nodes has increased; click through to see the details.

3. Stop the Tasktracker process on the slave (not required): run hadoop-daemon.sh stop tasktracker
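The corresponding sketch for removing the hypothetical host slave3:

	# On the master: deny the node, then make the JobTracker re-read the lists.
	echo slave3 >> /opt/hadoop/conf/tasktracker-deny.list
	hadoop mradmin -refreshNodes

	# On the slave (optional):
	hadoop-daemon.sh stop tasktracker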

2.2.1 Rejoin a deleted node

1. On the master, remove the corresponding machine from tasktracker-deny.list.

2. Refresh the node configuration on the master: hadoop mradmin -refreshNodes

3. Restart the Tasktracker process on the slave: hadoop-daemon.sh start tasktracker

PS: If the Tasktracker process on the slave was never stopped, you need to stop it and then restart it.
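And a sketch of rejoining the hypothetical host slave3 (again assuming GNU sed):

	# On the master: remove the host from the deny list and refresh.
	sed -i '/^slave3$/d' /opt/hadoop/conf/tasktracker-deny.list
	hadoop mradmin -refreshNodes

	# On the slave: stop the Tasktracker first if it was never stopped, then start it.
	hadoop-daemon.sh stop tasktracker
	hadoop-daemon.sh start tasktracker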


Originally published at http://blog.csdn.net/yanxiangtianji

Please indicate the source when reprinting.
