In general, the correct approach is to change the configuration files first and then start or stop the corresponding process on the specific machine.
Some articles on the web recommend using host names rather than IP addresses when adjusting the configuration files.
The procedures for adding and removing a DataNode and a TaskTracker are very similar; only the configuration items and the commands involved differ slightly.
1. DataNode
1.0 Configuration file
On the master (NameNode), modify the configuration file conf/hdfs-site.xml. The key parameters are dfs.hosts and dfs.hosts.exclude.
Note: the configuration file layout is not consistent across Hadoop versions. Refer to the Cluster Setup section of the official Hadoop documentation for the relevant version: at http://hadoop.apache.org/docs/ click on the same or a similar version.
The above applies to Hadoop 1.x, which the following examples use. In Hadoop 0.x the same configuration lives in the file conf/hadoop-site.xml. Hadoop 2.x changed a great deal: the file is conf/hdfs-site.xml and the parameters are named dfs.namenode.hosts and dfs.namenode.hosts.exclude.
Parameter meaning:
dfs.hosts: the list of machines allowed to connect as DataNodes. If it is not configured, or the specified list file is empty, all hosts are allowed to become DataNodes by default.
dfs.hosts.exclude: the list of machines denied as DataNodes. A machine that appears in both lists at the same time is denied.
Their essential role is to accept or reject DataNode connections from certain nodes; they do not start or stop the DataNode process on those nodes.
Usage example: modify conf/hdfs-site.xml and add:
<property>
<name>dfs.hosts</name>
<value>/opt/hadoop/conf/datanode-allow.list</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/opt/hadoop/conf/datanode-deny.list</value>
</property>
If you do not need an allow list, simply omit the corresponding property. Then create the files named in the value elements, writing one host name per line.
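As a concrete sketch, the two list files could be created like this; the conf directory and the slave host names are made-up examples, so substitute your own:

```shell
# Create the allow/deny list files referenced by dfs.hosts and
# dfs.hosts.exclude. CONF_DIR and the host names are examples only.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"      # e.g. /opt/hadoop/conf in the text
# one host name per line
printf '%s\n' slave1 slave2 slave3 > "$CONF_DIR/datanode-allow.list"
# start with an empty deny list
: > "$CONF_DIR/datanode-deny.list"
```

Editing these files alone has no effect; the NameNode only rereads them when the node configuration is refreshed, as described below.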
1.1 Add
1. Configure the new slave.
2. Add the slave to the slave list on the master (not required, but convenient when restarting the cluster later).
3. If an allow list is used, add the slave to datanode-allow.list.
4. Start the DataNode process on the slave:
hadoop-daemon.sh start datanode
PS: you can use the jps command to view the PIDs and names of the Java processes on the machine.
1.2 Delete
It is strongly discouraged to stop the DataNode directly on the slave with hadoop-daemon.sh stop datanode; doing so causes missing blocks in HDFS.
1. Modify datanode-deny.list on the master and add the corresponding machines.
2. Refresh the node configuration on the master:
hadoop dfsadmin -refreshNodes
On the web UI you can immediately see the node enter the Decommissioning state; it then becomes Dead. This can also be checked with the hadoop dfsadmin -report command.
3. Stop the DataNode process on the slave (not required):
hadoop-daemon.sh stop datanode
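Step 1 amounts to appending a host name to the deny list. A small sketch; the list path and the host name "slave2" are examples, and the refresh call is left as a comment because it needs a live NameNode:

```shell
# Append a host to the deny list only if it is not already there.
# DENY_LIST and the host name "slave2" are examples.
DENY_LIST="${DENY_LIST:-$(mktemp)}"
add_to_deny() {
    grep -qxF "$1" "$DENY_LIST" || printf '%s\n' "$1" >> "$DENY_LIST"
}
add_to_deny slave2
add_to_deny slave2    # idempotent: a second call adds nothing
# Step 2, run on the master against a running cluster:
#   hadoop dfsadmin -refreshNodes
```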
1.2.1 Rejoining a deleted node
1. Remove the corresponding machine from datanode-deny.list on the master.
2. Refresh the node configuration on the master:
hadoop dfsadmin -refreshNodes
3. Restart the DataNode process on the slave:
hadoop-daemon.sh start datanode
PS: if the DataNode process on the slave was never stopped, you need to stop it and restart it.
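The rejoin step 1 is the reverse edit of the removal. A sketch, with example paths and host names:

```shell
# Remove a host from the deny list; writing to a temp file and
# renaming keeps the edit atomic. DENY_LIST and hosts are examples.
DENY_LIST="${DENY_LIST:-$(mktemp)}"
printf '%s\n' slave1 slave2 > "$DENY_LIST"    # example starting content
remove_from_deny() {
    # grep exits 1 when nothing is left in the list; that is fine here
    grep -vxF "$1" "$DENY_LIST" > "$DENY_LIST.tmp" || true
    mv "$DENY_LIST.tmp" "$DENY_LIST"
}
remove_from_deny slave2
# Then refresh on the master: hadoop dfsadmin -refreshNodes
```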
2. TaskTracker
2.0 Configuration file
On the master (NameNode), for Hadoop 1.x modify the configuration file conf/mapred-site.xml. The key parameters are mapred.hosts and mapred.hosts.exclude.
For Hadoop 0.x the file to modify is conf/hadoop-site.xml; the situation in Hadoop 2.x is unclear, so it is not covered here.
Parameter meaning: the same as the DataNode counterparts.
Usage example: modify conf/mapred-site.xml and add:
<property>
<name>mapred.hosts</name>
<value>/opt/hadoop/conf/tasktracker-allow.list</value>
</property>
<property>
<name>mapred.hosts.exclude</name>
<value>/opt/hadoop/conf/tasktracker-deny.list</value>
</property>
If you do not need an allow list, simply omit the corresponding property. Then create the files named in the value elements, writing one host name per line.
2.1 Add
1. Configure the new slave.
2. Add the slave to the slave list on the master (not required, but convenient when restarting the cluster later).
3. If an allow list is used, add the slave to tasktracker-allow.list.
4. Start the TaskTracker process on the slave:
hadoop-daemon.sh start tasktracker
PS: you can use the jps command to view the PIDs and names of the Java processes on the machine.
2.2 Delete
It is not recommended to stop the TaskTracker directly on the slave with the hadoop-daemon.sh stop tasktracker command: this makes the JobTracker think those machines are only temporarily lost, and within a timeout period (10 min + 30 s by default) it still assumes they are alive and keeps assigning tasks to them.
1. Modify tasktracker-deny.list on the master and add the corresponding machines.
2. Refresh the node configuration on the master:
hadoop mradmin -refreshNodes
On the web UI you can now see that the number of nodes has decreased and the number of excluded nodes has increased; click through for details.
3. Stop the TaskTracker process on the slave (not required):
hadoop-daemon.sh stop tasktracker
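The same deny-list edit works here, with mradmin instead of dfsadmin. A sketch; the list path and host name are examples, and the refresh command is echoed rather than run because it needs a live JobTracker:

```shell
# Decommission a TaskTracker: add it to the deny list, then refresh.
# TT_DENY_LIST and the host name "slave3" are examples.
TT_DENY_LIST="${TT_DENY_LIST:-$(mktemp)}"
decommission_tt() {
    grep -qxF "$1" "$TT_DENY_LIST" || printf '%s\n' "$1" >> "$TT_DENY_LIST"
    echo "hadoop mradmin -refreshNodes"    # run this for real on the master
}
decommission_tt slave3
```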
2.2.1 Rejoining a deleted node
1. Remove the corresponding machine from tasktracker-deny.list on the master.
2. Refresh the node configuration on the master:
hadoop mradmin -refreshNodes
3. Restart the TaskTracker process on the slave:
hadoop-daemon.sh start tasktracker
PS: if the TaskTracker process on the slave was never stopped, you need to stop it and restart it.
Originally published at http://blog.csdn.net/yanxiangtianji
Please indicate the source when reprinting.