In general, the correct approach is to change the configuration files first and then start or stop the corresponding process on the specific machine.
Some articles on the web recommend using host names rather than IP addresses when adjusting the configuration files.
The procedures for adding and removing a DataNode and a TaskTracker are very similar; only the configuration items and the commands involved differ slightly.
1. DataNode
1.0 Configuration file
On the master (NameNode), modify the configuration file conf/hdfs-site.xml. The key parameters are dfs.hosts and dfs.hosts.exclude.
Note: the configuration file layout is not consistent across Hadoop versions. Refer to the Cluster Setup section of the official Hadoop documentation for the relevant version: at http://hadoop.apache.org/docs/ click on the same or a similar version.
The above applies to Hadoop 1.x, which the following examples use. In Hadoop 0.x the same configuration lives in the file conf/hadoop-site.xml. Hadoop 2.x changed a great deal: the file is conf/hdfs-site.xml and the parameters are named dfs.namenode.hosts and dfs.namenode.hosts.exclude.
Parameter meaning:
dfs.hosts: the list of machines allowed to connect as DataNodes. If it is not configured, or the specified list file is empty, all hosts are allowed to become DataNodes by default.
dfs.hosts.exclude: the list of machines denied as DataNodes. A machine that appears in both lists at the same time is denied.
Their essential role is to accept or reject DataNode connections from certain nodes; they do not start or stop the DataNode process on those nodes.
Usage example: modify conf/hdfs-site.xml and add:
<property>
<name>dfs.hosts</name>
<value>/opt/hadoop/conf/datanode-allow.list</value>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>/opt/hadoop/conf/datanode-deny.list</value>
</property>
If you do not need an allow list, simply omit the corresponding property. Then create the files named in the value elements, writing one host name per line.
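As a concrete sketch, the two list files could be created like this; the conf directory and the slave host names are made-up examples, so substitute your own:

```shell
# Create the allow/deny list files referenced by dfs.hosts and
# dfs.hosts.exclude. CONF_DIR and the host names are examples only.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"      # e.g. /opt/hadoop/conf in the text
# one host name per line
printf '%s\n' slave1 slave2 slave3 > "$CONF_DIR/datanode-allow.list"
# start with an empty deny list
: > "$CONF_DIR/datanode-deny.list"
```

Editing these files alone has no effect; the NameNode only rereads them when the node configuration is refreshed, as described below.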
1.1 Add
1. Configure the new slave.
2. Add the slave to the slave list on the master (not required, but convenient when restarting the cluster later).
3. If an allow list is used, add the slave to datanode-allow.list.
4. Start the DataNode process on the slave:
hadoop-daemon.sh start datanode
PS: you can use the jps command to view the PIDs and names of the Java processes on the machine.
1.2 Delete
It is strongly discouraged to stop the DataNode directly on the slave with hadoop-daemon.sh stop datanode; doing so causes missing blocks in HDFS.
1. Modify datanode-deny.list on the master and add the corresponding machines.
2. Refresh the node configuration on the master:
hadoop dfsadmin -refreshNodes
On the web UI you can immediately see the node enter the Decommissioning state; it then becomes Dead. This can also be checked with the hadoop dfsadmin -report command.
3. Stop the DataNode process on the slave (not required):
hadoop-daemon.sh stop datanode
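Step 1 amounts to appending a host name to the deny list. A small sketch; the list path and the host name "slave2" are examples, and the refresh call is left as a comment because it needs a live NameNode:

```shell
# Append a host to the deny list only if it is not already there.
# DENY_LIST and the host name "slave2" are examples.
DENY_LIST="${DENY_LIST:-$(mktemp)}"
add_to_deny() {
    grep -qxF "$1" "$DENY_LIST" || printf '%s\n' "$1" >> "$DENY_LIST"
}
add_to_deny slave2
add_to_deny slave2    # idempotent: a second call adds nothing
# Step 2, run on the master against a running cluster:
#   hadoop dfsadmin -refreshNodes
```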
1.2.1 Rejoining a deleted node
1. Remove the corresponding machine from datanode-deny.list on the master.
2. Refresh the node configuration on the master:
hadoop dfsadmin -refreshNodes
3. Restart the DataNode process on the slave:
hadoop-daemon.sh start datanode
PS: if the DataNode process on the slave was never stopped, you need to stop it and restart it.
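The rejoin step 1 is the reverse edit of the removal. A sketch, with example paths and host names:

```shell
# Remove a host from the deny list; writing to a temp file and
# renaming keeps the edit atomic. DENY_LIST and hosts are examples.
DENY_LIST="${DENY_LIST:-$(mktemp)}"
printf '%s\n' slave1 slave2 > "$DENY_LIST"    # example starting content
remove_from_deny() {
    # grep exits 1 when nothing is left in the list; that is fine here
    grep -vxF "$1" "$DENY_LIST" > "$DENY_LIST.tmp" || true
    mv "$DENY_LIST.tmp" "$DENY_LIST"
}
remove_from_deny slave2
# Then refresh on the master: hadoop dfsadmin -refreshNodes
```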
2. TaskTracker
2.0 Configuration file
On the master (NameNode), for Hadoop 1.x modify the configuration file conf/mapred-site.xml. The key parameters are mapred.hosts and mapred.hosts.exclude.
For Hadoop 0.x the file to modify is conf/hadoop-site.xml; the situation in Hadoop 2.x is unclear, so it is not covered here.
Parameter meaning: the same as the DataNode counterparts.
Usage example: modify conf/mapred-site.xml and add:
<property>
<name>mapred.hosts</name>
<value>/opt/hadoop/conf/tasktracker-allow.list</value>
</property>
<property>
<name>mapred.hosts.exclude</name>
<value>/opt/hadoop/conf/tasktracker-deny.list</value>
</property>
If you do not need an allow list, simply omit the corresponding property. Then create the files named in the value elements, writing one host name per line.
2.1 Add
1. Configure the new slave.
2. Add the slave to the slave list on the master (not required, but convenient when restarting the cluster later).
3. If an allow list is used, add the slave to tasktracker-allow.list.
4. Start the TaskTracker process on the slave:
hadoop-daemon.sh start tasktracker
PS: you can use the jps command to view the PIDs and names of the Java processes on the machine.
2.2 Delete
It is not recommended to stop the TaskTracker directly on the slave with the hadoop-daemon.sh stop tasktracker command: this makes the JobTracker think those machines are only temporarily lost, and within a timeout period (10 min + 30 s by default) it still assumes they are alive and keeps assigning tasks to them.
1. Modify tasktracker-deny.list on the master and add the corresponding machines.
2. Refresh the node configuration on the master:
hadoop mradmin -refreshNodes
On the web UI you can now see that the number of nodes has decreased and the number of excluded nodes has increased; click through for details.
3. Stop the TaskTracker process on the slave (not required):
hadoop-daemon.sh stop tasktracker
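The same deny-list edit works here, with mradmin instead of dfsadmin. A sketch; the list path and host name are examples, and the refresh command is echoed rather than run because it needs a live JobTracker:

```shell
# Decommission a TaskTracker: add it to the deny list, then refresh.
# TT_DENY_LIST and the host name "slave3" are examples.
TT_DENY_LIST="${TT_DENY_LIST:-$(mktemp)}"
decommission_tt() {
    grep -qxF "$1" "$TT_DENY_LIST" || printf '%s\n' "$1" >> "$TT_DENY_LIST"
    echo "hadoop mradmin -refreshNodes"    # run this for real on the master
}
decommission_tt slave3
```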
2.2.1 Rejoining a deleted node
1. Remove the corresponding machine from tasktracker-deny.list on the master.
2. Refresh the node configuration on the master:
hadoop mradmin -refreshNodes
3. Restart the TaskTracker process on the slave:
hadoop-daemon.sh start tasktracker
PS: if the TaskTracker process on the slave was never stopped, you need to stop it and restart it.
Originally published at http://blog.csdn.net/yanxiangtianji
Please indicate the source when reprinting.