Problem Description:
After a node change in the cluster, I started the cluster and found that the DataNodes had not started.
My cluster configuration: one master node (master) and five slave nodes (slave1 through slave5).
On master, as the hadoop user, run: start-all.sh
Then run jps to check which daemons are up on the master node:
NameNode
JobTracker
SecondaryNameNode
All three started normally, but on the web UI at master:50070, Live Nodes was 0. So I checked slave1:
ssh to slave1 and run jps; only TaskTracker shows up, with no DataNode. Then read the logs.
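For reference, a minimal diagnosis sequence (a sketch assuming Hadoop lives under /usr/hadoop and runs as the hadoop user, matching the paths used later in this post; the log file name follows Hadoop's hadoop-<user>-datanode-<hostname>.log pattern):

ssh slave1
jps    # TaskTracker appears, DataNode is missing
tail -n 50 /usr/hadoop/logs/hadoop-hadoop-datanode-slave1.log    # inspect the DataNode log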
I searched online for solutions and finally solved it. The solution is as follows:
1. Run stop-all.sh to stop all services first.
2. On all slave nodes, delete the tmp directory (that is, the dfs.data.dir folder specified in hdfs-site.xml, where the DataNode stores its data blocks) and the logs folder, then recreate the tmp and logs folders (see the consolidated sketch after this list).
3. Delete core-site.xml under /usr/hadoop/conf on all slave nodes and copy the master node's core-site.xml to each slave node:
scp /usr/hadoop/conf/core-site.xml hadoop@slave1:/usr/hadoop/conf/
4. Reformat the NameNode: hadoop namenode -format
5. Start the cluster: start-all.sh
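Putting the five steps together, here is a sketch of the whole procedure run from master. It assumes dfs.data.dir points at /usr/hadoop/tmp and that the slaves are named slave1 through slave5; adjust the paths to match your hdfs-site.xml.

stop-all.sh                  # step 1: stop all services
# Steps 2 and 3: on every slave, wipe and recreate tmp and logs, then push the master's core-site.xml
for host in slave1 slave2 slave3 slave4 slave5; do
    ssh hadoop@$host 'rm -rf /usr/hadoop/tmp /usr/hadoop/logs && mkdir /usr/hadoop/tmp /usr/hadoop/logs'
    scp /usr/hadoop/conf/core-site.xml hadoop@$host:/usr/hadoop/conf/
done
hadoop namenode -format      # step 4: reformat (this destroys any existing HDFS data)
start-all.sh                 # step 5: bring the cluster back up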
In addition, you may also encounter these DataNode errors on the slaves:
Error 1: the DataNode log shows INFO org.apache.hadoop.ipc.RPC: Server at master/172.16.0.100:9000 not available yet, Zzzzz...
For a workaround, see: http://blog.sina.com.cn/s/blog_893ee27f0100zoh7.html
Error 2: the slave node's DataNode cannot connect to the master. The log shows: INFO org.apache.hadoop.ipc.Client: Retrying connect to server: master/172.16.0.100:9000. Already tried 0 time(s);
Workaround:
1. ping master succeeds but telnet master 9000 fails, which indicates that the firewall is on.
2. Shut down the firewall on the master host. You can temporarily stop it by clearing all rules with /sbin/iptables -F.
To clear the rules safely, first execute /sbin/iptables -P INPUT ACCEPT, then execute /sbin/iptables -F, as shown below.
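For example (a sketch: run the connectivity checks from a slave node and the iptables commands on master; the iptables binary location can vary by distribution):

ping -c 3 master                  # the network is reachable
telnet master 9000                # hangs or is refused while the firewall is up
/sbin/iptables -P INPUT ACCEPT    # on master: set the default policy to accept first
/sbin/iptables -F                 # then flush all rules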
Note: this is the situation I ran into, not necessarily the problem you have. In general, troubleshoot from the following angles:
1. Check that each XML configuration file is correct.
2. Check that the Java environment variables are configured correctly.
3. Check that passwordless SSH works between the nodes (see the sketch after this list).
4. Take Hadoop out of safe mode: hadoop dfsadmin -safemode leave
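For check 3, a minimal sketch of setting up passwordless SSH from master to a slave, run as the hadoop user (repeat for each slave; ssh-copy-id is assumed to be available, otherwise append the public key to the slave's ~/.ssh/authorized_keys by hand):

ssh-keygen -t rsa            # accept the defaults and use an empty passphrase
ssh-copy-id hadoop@slave1    # install the public key on slave1
ssh slave1                   # should now log in without a password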
You can also refer to this: http://blog.sina.com.cn/s/blog_76fbd24d01017qmc.html