Problem: When the Hadoop cluster is started, one of the NameNodes (NN) never comes up. Its log shows the following errors:
2016-05-04 15:12:27,837 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Get corrupt file blocks returned error: Operation category READ is not supported in state standby
2016-05-04 15:12:36,124 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8020, call org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog from 10.30.12.88:49509 Call#185 Retry#0: org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is not supported in state standby
2016-05-04 15:12:52,584 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log roll on remote NameNode lida2/10.30.12.88:8020
2016-05-04 15:12:52,598 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category JOURNAL is not supported in state standby
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1688)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1258)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:5765)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:886)
	at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139)
	at org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:11214)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
	at org.apache.hadoop.ipc.Client.call(Client.java:1411)
	at org.apache.hadoop.ipc.Client.call(Client.java:1364)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
	at com.sun.proxy.$Proxy14.rollEditLog(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:139)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:271)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:313)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$(EditLogTailer.java:282)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:411)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
2016-05-04 15:12:53,632 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:53,636 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:54,635 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:54,638 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:55,636 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:55,639 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:56,638 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:56,640 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:57,641 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:57,642 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:58,632 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far: [10.30.12.88:8485]
2016-05-04 15:12:58,644 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:58,646 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:59,633 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7003 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far: [10.30.12.88:8485]
2016-05-04 15:12:59,647 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:12:59,648 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:00,635 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 8005 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far: [10.30.12.88:8485]
2016-05-04 15:13:00,652 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:00,653 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:01,637 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 9007 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far: [10.30.12.88:8485]
2016-05-04 15:13:01,655 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:01,656 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:02,638 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 10008 ms (timeout=20000 ms) for a response for selectInputStreams. Succeeded so far: [10.30.12.88:8485]
2016-05-04 15:13:02,659 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida4/10.30.12.90:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:02,659 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: lida3/10.30.12.89:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2016-05-04 15:13:02,663 WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input streams from QJM to [10.30.12.88:8485, 10.30.12.89:8485, 10.30.12.90:8485]. Skipping.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 1 successful responses:
10.30.12.88:8485: [[7162,7163], [7164,7165], [7166,7167], [7168,7169], [7170,7171], [7172,7173], [7174,7175], [7176,7177], [7178,7179], [7180,7181], [7182,7183], [7184,7185], [7186,7187], [7188,7189], [7190,7191]]
2 exceptions thrown:
10.30.12.90:8485: No Route to Host from lida1/10.30.12.87 to lida4:8485 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
10.30.12.89:8485: No Route to Host from lida1/10.30.12 to lida3:8485 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see: http://wiki.apache.org/hadoop/NoRouteToHost
	at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:223)
	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:471)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:260)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1430)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1450)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:212)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:324)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:282)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:299)
	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:411)
	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:295)
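The two failure signatures in this log (a log-roll request answered with a StandbyException, and a QJM read that could not reach a JournalNode majority) can be picked out programmatically when triaging many NameNode logs. The following is a minimal Python sketch, not a Hadoop tool: the function name `scan_namenode_log` and its result keys are hypothetical, and the regular expressions assume Hadoop's standard (un-garbled) message wording.

```python
import re

# Patterns for Hadoop's standard message wording (assumed; adjust if your version differs).
ROLL_FAILED = re.compile(r"Unable to trigger a roll of the active NN")
STANDBY_JOURNAL = re.compile(
    r"StandbyException\): Operation category JOURNAL is not supported in state standby"
)
QUORUM_FAILED = re.compile(
    r"QuorumException: Got too many exceptions to achieve quorum size (\d+)/(\d+)"
)
# "<ip>:<port>: No Route to Host from ..." lines name the unreachable JournalNode.
NO_ROUTE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}:\d+): No Route to Host from")


def scan_namenode_log(text: str) -> dict:
    """Summarize the HA-related errors found in NameNode log text (hypothetical helper)."""
    quorum = QUORUM_FAILED.search(text)
    return {
        # Heuristic: a failed log roll answered with a JOURNAL StandbyException
        # means the remote NameNode is in standby as well.
        "peer_is_standby": bool(ROLL_FAILED.search(text) and STANDBY_JOURNAL.search(text)),
        # (needed, total) JournalNodes if a quorum read failed, else None.
        "quorum": (int(quorum.group(1)), int(quorum.group(2))) if quorum else None,
        # Distinct JournalNode addresses reported as unreachable, sorted.
        "unreachable_journalnodes": sorted(set(NO_ROUTE.findall(text))),
    }


if __name__ == "__main__":
    sample = (
        "2016-05-04 15:12:52,598 WARN org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: "
        "Unable to trigger a roll of the active NN\n"
        "org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): "
        "Operation category JOURNAL is not supported in state standby\n"
        "org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions "
        "to achieve quorum size 2/3. 1 successful responses:\n"
        "10.30.12.90:8485: No Route to Host from lida1/10.30.12.87 to lida4:8485 "
        "failed on socket timeout exception\n"
    )
    print(scan_namenode_log(sample))
    # {'peer_is_standby': True, 'quorum': (2, 3), 'unreachable_journalnodes': ['10.30.12.90:8485']}
```

Matching on message text rather than class names keeps the sketch independent of log4j layout, at the cost of breaking if a Hadoop release rewords these messages.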
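Cause: The first problem (the StandbyException) is what triggers the second (the failed quorum read from the JournalNodes). In other words, both NameNodes are in the standby state: this NameNode is standby, and when it asks the remote NameNode lida2 to roll its edit log, lida2 also answers with a StandbyException, so the cluster has no active NameNode.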
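The two conditions behind this diagnosis can be checked with simple arithmetic. The QuorumJournalManager needs a majority of JournalNodes, floor(N/2) + 1, which is the "quorum size 2/3" in the log, and an HA pair needs exactly one active NameNode. A small Python sketch with hypothetical helper names, using the numbers from this incident; the service ID `nn1` is assumed, since only `nn2` is named in this post.

```python
from typing import Dict, List


def majority(total: int) -> int:
    """Smallest number of JournalNodes that forms a majority of `total`."""
    return total // 2 + 1


def quorum_ok(successful: int, total: int) -> bool:
    """True if enough JournalNodes responded for a quorum operation to proceed."""
    return successful >= majority(total)


def active_namenodes(states: Dict[str, str]) -> List[str]:
    """Service IDs whose reported HA state is 'active', sorted."""
    return sorted(nn for nn, state in states.items() if state == "active")


if __name__ == "__main__":
    # 3 JournalNodes, but only 10.30.12.88:8485 answered.
    print(majority(3), quorum_ok(1, 3))  # 2 False
    # Both NameNodes report standby, so no NameNode is active.
    print(active_namenodes({"nn1": "standby", "nn2": "standby"}))  # []
```

With three JournalNodes the cluster tolerates one unreachable node; here two were unreachable, so the quorum read fails regardless of the NameNode states.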
Solution:
1. Restart the servers, then restart the Hadoop cluster so that one of the NameNodes comes up in the active state.
2. Alternatively, manually switch one NameNode to the active state with the command hdfs haadmin -transitionToActive nn2. This changes the state of the NameNode with service ID nn2 to active.
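Before forcing a transition it is worth confirming that neither NameNode is already active, so that a second active NameNode is never requested. A minimal Python sketch, assuming the `hdfs` CLI is on the PATH and the service IDs are `nn1` and `nn2` (`nn1` is an assumption; only `nn2` appears above). It relies on `hdfs haadmin -getServiceState <serviceId>`, which prints `active` or `standby`, and calls `-transitionToActive` only when no NameNode is active. The `run` parameter is a hypothetical hook so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical NameNode service IDs: nn2 comes from the command above, nn1 is
# assumed. Use the values listed in dfs.ha.namenodes.<nameservice>.
SERVICE_IDS = ("nn1", "nn2")


def hdfs_haadmin(args: List[str]) -> str:
    """Run `hdfs haadmin <args>` and return its stripped standard output.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    result = subprocess.run(
        ["hdfs", "haadmin", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def promote_if_none_active(
    target: str,
    service_ids: Sequence[str] = SERVICE_IDS,
    run: Callable[[List[str]], str] = hdfs_haadmin,
) -> str:
    """Transition `target` to active unless some NameNode is already active.

    Returns the service ID of the NameNode that should be active afterwards.
    """
    # -getServiceState prints "active" or "standby" for the given service ID.
    states = {nn: run(["-getServiceState", nn]) for nn in service_ids}
    already_active = [nn for nn, state in states.items() if state == "active"]
    if already_active:
        # Leave the cluster alone rather than request a second active NameNode.
        return already_active[0]
    run(["-transitionToActive", target])
    return target


if __name__ == "__main__":
    calls = []

    def fake_run(args: List[str]) -> str:
        # Simulates this incident: every NameNode reports "standby".
        calls.append(args)
        return "standby"

    print(promote_if_none_active("nn2", run=fake_run))  # nn2
    print(calls[-1])  # ['-transitionToActive', 'nn2']
```

If automatic failover (ZKFC) is enabled, `hdfs haadmin` refuses manual transitions unless `--forcemanual` is passed; in that setup the ZKFC is normally what elects the active NameNode.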