HA mode forced manual switchover: IPC's epoch [X] is less than the last promised epoch [X+1]

Source: Internet
Author: User
Tags: zookeeper

WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 192.168.58.183:8485 failed to write txns 54560-54560. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch [X] is less than the last promised epoch [X+1]
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:442)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:342)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:148)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:158)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:25421)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:975)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2036)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2034)
    at org.apache.hadoop.ipc.Client.call(Client.java:1469)
    at org.apache.hadoop.ipc.Client.call(Client.java:1400)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
    at com.sun.proxy.$Proxy10.journal(Unknown Source)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:167)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:385)
    at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:378)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

The same warning, exception, and stack trace are then logged for the other two JournalNodes:

WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 192.168.58.181:8485 failed to write txns 54560-54560. Will try to write to this JN again after the next log roll.
WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 192.168.58.182:8485 failed to write txns 54560-54560. Will try to write to this JN again after the next log roll.

I. Causes of Errors

The active NameNode log shows the exception "IPC's epoch [X] is less than the last promised epoch [X+1]", and for a short time both NameNodes are active (a brief dual-active state).

I had configured automatic HA failover, but found that the standby NameNode had become active. I forced a manual switchover three times. My guess is that the problem was that the standby NameNode could not be reached.

II. Underlying cause

HDFS mechanism: this is the split-brain protection built into HDFS at work. It is normal behavior and does not affect the business.

1) ZKFC1 runs health checks against NameNode1 (active). Because it gets no reply from NN1 for a long time, it considers NameNode1 unhealthy and voluntarily releases the ActiveStandbyElectorLock in ZooKeeper. At this point NN1 is still active (ZKFC1 cannot connect to NameNode1, so it cannot shut it down).

  

ZKFC log:

WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at namenode01/172.21.248.14:9005: Call From namenode01/172.21.248.14 to namenode02:9005 failed on socket timeout exception: java.net.SocketTimeoutException: 45000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.248.14:47271 remote=namenode01/172.21.248.14:9005]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout

WARN org.apache.hadoop.ha.FailoverController: Unable to gracefully make NameNode at namenode02/172.21.248.13:9005 standby (unable to connect)
java.net.SocketTimeoutException: Call From namenode01/172.21.248.14 to namenode02:9005 failed on socket timeout exception: java.net.SocketTimeoutException: [...] millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.21.248.14:59156 remote=namenode02/172.21.248.13:9005]; For more details see: http://wiki.apache.org/hadoop/SocketTimeout

2) ZKFC2 wins the ActiveStandbyElectorLock in ZooKeeper, transitions NameNode2 (the original standby) to active, and increments the epoch on the JournalNodes by 1.

3) When NameNode1 (the former active) next writes to the edit log on the JournalNodes, it finds that its own epoch is 1 less than the epoch held by the JNs. The write is rejected, and NameNode1 restarts and becomes the standby NameNode.

NN1 log:

FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.1.1.107:8485, 192.10.1.208:8485, 192.10.1.209:8485], stream=QuorumOutputStream starting at txid 22795230))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
192.10.1.208:8485: IPC's epoch is less than the last promised epoch
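
The three steps above can be sketched as a toy model of epoch-based fencing. This is only an illustration under stated assumptions, not Hadoop's actual code: the names ToyJournalNode, StaleEpochError, and quorum_write are hypothetical, and the real JournalNode persists its promised epoch and handles many more cases.

```python
class StaleEpochError(Exception):
    """Raised when a request carries an epoch older than the promised one."""


class ToyJournalNode:
    """Hypothetical stand-in for a JournalNode; keeps state in memory only."""

    def __init__(self):
        self.last_promised_epoch = 0
        self.edits = []

    def new_epoch(self, epoch):
        # A NameNode becoming active must claim an epoch higher than any
        # epoch this node has already promised.
        if epoch <= self.last_promised_epoch:
            raise StaleEpochError(
                f"Proposed epoch {epoch} <= last promise {self.last_promised_epoch}"
            )
        self.last_promised_epoch = epoch

    def journal(self, epoch, txid):
        # Writes from a writer with an older epoch are rejected (fencing).
        if epoch < self.last_promised_epoch:
            raise StaleEpochError(
                f"IPC's epoch {epoch} is less than the last promised epoch "
                f"{self.last_promised_epoch}"
            )
        self.edits.append((epoch, txid))


def quorum_write(journal_nodes, epoch, txid):
    """Return True if a strict majority of nodes accepted the write."""
    acks = 0
    for jn in journal_nodes:
        try:
            jn.journal(epoch, txid)
            acks += 1
        except StaleEpochError:
            pass
    return acks > len(journal_nodes) // 2


jns = [ToyJournalNode() for _ in range(3)]

# NN1 is active with epoch 1 and can write.
for jn in jns:
    jn.new_epoch(1)
nn1_before = quorum_write(jns, epoch=1, txid=100)

# NN2 takes over and bumps the epoch to 2 on every JournalNode.
for jn in jns:
    jn.new_epoch(2)

# NN1 still believes it is active, but its writes no longer reach quorum.
nn1_after = quorum_write(jns, epoch=1, txid=101)
nn2_write = quorum_write(jns, epoch=2, txid=101)
```

In this sketch the old writer is never told in advance that it was replaced. It learns of it only when its next write is rejected, which matches the order of events in the logs above.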
III. Solution

You can increase the value of the ha.health-monitor.rpc-timeout.ms parameter in core-site.xml to lengthen the ZKFC health-check timeout (the ZKFC log above shows the check timing out at 45000 ms).

<property>
  <name>ha.health-monitor.rpc-timeout.ms</name>
  <value>180000</value>
</property>
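
To confirm that the change is really present in the file, the value can be read back with a few lines of Python. This is a minimal sketch: the helper get_hadoop_property and the inline CORE_SITE sample are hypothetical, the fallback of 45000 is taken from the timeout seen in the ZKFC log, and a real check would read the core-site.xml from your own configuration directory.

```python
import xml.etree.ElementTree as ET

# Sample content standing in for a real core-site.xml (assumption).
CORE_SITE = """
<configuration>
  <property>
    <name>ha.health-monitor.rpc-timeout.ms</name>
    <value>180000</value>
  </property>
</configuration>
"""


def get_hadoop_property(xml_text, name, default=None):
    """Return the value of a named <property> in Hadoop-style XML, or default."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else default
    return default


timeout_ms = int(
    get_hadoop_property(CORE_SITE, "ha.health-monitor.rpc-timeout.ms", "45000")
)
print(f"ZKFC health-check RPC timeout: {timeout_ms} ms")
```

This only checks the file contents. ZKFC reads its configuration at startup, so it has to be restarted before a new value takes effect.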

  

IV. Concluding remarks

In the end I set failover to manual in hdfs-site.xml. It is actually possible to find out which NameNode is active through ZooKeeper, but I have not done that yet.

However, with automatic failover disabled, ZKFC cannot be started, and HBase must use its own ZooKeeper.
