While testing NameNode HA on HDFS 2.0, we concurrently put MB-scale files and killed the active NN. After failover to the standby NN, the standby's process also exited.
2014-09-03 11:34:27,221 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.136.149.96:8485, 10.136.149.97:8485, 10.136.149.99:8485], stream=null))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 1 successful responses:
10.136.149.99:8485: null [success]
2 exceptions thrown:
10.136.149.97:8485: org/apache/hadoop/io/MD5Hash
We then restarted both NNs, and both failed to start.
We suspected a problem on the JournalNodes, possibly garbage (corrupted) edit segments. Re-initializing the shared edits with bin/hdfs namenode -initializeSharedEdits and then starting the NN also failed.
Next we synchronized the current metadata directories of the two NN instances and restarted all JournalNode instances, but startup still failed.
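A minimal local sketch of that synchronization step, using temporary directories to stand in for the two NNs' dfs.namenode.name.dir; all paths, file names, and txid values here are illustrative assumptions (a real deployment would copy between hosts with scp/rsync while both NNs are stopped):

```shell
set -e
active=$(mktemp -d)   # stands in for the active NN's dfs.namenode.name.dir
standby=$(mktemp -d)  # stands in for the standby NN's dfs.namenode.name.dir

# Fake metadata on the "active" side (names mimic the NN layout)
mkdir -p "$active/current"
echo "fsimage-data" > "$active/current/fsimage_0000000000000000100"
echo 100 > "$active/current/seen_txid"

# Replace the standby's copy wholesale with the active's
rm -rf "$standby/current"
cp -a "$active/current" "$standby/current"

# Verify the two directories are now identical
diff -r "$active/current" "$standby/current"
```

After this, both metadata directories hold the same fsimage and seen_txid, which is the state the restart attempt above was made from.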
We then suspected a problem in the NN's metadata merge. We deleted the edits files starting from the failing transaction and rolled the txid recorded in seen_txid back to match.
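The pruning step can be sketched as follows; the segment file names, the last-good txid, and the temporary directory are illustrative assumptions (in reality this is done in dfs.namenode.name.dir/current with the NN stopped, after backing the directory up):

```shell
set -e
dir=$(mktemp -d)/current   # stands in for dfs.namenode.name.dir/current
mkdir -p "$dir"

# Fake finalized edits segments plus one unfinalized segment past them
touch "$dir/edits_0000000000000000001-0000000000000000100"
touch "$dir/edits_0000000000000000101-0000000000000000200"
touch "$dir/edits_inprogress_0000000000000000201"
echo 250 > "$dir/seen_txid"

last_good=200
# Drop the segment(s) at/after the failing transaction
rm -f "$dir"/edits_inprogress_*
# Roll seen_txid back to the last transaction actually on disk
echo "$last_good" > "$dir/seen_txid"

cat "$dir/seen_txid"   # prints the rolled-back txid
```

Everything from the deleted segments onward is lost, which is why the recovery below explicitly abandons the most recent edits.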
After that the NN started successfully, and both the active and standby NNs came up.
The root cause is still being investigated, but at least the environment has been recovered, at the cost of abandoning the most recent edits.
Summary: recovery from an HDFS 2.0 NameNode HA failover failure caused by bad metadata writes.