Hadoop Series, First Pit: HDFS JournalNode Sync Status

Source: Internet
Author: User

First thing this morning, we found that Cloudera Manager was showing an HDFS warning like the following:

The approach, in brief:

1. Tackle the easy part first: check the thresholds configured for the warning, so the problem can be located quickly. Sure enough, the "JournalNode Sync Status" health check was the first thing to eliminate.
2. Then address the sync status problem itself: find the explanation for the prompt (available on Cloudera's site), check whether the configuration parameters are correct, and look at the logs. Sure enough, the error shows up there.
3. In the end, the prompt traces back to inconsistent synchronization files across the JournalNode nodes, which can be resolved either by repair (elegant) or by copying (not elegant).
4. For any given problem there are usually several solutions, some "elegant" and some "not elegant"; unfortunately I used the "not elegant" one. If any reader knows how to initialize the dfs.journalnode.edits.dir directory on a JournalNode (that is, synchronize the namespace metadata from a NameNode to the JournalNode), please let me know, thanks!

Here's the whole process of solving the problem:
My first reaction was that JournalNode was having trouble synchronizing the namespace image and edit log, and indeed the JournalNode sync on both the 108 and 109 nodes was failing. Since I configured JournalNodes on five nodes here, two failing JournalNodes should in theory produce a warning, not a critical.

For prompts like "Sync Status" or "JournalNode Sync Status", explanations can be found on Cloudera's site. For the NameNode health check, the relevant description is at:

English (recommended): http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_ht_namenode.html
Chinese (slightly incomplete): http://www.cloudera.com/content/cloudera/zh-CN/documentation/core/v5-3-x/topics/cm_ht_namenode.html

There we can find the description of the health check whose short name is "JournalNode Sync Status". Next, locate the relevant parameter in CM: HDFS -> Configuration -> View and Edit -> Monitoring, and type "namenode_out_of_sync_journal_nodes_thresholds" into the search box. The current setting never raises a warning: as soon as any JournalNode sync problem appears, it goes straight to critical. After adjusting it, the NameNode-side JournalNode sync alert is gone, but the JournalNode's own "Sync Status" warning remains; what we just changed was only the NameNode health-check alert.

Here's how to handle the JournalNode "Sync Status" warning. Again there is a description, at http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_ht_journalnode.html, but reading it did not resolve the service's own problem: the sync timeout was already set to 180 seconds, and network I/O showed no high load. The only thing left was to look at the service's log:
IPC Server handler 3 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.getEditLogManifest from 172.31.13.29:56789: error: org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory /hadop-cdh-data/jddfs/nn/journalhdfs1 not formatted
org.apache.hadoop.hdfs.qjournal.protocol.JournalNotFormattedException: Journal Storage Directory /hadop-cdh-data/jddfs/nn/journalhdfs1 not formatted
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkFormatted(Journal.java:451)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:634)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:178)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:196)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14094)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1760)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1756)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1754)
IPC Server handler 4 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.heartbeat from 172.31.9.109:53202: error: java.io.FileNotFoundException: /hadop-cdh-data/jddfs/nn/journalhdfs1/current/last-promised-epoch.tmp (No such file or directory)
java.io.FileNotFoundException: /hadop-cdh-data/jddfs/nn/journalhdfs1/current/last-promised-epoch.tmp (No such file or directory)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
    at org.apache.hadoop.hdfs.util.AtomicFileOutputStream.<init>(AtomicFileOutputStream.java)
    at org.apache.hadoop.hdfs.util.PersistentLongFile.writeFile(PersistentLongFile.java)
    at org.apache.hadoop.hdfs.util.PersistentLongFile.set(PersistentLongFile.java)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.updateLastPromisedEpoch(Journal.java:311)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:414)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.heartbeat(Journal.java:397)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.heartbeat(JournalNodeRpcServer.java:148)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.heartbeat(QJournalProtocolServerSideTranslatorPB.java:146)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14086)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1760)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1756)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1754)
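Log excerpts like the two above come straight out of the JournalNode role log. A minimal sketch of pulling them out with grep; the sample lines below stand in for a real log file, and on a live node you would point grep at the actual JournalNode log instead (the path varies by deployment, /var/log/hadoop-hdfs/ is a common location):

```shell
# Simulated scan of a JournalNode log for the two error types shown above.
# On a real node, replace "$LOG" with the JournalNode log file path.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
INFO  org.apache.hadoop.hdfs.qjournal.server.JournalNode: starting
ERROR JournalNotFormattedException: Journal Storage Directory /hadop-cdh-data/jddfs/nn/journalhdfs1 not formatted
ERROR java.io.FileNotFoundException: last-promised-epoch.tmp (No such file or directory)
EOF
# Count lines matching either exception; both sample errors match here.
MATCHES=$(grep -cE 'JournalNotFormattedException|FileNotFoundException' "$LOG")
echo "$MATCHES"
rm -f "$LOG"
```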
At this point we can see that the directory holding the synchronization files, /hadop-cdh-data/jddfs/nn/journalhdfs1, was not found; an SSH connection to the node confirmed the directory really does not exist. From here the problem is basically pinned down, and there are two ways to solve it: one is to initialize the directory with the relevant command (which I believe is the correct way to solve the problem), and the other is to copy the files over directly from a healthy JournalNode. I used method two. One thing to note is that the copy must not take too long; it should not exceed the value set by journalnode_sync_status_startup_tolerance (my personal understanding). My first attempt, uploading a zip package to the other nodes, took too long and timed out; the second attempt, a direct scp copy, worked. The commands are as follows:

rm -rf journalhdfs1
scp -r -i /root/xxxxxxx.pem [email protected]:/hadop-cdh-data/jddfs/nn/journalhdfs1 ./
chown -R hdfs:hdfs journalhdfs1

Note: when the namespace metadata is large, pay attention to the dfs.image.transfer.bandwidthPerSec parameter, which limits the bandwidth used when synchronizing data. Then restart the two JournalNode roles (you can also stop them first and start them once the files are copied), and the problem is resolved.

In addition, I have set up a QQ group: 305994766. I hope friends interested in big data, algorithm R&D, and system architecture will join so we can learn together and make progress together (when joining, please state your company, occupation, and nickname).
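As a footnote on the "elegant" method the author was looking for: in an HA setup with quorum journaling, the standard way to (re)populate the JournalNodes' dfs.journalnode.edits.dir from a NameNode's local metadata is the shared-edits initialization command. A sketch, not verified on this cluster; check your Hadoop version's HA documentation before running it, since it is normally done with the NameNode stopped:

```shell
# Sketch of the "elegant" repair: initialize the shared edits directory on
# the JournalNodes from the local NameNode metadata directories. Run on a
# NameNode host as the hdfs user; assumes an HA/QJM configuration.
sudo -u hdfs hdfs namenode -initializeSharedEdits
```

After this, the JournalNode can be restarted and should report a formatted journal instead of the JournalNotFormattedException seen above.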
