2018-02-24
On February 22 we found that the NameNode (standby) process on the NAMENODE02 server had died, and checked the Hadoop log /app/hadoop/logs/hadoop-appadm-namenode-prd-bldb-hdp-name02.log.
The first java.lang.OutOfMemoryError was reported at 2018-02-17 03:29:34; the specific error message is as follows:
2018-02-17 03:29:34,485 ERROR org.apache.hadoop.hdfs.server.namenode.EditLogInputStream: caught exception initializing http://datanode01:8480/getJournal?jid=cluster1&segmentTxId=2187844&storageInfo=-63%3A1002064722%3A1516782893469%3ACID-02428012-28ec-4c03-b5ba-bfec77c3a32b
java.lang.OutOfMemoryError: Unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
Then, at 2018-02-17 03:34:34, the standby NN shut itself down:
2018-02-17 03:34:34,495 FATAL org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unknown error encountered while tailing edits. Shutting down standby NN.
java.lang.OutOfMemoryError: Unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)
        at com.google.common.util.concurrent.MoreExecutors$ListeningDecorator.execute(MoreExecutors.java:…)
        at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:…)
        at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.getEditLogManifest(IPCLoggerChannel.java:553)
        at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.getEditLogManifest(AsyncLoggerSet.java:…)
        at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:474)
        at org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:278)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1590)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1614)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:216)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:342)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$…(EditLogTailer.java:295)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:312)
        at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:455)
        at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:308)
2018-02-17 03:34:34,… INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
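"Unable to create new native thread" usually means an OS-level limit (max user processes, kernel threads-max) was exhausted rather than the Java heap, so counting the threads of the JVM is a useful first diagnostic. A minimal sketch, using the current shell's own pid as a stand-in for the NameNode pid (which you would normally get from jps):

```shell
# Diagnostic sketch: count the native threads of a process via /proc.
# $$ (this shell) stands in for the NameNode JVM pid obtained from `jps`.
pid=$$
thread_count=$(ls /proc/"$pid"/task | wc -l)
echo "$thread_count"

# Per-user process/thread limit; the OOM fires when a new thread would exceed it.
ulimit -u
```

Comparing a growing thread count against `ulimit -u` over time distinguishes a thread leak from genuine heap pressure.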
The same day we checked the system memory usage and confirmed that memory was indeed low, so we manually freed the page cache and restarted the NameNode on the NAMENODE02 node.
free
sync
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/drop_caches
# executed on the NAMENODE02 node
su - appadm
hadoop-daemon.sh start namenode
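The effect of dropping the caches can be observed in /proc/meminfo. A read-only sketch (safe to run anywhere, no root needed) that reports the two counters involved, before and after the drop:

```shell
# Observation sketch: page cache and free memory in kB from /proc/meminfo.
cached_kb=$(awk '/^Cached:/{print $2}' /proc/meminfo)
free_kb=$(awk '/^MemFree:/{print $2}' /proc/meminfo)
echo "cached=${cached_kb}kB free=${free_kb}kB"
```

After `echo 3 > /proc/sys/vm/drop_caches`, Cached should fall and MemFree should rise by roughly the same amount.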
To verify that the NameNode (standby) crash really resulted from the NameNode host running out of memory, the developers increased the run frequency of the MapReduce job: to simulate long-running conditions as quickly as possible, an extraction job that previously ran once a day was changed to run once every 5 minutes.
After running the job for 2 days, we viewed the NameNode host's historical memory usage trend graph in the cloud platform CAS monitor:
The job frequency was increased on the 22nd between 17:00 and 18:00. Before 18:20 on the 22nd, memory utilization held at around 40%; from 18:20 to 19:10 it grew linearly to 70% and stayed at that level until 14:42 on the 23rd, after which it grew slowly, breaking the 80% threshold. Between 0:00 and 1:00 on the 24th it peaked at 90%.
According to a StackOverflow discussion of this problem, the asker had 12 GB of physical memory and was advised to set the -Xmx value to 3/4 of physical memory. In our production environment the NameNode hosts have 8 GB of physical memory and the DataNode hosts have 125 GB.
https://stackoverflow.com/questions/9703436/hadoop-heap-space-and-gc-problems
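As a sanity check on that rule of thumb, the arithmetic can be sketched in shell (the 8192 MB figure stands in for the NameNode's 8 GB of RAM; 3/4 of it would be 6 GB, so the 4 GB we actually chose below is on the conservative side):

```shell
# Sizing sketch: 3/4 of physical memory as a candidate -Xmx, per the advice above.
phys_mb=8192                      # NameNode physical memory in MB (8 GB)
heap_mb=$(( phys_mb * 3 / 4 ))    # suggested heap size in MB
echo "-Xmx${heap_mb}m"            # prints -Xmx6144m
```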
2018-02-24 18:43: implemented the change on the Hadoop production cluster.
The following command was executed on each of the 5 servers:
vim /app/hadoop/etc/hadoop/hadoop-env.sh
and the following parameter was added:
export HADOOP_OPTS="-XX:+UseParallelGC -Xmx4g"
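A quick way to sanity-check the line before restarting the daemons is to extract the heap flag back out of the string; a small sketch against the exact value we added:

```shell
# Verification sketch: pull the -Xmx flag out of the opts string added above.
HADOOP_OPTS="-XX:+UseParallelGC -Xmx4g"
heap_flag=$(echo "$HADOOP_OPTS" | grep -o -- '-Xmx[^ ]*')
echo "$heap_flag"   # -Xmx4g
```

After the restart, the same pattern can be grepped out of the running NameNode's command line (e.g. via `ps`) to confirm the JVM actually picked up the new value.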
To facilitate future operations, the cluster restart procedure is recorded here:
Procedure for restarting the hadoop/hive/hbase/zookeeper cluster

# 1. Stop hive: on namenode01, stop hiveserver2
lsof -i:9999 | grep -v "ID" | awk '{print "kill -9",$2}' | sh

# 2. Stop hbase: on namenode01
stop-hbase.sh

# 3. Stop hadoop: on namenode01
stop-all.sh

# 4. Stop zookeeper: on the 3 datanode nodes
zkServer.sh stop
zkServer.sh status

# Manually free Linux system memory
sync
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/drop_caches

# 5. Start zookeeper: on the 3 datanode nodes
zkServer.sh start
zkServer.sh status

# 6. Start hadoop: on namenode01
start-all.sh
# on namenode02, restart the namenode
hadoop-daemon.sh stop namenode
hadoop-daemon.sh start namenode
# HDFS namenode01:9000 (active)  web UI: http://172.31.132.71:50070/
# HDFS namenode02:9000 (standby) web UI: http://172.31.132.72:50070/
# YARN web UI: http://172.31.132.71:8088/

# 7. Start hbase: run start-hbase.sh on namenode01 and namenode02
start-hbase.sh
# Master web UI:        http://172.31.132.71:60010/
# Backup master web UI: http://172.31.132.72:60010/
# RegionServer web UIs: http://172.31.132.73:60030/
#                       http://172.31.132.74:60030/
#                       http://172.31.132.75:60030/

# 8. Start hive
# on namenode01, start hiveserver2
hive --service hiveserver2 &
# on datanode01, start metastore
hive --service metastore &
# on namenode01, start hwi (web interface)
hive --service hwi &
# HWI web UI: http://172.31.132.71:9999/hwi
Hadoop cluster NameNode (standby) abnormal-exit problem