Pitfalls encountered during Hadoop development


Core content:
1. Common problems encountered during Hadoop development and their solutions

Hadoop development constantly throws up problems of one kind or another; today I will summarize the most common ones:

All-in-one approach: 6 checks + the relevant logs

When you hit an exception during Hadoop development, first run the jps command to confirm that each node's daemons started properly, then go look at the relevant log files. Before digging into the logs, however, run through the following checks:
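For reference, a sketch of what jps might print on a healthy master node; the PIDs here are illustrative, and the exact daemon list depends on your Hadoop version and the node's role:

[root@hadoop11 ~]# jps
2564 NameNode
2698 SecondaryNameNode
2916 Jps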
1. Firewall: check that the firewall on every node has been shut down successfully (pay particular attention to the NameNode).

[root@hadoop11 ~]# service iptables status
iptables is not running.
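If it is still running, shutting it down on CentOS/RHEL 6 looks like the sketch below (other distributions use different commands); the second command keeps it off across reboots.

[root@hadoop11 ~]# service iptables stop
[root@hadoop11 ~]# chkconfig iptables off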

2. Check that the mapping between IP addresses and hostnames is correctly bound on every node.

[root@hadoop11 ~]# more /etc/hosts
127.0.0.1      localhost localhost.localdomain localhost4 localhost4.localdomain4
10.187.84.50   hadoop11
10.187.84.51   hadoop22
10.187.84.52   hadoop33
10.187.84.53   hadoop44
10.187.84.54   hadoop55
10.187.84.55   hadoop66

3. Check whether the NameNode is in safe mode.

[root@hadoop11 ~]# hadoop dfsadmin -safemode get
Safe mode is OFF

4. Check whether the NameNode has been formatted.

<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
</property>
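One quick way to check, sketched on the assumption that hadoop.tmp.dir is /usr/local/hadoop/tmp as above: a formatted NameNode will have a metadata directory containing a VERSION file.

[root@hadoop11 ~]# ls /usr/local/hadoop/tmp/dfs/name/current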

5. Check that the configuration files are set up correctly.
6. Check that the namespaceID stored on the NameNode matches the namespaceID stored on each DataNode.

[root@hadoop11 current]# pwd
/usr/local/hadoop/tmp/dfs/name/current
[root@hadoop11 current]# more VERSION
#Wed Nov 02 21:27:01 CST 2016
namespaceID=1890187682
clusterID=CID-6a82f5f4-a705-4a20-bfda-ee5e9a69c3de
cTime=0
storageType=NAME_NODE
blockpoolID=BP-574118934-10.187.84.50-1478093221696
layoutVersion=-56

[root@hadoop11 current]# pwd
/usr/local/hadoop/tmp/dfs/data/current/BP-574118934-10.187.84.50-1478093221696/current
[root@hadoop11 current]# more VERSION
#Sun Nov 06 09:37:03 CST 2016
namespaceID=1890187682
cTime=0
blockpoolID=BP-574118934-10.187.84.50-1478093221696
layoutVersion=-55

If going through these 6 points still has not resolved the problem, then it is time to check the relevant log files.
Next, let me introduce some of the specific exceptions that are frequently encountered during development:

1. The NameNode process is missing after starting Hadoop

Beginners in Hadoop run into this problem frequently, and there are 3 possible causes:
1. The NameNode has not been formatted (covered by the 6 checks).
Delete the directory that hadoop.tmp.dir points to (that is, the logs and tmp directories), then format the NameNode:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
</property>
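A minimal sketch of that sequence, assuming hadoop.tmp.dir is /usr/local/hadoop/tmp as configured above and that the logs live under /usr/local/hadoop/logs; it wipes the existing HDFS metadata, so run it only on a cluster whose data is expendable:

[root@hadoop11 ~]# rm -rf /usr/local/hadoop/tmp /usr/local/hadoop/logs
[root@hadoop11 ~]# hadoop namenode -format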

2. The mapping between IP addresses and hostnames is not bound correctly (covered by the 6 checks).
3. The configuration files are not set up correctly (covered by the 6 checks); pay particular attention to hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, and slaves (see the sketch below).
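For example, a frequent culprit is the HDFS address in core-site.xml. A minimal sketch, assuming the hadoop11 master from the hosts file above and the commonly used port 9000 (Hadoop 1.x names this property fs.default.name, while 2.x uses fs.defaultFS):

<property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop11:9000</value>
</property>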

2. Name node is in safe mode.

Cause: when the NameNode starts up, it first enters safe mode and exits it after roughly 30 seconds. While in safe mode, HDFS allows no create, delete, or modify operations, only reads. If, however, the fraction of blocks missing from the DataNodes exceeds a certain percentage, the system stays in safe mode, that is, remains read-only.
Workaround:
1. In the HDFS configuration file hdfs-site.xml, lower the value of dfs.safemode.threshold.pct; the default value is 0.999f.

<property>
    <name>dfs.safemode.threshold.pct</name>
    <value>0.999f</value>
    <description>
        Specifies the percentage of blocks that should satisfy the minimal
        replication requirement defined by dfs.replication.min. Values less
        than or equal to 0 mean not to wait for any particular percentage of
        blocks before exiting safemode. Values greater than 1 will make safe
        mode permanent.
    </description>
</property>

2. Execute the command hadoop dfsadmin -safemode leave to force the NameNode to leave safe mode (covered by the 6 checks).

[root@hadoop11 hadoop]# hadoop dfsadmin -safemode leave
Safe mode is OFF

3. could only be replicated to 0 nodes, instead of 1

This exception can appear even when jps shows every process running normally: the web UI reports 0 live nodes, which indicates that the DataNodes, although their processes started, never actually registered with the NameNode.
Possible causes of this problem:
1. Firewall: check that the firewall has been shut down successfully on all nodes (covered by the 6 checks).
2. Disk space: execute the command df -al to view disk usage, and free up space if the disks are nearly full.

If you are running out of disk space, you can drill down on what is using it as follows:

[root@hadoop11 local]# cd /
[root@hadoop11 /]# ls
bin   dev  home  lib64       media  mnt  opt   root  selinux  sys  usr
boot  etc  lib   lost+found  misc   net  proc  sbin  srv      tmp  var
[root@hadoop11 /]# du -sh *          (this command is the important one)
7.6M    bin
27M     boot
264K    dev
36M     etc
5.4G    home
142M    lib
26M     lib64
16K     lost+found
4.0K    media
0       misc
4.0K    mnt
0       net
8.0K    opt
du: cannot access 'proc/14788/task/14788/fd/4': No such file or directory
du: cannot access 'proc/14788/task/14788/fdinfo/4': No such file or directory
du: cannot access 'proc/14788/fd/4': No such file or directory
du: cannot access 'proc/14788/fdinfo/4': No such file or directory
0       proc
2.6G    root
15M     sbin
0       selinux
4.0K    srv
0       sys
252K    tmp
31G     usr
256M    var

3. If neither of the above methods works, the following method can be used (but it causes data loss, so use it with caution!).
Delete the directory that hadoop.tmp.dir points to, then reformat the NameNode, as in the sketch under problem 1 (covered by the 6 checks).

4. java.net.UnknownHostException reported at startup

Cause: a hostname in the cluster is not mapped to the appropriate IP address (covered by the 6 checks).
Workaround: add mappings between the hostnames and IP addresses of all nodes to the /etc/hosts file.

[root@hadoop11 ~]# more /etc/hosts
127.0.0.1      localhost localhost.localdomain localhost4 localhost4.localdomain4
10.187.84.50   hadoop11
10.187.84.51   hadoop22
10.187.84.52   hadoop33
10.187.84.53   hadoop44
10.187.84.54   hadoop55
10.187.84.55   hadoop66

5. The TaskTracker process starts, but the DataNode process does not

Workaround: first delete the folder that hadoop.tmp.dir points to, then reformat the NameNode.

<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
</property>

6. java.lang.OutOfMemoryError

Cause analysis: this exception clearly means the JVM has run out of memory; the fix is to increase the JVM memory size for all DataNodes.
Method: in the MapReduce configuration file mapred-site.xml, modify the value of mapred.child.java.opts.

<property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx200m</value>
</property>

Note: as a general rule, the JVM's maximum heap should be no more than half of the total memory; for example, if our server has 4 GB of RAM, set it to 2048m, although that value may still not be optimal. Here,
-Xms sets the size of the initial heap and -Xmx sets the maximum memory that can be used.
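Purely to illustrate the two flags, a hedged variant of the property above for this example 4 GB server (the exact values depend on your workload):

<property>
    <name>mapred.child.java.opts</name>
    <value>-Xms1024m -Xmx2048m</value>
</property>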
To view the server's total memory under Linux:

[root@hadoop11 ~]# cat /proc/meminfo | grep MemTotal
MemTotal:        3871080 kB

7. Incompatible namespaceIDs in ...

Cause analysis: every time the NameNode is formatted, a new namespaceID is generated. If you format the NameNode more than once, the namespaceID stored on the NameNode and the ones stored on the DataNodes can end up inconsistent.
Workaround:
1. Check whether the namespaceID stored on the NameNode and on each DataNode is the same; if not, edit them to the same value and restart the nodes (covered by the 6 checks; see the sketch after this list).
2. Or delete the directory that hadoop.tmp.dir points to, then reformat the NameNode (covered by the 6 checks).
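A sketch of the manual fix in option 1, assuming the directory layout shown in the 6 checks above and using hadoop22 as an example DataNode:

[root@hadoop22 ~]# vi /usr/local/hadoop/tmp/dfs/data/current/BP-574118934-10.187.84.50-1478093221696/current/VERSION

In that file, change the namespaceID line so it matches the NameNode's value (namespaceID=1890187682 in the VERSION listing shown earlier), then restart the DataNode.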
These are some of the problems I have encountered during development; I hope they help you.
