1: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
Answer:
The program needs to open many files at the same time for analysis. The system's default limit on open file descriptors is 1024 (you can check it with ulimit -a). That is enough for ordinary use, but too low for this program.
How to fix it:
Modify two files.
/etc/security/limits.conf
vi /etc/security/limits.conf
Add:
* soft nofile 102400
* hard nofile 409600
$ cd /etc/pam.d/
$ sudo vi login
Add: session required /lib/security/pam_limits.so
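After logging in again, you can verify that the new limits took effect (a quick check, assuming the edits above):
$ ulimit -Sn   # soft limit, should now show 102400
$ ulimit -Hn   # hard limit, should now show 409600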
A correction to the first question:
The error means that, during the shuffle step of the reduce phase, the number of failed fetches of completed map outputs exceeded the upper limit, which is 5 by default. Many things can cause this: abnormal network connections, connection timeouts, poor bandwidth, blocked ports, and so on. When the network between the nodes of the cluster is healthy, this error usually does not occur.
2: Too many fetch-failures
Answer:
This problem is mostly caused by incomplete connectivity between the nodes.
1) Check the /etc/hosts file on every host (see the example after this list).
The local IP address must map to the correct hostname.
The IP addresses and hostnames of all servers in the cluster must be listed.
2) Check ~/.ssh/authorized_keys.
It must contain the public keys of all servers (including the host itself).
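For example, a minimal /etc/hosts that satisfies point 1) could look like this (the IP addresses and hostnames are hypothetical):
127.0.0.1    localhost
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
Every node in the cluster should carry the same set of entries.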
3: Processing is very slow: map finishes quickly, but reduce is very slow and the progress repeatedly falls back to reduce = 0%.
Answer:
Apply the checks from question 2 first, and then
modify conf/hadoop-env.sh to raise the heap size: export HADOOP_HEAPSIZE=4000
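For reference, HADOOP_HEAPSIZE is the maximum heap (in MB) given to the Hadoop daemons; in a stock conf/hadoop-env.sh the line is commented out and the default is 1000 MB. A sketch of the change (4000 is simply the value used above; tune it to your nodes' RAM):
# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=4000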
4: HDFS error: the DataNode starts, but it can neither be accessed nor shut down
When re-formatting a new distributed file system, you need to delete the dfs.name.dir configured on the NameNode (the local file system path where the NameNode persistently stores the namespace and the transaction log), and also delete the dfs.data.dir directory on each DataNode (the local file system path where the DataNode stores its block data). With the configuration used here, that means deleting /home/hadoop/NameData on the NameNode and deleting /home/hadoop/DataNode1 and /home/hadoop/DataNode2 on the DataNodes.
The reason is that when Hadoop formats a new distributed file system, each storage directory records the VERSION under which it was created (you can look at the VERSION file in the /home/hadoop/NameData/current directory, which holds this version information). So when re-formatting, it is best to delete the NameData directory first, and you must also delete dfs.data.dir on every DataNode, so that the version information recorded by the NameNode and the DataNodes matches.
Note: deletion is a very dangerous operation. Do not delete anything you cannot positively identify, and back up files before deleting them!!
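A sketch of the whole procedure with the example paths above (stop the cluster first; adjust the directories to your own dfs.name.dir / dfs.data.dir):
# On the NameNode: back up, then remove the old namespace directory
tar czf ~/NameData.bak.tar.gz /home/hadoop/NameData && rm -rf /home/hadoop/NameData
# On every DataNode: back up, then remove the old block directories
tar czf ~/DataNode.bak.tar.gz /home/hadoop/DataNode1 /home/hadoop/DataNode2 && rm -rf /home/hadoop/DataNode1 /home/hadoop/DataNode2
# Re-format the file system and restart DFS
bin/hadoop namenode -format
bin/start-dfs.sh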
5: java.io.IOException: Could not obtain block: blk_194219614024901469_1100 file=/user/hive/warehouse/src_20090724_log/src_20090724_log
This mostly happens because a node is down or has lost its connection to the cluster.
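One way to confirm which DataNodes are supposed to hold the block and which nodes are alive (the path comes from the error message above):
bin/hadoop fsck /user/hive/warehouse/src_20090724_log -files -blocks -locations
bin/hadoop dfsadmin -report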
6: java.lang.OutOfMemoryError: Java heap space
This exception occurs because the JVM does not have enough memory. Increase the JVM heap size of all DataNodes, for example:
java -Xms1024m -Xmx4096m
As a rule of thumb, the maximum JVM heap should be about half of the machine's total memory. Our machines have 8 GB, so we set 4096 MB, but this value is not necessarily optimal.
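In practice, the DataNode heap is usually set in conf/hadoop-env.sh rather than on the java command line; a sketch, mirroring the values above:
# conf/hadoop-env.sh -- extra JVM options passed only to the DataNode daemon
export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx4096m $HADOOP_DATANODE_OPTS"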
How to add nodes to Hadoop
To add a node by hand:
1. Set up the environment on the new slave: copy the SSH keys, JDK, and the relevant conf, lib, and bin directories;
2. Add the hostname of the new DataNode to /etc/hosts on the NameNode and the other DataNodes;
3. Add the new DataNode's IP address to conf/slaves on the master;
4. Restart the cluster and check that the new DataNode shows up in the cluster;
5. Run bin/start-balancer.sh; this can take a long time.
Note:
1. If you do not run the balancer, the cluster will place all new data on the new node, which lowers MapReduce efficiency;
2. You can also invoke the bin/start-balancer.sh command directly, optionally with the parameter -threshold 5 (see the example after these notes).
threshold is the balancing threshold; the default is 10%. The lower the value, the more evenly the nodes are balanced, but the longer balancing takes.
3. The balancer can also run while MapReduce jobs are running, which is why dfs.balance.bandwidthPerSec defaults to only 1 MB/s. When no MapReduce job is running, you can raise this setting to speed up balancing.
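For example, to raise the balancer bandwidth, add a property like the following to hdfs-site.xml (10485760 bytes/s = 10 MB/s is only an illustrative value), then run the balancer with a tighter threshold:
<property>
<name>dfs.balance.bandwidthPerSec</name>
<value>10485760</value>
<description>The maximum amount of bandwidth, in bytes per second, that each datanode may use for rebalancing.</description>
</property>
bin/start-balancer.sh -threshold 5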
Other notes:
1. Make sure that the firewall on the slave is disabled;
2. Make sure that the new slave's IP address has been added to /etc/hosts on the master and the other slaves, and likewise add the IP addresses of the master and the other slaves to /etc/hosts on the new slave.
Mapper and reducer count
Url: http://wiki.apache.org/hadoop/HowManyMapsAndReduces
Partitioning your job into maps and reduces
Picking the appropriate size for the tasks for your job can radically change the performance of Hadoop. Increasing the number of tasks increases the framework overhead, but increases load balancing and lowers the cost of failures. At one extreme is the 1 map/1 reduce case where nothing is distributed. The other extreme is to have 1,000,000 maps / 1,000,000 reduces where the framework runs out of resources for the overhead.
Number of Maps
The number of maps is usually driven by the number of DFS blocks in the input files, although that causes people to adjust their DFS block size to adjust the number of maps. The right level of parallelism for maps seems to be around 10-100 maps/node, although we have taken it up to 300 or so for very cpu-light map tasks. Task setup takes a while, so it is best if the maps take at least a minute to execute.
Actually controlling the number of maps is subtle. The mapred.map.tasks parameter is just a hint to the InputFormat for the number of maps. The default InputFormat behavior is to split the total number of bytes into the right number of fragments. However, in the default case the DFS block size of the input files is treated as an upper bound for input splits. A lower bound on the split size can be set via mapred.min.split.size. Thus, if you expect 10TB of input data and have 128MB DFS blocks, you'll end up with 82k maps, unless your mapred.map.tasks is even larger. Ultimately the InputFormat determines the number of maps.
The number of map tasks can also be increased manually using the JobConf's conf.setNumMapTasks(int num). This can be used to increase the number of map tasks, but will not set the number below that which Hadoop determines via splitting the input data.
Number of Reduces
The right number of reduces seems to be 0.95 or 1.75 * (nodes * mapred.tasktracker.tasks.maximum). At 0.95 all of the reduces can launch immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces, doing a much better job of load balancing.
Currently the number of reduces is limited to roughly 1000 by the buffer size for the output files (io.buffer.size * 2 * numReduces << heapSize). The number of reduces also controls the number of output files in the output directory, but usually that is not important because the next map/reduce step will split them into even smaller splits for the maps.
The number of reduce tasks can also be increased in the same way as the map tasks, via JobConf's conf.setNumReduceTasks(int num).
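A hedged way to experiment with these numbers without changing code, assuming the job goes through ToolRunner/GenericOptionsParser (the jar name and paths below are placeholders):
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount \
  -D mapred.map.tasks=100 \
  -D mapred.reduce.tasks=14 \
  /input /output
mapred.map.tasks is only a hint to the InputFormat, while mapred.reduce.tasks sets the reduce count directly.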
My understanding:
The number of mappers is related to the input files and to the file splits. The upper bound on a file split is dfs.block.size, the lower bound can be set via mapred.min.split.size, and in the end the InputFormat decides.
Good suggestions:
The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). Increasing the number of reduces increases the framework overhead, but improves load balancing and lowers the cost of failures.
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>2</value>
<description>The maximum number of reduce tasks that will be run
simultaneously by a task tracker.
</description>
</property>
Adding a new hard drive to a single node
1. Modify dfs.data.dir on the node that received the new disk, separating the old and new data directories with commas.
2. Restart DFS.
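For example, in hdfs-site.xml on that node (the second path is a hypothetical mount point for the new disk):
<property>
<name>dfs.data.dir</name>
<value>/home/hadoop/DataNode1,/mnt/disk2/DataNode</value>
<description>Comma-separated list of local directories in which the DataNode stores its blocks.</description>
</property>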