Hadoop Performance Tuning

Source: Internet
Author: User

1. JVM Reuse

JVM reuse does not mean that two or more tasks of the same job run in the same JVM at the same time. It means that n tasks run in the same JVM sequentially, which saves the time otherwise spent shutting down and restarting the JVM. The value of n is set with the mapreduce.job.jvm.numtasks property (default 1) in Hadoop's mapred-site.xml file. It can also be set in the Hive execution settings: set mapred.job.reuse.jvm.num.tasks=10; (default 1).

The maximum number of tasks that a TaskTracker (TT) can run at the same time is set by the mapreduce.tasktracker.map.tasks.maximum and mapreduce.tasktracker.reduce.tasks.maximum properties in mapred-site.xml. Setting them any other way has no effect, for example on the JobClient side through the command line (-D mapred.tasktracker.map.tasks.maximum=number) or in code (conf.set("mapred.tasktracker.map.tasks.maximum", "number")).

What factors affect the operational efficiency of the job?

Number of mappers: Try to make the size of the input data an integer multiple of the block size. If you have too many small files, consider CombineFileInputFormat.

Number of reducers: To achieve maximum performance, the number of reducers for a job should be slightly smaller than the number of reduce task slots in the cluster.
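As a rough illustration of "slightly smaller than the number of reduce slots", the sketch below applies the 0.95 rule of thumb given in the Hadoop MapReduce tutorial. The class and method names are hypothetical, and the factor is only a starting point, not a value taken from this article.

```java
// Hypothetical helper: suggests a reducer count just below the cluster's reduce slots.
class ReducerCountSketch {

    // nodes: number of TaskTracker nodes (assumed known)
    // reduceSlotsPerNode: value of the reduce.tasks.maximum setting on each node
    static int suggestReducers(int nodes, int reduceSlotsPerNode) {
        int totalSlots = nodes * reduceSlotsPerNode;
        // Integer arithmetic for 0.95 * totalSlots; never suggest fewer than 1 reducer.
        return Math.max(1, totalSlots * 95 / 100);
    }

    public static void main(String[] args) {
        // 10 nodes with 2 reduce slots each give 20 slots in total.
        System.out.println(suggestReducers(10, 2));
    }
}
```

With fewer reducers than slots, all reducers can start in a single wave and a few slots stay free for re-running failed tasks.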

Combiner use: Make full use of a combiner to reduce the amount of data passed between map and reduce. The combiner runs after the map, on the map side.
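The effect of a combiner can be sketched in plain Java, without any Hadoop classes. The example below simulates a word-count map task and a map-side combine step; the class and method names are hypothetical and only meant to show why fewer records reach the shuffle.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of a combiner; this is not Hadoop's API.
class CombinerSketch {

    // Simulates a word-count map task: emits one (word, 1) record per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Simulates the combiner: sums the counts per key before anything is shuffled.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> record : records) {
            combined.merge(record.getKey(), record.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = map("to be or not to be");
        Map<String, Integer> combined = combine(mapOutput);
        System.out.println("records without combiner: " + mapOutput.size()); // 6
        System.out.println("records with combiner: " + combined.size());     // 4
    }
}
```

This only works when the reduce operation is associative and commutative, as summing is, because the framework may apply the combiner zero, one, or several times.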

Intermediate compression: Compressing the map output reduces the amount of data transferred before the reduce phase. Enable it with conf.setCompressMapOutput(true) and conf.setMapOutputCompressorClass(GzipCodec.class).
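To get a feel for why this helps, the sketch below gzips some made-up, repetitive key/value text with the JDK's java.util.zip classes. It does not use Hadoop's GzipCodec, and the sample data and names are hypothetical; real map output will compress to a different degree.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustration only: compresses fake map output with the JDK's gzip stream.
class MapOutputCompressionSketch {

    // Builds tab-separated "wordN<TAB>1" lines, similar in shape to word-count map output.
    static byte[] fakeMapOutput(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("word").append(i % 50).append('\t').append(1).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Gzips a byte array in memory and returns the compressed bytes.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = fakeMapOutput(10_000);
        byte[] compressed = gzip(raw);
        System.out.println("raw bytes: " + raw.length);
        System.out.println("gzip bytes: " + compressed.length);
    }
}
```

The trade-off is extra CPU time on the map side to compress and on the reduce side to decompress, in exchange for less disk and network I/O during the shuffle.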

Custom Writable: If you use a custom Writable object or a custom comparator, make sure that you have implemented RawComparator.
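The idea behind RawComparator is to compare keys directly in their serialized byte form, so the sort does not have to deserialize an object for every comparison. The sketch below shows that idea in plain Java for a key assumed to be serialized as a 4-byte big-endian int. The compare method mirrors the parameter shape of RawComparator's compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2), but the class itself is hypothetical and does not implement the Hadoop interface.

```java
import java.nio.ByteBuffer;

// Illustration only: byte-level comparison of serialized int keys.
class RawIntComparatorSketch {

    // Serializes an int as 4 big-endian bytes (the layout this sketch assumes).
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Reads a big-endian int straight from the buffer, without creating a key object.
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             | (bytes[start + 3] & 0xff);
    }

    // Compares two serialized keys in place; the lengths are unused for a fixed-width key.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    public static void main(String[] args) {
        byte[] a = serialize(3);
        byte[] b = serialize(10);
        // Negative result: 3 sorts before 10.
        System.out.println(compare(a, 0, a.length, b, 0, b.length) < 0);
    }
}
```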

Adjusting the shuffle parameters: The MapReduce shuffle process exposes several memory management parameters that can be tuned to compensate for poor performance.
