MapReduce Optimization Parameters

Last Update:2018-10-08 Source: Internet

Author: User

Tags: shuffle


1. Resource-related parameters
The following parameters take effect when configured in the user's own MapReduce application:
(1) mapreduce.map.memory.mb: The maximum amount of memory that a map task can use, in MB. Default: 1024. If a map task actually uses more memory than this value, it is forcibly killed.
(2) mapreduce.reduce.memory.mb: The maximum amount of memory that a reduce task can use, in MB. Default: 1024. If a reduce task actually uses more memory than this value, it is forcibly killed.
(3) mapreduce.map.cpu.vcores: The maximum number of CPU cores that each map task can use. Default: 1.
(4) mapreduce.reduce.cpu.vcores: The maximum number of CPU cores that each reduce task can use. Default: 1.
(5) mapreduce.map.java.opts: The JVM parameters for map tasks, where you can configure settings such as the default Java heap size, for example: "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc" (@taskid@ is automatically replaced by the Hadoop framework with the corresponding task ID). Default: "".
(6) mapreduce.reduce.java.opts: The JVM parameters for reduce tasks, where you can configure settings such as the default Java heap size, for example: "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc". Default: "".
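As a rough illustration of how the job-level settings above fit together, the following Python sketch assembles a set of properties and renders them as -D generic options of the kind a MapReduce driver can accept on the command line. The helper names, the 2048 MB container size, the sample task ID, and the 0.8 heap-to-container ratio are all assumptions made for this example; the ratio is a common rule of thumb, not something stated in this article.

```python
# Sketch only: helper names and the 0.8 ratio are assumptions for illustration.

def heap_opts(container_mb, heap_fraction=0.8):
    """Return a -Xmx flag sized to a fraction of the container memory,
    leaving headroom for non-heap JVM memory."""
    return f"-Xmx{int(container_mb * heap_fraction)}m"


def expand_task_id(java_opts, task_id):
    """Mimic the substitution of the @taskid@ placeholder in the JVM options."""
    return java_opts.replace("@taskid@", task_id)


def render_d_options(props):
    """Render properties as a flat list of '-D', 'key=value' arguments."""
    args = []
    for key, value in props.items():
        args += ["-D", f"{key}={value}"]
    return args


job_props = {
    "mapreduce.map.memory.mb": 2048,
    "mapreduce.map.cpu.vcores": 1,
    "mapreduce.map.java.opts": heap_opts(2048)
    + " -verbose:gc -Xloggc:/tmp/@taskid@.gc",
}

print(render_d_options(job_props))
# Hypothetical task ID, used only to show the placeholder expansion.
print(expand_task_id(job_props["mapreduce.map.java.opts"], "attempt_0001_m_000005_0"))
```

Keeping the heap below the container limit matters because a task that exceeds mapreduce.map.memory.mb is killed, and the JVM uses memory beyond its heap.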
The following parameters must be set in the server's configuration files before YARN starts in order to take effect:
(1) yarn.scheduler.minimum-allocation-mb: The minimum memory that the ResourceManager (RM) allocates for each container request, in MB. Default: 1024.
(2) yarn.scheduler.maximum-allocation-mb: The maximum memory that the RM allocates for each container request, in MB. Default: 8192.
(3) yarn.scheduler.minimum-allocation-vcores: The minimum number of CPU cores allocated for each container request. Default: 1.
(4) yarn.scheduler.maximum-allocation-vcores: The maximum number of CPU cores allocated for each container request. Default: 32.
(5) yarn.nodemanager.resource.memory-mb: The total amount of physical memory that YARN can use on the node, in MB. Default: 8192. Note that if the node has less than 8 GB of memory, you need to lower this value, because YARN does not automatically detect the node's total physical memory.
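To make the interaction between these limits concrete, here is a small Python sketch of how a container memory request might be normalized and how many such containers fit on one node. It assumes the scheduler rounds each request up to a multiple of the minimum allocation and rejects requests above the maximum; actual behavior depends on the scheduler and its configuration, so treat the functions (whose names are invented for this example) as an approximation.

```python
import math

# Sketch under assumptions: round up to a multiple of the minimum allocation,
# reject anything above the maximum. Real schedulers may behave differently.

def normalize_request(requested_mb, min_mb=1024, max_mb=8192):
    """Return the container size (MB) assumed to be granted for a request."""
    if requested_mb > max_mb:
        raise ValueError(f"request of {requested_mb} MB exceeds maximum {max_mb} MB")
    rounded = math.ceil(requested_mb / min_mb) * min_mb
    return max(rounded, min_mb)


def containers_per_node(node_mb, container_mb):
    """How many containers of the given size fit in the node's YARN memory."""
    return node_mb // container_mb


granted = normalize_request(1500)          # rounded up to 2048
print(granted, containers_per_node(8192, granted))
```

Under these assumptions, a 1500 MB request occupies a 2048 MB container, so a node with the default 8192 MB holds four such containers rather than five.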
The following key shuffle parameters for performance tuning should be configured before YARN starts:
(1) mapreduce.task.io.sort.mb: The size of the shuffle ring buffer, in MB. Default: 100.
(2) mapreduce.map.sort.spill.percent: The fill threshold at which the ring buffer spills to disk. Default: 0.8 (80%).
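The arithmetic behind these two settings can be sketched in Python as follows. The spill estimate is deliberately simplified: it ignores the bookkeeping metadata stored in the buffer and assumes each spill writes exactly one threshold's worth of data, so it is an approximation rather than a model of the real buffer. The function names are invented for this example.

```python
import math

def spill_threshold_bytes(sort_mb=100, spill_percent=0.8):
    """Bytes of buffered map output at which a spill to disk begins."""
    return round(sort_mb * 1024 * 1024 * spill_percent)


def estimated_spills(map_output_mb, sort_mb=100, spill_percent=0.8):
    """Rough number of spill files for one map task (simplified estimate)."""
    return math.ceil(map_output_mb / (sort_mb * spill_percent))


print(spill_threshold_bytes())              # defaults: 80 MB threshold in bytes
print(estimated_spills(1000))               # 1000 MB of map output, default buffer
print(estimated_spills(1000, sort_mb=200))  # same output, doubled buffer
```

Fewer spills mean fewer on-disk files to merge at the end of the map task, which is why enlarging the buffer is a common tuning step when memory allows.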
2. Fault tolerance related parameters
(1) mapreduce.map.maxattempts: The maximum number of attempts for each map task; once this limit is exceeded, the map task is considered failed. Default: 4.
(2) mapreduce.reduce.maxattempts: The maximum number of attempts for each reduce task; once this limit is exceeded, the reduce task is considered failed. Default: 4.
(3) mapreduce.map.failures.maxpercent: When the percentage of failed map tasks exceeds this value, the entire job fails. Default: 0. If your application can tolerate discarding part of the input data, set this to a value greater than 0, such as 5, which means the job is still considered successful as long as fewer than 5% of map tasks fail. (A map task fails once its retries exceed mapreduce.map.maxattempts, and its input data then produces no results.)
(4) mapreduce.reduce.failures.maxpercent: When the percentage of failed reduce tasks exceeds this value, the entire job fails. Default: 0.
(5) mapreduce.task.timeout: If a task makes no progress for a certain period of time, that is, it reads no new input and writes no output, the task is considered blocked; it may be stuck temporarily or stuck forever. To prevent a user program from blocking forever without exiting, a timeout (in milliseconds) is enforced. Default: 600000. A value of 0 disables the timeout.
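The failure-percentage and timeout rules above can be expressed as two small predicates. This Python sketch models the described behavior only; the function names are made up for illustration, and the boundary case (exactly the configured percentage failing) is assumed here to count as success.

```python
def job_tolerates_failures(total_tasks, failed_tasks, max_failure_percent=0):
    """True if the share of failed tasks does not exceed the allowed percentage.
    Integer arithmetic avoids floating-point rounding at the boundary."""
    return failed_tasks * 100 <= max_failure_percent * total_tasks


def is_timed_out(last_progress_ms, now_ms, timeout_ms=600000):
    """True if the task has reported no progress for longer than the timeout.
    A timeout of 0 disables the check."""
    if timeout_ms == 0:
        return False
    return now_ms - last_progress_ms > timeout_ms


print(job_tolerates_failures(100, 1))                         # default 0%: any failure fails the job
print(job_tolerates_failures(100, 4, max_failure_percent=5))  # 4% failed, 5% allowed
print(is_timed_out(last_progress_ms=0, now_ms=700000))        # idle for 700 s
```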
3. Efficiency and stability parameters
(1) mapreduce.map.speculative: Whether to enable speculative execution for map tasks. Default: true. If true, multiple instances of the same map task may run in parallel.
(2) mapreduce.reduce.speculative: Whether to enable speculative execution for reduce tasks. Default: true.
(3) mapreduce.input.fileinputformat.split.minsize: The minimum split size used by FileInputFormat when splitting input. Default: 1.
(4) mapreduce.input.fileinputformat.split.maxsize: The maximum split size used by FileInputFormat when splitting input.
Speculative execution: based on certain rules, the framework identifies straggler tasks that are holding the job back and starts a backup task for each of them. The backup task and the original task process the same data at the same time, and the result of whichever finishes successfully first is used as the final result.
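The effect of the two split-size parameters can be illustrated with the following Python sketch, which follows the max(minSize, min(maxSize, blockSize)) rule that FileInputFormat uses to pick a split size, together with its 10% slop when cutting the last split. The 128 MB block size and 300 MB file size are assumed values chosen only for the example.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than the split size


def compute_split_size(block_size, min_size, max_size):
    """Split size is the block size, clamped between min_size and max_size."""
    return max(min_size, min(max_size, block_size))


def count_splits(file_size, split_size):
    """Number of splits (and therefore map tasks) for one splittable file."""
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        remaining -= split_size
        splits += 1
    if remaining > 0:
        splits += 1
    return splits


block = 128 * MB   # assumed HDFS block size
file_size = 300 * MB

# Defaults leave the split size equal to the block size.
print(count_splits(file_size, compute_split_size(block, 1, 2**63 - 1)))
# Lowering maxsize below the block size produces more, smaller splits.
print(count_splits(file_size, compute_split_size(block, 1, 64 * MB)))
# Raising minsize above the block size produces fewer, larger splits.
print(count_splits(file_size, compute_split_size(block, 256 * MB, 2**63 - 1)))
```

In short, set maxsize below the block size to increase map parallelism, or minsize above it to reduce the number of map tasks.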


