MapReduce Optimization Parameters

Source: Internet
Author: User

1. Resource-related parameters
The following parameters take effect when set in the user's own MapReduce application (the job configuration):
(1) mapreduce.map.memory.mb: The maximum amount of memory (in MB) that a single map task can use. Default: 1024. If a map task actually uses more memory than this, it is forcibly killed.
(2) mapreduce.reduce.memory.mb: The maximum amount of memory (in MB) that a single reduce task can use. Default: 1024. If a reduce task actually uses more memory than this, it is forcibly killed.
(3) mapreduce.map.cpu.vcores: The maximum number of CPU cores (vcores) available to each map task. Default: 1.
(4) mapreduce.reduce.cpu.vcores: The maximum number of CPU cores (vcores) available to each reduce task. Default: 1.
(5) mapreduce.map.java.opts: JVM options for map tasks, where you can set parameters such as the Java heap size, for example: "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc" (the Hadoop framework automatically replaces @taskid@ with the corresponding task ID). Default: "".
(6) mapreduce.reduce.java.opts: JVM options for reduce tasks, where you can set parameters such as the Java heap size, for example: "-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc". Default: "".
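The memory and JVM settings above interact: the heap set in mapreduce.map.java.opts has to fit inside the container sized by mapreduce.map.memory.mb, because the JVM also needs non-heap memory. The following Python sketch is illustrative only: the job_conf values, the helper names (expand_task_opts, heap_mb, heap_fits) and the 0.8 heap-to-container ratio are assumptions made for this example, not Hadoop APIs or defaults. It also mimics the framework's substitution of @taskid@.

```python
import re
from typing import Optional

# Hypothetical per-job settings mirroring the parameters listed above.
job_conf = {
    "mapreduce.map.memory.mb": 2048,
    "mapreduce.map.java.opts": "-Xmx1638m -verbose:gc -Xloggc:/tmp/@taskid@.gc",
}


def expand_task_opts(opts: str, task_id: str) -> str:
    """Mimic the framework replacing @taskid@ with the task attempt ID."""
    return opts.replace("@taskid@", task_id)


def heap_mb(opts: str) -> Optional[int]:
    """Return the -Xmx value in MB, or None if no -Xmx flag is present.
    Only the m/g size suffixes are handled in this sketch."""
    match = re.search(r"-Xmx(\d+)([mMgG])", opts)
    if match is None:
        return None
    size, unit = int(match.group(1)), match.group(2).lower()
    return size * 1024 if unit == "g" else size


def heap_fits(memory_mb: int, opts: str, ratio: float = 0.8) -> bool:
    """Assumed rule of thumb: keep the heap at or below `ratio` of the
    container size, leaving headroom for non-heap JVM memory."""
    heap = heap_mb(opts)
    return heap is not None and heap <= memory_mb * ratio
```

In a real job these values would be set on the job's Configuration object, or passed as -D options on the command line when the driver runs through ToolRunner.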
The following parameters must be set in the server-side configuration files before YARN starts in order to take effect:
(1) yarn.scheduler.minimum-allocation-mb: The minimum memory allocation (in MB) that the ResourceManager grants to each container request. Default: 1024.
(2) yarn.scheduler.maximum-allocation-mb: The maximum memory allocation (in MB) that the ResourceManager grants to each container request. Default: 8192.
(3) yarn.scheduler.minimum-allocation-vcores: The minimum number of virtual CPU cores per container request. Default: 1.
(4) yarn.scheduler.maximum-allocation-vcores: The maximum number of virtual CPU cores per container request. Default: 32.
(5) yarn.nodemanager.resource.memory-mb: The total amount of physical memory (in MB) that YARN can use on the node. Default: 8192. If your node does not have 8 GB of memory, you need to lower this value, because YARN does not automatically detect the node's total physical memory.
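A minimal sketch of how these limits shape what a job actually receives, assuming a scheduler that rounds each memory request up to a multiple of yarn.scheduler.minimum-allocation-mb and refuses requests above yarn.scheduler.maximum-allocation-mb (typical of the Capacity Scheduler with its default resource calculator; treat it as an assumption for other schedulers, which may use a separate increment). The function names are hypothetical, and only memory is modelled; the vcore limits are analogous when the scheduler accounts for CPU.

```python
def normalize_request(requested_mb, min_mb=1024, max_mb=8192):
    """Sketch of container request normalization: reject asks above the
    maximum, otherwise round up to a multiple of the minimum allocation
    (assumption: the rounding increment equals the minimum)."""
    if requested_mb > max_mb:
        raise ValueError(f"requested {requested_mb} MB exceeds maximum {max_mb} MB")
    # Ceiling division, then scale back up to a multiple of min_mb.
    rounded = -(-max(requested_mb, min_mb) // min_mb) * min_mb
    return min(rounded, max_mb)


def containers_per_node(node_mb=8192, container_mb=1024):
    """How many equally sized containers fit in yarn.nodemanager.resource.memory-mb
    (memory only; vcores are not considered here)."""
    return node_mb // container_mb
```

For example, a 1500 MB request is granted 2048 MB under these assumptions, so an 8192 MB node holds 4 such containers rather than 5; asking for sizes just above a multiple of the minimum wastes node memory.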
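Key shuffle parameters for performance tuning (configure these before YARN starts):
(1) mapreduce.task.io.sort.mb: Size (in MB) of the ring (circular) buffer used to sort map output during the shuffle. Default: 100.
(2) mapreduce.map.sort.spill.percent: Spill threshold of the ring buffer; once the buffer is this full, its contents start spilling to disk. Default: 0.80 (80%).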
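A rough back-of-the-envelope sketch of how the two shuffle settings affect the number of spill files a single map task writes. It assumes each spill drains about one threshold's worth of map output and ignores the per-record metadata kept in the buffer, so the counts are estimates rather than what Hadoop will report; the function names are hypothetical.

```python
import math

MB = 1024 * 1024


def spill_threshold_bytes(sort_mb=100, spill_percent=0.80):
    """Buffer occupancy (in bytes) at which a background spill to disk starts."""
    return round(sort_mb * MB * spill_percent)


def estimated_spills(map_output_bytes, sort_mb=100, spill_percent=0.80):
    """Rough spill-file count for one map task. Even output that never reaches
    the threshold is written once when the buffer is flushed at the end."""
    if map_output_bytes <= 0:
        return 0
    return math.ceil(map_output_bytes / spill_threshold_bytes(sort_mb, spill_percent))
```

Fewer spills mean fewer spill files to merge at the end of the map task. The sort buffer is allocated inside the map task's JVM heap, so raising mapreduce.task.io.sort.mb also requires enough heap in mapreduce.map.java.opts.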
2. Fault tolerance related parameters
(1) mapreduce.map.maxattempts: The maximum number of attempts for each map task. Once this many attempts have failed, the map task is marked as failed. Default: 4.
(2) mapreduce.reduce.maxattempts: The maximum number of attempts for each reduce task. Once this many attempts have failed, the reduce task is marked as failed. Default: 4.
(3) mapreduce.map.failures.maxpercent: The maximum percentage of failed map tasks that the job tolerates; if the share of failed map tasks exceeds this value, the entire job fails. Default: 0. If your application can afford to discard part of its input data, set this to a value greater than 0. For example, 5 means the job is still considered successful as long as no more than 5% of its map tasks fail. (A map task fails once all of its mapreduce.map.maxattempts attempts have failed, and its input data then produces no output.)
(4) mapreduce.reduce.failures.maxpercent: If the percentage of failed reduce tasks exceeds this value, the entire job fails. Default: 0.
(5) mapreduce.task.timeout: If a task makes no progress for a certain period, that is, it neither reads new input, writes any output, nor updates its status, it is considered blocked; it may be stuck temporarily or forever. To keep a user program from blocking indefinitely, the framework enforces this timeout (in milliseconds) and kills the task. Default: 600000 (10 minutes); a value of 0 disables the timeout.
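The fault-tolerance rules above can be restated as small predicates. This Python sketch only mirrors the semantics described in the list (attempt counting, the failure-percentage check, and the timeout where 0 means disabled); the function names are hypothetical and none of this is Hadoop's own code.

```python
def run_with_retries(attempt, max_attempts=4):
    """Call `attempt` (returns True on success) up to max_attempts times,
    as mapreduce.{map,reduce}.maxattempts allows.
    Returns (succeeded, attempts_used)."""
    for n in range(1, max_attempts + 1):
        if attempt():
            return True, n
    return False, max_attempts


def job_succeeds(total_tasks, failed_tasks, max_failure_percent=0):
    """mapreduce.{map,reduce}.failures.maxpercent: the job fails once the
    percentage of failed tasks exceeds the threshold (0 = no failures allowed)."""
    return failed_tasks * 100 <= max_failure_percent * total_tasks


def is_timed_out(last_progress_ms, now_ms, timeout_ms=600_000):
    """mapreduce.task.timeout: a task with no progress for longer than
    timeout_ms is treated as hung. A timeout of 0 disables the check."""
    if timeout_ms == 0:
        return False
    return now_ms - last_progress_ms > timeout_ms
```

For example, with 200 map tasks and mapreduce.map.failures.maxpercent set to 5, up to 10 failed map tasks are tolerated; an eleventh failure fails the job.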
3. Efficiency and stability parameters
(1) mapreduce.map.speculative: Whether to enable speculative execution for map tasks. Default: true. If true, multiple instances of a map task may run in parallel.
(2) mapreduce.reduce.speculative: Whether to enable speculative execution for reduce tasks. Default: true.
(3) mapreduce.input.fileinputformat.split.minsize: The minimum split size (in bytes) that FileInputFormat uses when slicing input. Default: 1.
(4) mapreduce.input.fileinputformat.split.maxsize: The maximum split size (in bytes) that FileInputFormat uses when slicing input.
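FileInputFormat sizes splits as max(minSize, min(maxSize, blockSize)), and when cutting a file it stops producing full-size splits once the remainder is no more than 1.1 times the split size, so the last split may be slightly larger than the others. The Python sketch below restates that logic for a single splittable file; the 128 MB block size in the examples is an assumption, and the function names are hypothetical.

```python
MB = 1024 * 1024
LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, the effective default maximum
SPLIT_SLOP = 1.1      # last split may be up to 10% larger than the split size


def split_size(block_size, min_size=1, max_size=LONG_MAX):
    """Split size rule: max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_length, block_size, min_size=1, max_size=LONG_MAX):
    """Lengths of the splits cut from one splittable file."""
    size = split_size(block_size, min_size, max_size)
    remaining = file_length
    lengths = []
    # Keep cutting full-size splits while more than 1.1 splits' worth remains.
    while remaining / size > SPLIT_SLOP:
        lengths.append(size)
        remaining -= size
    if remaining != 0:
        lengths.append(remaining)
    return lengths
```

Raising split.minsize above the block size yields fewer, larger splits (fewer map tasks); lowering split.maxsize below the block size yields more, smaller splits (more map tasks).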
Speculative execution: the framework applies certain rules to identify "straggler" tasks that are running unusually slowly and launches a backup attempt for each one. The backup processes the same data as the original at the same time, and the result of whichever attempt completes successfully first is taken as the final result.
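The "first successful attempt wins" part of speculative execution can be illustrated with threads. This Python sketch is a simplified stand-in: run_attempt and failing_attempt are hypothetical attempts, and it does not model how Hadoop decides which tasks are stragglers. Hadoop kills the losing attempt, whereas Python threads cannot be killed, so here the slower attempt simply runs on and its result is ignored.

```python
import concurrent.futures as cf
import time


def run_attempt(label, seconds):
    """Hypothetical task attempt: 'process' the split for `seconds`, then return a label."""
    time.sleep(seconds)
    return label


def failing_attempt():
    """Hypothetical attempt that fails immediately (for example, on a bad node)."""
    raise RuntimeError("attempt failed")


def first_successful(attempts):
    """Run every (function, args) attempt concurrently on the same input and
    return the result of whichever finishes successfully first."""
    pool = cf.ThreadPoolExecutor(max_workers=len(attempts))
    futures = [pool.submit(fn, *args) for fn, args in attempts]
    try:
        for fut in cf.as_completed(futures):
            if fut.exception() is None:
                return fut.result()
        raise RuntimeError("all attempts failed")
    finally:
        # Do not wait for the slower attempts; their results are discarded.
        pool.shutdown(wait=False, cancel_futures=True)
```

A failed attempt does not end the race: the loop skips it and keeps waiting for an attempt that succeeds.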
