Number of maps in MapReduce

Source: Internet
Author: User

FileInputFormat divides the input files into splits before the map phase reads the data, and the number of splits determines the number of map tasks. The main factors that affect the number of maps (splits) are:

1) The size of the file. With a block size (dfs.block.size) of 128 MB, a 128 MB input file is divided into 1 split and a 256 MB input file is divided into 2 splits.

2) The number of files. FileInputFormat creates splits file by file and subdivides only large files, that is, files whose size exceeds the HDFS block size. A split never spans two files, so if dfs.block.size in HDFS is set to 128 MB and the input directory contains 100 files, the number of splits is at least 100.

3) The split size. Files are cut into splits of this size. If nothing is configured, the split size equals the HDFS block size, but the application can adjust it through two parameters:

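splitSize = Math.max(minSize, Math.min(maxSize, blockSize))

where

minSize = mapred.min.split.size (named mapreduce.input.fileinputformat.split.minsize in newer Hadoop releases)

maxSize = mapred.max.split.size (named mapreduce.input.fileinputformat.split.maxsize in newer Hadoop releases)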
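To make the three factors concrete, here is a minimal, self-contained Java sketch that applies the formula above and then counts splits for a set of files. It does not call Hadoop itself. The class and method names (SplitEstimator, countSplitsForFile, countSplitsForFiles) are hypothetical, and the 10% slop on the last split is an assumption modelled on how Hadoop's FileInputFormat avoids creating a tiny trailing split. Empty and non-splittable files are not modelled.

```java
import java.util.Arrays;
import java.util.List;

final class SplitEstimator {
    // Assumption: the last chunk of a file may exceed splitSize by up to 10%.
    static final double SPLIT_SLOP = 1.1;

    /** splitSize = max(minSize, min(maxSize, blockSize)). */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Estimated number of splits for one non-empty, splittable file. */
    static long countSplitsForFile(long fileLength, long splitSize) {
        long splits = 0;
        long remaining = fileLength;
        // Cut full-size splits while clearly more than one split's worth remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    /** A split never spans files, so the total is the sum over all files. */
    static long countSplitsForFiles(List<Long> fileLengths, long splitSize) {
        long total = 0;
        for (long length : fileLengths) {
            total += countSplitsForFile(length, splitSize);
        }
        return total;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // No minimum or maximum configured: the split size falls back to the block size.
        long splitSize = computeSplitSize(128 * mb, 1L, Long.MAX_VALUE);
        System.out.println(countSplitsForFile(128 * mb, splitSize)); // 1
        System.out.println(countSplitsForFile(256 * mb, splitSize)); // 2
        // Two small files and one 300 MB file: 1 + 1 + 3 splits.
        System.out.println(countSplitsForFiles(Arrays.asList(1 * mb, 1 * mb, 300 * mb), splitSize)); // 5
    }
}
```

Note how the two 1 MB files each cost a whole split: many small files produce many maps regardless of the block size.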

We can add the following code to the driver section of the MapReduce program:

TextInputFormat.setMinInputSplitSize(job, 1024L); // set the minimum split size (1 KB)

TextInputFormat.setMaxInputSplitSize(job, 1024 * 1024 * 10L); // set the maximum split size (10 MB)

In summary:

When mapreduce.input.fileinputformat.split.maxsize > mapreduce.input.fileinputformat.split.minsize > dfs.blocksize, the split size is determined by the mapreduce.input.fileinputformat.split.minsize parameter.

When mapreduce.input.fileinputformat.split.maxsize > dfs.blocksize > mapreduce.input.fileinputformat.split.minsize, the split size is determined by the dfs.blocksize setting.

When dfs.blocksize > mapreduce.input.fileinputformat.split.maxsize > mapreduce.input.fileinputformat.split.minsize, the split size is determined by the mapreduce.input.fileinputformat.split.maxsize parameter.
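The three cases can be checked with concrete numbers. The sketch below is only an illustration of the formula, not Hadoop code: the class name SplitSizeCases, the helper decidedBy, and the sample sizes are all made up for the example.

```java
final class SplitSizeCases {
    /** splitSize = max(minSize, min(maxSize, blockSize)). */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /**
     * Reports which setting the resulting split size equals.
     * Assumes the three values are distinct, as in the cases listed above.
     */
    static String decidedBy(long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        if (splitSize == minSize) {
            return "minsize";
        }
        if (splitSize == maxSize) {
            return "maxsize";
        }
        return "blocksize";
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // maxsize (512 MB) > minsize (256 MB) > blocksize (128 MB)
        System.out.println(decidedBy(128 * mb, 256 * mb, 512 * mb)); // minsize
        // maxsize (512 MB) > blocksize (128 MB) > minsize (64 MB)
        System.out.println(decidedBy(128 * mb, 64 * mb, 512 * mb));  // blocksize
        // blocksize (128 MB) > maxsize (64 MB) > minsize (32 MB)
        System.out.println(decidedBy(128 * mb, 32 * mb, 64 * mb));   // maxsize
    }
}
```

Put differently, raising the minimum above the block size gives fewer, larger maps, while lowering the maximum below the block size gives more, smaller maps.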

If you are interested in the topics this blog covers, please keep following my future posts. I am "Liu Chao ★ljc".

The copyright of this article belongs to the author and Blog Park. Reprinting is welcome, but unless the author consents otherwise, this paragraph must be retained and a link to the original must be placed in a prominent position on the article page; otherwise the author reserves the right to pursue legal liability.
