Hadoop: Reducer Total Ordering

Last Update:2018-05-28 Source: Internet

Author: User

Contents

I. About reducer total ordering

1.1. What is a total order?

1.2. What is the standard for data partitioning?

II. Three ways to achieve a total order

2.1. A single reducer

2.2. Custom partition function

2.3. Sampling

I. About reducer total ordering

1.1. What is a total order?

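A total order means the keys are ordered across all partitions (reducers): keys are sorted within each partition, and every key in one partition is no greater than every key in the next partition.

Correct example: the keys in reducer partition 1 are 1, 3, 4 and the keys in partition 2 are 5, 8, 9
Incorrect example: the keys in reducer partition 1 are 1, 3, 4 and the keys in partition 2 are 2, 7, 9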
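
The two examples above can be checked mechanically. The following is a minimal sketch in plain Java (no Hadoop involved); the class and method names are made up for illustration. It walks the partitions in order and reports whether the keys ever go backwards.

```java
import java.util.Arrays;
import java.util.List;

class TotalOrderCheck {
    /**
     * Returns true when every partition is sorted and no key in a partition
     * is smaller than a key in an earlier partition.
     */
    static boolean isTotallyOrdered(List<List<Integer>> partitions) {
        Integer previous = null; // last key seen so far, across all partitions
        for (List<Integer> partition : partitions) {
            for (Integer key : partition) {
                if (previous != null && key < previous) {
                    return false; // a key went backwards: not a total order
                }
                previous = key;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> correct =
                Arrays.asList(Arrays.asList(1, 3, 4), Arrays.asList(5, 8, 9));
        List<List<Integer>> incorrect =
                Arrays.asList(Arrays.asList(1, 3, 4), Arrays.asList(2, 7, 9));
        System.out.println(isTotallyOrdered(correct));   // true
        System.out.println(isTotallyOrdered(incorrect)); // false
    }
}
```

In the incorrect example, key 2 in partition 2 is smaller than key 4 in partition 1, so concatenating the reducer outputs would not give a sorted result.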

1.2. What is the standard for data partitioning?

By default, the partition is determined by taking the hash value of the key emitted by the mapper and computing its remainder when divided by the number of reducer partitions.

For example, if the hash value of a key is 999 and there are 3 partitions (reducers), then 999 % 3 = 0, so the key and its corresponding value go to the first partition (similarly, a remainder of 1 or 2 sends them to the second or third partition).
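
The arithmetic can be sketched in plain Java. To the best of my knowledge, Hadoop's default HashPartitioner also clears the sign bit before taking the remainder, so that a negative hash never yields a negative partition number; the class and method names below are hypothetical and the code does not depend on Hadoop.

```java
class HashPartitionDemo {
    /**
     * Default-style rule: clear the sign bit so the value is non-negative,
     * then take the remainder by the number of partitions.
     */
    static int partitionFor(int keyHash, int numPartitions) {
        return (keyHash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(999, 3));  // 0 -> first partition
        System.out.println(partitionFor(1000, 3)); // 1 -> second partition
        System.out.println(partitionFor(1001, 3)); // 2 -> third partition
        // A negative hash still maps to a valid, non-negative partition index
        System.out.println(partitionFor(-7, 3));
    }
}
```

Because neighbouring keys land in unrelated partitions, this rule spreads the load well but does not give a total order across reducers.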

Note: if the key type is Text (or IntWritable, etc.), the hash value used is that of the Text object itself, not the hash value of the String (or int, etc.) obtained from the Text.
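
To see why this matters, the sketch below compares String.hashCode() with a byte-based hash. The byteHash method is a stand-in written for this example: it assumes that Text hashes its UTF-8 bytes starting from 1 and multiplying by 31 at each step, which is how I recall WritableComparator.hashBytes working. Treat that as an assumption and check it against the Hadoop version you use.

```java
import java.nio.charset.StandardCharsets;

class TextHashDemo {
    /**
     * Stand-in for a Text-style hash (assumption, not the Hadoop source):
     * start at 1 and fold in each UTF-8 byte with a multiplier of 31.
     */
    static int byteHash(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int hash = 1;
        for (byte b : bytes) {
            hash = 31 * hash + b;
        }
        return hash;
    }

    /** Same remainder rule as the default partitioning described above. */
    static int partitionFor(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String key = "abc";
        System.out.println(key.hashCode());                  // 96354
        System.out.println(byteHash(key));                   // 126145
        System.out.println(partitionFor(key.hashCode(), 3)); // 0
        System.out.println(partitionFor(byteHash(key), 3));  // 1
    }
}
```

Under that assumption, the same characters hash to different values, so predicting a partition from the String hash would point at the wrong reducer.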

You can also customize how the partition is decided; see 2.2, Custom partition function, below.

II. Three ways to achieve a total order

A single reducer
Custom partition function
Sampling

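2.1. A single reducer

With only one reduce partition, every key passes through the same reducer, so the output is naturally totally ordered. The cost is that the reduce phase has no parallelism.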

2.2. Custom partition function

Create a class that extends Partitioner, for example: Partition
Override its getPartition method, which decides the partition for each key
Register it on the job in main: job.setPartitionerClass(Partition.class);

Taking random partitioning as an example, the pseudo-code is as follows (random placement only demonstrates the mechanism; it does not by itself produce a total order):

// Partition.java
import java.util.Random;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class Partition extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
        Random r = new Random();
        // Pick a random value in [0, numPartitions); the returned value
        // decides which partition the key is sent to
        int i = r.nextInt(numPartitions);
        return i;
    }
}

// RandomApp.java
import java.io.IOException;

public class RandomApp {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        ......

        // Set the partitioning method (random placement)
        job.setPartitionerClass(Partition.class);

        ......

        // Wait for the MapReduce job to finish
        job.waitForCompletion(true);
    }
}
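
For a partition function that does produce a total order, the rule has to send ranges of keys to partitions in ascending order. The following is a minimal sketch in plain Java rather than a Hadoop Partitioner: the split point 5 and all class and method names are hypothetical, chosen to match the example keys from section 1.1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RangePartitionDemo {
    // Hypothetical split point: keys < 5 go to partition 0, keys >= 5 to partition 1
    static final int[] SPLIT_POINTS = {5};

    /** Counts how many split points are <= key; that count is the partition index. */
    static int getPartition(int key, int numPartitions) {
        int partition = 0;
        while (partition < SPLIT_POINTS.length && key >= SPLIT_POINTS[partition]) {
            partition++;
        }
        return Math.min(partition, numPartitions - 1);
    }

    /** Partitions the keys, sorts each partition as a reducer would, then concatenates. */
    static List<Integer> partitionSortConcat(List<Integer> keys, int numPartitions) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : keys) {
            partitions.get(getPartition(key, numPartitions)).add(key);
        }
        List<Integer> result = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
            result.addAll(partition);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(partitionSortConcat(Arrays.asList(9, 1, 8, 4, 5, 3), 2));
        // [1, 3, 4, 5, 8, 9]
    }
}
```

Hand-picked split points only balance the load if the key distribution is known in advance, which is the problem the sampling approach in 2.3 addresses.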

2.3. Sampling: TotalOrderPartitioner

RandomSampler: random sampling; slower, suitable for unordered data
IntervalSampler: interval sampling; good performance, suitable for ordered data
SplitSampler: split sampling (takes the first records of each split); good performance, but it does not sample throughout a split, so IntervalSampler is usually the better choice for ordered data

Taking random sampling as an example, the pseudo-code is as follows:

Note: the following must be placed in the app after the job configuration has been set

// Required imports:
// org.apache.hadoop.fs.Path
// org.apache.hadoop.mapreduce.lib.partition.InputSampler
// org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner

// Specify the partitioner class in the app
job.setPartitionerClass(TotalOrderPartitioner.class);

// Set the path the partition file is written to
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path("E:/par.dat"));

/**
 * Initialize the sampler
 * RandomSampler      uses random sampling
 * freq               probability of each key being selected; freq x number of keys > number of partitions
 * numSamples         total number of samples required; numSamples > number of partitions
 * maxSplitsSampled   maximum number of splits to sample; maxSplitsSampled > current number of splits
 */
// Key/value types are assumed to be Text/IntWritable, as in the partitioner example
InputSampler.RandomSampler<Text, IntWritable> sampler =
        new InputSampler.RandomSampler<>(freq, numSamples, maxSplitsSampled);

// Write the sampled split points to the partition file
InputSampler.writePartitionFile(job, sampler);
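
The idea behind the sampler and the partition file can be illustrated without Hadoop. The sketch below is a simplified model, not the InputSampler implementation: it keeps each key with probability freq up to numSamples keys, sorts the samples, picks evenly spaced split points, and then assigns keys to partitions by binary search. All class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class SamplingSplitDemo {
    /** Keeps each key with probability freq, stopping once numSamples keys are held. */
    static List<Integer> randomSample(List<Integer> keys, double freq, int numSamples, Random random) {
        List<Integer> samples = new ArrayList<>();
        for (int key : keys) {
            if (samples.size() < numSamples && random.nextDouble() <= freq) {
                samples.add(key);
            }
        }
        return samples;
    }

    /** Sorts the samples and picks numPartitions - 1 evenly spaced split points. */
    static int[] splitPoints(List<Integer> samples, int numPartitions) {
        if (samples.size() < numPartitions) {
            throw new IllegalArgumentException("need at least as many samples as partitions");
        }
        List<Integer> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int[] points = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            points[i - 1] = sorted.get(i * sorted.size() / numPartitions);
        }
        return points;
    }

    /** Keys below points[0] go to partition 0; keys >= the last point go to the last partition. */
    static int getPartition(int key, int[] points) {
        int index = Arrays.binarySearch(points, key);
        // Found: the key equals a split point and belongs to the partition on its right.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return index >= 0 ? index + 1 : -index - 1;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(7, 1, 5, 3, 9, 2);
        // freq = 1.0 keeps every key, which makes this run deterministic
        List<Integer> samples = randomSample(keys, 1.0, 6, new Random());
        int[] points = splitPoints(samples, 3);
        System.out.println(Arrays.toString(points)); // [3, 7]
        for (int key : keys) {
            System.out.println(key + " -> partition " + getPartition(key, points));
        }
    }
}
```

This also shows why the comments above ask for more samples than partitions: with too few samples there are not enough distinct split points to separate the partitions.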

Over

This article is an English version of an article originally written in Chinese on aliyun.com.
