Hadoop: Total Ordering Across Reducers

Source: Internet
Author: User

Contents

1. About total ordering across reducers

1.1. What is total ordering?

1.2. What is the standard for data partitioning?

2. Three ways to achieve a total sort

2.1. A single reducer

2.2. Custom partition function

2.3. Sampling

1. About total ordering across reducers

1.1. What is total ordering?

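The keys are ordered across all partitions (reducers): every key in one partition is no greater than any key in the next partition.

    • Correct example: the keys in reducer partition 1 are 1, 3, 4 and the keys in partition 2 are 5, 8, 9
    • Incorrect example: the keys in reducer partition 1 are 1, 3, 4 and the keys in partition 2 are 2, 7, 9 (2 should have come before 3 and 4)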
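
The definition above can be checked mechanically. The following is a minimal sketch in plain Java (no Hadoop dependency); the class name TotalOrderCheck and the method isTotallyOrdered are hypothetical names introduced only for this illustration. It walks the partitions in order and reports whether the keys ever go backwards.

```java
import java.util.Arrays;
import java.util.List;

public class TotalOrderCheck {

    // Returns true when the keys never decrease while reading the partitions
    // one after another, i.e. the output is ordered across partition boundaries.
    public static boolean isTotallyOrdered(List<List<Integer>> partitions) {
        Integer previous = null; // last key seen so far, carried across partitions
        for (List<Integer> partition : partitions) {
            for (Integer key : partition) {
                if (previous != null && key < previous) {
                    return false;
                }
                previous = key;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> good = Arrays.asList(Arrays.asList(1, 3, 4), Arrays.asList(5, 8, 9));
        List<List<Integer>> bad = Arrays.asList(Arrays.asList(1, 3, 4), Arrays.asList(2, 7, 9));
        System.out.println(isTotallyOrdered(good)); // true
        System.out.println(isTotallyOrdered(bad));  // false
    }
}
```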

1.2. What is the standard for data partitioning?

By default, the partition is chosen by taking the hash value of the key emitted by the mapper and computing its remainder modulo the number of reducer partitions.

    • If the hash value of a key is 999 and there are 3 partitions (reducers), then 999 % 3 = 0, so the key and its value go to the first partition (similarly, remainders of 1 and 2 send the pair to the other two partitions).
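
The modulo rule can be sketched in a few lines of plain Java. This is an illustration, not Hadoop's source: the class HashPartitionSketch and the method partitionFor are hypothetical names, and the sign-bit mask is included on the assumption that the default partitioner keeps negative hash codes from producing a negative partition index.

```java
public class HashPartitionSketch {

    // Clear the sign bit so the result is never negative, then take the
    // remainder by the number of partitions.
    public static int partitionFor(int keyHash, int numPartitions) {
        return (keyHash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(999, 3));  // 0 -> first partition
        System.out.println(partitionFor(1000, 3)); // 1 -> second partition
        System.out.println(partitionFor(-7, 3));   // a negative hash still maps into [0, 3)
    }
}
```

Because the index depends only on the hash, neighbouring keys are scattered over all reducers, which is why the default rule does not give a total order.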

Note: if the key's type is Text (or IntWritable, etc.), the hash value is computed on the Text object itself, not on the String (or int, etc.) extracted from the Text.

You can also customize how the partition is chosen; see 2.2, Custom partition function, below.

2. Three ways to achieve a total sort
    • A single reducer
    • Custom partition function
    • Sampling

2.1. A single reducer

With only one reduce partition, the output is naturally totally sorted, although all of the data then passes through that single reducer.

2.2. Custom partition function
    1. Create a class that extends Partitioner, for example: Partition
    2. Override its getPartition method, which decides the partition for each key
    3. Register it on the job in main: job.setPartitionerClass(Partition.class);

Taking random partitioning as an example, the pseudo-code is as follows. Random placement only illustrates the mechanics; for a total sort, getPartition must map increasing key ranges to increasing partition numbers.

    import java.io.IOException;
    import java.util.Random;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class Partition extends Partitioner<Text, IntWritable> {

        @Override
        public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
            Random r = new Random();
            // Draw a random value in [0, numPartitions); the returned value
            // decides which partition the key is sent to
            int i = r.nextInt(numPartitions);
            return i;
        }
    }

    public class RandomApp {
        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            ......

            // Set how keys are assigned to partitions (here: randomly)
            job.setPartitionerClass(Partition.class);

            ......

            // Wait for the MapReduce job to finish
            job.waitForCompletion(true);
        }
    }
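
For a partition function that actually yields a total order, the decision has to follow key ranges. The sketch below shows that logic in plain Java so it can run without Hadoop; inside a real job the same comparison would sit in getPartition. The class name RangePartitionSketch and the boundary values 5 and 10 are assumptions made up for this illustration, and in practice the boundaries must be chosen to match the real key distribution.

```java
public class RangePartitionSketch {

    // Hypothetical exclusive upper bounds of every partition except the last.
    // With {5, 10}: keys < 5 -> partition 0, keys in [5, 10) -> partition 1,
    // everything else -> partition 2.
    static final int[] BOUNDARIES = {5, 10};

    public static int partitionFor(int key, int numPartitions) {
        // Never return an index beyond the partitions that actually exist.
        int limit = Math.min(BOUNDARIES.length, numPartitions - 1);
        for (int i = 0; i < limit; i++) {
            if (key < BOUNDARIES[i]) {
                return i;
            }
        }
        return limit;
    }

    public static void main(String[] args) {
        for (int key : new int[] {1, 5, 7, 10, 12}) {
            System.out.println(key + " -> partition " + partitionFor(key, 3));
        }
    }
}
```

Hand-picked boundaries like these give a total order but can leave the reducers unevenly loaded when the keys are skewed.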

2.3. Sampling: TotalOrderPartitioner
    • RandomSampler: random sampling; slower, suitable for unordered data
    • IntervalSampler: samples at regular intervals; good performance, suitable for ordered data
    • SplitSampler: samples the first records of each split; good performance, but only representative when the data is not already sorted
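
The difference between the three strategies can be seen on a small in-memory list. This is a simplified sketch in plain Java, not Hadoop's sampler code; the class SamplingSketch and its method names are hypothetical. On sorted input, taking the first few records sees only the smallest keys, while interval sampling covers the whole range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class SamplingSketch {

    // Split-style sampling: take the first n records.
    public static List<Integer> firstN(List<Integer> records, int n) {
        return new ArrayList<>(records.subList(0, Math.min(n, records.size())));
    }

    // Interval-style sampling: keep every step-th record (step must be > 0).
    public static List<Integer> everyNth(List<Integer> records, int step) {
        List<Integer> sample = new ArrayList<>();
        for (int i = 0; i < records.size(); i += step) {
            sample.add(records.get(i));
        }
        return sample;
    }

    // Random-style sampling: keep each record with probability freq.
    public static List<Integer> randomPick(List<Integer> records, double freq, Random random) {
        List<Integer> sample = new ArrayList<>();
        for (Integer record : records) {
            if (random.nextDouble() < freq) {
                sample.add(record);
            }
        }
        return sample;
    }

    public static void main(String[] args) {
        List<Integer> sorted = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(firstN(sorted, 3));   // [1, 2, 3]
        System.out.println(everyNth(sorted, 4)); // [1, 5, 9]
        System.out.println(randomPick(sorted, 0.3, new Random())); // varies per run
    }
}
```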

Taking random sampling as an example, the pseudo-code is as follows:

Note: the following must be placed in the app after the rest of the job configuration has been set.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
    import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

    // Specify the partitioner class in the app
    job.setPartitionerClass(TotalOrderPartitioner.class);

    // Set the path the partition file is written to
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path("E:/par.dat"));

    /**
     * Initialize the sampler.
     * RandomSampler      uses random sampling
     * freq               probability of each key being selected; freq x number of keys > number of partitions
     * numSamples         number of samples required; numSamples > number of partitions
     * maxSplitsSampled   maximum number of splits to sample; maxSplitsSampled > current number of splits
     */
    // The key/value types are assumed here to match the Text/IntWritable example in 2.2
    InputSampler.RandomSampler<Text, IntWritable> sampler =
            new InputSampler.RandomSampler<>(freq, numSamples, maxSplitsSampled);

    // Write the sampled data to the partition file
    InputSampler.writePartitionFile(job, sampler);
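
Conceptually, the sampled keys are turned into split points, and each key is then routed by comparing it against those points. The following plain-Java sketch illustrates that idea under simplifying assumptions (integer keys, distinct split points, at least as many samples as partitions). It is not Hadoop's implementation, and the names SplitPointSketch, splitPoints and partitionFor are hypothetical.

```java
import java.util.Arrays;

public class SplitPointSketch {

    // Sort the sample and pick numPartitions - 1 evenly spaced cut points.
    public static int[] splitPoints(int[] samples, int numPartitions) {
        int[] sorted = samples.clone();
        Arrays.sort(sorted);
        int[] points = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            points[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return points;
    }

    // Keys below the first split point go to partition 0, keys from the first
    // split point up to (but excluding) the second go to partition 1, and so on.
    public static int partitionFor(int key, int[] points) {
        int pos = Arrays.binarySearch(points, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    public static void main(String[] args) {
        int[] points = splitPoints(new int[] {8, 1, 9, 3, 5, 4}, 3);
        System.out.println(Arrays.toString(points)); // [4, 8]
        System.out.println(partitionFor(3, points)); // 0
        System.out.println(partitionFor(7, points)); // 1
        System.out.println(partitionFor(9, points)); // 2
    }
}
```

Because the cut points come from the data itself, each partition receives roughly the same share of keys as long as the sample is representative, which is why the choice of sampler matters.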
