Table of Contents
I. About Reducer total sort
1.1. What is a total order?
1.2. What are the criteria for partitioning?
II. Three ways to achieve a total sort
2.1. A single Reducer
2.2. Custom partition function
2.3. Sampling
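I. About Reducer total sort
1.1. What is a total order?
Keys are ordered across all partitions (Reducers), not just within each one: concatenating the Reducer outputs in partition order yields one sorted sequence.
- Correct example: the keys in Reducer partition 1 are 1, 3, 4 and the keys in partition 2 are 5, 8, 9.
- Incorrect example: the keys in Reducer partition 1 are 1, 3, 4 and the keys in partition 2 are 2, 7, 9 (2 is smaller than 3 and 4, which sit in partition 1).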
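The definition above can be checked mechanically. A minimal sketch in plain Java (no Hadoop dependency), assuming each partition's keys are given as a list of integers in the order its Reducer emitted them; the class and method names are hypothetical:

```java
import java.util.List;

class TotalOrderCheck {
    // Walks the partitions in order and verifies that the concatenated keys never decrease.
    // This covers both conditions: each partition is sorted, and no key in partition i
    // is larger than a key in a later partition.
    static boolean isTotallyOrdered(List<List<Integer>> partitions) {
        Integer previous = null;
        for (List<Integer> partition : partitions) {
            for (int key : partition) {
                if (previous != null && key < previous) {
                    return false;
                }
                previous = key;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Integer>> correct = List.of(List.of(1, 3, 4), List.of(5, 8, 9));
        List<List<Integer>> incorrect = List.of(List.of(1, 3, 4), List.of(2, 7, 9));
        System.out.println(isTotallyOrdered(correct));   // true
        System.out.println(isTotallyOrdered(incorrect)); // false
    }
}
```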
1.2. What are the criteria for partitioning?
By default, the partition of each key output by the Mapper is decided by taking the key's hash code modulo the number of partitions (Reducers); the remainder is the partition number.
- For example, if a key's hash code is 999 and there are 3 partitions (Reducers), then 999 % 3 = 0, so the key and its value go to the first partition (likewise, a remainder of 1 or 2 sends the record to the second or third partition).
Note: if the key type is Text (or IntWritable, etc.), the hash code comes from the Writable object's own hashCode() (e.g. Text.hashCode()), not from the String (or int, etc.) obtained from it; for Text the two values generally differ.
You can also define your own partitioning rule; see 2.2, Custom partition function, below.
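The default rule can be reproduced in a few lines. The sketch below uses the formula of Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, applied to plain Java objects so it runs without Hadoop; the class name is hypothetical, and the String example only illustrates that the hash comes from whatever `hashCode()` the key type defines:

```java
class HashPartitionDemo {
    // Masking off the sign bit keeps the result non-negative even when hashCode() is negative
    static int getPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Integer.hashCode() returns the value itself, so 999 % 3 = 0
        System.out.println(getPartition(999, 3));  // 0
        System.out.println(getPartition(1000, 3)); // 1
        System.out.println(getPartition(1001, 3)); // 2
        // A String key uses String.hashCode(); a Text key would use Text.hashCode(),
        // which is computed differently, so the same word may land in another partition
        System.out.println(getPartition("hadoop", 3));
    }
}
```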
II. Three ways to achieve a total sort
- A single Reducer
- A custom partition function
- Sampling
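2.1. A single Reducer
With only one Reducer there is only one partition, and because the framework sorts keys before they reach the Reducer, the output is naturally totally ordered. The trade-off is that all data passes through a single Reducer.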
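To see why one Reducer gives a total order while several hash-partitioned Reducers do not, the shuffle can be simulated in plain Java. This is a simplified model (hypothetical class name, int keys, default hash rule, each partition sorted as the framework does before reducing), not Hadoop code:

```java
import java.util.ArrayList;
import java.util.List;

class ShuffleSimulation {
    // Hash-partition the keys, sort each partition, then concatenate the Reducer outputs in partition order
    static List<Integer> run(int[] keys, int numReducers) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : keys) {
            int p = (Integer.hashCode(key) & Integer.MAX_VALUE) % numReducers;
            partitions.get(p).add(key);
        }
        List<Integer> output = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            partition.sort(null); // natural ordering, like the sort before each Reducer
            output.addAll(partition);
        }
        return output;
    }

    public static void main(String[] args) {
        int[] keys = {5, 3, 9, 1, 8, 4};
        System.out.println(run(keys, 1)); // [1, 3, 4, 5, 8, 9]  one Reducer: globally sorted
        System.out.println(run(keys, 3)); // [3, 9, 1, 4, 5, 8]  sorted only inside each partition
    }
}
```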
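2.2. Custom partition function
- Create a class that extends Partitioner, for example Partition
- Override its getPartition method, which decides the partition each record goes to
- Register it with the job in main: job.setPartitionerClass(Partition.class);
Taking random partitioning as an example, the code is roughly as follows (random partitioning only demonstrates the mechanism and does not by itself produce a total order; for that, getPartition must send increasing key ranges to increasing partition numbers):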
// Partition.java
import java.util.Random;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class Partition extends Partitioner<Text, IntWritable> {

    // One Random per partitioner instance instead of a new one per record
    private final Random random = new Random();

    @Override
    public int getPartition(Text text, IntWritable intWritable, int numPartitions) {
        // Return a random value in [0, numPartitions); it decides which partition the record goes to
        return random.nextInt(numPartitions);
    }
}

// RandomApp.java
import java.io.IOException;

import org.apache.hadoop.mapreduce.Job;

public class RandomApp {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        ......

        // Assign records to partitions with the custom (random) partitioner
        job.setPartitionerClass(Partition.class);

        ......

        // Submit the job and wait for the Mapper and Reducer to finish
        job.waitForCompletion(true);
    }
}
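A partition rule that does yield a total order maps contiguous key ranges to increasing partition numbers. A minimal sketch of that rule, assuming int keys known to lie in [0, maxKey) (maxKey and the class name are hypothetical), written as plain Java so it runs without Hadoop; in a real Partitioner subclass the same arithmetic would go inside getPartition:

```java
class RangePartitionDemo {
    // Split [0, maxKey) into numPartitions equal-width ranges; the result never decreases as key grows
    static int getPartition(int key, int maxKey, int numPartitions) {
        // long arithmetic avoids overflow in key * numPartitions
        int p = (int) ((long) key * numPartitions / maxKey);
        // clamp in case a key falls outside the assumed range
        return Math.max(0, Math.min(p, numPartitions - 1));
    }

    public static void main(String[] args) {
        int maxKey = 100;
        int numPartitions = 3;
        for (int key : new int[] {0, 33, 34, 66, 67, 99}) {
            // 0 and 33 -> 0, 34 and 66 -> 1, 67 and 99 -> 2
            System.out.println(key + " -> partition " + getPartition(key, maxKey, numPartitions));
        }
    }
}
```

Fixed ranges only work when the key distribution is known in advance; if most keys fall into one range, one Reducer does most of the work, which is the problem sampling addresses.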
2.3. Sampling: TotalOrderPartitioner
- RandomSampler: random sampling; slower, suitable for unordered data
- IntervalSampler: samples at regular intervals; fast, suitable for ordered data
- SplitSampler: samples the first records of each split; fast, but because it does not pick keys from throughout a split it suits unordered data better than ordered data
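The difference between the strategies is easiest to see on already-sorted data. The sketch below imitates them on one in-memory "split" in plain Java; the method names and the fixed interval k are hypothetical simplifications, not the real InputSampler implementations:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class SamplerStrategiesDemo {
    // Like SplitSampler: take only the first n records of the split
    static List<Integer> firstN(int[] split, int n) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < Math.min(n, split.length); i++) {
            out.add(split[i]);
        }
        return out;
    }

    // Like IntervalSampler: take records at a regular interval across the whole split
    static List<Integer> everyKth(int[] split, int k) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < split.length; i += k) {
            out.add(split[i]);
        }
        return out;
    }

    // Like RandomSampler: keep each record with probability freq (every record must be read)
    static List<Integer> random(int[] split, double freq, long seed) {
        Random r = new Random(seed);
        List<Integer> out = new ArrayList<>();
        for (int key : split) {
            if (r.nextDouble() < freq) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] sortedSplit = new int[100];
        for (int i = 0; i < 100; i++) {
            sortedSplit[i] = i;
        }
        System.out.println(firstN(sortedSplit, 10));   // [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println(everyKth(sortedSplit, 10)); // [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
        System.out.println(random(sortedSplit, 0.1, 42L));
    }
}
```

On sorted input the first-n sample clusters at the start of the split, while the interval sample spans the whole key range, which is why the interval approach gives better split points for ordered data.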
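Taking random sampling as an example, the code is roughly as follows.
Note: it must be placed in the driver (app) after the job configuration has been set, because the sampler reads the job's input and the number of Reducers determines how many split points are written.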
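// Imports needed at the top of the driver class
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

// Specify the partitioner class in the app
job.setPartitionerClass(TotalOrderPartitioner.class);

// Set the path of the partition file that the sampler will write
TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path("E:/par.dat"));

/**
 * Initialize the sampler
 * RandomSampler uses random sampling
 * freq              probability that each key is selected: freq x number of keys > number of partitions
 * numSamples        total number of samples required: numSamples > number of partitions
 * maxSplitsSampled  maximum number of input splits to sample: maxSplitsSampled > current number of splits
 * The <Text, Text> type parameters are illustrative; they must match the job's input key/value types
 */
InputSampler.Sampler<Text, Text> sampler = new InputSampler.RandomSampler<Text, Text>(freq, numSamples, maxSplitsSampled);

// Sample the input and write the split points to the partition file
InputSampler.writePartitionFile(job, sampler);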
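Roughly, the partition file holds numPartitions - 1 split points taken from the sorted sample, and TotalOrderPartitioner routes each key by binary search over those split points. The sketch below imitates that idea on int keys in plain Java; it is a simplified illustration with hypothetical names, not Hadoop's actual code, which works on the job's key type and comparator:

```java
import java.util.Arrays;

class SamplingPartitionDemo {
    // Sort the sample and pick numPartitions - 1 evenly spaced keys as split points
    static int[] splitPoints(int[] sample, int numPartitions) {
        if (sample.length < numPartitions) {
            throw new IllegalArgumentException("need at least as many samples as partitions");
        }
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        float step = sorted.length / (float) numPartitions;
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[Math.round(step * i)];
        }
        return splits;
    }

    // Keys below splits[0] go to partition 0; a key equal to a split point goes to the partition above it
    static int getPartition(int key, int[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        // when not found, binarySearch returns -(insertionPoint) - 1
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        int[] sample = {42, 7, 19, 88, 3, 65, 27, 51, 95};
        int[] splits = splitPoints(sample, 3);
        System.out.println(Arrays.toString(splits)); // [27, 65]
        for (int key : new int[] {3, 26, 27, 50, 65, 100}) {
            // 3 and 26 -> 0, 27 and 50 -> 1, 65 and 100 -> 2
            System.out.println(key + " -> partition " + getPartition(key, splits));
        }
    }
}
```

Because the split points come from the data itself, each range receives roughly the same share of the sampled keys, so the Reducers stay balanced while the outputs remain totally ordered.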
Hadoop: Reducer total sort