TotalOrderPartitioner in Hadoop

http://blog.oddfoo.net/2011/04/17/mapreduce-partition%E5%88%86%E6%9E%90-2/

Location of Partition

[Figure: location of the partition step]

The partition step routes each map output record to the corresponding reducer. This places two requirements on a partitioner:

1) Load balancing: distribute the work across the reduce workers as evenly as possible.

2) Efficiency: assigning a record to a reducer must be fast.

Partitioners provided by MapReduce

The default partitioner in MapReduce is HashPartitioner. Besides it, MapReduce provides three other partitioners, as shown in the class structure:

[Figure: Partitioner class structure]


1. Partitioner is the base class of all partitioners. A custom partitioner must also inherit from this class.

2. HashPartitioner is the default partitioner of MapReduce. It computes the target reducer as

reducer = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks

3. BinaryPartitioner inherits from Partitioner<BinaryComparable, V> and is a specialized subclass of Partitioner. It provides leftOffset and rightOffset; when computing the target reducer, only the bytes of the key K in the range [leftOffset, rightOffset] are hashed:

reducer = (hash & Integer.MAX_VALUE) % numReduceTasks

4. KeyFieldBasedPartitioner is also hash-based. Unlike BinaryPartitioner, it can hash over multiple ranges of the key. When the number of ranges is 0, KeyFieldBasedPartitioner degrades to HashPartitioner.

5. TotalOrderPartitioner produces output that is totally ordered across reducers. Unlike the preceding three partitioners, it is not hash-based. The next section describes TotalOrderPartitioner in detail.
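The hash formula in item 2 can be tried out in plain Java. The following is a minimal sketch, not Hadoop's class: the name HashPartitionSketch and the sample keys are made up for illustration, and only the formula itself comes from the text. It also shows why the sign bit is masked off before taking the remainder.

```java
final class HashPartitionSketch {
    // Same formula as in the text: clear the sign bit, then take the remainder.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, reducers));
        }
        // Integer.hashCode() is the value itself, so -7 has a negative hash code.
        Integer negative = -7;
        // Without the mask, Java's % would yield a negative remainder.
        System.out.println("unmasked: " + (negative.hashCode() % reducers));
        // With the mask, the result always lies in [0, reducers).
        System.out.println("masked:   " + partitionFor(negative, reducers));
    }
}
```

Because the result depends only on the hash code, neighbouring keys usually land on different reducers, which is why hash-based partitioners cannot produce totally ordered output.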

TotalOrderPartitioner

By default the output of each reducer is sorted, but there is no ordering between the outputs of different reducers. Use TotalOrderPartitioner when the output must be totally ordered.

To use TotalOrderPartitioner, you must supply a partition file. The file must contain exactly (number of reducers - 1) keys, called split points, in ascending order. Why such a file is needed and what it contains are discussed below.

TotalOrderPartitioner handles keys in one of two ways, depending on the key type:

1) For keys that are not BinaryComparable (see the Appendix), TotalOrderPartitioner uses binary search to find the index for the current key K.

For example, suppose there are 5 reducers and the partition file provides the four split points [2, 4, 6, 8]. For the key-value pair <4, "good">, binary search returns index = 1, and index + 1 = 2, so the pair is sent to the reducer with index 2 (counting from 0). For the pair <4.5, "good">, binary search returns -3; adding 1 gives -2, and negating that gives 2, so this pair also goes to the reducer with index 2.

For such data, each lookup costs O(log(number of reducers)), which is fast.

2) For keys of a BinaryComparable type (such as strings), the handling is similar to the numeric case. Strings can be sorted lexicographically, so split points can be chosen that assign different string keys to different reducers.

For example, if there are 5 reducers and the partition file provides the four split points ["ABC", "BCE", "EAA", "FHC"], then the string "AB" is assigned to the first reducer because it is smaller than the first split point "ABC".

However, string lookup does not use the same comparison method as numeric data. MapReduce uses a trie for string lookup. The lookup time is O(m), where m is the depth of the tree, and the space grows exponentially with the depth, on the order of O(256^m). It is a typical trade of space for time.
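The binary-search case from item 1) can be reproduced with java.util.Arrays.binarySearch. This is a minimal sketch rather than Hadoop's code: the class name BinarySearchPartitionSketch and the use of a double[] for the split points are assumptions made for the example.

```java
import java.util.Arrays;

final class BinarySearchPartitionSketch {
    // Returns the zero-based partition for a key, given ascending split points.
    static int findPartition(double[] splitPoints, double key) {
        int pos = Arrays.binarySearch(splitPoints, key) + 1;
        // Key found at index i: pos = i + 1.
        // Key not found: binarySearch returns -(insertionPoint) - 1,
        // so pos = -insertionPoint, which is negated to get the partition.
        return pos < 0 ? -pos : pos;
    }

    public static void main(String[] args) {
        double[] splitPoints = {2, 4, 6, 8}; // 4 split points for 5 reducers
        for (double key : new double[] {1, 2, 4, 4.5, 9}) {
            System.out.println(key + " -> partition " + findPartition(splitPoints, key));
        }
    }
}
```

With these split points, the keys 4 and 4.5 from the example both map to partition 2, keys below 2 map to partition 0, and keys above 8 map to partition 4.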

Trie

Trie construction

Assume the maximum depth of the tree is 3 and the split points are [aaad, aaaf, aaaeh, abbx].

[Figure: trie structure]

The trie in MapReduce consists of two kinds of nodes:
1) InnerTrieNode
In MapReduce, an InnerTrieNode holds an array of 256 child slots, one per possible byte value. The example in the figure uses only the 26 English letters.
2) Leaf nodes {UnsplitTrieNode, SinglySplitTrieNode, LeafTrieNode}
UnsplitTrieNode is a leaf node that contains no split point.
SinglySplitTrieNode is a leaf node that contains exactly one split point.
LeafTrieNode is a leaf node that contains multiple split points. (This occurs only when the maximum depth of the tree is reached and is rare in practice.)

Trie search process

Example:
1) If the lookup for the current key reaches the LeafTrieNode in the figure, binary search continues inside the leaf and returns the index of the key (aad in the figure's example) in the split-point array. If no exact match is found, the index of the closest split point is returned.
2) If the lookup reaches a SinglySplitTrieNode: if the key is smaller than the node's split point, its index is returned; otherwise index + 1 is returned.
3) If the lookup reaches an UnsplitTrieNode, the previous index is returned. For example, the index of abbx in the split-point array is returned.
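The construction and search steps above can be sketched in a simplified form. This is an illustration under stated assumptions, not Hadoop's implementation: the names TrieSketch, Inner, Leaf and build are hypothetical, keys are plain ASCII strings instead of BinaryComparable byte arrays, and a single leaf type that runs binary search over its slice of the split points stands in for the three leaf variants (an empty slice behaves like a leaf with no split point).

```java
import java.util.Arrays;

final class TrieSketch {
    interface Node {
        int findPartition(String key);
    }

    // Leaf: binary search restricted to splits[lower, upper).
    static final class Leaf implements Node {
        final String[] splits;
        final int lower;
        final int upper;

        Leaf(String[] splits, int lower, int upper) {
            this.splits = splits;
            this.lower = lower;
            this.upper = upper;
        }

        public int findPartition(String key) {
            int pos = Arrays.binarySearch(splits, lower, upper, key) + 1;
            return pos < 0 ? -pos : pos;
        }
    }

    // Inner node: one child slot per byte value, indexed by the key's
    // character at this level (slot 0 if the key is too short).
    static final class Inner implements Node {
        final int level;
        final Node[] child = new Node[256];

        Inner(int level) {
            this.level = level;
        }

        public int findPartition(String key) {
            int slot = key.length() <= level ? 0 : key.charAt(level);
            return child[slot].findPartition(key);
        }
    }

    // Builds the trie over the sorted split points splits[lower, upper).
    static Node build(String[] splits, int lower, int upper, String prefix, int maxDepth) {
        int depth = prefix.length();
        if (depth >= maxDepth || lower == upper) {
            return new Leaf(splits, lower, upper);
        }
        Inner node = new Inner(depth);
        int bound = lower;
        for (int ch = 0; ch < 255; ch++) {
            int start = bound;
            // Split points below prefix + (ch + 1) belong to child ch.
            String next = prefix + (char) (ch + 1);
            while (bound < upper && splits[bound].compareTo(next) < 0) {
                bound++;
            }
            node.child[ch] = build(splits, start, bound, prefix + (char) ch, maxDepth);
        }
        // The last child picks up whatever split points remain.
        node.child[255] = build(splits, bound, upper, prefix + (char) 255, maxDepth);
        return node;
    }

    public static void main(String[] args) {
        String[] splits = {"aaad", "aaaeh", "aaaf", "abbx"}; // must be sorted
        Node root = build(splits, 0, splits.length, "", 3);
        for (String key : new String[] {"a", "aaae", "aad", "abbx", "zzz"}) {
            System.out.println(key + " -> partition " + root.findPartition(key));
        }
    }
}
```

Each lookup follows at most maxDepth child pointers before reaching a leaf, so most keys are resolved without any comparison against the full split-point array. The result is the same partition a binary search over all split points would give.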

TotalOrderPartitioner

The earlier section listed two requirements for a partitioner: speed and load balancing. The trie improves lookup speed, but how do we obtain a partition file whose split points also balance the load?

InputSampler
The input sampling class, which samples data from the input directory. It provides three sampling methods.

[Figure: sampler class structure]

Sampling method comparison table:

Class name            | Sampling method                          | Constructor arguments                                         | Efficiency | Features
----------------------|------------------------------------------|---------------------------------------------------------------|------------|------------------------
SplitSampler          | Samples the first N records              | total number of samples, number of splits                     | Highest    |
RandomSampler         | Traverses all data and samples randomly  | sampling frequency, total number of samples, number of splits | Lowest     |
IntervalSampler<K, V> | Samples at a fixed interval              | sampling frequency, number of splits                          | Medium     | Suited to ordered data

The writePartitionFile method is critical. It first sorts the samples provided by the sampler, then picks (number of reducers - 1) of them at regular intervals and writes them to the partition file as split points. Because the split points are derived from sampled data, each partition receives approximately the same number of key-value pairs, which achieves load balancing.
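The three sampling strategies in the comparison table can be sketched over an in-memory list of keys. This is a simplified illustration, not the InputSampler API: the class SamplerSketch and its method names are hypothetical, and the real samplers read records from input splits rather than from a List.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SamplerSketch {
    // "First N records": cheapest, but biased when the input is already sorted.
    static <K> List<K> firstN(List<K> records, int numSamples) {
        return new ArrayList<>(records.subList(0, Math.min(numSamples, records.size())));
    }

    // Fixed-interval sampling: keep a record whenever the kept/seen ratio
    // has dropped below the requested frequency.
    static <K> List<K> interval(List<K> records, double freq) {
        List<K> samples = new ArrayList<>();
        long seen = 0;
        for (K key : records) {
            seen++;
            if ((double) samples.size() / seen < freq) {
                samples.add(key);
            }
        }
        return samples;
    }

    // Random sampling: visit every record and accept it with probability freq.
    // Once numSamples (assumed >= 1) are held, overwrite a random earlier sample.
    static <K> List<K> random(List<K> records, double freq, int numSamples, Random rnd) {
        List<K> samples = new ArrayList<>();
        for (K key : records) {
            if (rnd.nextDouble() <= freq) {
                if (samples.size() < numSamples) {
                    samples.add(key);
                } else {
                    samples.set(rnd.nextInt(numSamples), key);
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add(i);
        }
        System.out.println("firstN:   " + firstN(records, 3));
        System.out.println("interval: " + interval(records, 0.5));
        System.out.println("random:   " + random(records, 0.5, 3, new Random(42)));
    }
}
```

On sorted input, taking the first N records sees only the smallest keys, while interval sampling spreads its samples across the whole range, which is consistent with the table's note that it suits ordered data.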

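TotalOrderPartitioner example

import java.net.URI;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.hadoop.mapred.lib.InputSampler;
import org.apache.hadoop.mapred.lib.TotalOrderPartitioner;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// JobBuilder is a helper class that accompanies the referenced example program;
// it is not part of the Hadoop API.
public class SortByTemperatureUsingTotalOrderPartitioner extends Configured
    implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    JobConf conf = JobBuilder.parseInputAndOutput(this, getConf(), args);
    if (conf == null) {
      return -1;
    }
    // Read and write sequence files keyed by IntWritable, block-compressed with gzip.
    conf.setInputFormat(SequenceFileInputFormat.class);
    conf.setOutputKeyClass(IntWritable.class);
    conf.setOutputFormat(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setCompressOutput(conf, true);
    SequenceFileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);
    conf.setPartitionerClass(TotalOrderPartitioner.class);

    // Sampling frequency 0.1, at most 10000 samples, at most 10 splits sampled.
    InputSampler.Sampler<IntWritable, Text> sampler =
        new InputSampler.RandomSampler<IntWritable, Text>(0.1, 10000, 10);

    // Write the partition file next to the input and register it with the job.
    Path input = FileInputFormat.getInputPaths(conf)[0];
    input = input.makeQualified(input.getFileSystem(conf));
    Path partitionFile = new Path(input, "_partitions");
    TotalOrderPartitioner.setPartitionFile(conf, partitionFile);
    InputSampler.writePartitionFile(conf, sampler);

    // Add to DistributedCache
    URI partitionUri = new URI(partitionFile.toString() + "#_partitions");
    DistributedCache.addCacheFile(partitionUri, conf);
    DistributedCache.createSymlink(conf);
    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(
        new SortByTemperatureUsingTotalOrderPartitioner(), args);
    System.exit(exitCode);
  }
}

Example program referenced from: http://www.cnblogs.com/funnydavid/archive/2010/11/24/1886974.html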

Appendix
Text is both BinaryComparable and WritableComparable.
BooleanWritable, ByteWritable, DoubleWritable, MD5Hash, IntWritable, FloatWritable, LongWritable, and NullWritable are all WritableComparable.

http://www.cnblogs.com/OnlyXP/archive/2008/12/06/1349026.html

Before version 0.19.0, Hadoop provided no total-order solution. With the default partitioner (HashPartitioner), the output of each reducer is sorted, but there is no total order across the output files of multiple reducers. To achieve a total sort you had to implement a partitioner yourself. For example, for keys that are MAC addresses, if the addresses are evenly distributed, you could build a partitioner for no more than 255 reducers based on the first two bytes of the MAC address. Such a partitioner is tied to the application's logic, however, and is not general. Hadoop 0.19.0 therefore provides a general-purpose total-order partitioner.

TotalOrderPartitioner was originally written for Hadoop's TeraSort. Probably because of its generality, it was later shipped as a feature of release 0.19.0.

The purpose of a partitioner is to decide which reducer processes each record output by the map phase. It must satisfy two requirements:

1. Even distribution. The number of records processed by each reducer should be as equal as possible.

2. Efficiency. Every record passes through the partitioner during a MapReduce job, so its efficiency is crucial and it needs an efficient algorithm.

Obtaining the data distribution

For the first requirement, TotalOrderPartitioner does not know the key distribution in advance, so the distribution must be estimated from a small sample of the data and a partition model built from it.

InputSampler in 0.19.0 does this. Given the number of reducers, it reads part of the input data as a sample, sorts the sample, and divides it into as many equal parts as there are reducers to obtain the key interval handled by each reducer. For example, suppose the sample contains nine records whose sorted keys are:

A B C D E F G H I

If the number of reducers is 3, the interval for each reducer is

Reducer0 [A, B, C]
Reducer1 [D, E, F]
Reducer2 [G, H, I]

The boundaries between intervals are called cut points. For the three reducers above, the cut points are D and G. InputSampler writes the sorted cut points to an HDFS file, which thus captures the distribution of the input data.
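The cut-point selection in this example can be sketched as follows. It is a minimal illustration, not InputSampler's code: the class name CutPointSketch is hypothetical, the keys are plain strings, and the handling of duplicate samples is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CutPointSketch {
    // Sorts the samples and picks (numReducers - 1) evenly spaced cut points.
    static List<String> cutPoints(String[] samples, int numReducers) {
        if (numReducers < 1 || samples.length < numReducers) {
            throw new IllegalArgumentException("need at least one sample per reducer");
        }
        String[] sorted = samples.clone();
        Arrays.sort(sorted);
        List<String> cuts = new ArrayList<>();
        float step = sorted.length / (float) numReducers;
        for (int i = 1; i < numReducers; i++) {
            // The i-th cut point is the first key of the i-th interval.
            cuts.add(sorted[Math.round(step * i)]);
        }
        return cuts;
    }

    public static void main(String[] args) {
        String[] samples = {"G", "B", "I", "D", "A", "F", "C", "H", "E"};
        System.out.println(cutPoints(samples, 3)); // prints [D, G]
    }
}
```

With nine samples and three reducers the step is 3, so the keys at sorted positions 3 and 6 (D and G) become the cut points, matching the intervals listed above.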

Building an efficient partition model from the distribution

For the second requirement (efficiency), after reading the data distribution file, TotalOrderPartitioner checks whether the key type is BinaryComparable.

BinaryComparable means "byte comparable". org.apache.hadoop.io.Text is such a type, because two Text objects can be compared byte by byte: as soon as a pair of corresponding bytes differs, the order of the two Text objects is known.

If the key is not BinaryComparable, TotalOrderPartitioner uses binary search to determine which interval the key falls in, and therefore which reducer it belongs to. Each lookup takes O(log R) time, where R is the number of reducers.

If the key is BinaryComparable, TotalOrderPartitioner builds a trie from the cut points. A trie is a more efficient lookup structure that suits string-like keys. The default trie depth in TotalOrderPartitioner is 2, and the trie is constructed using 2 + 1 prefixes. Each parent node has 256 child nodes, one per possible byte value. The lookup time is O(M), where M is the depth of the tree, and the space is on the order of O(256^M). This is a space-for-time trade. With a tree depth of 2, up to 256 x 256 reducers can be distinguished, which is sufficient in most cases.

Partitioning with a trie is more efficient than with binary search. For a single lookup the two methods may differ little, but when processing hundreds of millions of records the gap becomes noticeable.

Appendix: an introductory article on the trie: http://hi.baidu.com/ecchi/blog/item/84bcdc3ff832a5c37d1e71bf.html
