MapReduce Global Ordering

1. The 1TB (or 1-minute) sort championship
For a distributed data-processing framework, how fast can a cluster actually process data? Sorting 1TB of data is one possible yardstick.

Sorting 1TB means sorting 1TB of data (10^12 bytes, about 10 billion rows). In 2008, Hadoop took first place in the 1TB sort benchmark, sorting 1TB in 209 seconds. The 1TB sort was later replaced by the 1-minute sort, which measures how much data can be sorted within one minute. In 2009, a 1406-node Hadoop cluster sorted 500GB in 59 seconds, and a 1460-node cluster sorted 1TB in only 62 seconds.
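A quick back-of-the-envelope calculation puts these figures in perspective. It assumes the sort-benchmark convention that 1TB means $10^{12}$ bytes and 500GB means $5\times10^{11}$ bytes, and it counts only the bytes being sorted, not the extra disk and network traffic that the shuffle generates:

```latex
\begin{aligned}
\text{2008, 1TB in 209 s:}\quad & \frac{10^{12}\,\text{B}}{209\,\text{s}} \approx 4.8\,\text{GB/s} \\
\text{2009, 500GB in 59 s, 1406 nodes:}\quad & \frac{5\times10^{11}\,\text{B}}{59\,\text{s}} \approx 8.5\,\text{GB/s},
  \qquad \frac{8.5\,\text{GB/s}}{1406} \approx 6.0\,\text{MB/s per node} \\
\text{2009, 1TB in 62 s, 1460 nodes:}\quad & \frac{10^{12}\,\text{B}}{62\,\text{s}} \approx 16.1\,\text{GB/s},
  \qquad \frac{16.1\,\text{GB/s}}{1460} \approx 11\,\text{MB/s per node}
\end{aligned}
```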

Impressive data-processing power, isn't it?

Let's look at the sorting process.


2. The sorting process

1TB of data, 10 billion rows: what do these rows look like? Here are a few:
```
.t^#\|v$2\          0aaaaaaaaaabbbbbbbbbbccccccccccddddddddddeeeeeeeeeeffffffffffgggggggggghhhhhhhh
75@~? ' wduf          1iiiiiiiiiijjjjjjjjjjkkkkkkkkkkllllllllllmmmmmmmmmmnnnnnnnnnnoooooooooopppppppp
w[o| |:N&H,           2qqqqqqqqqqrrrrrrrrrrssssssssssttttttttttuuuuuuuuuuvvvvvvvvvvwwwwwwwwwwxxxxxxxx
^Eu) <n#kdP           3yyyyyyyyyyzzzzzzzzzzaaaaaaaaaabbbbbbbbbbccccccccccddddddddddeeeeeeeeeeffffffff
+l-$ $OE/zh           4gggggggggghhhhhhhhhhiiiiiiiiiijjjjjjjjjjkkkkkkkkkkllllllllllmmmmmmmmmmnnnnnnnn
LsS8) |. zld          5ooooooooooppppppppppqqqqqqqqqqrrrrrrrrrrssssssssssttttttttttuuuuuuuuuuvvvvvvvv
Le5awB. $SM           6wwwwwwwwwwxxxxxxxxxxyyyyyyyyyyzzzzzzzzzzaaaaaaaaaabbbbbbbbbbccccccccccdddddddd
q__[fwhkfg          7eeeeeeeeeeffffffffffgggggggggghhhhhhhhhhiiiiiiiiiijjjjjjjjjjkkkkkkkkkkllllllll
; l+!2rt~hd          8mmmmmmmmmmnnnnnnnnnnooooooooooppppppppppqqqqqqqqqqrrrrrrrrrrsssssssssstttttttt
M^*dDE;6^<           9uuuuuuuuuuvvvvvvvvvvwwwwwwwwwwxxxxxxxxxxyyyyyyyyyyzzzzzzzzzzaaaaaaaaaabbbbbbbb
```

Description: each line is one record made of two parts: a 10-character key followed by the value, which in the sample above is a 10-character right-justified row number plus 78 filler characters. With its line terminator, each record is 100 bytes.

Sort task: order the records by key.
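To make "order by key" concrete, here is a small in-memory sketch in Java. The class name SortByKey and the shortened records are made up for illustration. It compares only the first 10 bytes of each record, as unsigned values, and ignores the value entirely; in the real MapReduce job the framework's key comparator plays this role, not code like this:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortByKey {
    static final int KEY_LEN = 10; // every record starts with a 10-character key

    /** Returns the first KEY_LEN bytes of a record (zero-padded if the record is shorter). */
    static byte[] key(String record) {
        return Arrays.copyOf(record.getBytes(StandardCharsets.US_ASCII), KEY_LEN);
    }

    /** Orders records by key bytes compared as unsigned values; the value part is ignored. */
    static final Comparator<String> BY_KEY =
            (a, b) -> Arrays.compareUnsigned(key(a), key(b));

    public static void main(String[] args) {
        // Shortened, hypothetical versions of three rows from the sample (values truncated)
        List<String> records = new ArrayList<>(List.of(
                "w[o| |:N&H  2qqqqqqqqqq",
                ".t^#\\|v$2\\  0aaaaaaaaaa",
                "M^*dDE;6^<  9uuuuuuuuuu"));
        records.sort(BY_KEY); // List.sort is stable, so equal keys keep their input order
        records.forEach(System.out::println);
    }
}
```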

Where does the 1TB of data come from? It is generated by a program: a MapReduce job with only a map phase and no reduce first writes the 10 billion rows onto the cluster. The sorting MapReduce job is then run on that data to measure the cluster's sorting performance.
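Because the generator job is map-only, each map task simply writes its own slice of the row numbers, with no shuffle and no reduce. The Java sketch below imitates that in a single process. RowGenerator, rangeFor and makeRow are hypothetical names, this is not Hadoop's actual generator, and the row layout (random printable key, right-justified row number, filler letters that move 8 places along the alphabet from one row to the next) is inferred from the sample rows above:

```java
import java.util.Random;

public class RowGenerator {
    static final int KEY_LEN = 10;    // random printable key
    static final int ROWID_LEN = 10;  // row number, right-justified
    static final int FILLER_LEN = 78; // 7 runs of 10 identical letters plus 8 more

    /** Row-number range {start, end} (end exclusive) written by map task 'task' of 'numMaps'. */
    static long[] rangeFor(long totalRows, int numMaps, int task) {
        return new long[] {totalRows * task / numMaps, totalRows * (task + 1) / numMaps};
    }

    /** One record without its line terminator: key + row number + filler. */
    static String makeRow(long rowId, Random rnd) {
        StringBuilder sb = new StringBuilder(KEY_LEN + ROWID_LEN + FILLER_LEN);
        for (int i = 0; i < KEY_LEN; i++) {
            sb.append((char) (' ' + rnd.nextInt('~' - ' ' + 1))); // ' ' (32) .. '~' (126)
        }
        sb.append(String.format("%" + ROWID_LEN + "d", rowId));
        // Each row continues the alphabet 8 letters after the previous row, as in the sample
        int start = (int) ((rowId * 8) % 26);
        for (int i = 0; i < FILLER_LEN; i++) {
            sb.append((char) ('a' + (start + i / 10) % 26));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long totalRows = 10; // 10 billion in the real benchmark
        int numMaps = 3;     // hypothetical number of map tasks
        for (int task = 0; task < numMaps; task++) {
            long[] range = rangeFor(totalRows, numMaps, task);
            Random rnd = new Random(task); // per-task seed, for illustration only
            for (long row = range[0]; row < range[1]; row++) {
                System.out.println(makeRow(row, rnd));
            }
        }
    }
}
```

Under these assumptions makeRow reproduces the filler pattern of the sample (row 0 starts with 'a', row 1 with 'i', row 3 wraps around to 'y'); only the keys differ, because they are random.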


3. The principle behind the sort

First, anyone familiar with MapReduce knows that sorting is built into the framework: MapReduce sorts records by key before they reach the reducer.

Therefore this sort job needs no special mapper or reducer classes; it uses the default IdentityMapper and IdentityReducer.

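If sorting comes built in, what makes the 1TB sort hard? The 10 billion rows are spread over more than 1,000 machines, and the mapper and reducer are both identity functions, so the difficulty lies in the shuffle phase. Each reducer sorts only the keys it receives, so for the reducers' outputs to form one globally ordered result, every key sent to reducer i must be no larger than any key sent to reducer i+1, and each reducer should receive a similar share of the data. The crux is therefore how to sample the input and how to write the Partitioner.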
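A small single-process simulation in Java shows why the partitioner is the crux; the class and method names are made up. Each reducer's input is sorted and the reducer outputs are concatenated in reducer order. A hash partitioner, written here with the same `(hashCode & Integer.MAX_VALUE) % reducers` formula as Hadoop's default HashPartitioner, balances load but generally leaves the concatenation out of order, whereas a range partitioner built from split points yields a globally sorted result:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class WhyPartitionerMatters {
    /** Routes keys to reducers, sorts each reducer's input, and concatenates outputs in reducer order. */
    static List<String> runSort(List<String> keys, int reducers, boolean useRange, List<String> splitPoints) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < reducers; i++) buckets.add(new ArrayList<>());
        for (String k : keys) {
            // with useRange, splitPoints must hold reducers - 1 sorted keys
            int p = useRange ? rangePartition(k, splitPoints) : hashPartition(k, reducers);
            buckets.get(p).add(k);
        }
        List<String> output = new ArrayList<>();
        for (List<String> bucket : buckets) {
            Collections.sort(bucket); // the framework sorts each reducer's input by key
            output.addAll(bucket);    // identity reducer: its output is its sorted input
        }
        return output;
    }

    /** Hash partitioning: spreads keys across reducers but ignores their order. */
    static int hashPartition(String key, int reducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % reducers;
    }

    /** Range partitioning: reducer i gets keys <= splitPoints[i]; larger keys go to the last reducer. */
    static int rangePartition(String key, List<String> splitPoints) {
        for (int i = 0; i < splitPoints.size(); i++) {
            if (key.compareTo(splitPoints.get(i)) <= 0) return i;
        }
        return splitPoints.size();
    }

    static boolean isSorted(List<String> list) {
        for (int i = 1; i < list.size(); i++) {
            if (list.get(i - 1).compareTo(list.get(i)) > 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("pear", "apple", "kiwi", "fig", "banana", "cherry", "grape", "lemon");
        List<String> splitPoints = List.of("cherry", "kiwi"); // 2 split points -> 3 reducers
        List<String> hashed = runSort(keys, 3, false, splitPoints);
        List<String> ranged = runSort(keys, 3, true, splitPoints);
        System.out.println("hash:  " + hashed + " globally sorted? " + isSorted(hashed));
        System.out.println("range: " + ranged + " globally sorted? " + isSorted(ranged));
    }
}
```

The split points "cherry" and "kiwi" are hand-picked here; in the real job they come from sampling the input, which is what the next section describes.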

Fortunately, the source code for this sort ships with the Hadoop examples (TeraSort), so let's analyze it below.


4. The sampling and partitioning process

Faced with this much data, the job first "samples" the input so that it can be partitioned evenly:

1) Sample Math.min(10, splits.length) input splits (shards), taking 10,000 records from each, for a total of 100,000 samples.
2) Sort the 100,000 samples and, given the number of reducers n, take n-1 samples at evenly spaced positions.
3) Write these n-1 samples to the partition file (_partition.lst, a SequenceFile), with the sample as the key and NullWritable as the value.
4) Add the partition file to the DistributedCache.
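Steps 1 and 2 can be sketched in Java as follows. This is a hypothetical, in-memory version: input splits are plain lists of keys, it takes the first records of evenly spaced splits rather than sampling at random, and steps 3 and 4 (writing _partition.lst as a SequenceFile and shipping it through the DistributedCache) are left out because they need a Hadoop cluster. The method name chooseSplitPoints is made up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SplitPointSampler {
    /**
     * Takes up to samplesPerSplit keys from each of at most maxSplits input splits,
     * sorts the samples, and returns numReducers - 1 evenly spaced samples as split points.
     */
    static List<String> chooseSplitPoints(List<List<String>> splits, int maxSplits,
                                          int samplesPerSplit, int numReducers) {
        if (splits.isEmpty()) throw new IllegalArgumentException("no input splits");
        int sampled = Math.min(maxSplits, splits.size()); // e.g. Math.min(10, splits.length)
        int step = splits.size() / sampled;                // spread the sampled splits over the input
        List<String> samples = new ArrayList<>();
        for (int i = 0; i < sampled; i++) {
            List<String> split = splits.get(i * step);
            samples.addAll(split.subList(0, Math.min(samplesPerSplit, split.size())));
        }
        if (numReducers > samples.size()) {
            throw new IllegalArgumentException("more reducers than samples");
        }
        Collections.sort(samples);
        // n - 1 samples at evenly spaced positions in the sorted sample
        float stepSize = samples.size() / (float) numReducers;
        List<String> splitPoints = new ArrayList<>();
        for (int i = 1; i < numReducers; i++) {
            splitPoints.add(samples.get(Math.round(stepSize * i)));
        }
        return splitPoints;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        List<List<String>> splits = new ArrayList<>();
        for (int s = 0; s < 20; s++) { // 20 hypothetical input splits of random 6-digit keys
            List<String> split = new ArrayList<>();
            for (int r = 0; r < 1000; r++) split.add(String.format("%06d", rnd.nextInt(1_000_000)));
            splits.add(split);
        }
        // 10 splits x 100 samples each, 4 reducers -> 3 split points
        System.out.println(chooseSplitPoints(splits, 10, 100, 4));
    }
}
```

Picking samples at evenly spaced ranks means each reducer receives about 1/n of the sampled keys, and therefore about 1/n of the data if the sample is representative of the whole input.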

Next, the MapReduce job proper runs:
5) On each map node:
A. Build an "index tree", similar to a B-tree (strictly speaking, a trie over key bytes), from the n-1 samples:
* Each non-leaf node has 256 children, one per possible byte value.
* Below the root there is one layer of non-leaf nodes; counting the root and the leaves, the tree has 3 layers.
* Non-leaf nodes represent a key's "byte path".
* Each leaf node represents a 2-byte key prefix.
* Each leaf node stores a range of partition numbers; there are as many partition numbers as there are reducers.

B. Keys with the same 2-byte prefix are assigned to the same leaf node.
C. A leaf node may cover several reducers.
D. A key is assigned to reducer i, where i is the first sample that the key is smaller than; keys not smaller than any sample go to the last reducer.
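What a leaf stores can be computed directly. The Java sketch below (hypothetical names and samples) uses the corrected rule given in section 5, where a key equal to a sample goes to that sample's reducer. For one 2-byte prefix it finds lower and upper such that every key with that prefix lands on a reducer between lower and upper inclusive, which is why several reducers can share one leaf. It is a flat binary-search calculation of the leaf contents, not the tree itself:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LeafRanges {
    /** Converts sorted sample strings to byte arrays. */
    static byte[][] toBytes(String... keys) {
        byte[][] out = new byte[keys.length][];
        for (int i = 0; i < keys.length; i++) out[i] = keys[i].getBytes(StandardCharsets.US_ASCII);
        return out;
    }

    /** Number of samples strictly less than bound (unsigned byte order), by binary search. */
    static int countLess(byte[][] splits, byte[] bound) {
        int lo = 0, hi = splits.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(splits[mid], bound) < 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    /**
     * Reducer range {lower, upper} (inclusive) for keys starting with bytes b0, b1.
     * Samples below the prefix are smaller than every such key; samples at or above
     * the next prefix are larger than every such key.
     */
    static int[] leafRange(byte[][] splits, int b0, int b1) {
        int lower = countLess(splits, new byte[] {(byte) b0, (byte) b1});
        int upper;
        if (b1 < 255) upper = countLess(splits, new byte[] {(byte) b0, (byte) (b1 + 1)});
        else if (b0 < 255) upper = countLess(splits, new byte[] {(byte) (b0 + 1)});
        else upper = splits.length; // last possible prefix: up to the last reducer
        return new int[] {lower, upper};
    }

    public static void main(String[] args) {
        byte[][] splits = toBytes("ab", "acc", "ace", "b"); // 4 samples, so 5 reducers (0..4)
        System.out.println("prefix aa -> reducers " + Arrays.toString(leafRange(splits, 'a', 'a')));
        System.out.println("prefix ac -> reducers " + Arrays.toString(leafRange(splits, 'a', 'c')));
    }
}
```

With these samples, prefix "aa" maps only to reducer 0, while prefix "ac" spans reducers 1 to 3, so keys in that leaf still need to be compared against the samples "acc" and "ace".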

6) Partitioning a single key:

A. Use the 1st byte of the key to select a non-leaf node in the layer below the root.
B. Use the 2nd byte of the key to select the leaf node.
C. A leaf node may correspond to several samples (i.e. several reducers); compare the key against each of them in turn to decide which reducer it is assigned to.
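Putting steps 5 and 6 together, here is a compact Java sketch of the two-level index tree. It is modeled on the description above rather than copied from Hadoop, so TriePartitioner, Node, Leaf, Inner and buildTwoLevel are hypothetical names. Keys are compared as unsigned bytes, and the leaf scan uses the corrected rule from section 5 (the first sample greater than or equal to the key decides the reducer; keys above every sample go to the last reducer):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TriePartitioner {
    /** A node of the index tree: maps a key to a partition (reducer) number. */
    interface Node {
        int findPartition(byte[] key);
    }

    /** Leaf: owns samples [lower, upper) and scans them in order. */
    static final class Leaf implements Node {
        private final byte[][] splits;
        private final int lower, upper;

        Leaf(byte[][] splits, int lower, int upper) {
            this.splits = splits;
            this.lower = lower;
            this.upper = upper;
        }

        @Override
        public int findPartition(byte[] key) {
            for (int i = lower; i < upper; i++) {
                if (Arrays.compareUnsigned(splits[i], key) >= 0) return i; // key <= sample i
            }
            return upper; // key is larger than every sample in this leaf
        }
    }

    /** Non-leaf: 256 children, one for each value of the key byte at this level. */
    static final class Inner implements Node {
        private final int level;
        private final Node[] child = new Node[256];

        Inner(int level) {
            this.level = level;
        }

        @Override
        public int findPartition(byte[] key) {
            // a key that ends here sorts before every longer key with the same prefix
            if (key.length <= level) return child[0].findPartition(key);
            return child[key[level] & 0xff].findPartition(key);
        }
    }

    /** Builds the subtree for sorted samples [lower, upper) that share 'prefix'. */
    static Node build(byte[][] splits, int lower, int upper, byte[] prefix, int maxDepth) {
        int depth = prefix.length;
        if (depth >= maxDepth || lower == upper) return new Leaf(splits, lower, upper);
        Inner node = new Inner(depth);
        byte[] bound = Arrays.copyOf(prefix, depth + 1);
        int current = lower;
        for (int ch = 0; ch < 255; ch++) {
            int start = current;
            bound[depth] = (byte) (ch + 1);
            // remaining samples below prefix + (ch + 1) belong to child ch
            while (current < upper && Arrays.compareUnsigned(splits[current], bound) < 0) current++;
            node.child[ch] = build(splits, start, current, extend(prefix, ch), maxDepth);
        }
        node.child[255] = build(splits, current, upper, extend(prefix, 255), maxDepth);
        return node;
    }

    private static byte[] extend(byte[] prefix, int ch) {
        byte[] p = Arrays.copyOf(prefix, prefix.length + 1);
        p[prefix.length] = (byte) ch;
        return p;
    }

    /** Root + one inner layer + leaves (3 layers); leaves correspond to 2-byte prefixes. */
    static Node buildTwoLevel(String... sortedSamples) {
        byte[][] splits = new byte[sortedSamples.length][];
        for (int i = 0; i < splits.length; i++) {
            splits[i] = sortedSamples[i].getBytes(StandardCharsets.US_ASCII);
        }
        return build(splits, 0, splits.length, new byte[0], 2);
    }

    static int partition(Node root, String key) {
        return root.findPartition(key.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        Node root = buildTwoLevel("ab", "acc", "ace", "b"); // hypothetical samples, 5 reducers
        for (String key : new String[] {"aaa", "abz", "acc", "acd", "acf", "b", "c"}) {
            System.out.println(key + " -> reducer " + partition(root, key));
        }
    }
}
```

The first two bytes cost one array lookup each, so a key is compared only against the few samples that share its 2-byte prefix instead of against all n-1 samples. Turning a range into a leaf as soon as it is empty keeps the tree small when samples are sparse.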


5. The partition "index tree", illustrated

The description above may be hard to follow, so at the suggestion of reader etongg I drew a figure, explained below. Thanks to etongg and to everyone who has followed this post.

The index tree exists so that each key can find its reducer quickly. Below is a schematic of the index tree I drew:

[Figure: schematic of the partition index tree]




Some notes on the figure:
1. For simplicity, only three nodes, A, B and C, are drawn; in reality there are 256.
2. The figure assumes 20 reducers (numbered 0 to 19), so there are n-1 = 19 samples (numbered 0 to 18).
3. The circles are the nodes of the index tree, which has 3 layers in total.
4. The rectangles below the leaf nodes represent the sample array; the red numbers are the sample indices.
5. Each node corresponds to a range of indices in the sample array (more precisely, a range of partition numbers, each of which identifies one reducer). These ranges are marked in blue in the figure.


An earlier sentence reads:
A key is assigned to reducer i, where i is the first sample that the key is smaller than; keys not smaller than any sample go to the last reducer.

A small correction; it should read:
A key is assigned to reducer i, where i is the first sample that the key is less than or equal to; the remaining keys go to the last reducer.
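The corrected rule by itself does not need the tree: over sorted samples it is a lower-bound binary search, as in this Java sketch (PartitionRule and the sample values are hypothetical). The index tree gives the same answers while replacing most of those comparisons with two byte lookups:

```java
import java.util.List;

public class PartitionRule {
    /** Reducer for 'key': the first i with key <= samples[i], or samples.size() if there is none. */
    static int partition(List<String> samples, String key) {
        int lo = 0, hi = samples.size();
        while (lo < hi) { // lower-bound binary search over the sorted samples
            int mid = (lo + hi) >>> 1;
            if (samples.get(mid).compareTo(key) < 0) lo = mid + 1; // sample < key: look right
            else hi = mid;                                          // sample >= key: candidate
        }
        return lo;
    }

    public static void main(String[] args) {
        List<String> samples = List.of("ab", "acc", "ace", "b"); // hypothetical, 5 reducers
        for (String key : List.of("aaa", "acc", "acd", "zzz")) {
            System.out.println(key + " -> reducer " + partition(samples, key));
        }
    }
}
```

This prints reducers 0, 1, 2 and 4 for the four keys; a key equal to a sample ("acc") goes to that sample's reducer, as the corrected rule requires.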

Now let's partition some keys:
If a key starts with "AAA", it is assigned to reducer 0.
If a key starts with "ACA", it is assigned to reducer 4.
If a key starts with "ACD", it is assigned to reducer 4.
If a key starts with "ACF", it is assigned to reducer 5.
