MapReduce Global Ordering

Last Update: 2018-07-26 Source: Internet

Author: User

1, The 1TB (or 1-minute) sort championship
As a framework for distributed data processing, how fast can a cluster actually process data? The 1TB sort is one possible measure.

1TB sorting means sorting 1TB of data (1024GB, about 10 billion rows). In 2008, Hadoop took first place in the 1TB sort benchmark, sorting 1TB in 209 seconds. The 1TB sort was later replaced by the minute sort, which measures how much data can be sorted within one minute. In 2009, a 1,406-node Hadoop cluster sorted 500GB in 59 seconds, and a 1,460-node cluster sorted 1TB in only 62 seconds.

Isn't that kind of data processing power impressive?

Let's take a look at the sorting process.

2, The sorting process

1TB of data is 10 billion rows. What do these rows look like? Here are a few:
    .t^#\|v$2\ 0aaaaaaaaaabbbbbbbbbbccccccccccddddddddddeeeeeeeeeeffffffffffgggggggggghhhhhhhh
    75@~? ' wduf 1iiiiiiiiiijjjjjjjjjjkkkkkkkkkkllllllllllmmmmmmmmmmnnnnnnnnnnoooooooooopppppppp
    w[o| |:N&H, 2qqqqqqqqqqrrrrrrrrrrssssssssssttttttttttuuuuuuuuuuvvvvvvvvvvwwwwwwwwwwxxxxxxxx
    ^Eu) <n#kdP 3yyyyyyyyyyzzzzzzzzzzaaaaaaaaaabbbbbbbbbbccccccccccddddddddddeeeeeeeeeeffffffff
    +l-$ $OE/zh 4gggggggggghhhhhhhhhhiiiiiiiiiijjjjjjjjjjkkkkkkkkkkllllllllllmmmmmmmmmmnnnnnnnn
    LsS8) |. zld 5ooooooooooppppppppppqqqqqqqqqqrrrrrrrrrrssssssssssttttttttttuuuuuuuuuuvvvvvvvv
    Le5awB. $SM 6wwwwwwwwwwxxxxxxxxxxyyyyyyyyyyzzzzzzzzzzaaaaaaaaaabbbbbbbbbbccccccccccdddddddd
    q__[fwhkfg 7eeeeeeeeeeffffffffffgggggggggghhhhhhhhhhiiiiiiiiiijjjjjjjjjjkkkkkkkkkkllllllll
    ; l+!2rt~hd 8mmmmmmmmmmnnnnnnnnnnooooooooooppppppppppqqqqqqqqqqrrrrrrrrrrsssssssssstttttttt
    M^*dDE;6^< 9uuuuuuuuuuvvvvvvvvvvwwwwwwwwwwxxxxxxxxxxyyyyyyyyyyzzzzzzzzzzaaaaaaaaaabbbbbbbb

Description: each line is one record. Each record has 2 parts: a 10-character key, followed by the value, which is a row number and 78 filler characters.

Sort task: order the rows by key.

So where does the 1TB of data come from? It is generated by a program: first, a MapReduce job with only a map phase and no reduce phase generates the 10 billion rows on the cluster. Then the sorting MapReduce job is run on this data to measure the cluster's sorting performance.
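For illustration, the sketch below builds rows like the ones shown above in a single process. It assumes the layout of the classic Hadoop TeraGen example: a 10-character key drawn from printable ASCII, a row number right-justified in 10 characters, 78 filler characters, and a `\r\n` line break, for 100 bytes per row. The filler letters follow the sample rows above. The class `RowGenerator` is hypothetical; in the real benchmark this logic runs inside the map tasks of the map-only job.

```java
import java.util.Random;

// Hypothetical single-process sketch of a TeraGen-style row generator.
// Assumed layout: 10-char key + 10-char right-justified row id + 78 filler chars + "\r\n".
class RowGenerator {
    static final int KEY_LEN = 10;
    static final int FILLER_LEN = 78;

    private final Random random;

    RowGenerator(long seed) {
        this.random = new Random(seed);
    }

    // Key: 10 random characters from the printable ASCII range ' ' (0x20) to '~' (0x7E).
    String randomKey() {
        StringBuilder sb = new StringBuilder(KEY_LEN);
        for (int i = 0; i < KEY_LEN; i++) {
            sb.append((char) (' ' + random.nextInt('~' - ' ' + 1)));
        }
        return sb.toString();
    }

    // Filler: seven runs of 10 identical letters plus a final run of 8 (78 chars);
    // the starting letter advances by 8 per row, as in the sample rows above.
    static String filler(long rowId) {
        StringBuilder sb = new StringBuilder(FILLER_LEN);
        for (int run = 0; run < 8; run++) {
            char c = (char) ('a' + (rowId * 8 + run) % 26);
            int len = (run < 7) ? 10 : 8;
            for (int k = 0; k < len; k++) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // One full row: key, row id right-justified in 10 characters, filler, line break.
    String row(long rowId) {
        return randomKey() + String.format("%10d", rowId) + filler(rowId) + "\r\n";
    }

    public static void main(String[] args) {
        RowGenerator gen = new RowGenerator(42L);
        for (long r = 0; r < 3; r++) {
            System.out.print(gen.row(r));
        }
    }
}
```

Under this layout, 10 billion rows of 100 bytes each come to 10^12 bytes, which is where the "1TB" figure comes from.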

3, The principle of sorting

Anyone familiar with MapReduce knows that sorting is built into it: the framework sorts the data by key before it reaches the reducer.

Therefore, this sort job needs no special mapper or reducer classes; it uses the default IdentityMapper and IdentityReducer.

Since sorting is built in, what makes the 1TB sort hard? The 10 billion rows are spread across more than 1,000 machines and both the mapper and the reducer are identity functions, so the difficulty lies in the shuffle phase of MapReduce. The framework only sorts the keys within each reducer's input. For the reducer outputs, taken in order, to be globally sorted, every key sent to reducer i must be no greater than every key sent to reducer i+1; for the job to be fast, each reducer should also receive a similar share of the data. The key questions are therefore how to sample the data and how to write the Partitioner.
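Why the partitioner is the crux can be shown with a small in-memory simulation (the class `PartitionDemo` and the toy keys are hypothetical). Each simulated reducer sorts only its own keys, which is all the shuffle guarantees, and the reducer outputs are then concatenated in reducer order. The hash partitioner uses the same formula as Hadoop's default `HashPartitioner`, applied here to Java `String` hash codes; the range partitioner routes each key by comparing it with sorted split points.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical in-memory simulation of the shuffle, comparing two partitioners.
class PartitionDemo {
    // Same formula as Hadoop's default HashPartitioner.
    static int hashPartition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Range partitioning: the first split point that is >= key decides the reducer;
    // keys above every split point go to the last reducer.
    static int rangePartition(String key, String[] splitPoints) {
        for (int i = 0; i < splitPoints.length; i++) {
            if (key.compareTo(splitPoints[i]) <= 0) {
                return i;
            }
        }
        return splitPoints.length;
    }

    // Route every key to a reducer, sort each reducer's input (what the shuffle guarantees),
    // then concatenate the reducer outputs in reducer order.
    static List<String> runJob(List<String> keys, int numReducers, ToIntFunction<String> partitioner) {
        List<List<String>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new ArrayList<>());
        }
        for (String k : keys) {
            reducers.get(partitioner.applyAsInt(k)).add(k);
        }
        List<String> output = new ArrayList<>();
        for (List<String> r : reducers) {
            Collections.sort(r);
            output.addAll(r);
        }
        return output;
    }

    static boolean isSorted(List<String> xs) {
        for (int i = 1; i < xs.size(); i++) {
            if (xs.get(i - 1).compareTo(xs.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("pear", "apple", "kiwi", "fig", "banana", "plum", "cherry", "grape");
        String[] splitPoints = {"d", "m"}; // 3 reducers
        List<String> byHash = runJob(keys, 3, k -> hashPartition(k, 3));
        List<String> byRange = runJob(keys, 3, k -> rangePartition(k, splitPoints));
        System.out.println("hash:  " + byHash + " sorted=" + isSorted(byHash));
        System.out.println("range: " + byRange + " sorted=" + isSorted(byRange));
    }
}
```

With the hash partitioner the concatenated output is in general not globally ordered, while range partitioning yields the keys from `apple` to `plum` in order. The range version is only fast if the split points divide the keys evenly, which is what sampling is for.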

Fortunately, the source code for this sort is included in the Hadoop examples (TeraSort), so let's analyze it below.

4. The process of sampling and partitioning

Faced with such a large amount of data, the job first "samples" it so that the partitions come out as even as possible:

1) Randomly pick Math.min(10, splits.length) splits (input shards) and take 10,000 samples from each, for a total of 100,000 samples.
2) Sort the 100,000 samples and, given the number of reducers n, take n-1 samples at even intervals.
3) Write these n-1 samples to the partition file (_partition.lst, a SequenceFile); the key is the sample and the value is nullValue (a NullWritable).
4) Add the partition file to the DistributedCache.
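Steps 1) and 2) can be sketched in plain Java as follows, assuming the input splits are already available in memory as lists of keys. `SplitPointSampler` and its methods are hypothetical. The sketch simply takes the first records of each chosen split, picks the split point at index round(i × samples / n) for i = 1 … n-1, and returns the split points instead of writing `_partition.lst` and adding it to the DistributedCache (steps 3 and 4).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of steps 1) and 2): sample the input, then pick n-1 split points.
class SplitPointSampler {
    // Step 1: choose up to maxSplits splits at random and take up to perSplit keys from each.
    // Assumption: it simply reads the first perSplit records of each chosen split.
    static List<String> sample(List<List<String>> splits, int maxSplits, int perSplit, Random rnd) {
        List<List<String>> shuffled = new ArrayList<>(splits);
        Collections.shuffle(shuffled, rnd);
        int numSplits = Math.min(maxSplits, shuffled.size());
        List<String> samples = new ArrayList<>();
        for (int s = 0; s < numSplits; s++) {
            List<String> split = shuffled.get(s);
            samples.addAll(split.subList(0, Math.min(perSplit, split.size())));
        }
        return samples;
    }

    // Step 2: sort the samples and keep n-1 evenly spaced ones as split points.
    static String[] splitPoints(List<String> samples, int numReducers) {
        if (samples.isEmpty() || numReducers < 1) {
            throw new IllegalArgumentException("need samples and at least one reducer");
        }
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        float step = sorted.size() / (float) numReducers;
        String[] points = new String[numReducers - 1];
        for (int i = 1; i < numReducers; i++) {
            // Clamp so that rounding can never step past the last sample.
            int idx = Math.min(sorted.size() - 1, Math.round(step * i));
            points[i - 1] = sorted.get(idx);
        }
        return points;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1L);
        List<List<String>> splits = new ArrayList<>();
        for (int s = 0; s < 20; s++) { // 20 hypothetical input splits of random 6-digit keys
            List<String> split = new ArrayList<>();
            for (int r = 0; r < 20_000; r++) {
                split.add(String.format("%06d", rnd.nextInt(1_000_000)));
            }
            splits.add(split);
        }
        List<String> samples = sample(splits, 10, 10_000, rnd); // 10 splits x 10,000 keys
        String[] points = splitPoints(samples, 20);               // 20 reducers -> 19 split points
        System.out.println(samples.size() + " samples, " + points.length + " split points");
        System.out.println(Arrays.toString(points));
    }
}
```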

Next, the MapReduce job itself runs:
5) Each map node:
A. Builds an "index tree", similar to a B-tree, from the n-1 samples:
* Each non-leaf node has 256 child nodes, one per possible byte value.
* Below the root there is 1 layer of non-leaf nodes; counting the root and the leaf layer, the tree has 3 layers in total.
* Non-leaf nodes represent a key's "byte path".
* Each leaf node represents the path formed by the first 2 bytes of a key.
* Each leaf node stores a range of partition numbers; there are as many partition numbers as there are reducers.

B. Keys with the same 2-byte prefix are assigned to the same leaf node.
C. A single leaf node may correspond to several reducers.
D. A key that is smaller than the i-th sample is assigned to the i-th reducer, and the remaining keys are assigned to the last reducer.

6) Partitioning a key:

A. Use the 1st byte of the key to find the non-leaf node in the first layer.
B. Use the 2nd byte of the key to find the leaf node.
C. Each leaf node may correspond to several samples (i.e., several reducers); compare the key with each of these samples to determine which reducer it goes to.
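The index tree and the lookup in steps 5) and 6) can be sketched as follows. This is a simplified illustration under stated assumptions, not the Hadoop source: `TwoLevelTriePartitioner` is a hypothetical name, keys and samples are byte arrays compared as unsigned bytes, a key shorter than 2 bytes is treated as if padded with zero bytes, and the boundary rule is "a key less than or equal to the i-th sample goes to reducer i", the wording the article settles on in section 5. The 256 × 256 arrays stand in for the tree: the first index is the root's choice of non-leaf node, the second picks the leaf, and each leaf holds its range of sample indices.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a two-level index-tree (trie) partitioner; not the Hadoop source.
// Missing first/second bytes (keys shorter than 2 bytes) are treated as 0.
class TwoLevelTriePartitioner {
    private final byte[][] splitPoints; // the n-1 samples, sorted ascending (unsigned bytes)
    // Root -> lower[b0] (non-leaf node for first byte b0) -> lower[b0][b1] (leaf).
    // Each leaf holds the half-open range [lower, upper) of sample indices with prefix (b0, b1).
    private final int[][] lower = new int[256][256];
    private final int[][] upper = new int[256][256];

    TwoLevelTriePartitioner(byte[][] sortedSplitPoints) {
        this.splitPoints = sortedSplitPoints;
        int idx = 0; // first sample whose 2-byte prefix is not yet assigned to a leaf
        for (int b0 = 0; b0 < 256; b0++) {
            for (int b1 = 0; b1 < 256; b1++) {
                int p = (b0 << 8) | b1;
                int end = idx;
                while (end < splitPoints.length && prefix(splitPoints[end]) == p) {
                    end++;
                }
                lower[b0][b1] = idx;
                upper[b0][b1] = end;
                idx = end;
            }
        }
    }

    static int byteAt(byte[] key, int i) {
        return i < key.length ? key[i] & 0xff : 0;
    }

    static int prefix(byte[] key) {
        return (byteAt(key, 0) << 8) | byteAt(key, 1);
    }

    // Lexicographic comparison of unsigned bytes; a proper prefix sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Steps A-C: 1st byte picks the non-leaf node, 2nd byte picks the leaf,
    // then the key is compared only with that leaf's samples.
    int findPartition(byte[] key) {
        int b0 = byteAt(key, 0);
        int b1 = byteAt(key, 1);
        int end = upper[b0][b1];
        for (int i = lower[b0][b1]; i < end; i++) {
            if (compareUnsigned(key, splitPoints[i]) <= 0) {
                return i; // key <= i-th sample -> reducer i
            }
        }
        return end; // above every sample in this leaf (last reducer if end == splitPoints.length)
    }

    // Brute-force reference with the same boundary rule, for checking the tree.
    static int linearPartition(byte[] key, byte[][] splitPoints) {
        for (int i = 0; i < splitPoints.length; i++) {
            if (compareUnsigned(key, splitPoints[i]) <= 0) {
                return i;
            }
        }
        return splitPoints.length;
    }

    public static void main(String[] args) {
        String[] samples = {"AAZ", "ACB", "ACE", "B", "BQx", "C"}; // hypothetical, sorted
        byte[][] points = new byte[samples.length][];
        for (int i = 0; i < samples.length; i++) {
            points[i] = samples[i].getBytes(StandardCharsets.US_ASCII);
        }
        TwoLevelTriePartitioner trie = new TwoLevelTriePartitioner(points);
        for (String key : new String[] {"ABC", "ACC", "BZ", "D"}) {
            System.out.println(key + " -> reducer " + trie.findPartition(key.getBytes(StandardCharsets.US_ASCII)));
        }
    }
}
```

Because a leaf only holds the samples that share its 2-byte prefix, a lookup compares the key against those samples rather than all n-1 of them; `linearPartition` returns the same answer by brute force.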

5. The partition "index tree", illustrated

The text description above may be hard to follow, so a reader, etongg, suggested that I draw a picture, which is where the following comes from. Thanks to etongg and to everyone for their interest in this post.

The "index tree" lets a key quickly find its reducer. The following figure is a schematic of the index tree I drew:

A few notes on the diagram:
1. For simplicity, I drew only three child nodes, A, B, and C; the real tree has 256.
2. The figure assumes 20 reducers (indices 0 to 19), so there are n-1 = 19 samples (indices 0 to 18).
3. The circles in the figure are the nodes of the index tree, which has 3 layers in total.
4. The rectangles below the leaf nodes represent the sample array; the red numbers are the sample indices.
5. Each node corresponds to a range of indices in the sample array (more precisely, a range of partition numbers, each of which represents a reducer). This range is marked in blue text in the figure.

The preceding text contains this sentence:
A key that is smaller than the i-th sample is assigned to the i-th reducer, and the remaining keys are assigned to the last reducer.

Here is a small correction; it should read:
A key that is less than or equal to the i-th sample is assigned to the i-th reducer, and the remaining keys are assigned to the last reducer.

Now let's partition some keys:
If a key starts with "AAA", it is assigned to reducer 0.
If a key starts with "ACA", it is assigned to reducer 4.
If a key starts with "ACD", it is assigned to reducer 4.
If a key starts with "ACF", it is assigned to reducer 5.

So

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
