This article describes how to sort keys globally with Hadoop MapReduce.
1. Partition
The partitioner distributes map output across multiple reducers. Running several reducers in parallel is what lets the job benefit from a distributed system.
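To make the partition step concrete, here is a minimal sketch in plain Java of hash-based assignment. The formula mirrors the one used by Hadoop's default HashPartitioner; the class name HashPartitionSketch, the helper assign, and the sample keys are made up for illustration and are not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; it does not depend on Hadoop.
class HashPartitionSketch {
    // Mask the sign bit so the result is never negative, then take the remainder.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Send every key to its reducer, then sort inside each reducer,
    // imitating the per-reducer ordering that MapReduce provides.
    static List<List<String>> assign(List<String> keys, int numReducers) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            parts.add(new ArrayList<>());
        }
        for (String k : keys) {
            parts.get(partitionFor(k, numReducers)).add(k);
        }
        for (List<String> p : parts) {
            Collections.sort(p);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("pear", "apple", "fig", "kiwi", "banana", "cherry");
        List<List<String>> parts = assign(keys, 2);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("reducer " + i + ": " + parts.get(i));
        }
    }
}
```

Each reducer's list comes out sorted, but hashing scatters neighbouring keys across reducers, so the lists cannot simply be concatenated into one sorted result.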
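2. Idea
Each partition is already sorted internally. If the partitions themselves are also ordered, so that every key in one partition is no greater than every key in the next, then concatenating the partition outputs in order yields a globally sorted result.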
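The idea can be sketched in plain Java as range partitioning: given sorted split points, a binary search picks the partition for each key, each partition is sorted on its own, and the partitions are concatenated in order. The class RangePartitionSketch, its methods, and the split points "c" and "k" are assumptions chosen for illustration, not Hadoop code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; it does not depend on Hadoop.
class RangePartitionSketch {
    // splitPoints must be sorted. Partition 0 holds keys below splitPoints[0];
    // partition i holds keys k with splitPoints[i-1] <= k < splitPoints[i].
    static int partitionFor(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found: the key equals a split point and opens the next partition.
        // Not found: binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    // Partition by range, sort each partition, then concatenate in partition order.
    static List<String> sortAll(List<String> keys, String[] splitPoints) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i <= splitPoints.length; i++) {
            parts.add(new ArrayList<>());
        }
        for (String k : keys) {
            parts.get(partitionFor(k, splitPoints)).add(k);
        }
        List<String> result = new ArrayList<>();
        for (List<String> p : parts) {
            Collections.sort(p); // stands in for the per-reducer sort
            result.addAll(p);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("pear", "apple", "fig", "kiwi", "banana", "cherry");
        String[] splitPoints = {"c", "k"};
        // prints [apple, banana, cherry, fig, kiwi, pear]
        System.out.println(sortAll(keys, splitPoints));
    }
}
```

No merge step is needed at the end: because the partitions do not overlap, plain concatenation preserves the order.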
3. Problem
Given this idea, the remaining problem is how to choose the partition boundaries.
Solution: Hadoop provides a sampler that estimates the boundaries from a sample of the input, so that the data is spread across the partitions as evenly as possible.
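In Hadoop itself this role is typically played by InputSampler together with TotalOrderPartitioner (both in org.apache.hadoop.mapreduce.lib.partition). The sketch below does not call them; it only illustrates the underlying idea in plain Java, under the assumption that keys are integers: draw a random sample, sort it, and take evenly spaced sample values as split points. The class SampledSplitPointsSketch, its methods, the sample size, and the seed are all hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration class; it does not depend on Hadoop.
class SampledSplitPointsSketch {
    // Estimate numPartitions - 1 boundaries from a random sample of the data.
    static int[] chooseSplitPoints(int[] data, int sampleSize, int numPartitions, long seed) {
        if (sampleSize < numPartitions) {
            throw new IllegalArgumentException("sample must have at least numPartitions elements");
        }
        Random rnd = new Random(seed);
        int[] sample = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            sample[i] = data[rnd.nextInt(data.length)];
        }
        Arrays.sort(sample);
        // Pick evenly spaced quantiles of the sorted sample as boundaries.
        int[] splits = new int[numPartitions - 1];
        float step = sample.length / (float) numPartitions;
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sample[Math.round(step * i)];
        }
        return splits;
    }

    // Count how many keys fall into each range defined by the split points.
    static int[] countPerPartition(int[] data, int[] splits) {
        int[] counts = new int[splits.length + 1];
        for (int key : data) {
            int pos = Arrays.binarySearch(splits, key);
            counts[pos >= 0 ? pos + 1 : -pos - 1]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] data = new int[10000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (int) ((i * 7919L) % 10007); // deterministic, scrambled test keys
        }
        int[] splits = chooseSplitPoints(data, 1000, 4, 42L);
        System.out.println("split points: " + Arrays.toString(splits));
        System.out.println("keys per partition: " + Arrays.toString(countPerPartition(data, splits)));
    }
}
```

A larger sample gives boundaries closer to the true quantiles, at the cost of reading more input before the job starts; the boundaries are estimates, so partition sizes are only approximately equal.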