Requirements: Given 6,428,632 raw records, count the users registered under each mailbox provider (email domain) and sort the providers by number of users, from largest to smallest.
Analysis: Hadoop sorts MapReduce output by key by default. To order the results by value instead, a second sort (a second job) is needed.
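The idea behind the second sort can be sketched in plain Java, without any Hadoop dependency. This is only an illustration, not the original job code: a TreeMap stands in for Hadoop's key-sorted output, the class name SortByValueSketch and the sample domains are made up, and the counts are not taken from the real data set. The trick shown is to make the count the key and reverse the key ordering, so the framework's ordinary key sort yields a descending order by count.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the second sort; not Hadoop API code.
class SortByValueSketch {

    // Swap (domain, count) into (count, domains) and order the keys from
    // largest to smallest, mimicking a key sort with a reversed comparator.
    static TreeMap<Integer, List<String>> swapAndSort(Map<String, Integer> counts) {
        TreeMap<Integer, List<String>> byCount =
                new TreeMap<>(Comparator.<Integer>reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // Domains that share a count are collected under the same key.
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return byCount;
    }

    public static void main(String[] args) {
        // Made-up sample; a TreeMap keeps it sorted by key (domain name),
        // which says nothing about the order of the counts.
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("a.example", 2);
        counts.put("b.example", 5);
        counts.put("c.example", 2);

        System.out.println("sorted by key:   " + counts);
        System.out.println("sorted by value: " + swapAndSort(counts));
    }
}
```

In a real Hadoop job the same effect is usually obtained by emitting the count as the map output key in the second job and configuring a descending sort comparator for it.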
Steps:
1. JOB1: Count the number of users for each registered mailbox. The output is sorted by key by default and stored in HDFS.
2. JOB2: Sort the output of JOB1 a second time, this time by value, from largest to smallest.
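The two steps above can be sketched end to end in plain Java. This is a local simulation under stated assumptions, not the original MapReduce code: it assumes each raw record is a single email address (the real record layout is not given here), and the names MailboxCountPipeline, domainOf, job1, and job2 are hypothetical. job1 plays the role of JOB1 (count per domain, key-sorted output) and job2 plays the role of JOB2 (re-sort by count, descending).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the two jobs; no Hadoop classes involved.
class MailboxCountPipeline {

    // "Map" step of JOB1: take the part after the last '@' as the domain.
    // Returns null for records without a usable domain (assumed to be skipped).
    static String domainOf(String email) {
        int at = email.lastIndexOf('@');
        if (at < 0) {
            return null;
        }
        String domain = email.substring(at + 1).trim().toLowerCase(Locale.ROOT);
        return domain.isEmpty() ? null : domain;
    }

    // JOB1: count users per domain. The TreeMap keeps the result sorted by
    // key, like the default ordering of Hadoop's reduce output.
    static TreeMap<String, Integer> job1(List<String> emails) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String email : emails) {
            String domain = domainOf(email);
            if (domain != null) {
                counts.merge(domain, 1, Integer::sum);
            }
        }
        return counts;
    }

    // JOB2: re-sort JOB1's output by count, largest first, and format each
    // record as "count<TAB>domain". The sort is stable, so ties keep key order.
    static List<String> job2(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> records = new ArrayList<>(counts.entrySet());
        records.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> r : records) {
            lines.add(r.getValue() + "\t" + r.getKey());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up input records, for illustration only.
        List<String> raw = List.of("u1@b.example", "u2@a.example", "u3@b.example");
        job2(job1(raw)).forEach(System.out::println);
    }
}
```

On a cluster the intermediate result would be written to HDFS by the first job and read back by the second; here it is simply passed in memory.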
Result output:
A total of 24 mailboxes have more than 10,000 users.