Massive data processing: Hash map + hash_map statistics + heap/quick/merge sort

Source: Internet
Author: User
Problem: given massive log data, extract the IP that visited Baidu the most times in a single day. Since this is massive data processing, the data handed to us is necessarily huge. How do we get started? With exactly the pattern in the title: divide-and-conquer/hash mapping + hash_map statistics + heap/quick/merge sort. Plainly put: first map, then count, finally sort:
  1. Divide-and-conquer/hash mapping: the data is too large and memory is limited, so the only option is to split the large file into small files (assigning each record to a file by hashing it and taking the modulus), following the 16-character maxim: break the large into the small, divide and conquer, shrink the scale, and defeat the pieces one by one.
  2. hash_map statistics: once the large file has been split into small files, we can use an ordinary hash_map(ip, count) to tally the frequency of each IP within each small file.
  3. Heap/quick sort: after the counting is done, sort (e.g., with a heap) to find the IP with the highest count.
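The three steps above can be sketched in Python. This is a minimal illustration, not the original author's code: the function name `top_ip` is invented, and in-memory lists stand in for the small files that would live on disk in a real massive-data setting.

```python
import heapq
from collections import Counter

def top_ip(ip_lines, num_buckets=16):
    """Find the most frequent IP using map -> count -> sort."""
    # Step 1: divide-and-conquer / hash mapping -- partition the IPs
    # into num_buckets "small files" by hash(ip) % num_buckets, so each
    # bucket is small enough to fit in memory. (Lists stand in for
    # on-disk files here.)
    buckets = [[] for _ in range(num_buckets)]
    for ip in ip_lines:
        buckets[hash(ip) % num_buckets].append(ip)

    # Step 2: hash_map statistics -- count frequencies within each
    # bucket. Because all occurrences of a given IP hash to the same
    # bucket, each per-bucket count is that IP's exact global count.
    candidates = []
    for bucket in buckets:
        counts = Counter(bucket)
        if counts:
            ip, freq = counts.most_common(1)[0]
            candidates.append((freq, ip))

    # Step 3: heap selection over the per-bucket winners to get the
    # single most frequent IP overall.
    freq, ip = heapq.nlargest(1, candidates)[0]
    return ip
```

For example, `top_ip(["10.0.0.1"] * 5 + ["10.0.0.2"] * 2)` returns `"10.0.0.1"`. The key invariant is that hashing on the IP itself guarantees every occurrence of one IP lands in the same small file, so the per-file counts need no merging.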
