Classic algorithm applications (7): taking the 100 largest values from 1 billion data items

Source: Internet
Author: User

Here are three approaches, for reference only.
1. Idea one: Borrow the partitioning step of quicksort. After each partition, consider only the part that is larger than the pivot. Once that larger part holds only slightly more than 100 elements, sort it with a conventional sorting algorithm and take the first 100.

Step1: Partition all the data into two intervals, [a, b] and (b, d], such that every number in (b, d] is greater than every number in [a, b].

Step2: Repeat the Step1 operation on (b, d] until the rightmost interval holds fewer than 100 numbers. Note that the [a, b] interval is not divided further.

Step3: Go back to the previous interval and record how many numbers the rightmost interval holds. Then partition the left part of the previous interval in the same way into [a2, b2] and (b2, d2], and take the (b2, d2] interval. If the total is still fewer than 100 numbers, continue the Step3 operation; if it is more than 100, repeat the Step1 operation, until exactly 100 numbers remain on the right.

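Complexity: the source gives O(1 billion * 100). With randomly chosen pivots, partition-based selection of this kind runs in expected O(n) time, i.e. about O(1 billion), and degrades to O(n^2) only in the worst case.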
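The partitioning idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `top_k_by_partition` is made up for this example, the data is assumed to fit in memory as a list (which 1 billion items may not), and the pivot is chosen at random rather than by any rule from the original text. For simplicity it keeps partitioning until exactly k values are collected instead of switching to a conventional sort near 100.

```python
import random


def top_k_by_partition(data, k):
    """Return the k largest values of data, in descending order.

    Hypothetical helper: keeps only the side of each partition that can
    still contain top-k values, as in Step1 to Step3 above.
    """
    items = list(data)
    if k <= 0:
        return []
    if k >= len(items):
        return sorted(items, reverse=True)

    result = []          # values already known to belong to the top k
    candidates = items   # interval still being partitioned
    need = k             # how many more values must be collected

    while True:
        pivot = random.choice(candidates)
        larger = [x for x in candidates if x > pivot]
        equal = [x for x in candidates if x == pivot]
        smaller = [x for x in candidates if x < pivot]

        if len(larger) > need:
            # Too many on the right: partition the right interval again.
            candidates = larger
        elif len(larger) + len(equal) >= need:
            # The right interval plus some pivot copies is exactly enough.
            result.extend(larger)
            result.extend(equal[:need - len(larger)])
            break
        else:
            # Not enough on the right: keep it all, then go back and
            # partition the left interval for the remaining values.
            result.extend(larger)
            result.extend(equal)
            need -= len(larger) + len(equal)
            candidates = smaller

    return sorted(result, reverse=True)
```

For example, `top_k_by_partition(values, 100)` returns the 100 largest entries of `values`. Each pass discards one side of the partition, which is why the expected work is linear in the input size.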

2. Idea two: Take the first 100 numbers and build a min-heap of size 100 from them, then traverse the remaining elements, maintaining the min-heap along the way.
The steps are as follows:

Step1: Take the first m elements (e.g. m = 100) and build a min-heap. One heap-maintenance (sift-down) step runs in O(lg m), so building the min-heap takes m * O(lg m) = O(m lg m).

Step2: Read the remaining elements one by one until the end. For each element read, if it is smaller than the top of the heap, discard it; if it is greater than the top of the heap, replace the top element with it and then restore the min-heap property. In the worst case the smallest element of the heap has to be replaced every time, so the cost of maintaining the heap is (n - m) * O(lg m). When the scan ends, the elements left in the heap are the 100 largest. The overall time complexity is O(n lg m).

The complexity is O(1 billion * lg 100).
* Note: This is the recommended algorithm.
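A minimal Python sketch of the min-heap approach follows, using the standard library `heapq` module. The function name `top_k_with_min_heap` and the parameter names are assumptions made for this example; the input is treated as any iterable, so it can be a stream that never sits in memory all at once.

```python
import heapq
from itertools import islice


def top_k_with_min_heap(stream, k):
    """Return the k largest values from stream, in descending order.

    Hypothetical helper: holds at most k values at a time in a min-heap.
    """
    if k <= 0:
        return []
    it = iter(stream)

    # Step1: build a min-heap from the first k elements.
    heap = list(islice(it, k))
    heapq.heapify(heap)

    # Step2: scan the rest; heap[0] is the smallest value kept so far.
    for x in it:
        if x > heap[0]:
            # Pop the smallest value and push x in one O(lg k) operation.
            heapq.heapreplace(heap, x)

    return sorted(heap, reverse=True)
```

Because only k values are held at any time, memory use is independent of the size of the input. The standard library also offers `heapq.nlargest(k, stream)`, which returns the same values for this use case.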

3. Idea three: Use the local elimination method.
The steps are as follows:

Step1: Select the first 100 elements, sort them, and call the resulting sequence L.

Step2: Scan the remaining elements one at a time. Compare each element x with the smallest of the 100 sorted elements; if x is larger, remove that smallest element and insert x into the sequence L at its sorted position, as in insertion sort. Repeat until all the elements have been scanned.

Complexity: O(1 billion * 100).
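The local elimination steps can be sketched as below. The function name `top_k_by_elimination` is an assumption for this example, and the sequence L is kept as an ascending Python list so that its smallest element is always at index 0. `bisect.insort` locates the insertion point by binary search, but shifting the list elements still costs O(k) per insertion, which matches the O(n * k) bound given above.

```python
import bisect
from itertools import islice


def top_k_by_elimination(stream, k):
    """Return the k largest values from stream, in descending order.

    Hypothetical helper: maintains a sorted list L of the best k so far.
    """
    if k <= 0:
        return []
    it = iter(stream)

    # Step1: sort the first k elements (ascending, smallest at index 0).
    L = sorted(islice(it, k))

    # Step2: each larger element evicts the current smallest one.
    for x in it:
        if x > L[0]:
            del L[0]              # remove the smallest kept element
            bisect.insort(L, x)   # insert x at its sorted position

    return L[::-1]
```

Compared with the min-heap version, the only difference is the data structure holding the 100 candidates: a sorted list costs up to O(k) per replacement, while a heap costs O(lg k).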
