Using a Heap to Find the K Largest Values, with a Discussion of Program Optimization (Part II)

Source: Internet
Author: User

With a correct regression test in place, we can move on. A first round of analysis with a profiler revealed a small tragedy: performance had actually regressed. After the unnecessary system calls were removed, we profiled the program again.

VII. Minor Improvements

Generating 100 million random numbers is time-consuming. The profile shows that rand() itself is not the expensive part; creatlistinternal is. From this we can infer that the modulo operation takes a lot of time, so the modulo is eliminated here. Using (1 + rand()) * (1 + rand()) to generate a random number of the form (1 to 65535) * (1 to 32768) yields values spread over the range 1 to 65535*32768, although not every value in that range can actually occur. Of course, this is only a crude generator, and it produces repeated elements. Enabling the compiler's optimization options also helps.
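A minimal sketch of a modulo-free generator in the spirit described above. The helper name fill_random_product and the long long buffer are assumptions for illustration; the article's own creatlistinternal is not shown. Each factor 1 + rand() lies in 1..RAND_MAX + 1, and RAND_MAX differs between C libraries (the C standard only guarantees at least 32767), so the exact range is platform-dependent. Using long long keeps the product from overflowing.

```c
#include <stdlib.h>

/* Hypothetical helper: fill buf[0..n-1] with products of two shifted
   rand() results. No % operator is used. Values lie in
   1 .. (RAND_MAX + 1)^2, but many values in that range are unreachable
   and duplicates are common, so this is only suitable as test data. */
void fill_random_product(long long *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        long long a = 1LL + rand();   /* 1 .. RAND_MAX + 1 */
        long long b = 1LL + rand();   /* 1 .. RAND_MAX + 1 */
        buf[i] = a * b;               /* computed in long long: no int overflow */
    }
}
```

Seed with srand() once before calling it if reproducible test data is wanted.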

VIII. Focus on the hotspot and reduce the number of comparisons

Profiling again with compiler optimization turned off and focusing on the hotspot, we find that almost all of the time in fastfindkthmax is spent in fastmaxheapify, so the task reduces to cutting the number of comparisons in fastmaxheapify. In the common case where a node has both a left and a right child, the original implementation always compares against heapsize twice, once per child. In fact, one comparison is enough: the left child index is smaller than the right one, so if the right child is within heapsize, the left child is too. With the corresponding change to the code, we gain some speed. The code is as follows:

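```c
/* Assumes temp holds the current node's value and curr_largest
   starts as the current node's index. */
if (rch <= heapsize) {
    /* Both children exist: since lch < rch, this one bound check
       covers both children. */
    if ((*(list+lch)) > temp) {
        curr_largest = lch;
    }
    if ((*(list+rch)) > (*(list+curr_largest))) {
        curr_largest = rch;
    }
}
else {
    /* At most a left child exists; it still needs its own bound check. */
    if (lch <= heapsize && (*(list+lch)) > temp) {
        curr_largest = lch;
    }
}
```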
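For reference, here is a self-contained sketch of a complete sift-down that uses the same single bound check, plus heap construction and extraction of the k largest values (the approach this article takes). It assumes int elements and a 1-based heap stored in list[1..heapsize] with children 2i and 2i+1; the function names are hypothetical and are not the author's fastmaxheapify or fastfindkthmax. It moves the sifted value into a "hole" instead of swapping at every level, so it compares against a cached largest_val rather than re-reading list[curr_largest].

```c
/* Sift list[i] down in a 1-based max-heap of size heapsize.
   A node with two children needs only one bound check (rch <= heapsize). */
void max_heapify_one_check(int *list, int heapsize, int i)
{
    int temp = list[i];                  /* value being sifted down */
    for (;;) {
        int lch = 2 * i, rch = lch + 1;
        int curr_largest = i;
        int largest_val = temp;          /* cached: list[i] is a hole */
        if (rch <= heapsize) {
            if (list[lch] > largest_val) { curr_largest = lch; largest_val = list[lch]; }
            if (list[rch] > largest_val) { curr_largest = rch; largest_val = list[rch]; }
        } else if (lch <= heapsize && list[lch] > temp) {
            curr_largest = lch;          /* only a left child exists */
        }
        if (curr_largest == i)
            break;                       /* temp belongs at position i */
        list[i] = list[curr_largest];    /* move the larger child up */
        i = curr_largest;
    }
    list[i] = temp;
}

/* Build a max-heap over list[1..heapsize] bottom-up. */
void build_max_heap(int *list, int heapsize)
{
    for (int i = heapsize / 2; i >= 1; i--)
        max_heapify_one_check(list, heapsize, i);
}

/* Write the k largest of list[1..n] into out[0..k-1], descending.
   Rearranges list in place; assumes 1 <= k <= n. */
void find_k_max(int *list, int n, int k, int *out)
{
    int heapsize = n;
    build_max_heap(list, heapsize);
    for (int j = 0; j < k; j++) {
        out[j] = list[1];                /* current maximum */
        list[1] = list[heapsize--];      /* move the last leaf to the root */
        max_heapify_one_check(list, heapsize, 1);
    }
}
```

For example, with list = {0, 3, 9, 1, 7, 5, 8, 2} (index 0 unused), find_k_max(list, 7, 3, out) leaves 9, 8, 7 in out. Building the heap costs O(n) and each extraction O(log n).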

IX. Impact of the CPU cache

In Part I, a fellow blogger pointed out that the CPU cache also has a significant impact. Thank you for the reminder! Since I know little about this area, I will leave this section blank for now.

X. Back to the algorithm: comparing approaches

To go faster still, we need a better algorithm. Is there one? The algorithm in this article is somewhat clumsy: it first allocates the n numbers, then builds a max-heap over all n of them, and then extracts the K largest numbers one by one. There are two other approaches:

1. Min-heap. First, take k of the n numbers and build a min-heap of k elements. Then, for i = k + 1 to n: if the i-th number is smaller than the root (the minimum) of the heap, ignore it; if it is larger, replace the root with it and restore the min-heap. Correctness follows from a loop invariant: (a) initially, the elements in the heap are no smaller than every discarded element, trivially, because nothing has been discarded yet; (b) after each rebuild of the min-heap, the elements in the heap remain no smaller than every element that has been ignored or replaced; (c) therefore, when the loop ends, the heap holds elements no smaller than all the others, that is, the k largest. Its running time is O(k + n log k).
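A minimal sketch of the min-heap approach, assuming int data, a 0-based heap, and hypothetical function names. The heap holds the k largest values seen so far with the smallest of them at heap[0], so each remaining number costs one comparison against the root and, only when it is larger, an O(log k) sift-down.

```c
#include <stddef.h>

/* Sift h[i] down in a 0-based min-heap of size k. */
static void min_sift_down(int *h, size_t k, size_t i)
{
    int temp = h[i];
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, smallest = i;
        int small_val = temp;
        if (r < k) {                     /* both children exist */
            if (h[l] < small_val) { smallest = l; small_val = h[l]; }
            if (h[r] < small_val) { smallest = r; small_val = h[r]; }
        } else if (l < k && h[l] < small_val) {
            smallest = l;                /* only a left child exists */
        }
        if (smallest == i)
            break;
        h[i] = h[smallest];              /* move the smaller child up */
        i = smallest;
    }
    h[i] = temp;
}

/* Leave the k largest of a[0..n-1] in heap[0..k-1] (heap order, not sorted).
   Assumes 1 <= k <= n. */
void top_k_min_heap(const int *a, size_t n, size_t k, int *heap)
{
    for (size_t i = 0; i < k; i++)
        heap[i] = a[i];
    for (size_t i = k / 2; i-- > 0; )    /* bottom-up build: i = k/2-1 .. 0 */
        min_sift_down(heap, k, i);
    for (size_t i = k; i < n; i++) {
        if (a[i] > heap[0]) {            /* beats the smallest kept value */
            heap[0] = a[i];              /* replace the root ... */
            min_sift_down(heap, k, 0);   /* ... and restore the min-heap */
        }
    }
}
```

With a = {3, 9, 1, 7, 5, 8, 2} and k = 3, heap[0] ends as 7, the smallest of the three largest values. Only k numbers are kept in memory, which matters when n is far larger than k.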

2. Divide and conquer. Divide and conquer is often an effective strategy. Split the n numbers into B groups of n/B numbers each. For each group, find its k largest numbers in descending order, at a cost of O(n/B + k log(n/B)) per group, or B * O(n/B + k log(n/B)) in total. Finally, select the overall k largest from the B sorted lists of k candidates (Bk numbers in all), which takes O(B + (k-1) log B). This algorithm pays off on multiprocessor and parallel machines, where the groups are processed simultaneously and the running time becomes O(n/B + k log(n/B) + B + (k-1) log B) + C(n), where C(n) is the communication cost. For processing large volumes of data, parallel algorithms are a field well worth studying.
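A sequential sketch of the divide-and-conquer idea, assuming int data and a hypothetical function name. Two simplifications are made on purpose: each group's k largest are found with qsort instead of a max-heap, and the final selection scans the B list heads linearly (O(kB)) instead of keeping a heap of heads as in the O(B + (k-1) log B) bound above. The per-group loop is where a parallel version would hand work to separate processors.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for descending order (avoids overflow of a - b). */
static int cmp_desc(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a < b) - (a > b);
}

/* Write the k largest of a[0..n-1] into out[0..k-1], descending, by
   splitting a into nblocks groups. Assumes nblocks >= 1. Returns the
   number of values written (fewer than k only if n < k or malloc fails). */
size_t top_k_divide(const int *a, size_t n, size_t nblocks, size_t k, int *out)
{
    size_t block_len = (n + nblocks - 1) / nblocks;
    int *cand = malloc(nblocks * k * sizeof *cand);  /* k slots per group */
    size_t *len = calloc(nblocks, sizeof *len);       /* candidates per group */
    size_t *pos = calloc(nblocks, sizeof *pos);       /* merge cursor per group */
    int *tmp = malloc((block_len ? block_len : 1) * sizeof *tmp);
    size_t written = 0;

    if (cand && len && pos && tmp) {
        /* Phase 1: groups are independent, so this loop could run in parallel. */
        for (size_t b = 0; b < nblocks && b * block_len < n; b++) {
            size_t lo = b * block_len;
            size_t m = (n - lo < block_len) ? n - lo : block_len;
            memcpy(tmp, a + lo, m * sizeof *tmp);
            qsort(tmp, m, sizeof *tmp, cmp_desc);   /* stand-in for a per-group heap */
            len[b] = (m < k) ? m : k;
            memcpy(cand + b * k, tmp, len[b] * sizeof *tmp);
        }
        /* Phase 2: merge the sorted candidate lists, taking the largest head. */
        while (written < k) {
            size_t best = nblocks;
            for (size_t b = 0; b < nblocks; b++)
                if (pos[b] < len[b] &&
                    (best == nblocks || cand[b * k + pos[b]] > cand[best * k + pos[best]]))
                    best = b;
            if (best == nblocks)
                break;                                /* fewer than k values in total */
            out[written++] = cand[best * k + pos[best]++];
        }
    }
    free(cand); free(len); free(pos); free(tmp);
    return written;
}
```

For example, top_k_divide on {3, 9, 1, 7, 5, 8, 2, 6} with 3 groups and k = 3 returns 3 and fills out with 9, 8, 7.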
