Python solutions to the Top-K (K largest numbers) problem: a summary
The Top-K problem is to find the K largest numbers in a collection. It comes up often in practice, for example when finding the top 10 keywords among 10 million search records.
Method 1: sorting
Sort the array in descending order, then take the first k numbers.
Time complexity: O(n * log n) + O(k) = O(n * log n).
This method is simple but brute-force: it sorts the whole array even though only k numbers are needed.
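As a minimal sketch of this approach (the function name top_k_by_sorting is only illustrative and does not come from the original), sort in descending order and slice off the first k items:

```python
def top_k_by_sorting(numbers, k):
    """Return the k largest numbers, largest first."""
    if k <= 0:
        return []
    # sorted() returns a new list, so the caller's data is left untouched.
    return sorted(numbers, reverse=True)[:k]


print(top_k_by_sorting([1, 9, 2, 4, 7, 6, 3], 3))  # [9, 7, 6]
```

If k is larger than the input, the slice simply returns everything in descending order.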
Method 2: Max heap
We can create a container of size K that holds the K smallest numbers seen so far, traverse the entire array, and compare each number with the largest number in the container. If the number is greater than or equal to that maximum, skip it; otherwise, replace the maximum in the container with this number. The natural choice of container is a max heap, because its largest element is always at the root. But how do we implement a max heap in Python? The heapq library only implements a min heap. However, negating every number reverses the ordering, so the largest number becomes the smallest. We can therefore push the negated values onto a heapq min heap and negate them again when returning the result. Each heap operation costs O(log k), so the total time is O(n * log k) with O(k) extra memory. Note that this finds the K smallest numbers; to find the K largest, apply the same idea with a min heap of the original, un-negated values. The code is as follows:
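import heapq


def get_least_numbers_big_data(alist, k):
    """Return the k smallest numbers in alist, in no particular order."""
    if not alist or k <= 0 or k > len(alist):
        return []
    # Values are stored negated, so heapq's min heap behaves like a max heap.
    max_heap = []
    for ele in alist:
        ele = -ele
        if len(max_heap) < k:
            heapq.heappush(max_heap, ele)
        else:
            # Push the new value, then pop the largest original value.
            heapq.heappushpop(max_heap, ele)
    return [-x for x in max_heap]


if __name__ == "__main__":
    l = [1, 9, 2, 4, 7, 6, 3]
    min_k = get_least_numbers_big_data(l, 3)
    print(sorted(min_k))  # [1, 2, 3]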
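For the K largest numbers the roles are mirrored and no negation is needed, because heapq is already a min heap: keep the K largest values seen so far and evict the smallest of them whenever a larger value arrives. The sketch below assumes this mirrored approach; top_k_largest is an illustrative name, while heapq.nlargest is the standard-library shortcut for the same task.

```python
import heapq


def top_k_largest(numbers, k):
    """Return the k largest numbers, largest first, using a size-k min heap."""
    if k <= 0:
        return []
    min_heap = []
    for value in numbers:
        if len(min_heap) < k:
            heapq.heappush(min_heap, value)
        elif value > min_heap[0]:
            # The root is the smallest of the k values kept so far; replace it.
            heapq.heapreplace(min_heap, value)
    return sorted(min_heap, reverse=True)


data = [1, 9, 2, 4, 7, 6, 3]
print(top_k_largest(data, 3))   # [9, 7, 6]
print(heapq.nlargest(3, data))  # [9, 7, 6]
```

Because the input is consumed one item at a time and only k values are held in memory, this version also accepts a generator or any other iterable that is too large to sort as a whole.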
Method 3: quick select
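The quick select algorithm is similar to quicksort. The difference is that after partitioning around a pivot, quick select only needs to recurse into one side of the partition instead of both.
Time complexity: O(n) on average; O(n^2) in the worst case, when the pivots are consistently bad.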
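def qselect(A, k):
    """Return the k largest elements of A, in no particular order."""
    if k <= 0:
        return []
    if len(A) <= k:
        return list(A)
    pivot = A[-1]
    # Elements strictly greater than the pivot. Excluding the pivot itself
    # guarantees each recursive call gets a shorter list, even with duplicates.
    greater = [x for x in A[:-1] if x > pivot]
    if len(greater) >= k:
        return qselect(greater, k)
    # Otherwise keep greater and the pivot, and take the remainder from the rest.
    rest = [x for x in A[:-1] if x <= pivot]
    return greater + [pivot] + qselect(rest, k - len(greater) - 1)


for i in range(1, 10):
    print(qselect([11, 8, 4, 1, 5, 2, 7, 9], i))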
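A possible variant, which is not from the original: always taking the last element as the pivot is slow on already sorted input, so one common remedy is to pick the pivot at random. The sketch below also splits off the elements equal to the pivot, so that duplicates cannot stall progress, and uses a loop instead of recursion. The name top_k_quickselect is illustrative.

```python
import random


def top_k_quickselect(numbers, k):
    """Return the k largest numbers, in no particular order."""
    items = list(numbers)
    if k <= 0:
        return []
    if k >= len(items):
        return items
    result = []
    while k > 0:
        pivot = random.choice(items)
        greater = [x for x in items if x > pivot]
        equal = [x for x in items if x == pivot]
        if len(greater) >= k:
            # All k answers lie among the values above the pivot.
            items = greater
        elif len(greater) + len(equal) >= k:
            # The values above the pivot plus some copies of it complete the answer.
            result += greater + equal[:k - len(greater)]
            break
        else:
            # Keep everything at or above the pivot and continue below it.
            result += greater + equal
            k -= len(greater) + len(equal)
            items = [x for x in items if x < pivot]
    return result


data = [11, 8, 4, 1, 5, 2, 7, 9]
print(sorted(top_k_quickselect(data, 3), reverse=True))  # [11, 9, 8]
```

The result is sorted only for display; the selection itself does not order the k values.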