Finding the k-th largest number, and the k largest numbers, in a series

Source: Internet
Author: User

All rights reserved. Do not repost for commercial use without permission, and please credit the source when reposting.

I first came across this problem in the introduction of Mark Allen Weiss's book "Data Structures and Algorithm Analysis", but back then I only glanced at it. Recently I needed to find the k-th largest number in a series in a real application, so I revisited it. I have not fully solved my own problem yet, but the thought process is interesting, so I will share it here.

My actual problem is somewhat more complicated, so I will not go into its details here. Formally, the problem is:

Given a finite, unordered series {a_n}, find the k-th largest number in it.

Input: the series {a_n} and k

Output: the k-th largest number (for example, for {3, 1, 4, 1, 5, 9, 2, 6} and k = 3, the output is 5)

The simplest method combines the ideas of bucket sorting and hashing. Let max and min be the largest and smallest numbers in the series. Create an array of length max - min + 1 and use a_n - min as the hash function; collisions (equal numbers) are handled by counting. Finally, scan the new array from back to front, accumulating the counts until they reach k; that position plus min is the k-th largest number. This looks like it solves the problem in O(n), a classic trade of space for time, but the real cost is O(n + (max - min)) time and O(max - min) extra space, so it depends entirely on the gap between the largest and smallest numbers. Real data is often widely scattered, and when max - min is much greater than n the method performs poorly.

The algorithm also has restrictions on where it can be applied. First, it only works when every number in the series is an integer, which is not the case for the series I actually need to process, and for such data there is no effective way to use it. Second, it assumes the maximum and minimum of the series are known (although they can be computed in a single O(n) pass).
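A minimal sketch of this counting approach, assuming integer input and 1 <= k <= n; the function name kth_largest_counting is hypothetical. It makes one pass over the data to count and one pass over the count array, so the size of the value range shows up directly in both time and memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Counting-based selection: "hash" each x to x - min, count duplicates,
// then scan from the largest value down. Time O(n + (max - min)),
// extra space O(max - min). Assumes 1 <= k <= a.size().
int kth_largest_counting(const std::vector<int>& a, int k) {
    if (k < 1 || static_cast<std::size_t>(k) > a.size())
        throw std::out_of_range("k out of range");
    auto mm = std::minmax_element(a.begin(), a.end());
    const long long mn = *mm.first, mx = *mm.second;
    // One counter per possible value in [min, max].
    std::vector<int> count(static_cast<std::size_t>(mx - mn + 1), 0);
    for (int x : a) ++count[static_cast<std::size_t>(x - mn)];
    // Walk from the top until k numbers have been passed.
    int seen = 0;
    for (long long v = mx - mn; v >= 0; --v) {
        seen += count[static_cast<std::size_t>(v)];
        if (seen >= k) return static_cast<int>(v + mn);
    }
    return static_cast<int>(mn); // not reached when k is in range
}
```

For example, kth_largest_counting({3, 1, 4, 1, 5, 9, 2, 6}, 3) returns 5. Negative values are fine because of the - min offset, but a series like {0, 1000000000} would allocate a billion counters for two numbers.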

The solution above is related to Column 1 of Jon Bentley's "Programming Pearls", which pushes this bucket-sort idea to the extreme with a bit array (bitmap). It requires that the numbers to be sorted contain no duplicates and that the upper bound of their values be known in advance. Each number sets its corresponding bit to 1, and scanning the bits in order then yields the sorted result. For our problem, scanning from the top end until the k-th set bit gives the answer. C++ provides this data structure as std::bitset, and Java as java.util.BitSet.
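Below is a sketch of the bitmap variant using std::bitset. It assumes the values are distinct, non-negative and smaller than a compile-time bound N, which is exactly the restriction just described; kth_largest_bitmap is a hypothetical name.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Bitmap selection in the spirit of "Programming Pearls", Column 1.
// Assumes distinct values in [0, N); bitset::set throws std::out_of_range
// for anything outside that range. Returns -1 if fewer than k bits are set.
template <std::size_t N>
int kth_largest_bitmap(const std::vector<int>& a, int k) {
    if (k < 1) return -1;
    std::bitset<N> bits;                                    // all bits start at 0
    for (int x : a) bits.set(static_cast<std::size_t>(x));  // mark each value
    // Scan from the top: the k-th set bit we meet is the k-th largest value.
    for (std::size_t v = N; v-- > 0;) {
        if (bits.test(v) && --k == 0) return static_cast<int>(v);
    }
    return -1;
}
```

For instance, kth_largest_bitmap<16>({3, 1, 4, 5, 9, 2, 6}, 3) returns 5. Because std::bitset needs its size at compile time, std::vector<bool> is the usual choice when the bound is only known at run time.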

The first method I came up with was to create an array of length k (call it the k array) and then read the data in sequence. Each number read is compared with the elements of the k array in order; at the first element it is greater than, that element and everything after it are shifted back by one position, the last number is discarded, and the new number is inserted there. This keeps the k array sorted in descending order. When the whole series has been scanned, the last element of the k array is the number we want. The idea is simple and the code was quick to write:

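```cpp
#include <climits>
#include <vector>

// Keeps the k largest numbers seen so far in kList, in descending order,
// so kList[k-1] is the k-th largest. Assumes k >= 1; returns INT_MIN if n < k.
int find_k(const int p[], int n, int k) {
    // k array, initialised to INT_MIN (not 0) so negative inputs work
    std::vector<int> kList(k, INT_MIN);
    for (int i = 0; i < n; i++) {
        // compare p[i] with the numbers in the k array
        for (int j = 0; j < k; j++) {
            if (p[i] > kList[j]) {
                // shift kList[j..k-2] back one slot, dropping the last number
                for (int l = k - 1; l > j; l--)
                    kList[l] = kList[l - 1];
                // insert the new number at position j
                kList[j] = p[i];
                break;
            } // end if
        } // end for j
    } // end for i
    return kList[k - 1];
}
```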



What is the time complexity of this? My first guess was O(kn). A closer look suggested O(n * k * L), ignoring the initialization of the k array, where L is the average number of shift operations per pass, about k/2, which would give O(n * k^2). In fact the first guess is the right one: because the inner loop breaks right after inserting, the j + 1 comparisons and the k - 1 - j shifts for each number add up to at most k steps, so the total is O(nk). Either way the algorithm is poor; it is only acceptable when k is much smaller than n.

Having got this far on my own, I looked for the experience of others and found a similar problem, with a discussion, in "The Beauty of Programming: Microsoft Technical Interview Insights" (I recommend reading section 2.5). The book strengthens the problem to finding the k largest numbers and starts from the most straightforward method: sort the numbers with quicksort or heapsort, in O(N * log N), and take the first k. It then optimizes the time complexity step by step. The key observation is that to obtain the k largest numbers we need neither those k numbers in order nor the remaining N - k numbers sorted at all.
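For reference, the straightforward sort-then-take-k baseline can be sketched in a few lines with std::sort; k_largest_by_sort is a hypothetical name, not code from the book.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Baseline: sort everything in descending order, then keep the first k.
// Always O(N log N), no matter how small k is.
std::vector<int> k_largest_by_sort(std::vector<int> a, std::size_t k) {
    std::sort(a.begin(), a.end(), std::greater<int>());
    if (k < a.size()) a.resize(k);  // keep only the k largest
    return a;
}
```

This does strictly more work than needed: it fully orders both the k winners and the N - k numbers that are thrown away, which is exactly what the optimizations try to avoid.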

The first improved method in the book is based on quicksort. Pick a number from the series at random as the pivot and split the series into two parts, Sa and Sb: Sa holds the numbers greater than or equal to the pivot, and Sb holds the numbers smaller than it. After the split there are two cases:

1. If Sa has fewer than k elements, the k largest numbers are all the elements of Sa plus the k - |Sa| largest elements of Sb.

2. If Sa has k or more elements, find the k largest numbers within Sa.
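The C++ standard library already packages partition-based selection as std::nth_element, which has average linear complexity; implementations typically use introselect rather than the exact two-case recursion above. A minimal sketch of getting the k largest numbers with it (k_largest_by_partition is a hypothetical name):

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the k largest numbers in no particular order.
std::vector<int> k_largest_by_partition(std::vector<int> a, std::size_t k) {
    if (k >= a.size()) return a;  // nothing to discard
    if (k == 0) return {};
    // With std::greater, after this call every element before position k-1
    // is >= a[k-1] and every element after it is <= a[k-1].
    std::nth_element(a.begin(), a.begin() + static_cast<std::ptrdiff_t>(k - 1),
                     a.end(), std::greater<int>());
    a.resize(k);
    return a;
}
```

As in the book's analysis, only the split between "top k" and "the rest" matters; neither group ends up sorted.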

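The book gives the average time complexity of this method as O(N * log K) (since each step continues into only one side of the split, the expected cost of randomized selection is in fact linear in N). This approach was less familiar to me, so based on it I wrote a second piece of code, this time to find the k-th largest number: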

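```cpp
#include <cstddef>
#include <vector>

// Returns the k-th largest number in find (k is 1-based), or 0 if find has
// fewer than k elements. The first element is used as the pivot (axis value).
int find_k2(const std::vector<int>& find, int k) {
    if (k < 1 || find.size() < static_cast<std::size_t>(k)) {
        return 0;
    }
    int p = find.at(0);
    // Sa (findA) holds numbers >= p, Sb (findB) numbers < p;
    // the pivot itself goes into neither array
    std::vector<int> findA, findB;
    for (std::size_t i = 1; i < find.size(); i++) {
        if (find.at(i) >= p)
            findA.push_back(find.at(i));
        else
            findB.push_back(find.at(i));
    } // end for
    const int a = static_cast<int>(findA.size());
    if (a == k - 1) {
        return p;                          // exactly k-1 numbers are >= p
    } else if (a > k - 1) {
        return find_k2(findA, k);          // the answer lies in Sa
    } else {
        return find_k2(findB, k - a - 1);  // skip Sa and the pivot
    }
} // end find_k2
```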



This code uses std::vector, but a list or a plain array would work just as well; all that matters is knowing how many numbers fall on each side of the pivot. The vector version may not match the complexity analysis above exactly, because the cost of growing the vectors also has to be counted; I used vectors only to keep the code simple. Specifying the vectors' capacity up front reduces that cost. If you are interested, you can implement the same function with arrays.
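For readers who want the array version, here is a minimal in-place sketch: it partitions a std::vector<int> around a randomly chosen pivot, as the book describes, and loops instead of recursing, so no extra arrays are allocated. The name kth_largest_inplace and the fixed random seed are illustrative choices of mine, not part of the original code.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// In-place randomized selection. Assumes 1 <= k <= a.size(); a is reordered.
int kth_largest_inplace(std::vector<int>& a, std::size_t k) {
    static std::mt19937 rng(12345);  // fixed seed, for reproducible runs
    std::size_t lo = 0, hi = a.size() - 1, target = k - 1;
    while (lo < hi) {
        std::uniform_int_distribution<std::size_t> pick(lo, hi);
        std::swap(a[pick(rng)], a[hi]);      // move a random pivot to the end
        const int pivot = a[hi];
        std::size_t store = lo;
        // Numbers greater than the pivot (the "Sa" side) collect on the left.
        for (std::size_t i = lo; i < hi; ++i) {
            if (a[i] > pivot) std::swap(a[i], a[store++]);
        }
        std::swap(a[store], a[hi]);          // pivot lands at its final rank
        if (store == target) return a[store];
        if (store > target) hi = store - 1;  // answer is among the larger ones
        else lo = store + 1;                 // answer is among the smaller ones
    }
    return a[lo];
}
```

Keeping the invariant lo <= target <= hi means only one side is ever revisited, which is where the expected linear running time comes from.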

This made me realize where my earlier problem lay. My first method was too slow because it kept the k numbers in order during every update, which is unnecessary: all I need is the k-th number, that is, the smallest of the k numbers I keep. The k numbers themselves can stay unordered as long as their minimum is always available. A data structure with exactly this property is the min-heap. First, the heap interface is declared as follows (operations I do not use are omitted):

```cpp
class minHeap {
private:
    int* Heap;            // array used to store the heap elements
    int size;             // size of the array
    int n;                // number of elements in the heap
    void shiftdown(int);  // helper: move an element down to its proper place
public:
    minHeap(int* Heap, int num, int max);
    bool isLeaf(int pos) const;
    int leftchild(int pos) const;
    int rightchild(int pos) const;
    int parent(int pos) const;
    bool insert(const int);
    bool removeMin(int&);
    int getMin();
}; // end class minHeap
```




Because we keep the k largest numbers, and the k-th largest is the smallest among them, a min-heap is the right choice. After the heap has been filled with the first k numbers, each further number in the series is checked against the heap's minimum; if it is greater, the heap is updated by deleting the current minimum (the heap top) and inserting the new value. Both insertion and deletion on a heap cost O(log K), where K is the heap size, so the time complexity of the algorithm drops to O(N * log K), and the optimization ends here. Looking back at the relevant section of "The Beauty of Programming", its idea coincides with this one, and the book also points out that this solution works even when the series is too large to be read into memory at once. Think it deeper, code it simple.
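A minimal sketch of the heap method, using std::priority_queue with std::greater<int> as a min-heap in place of the minHeap class declared above; kth_largest_heap is a hypothetical name. Because it looks at each number once and stores only k of them, the same loop works on a stream that never fits in memory.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Keeps the k largest numbers seen so far in a min-heap of size k; its top
// is then the k-th largest. Returns false if k == 0 or fewer than k numbers.
bool kth_largest_heap(const std::vector<int>& a, std::size_t k, int& out) {
    if (k == 0) return false;
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : a) {
        if (heap.size() < k) {
            heap.push(x);          // fill the heap with the first k numbers
        } else if (x > heap.top()) {
            heap.pop();            // evict the current minimum...
            heap.push(x);          // ...and insert the larger number
        }
    }
    if (heap.size() < k) return false;
    out = heap.top();              // smallest of the k largest
    return true;
}
```

Each push and pop costs O(log k), giving O(N log k) overall with O(k) memory, matching the analysis above.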


Sina Blog address:

Http://blog.sina.com.cn/u/1822488043
