Problem:
Given N numbers, select the largest K of them. Which algorithms are applicable when N is in the hundreds, the millions, or the tens of billions?
When N is in the hundreds, use the order-statistics (selection) method (see Chapter 9 of Introduction to Algorithms):
The algorithm recursively partitions the input array around a pivot: elements on one side are larger than the pivot, and elements on the other side are no larger. Unlike quicksort, which recurses into both sides of the partition, the selection algorithm recurses into only the side that contains the K-th position. The expected running time of the randomized version is O(n).
[Cpp]
#include <iostream>
#include <cstdlib>
using namespace std;

#define MAXN 103
int A[MAXN];

// Partition A[u..v] around a random pivot (larger elements to the left),
// then recurse into the side containing index k.
void select(int u, int v, int k)
{
    int s = rand() % (v - u + 1) + u;   // random pivot index
    int a = A[s];
    A[s] = A[u];
    A[u] = a;                           // move the pivot to the front
    int i, j = u;
    for (i = u + 1; i <= v; i++)
        if (A[i] > a)                   // elements larger than the pivot go left
        {
            int tmp = A[++j];
            A[j] = A[i];
            A[i] = tmp;
        }
    A[u] = A[j];
    A[j] = a;                           // the pivot lands at its final position j
    if (j == k) return;
    else if (j < k)
        select(j + 1, v, k);
    else
        select(u, j - 1, k);
}

int main()
{
    int n, k, i;
    cin >> n >> k;
    for (i = 0; i < n; i++)
        cin >> A[i];
    select(0, n - 1, k - 1);            // afterwards A[0..k-1] hold the k largest
    for (i = 0; i < k; i++)
        cout << A[i] << " ";
    cout << endl;
}
When N is in the millions, the data is too large to load fully into memory, but multiple passes over it can be tolerated. In that case, use binary search on the value (an admittedly odd technique, and not my favorite):
Essentially, we apply the bisection idea to find the value of the K-th largest number. Starting from the range [min, max], the algorithm progressively narrows the range that contains this value. The time complexity is O(N * log(Max - Min)).
[Cpp]
#include <cstdio>
#include <iostream>
using namespace std;

// Count how many numbers in the file are >= v (one full pass).
int countGE(FILE *in, int v)
{
    rewind(in);
    int a, sum = 0;
    while (fscanf(in, "%d", &a) != EOF)
        if (a >= v) sum++;
    return sum;
}

// Print every number >= v. Assuming distinct values, these are the k largest;
// with duplicates, more than k numbers may be printed.
void printGE(FILE *in, int v)
{
    rewind(in);
    int a;
    while (fscanf(in, "%d", &a) != EOF)
        if (a >= v)
            cout << a << " ";
    cout << endl;
}

int main()
{
    int n, k;
    cin >> n >> k;
    FILE *in = fopen("dat.txt", "r");
    int min, max, a;
    fscanf(in, "%d", &a);               // first pass: find the value range
    min = max = a;
    while (fscanf(in, "%d", &a) != EOF)
    {
        if (a < min) min = a;
        if (a > max) max = a;
    }
    // Bisect on the value: find the largest v with count(>= v) >= k.
    while (min < max)
    {
        int mid = min + (max - min + 1) / 2;   // round up so the range shrinks
        if (countGE(in, mid) >= k) min = mid;
        else max = mid - 1;
    }
    printGE(in, min);
    fclose(in);
}
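The bisection invariant is easier to check in memory than against a file. A minimal sketch of the same search over a vector (the names `countGE` and `kthLargest` are illustrative, not from the original):

```cpp
#include <vector>

// Count elements >= v; stands in for one full pass over the file.
static int countGE(const std::vector<int>& a, int v)
{
    int sum = 0;
    for (int x : a)
        if (x >= v) sum++;
    return sum;
}

// Bisect on the value range to find the k-th largest element: the
// largest v with count(>= v) >= k. Costs O(log(max - min)) passes.
int kthLargest(const std::vector<int>& a, int k)
{
    int lo = a[0], hi = a[0];
    for (int x : a) { if (x < lo) lo = x; if (x > hi) hi = x; }
    while (lo < hi)
    {
        int mid = lo + (hi - lo + 1) / 2;  // round up so the loop terminates
        if (countGE(a, mid) >= k) lo = mid;
        else hi = mid - 1;
    }
    return lo;
}
```

Note that the search is over the value range, not over array positions, which is why the data itself never has to be sorted or even held in memory.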
When N is in the tens of billions, the data cannot fit into memory and multiple passes cannot be tolerated: each datum may be read only once. Use the min-heap method (which also works well in the previous case), provided K is small enough that a heap of K elements fits in memory.
A min-heap of capacity K stores the K largest numbers seen so far; the top of the heap is the smallest of those K. Each new element is compared against the heap top, and only if it is larger does it replace the top, after which the heap is restored. The time complexity is O(N * log K).
[Cpp]
#include <iostream>
using namespace std;

#define MAXN 103
int H[MAXN];   // 1-based min-heap holding the k largest numbers seen so far

// Sift H[s] up until its parent is no larger (used when inserting).
void upshift(int s)
{
    int tmp = H[s];
    while (s > 1 && H[s >> 1] > tmp)
    {
        H[s] = H[s >> 1];
        s >>= 1;
    }
    H[s] = tmp;
}

// Sift H[1] down to restore the heap after replacing the top.
void downshift(int n)
{
    int tmp = H[1];
    int i = 1, j = i << 1;
    while (j <= n)
    {
        if (j + 1 <= n && H[j + 1] < H[j]) j++;  // pick the smaller child
        if (H[j] < tmp) H[i] = H[j];
        else break;
        i = j;
        j = i << 1;
    }
    H[i] = tmp;
}

int main()
{
    int n, k, i, a;
    cin >> n >> k;
    for (i = 1; i <= k; i++)   // build the heap from the first k numbers
    {
        cin >> H[i];
        upshift(i);
    }
    for (; i <= n; i++)        // single pass over the remaining numbers
    {
        cin >> a;
        if (a > H[1])          // larger than the current k-th largest?
        {
            H[1] = a;
            downshift(k);
        }
    }
    for (i = 1; i <= k; i++)   // printed in heap order, not sorted
        cout << H[i] << " ";
    cout << endl;
}
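The same single-pass scheme can be written with the standard library's `std::priority_queue` instead of a hand-rolled heap. A minimal sketch (the function name `topKHeap` is my own, not from the original):

```cpp
#include <functional>
#include <queue>
#include <vector>

// Keep the k largest elements in a size-k min-heap: the top is always the
// smallest current candidate. O(n log k) time, O(k) memory, one pass.
std::vector<int> topKHeap(const std::vector<int>& a, int k)
{
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : a)
    {
        if ((int)heap.size() < k)
            heap.push(x);
        else if (x > heap.top())   // beats the smallest candidate
        {
            heap.pop();
            heap.push(x);
        }
    }
    std::vector<int> res;          // drain the heap: ascending order
    while (!heap.empty()) { res.push_back(heap.top()); heap.pop(); }
    return res;
}
```

In a real streaming setting the `for (int x : a)` loop would read from the input source directly, so only the K-element heap ever resides in memory.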
Author: linyunzju