Problem:
Given N numbers, select the largest K of them. Which algorithms are applicable when N is in the hundreds, the millions, or the tens of billions?
When N is in the hundreds, use the order-statistics (selection) method (see Chapter 9 of Introduction to Algorithms):
The algorithm recursively partitions the input array around a pivot: elements on one side are larger than the pivot, and elements on the other side are no larger. Unlike quicksort, which recurses into both sides of the partition, the selection algorithm recurses into only the side that contains the K-th position. The expected running time of the randomized version is O(n).
[Cpp]
#include <iostream>
#include <cstdlib>
using namespace std;

#define MAXN 103
int A[MAXN];

// Partition A[u..v] around a random pivot (larger elements to the left),
// then recurse into the side containing index k.
void select(int u, int v, int k)
{
    int s = rand() % (v - u + 1) + u;   // random pivot index
    int a = A[s];
    A[s] = A[u];
    A[u] = a;                           // move the pivot to the front
    int i, j = u;
    for (i = u + 1; i <= v; i++)
        if (A[i] > a)                   // elements larger than the pivot go left
        {
            int tmp = A[++j];
            A[j] = A[i];
            A[i] = tmp;
        }
    A[u] = A[j];
    A[j] = a;                           // the pivot lands at its final position j
    if (j == k) return;
    else if (j < k)
        select(j + 1, v, k);
    else
        select(u, j - 1, k);
}

int main()
{
    int n, k, i;
    cin >> n >> k;
    for (i = 0; i < n; i++)
        cin >> A[i];
    select(0, n - 1, k - 1);            // afterwards A[0..k-1] hold the k largest
    for (i = 0; i < k; i++)
        cout << A[i] << " ";
    cout << endl;
}
When N is in the millions, the data is too large to load fully into memory, but multiple passes over it can be tolerated. In that case, use binary search on the value (an admittedly odd technique, and not my favorite):
Essentially, we apply the bisection idea to find the value of the K-th largest number. Starting from the range [min, max], the algorithm progressively narrows the range that contains this value. The time complexity is O(N * log(Max - Min)).
[Cpp]
#include <cstdio>
#include <iostream>
using namespace std;

// Count how many numbers in the file are >= v (one full pass).
int countGE(FILE *in, int v)
{
    rewind(in);
    int a, sum = 0;
    while (fscanf(in, "%d", &a) != EOF)
        if (a >= v) sum++;
    return sum;
}

// Print every number >= v. Assuming distinct values, these are the k largest;
// with duplicates, more than k numbers may be printed.
void printGE(FILE *in, int v)
{
    rewind(in);
    int a;
    while (fscanf(in, "%d", &a) != EOF)
        if (a >= v)
            cout << a << " ";
    cout << endl;
}

int main()
{
    int n, k;
    cin >> n >> k;
    FILE *in = fopen("dat.txt", "r");
    int min, max, a;
    fscanf(in, "%d", &a);               // first pass: find the value range
    min = max = a;
    while (fscanf(in, "%d", &a) != EOF)
    {
        if (a < min) min = a;
        if (a > max) max = a;
    }
    // Bisect on the value: find the largest v with count(>= v) >= k.
    while (min < max)
    {
        int mid = min + (max - min + 1) / 2;   // round up so the range shrinks
        if (countGE(in, mid) >= k) min = mid;
        else max = mid - 1;
    }
    printGE(in, min);
    fclose(in);
}
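The bisection invariant is easier to check in memory than against a file. A minimal sketch of the same search over a vector (the names `countGE` and `kthLargest` are illustrative, not from the original):

```cpp
#include <vector>

// Count elements >= v; stands in for one full pass over the file.
static int countGE(const std::vector<int>& a, int v)
{
    int sum = 0;
    for (int x : a)
        if (x >= v) sum++;
    return sum;
}

// Bisect on the value range to find the k-th largest element: the
// largest v with count(>= v) >= k. Costs O(log(max - min)) passes.
int kthLargest(const std::vector<int>& a, int k)
{
    int lo = a[0], hi = a[0];
    for (int x : a) { if (x < lo) lo = x; if (x > hi) hi = x; }
    while (lo < hi)
    {
        int mid = lo + (hi - lo + 1) / 2;  // round up so the loop terminates
        if (countGE(a, mid) >= k) lo = mid;
        else hi = mid - 1;
    }
    return lo;
}
```

Note that the search is over the value range, not over array positions, which is why the data itself never has to be sorted or even held in memory.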
When N is in the tens of billions, the data cannot fit into memory and multiple passes cannot be tolerated: each datum may be read only once. Use the min-heap method (which also works well in the previous case), provided K is small enough that a heap of K elements fits in memory.
A min-heap of capacity K stores the K largest numbers seen so far; the top of the heap is the smallest of those K. Each new element is compared against the heap top, and only if it is larger does it replace the top, after which the heap is restored. The time complexity is O(N * log K).
[Cpp]
#include <iostream>
using namespace std;

#define MAXN 103
int H[MAXN];   // 1-based min-heap holding the k largest numbers seen so far

// Sift H[s] up until its parent is no larger (used when inserting).
void upshift(int s)
{
    int tmp = H[s];
    while (s > 1 && H[s >> 1] > tmp)
    {
        H[s] = H[s >> 1];
        s >>= 1;
    }
    H[s] = tmp;
}

// Sift H[1] down to restore the heap after replacing the top.
void downshift(int n)
{
    int tmp = H[1];
    int i = 1, j = i << 1;
    while (j <= n)
    {
        if (j + 1 <= n && H[j + 1] < H[j]) j++;  // pick the smaller child
        if (H[j] < tmp) H[i] = H[j];
        else break;
        i = j;
        j = i << 1;
    }
    H[i] = tmp;
}

int main()
{
    int n, k, i, a;
    cin >> n >> k;
    for (i = 1; i <= k; i++)   // build the heap from the first k numbers
    {
        cin >> H[i];
        upshift(i);
    }
    for (; i <= n; i++)        // single pass over the remaining numbers
    {
        cin >> a;
        if (a > H[1])          // larger than the current k-th largest?
        {
            H[1] = a;
            downshift(k);
        }
    }
    for (i = 1; i <= k; i++)   // printed in heap order, not sorted
        cout << H[i] << " ";
    cout << endl;
}
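The same single-pass scheme can be written with the standard library's `std::priority_queue` instead of a hand-rolled heap. A minimal sketch (the function name `topKHeap` is my own, not from the original):

```cpp
#include <functional>
#include <queue>
#include <vector>

// Keep the k largest elements in a size-k min-heap: the top is always the
// smallest current candidate. O(n log k) time, O(k) memory, one pass.
std::vector<int> topKHeap(const std::vector<int>& a, int k)
{
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : a)
    {
        if ((int)heap.size() < k)
            heap.push(x);
        else if (x > heap.top())   // beats the smallest candidate
        {
            heap.pop();
            heap.push(x);
        }
    }
    std::vector<int> res;          // drain the heap: ascending order
    while (!heap.empty()) { res.push_back(heap.top()); heap.pop(); }
    return res;
}
```

In a real streaming setting the `for (int x : a)` loop would read from the input source directly, so only the K-element heap ever resides in memory.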
Author: linyunzju