The Beauty of Programming 2.5: Finding the K Largest Numbers

Problem:
Select the K largest numbers from a set of N numbers. Which algorithms are suitable when N is a few hundred, a few million, or tens of billions?
 
When there are only a few hundred numbers, use the order-statistic selection algorithm (see Chapter 9 of Introduction to Algorithms):
The algorithm recursively partitions the input array around a randomly chosen pivot, so that the elements on one side are smaller than the pivot and the elements on the other side are greater than or equal to it. Unlike quicksort, which recurses into both sides of the partition, selection recurses only into the side that contains the K-th element. The expected running time of the randomized algorithm is O(n). The code below places the larger elements on the left, so the first K positions end up holding the K largest numbers.
[Cpp]
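#include <iostream>
#include <cstdlib>
using namespace std;

#define MAXN 103   // capacity of the input array
int A[MAXN];

// Rearrange A[u..v] so that A[k] holds the (k+1)-th largest value
// and every element to its left is at least as large.
void select(int u, int v, int k)
{
    // Move a randomly chosen pivot to the front of the range.
    int s = rand() % (v - u + 1) + u;
    int a = A[s];
    A[s] = A[u];
    A[u] = a;
    // Gather the elements greater than the pivot in A[u+1..j].
    int i, j = u;
    for (i = u + 1; i <= v; i++)
        if (A[i] > a)
        {
            int tmp = A[++j];
            A[j] = A[i];
            A[i] = tmp;
        }
    // Put the pivot at its final position j.
    A[u] = A[j];
    A[j] = a;
    if (j == k) return;
    else if (j < k)
        select(j + 1, v, k);   // the K-th element lies to the right
    else
        select(u, j - 1, k);   // the K-th element lies to the left
}

int main()
{
    int n, k, i;
    cin >> n >> k;
    // Reject input that does not fit the array or has an invalid k.
    if (n < 1 || n > MAXN || k < 1 || k > n) return 1;
    for (i = 0; i < n; i++)
        cin >> A[i];
    select(0, n - 1, k - 1);
    for (i = 0; i < k; i++)
        cout << A[i] << " ";
    cout << endl;
}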
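For comparison, the C++ standard library already provides a selection routine, std::nth_element, which can stand in for a hand-written partition. The sketch below is not the article's code: top_k_select is a hypothetical helper name, and it assumes the data fits in a std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: return the k largest values of `data`.
// The order among the returned values is unspecified.
std::vector<int> top_k_select(std::vector<int> data, std::size_t k)
{
    assert(k <= data.size());
    // With std::greater, nth_element moves the k largest values
    // into data[0..k) without fully sorting the array.
    std::nth_element(data.begin(), data.begin() + k, data.end(),
                     std::greater<int>());
    data.resize(k);
    return data;
}
```

Calling top_k_select({5, 1, 9, 3, 7, 9, 2}, 3) returns the values 9, 9 and 7 in some order.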
When there are several million numbers, the data may be too large to load into memory in full. If multiple passes over the data are acceptable, you can binary-search on the value of the K-th largest number (a somewhat unusual approach that I am not fond of):
In essence, binary search is used to find the value of the K-th largest number. Starting from [Min, Max], the algorithm repeatedly halves the range that can contain this value, counting in each pass how many numbers are greater than or equal to the midpoint. The time complexity is O(N * log(Max - Min)).
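As a rough estimate of what "multiple passes" means here, assume one pass to find Min and Max, one counting pass per halving step, and one final pass to print the result. The 32-bit integer range is an assumption made for illustration:

```latex
\text{passes} \;\le\; 2 + \left\lceil \log_2(\mathit{Max} - \mathit{Min} + 1) \right\rceil,
\qquad
\mathit{Max} - \mathit{Min} + 1 \le 2^{32} \;\Rightarrow\; \text{passes} \le 34
```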
[Cpp]
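#include <cstdio>
#include <iostream>
using namespace std;

// Count the values in the file that are >= v (one full pass over the file).
int binary(FILE *in, int v)
{
    rewind(in);
    int a, sum = 0;
    while (fscanf(in, "%d", &a) == 1)
    {
        if (a >= v) sum++;
    }
    return sum;
}

// Print the K largest values, where v is the K-th largest value:
// everything greater than v, then enough copies of v to reach K.
void finded(FILE *in, int v, int k)
{
    rewind(in);
    int a, printed = 0;
    while (fscanf(in, "%d", &a) == 1)
    {
        if (a > v)
        {
            cout << a << " ";
            printed++;
        }
    }
    for (; printed < k; printed++)
        cout << v << " ";
    cout << endl;
}

int main()
{
    int n, k;
    cin >> n >> k;
    FILE *in = fopen("dat.txt", "r");
    if (in == NULL) return 1;
    int min, max, a;
    if (fscanf(in, "%d", &a) != 1) return 1;
    min = max = a;
    while (fscanf(in, "%d", &a) == 1)
    {
        if (a < min) min = a;
        if (a > max) max = a;
    }
    // Invariant: at least k values are >= min.
    // The loop ends with min == max == the K-th largest value.
    while (max > min)
    {
        // Round up so the range shrinks on every step.
        int mid = min + (int)(((long long)max - min + 1) / 2);
        int ns = binary(in, mid);
        if (ns >= k) min = mid;
        else max = mid - 1;
    }
    finded(in, min, k);
    fclose(in);
}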
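The same value-range binary search can be tried out in memory, which makes the loop invariant easier to check than with a file. This is a minimal sketch: count_at_least and kth_largest_value are hypothetical names, and each call to count_at_least stands for one full pass over the data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One "pass": count the values that are >= v.
std::size_t count_at_least(const std::vector<int>& data, long long v)
{
    std::size_t c = 0;
    for (int x : data)
        if (x >= v) ++c;
    return c;
}

// Return the k-th largest value, for 1 <= k <= data.size().
int kth_largest_value(const std::vector<int>& data, std::size_t k)
{
    assert(k >= 1 && k <= data.size());
    long long lo = *std::min_element(data.begin(), data.end());
    long long hi = *std::max_element(data.begin(), data.end());
    // Invariant: at least k values are >= lo.
    while (lo < hi)
    {
        long long mid = lo + (hi - lo + 1) / 2;  // round up so the range shrinks
        if (count_at_least(data, mid) >= k) lo = mid;
        else hi = mid - 1;
    }
    return static_cast<int>(lo);
}
```

For the data {5, 1, 9, 3, 7, 9, 2}, kth_largest_value returns 9 for k = 2 and 7 for k = 3. Rounding the midpoint up matters: with a rounded-down midpoint and the update lo = mid, a range of two adjacent values would never shrink.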
 
When there are tens of billions of numbers, the data cannot fit into memory and multiple passes are not acceptable, so each number can be read only once. In this case the min-heap method is recommended (it is also a good choice for the previous case). K must be small enough, however, for the entire min-heap to fit in memory.
A min-heap of capacity K stores the K largest numbers seen so far, so the top of the heap is the smallest of them. Each new element is compared with the top of the heap; only if it is larger does it replace the top, after which the heap is adjusted to restore the heap property. The time complexity is O(N * log K).
[Cpp]
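#include <iostream>

using namespace std;

#define MAXN 103

int H[MAXN];   // 1-based min-heap holding the K largest values seen so far (K < MAXN)

// Sift the element at position s up toward the root.
void upshift(int s)
{
    int tmp = H[s];
    while (s > 1 && H[s >> 1] > tmp)
    {
        H[s] = H[s >> 1];
        s >>= 1;
    }
    H[s] = tmp;
}

// Sift the root down in a heap of n elements.
void downshift(int n)
{
    int tmp = H[1];
    int i = 1, j = i << 1;
    while (j <= n)
    {
        // Pick the smaller of the two children.
        if (j + 1 <= n && H[j + 1] < H[j]) j++;
        if (H[j] < tmp) H[i] = H[j];
        else break;
        i = j;
        j = j << 1;
    }
    H[i] = tmp;
}

int main()
{
    int n, k, i, a;
    cin >> n >> k;
    // Build the heap from the first k values.
    for (i = 1; i <= k; i++)
    {
        cin >> H[i];
        upshift(i);
    }
    // Each remaining value replaces the top only if it is larger.
    for (; i <= n; i++)
    {
        cin >> a;
        if (a > H[1])
        {
            H[1] = a;
            downshift(k);
        }
    }
    for (i = 1; i <= k; i++)
        cout << H[i] << " ";
    cout << endl;
}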
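The same single-pass idea can be sketched with the standard library's std::priority_queue, which becomes a min-heap when given std::greater. This is an illustration rather than the article's implementation: top_k_stream is a hypothetical helper name, and reading from a std::istream is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <queue>
#include <sstream>   // only needed for the std::istringstream usage example
#include <vector>

// Hypothetical helper: read integers from `in` once and
// return the k largest of them in ascending order.
std::vector<int> top_k_stream(std::istream& in, std::size_t k)
{
    // std::greater turns priority_queue into a min-heap.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    int x;
    while (in >> x)
    {
        if (heap.size() < k)
            heap.push(x);                  // still filling the first k slots
        else if (k > 0 && x > heap.top())
        {
            heap.pop();                    // drop the smallest of the current top k
            heap.push(x);
        }
    }
    std::vector<int> result;
    while (!heap.empty())
    {
        result.push_back(heap.top());
        heap.pop();
    }
    assert(result.size() <= k);            // the heap never grows beyond k
    return result;
}
```

For example, with std::istringstream in("5 1 9 3 7 9 2"), the call top_k_stream(in, 3) returns {7, 9, 9}. Memory use is proportional to k, not to the number of values read.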
Author: linyunzju
