Question: Finding the K-th largest number among N numbers in random order, the time complexity can be reduced to: A. O(N*logN)  B. O(N)  C. O(1)  D. O(2)
Answer: B
The so-called "top K" problem refers to finding the largest K numbers, in order from large to small, in an unsorted array of length n (n >= k).
Note: the problem only asks for the largest K numbers; the remaining n-k numbers are not required.
Possible constraints:
Minimal time and space consumption may be required, the data volume may be large, the data to be handled may be floating-point numbers, and so on.
method One: sort all elements, then take the first k elements; not recommended
idea: use the fastest sorting algorithms, quicksort or heapsort
Time complexity: O(N*logN) + O(K) = O(N*logN)
Drawback: all elements must be sorted; even when K = 1, the time complexity is still O(N*logN)
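A minimal C++ sketch of Method One (my own illustration, not code from the original article; topKBySorting is an invented name): sort in descending order, then copy the first k elements.

#include <algorithm>
#include <functional>
#include <vector>
using namespace std;

// Sort everything in descending order, then take the first k elements.
vector<int> topKBySorting(vector<int> a, int k) {
    sort(a.begin(), a.end(), greater<int>());      // O(n log n)
    return vector<int>(a.begin(), a.begin() + k);  // O(k)
}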
method Two: partially sort only the first k elements; the remaining n-k elements need not be sorted; not recommended
idea: use selection sort or bubble sort and perform K rounds of selection to obtain the K largest numbers
Time complexity: O(n*k)
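A hedged sketch of Method Two (my own illustration; topKBySelection is an invented name): k rounds of selection, each placing the next largest element at the front.

#include <vector>
#include <utility>
using namespace std;

// k passes of selection sort; after pass i, position i holds the (i+1)-th largest. O(n*k) overall.
vector<int> topKBySelection(vector<int> a, int k) {
    int n = (int)a.size();
    for (int i = 0; i < k; i++) {
        int maxIdx = i;
        for (int j = i + 1; j < n; j++)
            if (a[j] > a[maxIdx]) maxIdx = j;
        swap(a[i], a[maxIdx]);
    }
    return vector<int>(a.begin(), a.begin() + k);
}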
method Three: neither the first K numbers nor the remaining n-k numbers need to be sorted; usable
idea: find the K-th largest element directly.
method: use a quicksort-like partition; after each partition, continue only into the part that must contain the answer, until the K-th largest element is in its final position. At that point the elements before that position in the array are the K largest.
Time complexity:
If the pivot is chosen randomly: expected linear time, O(N)
If the median of medians of the array is chosen as the pivot: worst-case time complexity O(N)
Using the idea of quicksort, we randomly pick an element x from the array S and partition the array into two parts SA and SB: elements in SA are greater than or equal to x, and elements in SB are less than x. There are then two cases: 1. SA contains fewer than K elements; then the (K - |SA|)-th largest element of SB is the K-th largest number overall. 2. SA contains K or more elements; then return the K-th largest number within SA. Using the partition idea of quicksort, the recurrence is T(n) = T(n/2) + O(n), which gives an expected time complexity of O(n). This method is usable only if we are allowed to modify the input array; afterwards the K numbers at the left of the array are the K largest numbers (though not necessarily sorted among themselves), and the numbers to the right of position K are all smaller than those K numbers.
Here is the code for Method Three:

#include <iostream>
#include <cstdlib>
using namespace std;

// Partition L[low..high] around L[low]; elements >= pivot end up on the left,
// so the pivot lands at its final position in descending order.
int Partition(int *L, int low, int high) {
    int pt = L[low];                               // pivot value (sentinel)
    while (low < high) {
        while (low < high && L[high] <= pt) high--;
        L[low] = L[high];
        while (low < high && L[low] >= pt) low++;
        L[high] = L[low];
    }
    L[low] = pt;
    return low;
}

void QSort(int *L, int low, int high) {            // quick sort (descending)
    if (low < high) {
        int pl = Partition(L, low, high);
        QSort(L, low, pl - 1);
        QSort(L, pl + 1, high);
    }
}

void Findk(int k, int *L, int low, int high) {     // quickselect: recurse into one side only
    int temp = Partition(L, low, high);
    if (temp == k - 1)
        cout << "The " << temp + 1 << "-th largest number is: " << L[temp] << endl;
    else if (temp > k - 1)
        Findk(k, L, low, temp - 1);
    else
        Findk(k, L, temp + 1, high);
}

int main() {
    int a[10] = {15, 25, 9, 48, 36, 100, 58, 99, 126, 5}, i, k;
    cout << "Before sorting:" << endl;
    for (i = 0; i < 10; i++) cout << a[i] << " ";
    cout << endl;
    cout << "Please enter K to find the K-th largest number:" << endl;
    cin >> k;
    Findk(k, a, 0, 9);     // finding the K-th largest does not require a full sort
    QSort(a, 0, 9);
    cout << "After sorting:" << endl;
    for (i = 0; i < 10; i++) cout << a[i] << " ";
    cout << endl;
    system("pause");
    return 0;
}
method Four: linear-scan counting approaches, suitable when the range of the data is small
idea 1: find the K largest elements + counting sort + array implementation
How to do it: use counting sort; open an array that records the number of occurrences of each integer, then take the largest K values from large to small (a sketch follows the list of disadvantages below).
Disadvantages:
1. Some numbers never appear yet still need a reserved slot, so the space waste can be severe
2. Cannot handle floating-point numbers
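A hedged sketch of Idea 1 (my own illustration; it assumes non-negative integers bounded by a known maxValue, which the original does not state):

#include <vector>
using namespace std;

// Counting-based top-K: one slot per possible value, which is the space waste noted above.
vector<int> topKByCounting(const vector<int>& a, int k, int maxValue) {
    vector<int> cnt(maxValue + 1, 0);
    for (int v : a) cnt[v]++;                                      // record occurrences of each integer
    vector<int> result;
    for (int v = maxValue; v >= 0 && (int)result.size() < k; v--)  // scan from large to small
        for (int c = 0; c < cnt[v] && (int)result.size() < k; c++)
            result.push_back(v);
    return result;
}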
Idea 2: find the K largest elements + counting sort + map implementation
method: use the STL map to record the number of occurrences of each element si, then scan from large to small to find the K largest numbers (a sketch follows the notes below).
Time complexity O(n*logn), space complexity O(n)
Note:
1. Can handle floating-point numbers
2. Cannot be implemented with MFC's CMap, because CMap does not keep its keys automatically sorted
3. map is implemented as a red-black tree; each insertion is O(logn), so the total complexity is O(n*logn)
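A hedged sketch of Idea 2 (my own illustration; topKByMap is an invented name): std::map keeps its keys sorted, so a reverse scan yields values from largest to smallest, and floating-point keys are handled naturally.

#include <map>
#include <vector>
using namespace std;

vector<double> topKByMap(const vector<double>& a, int k) {
    map<double, int> cnt;                          // value -> number of occurrences
    for (double v : a) cnt[v]++;                   // each insertion is O(log n)
    vector<double> result;
    for (auto it = cnt.rbegin(); it != cnt.rend() && (int)result.size() < k; ++it)
        for (int c = 0; c < it->second && (int)result.size() < k; c++)
            result.push_back(it->first);           // largest values first
    return result;
}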
Here are two additional ideas. They do not perform as well as the counting-sort and quickselect approaches above; they are included only to broaden the range of ideas.
method Five: bucket (radix-style) partitioning; not recommended
idea: find the K-th largest element by bucket partitioning. In one traversal, find the largest value Vmax and the smallest value Vmin.
Divide the interval [Vmin, Vmax] into M blocks.
The span of each block is d = (Vmax - Vmin) / M,
i.e. [Vmin, Vmin+d], [Vmin+d, Vmin+2d], ...
Scan all the elements and count how many fall into each block; from the counts we know which block contains the K-th largest element.
Then recurse into that block and repeat the sub-blocking.
Recurse downward until we reach an interval that contains only the K-th largest number.
Time complexity: O((N + M) * log_M(|Vmax - Vmin| / delta))
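A hedged sketch of Method Five (my own illustration; M = 16 and the half-open working interval are choices I made, not part of the original). It repeatedly buckets the current interval, descends into the block holding the element of the current rank, and stops when the interval is narrower than delta.

#include <vector>
using namespace std;

// Returns an element within delta of the K-th largest value of a (delta should be
// smaller than the minimum gap between unequal elements for an exact answer).
double kthLargestByBuckets(const vector<double>& a, int k, double delta) {
    double vmin = a[0], vmax = a[0];
    for (double v : a) { if (v < vmin) vmin = v; if (v > vmax) vmax = v; }
    const int M = 16;                              // number of blocks per level
    double lo = vmin, hi = vmax + delta;           // half-open working interval [lo, hi)
    int rank = k;                                  // rank of the target inside [lo, hi)
    while (hi - lo > delta) {
        double d = (hi - lo) / M;
        vector<int> cnt(M, 0);
        for (double v : a) {
            if (v < lo || v >= hi) continue;       // only elements in the current interval
            int b = (int)((v - lo) / d);
            if (b >= M) b = M - 1;                 // guard against floating-point rounding
            cnt[b]++;
        }
        int b = M - 1;                             // walk blocks from the largest values down
        while (b > 0 && rank > cnt[b]) { rank -= cnt[b]; b--; }
        hi = lo + (b + 1) * d;                     // descend into block b
        lo = lo + b * d;
    }
    for (double v : a)                             // the interval now isolates the answer
        if (v >= lo && v < hi) return v;
    return lo;
}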
method Six: binary search over the value range; not recommended
idea: find the K-th largest element by binary search on the interval [Smin, Smax]. Find a value x such that the number of elements in the whole array greater than x is k-1; then x is the K-th largest number. The average time complexity is O(N*logN).
while (Vmax - Vmin > delta)
{
    Vmid = Vmin + (Vmax - Vmin) * 0.5;
    if (f(arr, n, Vmid) >= K)
        Vmin = Vmid;
    else
        Vmax = Vmid;
}
In the pseudocode, f(arr, n, Vmid) returns the count of numbers in arr[0, ..., n-1] that are greater than or equal to Vmid.
Example
Results analysis: after the program runs we obtain an interval (Vmin, Vmax) that contains only one element (or several equal elements); that element is the K-th largest element.
Note:
1. The value of delta should be smaller than the minimum difference between any two unequal elements. If all elements are integers, delta can be 0.5.
2. The algorithm's time complexity is O(N * log2(|Vmax - Vmin| / delta)) — I am not sure how this bound is derived. A complete runnable sketch follows.
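A hedged, runnable sketch of Method Six (my own illustration; f follows the pseudocode above, while kthLargestBySearch and its exact signature are assumptions):

// Counts how many elements of arr[0..n-1] are greater than or equal to v.
int f(const double *arr, int n, double v) {
    int cnt = 0;
    for (int i = 0; i < n; i++)
        if (arr[i] >= v) cnt++;
    return cnt;
}

// Returns a value within delta of the K-th largest element of arr[0..n-1].
double kthLargestBySearch(const double *arr, int n, int k, double delta) {
    double vmin = arr[0], vmax = arr[0];
    for (int i = 1; i < n; i++) {
        if (arr[i] < vmin) vmin = arr[i];
        if (arr[i] > vmax) vmax = arr[i];
    }
    while (vmax - vmin > delta) {
        double vmid = vmin + (vmax - vmin) * 0.5;
        if (f(arr, n, vmid) >= k)
            vmin = vmid;                           // at least k elements are >= vmid
        else
            vmax = vmid;                           // fewer than k elements are >= vmid
    }
    return vmin;
}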
method Seven: traverse all the data as few times as possible. A comparatively good algorithm; recommended
idea: maintain a min-heap of size k; the heap top is the smallest of the current k largest elements, that is, the K-th largest element.
Process: for each element x in the array, compare it against the heap top.
If x is smaller than the heap top, the heap does not need to change, because x is smaller than all of the current k largest numbers.
If x is larger than the heap top, replace the top element y with x and re-adjust the heap; adjusting the heap takes O(log2 k).
Time complexity: O(N * log2 K); the algorithm only needs to scan all the data once.
Space complexity: an array of size k; only a heap of capacity K needs to be stored.
Note that in most cases the heap fits entirely in memory. If K is still too large, we can try to find the largest K' elements first, then the (K'+1)-th through (2*K')-th largest
elements, and so on (where a heap of capacity K' fits fully in memory). Each pass then finds K' numbers, and the data are iterated over several times. A sketch using a min-heap of size k follows.
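A hedged sketch of Method Seven (my own illustration, using std::priority_queue as the min-heap of size k; topKByMinHeap is an invented name):

#include <queue>
#include <vector>
#include <functional>
using namespace std;

// Maintain a min-heap of size k; its top is the K-th largest seen so far.
vector<int> topKByMinHeap(const vector<int>& data, int k) {
    priority_queue<int, vector<int>, greater<int>> heap;   // min-heap
    for (int x : data) {
        if ((int)heap.size() < k)
            heap.push(x);
        else if (x > heap.top()) {                 // x belongs among the k largest
            heap.pop();                            // evict the smallest of the current k
            heap.push(x);                          // each adjustment costs O(log k)
        }
    }
    vector<int> result;                            // heap.top() is the K-th largest
    while (!heap.empty()) { result.push_back(heap.top()); heap.pop(); }
    return result;                                 // ascending order; result.back() is the maximum
}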
method Eight: build a max-heap directly on the original array and take the first K values of this priority queue. Worth considering when the amount of data is small
idea: an unordered array can be built into a max-heap in linear time, and then the first k numbers are taken from the heap.
Building the heap takes O(n); each subsequent adjustment takes O(log n).
Complexity: O(n) + k*O(log n)
As an optimization, each adjustment does not need to take log n steps, only k steps, where this k is the same as the number of values being taken.
That is, after the heap is built, the first maximum is taken directly. Taking it destroys the max-heap, so sift down at most k steps before taking the 2nd maximum; after taking the 2nd maximum, sift down k-1 steps, and so on. Note that after each extraction the structure is no longer a full max-heap.
In the original heap method, each extraction of the maximum takes log n adjustment steps, and the structure remains a max-heap after the adjustment.
The time complexity after optimization is O(n + k^2).
Evaluation: the time complexity of both variants looks better than maintaining a min-heap of size k, but the latter still has better space complexity: it only keeps a heap of size k in memory, whereas the other two methods need to load the whole heap into memory, which is not good for processing massive data. The author July also verified this in code; in practice the two algorithms differ little in running time. A sketch of the straightforward build-and-pop version follows.
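A hedged sketch of the straightforward variant of Method Eight (my own illustration; it uses std::make_heap/pop_heap and does not reproduce the k^2 optimization described above):

#include <algorithm>
#include <vector>
using namespace std;

// Build a max-heap over the whole array in O(n), then pop the top k times at O(log n) each.
vector<int> topKByMaxHeap(vector<int> a, int k) {
    make_heap(a.begin(), a.end());                 // O(n) heap construction
    vector<int> result;
    for (int i = 0; i < k && !a.empty(); i++) {
        pop_heap(a.begin(), a.end());              // move the current maximum to the back
        result.push_back(a.back());
        a.pop_back();
    }
    return result;                                 // the k largest, in descending order
}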
extension Topics <reproduced from others, not yet summarized>
1. Suppose we need to find the K largest distinct floating-point numbers among n floating-point numbers. For example, the largest 3 distinct floating-point numbers in the array (1.5, 1.5, 2.5, 3.5, 3.5, 5, 0, -1.5, 3.5) are (5, 3.5, 2.5).
Solution: all of the above solutions apply except "find the K largest elements + counting sort + array implementation".
2. Suppose we need to find the K-th through M-th largest numbers (0 < K <= M <= n).
Answer: if the problem is treated as finding the top M numbers and keeping the last M-K+1 of them, the previous solutions apply. However, for problems like the top K, it is better to use solution 5 or solution 7, whose overall complexity is lower.
3. In search engines, every web page has an "authority" weight, such as PageRank. If we need to find the K pages with the largest weights, and the weights are updated continuously, how should the algorithm be changed to achieve rapid incremental updates and return the K most-weighted pages in a timely manner?
Tip: heap sort. Update the heap whenever a page weight is updated. Is there a better way?
Answer: to achieve rapid updates, we can use solution 8 (build a max-heap on the original array); by maintaining a mapping into the binary heap, the update operation can be done in O(logN). A sketch of such an indexed heap follows.
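A hedged sketch of the indexed-heap idea (my own illustration, not code from the original; IndexedMaxHeap and its members are invented names). The heap stores (page id, weight) pairs, and a hash map from page id to heap position lets a weight update be fixed up in O(log N) by sifting the changed node up or down.

#include <vector>
#include <unordered_map>
#include <utility>
using namespace std;

struct IndexedMaxHeap {
    vector<pair<int, double>> heap;                // (page id, weight)
    unordered_map<int, int> pos;                   // page id -> index in heap

    void swapNodes(int i, int j) {
        swap(heap[i], heap[j]);
        pos[heap[i].first] = i;
        pos[heap[j].first] = j;
    }
    void siftUp(int i) {
        while (i > 0 && heap[(i - 1) / 2].second < heap[i].second) {
            swapNodes(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }
    void siftDown(int i) {
        int n = (int)heap.size();
        while (true) {
            int l = 2 * i + 1, r = 2 * i + 2, largest = i;
            if (l < n && heap[l].second > heap[largest].second) largest = l;
            if (r < n && heap[r].second > heap[largest].second) largest = r;
            if (largest == i) break;
            swapNodes(i, largest);
            i = largest;
        }
    }
    void insert(int id, double w) {
        heap.push_back(make_pair(id, w));
        pos[id] = (int)heap.size() - 1;
        siftUp((int)heap.size() - 1);
    }
    void update(int id, double w) {                // incremental update in O(log N)
        int i = pos[id];
        double old = heap[i].second;
        heap[i].second = w;
        if (w > old) siftUp(i); else siftDown(i);
    }
};

The K most-weighted pages can then be read off by popping K times from a copy of the heap, or by any of the top-K methods above.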
4. In practical applications there is also an "accuracy" issue. We may not need the largest K elements in the strictest sense; some error is allowed near the boundary. When a user enters a query, each document d has a relevance weight f(query, d) with respect to that query. The search engine needs to return the K pages with the highest relevance weights. If each results page shows 10 documents, users will not care about the "accuracy" of the search result on page 1000, and a slight error may be acceptable; for example, we may return the 10,001-st most relevant page instead of the 9,999-th. In this case, how can the algorithm be improved to be faster and more efficient? The number of pages may be so large that a single machine cannot hold them all.
Hint: merge sort. If each machine returns its most relevant K documents, then the union of the most relevant K documents from all machines must contain the most relevant K documents of the complete collection. Since the boundary does not need to be exact, if each machine returns its best K' documents, then K' can be chosen so that, for example, the most relevant 90%*K documents we return are completely accurate, or the returned most-relevant-K documents have accuracy above 90% (more than 90% of the true top-K documents are indeed ranked within our top K), or the worst-ranked document among the returned top K is ranked no worse than 110%*K.
Answer: as the tip says, we can have each machine return its most relevant K' documents, and then use the idea of merge sort to obtain the most relevant K among all returned documents. In the best case the K documents are distributed evenly across all machines, in which case each machine only needs K' = K/n (n is the total number of machines); in the worst case all of the most relevant K documents appear on a single machine, and K' needs to be approximately K. I think a good approach is to maintain a heap on each machine and then merge the heap-top elements with a sort-merge.
5. As mentioned in item 4, each document d has relevance weights f(d, q1), f(d, q2), ..., f(d, qm) relative to the different keywords q1, q2, ..., qm. If for keyword qi we have already obtained the most relevant K documents, and the keyword qj is known to be similar to qi (documents have close weights under the two keywords), is the top-K document list for qi of any help in finding the most relevant K documents for qj?
Answer: it certainly helps. When searching for the K documents most relevant to qj, we can search part of them among the documents already found relevant to qi, and search the rest among all documents globally.