This article summarizes the English-language paper "Finding Repeated Elements". For the full paper, search for its title on Google.
The simplest form of this problem is "finding the element that appears more than n/2 times". Let's start with this simple problem and then generalize it.
1. Search for repeated elements that appear more than n/2 times
The book "The Beauty of Programming" poses the same problem under the title "Search for a poster". The idea is to delete two different elements each time; the element that remains at the end is the required one. The reasoning is as follows:
Given: n and m are positive integers, n is the length of the array, and m is the number of occurrences of the element that appears more than n/2 times, that is, m > n/2.
The following conclusions hold:
(1) Let v denote the element whose number of occurrences is greater than n/2. When two different elements are deleted and one of them is v, m decreases by 1 and n decreases by 2, and v still occurs more than half the time:
m - 1 > (n - 2)/2
Proof: m - 1 > n/2 - 1 = (n - 2)/2
(2) When two different elements are deleted and neither of them is v, m is unchanged and only n decreases by 2.
Proof: m > n/2 > (n - 2)/2, so v still occurs more than half the time.
The code is as follows:
int find(int array[], int n)
{
    int candidate = 0;  // stays 0 only when n == 0
    int count = 0;
    for (int i = 0; i < n; ++i) {
        if (count == 0) {
            candidate = array[i];
            count = 1;
        } else if (candidate == array[i]) {
            ++count;
        } else {
            --count;  // drop array[i] together with one copy of candidate
        }
    }
    return candidate;
}
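The scan above only produces a candidate: when no element occurs more than n/2 times, the returned value is arbitrary. A minimal sketch of a checked version follows, assuming a second counting pass over the array is acceptable. The names findCandidate and findMajority are illustrative and do not come from the paper or the book.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Pairing scan: returns the only value that can occur more than n/2 times.
// Precondition (an assumption of this sketch): the array is not empty.
int findCandidate(const std::vector<int>& a) {
    assert(!a.empty());
    int candidate = a[0];
    int count = 0;
    for (int x : a) {
        if (count == 0) { candidate = x; count = 1; }
        else if (x == candidate) ++count;
        else --count;
    }
    return candidate;
}

// Second pass: confirm that the candidate really occurs more than n/2 times.
std::optional<int> findMajority(const std::vector<int>& a) {
    if (a.empty()) return std::nullopt;
    const int candidate = findCandidate(a);
    std::size_t occurrences = 0;
    for (int x : a) {
        if (x == candidate) ++occurrences;
    }
    if (2 * occurrences > a.size()) return candidate;
    return std::nullopt;
}
```

For the array {1, 1, 2, 3} the scan ends with candidate 1, but 1 occurs only twice out of four, so the checked version reports that no element qualifies.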
The follow-up exercise in "The Beauty of Programming" is "finding the elements that appear more than n/4 times". The idea is to delete four different elements each time; the (at most) three elements that remain are the candidates, but each of them must still be verified against the condition. I will not explain it in detail. In fact, the method described in "The Beauty of Programming" is the multiset algorithm mentioned later in this article.
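The n/4 exercise can be sketched as follows. This is one possible reading of "delete four different elements each time", using three counter slots and a verification pass; the function name moreThanQuarter and the slot layout are assumptions of this sketch, not the book's solution.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the values that occur more than n/4 times (at most three can exist).
std::vector<int> moreThanQuarter(const std::vector<int>& a) {
    std::array<int, 3> value{};    // candidate values
    std::array<int, 3> count{};    // count[j] == 0 means slot j is free

    // Pass 1: conceptually delete four different elements at a time.
    for (int x : a) {
        int match = -1;
        int emptySlot = -1;
        for (int j = 0; j < 3; ++j) {
            if (count[j] > 0 && value[j] == x) match = j;
            else if (count[j] == 0 && emptySlot < 0) emptySlot = j;
        }
        if (match >= 0) {
            ++count[match];
        } else if (emptySlot >= 0) {
            value[emptySlot] = x;
            count[emptySlot] = 1;
        } else {
            for (int& c : count) --c;   // x and one copy of each candidate are dropped
        }
    }

    // Pass 2: the survivors are only candidates, so count them for real.
    std::vector<int> result;
    for (int j = 0; j < 3; ++j) {
        if (count[j] == 0) continue;
        std::size_t occurrences = 0;
        for (int x : a) {
            if (x == value[j]) ++occurrences;
        }
        if (4 * occurrences > a.size()) result.push_back(value[j]);
    }
    assert(result.size() <= 3);   // more than three values cannot each exceed n/4
    return result;
}
```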
First, let's look at the non-multiset algorithm. The variables used are as follows:
B[]: the array of elements
i: the array index
v: the candidate value being searched for. That is, after all n elements have been traversed, v is the only value whose number of occurrences can be greater than n/2.
c: twice the maximum number of occurrences of v; that is, the number of occurrences of v is less than or equal to c/2.
Initialization: i = 0, v = null, c = 0.
Traverse the array B[]. When the traversal reaches B[i], two cases change the state (in every other case only i advances):
(1) B[i] = v
Because B[i] = v, the number of occurrences of v increases by 1. To maintain the meaning of c, c/2 must increase by 1, so c increases by 2.
(2) B[i] != v and c = i
In this case, we want to show that no element of B[0..i-1] appears more than i/2 times, so v must be updated.
First, consider the relationship between c and i in the previous iteration. c increases by 2 at a time and i increases by 1 at a time. For c = i to hold in the current iteration, i must have increased by 1 in the previous iteration while c did not. So c = i + 1 held before the previous iteration, and c = i held at its end.
Next, let m be the number of occurrences of v. Before the previous iteration, m > i/2 (taking v to be a majority of the prefix at that point) and m <= c/2 = (i + 1)/2, which gives i/2 < m <= (i + 1)/2. Here i is odd (c = i + 1 and c is even), so m can only be (i + 1)/2, that is, m = c/2.
Now the current iteration sees c = i. In B[0..i-1], v occurs c/2 times, and the elements other than v occur i - m = i - c/2 = i - i/2 = i/2 times in total. So no element of B[0..i-1] occurs more than i/2 times, and v must be updated. The most likely new candidate is B[i], although B[i] may turn out not to be the majority element either.
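The argument above can be checked mechanically on small inputs. The sketch below runs the same scan and asserts, by brute-force counting, the two facts the text relies on: v never occurs more than c/2 times in the processed prefix, and whenever c = i no element of B[0..i-1] occurs more than i/2 times. The function name scanWithChecks and the use of -1 for null are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Runs the c/i scan with brute-force checks of the reasoning; returns the final v.
int scanWithChecks(const std::vector<int>& b) {
    const int n = static_cast<int>(b.size());
    int v = -1;   // assumption: -1 stands in for "null"
    int c = 0;
    for (int i = 0; i < n; ++i) {
        if (b[i] == v) {
            c += 2;
        } else if (c == i) {
            // Claim from the text: nothing occurs more than i/2 times in b[0..i-1].
            for (int j = 0; j < i; ++j) {
                const auto occ = std::count(b.begin(), b.begin() + i, b[j]);
                assert(2 * occ <= i);
            }
            v = b[i];
            c += 2;
        }
        // Meaning of c: v occurs at most c/2 times in b[0..i].
        const auto occV = std::count(b.begin(), b.begin() + i + 1, v);
        assert(2 * occV <= c);
    }
    return v;
}
```

The checks cost far more than the scan itself, so this form is only useful for testing the reasoning, not as a replacement for the linear-time loop.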
Code:
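#include <iostream>
using namespace std;

int main()
{
    const int N = 9;
    int b[N] = {2, 3, 3, 2, 2, 2, 1, 1, 2};
    int v = -1, c = 0, i = 0;  // -1 stands in for null
    while (i != N) {
        if (v == b[i]) {
            c += 2;
        } else if (c == i) {
            v = b[i];  // no majority in b[0..i-1]
            c += 2;
        }
        ++i;
    }
    cout << v << endl;  // prints 2
}
If no element meets the condition, v still holds some candidate value when the loop ends (not necessarily the last number), so a second pass that counts the occurrences of v is needed to confirm it.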
2. Search for elements that appear more than n/k times
Given 2 <= k <= n and an array B[0..n-1], we need to find the elements that appear more than n/k times.
In problem 1, the array is in effect divided into two sets: one contains only the single element v (or null), and the other contains the elements whose occurrences are less than or equal to n/2. Similarly, here we divide the array into two sets: one contains the elements that may appear more than n/k times, and the other contains the elements whose occurrences are less than or equal to n/k.
The related variables are declared as follows:
v: an element whose number of occurrences may be greater than n/k
c: v cannot appear more than c/k times
t: a set of pairs of the form (v, c)
s: every element whose occurrences are bounded (those not in t) occurs at most s/k times
The algorithm has two phases: first build the set t, then verify that the elements in t meet the condition. Only the first phase is described here. The complexity of the second phase is O(n * log(|t|)).
i = 0, s = 0, t = {};
while (i < n) {
    find the pair (v_j, c_j) in t with v_j = B[i], where j is its subscript; if no such pair exists, j = 0;
    if (j == 0 && s + k <= i + 1) {
        s = s + k;
    } else if (j == 0 && s + k > i + 1) {
        add (B[i], s + k) to the set t;
    } else {  // j != 0
        c_j = c_j + k;
    }
    i = i + 1;
    delete all pairs in t that satisfy c = i; if any pair is deleted, set s = i;
}
The three branches of the if statement are explained below:
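The first phase could be written in C++ roughly as follows. This is a sketch under stated assumptions: t is kept in two ordered containers (one keyed by value, one keyed by c) so that both the lookup and the "delete all pairs with c = i" step take logarithmic time, and i is advanced exactly once per element. The name selectCandidates and the container layout are choices of this sketch, not taken from the paper.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Phase one only: returns the values left in t, in increasing order.
// Every value occurring more than n/k times is among them, but a returned
// value does not necessarily qualify; phase two must still verify it.
std::vector<int> selectCandidates(const std::vector<int>& b, int k) {
    assert(k >= 2);
    const int n = static_cast<int>(b.size());
    std::map<int, int> cOf;               // v -> c for each pair (v, c) in t
    std::set<std::pair<int, int>> byC;    // the same pairs, ordered as (c, v)
    int s = 0;

    int i = 0;
    while (i < n) {
        auto it = cOf.find(b[i]);
        if (it == cOf.end()) {
            if (s + k <= i + 1) {
                s += k;                               // case (1): b[i] stays outside t
            } else {
                cOf.emplace(b[i], s + k);             // case (2): add (b[i], s + k) to t
                byC.emplace(s + k, b[i]);
            }
        } else {
            byC.erase(std::make_pair(it->second, it->first));
            it->second += k;                          // case (3): c_j = c_j + k
            byC.emplace(it->second, it->first);
        }
        ++i;

        // Delete every pair with c == i; if any pair was deleted, set s = i.
        bool deleted = false;
        while (!byC.empty() && byC.begin()->first == i) {
            cOf.erase(byC.begin()->second);
            byC.erase(byC.begin());
            deleted = true;
        }
        if (deleted) s = i;
    }

    std::vector<int> candidates;
    for (const auto& entry : cOf) candidates.push_back(entry.first);
    return candidates;
}
```

With B = {1, 1, 2, 2, 3} and k = 3, the sketch returns 1, 2 and 3: the values 1 and 2 really occur more than 5/3 times, while 3 is a false candidate that only the second phase can reject.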
(1) j = 0 and s + k <= i + 1
The element v = B[i] was not found in t, so we must decide whether B[i] should be placed in t. If B[i] should not be placed in t, the maximum number of times B[i] can have appeared after this iteration, (s + k)/k, satisfies (s + k)/k <= (i + 1)/k, which gives the condition s + k <= i + 1.
(2) j = 0 and s + k > i + 1
If B[i] should be placed in t, the maximum number of times B[i] can have appeared after this iteration satisfies (s + k)/k > (i + 1)/k, which gives the condition s + k > i + 1.
(3) j != 0
The element v = B[i] was found, so c_j is updated. Because the number of occurrences of v_j increases by 1, c_j/k must also increase by 1, that is, c_j = c_j + k.
After B[i] has been processed, the c value of some pairs in t may have expired. What is the expiration condition? A pair stays in t only while c/k >= (number of occurrences of v) > i/k, that is, c/k > i/k, which simplifies to c > i. i keeps increasing while c may not, so a pair expires when c = i, and such pairs must be deleted from t. Why does s need to be updated at that point? I haven't figured it out yet.
The complexity of the first phase of the algorithm is O(n * log(|t|)). How many elements can t hold at most? The maximum value of |t| is k * log(k). Since both phases cost O(n * log(|t|)), substituting that bound gives an overall complexity of O(n * log(k * log(k))), which is O(n * log(k)).
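The second phase, which is only mentioned above, can be sketched as a single counting pass. The sketch assumes the candidates from phase one are handed over as a plain vector and uses an ordered map so that each lookup costs O(log(|t|)); the name verifyCandidates is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Phase two: keep only the candidates that occur more than n/k times in b.
std::vector<int> verifyCandidates(const std::vector<int>& b,
                                  const std::vector<int>& candidates, int k) {
    assert(k >= 2);
    std::map<int, std::size_t> occurrences;
    for (int v : candidates) occurrences[v] = 0;

    // One pass over b; values that are not candidates are ignored.
    for (int x : b) {
        auto it = occurrences.find(x);
        if (it != occurrences.end()) ++it->second;
    }

    std::vector<int> result;
    for (const auto& entry : occurrences) {
        // occurrences > n/k, written without division
        if (entry.second * static_cast<std::size_t>(k) > b.size()) {
            result.push_back(entry.first);
        }
    }
    return result;
}
```

For B = {1, 1, 2, 2, 3}, k = 3 and the candidates 1, 2 and 3, the pass keeps 1 and 2 and rejects 3, which occurs only once.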