Finding repeated elements that occur more than n/k times: the non-multiset algorithm

Source: Internet
Author: User

This article is a summary of an English paper, "Finding Repeated Elements". For more information, search for the paper by its title.

The simplest form of this problem is "finding the repeated element that appears more than n/2 times". Let's start with this simple problem and then generalize it.

1. Search for repeated elements that appear more than n/2 times

The book "Beauty of Programming" contains the same problem, under the title "Search for a poster". The idea is to repeatedly delete two different elements; whatever remains at the end is the required element. The reasoning is as follows:

Given: n and m are positive integers, where n is the length of the array and m is the number of occurrences of the element that appears more than n/2 times, that is, m > n/2.

Let v denote the element that appears more than n/2 times. Two conclusions follow:

(1) When two different elements are deleted and one of them is v, m decreases by 1 and n decreases by 2, and v still appears more than half the time:

m - 1 > (n - 2)/2

Proof: m - 1 > n/2 - 1 = (n - 2)/2

(2) When two different elements are deleted and neither of them is v, m is unchanged and n decreases by 2, and v still appears more than half the time:

m > (n - 2)/2

Proof: m > n/2 > (n - 2)/2. This conclusion is clear.
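The two cases can also be written side by side. This is only a restatement of the argument above; the primed symbols m' and n' are introduced here (they are not in the source) to denote the values after one deletion step.

```latex
\begin{align*}
\text{(1) one deleted element is } v:\quad
  & m' = m - 1,\ n' = n - 2,
  & m' &= m - 1 > \tfrac{n}{2} - 1 = \tfrac{n - 2}{2} = \tfrac{n'}{2},\\
\text{(2) neither deleted element is } v:\quad
  & m' = m,\ n' = n - 2,
  & m' &= m > \tfrac{n}{2} > \tfrac{n - 2}{2} = \tfrac{n'}{2}.
\end{align*}
```

In both cases m' > n'/2, so v still appears more than half the time in what remains.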

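The code is as follows:

    // Returns the candidate for "appears more than n/2 times".
    // The result is only guaranteed to be correct if such an element exists.
    int find(int array[], int n)
    {
        int candidate = 0;  // overwritten on the first iteration when n > 0
        int count = 0;
        for (int i = 0; i < n; ++i)
        {
            if (count == 0)
            {
                // All earlier elements have been paired off; start a new candidate.
                candidate = array[i];
                count = 1;
            }
            else if (candidate == array[i])
                ++count;
            else
                --count;  // pair array[i] with one copy of the candidate and delete both
        }
        return candidate;
    }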

The follow-up exercise in "Beauty of Programming" is "Finding elements that appear more than n/4 times". The idea is to delete four different elements each time; the (at most) three elements that remain are the candidates, and each of them must then be verified against the condition. I will not explain it in detail. In fact, the method described in "Beauty of Programming" is the "multiset" algorithm mentioned later in this article.

First, let's talk about the non-multiset algorithm. The variables used are as follows:
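The "delete k different elements each time" idea can be sketched as below. This is a minimal sketch under stated assumptions, not the book's or the paper's code: the function names candidatesMoreThanNOverK and elementsMoreThanNOverK are hypothetical, k >= 2 is assumed, and the usual formulation with at most k-1 counters is used. With k = 4 it leaves at most three candidates, which a second pass then verifies.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Phase 1: keep at most k-1 (value, counter) pairs. When a new value arrives
// and all k-1 slots are taken, every counter is decremented instead; this
// has the same effect as deleting k different elements from the array.
std::vector<int> candidatesMoreThanNOverK(const std::vector<int>& a, int k) {
    std::map<int, int> counters;  // value -> counter, at most k-1 entries
    for (int x : a) {
        auto it = counters.find(x);
        if (it != counters.end()) {
            ++it->second;
        } else if (static_cast<int>(counters.size()) < k - 1) {
            counters[x] = 1;
        } else {
            // Delete x together with one copy of each stored value.
            for (auto p = counters.begin(); p != counters.end();) {
                if (--p->second == 0) p = counters.erase(p);
                else ++p;
            }
        }
    }
    std::vector<int> result;
    for (const auto& kv : counters) result.push_back(kv.first);
    return result;
}

// Phase 2: count each candidate and keep those occurring more than n/k times.
std::vector<int> elementsMoreThanNOverK(const std::vector<int>& a, int k) {
    std::vector<int> result;
    for (int v : candidatesMoreThanNOverK(a, k)) {
        std::size_t count = 0;
        for (int x : a) {
            if (x == v) ++count;
        }
        // count > n/k, written without division to avoid rounding.
        if (count * static_cast<std::size_t>(k) > a.size()) result.push_back(v);
    }
    return result;
}
```

For example, calling elementsMoreThanNOverK on {1, 2, 1, 3, 1, 2, 2, 4} with k = 4 keeps 1 and 2, each of which appears three times out of eight.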

b[]: the array of elements

i: the array index

v: the candidate value. After i elements have been traversed, v is the only value that can occur more than i/2 times in b[0..i-1]. In particular, after all n elements have been traversed, v is the required element if one exists.

c: twice an upper bound on the number of occurrences of v; that is, v occurs at most c/2 times in b[0..i-1].

Initialization: i = 0, v = null, c = 0.

Traverse the array b[]. When b[i] is visited, two situations require an update (in every other situation only i is incremented):

(1) b[i] = v

Because b[i] = v, the number of occurrences of v increases by 1. To preserve the meaning of c, c/2 must increase by 1, so c needs to increase by 2.

(2) b[i] != v and c = i

In this case we want to show that no element of b[0..i-1] occurs more than i/2 times, so v must be updated.

First, consider the relationship between c and i in the previous step. Whenever c changes it increases by 2, while i increases by 1 in every step, so c never falls below i. For c = i to hold in this step (with i > 0), the previous step must have increased i by 1 without increasing c. Therefore c = i + 1 held before the previous step (with i denoting the index at that time), and c = i holds at the end of it.

Next, let m be the number of occurrences of v in b[0..i-1]. By the meaning of c, m <= c/2 = i/2, so v does not occur more than i/2 times. Every element other than v is already known not to occur more than i/2 times, because v is the only value that may. Hence no element of b[0..i-1] occurs more than i/2 times, and v must be updated. The natural choice is b[i]: it occurs at most i/2 + 1 = (c + 2)/2 times in b[0..i], which is why c is also increased by 2. Note that b[i] is only a candidate; it may turn out not to be the required element.

Code:

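    #include <iostream>
    using namespace std;

    int main()
    {
        const int N = 9;
        int b[N] = {2, 3, 3, 2, 2, 2, 1, 1, 2};
        int v = -1, c = 0, i = 0;  // v = -1 stands in for "null"
        while (i != N)
        {
            if (v == b[i])
            {
                c += 2;    // case (1): one more occurrence of v
            }
            else if (c == i)
            {
                v = b[i];  // case (2): no value exceeds i/2 so far, adopt b[i]
                c += 2;
            }
            ++i;
        }
        cout << v << endl;  // prints 2
        return 0;
    }

If no element occurs more than n/2 times, v still holds some candidate at the end (not necessarily the last number in the array), so a second pass that counts the occurrences of v is needed to confirm the result.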
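Because the scan only produces a candidate, it can be followed by a counting pass. The sketch below is one way to combine the two; the function names majorityCandidate and findMajority and the bool-plus-output-parameter interface are assumptions made for this example, not part of the source.

```cpp
#include <cstddef>
#include <vector>

// Pass 1: the c/i scan described above. Returns the surviving candidate;
// the return value is meaningless for an empty array.
int majorityCandidate(const std::vector<int>& b) {
    int v = 0;          // arbitrary start value; replaced at i == 0 if needed
    std::size_t c = 0;  // v occurs at most c/2 times in b[0..i-1]
    for (std::size_t i = 0; i < b.size(); ++i) {
        if (v == b[i]) {
            c += 2;     // one more occurrence of the candidate
        } else if (c == i) {
            v = b[i];   // no value exceeds i/2 so far: adopt b[i]
            c += 2;
        }
    }
    return v;
}

// Pass 2: count the candidate and accept it only if it occurs more than
// n/2 times. On success the value is stored in out.
bool findMajority(const std::vector<int>& b, int& out) {
    if (b.empty()) return false;
    const int v = majorityCandidate(b);
    std::size_t count = 0;
    for (int x : b) {
        if (x == v) ++count;
    }
    if (2 * count <= b.size()) return false;  // v is not a strict majority
    out = v;
    return true;
}
```

On the array {2, 3, 3, 2, 2, 2, 1, 1, 2} this reports 2, while on {1, 1, 2, 3} the candidate 1 is rejected because it occurs exactly n/2 times.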

2. Search for elements that appear more than n/k times

Given 2 <= k <= n and an array b[0..n-1], we need to find the elements that appear more than n/k times.

In problem 1, the array is in effect divided into two sets: one contains only the single element v (or null), and the other contains the elements that occur at most n/2 times. We extend this idea in the same way: one set contains the elements that may appear more than n/k times, and the other contains the elements that occur at most n/k times.

The related variables are as follows:

v: an element whose number of occurrences may be greater than n/k

c: v occurs at most c/k times

t: a set of pairs of the form (v, c)

s: every element that is not in t occurs at most s/k times

The algorithm has two phases: the first selects the set t, and the second verifies which elements of t actually meet the condition. Only the first phase is described here. The complexity of the second phase is O(n log |t|).
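Although the second phase is not described here, it amounts to counting. A minimal sketch follows, assuming the candidates are handed over as a plain vector; the name verifyCandidates and its interface are hypothetical. A std::map holds one counter per candidate, so each of the n lookups costs O(log |t|).

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Second phase: keep only the candidates that really occur more than
// n/k times in b. Assumes k >= 2.
std::vector<int> verifyCandidates(const std::vector<int>& b, int k,
                                  const std::vector<int>& candidates) {
    std::map<int, std::size_t> count;  // one counter per candidate
    for (int v : candidates) count[v] = 0;
    for (int x : b) {                  // n lookups, O(log |t|) each
        auto it = count.find(x);
        if (it != count.end()) ++it->second;
    }
    std::vector<int> result;
    for (const auto& kv : count) {
        // occurrences > n/k, written without division to avoid rounding
        if (kv.second * static_cast<std::size_t>(k) > b.size()) {
            result.push_back(kv.first);
        }
    }
    return result;
}
```

The result comes back in ascending order because std::map iterates its keys in sorted order.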

    i = 0; s = 0; t = {};
    while (i < n) {
        find the pair (v_j, c_j) in t with v_j = b[i], where j is its index;
        if no such pair exists, j = 0;
        if (j == 0 && s + k <= i + 1) {
            s = s + k;
        } else if (j == 0 && s + k > i + 1) {
            add (b[i], s + k) to t;
        } else {  // j != 0
            c_j = c_j + k;
        }
        i = i + 1;
        delete from t every pair with c = i; if any pair was deleted, set s = i;
    }

The three branches of the if statement are explained below:

(1) j = 0 and s + k <= i + 1

No pair with v = b[i] was found, so we must decide whether the current element b[i] should be placed in the set t. If b[i] should not be placed in t, then the upper bound on the occurrences of b[i] after this iteration does not exceed (i + 1)/k: (s + k)/k <= (i + 1)/k, which gives the condition s + k <= i + 1.

(2) j = 0 and s + k > i + 1

If b[i] should be placed in t, then the upper bound on the occurrences of b[i] after this iteration exceeds (i + 1)/k: (s + k)/k > (i + 1)/k, which gives the condition s + k > i + 1.

(3) j != 0

A pair with v_j = b[i] was found, so the value of c_j is updated. Because the number of occurrences of v_j increases by 1, c_j/k must also increase by 1, that is, c_j = c_j + k.

After b[i] has been processed, the c value of some pairs in t may have expired. What is the expiration condition? A pair (v, c) is worth keeping only while v may still occur more than i/k times, that is, while c/k > i/k, which reduces to c > i. Since i increases in every step while c may not, c can eventually equal i; at that point the pair has expired and must be deleted from t. Why does s need to be updated? A likely reason is that a deleted v becomes an element outside t, and it may have occurred up to c/k = i/k times, so s must be raised to i for the bound s/k to remain valid.
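The first phase can be sketched in C++ as follows. This is an illustration written for this summary rather than code from the paper: the name phaseOneCandidates is hypothetical, t is stored in a std::map from v to c, k >= 2 is assumed, and expired pairs are found by scanning all of t after every step. That linear scan keeps the sketch short but costs O(|t|) per step, so it does not reach the complexity bound stated for the algorithm.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// First phase only: returns the values left in t. Every value occurring
// more than n/k times is in the result, but the result may also contain
// values that do not, so a verification pass is still required.
std::vector<int> phaseOneCandidates(const std::vector<int>& b, std::size_t k) {
    std::map<int, std::size_t> t;  // v -> c: v occurs at most c/k times so far
    std::size_t s = 0;             // values outside t occur at most s/k times
    std::size_t i = 0;
    while (i < b.size()) {
        auto it = t.find(b[i]);
        if (it != t.end()) {
            it->second += k;       // case (3): one more occurrence of v_j
        } else if (s + k <= i + 1) {
            s += k;                // case (1): b[i] stays outside t
        } else {
            t[b[i]] = s + k;       // case (2): b[i] becomes a candidate
        }
        ++i;
        // Delete expired pairs (c == i); raise s to i if anything was deleted.
        bool deleted = false;
        for (auto p = t.begin(); p != t.end();) {
            if (p->second == i) {
                p = t.erase(p);
                deleted = true;
            } else {
                ++p;
            }
        }
        if (deleted) s = i;
    }
    std::vector<int> result;
    for (const auto& kv : t) result.push_back(kv.first);
    return result;
}
```

With k = 2 and the array {2, 3, 3, 2, 2, 2, 1, 1, 2} from problem 1, the pair for 2 is deleted at i = 8 (its c has fallen to 8) and re-added by the last element, so the only value left in t is 2.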

The complexity of the first phase of the algorithm is O(n log |t|). How many elements can t hold at most? The maximum value of |t| is k log k. Substituting this bound, the overall complexity of the algorithm is O(n log(k log k)) = O(n log k).

 
