How can we select k elements from n elements with equal probability? This is the reservoir sampling problem. The algorithm can be described as follows:
Init: a reservoir of size k holding the first k elements
For i = k + 1 to N
    M = random(1, i);
    If (M <= k)
        Swap the M-th value and the i-th value
End
The following proof was found online and is reposted here:
Proof:
The i-th element (for i > k) is selected with probability k/i.
For example, if k is 1000, element 1001 is selected with probability 1000/1001 and element 1002 with probability 1000/1002, which matches our intuition.
Now the proof:
Suppose the current element is element i + 1. By our rule, it is selected with probability k/(i + 1); that is, element i + 1 appears in the reservoir with probability k/(i + 1).
Now consider the first i elements. If each of them also appears in the reservoir with probability k/(i + 1), the algorithm is correct.
We prove this by induction on i, for k < i <= N:
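1. When i = k + 1: the reservoir holds k elements, and element k + 1 is selected with probability k/(k + 1). A given element among the first k is evicted only if element k + 1 is selected and the chosen slot is that element's slot, which happens with probability k/(k + 1) * 1/k = 1/(k + 1). So each of the first k elements stays in the reservoir with probability k/(k + 1), and the claim holds.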
2. Assume the claim holds for j = i. In that case element i is selected with probability k/i, and each of the first i - 1 elements is in the reservoir with probability k/i.
We verify the claim for j = i + 1.
That is, when element i + 1 is selected with probability k/(i + 1), each of the first i elements is in the reservoir with probability k/(i + 1).
An element among the first i is in the reservoir after step i + 1 exactly when two things hold: ① it was already in the reservoir before element i + 1 was processed, and ② it is not replaced by element i + 1.
①. By the inductive hypothesis in step 2, before element i + 1 is processed, each of the first i elements is in the reservoir with probability k/i.
②. Consider the probability of replacement.
A given reservoir element is replaced only if element i + 1 is selected (otherwise nothing is replaced), which has probability k/(i + 1), and the randomly chosen slot is that element's slot, which has probability 1/k among the k slots.
Probability that a given element among the first i is replaced = k/(i + 1) * 1/k = 1/(i + 1)
Probability that it is not replaced = 1 - 1/(i + 1) = i/(i + 1)
Combining ① and ② with the multiplication rule (the random draw at step i + 1 is independent of the earlier draws):
Probability that a given element among the first i is in the reservoir = k/i * i/(i + 1) = k/(i + 1).
This completes the proof.
I recently came across several other methods for the sampling problem. Here is a summary:
Problem: select m elements from 1, 2, 3, ..., N with equal probability.
1. Reservoir sampling, as described above:
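#include <cstdlib>
#include <iostream>
#include <utility>
using namespace std;

void sample_pool(const int N, const int m) {
    int *x = new int[N];
    for (int i = 0; i < N; i++) x[i] = i + 1;
    for (int i = m; i < N; i++) {
        int rd = rand() % (i + 1);      // x[i] is the (i+1)-th element seen
        if (rd < m) swap(x[i], x[rd]);  // kept with probability m/(i+1)
    }
    for (int i = 0; i < m; i++) cout << x[i] << " ";
    delete[] x;
}  // both space and time are O(N)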
2. To select m out of N, first decide whether the first element is selected (with probability m/N), then select the rest (m - 1 or m elements) from the remaining N - 1.
#include <cstdlib>
#include <iostream>
using namespace std;

void sample_rand(const int N, const int m) {
    int select = m, remain = N;
    for (int i = 0; i < N; i++) {
        if (rand() % remain < select) {  // choose i with probability select/remain
            cout << i << " ";
            select--;
        }
        remain--;
    }
}
This method is a classic, given by Knuth in The Art of Computer Programming. It uses O(1) extra space and O(N) time. Its equal-probability property is also easy to prove: working through the probabilities one element at a time shows that each element is selected with probability m/N. In addition, exactly m elements are selected in the end. If too few have been selected earlier, then once remain equals select, every remaining element is selected, because rand() % remain < select always holds. Once select reaches 0, no further element can be selected.
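As a worked instance of that element-by-element argument, here are the first two elements. The second case conditions on whether the first element was taken:

```latex
P(\text{element 1 selected}) = \frac{m}{N}

P(\text{element 2 selected})
  = \frac{m}{N}\cdot\frac{m-1}{N-1} + \frac{N-m}{N}\cdot\frac{m}{N-1}
  = \frac{m(m-1) + (N-m)\,m}{N(N-1)}
  = \frac{m(N-1)}{N(N-1)}
  = \frac{m}{N}
```

The same conditioning, repeated for later elements, gives m/N for every position.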
3. If we view the sample as a set, the task is to pick m distinct elements out of N and store them in a set, which takes care of rejecting duplicates.
Using std::set from the STL:
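#include <cstdlib>
#include <iostream>
#include <set>
using namespace std;

// Requires m <= N; otherwise the loop never terminates
void sample_set(const int N, const int m) {
    set<int> s;
    while (s.size() < (size_t)m) s.insert(rand() % N);  // duplicates are ignored by the set
    for (set<int>::iterator it = s.begin(); it != s.end(); it++) cout << *it << " ";
}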
4. Shuffle an increasing sequence.
For i in [0, N)
    Swap(x[i], x[rand(i, N-1)])
It has been shown that shuffling only the first m positions is enough: after the first m steps, x[0..m-1] is already a uniformly random m-element subset.
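#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <utility>
using namespace std;

void sample_shuf(const int N, const int m) {
    int *x = new int[N];
    for (int i = 0; i < N; i++) x[i] = i + 1;
    for (int i = 0; i < m; i++) {
        int j = i + rand() % (N - i);  // uniform index in [i, N-1]
        swap(x[i], x[j]);
    }
    sort(x, x + m);  // print the sample in increasing order
    for (int i = 0; i < m; i++) cout << x[i] << " ";
    delete[] x;
}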
Some related questions about sampling:
1. Given a random number generator that produces integers uniformly in [1, 5], how can you use it to build one that produces integers uniformly in [1, 7]?
Answer: use rejection sampling.
First, call the [1, 5] generator twice and treat the two results as the digits of a base-5 number: x = 5 * gen() + gen(). Since each call returns a value in [1, 5], x ranges from 6 to 30. Subtracting 5 shifts it into 1 to 25, and each value is equally likely. Then divide these 25 values evenly among the 7 outputs. Because 21 is a multiple of 7, every three consecutive values can map to one output (1-3 → 1, 4-6 → 2, ..., 19-21 → 7), so each output has the same probability. (You could instead keep only the values 1 to 7 and discard the rest, but then most draws are wasted and the method is inefficient.) If a value between 22 and 25 is generated, there are two ways to handle it:
(1) reject it and draw again (rejection sampling);
(2) reuse this generator's previous output. It has been shown that each output is still equally likely, which is the idea behind the Metropolis algorithm. Consecutive outputs, however, are then no longer independent.
Digit pair (a b)     Decimal value 5a + b     Shifted (minus 5)
11 12 13 14 15       6  7  8  9  10           1  2  3  4  5
21 22 23 24 25       11 12 13 14 15           6  7  8  9  10
31 32 33 34 35       16 17 18 19 20           11 12 13 14 15
41 42 43 44 45       21 22 23 24 25           16 17 18 19 20
51 52 53 54 55       26 27 28 29 30           21 22 23 24 25
2. Generate a random permutation of a deck of cards
Answer:
Going from back to front, at step k generate a random number j in [1, k] and swap the elements at positions j and k. It is easy to show that every permutation is then equally likely.
For k = N down to 1
    j = rand(1, k)
    Swap(x[j], x[k])
End
The process can also run from front to back, but then j is drawn from [k, N].
For k = 1 to N
    j = rand(k, N)
    Swap(x[j], x[k])
End