[Link to this article]
http://www.cnblogs.com/hellogiser/p/reservoir-sampling.html
This problem comes from Problem 10 of Column 12 of Programming Pearls, which reads:
How could you select one of n objects at random, where you see the objects sequentially but you do not know the value of n beforehand? For concreteness, how would you read a text file, and select and print one random line, when you don't know the number of lines in advance?
(1) How can we randomly select one line from a file without knowing the total number of lines n in advance?
Solution: keep the first line as the current choice. When the second line is read, make it the current choice with probability 1/2; when the third line is read, with probability 1/3; in general, when the i-th line is read, make it the current choice with probability 1/i. When the file ends, output the current choice. Each of the n lines is then selected with probability 1/n.
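The procedure above can be sketched in Python as follows. This is a minimal illustration, not the author's code: the function name sample_one and the rng parameter are hypothetical, and any iterable (for example an open text file, which yields lines) stands in for the stream of objects.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")

def sample_one(items: Iterable[T], rng: random.Random) -> Optional[T]:
    """Return one element of `items` chosen uniformly at random in a single pass.

    Hypothetical helper for illustration; returns None if `items` is empty.
    """
    chosen: Optional[T] = None
    for i, item in enumerate(items, start=1):
        # randrange(i) is uniform over 0..i-1, so this test succeeds with
        # probability 1/i (always for i == 1): the i-th item replaces the choice.
        if rng.randrange(i) == 0:
            chosen = item
    return chosen
```

For the file case, passing an open text file object as items returns one of its lines (including the trailing newline) chosen uniformly at random, without counting the lines first.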
Let P(i) denote the probability that line i becomes the current choice at the moment it is read. For a file with three lines:
P(1) = 1
P(2) = 1/2
P(3) = 1/3
Line 1 is the final choice only if it is chosen and then neither line 2 nor line 3 replaces it; line 2 is the final choice only if it is chosen and line 3 does not replace it; line 3 only needs to be chosen:
P(1) all = P(1) * (1 - P(2)) * (1 - P(3)) = 1 * 1/2 * 2/3 = 1/3
P(2) all = P(2) * (1 - P(3)) = 1/2 * 2/3 = 1/3
P(3) all = P(3) = 1/3
Proof:
Line 1 is the final choice if it is chosen, line 2 is not chosen, line 3 is not chosen, ..., and line n is not chosen:
P(1) all = 1 * (1 - 1/2) * (1 - 1/3) * ... * (1 - 1/n) = 1 * 1/2 * 2/3 * ... * (n-1)/n = 1/n
Likewise, for any 1 <= m <= n, line m is the final choice if it is chosen and none of lines m+1, m+2, ..., n is chosen (for m = n there are no later lines, so P(n) all = 1/n directly):
P(m) all = 1/m * [1 - 1/(m+1)] * [1 - 1/(m+2)] * ... * [1 - 1/n] = 1/m * m/(m+1) * (m+1)/(m+2) * ... * (n-1)/n = 1/n
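The telescoping product can also be checked with exact rational arithmetic. The sketch below simply evaluates P(m) all = 1/m * [1 - 1/(m+1)] * ... * [1 - 1/n] using Python's fractions module; the function name final_probability is hypothetical.

```python
from fractions import Fraction

def final_probability(m: int, n: int) -> Fraction:
    """Exact probability that line m (1-based) is the final choice among n lines.

    Line m is chosen when read with probability 1/m, and each later line j
    (m < j <= n) fails to replace it with probability 1 - 1/j.
    """
    p = Fraction(1, m)
    for j in range(m + 1, n + 1):
        p *= 1 - Fraction(1, j)
    return p

# Every line of a 10-line file should come out at exactly 1/10.
print([final_probability(m, 10) for m in range(1, 11)])
```

Because Fraction arithmetic is exact, equality with 1/n can be tested directly rather than within a floating-point tolerance.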
(2) How can we randomly select k items from a sample space that is very large or of unknown size?
You are given a linked list of length N. N is large, but its exact value is unknown. Your task is to randomly select k of the N elements while traversing the list only once. The algorithm must return exactly k elements, and every element must have the same probability of being selected.
Solution: keep the first k elements. Then, for each element i from k+1 to N, select it with probability k/i; if it is selected, it replaces one of the k currently kept elements, chosen uniformly at random. After a single traversal we hold exactly k elements, and the selection is uniformly random.
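A minimal Python sketch of this procedure follows. It assumes the linked list can be traversed as an ordinary iterable; the function name reservoir_sample and the rng parameter are hypothetical. Instead of two separate random draws, it draws j uniformly from 0..i-1: j < k happens with probability k/i, and given that, j is uniform over the k slots, so one draw both decides selection and picks the element to replace.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(items: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Select k elements uniformly at random from `items` in a single pass.

    Hypothetical helper for illustration. If `items` has fewer than k
    elements, all of them are returned.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    reservoir: List[T] = []
    for i, item in enumerate(items, start=1):
        if i <= k:
            # Fill the reservoir with the first k elements.
            reservoir.append(item)
        else:
            # j is uniform over 0..i-1, so j < k with probability k/i;
            # in that case slot j is a uniform choice among the k slots.
            j = rng.randrange(i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Only the k kept elements and a counter are stored, so memory use is independent of N.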
Proof:
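By induction on the number of elements processed. Suppose that after the first n elements (n >= k) have been processed, each of them is in the reservoir with probability k/n. When element n+1 arrives, an element already in the reservoir stays there if element n+1 is not selected, or if element n+1 is selected but replaces a different one of the k kept elements:
P(n) all = k/n * [(1 - k/(n+1)) + k/(n+1) * (1 - 1/k)] = k/n * n/(n+1) = k/(n+1)
Element n+1 itself is kept with probability k/(n+1) by construction, so after n+1 elements every element is in the reservoir with probability k/(n+1). The base case n = k holds trivially (all k elements are kept), so each of the N elements is finally selected with probability k/N.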
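For small n and k, the induction result can be confirmed by brute force: enumerate every equally likely sequence of random draws and count how often each element ends up in the reservoir. This sketch is an illustration only; the function name is hypothetical, and it assumes each step is modelled as one uniform draw j from 0..i-1, with j < k meaning "select element i and replace slot j" (which selects with probability k/i and picks the replaced slot uniformly).

```python
from fractions import Fraction
from itertools import product
from typing import Dict

def exact_inclusion_probabilities(n: int, k: int) -> Dict[int, Fraction]:
    """Exact probability that each element 1..n is in the final reservoir,
    found by enumerating all random draws (assumes 1 <= k <= n)."""
    # Element i (k < i <= n) draws j uniformly from range(i).
    draw_ranges = [range(i) for i in range(k + 1, n + 1)]
    hits = {e: 0 for e in range(1, n + 1)}
    total = 0
    for draws in product(*draw_ranges):  # every outcome is equally likely
        reservoir = list(range(1, k + 1))
        for i, j in zip(range(k + 1, n + 1), draws):
            if j < k:
                reservoir[j] = i  # element i selected, replaces slot j
        for e in reservoir:
            hits[e] += 1
        total += 1
    return {e: Fraction(h, total) for e, h in hits.items()}

# Each value should equal Fraction(2, 5), i.e. k/n.
print(exact_inclusion_probabilities(5, 2))
```

The number of outcomes grows as n!/k!, so this check is only practical for very small n.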
[Reference]
http://www.cnblogs.com/ttltry-air/archive/2012/08/10/2632215.html