68. Reservoir Sampling

Source: Internet
Author: User

[Link to this article]

Http://www.cnblogs.com/hellogiser/p/reservoir-sampling.html

This problem comes from Problem 10 in Column 12 of Programming Pearls, which is stated as follows:

How could you select one of n objects at random, where you see the objects sequentially but you do not know the value of n beforehand? For concreteness, how would you read a text file, and select and print one random line, when you don't know the number of lines in advance?

(1) How do you randomly select one line from a file without knowing the total number of lines n?

Solution: always select the first line, then select the second line with probability 1/2, the third line with probability 1/3, and in general the i-th line with probability 1/i. Each time a line is selected, it replaces the previously selected line. At the end of the process, every line has been selected with probability 1/n.
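The selection rule above can be sketched in Python as follows. This is a minimal sketch rather than code from the original article; the function name pick_random_line and the optional rng parameter are assumptions introduced only for illustration.

```python
import random


def pick_random_line(lines, rng=random):
    """Return one uniformly chosen item from an iterable of unknown length.

    Returns None if the iterable is empty.
    """
    chosen = None
    for i, line in enumerate(lines, start=1):
        # Keep the i-th line with probability 1/i, replacing the current choice.
        if rng.randrange(i) == 0:
            chosen = line
    return chosen
```

Because the function only iterates once and stores a single line, it should also work on an open text file object, which yields one line per iteration.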

P(i) denotes the probability that line i is selected at the moment it is read.

P(1) = 1

P(2) = 1/2

P(3) = 1/3

For a file with three lines, the probability that line 1 is the final selection = the probability that line 1 is selected * the probability that line 2 is not selected * the probability that line 3 is not selected.

P(1)all = P(1) * (1 - P(2)) * (1 - P(3)) = 1 * 1/2 * 2/3 = 1/3

P(2)all = P(2) * (1 - P(3)) = 1/2 * 2/3 = 1/3

P(3)all = P(3) = 1/3

Proof:

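Probability that line 1 is the final selection: the probability that line 1 is selected * the probability that line 2 is not selected * the probability that line 3 is not selected * ... * the probability that line n is not selected.

P(1)all = 1 * (1 - 1/2) * (1 - 1/3) * ... * (1 - 1/n) = 1/2 * 2/3 * ... * (n - 1)/n = 1/n

Probability that line m is the final selection (1 <= m < n): the probability that line m is selected * the probability that line m + 1 is not selected * the probability that line m + 2 is not selected * ... * the probability that line n is not selected.

P(m)all = 1/m * [1 - 1/(m + 1)] * [1 - 1/(m + 2)] * ... * [1 - 1/n] = 1/m * m/(m + 1) * (m + 1)/(m + 2) * ... * (n - 1)/n = 1/n

For m = n no later line can replace it, so P(n)all = P(n) = 1/n.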
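The product in the proof can be checked with exact arithmetic. The sketch below is only an illustration: the function name final_selection_probability and the choice n = 10 are assumptions, not part of the original article.

```python
from fractions import Fraction


def final_selection_probability(m, n):
    """Probability that line m (1-based) is the final choice among n lines."""
    p = Fraction(1, m)  # line m is selected when it is read
    for j in range(m + 1, n + 1):
        p *= 1 - Fraction(1, j)  # line j is not selected, so line m survives
    return p


n = 10
probabilities = [final_selection_probability(m, n) for m in range(1, n + 1)]
print(probabilities[0])  # 1/10
```

Every entry of probabilities equals 1/n, which matches the telescoping product in the proof.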

(2) How do you randomly select k elements from a sample space of unknown or very large size?

You are given a linked list of length N. N is large, and you do not know its value in advance. Your task is to randomly select k of the N elements. You may traverse the linked list only once. Your algorithm must return exactly k elements, chosen uniformly at random (every element has the same probability of being chosen).

Solution: keep the first k elements. Then, for each later element i (i = k + 1, k + 2, ..., N), select it with probability k/i; if it is selected, it replaces one of the k currently kept elements, chosen uniformly at random. A single traversal yields exactly k elements, and every element ends up in the result with the same probability k/N.
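A minimal Python sketch of this procedure follows. It is not the original author's code: the function name reservoir_sample and the rng parameter are assumptions, and the sketch accepts any iterable instead of a linked list. It also uses a single random draw both to decide whether element i is selected and to pick the slot it replaces, which gives the same probabilities as two separate draws.

```python
import random


def reservoir_sample(items, k, rng=random):
    """Return k items chosen uniformly at random from an iterable, in one pass.

    If the iterable has fewer than k items, all of them are returned.
    """
    reservoir = []
    for i, item in enumerate(items, start=1):
        if i <= k:
            reservoir.append(item)
        else:
            # j is uniform on 0..i-1, so j < k happens with probability k/i,
            # and in that case j is also a uniformly random slot to overwrite.
            j = rng.randrange(i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Memory use stays at k items regardless of how long the input is, which is the point of traversing the list only once.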

Proof:

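By induction on the number of elements processed. Suppose that after n elements have been processed (n >= k), each of them is among the k kept elements with probability k/n. This holds for n = k, where every element is kept with probability 1. When element n + 1 arrives, a kept element stays kept if element n + 1 is not selected, or if element n + 1 is selected but replaces a different element. So the probability that an element is kept after n + 1 elements = the probability that it is kept after n elements * [the probability that element n + 1 is not selected + the probability that element n + 1 is selected * the probability that this element is not the one replaced]:

k/n * [(1 - k/(n + 1)) + k/(n + 1) * (1 - 1/k)] = k/n * n/(n + 1) = k/(n + 1)

Element n + 1 itself is selected with probability k/(n + 1), so after n + 1 elements every element is kept with probability k/(n + 1). At the end of the list n = N, and each element has been selected with probability k/N.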
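The formula above can be verified for concrete values with exact fractions. In this sketch the function name inclusion_after_next and the sample values k = 3, n = 7 are assumptions chosen only for illustration.

```python
from fractions import Fraction


def inclusion_after_next(k, n):
    """Probability that an element is kept after item n + 1 is processed,
    assuming it was kept with probability k/n after n items (n >= k)."""
    p_in = Fraction(k, n)                # kept after n items (assumed)
    p_new_selected = Fraction(k, n + 1)  # item n + 1 is selected
    p_not_replaced = 1 - Fraction(1, k)  # item n + 1 overwrites another slot
    return p_in * ((1 - p_new_selected) + p_new_selected * p_not_replaced)


print(inclusion_after_next(3, 7))  # 3/8
```

For k = 3 and n = 7 the result is 3/8, that is k/(n + 1), as the proof states.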

[Reference]

Http://www.cnblogs.com/ttltry-air/archive/2012/08/10/2632215.html
