Reservoir sampling

Source: Internet
Author: User

The problem comes from Problem 10 of Column 12 in Programming Pearls, which is stated as follows:

How could you select one of n objects at random, where you see the objects sequentially and do not know the value of n beforehand? For concreteness, how would you read a text file, and select and print one random line, when you don't know the number of lines in advance?

The problem can be simplified as follows: how do we pick one line uniformly at random from a file without knowing the total number of lines?

The first question that comes to mind is whether we have solved a similar problem before. Of course: when the number of lines in the file is known, we can easily pick a random line number with the rand function of the C runtime library. Here, however, the number of lines is not known, so what can we do? We need an idea that lets us make the choice as we go while keeping the probability of picking each line equal, that is, uniformly random. That idea is reservoir sampling.

With this idea, we have a solution. To select one line, call the currently selected line choice. The first line is taken directly as choice. The second line then replaces choice with probability 1/2, the third line replaces choice with probability 1/3, and so on. The pseudocode can be described as follows:

i = 0
while more input lines
    with probability 1.0/++i
        choice = this input line
print choice
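The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the original author's code: the function name pick_one and the optional rng parameter are assumptions introduced here so that the example can be seeded and tested.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def pick_one(items: Iterable[T], rng: Optional[random.Random] = None) -> Optional[T]:
    """Return one item chosen uniformly at random from a stream of unknown length.

    Returns None if the stream is empty. `pick_one` and `rng` are
    hypothetical names used only for this sketch.
    """
    rng = rng or random.Random()
    choice = None
    for i, item in enumerate(items, start=1):
        # The i-th item replaces the current choice with probability 1/i.
        # For i == 1 the test always succeeds, so the first item is always taken.
        if rng.random() < 1.0 / i:
            choice = item
    return choice
```

Because an open text file is itself an iterable of lines, passing a file object to pick_one selects a random line in a single pass while holding only one line in memory.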

The trick of this approach is that it is constructed so that we can prove the probability of each line being the final choice is 1/n (where n is the number of lines scanned so far). In other words, every line is equally likely to be chosen, and the random selection is complete.

For a given line to be the final choice, it must be selected when it is read and then not be replaced by any later line.

In the single-line case, this is what makes every line equally likely: line i is selected with probability 1/i and must then survive each of the later lines i+1, ..., n.
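One standard way to write this argument out, given here as a sketch (the symbols i, j and n are as in the text: line i is the candidate, j runs over the later lines, and n is the number of lines read):

```latex
\begin{aligned}
P(\text{choice} = \text{line } i)
  &= \frac{1}{i} \prod_{j=i+1}^{n} \left(1 - \frac{1}{j}\right) \\
  &= \frac{1}{i} \cdot \frac{i}{i+1} \cdot \frac{i+1}{i+2} \cdots \frac{n-1}{n} \\
  &= \frac{1}{n}, \qquad 1 \le i \le n .
\end{aligned}
```

The factor 1/i is the probability that line i is selected when it is read, and each factor 1 - 1/j is the probability that line j does not replace it. The product telescopes, so the result does not depend on i.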

Looking back at this problem, we can generalize it: how do we randomly select k items from a sample space whose size is unknown or very large?

The answer follows by analogy. First put the first k items into the reservoir. When the (k+1)-th item arrives, decide with probability k/(k+1) whether to move it into the reservoir; if so, it replaces an item of the reservoir chosen uniformly at random. In general, the i-th item is moved into the reservoir with probability k/i. Continuing in this way, for a sample space of any size n, each item is selected with probability k/n. In other words, every item is equally likely to be selected.

Pseudocode:

Init: a reservoir of size k, holding the first k values
for i = k+1 to N
    M = random(1, i);    // uniform integer in [1, i], both ends inclusive
    if (M <= k)
        SWAP the Mth value and the ith value
end for
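As a rough Python counterpart to the pseudocode, the sketch below keeps the reservoir in a list and overwrites the M-th slot instead of swapping inside one large array, since the input is treated as a stream. The function name reservoir_sample and the rng parameter are assumptions made for this illustration, and the acceptance test is taken to be M <= k so that the i-th item enters with probability k/i.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return k items sampled uniformly from a stream of unknown length.

    If the stream has fewer than k items, all of them are returned.
    `reservoir_sample` and `rng` are hypothetical names for this sketch.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(items, start=1):
        if i <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # randint(1, i) is inclusive on both ends, so m <= k
            # happens with probability k/i.
            m = rng.randint(1, i)
            if m <= k:
                reservoir[m - 1] = item  # overwrite the m-th slot (1-based)
    return reservoir
```

Reusing the single draw m both to decide acceptance and to pick the slot works because, given m <= k, m is uniform over the k reservoir positions.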

The proof is as follows:
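A sketch of one standard argument, assuming the i-th item (i > k) is admitted with probability k/i and, once admitted, replaces a uniformly chosen reservoir slot:

```latex
\begin{aligned}
&\text{At step } j > k,\ \text{a given reservoir slot is overwritten with probability }
  \frac{k}{j} \cdot \frac{1}{k} = \frac{1}{j}. \\[4pt]
&\text{For } i \le k:\quad
  P(\text{item } i \text{ in reservoir after } n \text{ items})
  = \prod_{j=k+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{k}{k+1} \cdot \frac{k+1}{k+2} \cdots \frac{n-1}{n}
  = \frac{k}{n}. \\[4pt]
&\text{For } i > k:\quad
  P(\text{item } i \text{ in reservoir after } n \text{ items})
  = \frac{k}{i} \prod_{j=i+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{k}{i} \cdot \frac{i}{n}
  = \frac{k}{n}.
\end{aligned}
```

In both cases the product telescopes, so every item ends up in the reservoir with the same probability k/n.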

  

Reservoir sampling covers a whole class of problems. I have summarized it here, and I sincerely admire how ingenious the method is. However, I have not yet found enough about where the idea comes from; knowing why and how someone would arrive at this solution would certainly be more meaningful.
