Reservoir Sampling algorithm

Source: Internet
Author: User

Problem definition:

You are given a list of length n. n is large, and you do not know in advance how large. Your task is to randomly select k elements from these n elements, traversing the list only once. The algorithm must return exactly k elements, and the selection must be uniformly random: every element must have the same probability of being chosen.

Reservoir sampling algorithm:

    

The algorithm selects k distinct elements from a sequence in a single pass, in such a way that each element ends up in the sample with probability k/n. It works as follows:
First, a reservoir of k elements is constructed, and the first k elements of the sequence are placed in the reservoir.
Then, starting with element k+1, the ith element is kept with probability k/i (not k/n, since n is unknown while the list is still being read). If it is kept, it replaces a uniformly chosen element of the reservoir. When all the elements have been traversed, the reservoir holds the k randomly selected elements. The time complexity is O(n).

Its pseudo-code is as follows:

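Init: a reservoir of size k, filled with the first k elements

For i = k+1 to n
    M = random(1, i);    // uniform integer in [1, i], both ends inclusive
    if (M <= k)
        replace the Mth value in the reservoir with the ith value
End for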
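
The pseudo-code above can be sketched in Python as follows. This is a minimal sketch rather than a definitive implementation: the function name reservoir_sample and the optional rng parameter are illustrative assumptions, not part of the original. It uses the test M <= k so that all k reservoir slots can be chosen, and it overwrites the chosen slot instead of swapping, because the list is never revisited.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Return k items chosen uniformly at random from an iterable of unknown length.

    If the stream holds fewer than k items, all of them are returned.
    """
    reservoir = []
    for i, item in enumerate(stream, start=1):  # i is the 1-based position
        if i <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Uniform integer in [1, i], both ends inclusive.
            m = rng.randint(1, i)
            if m <= k:
                # Happens with probability k/i; slot m is uniform over the k slots.
                reservoir[m - 1] = item
    return reservoir


if __name__ == "__main__":
    # The generator hides its length, so the function cannot know n in advance.
    print(reservoir_sample((x * x for x in range(100000)), 5))
```

The input is consumed as a plain iterator, so the function works on files, generators, or network streams and needs only O(k) extra memory.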

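Proof that each element is selected with probability k/n:

At step m (m > k), the new element is kept with probability k/m and then overwrites one particular slot with probability 1/k, so any given element already in the reservoir is replaced with probability 1/m and survives with probability (m-1)/m.

      1. For element i (i <= k): it is placed in the reservoir during the first k steps with probability 1. At step k+1 it is not replaced with probability k/(k+1), at step k+2 with probability (k+1)/(k+2), and so on until the nth element has been read. The probability that element i is in the final sample = the probability of being selected * the probability of not being replaced at each later step, that is:
        1 * k/(k+1) * (k+1)/(k+2) * ... * (n-1)/n = k/n

      2. For element j (j > k): the probability of being in the final sample = the probability of being selected when it appears * the probability of not being replaced after it appears, namely:
        k/j * j/(j+1) * ... * (n-1)/n = k/n

      3. Combining the two cases, every element is selected with probability k/n.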
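
The two products in the proof can be checked with exact rational arithmetic. The sketch below is only an illustration under stated assumptions: the function name survival_probability is hypothetical, positions are 1-based, and it assumes an element already in the reservoir survives step i with probability (i-1)/i, as in the argument above.

```python
from fractions import Fraction


def survival_probability(j, k, n):
    """Exact probability that the element at 1-based position j is in the final sample."""
    # Probability of entering the reservoir: 1 for the first k elements, k/j afterwards.
    p = Fraction(1) if j <= k else Fraction(k, j)
    # At every later step i, the element is kept with probability (i - 1) / i.
    for i in range(max(j, k) + 1, n + 1):
        p *= Fraction(i - 1, i)
    return p


if __name__ == "__main__":
    k, n = 3, 10
    for j in range(1, n + 1):
        # Each position should give k/n, here 3/10.
        print(j, survival_probability(j, k, n))
```

Because Fraction keeps the values exact, the telescoping cancellation is visible without any floating-point rounding.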
