Reservoir sampling: the reservoir sampling algorithm

Source: Internet
Author: User

    • Reservoir sampling: reading notes on "Programming Pearls"
    • 382. Linked List Random Node
    • 398. Random Pick Index

Question: How do you randomly select one object from a sequence of n objects? The objects arrive in order, but the value of n is not known in advance.

Idea: If n were known, the problem would be simple: generate a large random number and compute rand() % n to get a random position. The object at that position is the answer, and each object is selected with probability 1/n.
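For comparison, the known-n case might look like the following minimal sketch. The function name `pick_known_n` and the optional `rng` parameter are illustrative assumptions, not part of the original notes, and Python's `randrange` stands in for rand() % n.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def pick_known_n(items: Sequence[T], rng: Optional[random.Random] = None) -> T:
    """Pick one item uniformly at random when the length n is known."""
    if rng is None:
        rng = random.Random()
    n = len(items)
    if n == 0:
        raise IndexError("cannot pick from an empty sequence")
    # randrange(n) returns an integer in [0, n), each with probability 1/n.
    return items[rng.randrange(n)]
```

This only works because `len(items)` is available up front, which is exactly what the streaming setting lacks.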

But since we do not know the value of n, the problem becomes an instance of reservoir sampling: randomly selecting k objects from a list S of n objects, where n is very large or unknown. Typically n is so large that the objects in S cannot all be held in memory at once. Our problem is the special case of reservoir sampling with k = 1.

Solution: Always select the first object. Replace the current choice with the second object with probability 1/2, with the third object with probability 1/3, and so on: the m-th object replaces the current choice with probability 1/m. When the process finishes, every object has the same probability of being the final choice, namely 1/n, as shown below.

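Proof: The probability P that the m-th object is the final choice equals the probability that it is chosen at step m multiplied by the probability that none of the objects after it is chosen, i.e.

$$P = \frac{1}{m} \cdot \frac{m}{m+1} \cdot \frac{m+1}{m+2} \cdots \frac{n-1}{n} = \frac{1}{n}$$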
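The k = 1 procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the name `sample_one` and the optional `rng` parameter are hypothetical, and the input is taken to be any iterable whose length is not known up front.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def sample_one(stream: Iterable[T], rng: Optional[random.Random] = None) -> Optional[T]:
    """Return one uniformly random item from a stream of unknown length.

    Returns None if the stream is empty.
    """
    if rng is None:
        rng = random.Random()
    chosen: Optional[T] = None
    for m, item in enumerate(stream, start=1):
        # randrange(m) == 0 happens with probability 1/m, so the m-th item
        # replaces the current choice with probability 1/m. For m == 1 this
        # is always true, so the first item is always selected.
        if rng.randrange(m) == 0:
            chosen = item
    return chosen
```

Only the current choice and the counter m are kept in memory, so the stream can be read once, for example from a generator or a file.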

The general reservoir sampling problem can be solved with the same idea. First read the first k objects into the "reservoir". Then select the (k+1)-th object with probability k/(k+1), the (k+2)-th object with probability k/(k+2), and so on: the m-th object (m > k) is selected with probability k/m. Whenever the m-th object is selected, it replaces an object chosen uniformly at random from the reservoir. In the end, every object is in the reservoir with probability k/n, as shown below.

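Proof: The probability P that the m-th object (m > k) ends up in the reservoir equals the probability that it is selected at step m multiplied by the probability that it survives every later step. At a later step j it survives if the j-th object is not selected, or if the j-th object is selected but replaces a different object in the reservoir, i.e.

$$P = \frac{k}{m} \prod_{j=m+1}^{n} \left[ \left(1 - \frac{k}{j}\right) + \frac{k}{j} \cdot \frac{k-1}{k} \right] = \frac{k}{m} \prod_{j=m+1}^{n} \frac{j-1}{j} = \frac{k}{m} \cdot \frac{m}{n} = \frac{k}{n}$$

For one of the first k objects, the selection factor is 1 and the product starts at j = k+1, which again gives k/n.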
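A minimal Python sketch of the general case follows. The name `reservoir_sample` and the `rng` parameter are illustrative assumptions. Instead of first deciding with probability k/m and then choosing a slot, the sketch draws a single index j uniformly from [0, m): j < k happens with probability k/m, and in that case j is itself a uniformly random slot, so the two steps are combined into one draw.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return k items sampled uniformly from a stream of unknown length.

    If the stream has fewer than k items, all of them are returned.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for m, item in enumerate(stream, start=1):
        if m <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # j is uniform on [0, m); j < k has probability k/m, and then
            # j is a uniformly random slot in the reservoir to overwrite.
            j = rng.randrange(m)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(open_file_lines, 10)` would keep 10 lines from a file of any size while holding only 10 lines in memory, where `open_file_lines` stands for any hypothetical iterable of lines.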
