Introduction to the Reservoir Sampling Algorithm

Source: Internet
Author: User

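Given a data stream of length n, we want to select k items from it at random. n is very large (possibly greater than your memory and disk capacity) and unknown in advance, and the stream can be traversed only once. How can we obtain k items such that every item is equally likely to be selected?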

The method is:

1. Define an array of length k and store the first k items of the stream in it.

2. Keep reading the stream. When the i-th item arrives (k < i <= n), generate a random integer j uniformly from 1 to i. If j is less than or equal to k, replace the j-th element of the array with the i-th item; otherwise discard the i-th item.

After completing these two steps, the array holds k items selected at random from the data stream of length n.
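
The two steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name reservoir_sample and the optional rng parameter are assumptions made for this example, and the stream is assumed to be any Python iterable.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Return k items chosen at random from an iterable of unknown length.

    `rng` is a hypothetical parameter added for this sketch; it may be the
    `random` module itself or a `random.Random` instance (useful for seeding).
    """
    reservoir = []
    # i counts how many items have been read so far (1-based, as in the text).
    for i, item in enumerate(stream, start=1):
        if i <= k:
            # Step 1: the first k items fill the array.
            reservoir.append(item)
        else:
            # Step 2: draw j uniformly from 1..i (both ends inclusive).
            j = rng.randint(1, i)
            if j <= k:
                # Overwrite the j-th slot (1-based) with the new item.
                reservoir[j - 1] = item
    return reservoir
```

For example, reservoir_sample(range(1_000_000), 5) returns a list of 5 numbers while holding only 5 items in memory at any time. If the stream contains fewer than k items, this sketch simply returns all of them.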


Next, we show that each of the n items ends up in the array with probability k/n.


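Proof:

We prove by mathematical induction on i (k < i <= n) that after i items have been read, each of the first i items is in the array with probability k/i.

1. Base case, i = k+1: the (k+1)-th item enters the array with probability k/(k+1). Each of the first k items is displaced with probability (k/(k+1)) * (1/k) = 1/(k+1), so it stays with probability k/(k+1). Every one of the first k+1 items is therefore in the array with probability k/(k+1).

2. Inductive hypothesis: assume that after i items have been read, each of them is in the array with probability k/i.

3. Inductive step: show that after i+1 items have been read, each of them is in the array with probability k/(i+1).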

For the (i+1)-th item, the probability that it is put into the array is clearly k/(i+1).

For any one of the first i items, the probability that it is in the array after i items have been read is k/i (by step 2). The probability that it is still in the array after the (i+1)-th item arrives is the probability that it was in the array and was not displaced by the (i+1)-th item.

The probability of being displaced by the (i+1)-th item, given that it is in the array, is (k/(i+1)) * (1/k) = 1/(i+1): the new item enters the array with probability k/(i+1) and then lands on this particular slot with probability 1/k.

The probability of not being displaced by the (i+1)-th item is 1 - 1/(i+1) = i/(i+1).

The probability that it was in the array and was not displaced by the (i+1)-th item is (k/i) * (i/(i+1)) = k/(i+1).

So after i+1 items have been read, each of them is in the array with probability k/(i+1).

This completes the proof. Taking i = n, each of the n items is in the final array with probability k/n.
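
The k/n result can also be checked empirically. The sketch below is only an illustration under stated assumptions: the helper names reservoir_sample and inclusion_frequencies, and the values n = 10, k = 3, and 20000 trials, are all chosen for this example. It runs the sampling procedure many times and measures how often each item ends up in the array; each frequency should come out close to k/n.

```python
import random
from collections import Counter


def reservoir_sample(stream, k, rng):
    """Keep the first k items, then replace slot j when j (drawn from 1..i) <= k."""
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)
        else:
            j = rng.randint(1, i)  # uniform on 1..i, both ends inclusive
            if j <= k:
                reservoir[j - 1] = item
    return reservoir


def inclusion_frequencies(n, k, trials, seed=0):
    """Estimate, for each of n items, how often it ends up in the sample."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same result
    counts = Counter()
    for _ in range(trials):
        counts.update(reservoir_sample(range(n), k, rng))
    return [counts[x] / trials for x in range(n)]


if __name__ == "__main__":
    # Every printed frequency should be close to k/n = 0.3.
    for item, freq in enumerate(inclusion_frequencies(n=10, k=3, trials=20000)):
        print(item, round(freq, 3))
```

A simulation like this does not replace the proof, but it is a quick way to catch an off-by-one mistake in the replacement condition.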
