Introduction to the Reservoir Sampling Algorithm

Last Update: 2015-06-21 Source: Internet

Author: User


Problem: given a data stream of length n, select k items at random. n is very large (possibly larger than your memory and disk capacity) and unknown in advance, and the stream can be traversed only once. How can you obtain a uniformly random sample of k items?

The method is:

1. Define an array of length k and store the first k items of the stream in it.

2. Continue reading the stream. When the i-th item arrives (k < i <= n), generate a random integer j uniformly from 1 to i. If j is less than or equal to k, replace the j-th element of the array with the i-th item; otherwise discard the i-th item.

After these two steps, the array holds k items drawn at random from the data stream of length n.
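The two steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name reservoir_sample and the optional rng parameter are assumptions introduced here for the example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Pick k items at random from a stream of unknown length in one pass.

    Hypothetical helper for illustration; `rng` lets callers pass a seeded
    generator for reproducible runs.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    # i is the 1-based count of items read so far.
    for i, item in enumerate(stream, start=1):
        if i <= k:
            # Step 1: the first k items fill the array.
            reservoir.append(item)
        else:
            # Step 2: draw j uniformly from 1..i (both ends inclusive).
            j = rng.randint(1, i)
            if j <= k:
                # Replace the j-th element with the i-th item.
                reservoir[j - 1] = item
    return reservoir


if __name__ == "__main__":
    # The stream is consumed lazily, so it never has to fit in memory.
    print(reservoir_sample(range(1_000_000), 5))
```

If the stream contains fewer than k items, this sketch simply returns all of them.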

Next, we show that for a stream of n items, each item ends up in the array with probability k/n.

Proof:

We prove by mathematical induction on i (k < i <= n) that after i items have been read, each of the first i items is in the array with probability k/i. Setting i = n then gives k/n.

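1. Base case, i = k+1: the (k+1)-th item enters the array with probability k/(k+1). Each of the first k items is displaced with probability (k/(k+1)) * (1/k) = 1/(k+1), so it stays with probability k/(k+1). Every one of the first k+1 items is therefore in the array with probability k/(k+1).

2. Induction hypothesis: after i items have been read, each of the first i items is in the array with probability k/i.

3. Induction step: show that after i+1 items have been read, each of the first i+1 items is in the array with probability k/(i+1).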

The (i+1)-th item itself is put into the array with probability k/(i+1), because the random number drawn from 1 to i+1 must be at most k.

For any one of the first i items, the probability that it is in the array after i items is k/i (by 2). The probability that it is still in the array after the (i+1)-th item is read is the probability that it was in the array and was not displaced by the (i+1)-th item.

The probability of being displaced by the (i+1)-th item is (k/(i+1)) * (1/k) = 1/(i+1): the new item must enter the array, and it must land on this particular slot.

The probability of not being displaced by the (i+1)-th item is 1 - 1/(i+1) = i/(i+1).

The probability that it was in the array and was not displaced by the (i+1)-th item is (k/i) * (i/(i+1)) = k/(i+1).

So after i+1 items have been read, every one of them is in the array with probability k/(i+1).
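The induction step can also be written as a single chain of equalities. The notation R_i, meaning the contents of the array after the first i items have been read, is introduced here for illustration and does not appear in the original argument; x is any one of the first i items.

```latex
\begin{aligned}
P(x \in R_{i+1})
  &= P(x \in R_i)\,\bigl(1 - P(x \text{ is displaced at step } i+1)\bigr) \\
  &= \frac{k}{i}\left(1 - \frac{k}{i+1}\cdot\frac{1}{k}\right) \\
  &= \frac{k}{i}\cdot\frac{i}{i+1} \\
  &= \frac{k}{i+1}
\end{aligned}
```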

This completes the proof.
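The k/n result can also be checked empirically. The sketch below is only a sanity check under assumed parameters (n = 10, k = 3, 100,000 trials), not part of the proof; the helper names sample_once and inclusion_frequencies are hypothetical. It runs the method many times on the stream 1..n and records how often each item ends up in the array.

```python
import random
from collections import Counter


def sample_once(n, k, rng):
    """Run the two-step method once on the stream 1, 2, ..., n (requires n >= k)."""
    reservoir = list(range(1, k + 1))      # step 1: keep the first k items
    for i in range(k + 1, n + 1):          # step 2: process items k+1..n
        j = rng.randint(1, i)              # uniform over 1..i, inclusive
        if j <= k:
            reservoir[j - 1] = i
    return reservoir


def inclusion_frequencies(n, k, trials, seed=0):
    """Return, for each item 1..n, the fraction of trials in which it was kept."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(sample_once(n, k, rng))
    return {item: counts[item] / trials for item in range(1, n + 1)}


if __name__ == "__main__":
    freqs = inclusion_frequencies(n=10, k=3, trials=100_000)
    for item, freq in freqs.items():
        # Each frequency should be close to k/n = 0.3.
        print(item, round(freq, 3))
```

With enough trials, every item's frequency should settle near k/n regardless of its position in the stream, which is what the proof predicts.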


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
