Q: An input stream is so large that it cannot be stored in memory, and it can be read only once. How can you pick m records from it at random, so that every record has the same probability of being chosen?
A: Allocate a memory area that holds m records and store the first m records in it directly. From the (m+1)-th record on, keep the n-th record with probability m/n; when it is kept, it replaces one of the m records already in memory, chosen uniformly at random. This ensures that, at the end, every record has the same probability of having been selected.
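A minimal Python sketch of the answer above (the scheme is commonly known as reservoir sampling). The function name `reservoir_sample` and the optional `rng` parameter are illustrative assumptions, not taken from the source. Drawing one index j uniformly from 0..n-1 does double duty: j < m happens with probability m/n, and given that it happens, j is uniform over the m slots, so it also picks the record to replace.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], m: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return m records chosen uniformly at random in a single pass over stream.

    If the stream has fewer than m records, all of them are returned.
    """
    if m < 0:
        raise ValueError("m must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for n, record in enumerate(stream, start=1):  # n is the 1-based position
        if n <= m:
            # The first m records go straight into memory.
            reservoir.append(record)
        else:
            # Keep the n-th record with probability m/n: j < m with probability m/n,
            # and conditioned on that, j is a uniformly random slot to overwrite.
            j = rng.randrange(n)
            if j < m:
                reservoir[j] = record
    return reservoir


if __name__ == "__main__":
    # Pick 5 of one million records in one pass while holding only 5 in memory.
    print(reservoir_sample(range(1_000_000), 5, random.Random(42)))
```

Because only the reservoir list is stored, memory use is proportional to m, not to the length of the stream; any iterable works, e.g. an open file object yields its lines one at a time.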
Interviewer perspective:
Besides giving the correct answer, you are expected to prove it. The question tests your grasp of probability and randomized algorithms and your ability to apply mathematical induction. Here is a short proof:
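Base case: after the first m records, all of them are in memory, so each is selected with probability $m/m = 1$. Inductive step: suppose n records (n ≥ m) have already passed through the stream and, by the induction hypothesis, each of them is in memory with probability $\frac{m}{n}$. The (n+1)-th record is selected with probability $\frac{m}{n+1}$. For a record that was already in memory, there are two cases: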
1) If the new record is not selected, the m records in memory are left untouched, so each original record remains with probability $\frac{m}{n}\cdot\left(1-\frac{m}{n+1}\right)=\frac{m(n+1-m)}{n(n+1)}$.
2) If the new record is selected (probability $\frac{m}{n+1}$), it evicts one of the m records in memory uniformly at random, so a given original record survives the eviction with probability $\frac{m-1}{m}$. Each original record therefore remains with probability $\frac{m}{n}\cdot\frac{m}{n+1}\cdot\frac{m-1}{m}=\frac{m(m-1)}{n(n+1)}$.
Adding the two cases gives the probability that a previously selected record is still selected: $\frac{m(n+1-m)}{n(n+1)}+\frac{m(m-1)}{n(n+1)}=\frac{m\,n}{n(n+1)}=\frac{m}{n+1}$, which equals the probability with which the new record is kept. So after n+1 records every record is in memory with probability $\frac{m}{n+1}$, and by induction, after any number n ≥ m of records, each record is in memory with the same probability $\frac{m}{n}$.
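To double-check the two cases, the sketch below recomputes them with exact rational arithmetic using Python's standard `fractions` module. The helper names are hypothetical, and it assumes m ≥ 1 because case 2 divides by m. It also makes the key identity visible: together, the two cases multiply a record's probability by n/(n+1), so a record admitted at position k > m with probability m/k ends, after N records, at (m/k)·(k/N) = m/N, independent of k.

```python
from fractions import Fraction


def survival_probability(p_n: Fraction, m: int, n: int) -> Fraction:
    """Probability that a record is in memory after record n+1 arrives,
    given it was in memory with probability p_n after n records (requires m >= 1)."""
    keep_new = Fraction(m, n + 1)  # record n+1 is kept with probability m/(n+1)
    # Case 1: record n+1 is not kept, so memory is unchanged.
    not_kept = p_n * (1 - keep_new)
    # Case 2: record n+1 is kept and evicts one of the m slots uniformly,
    # so a given record in memory survives with probability (m-1)/m.
    kept = p_n * keep_new * Fraction(m - 1, m)
    return not_kept + kept


def inductive_step_holds(m: int, n: int) -> bool:
    """Check that probability m/n after n records becomes m/(n+1) after n+1 records."""
    return survival_probability(Fraction(m, n), m, n) == Fraction(m, n + 1)


def final_probability(m: int, k: int, total: int) -> Fraction:
    """Exact probability that record k (1-based) is in memory after `total` records (total >= m)."""
    p = Fraction(1) if k <= m else Fraction(m, k)  # probability record k is admitted
    # Records at positions up to m cannot be evicted before record m+1 arrives.
    for n in range(max(k, m), total):
        p = survival_probability(p, m, n)
    return p


if __name__ == "__main__":
    print(all(inductive_step_holds(m, n) for m in range(1, 6) for n in range(m, 50)))
    print([final_probability(3, k, 10) for k in range(1, 11)])
```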
Reposted from http://www.ninechapter.com/%E4%B9%9D%E7%AB%A0%E7%AE%97%E6%B3%95%E9%9D%A2%E8%AF%95%E9%A2%9817-%E4%BB%8E%E8%BE%93%e5%85%a5%e6%b5%81%e4%b8%ad%e9%9a%8f%e6%9c%ba%e5%8f%96%e8%ae%b0%e5%bd%95/