- Reservoir sampling: "Programming Pearls" reading notes
- 382. Linked List Random Node
- 398. Random Pick Index
Question: How can we select one object uniformly at random from n objects when the objects arrive in order but the value of n is not known in advance?
Idea: If we knew n, the problem would be easy: compute rand() % n to get a random position (ignoring the slight modulo bias of rand() % n), and return the object at that position. Each object is then selected with probability 1/n.
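For comparison, here is a minimal Python sketch of the known-n case. The function name `pick_known_n` is illustrative, not from the book. It uses `random.randrange(n)`, which returns an integer uniformly distributed over 0 to n-1, in place of C's `rand() % n`:

```python
import random


def pick_known_n(items, rng=random):
    """Pick one element uniformly at random when the length n is known up front."""
    n = len(items)
    if n == 0:
        raise ValueError("cannot pick from an empty sequence")
    # randrange(n) is uniform over 0..n-1, playing the role of rand() % n
    return items[rng.randrange(n)]


print(pick_known_n(["a", "b", "c", "d"]))
```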
Because we do not know n, the problem becomes an instance of reservoir sampling: select k objects at random from a list S of n objects, where n is very large or unknown. Typically n is so large that the objects in S cannot all be loaded into memory at once. Our problem is the special case k = 1.
Solution: Always select the first object. Then replace the current choice with the second object with probability 1/2, with the third object with probability 1/3, and so on, replacing it with the m-th object with probability 1/m. When the process finishes, every object has been selected with the same probability 1/n, as shown below.
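The procedure above can be sketched in Python as follows. This is a minimal sketch, assuming the objects arrive as any Python iterable; the names `reservoir_pick_one`, `ListNode` and `iter_list` are hypothetical helpers for illustration. The test `rng.randrange(m) == 0` succeeds with probability exactly 1/m, and the linked-list walk mirrors the setting of 382. Linked List Random Node, where the length is unknown without a full traversal:

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_pick_one(stream: Iterable[T], rng: Optional[random.Random] = None) -> Optional[T]:
    """Pick one element of `stream` uniformly at random in a single pass.

    The length of the stream is never needed. Returns None for an empty stream
    (so None is ambiguous if the stream itself may contain None).
    """
    if rng is None:
        rng = random.Random()
    chosen = None
    for m, item in enumerate(stream, start=1):
        # randrange(m) == 0 with probability exactly 1/m, so the m-th item
        # replaces the current choice with probability 1/m (always when m == 1).
        if rng.randrange(m) == 0:
            chosen = item
    return chosen


# Hypothetical singly linked list, used only to show a stream of unknown length.
class ListNode:
    def __init__(self, val, next=None):
        self.val = val
        self.next = next


def iter_list(head):
    """Yield node values one by one, without counting the list first."""
    node = head
    while node is not None:
        yield node.val
        node = node.next


head = ListNode(1, ListNode(2, ListNode(3)))
print(reservoir_pick_one(iter_list(head)))
```

Passing a seeded `random.Random` makes runs reproducible; the default creates a fresh generator.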
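Proof: The probability P that the m-th object is the final choice = the probability that it is chosen when it arrives × the probability that none of the objects after it is chosen, i.e.

$$P = \frac{1}{m}\prod_{i=m+1}^{n}\left(1-\frac{1}{i}\right) = \frac{1}{m}\cdot\frac{m}{m+1}\cdot\frac{m+1}{m+2}\cdots\frac{n-1}{n} = \frac{1}{n}$$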
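The product in the proof can be checked exactly with Python's `fractions.Fraction`. This is a sketch for small n only; `prob_final_choice` is a hypothetical helper that multiplies the same factors as the equation:

```python
from fractions import Fraction


def prob_final_choice(m: int, n: int) -> Fraction:
    """Exact probability that the m-th of n objects is the final choice (k = 1).

    Chosen on arrival with probability 1/m, then each later object i
    (m < i <= n) must not be chosen, which happens with probability 1 - 1/i.
    """
    p = Fraction(1, m)
    for i in range(m + 1, n + 1):
        p *= 1 - Fraction(1, i)
    return p


n = 6
print([prob_final_choice(m, n) for m in range(1, n + 1)])  # every entry is Fraction(1, 6)
```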
The general reservoir sampling problem can be solved the same way. First read the first k objects into the "reservoir". Then select the (k+1)-th object with probability k/(k+1), the (k+2)-th object with probability k/(k+2), and so on, selecting the m-th object (m > k) with probability k/m. Whenever the m-th object is selected, it replaces an object in the reservoir chosen uniformly at random. At the end, each object is in the reservoir with probability k/n, as shown below.
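A minimal Python sketch of this procedure, the variant commonly known as Algorithm R. The function name and signature are illustrative assumptions. Drawing j uniformly from 0 to m-1 and keeping the m-th object when j < k does two jobs at once: the object is selected with probability k/m, and, given that it is selected, j is uniform over the k reservoir slots, so it also picks the slot to replace:

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: Optional[random.Random] = None) -> List[T]:
    """Return k elements sampled uniformly without replacement from `stream` in one pass.

    If the stream has fewer than k elements, all of them are returned.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()
    reservoir: List[T] = []
    for m, item in enumerate(stream, start=1):
        if m <= k:
            # Fill the reservoir with the first k objects.
            reservoir.append(item)
        else:
            # j is uniform over 0..m-1, so j < k happens with probability k/m.
            j = rng.randrange(m)
            if j < k:
                # Conditioned on j < k, j is uniform over the k slots.
                reservoir[j] = item
    return reservoir


print(reservoir_sample(range(1_000_000), 5, random.Random(42)))
```

Only the k-element reservoir is held in memory, so the stream can be far larger than RAM.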
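Proof: For m > k, the probability P that the m-th object is in the reservoir at the end = the probability that it is selected × the product, over every later object i, of (the probability that object i is not selected + the probability that object i is selected × the probability that it does not replace the m-th object), i.e.

$$P = \frac{k}{m}\prod_{i=m+1}^{n}\left[\left(1-\frac{k}{i}\right)+\frac{k}{i}\cdot\frac{k-1}{k}\right] = \frac{k}{m}\prod_{i=m+1}^{n}\frac{i-1}{i} = \frac{k}{m}\cdot\frac{m}{n} = \frac{k}{n}$$

For m ≤ k the object enters the reservoir with probability 1, and the same product taken over i = k+1 to n gives k/n.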
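The same kind of exact check works for general k. `prob_in_reservoir` is a hypothetical helper that multiplies the factors from the proof, including the m ≤ k case where the object always enters the reservoir; it assumes 1 ≤ k ≤ n:

```python
from fractions import Fraction


def prob_in_reservoir(m: int, n: int, k: int) -> Fraction:
    """Exact probability that the m-th of n objects is in the size-k reservoir at the end."""
    if not (1 <= k <= n and 1 <= m <= n):
        raise ValueError("need 1 <= k <= n and 1 <= m <= n")
    # Selected on arrival: always for m <= k, otherwise with probability k/m.
    p = Fraction(1) if m <= k else Fraction(k, m)
    # Only objects after both m and the initial fill can evict the m-th object.
    for i in range(max(m, k) + 1, n + 1):
        not_selected = 1 - Fraction(k, i)
        selected_but_spares_m = Fraction(k, i) * Fraction(k - 1, k)
        p *= not_selected + selected_but_spares_m
    return p


print({prob_in_reservoir(m, 10, 3) for m in range(1, 11)})  # {Fraction(3, 10)}
```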