Problem: Given a linked list of unknown length, select k nodes at random as a sample (that is, if the list has n nodes, each node must be chosen with probability k/n, where n is not known in advance). The linked list may be read only once, in order. (Assume the list has more than k nodes.)
Idea: First take the first k nodes. Then, each time a further node is reached, swap it with one of the k nodes already held, with a certain probability.
Specifically, let the linked list be A. Scan from a[1] to a[k], take these first k nodes, and call the resulting set B.
Continue scanning. On reaching a[i] (i > k), take it with probability k/i and swap it with a node in B (B holds k nodes, and each is equally likely to be swapped out, that is, with probability 1/k); with the remaining probability (i-k)/i, a[i] is not taken and the scan simply continues.
After A has been scanned to the end, B is the required sample.
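The steps above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the `Node` class, `build_list`, and `reservoir_sample` are names introduced here for the example, and the reservoir stores node values rather than the nodes themselves for simplicity.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical singly linked list node, assumed for this sketch.
    value: int
    next: Optional["Node"] = None


def build_list(values):
    """Build a linked list from an iterable and return its head."""
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def reservoir_sample(head, k, rng=random):
    """Scan the list once and return k values; B in the text is `reservoir`."""
    reservoir = []
    i = 0  # 1-based index of the node currently being scanned
    node = head
    while node is not None:
        i += 1
        if i <= k:
            # The first k nodes go straight into B.
            reservoir.append(node.value)
        elif rng.random() < k / i:
            # Take a[i] with probability k/i and overwrite a slot of B
            # chosen uniformly, so each held node leaves with probability 1/k.
            reservoir[rng.randrange(k)] = node.value
        node = node.next
    return reservoir
```

For example, `reservoir_sample(build_list(range(10)), 3)` returns three distinct values from 0 to 9. The length of the list is never needed, only the running index i.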
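Proof: For any i (i > k), we show that the probability that a[i] ends up in B is k/n (n is the length of A and can be any value). The first k nodes are in B with probability 1 = k/k once a[k] has been scanned, and the same argument applies to them from a[k+1] onward.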
When a[i] is scanned, it is taken with probability k/i, so it is in B for the time being; however, it may be replaced later in the scan. After a[i+1] is scanned, the probability that a[i] is still in B is (k/i) * [(1 - k/(i+1)) + (k/(i+1)) * ((k-1)/k)], where (1 - k/(i+1)) is the probability that a[i+1] is not taken at all, and (k/(i+1)) * ((k-1)/k) is the probability that a[i+1] is taken but the node it replaces is not a[i]. The bracket equals i/(i+1), so the expression simplifies to k/(i+1). Repeating the same argument for each later node, once the last node a[n] has been scanned, the probability that a[i] is still in B is k/n.
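The "repeating the same argument" step can be written out as a short induction. In this sketch, P_j is notation introduced here (it does not appear in the original text) for the probability that a[i] is in B after a[j] has been scanned, for j >= i.

```latex
\begin{align*}
P_i &= \frac{k}{i}, \\
P_{j+1} &= P_j\left[\left(1-\frac{k}{j+1}\right) + \frac{k}{j+1}\cdot\frac{k-1}{k}\right]
         = P_j\cdot\frac{(j+1-k)+(k-1)}{j+1}
         = P_j\cdot\frac{j}{j+1}, \\
P_n &= \frac{k}{i}\cdot\frac{i}{i+1}\cdot\frac{i+1}{i+2}\cdots\frac{n-1}{n} = \frac{k}{n}.
\end{align*}
```

The product telescopes: each numerator cancels the next denominator, leaving only k in the numerator and n in the denominator.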
Q.E.D.