A computer's random function is only pseudo-random, so truly random reads of data cannot be guaranteed.
Setting the quality of the random function aside, how can we make sure that data is sampled uniformly at random?
1. Shuffle functions provided by the system
C++ and Java both provide a shuffle function that rearranges the elements of a container into a random order.
C++:
template <class RandomAccessIterator, class URNG>
void shuffle(RandomAccessIterator first, RandomAccessIterator last, URNG&& g);
Java:
static void shuffle(List<?> list);
static void shuffle(List<?> list, Random rnd);
These functions shuffle a fixed number of elements into a random order; they cannot handle a data stream whose length is not known in advance.
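As a minimal sketch of how the C++ shuffle function is typically called on a fixed-size container: the helper name shuffled_copy and its seed parameter are made up for illustration, and std::mt19937 is just one possible choice of generator.

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Hypothetical helper (not from the article): returns a shuffled copy of `data`.
// The generator is seeded explicitly so that a run can be reproduced.
std::vector<int> shuffled_copy(std::vector<int> data, unsigned seed) {
    std::mt19937 gen(seed);                       // pseudo-random generator
    std::shuffle(data.begin(), data.end(), gen);  // permutes the copy in place
    return data;
}
```

Calling shuffled_copy({1, 2, 3, 4, 5}, 42u) returns the same five values in some permuted order. The whole container has to be in memory before the call, which is why this approach does not fit a stream of unknown length.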
2. Take one number from a sequence stream while ensuring randomness, that is, each number is selected with probability 1/(number of data read)
Suppose n numbers have been read and the number currently kept is a_x; the probability that any given number is the one kept is 1/n.
When the (n+1)-th number a_(n+1) arrives, take it with probability 1/(n+1); otherwise keep a_x. Repeating this for every new number guarantees uniform sampling.
The proof is by mathematical induction:
When n=1, a_1 is obviously taken. The probability of taking a_1 is 1/1.
Assume that when n=k, each of the k numbers read so far is the kept number a_x with probability 1/k.
When n=k+1, take a_(k+1) with probability 1/(k+1); otherwise keep a_x.
(1) The probability that a_(k+1) is taken is 1/(k+1);
(2) The probability that any particular earlier number is still kept is (1/k) * (k/(k+1)) = 1/(k+1).
So each of the k+1 numbers is kept with probability 1/(k+1), and the claim holds for every n.
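The induction can also be read as a telescoping product. As a sketch (the index i is introduced here only for illustration): for the i-th number a_i to be the one kept after n numbers, it has to be taken when it arrives and then survive every later step.

```latex
P(a_i \text{ kept after } n \text{ numbers})
  = \frac{1}{i} \prod_{j=i+1}^{n} \left(1 - \frac{1}{j}\right)
  = \frac{1}{i} \cdot \frac{i}{i+1} \cdot \frac{i+1}{i+2} \cdots \frac{n-1}{n}
  = \frac{1}{n}
```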
The code is as follows:
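#include <cstdlib>
#include <iostream>
using namespace std;

// Take one number from the sequence stream uniformly: each number read
// is selected with probability 1/(number of data read).
void RandNum() {
    int res = 0;
    int num = 1;          // how many numbers have been read so far
    cin >> res;           // the first number is always kept

    int tmp;
    while (cin >> tmp) {
        // rand() % (num + 1) + 1 lies in [1, num+1]; it exceeds num with probability 1/(num+1)
        if (rand() % (num + 1) + 1 > num)
            res = tmp;
        num++;
    }
    cout << "res=" << res << endl;
}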
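The same one-pass selection can also be sketched with the <random> header in place of rand(), which avoids the slight bias of the modulo operation. This is an illustrative variant rather than the article's code: the function name sample_one is made up, the "stream" is modelled as a std::vector<int> so the function is easy to test, and the input is assumed to be non-empty.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: picks one element of `stream` uniformly in a single pass.
// Assumes `stream` is non-empty; the vector stands in for a real input stream.
int sample_one(const std::vector<int>& stream, std::mt19937& gen) {
    int res = stream.front();
    std::size_t count = 0;                 // how many numbers have been read so far
    for (int x : stream) {
        ++count;
        // Draw from [1, count]; hitting `count` has probability 1/count.
        std::uniform_int_distribution<std::size_t> dist(1, count);
        if (dist(gen) == count)
            res = x;                       // the new number replaces the kept one
    }
    return res;
}
```

Over many calls on the same input, each element should be returned roughly equally often, which matches the 1/n probability derived above.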
3. Take k numbers from a sequence stream while ensuring randomness, that is, each number is selected with probability k/(number of data read)
Create an array and store the first k numbers of the stream in it (the so-called "reservoir").
For the n-th number a_n (n>k), take a_n with probability k/n and use it to replace one element of the reservoir chosen uniformly at random (each with probability 1/k); otherwise leave the reservoir unchanged. Repeating this for every new number guarantees uniform sampling.
The proof is by mathematical induction:
When n=k, the claim clearly holds: every number read is in the reservoir, with probability k/k.
Assume that when n=m (m>=k), every number read so far is in the reservoir with probability k/m.
When n=m+1, a_(m+1) is taken with probability k/(m+1) and replaces one uniformly chosen element of the reservoir (each with probability 1/k); otherwise the reservoir is unchanged. A number already in the reservoir is lost only if a_(m+1) is taken and lands on its slot, so the probability that it remains in the array is:
(k/m) * (1 - (k/(m+1)) * (1/k)) = (k/m) * (m/(m+1)) = k/(m+1)
The new number a_(m+1) enters the reservoir with probability k/(m+1) as well, so each of the m+1 numbers is in the reservoir with probability k/(m+1), and the claim holds for every n>=k.
The code is as follows:
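#include <cstdlib>
#include <iostream>
#include <vector>
using namespace std;

// Take n numbers from the sequence stream uniformly: each number read
// ends up selected with probability n/(number of data read).
void RandKNum(int n) {
    vector<int> myarray(n);
    for (int i = 0; i < n; i++)
        cin >> myarray[i];          // the first n numbers fill the reservoir

    int tmp = 0;
    int num = n;                    // how many numbers have been read so far
    while (cin >> tmp) {
        // rand() % (num + 1) lies in [0, num]; it is below n with probability n/(num+1)
        if (rand() % (num + 1) < n)
            myarray[rand() % n] = tmp;
        num++;                      // count the number just read
    }

    for (int i = 0; i < n; i++)
        cout << myarray[i] << endl;
}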
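A testable variant of the k-element case can be sketched as follows. It is not the article's code: the name sample_k is made up, the stream is modelled as a std::vector<int>, and the two random choices (take with probability k/n, then pick one of k slots) are folded into a single draw of an index j in [0, n-1], replacing slot j only when j < k. That single draw gives the same probabilities as the two-step description.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: keeps a uniform sample of up to k elements of `stream`.
// If the stream holds fewer than k elements, all of them are returned.
std::vector<int> sample_k(const std::vector<int>& stream, std::size_t k,
                          std::mt19937& gen) {
    std::vector<int> reservoir;
    reservoir.reserve(k);
    std::size_t count = 0;                     // how many numbers have been read so far
    for (int x : stream) {
        ++count;
        if (reservoir.size() < k) {
            reservoir.push_back(x);            // the first k numbers fill the reservoir
        } else {
            // j is uniform on [0, count-1], so j < k happens with probability k/count,
            // and in that case j is also a uniformly chosen slot.
            std::uniform_int_distribution<std::size_t> dist(0, count - 1);
            std::size_t j = dist(gen);
            if (j < k)
                reservoir[j] = x;
        }
    }
    return reservoir;
}
```

With a stream of 10 numbers and k = 3, each number should appear in roughly 30% of the returned samples over many runs.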
This technique is known as reservoir sampling, a classic sampling algorithm.