Introduction to Algorithms, p. 129, Exercise 5.3-7
Suppose we want to create a random sample of the set {1, 2, 3, ..., n}, that is, an m-element subset S, where 0 ≤ m ≤ n, such that each m-subset is equally likely to be created. One way would be to set A[i] = i for i = 1, 2, 3, ..., n, call RANDOMIZE-IN-PLACE(A), and then take just the first m array elements. This method would make n calls to the RANDOM procedure. If n is much larger than m, we can create a random sample with fewer calls to RANDOM. Show that the following recursive procedure returns a random m-subset S of {1, 2, ..., n}, in which each m-subset is equally likely, while making only m calls to RANDOM:
RANDOM-SAMPLE(m, n)
    if m == 0
        return ∅
    else S = RANDOM-SAMPLE(m - 1, n - 1)
        i = RANDOM(1, n)
        if i ∈ S
            S = S ∪ {n}
        else S = S ∪ {i}
        return S
In short: choose m elements out of n so that every m-element subset is equally likely.
Proof Method 1: http://clrs.skanev.com/05/03/07.html
Proof Method 2: http://www.cnblogs.com/Jiajun/archive/2013/05/15/3080111.html
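One way to see why the procedure is uniform is induction on m. The following is only a sketch in notation introduced here (S' for the set returned by the recursive call, T for a fixed target m-subset); it is not necessarily the argument used on either linked page.

```latex
\textbf{Claim.} For $0 \le m \le n$, RANDOM-SAMPLE$(m, n)$ returns each
$m$-subset of $\{1, \dots, n\}$ with probability $1 / \binom{n}{m}$.

\textbf{Base case} ($m = 0$). The only $0$-subset is $\emptyset$, and it is
returned with probability $1 = 1 / \binom{n}{0}$.

\textbf{Inductive step.} Assume $S' = $ RANDOM-SAMPLE$(m-1, n-1)$ is uniform
over the $(m-1)$-subsets of $\{1, \dots, n-1\}$, and that $i = $ RANDOM$(1, n)$
is independent of $S'$. Fix a target $m$-subset $T$ of $\{1, \dots, n\}$.

\emph{Case $n \notin T$.} The result is $T$ exactly when $i \in T$ and
$S' = T \setminus \{i\}$:
\[
  \Pr[S = T]
  = \sum_{i \in T} \frac{1}{n} \cdot \frac{1}{\binom{n-1}{m-1}}
  = \frac{m}{n \binom{n-1}{m-1}}
  = \frac{1}{\binom{n}{m}} .
\]

\emph{Case $n \in T$.} The result is $T$ exactly when $S' = T \setminus \{n\}$
and $i \in S' \cup \{n\}$, a set of $m$ values:
\[
  \Pr[S = T]
  = \frac{1}{\binom{n-1}{m-1}} \cdot \frac{m}{n}
  = \frac{1}{\binom{n}{m}} .
\]

Both cases use the identity $\binom{n}{m} = \frac{n}{m} \binom{n-1}{m-1}$.
Each level of the recursion with $m \ge 1$ calls RANDOM once, so there are
exactly $m$ calls in total.
```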
The question itself describes two solutions (Solutions 1 and 2 below).
Solution 1: Call RANDOMIZE-IN-PLACE
import java.util.Random;

/**
 * Creation Time: August 13, 2014 9:46:51
 * Project name: Test
 * @author Cao yanfeng
 * @since JDK 1.6.0_21
 * Class description: shuffle the array in place, then keep the first m elements.
 */
public class RandomSampleTest {
    private static final Random RANDOM = new Random();

    /**
     * @param args
     */
    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        int[] result = randomSample(array, 5);
        for (int i : result) {
            System.out.println(i);
        }
    }

    public static int[] randomSample(int[] array, int m) {
        randomInPlace(array);
        int[] result = new int[m];
        for (int i = 0; i < m; i++) {
            result[i] = array[i];
        }
        return result;
    }

    /* Pseudocode on p. 126 of Introduction to Algorithms */
    public static void randomInPlace(int[] array) {
        int n = array.length;
        for (int i = 0; i < n; i++) {
            int index = random(i, n - 1);
            // Swap through a temporary; safe even when index == i.
            int temp = array[i];
            array[i] = array[index];
            array[index] = temp;
        }
    }

    /* Return a random number in the closed interval [a, b] */
    public static int random(int a, int b) {
        return RANDOM.nextInt(b - a + 1) + a;
    }
}
Solution 2: Implement the pseudocode from the question
import java.util.ArrayList;
import java.util.Random;

/**
 * Creation Time: August 13, 2014 9:46:51
 * Project name: Test
 * @author Cao yanfeng
 * @since JDK 1.6.0_21
 * Class description: recursive RANDOM-SAMPLE from the question.
 */
public class RandomSampleTest {
    private static final Random RANDOM = new Random();

    /**
     * @param args
     */
    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        ArrayList<Integer> result = randomSample(array, 5);
        for (Integer integer : result) {
            System.out.println(integer);
        }
    }

    public static ArrayList<Integer> randomSample(int[] array, int m) {
        return sample(array, array.length, m);
    }

    /* RANDOM-SAMPLE(m, n) over the first n elements of array; the elements must be distinct. */
    public static ArrayList<Integer> sample(int[] array, int n, int m) {
        if (m == 0) {
            return new ArrayList<Integer>();
        } else {
            ArrayList<Integer> s = sample(array, n - 1, m - 1);
            int i = array[random(0, n - 1)];
            if (s.contains(i)) {
                s.add(array[n - 1]);
            } else {
                s.add(i);
            }
            return s;
        }
    }

    /* Return a random number in the closed interval [a, b] */
    public static int random(int a, int b) {
        return RANDOM.nextInt(b - a + 1) + a;
    }
}
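The recursion in Solution 2 can also be unrolled into a loop, which avoids deep call stacks for large m and replaces the linear-time list lookup with a hash set. The sketch below works directly on {1, ..., n} rather than on an array; the class name IterativeRandomSample is made up for illustration, and the loop form is commonly attributed to Robert Floyd.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

/** Iterative form of RANDOM-SAMPLE (hypothetical class name, illustration only). */
final class IterativeRandomSample {
    private static final Random RNG = new Random();

    /** Returns an m-subset of {1, ..., n} using exactly m random draws. */
    static Set<Integer> sample(int m, int n) {
        if (m < 0 || m > n) {
            throw new IllegalArgumentException("require 0 <= m <= n");
        }
        Set<Integer> s = new LinkedHashSet<>();
        // The recursion bottoms out at RANDOM-SAMPLE(0, n - m), so the loop
        // replays the calls for upper bounds n - m + 1, ..., n in that order.
        for (int k = n - m + 1; k <= n; k++) {
            int i = RNG.nextInt(k) + 1;   // uniform in [1, k]
            if (!s.add(i)) {              // i was already chosen: take k instead
                s.add(k);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sample(5, 15));
    }
}
```

Before iteration k the set only holds values below k, so adding k in the collision branch always succeeds and the set grows by exactly one element per draw.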
Solution 3: Weight Assignment
The first method given in Section 5.3, "Randomized algorithms" (p. 122), of Introduction to Algorithms assigns each element a random weight (priority) and sorts by it. However, two elements may receive the same weight, so this method is not recommended.
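For comparison, here is a rough sketch of what the weight-based approach could look like. It assumes weights drawn from [1, n^3] and simply redraws whenever a weight has already been used, which is one possible way around the duplicate-weight problem; the class name PrioritySample is hypothetical. It still makes at least n random draws and sorts all n elements, so it has no advantage over Solution 1.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

/** Weight-based sampling sketch (hypothetical class name, illustration only). */
final class PrioritySample {

    /** Returns m elements of array, chosen by sorting on distinct random weights. */
    static int[] sample(int[] array, int m) {
        int n = array.length;
        if (m < 0 || m > n) {
            throw new IllegalArgumentException("require 0 <= m <= array.length");
        }
        long bound = (long) n * n * n;         // assumed weight range [1, n^3]
        long[][] keyed = new long[n][2];       // each row: {weight, element}
        Set<Long> used = new HashSet<>();
        for (int i = 0; i < n; i++) {
            long weight;
            do {
                weight = ThreadLocalRandom.current().nextLong(1, bound + 1);
            } while (!used.add(weight));       // redraw on a duplicate weight
            keyed[i][0] = weight;
            keyed[i][1] = array[i];
        }
        Arrays.sort(keyed, (x, y) -> Long.compare(x[0], y[0]));
        int[] result = new int[m];
        for (int i = 0; i < m; i++) {
            result[i] = (int) keyed[i][1];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        System.out.println(Arrays.toString(sample(array, 5)));
    }
}
```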
Solution 4: reservoir sampling
See the extension problem at the end.
**************************************** ***************************************
As the question notes, when m elements are sampled out of n and n is much larger than m, Solution 2 is preferable because it calls random() only m times. When n is not much larger than m, Solution 1 is acceptable: it calls random() n times, but the method is simpler.
**************************************** ***************************************
Extension problem (a Google interview question): a data stream contains an endless sequence of search keywords (for example, the keywords that people continuously type into Google search). How can we randomly select 1000 keywords from this endless stream?
Reference: http://blog.csdn.net/minglingji/article/details/7984445
This is again the problem of choosing m elements out of n with equal probability, except that n is unknown. The method used is reservoir sampling. Put the first 1000 items of the stream into an array of length 1000. For the 1001st item, call random(0, 1000); the result falls in the closed interval [0, 999] with probability 1000/1001, and in that case the new item replaces the array element at that index. In general, for the n-th item with n > 1000, call random(0, n-1); the result falls in [0, 999] with probability 1000/n. Here random is called n - m times. The process is simulated below.
import java.util.Random;

/**
 * Creation Time: August 13, 2014 9:46:51
 * Project name: Test
 * @author Cao yanfeng
 * @since JDK 1.6.0_21
 * Class description: reservoir sampling simulated on an array.
 */
public class RandomSampleTest {
    private static final Random RANDOM = new Random();

    /**
     * @param args
     */
    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        int[] result = reservoirSample(array, 5);
        for (int i : result) {
            System.out.println(i);
        }
    }

    /* Reservoir sampling */
    public static int[] reservoirSample(int[] array, int m) {
        int[] reservoir = new int[m];
        for (int i = 0; i < array.length; i++) {
            if (i < m) {
                // The first m items fill the reservoir directly.
                reservoir[i] = array[i];
            } else {
                // Item i + 1 replaces a random slot with probability m / (i + 1).
                int temp = random(0, i);
                if (temp < m) {
                    reservoir[temp] = array[i];
                }
            }
        }
        return reservoir;
    }

    /* Return a random number in the closed interval [a, b] */
    public static int random(int a, int b) {
        return RANDOM.nextInt(b - a + 1) + a;
    }
}
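The simulation above works on an array whose length is known. For a stream that never ends, the same idea can be wrapped in an object that accepts one item at a time and can be asked for its current sample at any moment. The sketch below is one possible shape for that; the class name KeywordReservoir and its methods offer and snapshot are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Incremental reservoir sampler (hypothetical class and method names). */
final class KeywordReservoir<T> {
    private final int capacity;
    private final List<T> reservoir;
    private long seen = 0;   // number of items offered so far

    KeywordReservoir(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must be >= 0");
        }
        this.capacity = capacity;
        this.reservoir = new ArrayList<>(capacity);
    }

    /** Feeds the next stream item into the sampler. */
    void offer(T item) {
        seen++;
        if (reservoir.size() < capacity) {
            reservoir.add(item);                 // still filling the reservoir
            return;
        }
        // j is uniform in [0, seen - 1]; it lands in a slot with probability capacity / seen.
        long j = ThreadLocalRandom.current().nextLong(seen);
        if (j < capacity) {
            reservoir.set((int) j, item);
        }
    }

    /** Returns a copy of the current sample. */
    List<T> snapshot() {
        return new ArrayList<>(reservoir);
    }

    public static void main(String[] args) {
        KeywordReservoir<String> sampler = new KeywordReservoir<>(3);
        for (String keyword : new String[] {"a", "b", "c", "d", "e", "f"}) {
            sampler.offer(keyword);
        }
        System.out.println(sampler.snapshot());
    }
}
```

Keeping the count in a long lets the sampler run past Integer.MAX_VALUE items, which matters for a stream that is assumed to be endless.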