[Introduction to Algorithms-012] Sampling m of n elements at random with equal probability

Introduction to Algorithms, p. 129, Exercise 5.3-7

Suppose we want to create a random sample of the set {1, 2, 3, ..., n}, that is, an m-element subset S, where 0 ≤ m ≤ n, such that each m-subset is equally likely to be created. One way would be to set A[i] = i for i = 1, 2, 3, ..., n, call RANDOMIZE-IN-PLACE(A), and then take just the first m array elements. This method would make n calls to the RANDOM procedure. If n is much larger than m, we can create a random sample with fewer calls to RANDOM. Show that the following recursive procedure returns a random m-subset S of {1, 2, ..., n}, in which each m-subset is equally likely, while making only m calls to RANDOM:

RANDOM-SAMPLE(m, n)
    if m == 0
        return ∅
    else
        S = RANDOM-SAMPLE(m - 1, n - 1)
        i = RANDOM(1, n)
        if i ∈ S
            S = S ∪ {n}
        else
            S = S ∪ {i}
        return S

In short: draw m of the n elements at random so that every m-subset is equally likely.

Proof Method 1: http://clrs.skanev.com/05/03/07.html

Proof Method 2: http://www.cnblogs.com/Jiajun/archive/2013/05/15/3080111.html
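Both linked proofs argue by induction on m. A condensed version of that induction, offered as a sketch, follows; here T denotes an arbitrary target m-subset, S' the set returned by the recursive call, and S the final output.

$$\Pr[\text{RANDOM-SAMPLE}(m, n) = T] = \frac{1}{\binom{n}{m}} \quad \text{for every } T \subseteq \{1, \dots, n\} \text{ with } |T| = m.$$

Base case $m = 0$: the procedure returns $\emptyset$, the only 0-subset, with probability $1 = 1/\binom{n}{0}$.

Inductive step ($m \ge 1$): assume $S'$ is uniform over the $\binom{n-1}{m-1}$ subsets of size $m-1$ of $\{1, \dots, n-1\}$, and $i = \text{RANDOM}(1, n)$ is independent of $S'$.

If $n \in T$, the output is $T$ exactly when $S' = T \setminus \{n\}$ and $i$ is one of the $m-1$ elements of $S'$ or $i = n$:

$$\Pr[S = T] = \frac{1}{\binom{n-1}{m-1}} \cdot \frac{m}{n}.$$

If $n \notin T$, the output is $T$ exactly when $S' = T \setminus \{j\}$ for one of the $m$ elements $j \in T$ and $i = j$:

$$\Pr[S = T] = m \cdot \frac{1}{\binom{n-1}{m-1}} \cdot \frac{1}{n}.$$

Both cases give the same value:

$$\frac{m}{n\binom{n-1}{m-1}} = \frac{1}{\binom{n}{m}}, \qquad \text{since } \binom{n}{m} = \frac{n}{m}\binom{n-1}{m-1}.$$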

The exercise itself describes two solutions (Solutions 1 and 2 below); Solutions 3 and 4 are alternatives.

Solution 1: call RANDOMIZE-IN-PLACE()
import java.util.Random;

/**
 * Creation time: August 13, 2014 9:46:51
 * Project name: test
 * @author Cao yanfeng
 * @since JDK 1.6.0_21
 * Class description: random m-sample via RANDOMIZE-IN-PLACE
 */
public class RandomSampleTest {
    private static final Random RNG = new Random();

    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        int[] result = randomSample(array, 5);
        for (int i : result) {
            System.out.println(i);
        }
    }

    /* Shuffle the whole array, then keep the first m elements. */
    public static int[] randomSample(int[] array, int m) {
        randomInPlace(array);
        int[] result = new int[m];
        for (int i = 0; i < m; i++) {
            result[i] = array[i];
        }
        return result;
    }

    /* RANDOMIZE-IN-PLACE, pseudocode on p. 126 of Introduction to Algorithms */
    public static void randomInPlace(int[] array) {
        int n = array.length;
        for (int i = 0; i < n; i++) {
            int index = random(i, n - 1);
            // XOR swap; skipped when the values are equal (e.g. index == i),
            // because XOR-swapping an element with itself would zero it
            if (array[i] != array[index]) {
                array[i] ^= array[index];
                array[index] ^= array[i];
                array[i] ^= array[index];
            }
        }
    }

    /* Return a random integer in the closed interval [a, b]. */
    public static int random(int a, int b) {
        return RNG.nextInt(b - a + 1) + a;
    }
}
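Solution 1 can also be cut down to m calls to RANDOM without changing its idea. The book's correctness proof for RANDOMIZE-IN-PLACE uses the loop invariant that, before iteration i, A[1..i-1] holds each (i-1)-permutation of the n elements with probability (n-i+1)!/n!. Stopping after m iterations therefore leaves each m-permutation in A[1..m] with probability (n-m)!/n!, and each m-subset (which has m! orderings) with probability 1/C(n, m). The sketch below applies this as a partial Fisher-Yates shuffle; the class and method names are hypothetical, and unlike Solution 2 it still needs an n-element array.

```java
import java.util.Arrays;
import java.util.Random;

/* Sketch: RANDOMIZE-IN-PLACE stopped after m iterations (partial Fisher-Yates shuffle). */
class PartialShuffleSample {
    private static final Random RNG = new Random();

    /* Returns m distinct elements of array chosen uniformly; the input array is not modified. */
    static int[] partialShuffleSample(int[] array, int m) {
        int[] a = array.clone();                  // work on a copy so the caller's array is untouched
        int n = a.length;
        for (int i = 0; i < m; i++) {             // only m iterations, so only m calls to RANDOM
            int index = i + RNG.nextInt(n - i);   // uniform in [i, n - 1]
            int tmp = a[i];
            a[i] = a[index];
            a[index] = tmp;
        }
        return Arrays.copyOf(a, m);               // a[0..m-1] is a uniform random m-permutation
    }

    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        System.out.println(Arrays.toString(partialShuffleSample(array, 5)));
    }
}
```

The copy costs O(n) time and memory, so this variant only saves calls to RANDOM, not work.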

Solution 2: implement the pseudocode from the exercise

import java.util.LinkedList;
import java.util.List;
import java.util.Random;

/**
 * Creation time: August 13, 2014 9:46:51
 * Project name: test
 * @author Cao yanfeng
 * @since JDK 1.6.0_21
 * Class description: random m-sample via RANDOM-SAMPLE(m, n)
 */
public class RandomSampleTest {
    private static final Random RNG = new Random();

    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        List<Integer> result = randomSample(array, 5);
        for (Integer integer : result) {
            System.out.println(integer);
        }
    }

    public static List<Integer> randomSample(int[] array, int m) {
        return sample(array, array.length, m);
    }

    /* RANDOM-SAMPLE(m, n) applied to the first n elements of array */
    public static List<Integer> sample(int[] array, int n, int m) {
        if (m == 0) {
            return new LinkedList<Integer>();
        } else {
            List<Integer> s = sample(array, n - 1, m - 1);
            int i = array[random(0, n - 1)];
            if (s.contains(i)) {
                s.add(array[n - 1]);
            } else {
                s.add(i);
            }
            return s;
        }
    }

    /* Return a random integer in the closed interval [a, b]. */
    public static int random(int a, int b) {
        return RNG.nextInt(b - a + 1) + a;
    }
}
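The recursion in Solution 2 can be unrolled into a loop: the deepest non-trivial call is RANDOM-SAMPLE(1, n - m + 1), and each return moves up to the next larger n. The sketch below is a rewrite under that observation, not the original author's code; it iterates from the deepest level upward, makes the same m calls to RANDOM without m levels of recursion, and uses a hash set so that the membership test takes expected O(1) time instead of O(m) on a list. This loop is commonly known as Floyd's sampling algorithm. The class and method names are hypothetical, and, like Solution 2, it assumes the array values are distinct.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

/* Sketch: iterative form of RANDOM-SAMPLE (Floyd's sampling algorithm). */
class IterativeRandomSample {
    private static final Random RNG = new Random();

    /* Assumes the values in array are distinct, as in the {1, ..., n} setting of the exercise. */
    static Set<Integer> sample(int[] array, int m) {
        int n = array.length;
        Set<Integer> s = new LinkedHashSet<>();    // expected O(1) contains, keeps insertion order
        for (int j = n - m; j < n; j++) {          // recursion levels, from the deepest one upward
            int x = array[RNG.nextInt(j + 1)];     // RANDOM over the first j + 1 elements
            // array[j] cannot already be in s: earlier steps only drew from array[0..j-1]
            s.add(s.contains(x) ? array[j] : x);
        }
        return s;
    }
}
```

For example, `IterativeRandomSample.sample(array, 5)` on the 15-element array from `main` returns five distinct values, one call to RANDOM per value.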

Solution 3: assign random priorities (weights)

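The first method for producing a random permutation in Section 5.3 (Randomized algorithms, p. 122) of Introduction to Algorithms assigns each element a random priority (weight) and sorts the elements by it; taking the first m elements after sorting gives a sample. However, two elements can receive the same priority, so this method is not recommended.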
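For comparison, here is a sketch of the priority method applied to sampling. Following the book's PERMUTE-BY-SORTING, each element gets a priority drawn from [1, n^3]; the class and method names are hypothetical, and the lambda requires Java 8 or later rather than the JDK 1.6 used above. A CLRS exercise shows that all priorities are distinct with probability at least 1 - 1/n; when ties do occur, the stable sort below keeps tied elements in index order, which is exactly the bias the text warns about.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.ThreadLocalRandom;

/* Sketch: sample m elements via PERMUTE-BY-SORTING-style random priorities (Java 8+). */
class PrioritySample {
    static int[] sampleBySorting(int[] array, int m) {
        int n = array.length;
        long maxPriority = (long) n * n * n;       // priorities drawn from [1, n^3]
        long[] priority = new long[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            priority[i] = ThreadLocalRandom.current().nextLong(1, maxPriority + 1);
            order[i] = i;
        }
        // Stable sort of indices by priority: tied indices keep their original order,
        // which slightly biases the sample whenever a tie occurs
        Arrays.sort(order, Comparator.comparingLong(idx -> priority[idx]));
        int[] result = new int[m];
        for (int k = 0; k < m; k++) {
            result[k] = array[order[k]];
        }
        return result;
    }
}
```

Besides the tie problem, this costs n calls to RANDOM plus an O(n log n) sort, so it is never cheaper than Solution 1.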

Solution 4: reservoir sampling

See the extension problem at the end.

**************************************** ***************************************

As the exercise points out, when m items are sampled from n and n is much larger than m, Solution 2 should be used, because it calls the RANDOM() function only m times. If n is not much larger than m, Solution 1 is fine: it calls RANDOM() n times, but it is simpler.
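To make the n-versus-m difference concrete, the hypothetical harness below routes both solutions through one counting `random(a, b)` and reports how many times each calls it. It re-implements Solutions 1 and 2 compactly (with a plain temporary-variable swap instead of the XOR swap) purely for counting; it is a sketch, not the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/* Hypothetical harness: counts calls to RANDOM made by Solution 1 and Solution 2. */
class RandomCallCounter {
    private static final Random RNG = new Random();
    static int calls = 0;

    /* Uniform integer in [a, b]; every call is counted. */
    static int random(int a, int b) {
        calls++;
        return RNG.nextInt(b - a + 1) + a;
    }

    /* Solution 1: full RANDOMIZE-IN-PLACE on a copy, then keep the first m. */
    static int[] shuffleThenTake(int[] array, int m) {
        int[] a = array.clone();
        for (int i = 0; i < a.length; i++) {
            int j = random(i, a.length - 1);
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
        int[] result = new int[m];
        System.arraycopy(a, 0, result, 0, m);
        return result;
    }

    /* Solution 2: RANDOM-SAMPLE(m, n) on the first n elements. */
    static List<Integer> randomSample(int[] array, int n, int m) {
        if (m == 0) return new ArrayList<>();
        List<Integer> s = randomSample(array, n - 1, m - 1);
        int x = array[random(0, n - 1)];
        s.add(s.contains(x) ? array[n - 1] : x);
        return s;
    }

    public static void main(String[] args) {
        int[] array = new int[100000];
        for (int i = 0; i < array.length; i++) array[i] = i + 1;
        calls = 0;
        shuffleThenTake(array, 5);
        System.out.println("Solution 1: " + calls + " calls");
        calls = 0;
        randomSample(array, array.length, 5);
        System.out.println("Solution 2: " + calls + " calls");
    }
}
```

With the 100,000-element array in `main`, this prints 100000 calls for Solution 1 and 5 calls for Solution 2.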

**************************************** ***************************************

Extension problem (a Google interview question): given an endless data stream of search keywords (for example, the keywords people continuously type into Google search), how can we select 1000 keywords uniformly at random from this stream?

Reference: http://blog.csdn.net/minglingji/article/details/7984445

This is again the problem of drawing m of n items with equal probability, except that n is unknown. The method used is reservoir sampling. Put the first 1000 items of the stream into an array (the reservoir) of length 1000. For the 1001st item, call Random(0, 1000); with probability 1000/1001 the result falls in the closed interval [0, 999], and the new item then replaces the reservoir entry at that index. In general, for the n-th item (n > 1000), call Random(0, n - 1); with probability 1000/n the result falls in [0, 999] and the item replaces that entry. For a stream of N items, RANDOM is therefore called N - m times. The process is simulated below.

import java.util.Random;

/**
 * Creation time: August 13, 2014 9:46:51
 * Project name: test
 * @author Cao yanfeng
 * @since JDK 1.6.0_21
 * Class description: reservoir sampling
 */
public class RandomSampleTest {
    private static final Random RNG = new Random();

    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
        int[] result = reservoirSample(array, 5);
        for (int i : result) {
            System.out.println(i);
        }
    }

    /* Reservoir sampling */
    public static int[] reservoirSample(int[] array, int m) {
        int[] reservoir = new int[m];
        for (int i = 0; i < array.length; i++) {
            if (i < m) {
                // The first m items fill the reservoir
                reservoir[i] = array[i];
            } else {
                // Item i (0-based) replaces a random slot with probability m / (i + 1)
                int temp = random(0, i);
                if (temp < m) {
                    reservoir[temp] = array[i];
                }
            }
        }
        return reservoir;
    }

    /* Return a random integer in the closed interval [a, b]. */
    public static int random(int a, int b) {
        return RNG.nextInt(b - a + 1) + a;
    }
}
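The simulation above reads from an array whose length is known. For the endless keyword stream in the extension problem, the same loop can consume an `Iterator` and keep a running count instead. The sketch below assumes the stream is exposed as a `java.util.Iterator` (an assumption; the problem does not say how keywords arrive), uses a `long` counter because an endless stream can outgrow `int`, and needs Java 8 or later; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

/* Sketch: reservoir sampling from a stream whose length is unknown in advance (Java 8+). */
class StreamReservoir {
    /* Keeps up to m items; after N >= m items, each one is in the reservoir with probability m/N. */
    static <T> List<T> sample(Iterator<T> stream, int m) {
        List<T> reservoir = new ArrayList<>(m);
        long seen = 0;                                          // items read so far
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < m) {
                reservoir.add(item);                            // the first m items fill the reservoir
            } else {
                long k = ThreadLocalRandom.current().nextLong(seen); // uniform in [0, seen - 1]
                if (k < m) {
                    reservoir.set((int) k, item);               // probability m/seen: replace slot k
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        // A finite stand-in for the endless stream: the integers 1..15
        System.out.println(sample(IntStream.rangeClosed(1, 15).iterator(), 5));
    }
}
```

For the interview question, the call would be `sample(keywordIterator, 1000)`, and the reservoir can be read at any moment. Why each item ends up with probability m/N: an item arriving at position i > m enters with probability m/i; at each later position j it is evicted with probability (m/j)(1/m) = 1/j, so it survives with probability (j - 1)/j. The product (m/i)((i)/(i + 1))((i + 1)/(i + 2))...((N - 1)/N) telescopes to m/N, and the first m items, which enter with probability 1, get m/N by the same product starting at j = m + 1.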
