Equal probability random sampling problem

Source: Internet
Author: User

1. The input contains two integers m and n, where m < n. Output m random integers from [0, n-1] in sorted order, such that every integer is selected with the same probability (that is, m/n).

Consider the integers 0, 1, 2, ..., n-1 in turn, and decide whether to select each one with an appropriate random test. Visiting the integers in order guarantees that the output is sorted. For example, if m = 2 and n = 5, each number should be chosen with probability 2/5.

Analysis, using the five numbers 0, 1, 2, 3, 4:

When we first encounter 0, it should be selected with probability 2/5. If it is selected, we go on to test the second number, 1. Because 0 has already been selected, the probability of selecting 1 becomes smaller: 1/4. Some people say this looks wrong, because the problem requires every number to be selected with the same probability, yet now one is 2/5 and the other is 1/4. In fact there is no contradiction: 1/4 is only the probability conditional on 0 having been selected. Think carefully: what is the overall probability that 1 is selected?

P(1 selected) = P(0 selected) * (1/4) + P(0 not selected) * (2/4) = (2/5 * 1/4) + (3/5 * 2/4) = 2/20 + 6/20 = 8/20 = 2/5
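The hand calculation above can be repeated for every number and for other values of m and n. The following is a minimal Python sketch, not part of the original article, that computes the exact selection probabilities under the rule "select the current number with probability (numbers still needed) / (numbers still left)". The function name inclusion_probabilities is hypothetical.

```python
from fractions import Fraction

def inclusion_probabilities(m, n):
    """Exact probability that each of 0..n-1 is selected (illustrative helper)."""
    # state[c] = probability that exactly c numbers have been selected so far
    state = {0: Fraction(1)}
    result = []
    for i in range(n):
        remaining = n - i
        p_i = Fraction(0)
        next_state = {}
        for chosen, prob in state.items():
            p_select = Fraction(m - chosen, remaining)  # still needed / still left
            p_i += prob * p_select
            if p_select > 0:
                next_state[chosen + 1] = next_state.get(chosen + 1, 0) + prob * p_select
            if p_select < 1:
                next_state[chosen] = next_state.get(chosen, 0) + prob * (1 - p_select)
        state = next_state
        result.append(p_i)
    return result
```

For m = 2 and n = 5, every entry of the returned list is Fraction(2, 5), which agrees with the value 2/5 derived by hand for the number 1.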

select = m
remaining = n
for i in [0, n):
    if (rand() % remaining) < select:
        print i
        select--
    remaining--

The rule the code follows is to choose s of the remaining r numbers: the next number is selected with probability s/r. This is the same probability used in the analysis above. When the program ends it has printed exactly m numbers, and each number has been chosen with the same probability, m/n.
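As a runnable illustration of this rule, here is a short Python sketch. The function name sample_sorted and the rng parameter are assumptions made for the example; rng.randrange(remaining) stands in for rand() % remaining.

```python
import random

def sample_sorted(m, n, rng=random):
    """Return m distinct integers from [0, n-1] in increasing order."""
    select, remaining = m, n
    chosen = []
    for i in range(n):
        # randrange(remaining) is uniform on [0, remaining-1], so this test
        # succeeds with probability select/remaining
        if rng.randrange(remaining) < select:
            chosen.append(i)
            select -= 1
        remaining -= 1
    return chosen
```

A call such as sample_sorted(2, 5) returns a sorted list of two distinct numbers from 0 to 4. The loop always visits all n numbers, so the running time is proportional to n even when m is small.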

There are other solutions, which treat the problem as reservoir sampling. First put the first k numbers into the reservoir. When the (k+1)-th number arrives, it is swapped into the reservoir with probability k/(k+1) (in general, the i-th number is swapped in with probability k/i); if it is swapped in, the position it replaces is picked at random. Continue until all n numbers have been traversed; the final reservoir is the sample. With this method each number is also chosen with probability k/n.
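The reservoir approach can be sketched in Python as follows. This is an illustration under stated assumptions rather than code from the original: the names reservoir_sample, items, and rng are hypothetical, and a single randrange call is used both to decide whether to swap and to pick the replaced position.

```python
import random

def reservoir_sample(items, k, rng=random):
    """Return k items chosen uniformly from an iterable of unknown length."""
    reservoir = []
    for i, item in enumerate(items, start=1):  # i = number of items seen so far
        if i <= k:
            reservoir.append(item)             # the first k items fill the reservoir
        else:
            j = rng.randrange(i)               # uniform on [0, i-1]
            if j < k:                          # happens with probability k/i
                reservoir[j] = item            # j is also a uniform position
    return reservoir
```

Unlike the sequential method, the reservoir is not in sorted order, so the result has to be sorted afterwards if ordered output is required.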

2. Problem extension: how do you randomly select one of N objects when you can see the objects one by one but do not know the value of N in advance? For example, you do not know how many lines a text file contains, and you are asked to select one line at random so that every line is equally likely to be chosen. If N, the total number of objects, were known, the probability would obviously be 1/N. But here N is unknown.

Consider the following approach: select each object, as it is traversed, with probability 1/i. That is, going through objects 1, 2, 3, ..., N, when we reach object x we always select it with probability 1/x.

Always select the first line; select the second line with probability 1/2, the third with probability 1/3, and so on. That is, keep a variable result: on the first line set result = 1, on the second line set result = 2 with probability 1/2, and continue replacing result in this way through the traversal. The final value of result is the answer, and every line is chosen with probability 1/N.

P(x is the final choice) = P(x selected) * P(x+1 not selected) * P(x+2 not selected) * ... * P(N not selected)

P(x is the final choice) = 1/x * x/(x+1) * (x+1)/(x+2) * ... * (N-1)/N. Every factor cancels except 1/N, so the answer is 1/N. This allows you to select any object with equal probability without knowing N in advance.

The reference pseudocode is as follows:

i = 0
while more input lines:
    with probability 1.0 / ++i:
        choice = this line
print choice
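The pseudocode translates almost directly into Python. The sketch below is illustrative; the name random_line and the rng parameter are assumptions, and rng.randrange(i) == 0 is used to realize "with probability 1/i".

```python
import random

def random_line(lines, rng=random):
    """Return one element of an iterable of unknown length, each equally likely."""
    choice = None
    for i, line in enumerate(lines, start=1):
        if rng.randrange(i) == 0:   # true with probability 1/i
            choice = line           # replace the current choice
    return choice
```

Because the function only iterates once and keeps a single line in memory, it can be given an open file object directly, for example random_line(open_file) where open_file is a text file opened for reading. For empty input it returns None.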

3. A function rand() is known to produce 0 with probability p and 1 with probability 1-p. Design a new random function newrand() that produces each number in 1~n with probability 1/n.
First use the known function rand() to build a new function Rand() that produces 0 and 1 with equal probability (for example, call rand() twice and keep only the results 01 and 10, each of which occurs with probability p(1-p)). Then call Rand() k times, where k is the number of bits in the binary representation of n, to obtain a 0/1 sequence of length k; the integer formed by this sequence is taken as the number in 1~n.
1): Compute k, the number of bits in the binary representation of n: k = 1 + floor(log2 n) (logarithm to base 2).

2): Call Rand() k times to generate a random number. Note: the integer obtained from the sequence may be 0 or greater than n; if so, discard it and regenerate until the result lies in 1~n.
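The two steps can be sketched in Python as follows. Everything here is an assumption made for illustration: biased_bit stands in for the given rand() with an arbitrary p = 0.3, and fair_bit builds the equal-probability 0/1 generator by calling the biased source twice and keeping only unequal pairs (01 and 10 each occur with probability p(1-p)). The sketch requires 0 < p < 1, otherwise fair_bit never terminates.

```python
import random

def biased_bit(p=0.3):
    """Stand-in for the given rand(): 0 with probability p, 1 with probability 1-p."""
    return 0 if random.random() < p else 1

def fair_bit(bit_source=biased_bit):
    """Return 0 or 1 with equal probability using only the biased source."""
    while True:
        a, b = bit_source(), bit_source()
        if a != b:          # pairs 01 and 10 are equally likely
            return a

def newrand(n, bit_source=biased_bit):
    """Return an integer in 1..n, each with probability 1/n."""
    k = n.bit_length()      # number of binary digits of n, i.e. 1 + floor(log2 n)
    while True:
        value = 0
        for _ in range(k):
            value = (value << 1) | fair_bit(bit_source)
        if 1 <= value <= n: # discard 0 and anything greater than n
            return value
```

Since n has k bits, fewer than half of the k-bit values are rejected, so the expected number of attempts per result is below 2.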

4. Given a function rand5() that generates the integers 1-5 with equal probability, use it to construct a function rand7() that generates the integers 1-7 with equal probability.
Many people's first reaction is to implement rand7() as rand5() + rand5() % 3. This does produce numbers between 1 and 7, but not with equal probability: rand5() % 3 yields 0 with probability 1/5 and yields 1 and 2 with probability 2/5 each, so the sums are not uniform. For example, 1 is produced with probability 1/25, while 3 is produced with probability 5/25.
The correct approach is to use rand5() to generate a number between 1 and 25 with equal probability, map 1-21 onto 1-7, and discard 22-25. For example, 1-3 are treated as 1 in rand7(), 4-6 as 2, and so on; if the number is one of the 4 remaining values, discard it and regenerate.
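A minimal Python sketch of this rejection approach is shown below. The rand5 defined here is only a stand-in for the given function, and the source parameter is a hypothetical hook that makes it possible to substitute another generator.

```python
import random

def rand5():
    """Stand-in for the given rand5(): uniform on 1..5."""
    return random.randint(1, 5)

def rand7(source=rand5):
    """Return an integer uniform on 1..7 using only a uniform 1..5 source."""
    while True:
        # two calls give a two-digit base-5 number, uniform on 1..25
        value = (source() - 1) * 5 + source()
        if value <= 21:
            return (value - 1) // 3 + 1   # 1-3 -> 1, 4-6 -> 2, ..., 19-21 -> 7
        # 22-25: discard and regenerate
```

Each attempt succeeds with probability 21/25, so on average about 1.19 attempts (2.38 calls to the source) are needed per result.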

The general principle: if rand() produces the integers in [0, n-1], treat rand() as a generator of one base-n digit. Then rand() * n + rand() produces a 2-digit base-n number, and in the same way you can produce 3-digit, 4-digit, 5-digit, ... base-n numbers. A random number built digit by digit in base n is guaranteed to be uniformly distributed; conversely, combining calls to rand() in an arbitrary way (such as rand5() + rand5() % 3) does not guarantee equal probabilities.

Extension: given a function rand() that produces the integers 0 to n-1 with equal probability, how do you generate the integers 0 to m-1 with equal probability?

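/* rand() here is the given generator, uniform on [0, n-1], not the C library rand() */
int random(int m, int n)
{
    while (1)
    {
        int k = rand();                 /* one base-n digit */
        int max = n - 1;                /* largest value k can take so far */
        while (max < m - 1)             /* extend until the range covers [0, m-1] */
        {
            k = k * n + rand();         /* append another base-n digit */
            max = max * n + n - 1;
        }
        int limit = (max + 1) / m * m;  /* largest multiple of m not above max + 1 */
        if (k < limit)
            return k % m;               /* uniform on [0, m-1] */
        /* otherwise discard k and regenerate */
    }
}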

5. How do you generate random numbers with the following relative frequencies: 0 appears 1 time, 1 appears 2 times, 2 appears 3 times, ..., n-1 appears n times?

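/* rand(size) is assumed to return an integer uniform on [0, size-1] */
int random(int size)
{
    while (true)
    {
        int m = rand(size);
        int n = rand(size);
        if (m + n < size)   /* keep only pairs whose sum is below size */
            return m + n;
    }
}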

This method works because the number of pairs (m, n) that produce each sum grows by one at every step:

0 has only 0+0

1 has 0+1, 1+0

2 has 1+1, 0+2, 2+0

3 has 1+2, 2+1, 0+3, 3+0
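The counting argument can be checked by enumerating all pairs. The Python sketch below is illustrative only: triangular_random mirrors the C function above with random.randrange standing in for rand(size), and pair_counts is a hypothetical helper that counts the accepted pairs for each sum.

```python
import random
from collections import Counter
from itertools import product

def triangular_random(size, rand=random.randrange):
    """Return s in [0, size-1] with probability proportional to s + 1."""
    while True:
        m = rand(size)
        n = rand(size)
        if m + n < size:    # reject pairs whose sum is too large
            return m + n

def pair_counts(size):
    """Count how many accepted pairs (m, n) produce each sum."""
    return Counter(m + n
                   for m, n in product(range(size), repeat=2)
                   if m + n < size)
```

For size = 4, pair_counts gives 1, 2, 3 and 4 pairs for the sums 0, 1, 2 and 3. Every pair is equally likely, so the sums occur in the ratio 1 : 2 : 3 : 4.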
