[Programming Pearls] Chapter 12: Sampling Problems


I. Overview

Problem description: how to generate m distinct random integers in the range 0 to n-1.

Requirement: output the integers in sorted order, and ensure that every m-element subset is equally likely to be selected.

1) The following code is provided:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Select m random numbers from 0..n-1
void getrandnumber(int m, int n)
{
    srand(time(NULL)); // this is key: seed the generator
    int i;
    for (i = 0; i < n; ++i) {
        if (rand() % (n - i) < m) {
            printf("%d ", i);
            m--;
        }
    }
}

int main()
{
    getrandnumber(5, 10);
    return 0;
}


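The for loop visits the candidates in increasing order, so the output is sorted. The test rand() % (n - i) < m selects i with probability m/(n - i), where m is the number of values still needed, which makes every m-element subset equally likely.

The time complexity of the algorithm is O(n).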
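The selection rule above can also be sketched as a function that returns the sample instead of printing it, which makes it easier to check. This is only an illustration: the name sample_sorted and the use of std::mt19937 with std::uniform_int_distribution in place of rand() are assumptions, not part of the original code.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper (not from the original): returns m distinct integers
// from 0..n-1 in increasing order, using the same selection rule as above.
std::vector<int> sample_sorted(int m, int n, std::mt19937& gen)
{
    assert(0 <= m && m <= n);
    std::vector<int> out;
    for (int i = 0; i < n && m > 0; ++i) {
        // Pick i with probability m / (n - i), where m is the count still needed.
        std::uniform_int_distribution<int> dist(0, n - i - 1);
        if (dist(gen) < m) {
            out.push_back(i);
            --m;
        }
    }
    return out;
}
```

When exactly m candidates remain, the test always succeeds, so the function always returns m values.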

2) An unconventional approach:

Write the n numbers on identical slips of paper and shuffle them thoroughly. Draw m slips and output them in sorted order.


3) To reduce the time complexity, the following optimization is proposed:

Maintain a set S and insert one random number at a time, skipping any number that is already in S, until S contains m elements.

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <set>
using namespace std;

// Select m random numbers from 0..n-1
void GetSet(int m, int n)
{
    srand(time(NULL)); // this is key: seed the generator
    set<int> S;
    while ((int)S.size() < m) // until S holds m elements
        S.insert(rand() % n); // inserting a duplicate has no effect
    set<int>::iterator i;
    for (i = S.begin(); i != S.end(); ++i)
        cout << *i << " ";
}

int main()
{
    GetSet(5, 10);
    return 0;
}


Each insert into the C++ set template takes O(log m) time, and traversing the set takes O(m) time. Therefore, when m is small relative to n (so that few random numbers are rejected as duplicates), the complete program requires O(m log m) time.

4) Another way to generate the sample: shuffle an array containing 0 to n-1, and then output the first m elements.

A better way is to shuffle only the first m positions and then sort them before output.

Alternatively, generate more than m random numbers in the range 1 to n, remove the duplicates, and output the first m elements.

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <vector>
using namespace std;

// Insertion sort of the first m elements of a
void isort(vector<int>& a, int m)
{
    for (int i = 1; i < m; i++)
        for (int j = i; j > 0 && a[j - 1] > a[j]; j--)
            swap(a[j - 1], a[j]);
}

// Select m random numbers from 0..n-1
void getshuf(int m, int n)
{
    srand(time(NULL)); // this is key: seed the generator
    vector<int> a(n);
    for (int i = 0; i < n; ++i)
        a[i] = i;
    // Shuffle only the first m positions: swap a[i] with a random element of a[i..n-1]
    for (int i = 0; i < m; ++i)
        swap(a[i], a[i + rand() % (n - i)]);
    isort(a, m);
    for (int i = 0; i < m; ++i)
        cout << a[i] << " ";
}

int main()
{
    getshuf(5, 10);
    return 0;
}


II. Exercises

1)

int bigrand()
{
    return RAND_MAX * rand() + rand();
}

int region(int l, int u) // returns a value in [l, u]
{
    ++u;
    return l + rand() % (u - l);
}
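One caveat about the functions above: RAND_MAX is only guaranteed to be at least 32767, and on platforms where it is as large as INT_MAX the product RAND_MAX * rand() overflows int. A minimal sketch of an alternative, assuming the C++11 <random> facilities are available; the name randint_closed is hypothetical:

```cpp
#include <cassert>
#include <random>

// Hypothetical alternative to region(): uniform integer in the closed range [l, u].
int randint_closed(int l, int u, std::mt19937& gen)
{
    assert(l <= u);
    std::uniform_int_distribution<int> dist(l, u); // both bounds inclusive
    return dist(gen);
}
```

The generator is passed in by reference so that the caller seeds it once, for example with std::mt19937 gen(std::random_device{}()).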

2) How can each element be selected with equal probability when the m-element subsets need not be equally likely?

Choose a random number i in the range 0 to n-1 and take the m consecutive numbers i, i+1, ..., i+m-1 as the selected subset (wrapping around to 0 after reaching n-1).
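The wrap-around selection can be sketched as follows. The function name sample_consecutive and the use of std::mt19937 are assumptions made for illustration. Each integer ends up in the result with probability m/n, but only n of the possible m-element subsets can ever occur.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: picks a random start and returns m consecutive values mod n.
std::vector<int> sample_consecutive(int m, int n, std::mt19937& gen)
{
    assert(n > 0 && 0 <= m && m <= n);
    std::uniform_int_distribution<int> dist(0, n - 1);
    int start = dist(gen);
    std::vector<int> out;
    for (int k = 0; k < m; ++k)
        out.push_back((start + k) % n); // wrap around to 0 after n-1
    return out;
}
```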


3)

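When m < n/2:

Suppose a total of k attempts are needed: the numbers drawn in the first k-1 attempts are already in the set, and only the k-th is not. Treating the set as holding m of the n values, the probability of this is

P(k) = (m/n)^(k-1) * (n-m)/n

The expectation is the sum of k * P(k) for k from 1 to infinity. This is a geometric distribution, so the expectation equals

n/(n-m) < 2

So the expected number of attempts needed to find a number not in the set is less than 2.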
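For completeness, the expectation can be worked out explicitly. Writing p = m/n for the probability that a draw is already in the set (the symbol p is introduced here only for brevity), and using the standard identity \sum_{k \ge 1} k p^{k-1} = 1/(1-p)^2:

```latex
\begin{align*}
E[K] &= \sum_{k=1}^{\infty} k \, p^{k-1} (1 - p), \qquad p = \frac{m}{n} \\
     &= (1 - p) \cdot \frac{1}{(1 - p)^2} = \frac{1}{1 - p} = \frac{n}{n - m} \\
m < \frac{n}{2} &\;\Longrightarrow\; n - m > \frac{n}{2} \;\Longrightarrow\; \frac{n}{n - m} < 2
\end{align*}
```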

4) Refer to page 64 of the Chinese edition of Introduction to Algorithms.

How many draws does it take to collect all n random coupons? About n log n.
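The n log n figure comes from the standard coupon-collector result: the expected number of draws is n * (1 + 1/2 + ... + 1/n), which grows like n ln n. A small sketch that evaluates this sum; the function name expected_draws is hypothetical:

```cpp
#include <cassert>

// Hypothetical helper: expected number of draws needed to see all n coupon
// types, computed as n * (1 + 1/2 + ... + 1/n).
double expected_draws(int n)
{
    assert(n >= 1);
    double harmonic = 0.0;
    for (int k = 1; k <= n; ++k)
        harmonic += 1.0 / k;
    return n * harmonic;
}
```

For example, with n = 6 (the faces of a die) the sum is 6 * 2.45 = 14.7 draws on average.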


7) Print first and then recurse, rather than recursing first and then printing.

9) An algorithm is provided that uses only m random numbers even in the worst case, never discarding a generated random number.
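To make the effect of the statement order concrete, here is a sketch of a recursive selection function. The name randselect, the out parameter, and the use of std::mt19937 are assumptions for illustration; the original exercise code is not shown in this text.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical recursive selection of m distinct integers from 0..n-1.
// The value is appended BEFORE the recursive call.
void randselect(int m, int n, std::mt19937& gen, std::vector<int>& out)
{
    assert(0 <= m && m <= n);
    if (m == 0)
        return;
    std::uniform_int_distribution<int> dist(0, n - 1);
    if (dist(gen) < m) {
        out.push_back(n - 1);               // emit first ...
        randselect(m - 1, n - 1, gen, out); // ... then recurse
    } else {
        randselect(m, n - 1, gen, out);
    }
}
```

With the emit placed before the recursive call, as written here, the values come out in decreasing order; moving the push_back after the call yields increasing order.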

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <set>
using namespace std;

// Select m random numbers from 0..n-1
void GetSet(int m, int n)
{
    srand(time(NULL)); // this is key: seed the generator
    set<int> S;
    for (int i = n - m; i < n; ++i) {
        int t = rand() % (i + 1);
        if (S.find(t) == S.end())
            S.insert(t); // t is not in S yet
        else
            S.insert(i); // t is already in S, so insert i instead
    }
    set<int>::iterator j;
    for (j = S.begin(); j != S.end(); ++j)
        cout << *j << " ";
}

int main()
{
    GetSet(5, 10);
    return 0;
}

10) Question: How can one object be selected at random from n objects that are presented in sequence, when the value of n is not known in advance?

Specifically, how can you read a text file, without knowing the number of lines beforehand, and select and output one line at random?

Answer: We always select the first line, then select the second line with probability 1/2, the third line with probability 1/3, and so on. At the end of the process, each line has the same probability of being selected (1/n, where n is the total number of lines in the file):

i = 0
while more input lines
    with probability 1.0 / ++i
        choice = this input line  // even after a line is chosen, do not break; continue until the last line
print choice

At first glance the first line is confusing: it is always selected initially, so why is its final probability 1/n?

Probability = 1 * (1/2) * (2/3) * (3/4) * ... * ((n-1)/n) = 1/n

Proof: At step i (when line i is read), the probability of selecting that line is 1/i, and the probability of not selecting it is (i-1)/i. For a document with n lines, we need to prove that the probability that line i is the final choice is 1/n.

For line i to be the final choice, the selections made in the first i-1 steps do not matter. Line i must be selected at step i, with probability 1/i, and no line may be selected at steps i+1 through n. For any j (i+1 <= j <= n), the probability of not selecting at step j is (j-1)/j, so the final probability is: (1/i) * (i/(i+1)) * ... * ((n-1)/n) = 1/n

Taking a document with only six lines as an example, the probability that line 2 is the final choice is: (1/2) * (2/3) * (3/4) * (4/5) * (5/6) = 1/6.
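The single-pass selection described above can be sketched in C++ as follows. The function name random_line and the choice of std::mt19937 are assumptions for illustration; any input stream can be passed in.

```cpp
#include <cassert>
#include <istream>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: returns one line chosen uniformly at random from `in`
// in a single pass; returns an empty string if there are no lines.
std::string random_line(std::istream& in, std::mt19937& gen)
{
    std::string line, choice;
    int i = 0;
    while (std::getline(in, line)) {
        ++i;
        // Replace the current choice with probability 1/i (always when i == 1).
        std::uniform_int_distribution<int> dist(1, i);
        if (dist(gen) == 1)
            choice = line;
    }
    return choice;
}
```

It can be called on a file stream or, for a quick check, on a std::istringstream such as std::istringstream in("a\nb\nc\n").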


Extension: the original problem can be stated as follows: how to select one object uniformly at random from n ordered objects, written sample(n, 1), where n is unknown.

Now change the problem to: how to select m objects uniformly at random from n ordered objects, written sample(n, m), where n is unknown.

Analysis: If n is known, sample(n, m) is an ordinary sampling problem. If n is unknown, can the problem be reduced to the algorithm above?

Solution: Convert the sample(n, m) problem into m sample(n*, 1) problems. More specifically, convert it into sample(n, 1), sample(n-1, 1), sample(n-2, 1), ..., sample(n-m+1, 1). Take a 6-line document as an example, from which any 2 lines are to be selected. First, select one line with the following probabilities:

1 (1) 2 (1/2) 3 (1/3) 4 (1/4) 5 (1/5) 6 (1/6)

Suppose line 2 is selected. The probabilities are then modified as follows:

3 (1) 4 (1/2) 5 (1/3) 6 (1/4) 1 (1/5)


[Note]: because line 2 was selected, the modified probabilities start from line 3, and line 2 is excluded from the list as scanning continues. This ensures that one of the remaining five lines is still selected with equal probability.
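As an alternative to the repeated single-item selections described above, a widely used single-pass technique is reservoir sampling. It is not the method described in this text: it keeps the first m lines and then lets line i (for i > m) replace a random kept line with probability m/i. A minimal sketch, where the name reservoir_sample and the use of std::mt19937 are assumptions:

```cpp
#include <cassert>
#include <istream>
#include <random>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns m lines chosen uniformly at random from `in`
// in one pass (all lines if the input has fewer than m).
std::vector<std::string> reservoir_sample(std::istream& in, int m, std::mt19937& gen)
{
    assert(m >= 0);
    std::vector<std::string> reservoir;
    std::string line;
    int i = 0;
    while (std::getline(in, line)) {
        ++i;
        if (i <= m) {
            reservoir.push_back(line); // keep the first m lines
        } else {
            // Line i replaces a random slot with probability m / i.
            std::uniform_int_distribution<int> dist(0, i - 1);
            int j = dist(gen);
            if (j < m)
                reservoir[j] = line;
        }
    }
    return reservoir;
}
```

The returned lines are not necessarily in their original file order; sort them by line number afterwards if ordered output is needed.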

11) This question seems complicated, but it is actually very simple. You only need to consider the order in which 1, 2, and 3 appear. To win the game, 1 and 2 must both appear before 3. Of the 3! = 6 orderings of the three numbers, two (1, 2, 3 and 2, 1, 3) satisfy this. Therefore, the probability of winning is 2/6 = 1/3.
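The count of winning orderings can be checked by brute force. This sketch assumes the winning condition is that 3 appears last among 1, 2, and 3, which is an interpretation of the exercise and is not spelled out in this text; the function name count_winning_orders is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical check: counts the orderings of {1, 2, 3} in which 3 comes last,
// i.e. both 1 and 2 appear before 3.
int count_winning_orders()
{
    int a[3] = {1, 2, 3}; // sorted, so next_permutation visits all 6 orderings
    int wins = 0;
    do {
        if (a[2] == 3)
            ++wins;
    } while (std::next_permutation(a, a + 3));
    return wins;
}
```

The function returns 2, matching the 2 out of 6 orderings counted above.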
