Study Notes for Chapter 2 (Sampling Problem) of Programming Pearls

-- 2013.01.17 (by: neicole)

Zero. Outline

1. Problem Description

2. Problem Improvement

3. Solution 1: Selection by Probability

4. Solution 2: Random Insertion

5. Solution 3: Shuffling and Extraction

6. My Takeaways

 

I. Problem description

1. Name: random sampling task

2. Input: List of constituency names, integer m

3. Output: List of m randomly selected constituency names

PS: (1) there are several hundred constituency names; (2) each constituency name is a string of at most 12 characters; (3) m is usually between 20 and 40.

 

II. Problem Improvement

PS: only part of the input data (the sample) is used in the program's final result. Reading all of the input into memory before computing the result is likely to waste a great deal of time and space, so the problem is restated in terms of integers.

Question modification:

Input: integers m and n, with 0 < m < n.

Output: a sorted list of m random integers in the range [0, n), with no integer repeated.
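A minimal Python sketch of a checker for the reformulated output, assuming the sample arrives as a list of ints; the name is_valid_sample is hypothetical and not from the book. Requiring a strictly increasing list covers both "sorted" and "no repeats" at once.

```python
from typing import List


def is_valid_sample(sample: List[int], m: int, n: int) -> bool:
    """Return True if sample is a sorted list of m distinct integers in [0, n)."""
    if not 0 < m < n:
        return False  # the reformulated problem requires 0 < m < n
    if len(sample) != m:
        return False
    if any(not 0 <= x < n for x in sample):
        return False
    # Strictly increasing means sorted and free of duplicates.
    return all(a < b for a, b in zip(sample, sample[1:]))


print(is_valid_sample([1, 3], 2, 5))  # True
print(is_valid_sample([3, 1], 2, 5))  # False: not sorted
print(is_valid_sample([1, 1], 2, 5))  # False: repeated value
```

Once a valid index sample exists, the constituency names follow from a lookup such as names[i] for each i (names being a hypothetical list of the input names).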

 

III. Solution 1: Selection by Probability

1. Pseudocode

Assume: bigrand() returns a large random integer (much larger than n).

randint(i, j) returns a random integer chosen uniformly from the range i..j.
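The book only specifies how these two helpers behave. A minimal Python sketch using the standard random module might look like this; the 62-bit width of bigrand is an arbitrary assumption meant to make it much larger than n.

```python
import random


def bigrand() -> int:
    """A large random non-negative integer: 62 random bits (width chosen arbitrarily)."""
    return random.getrandbits(62)


def randint(i: int, j: int) -> int:
    """A random integer chosen uniformly from i..j, both endpoints included."""
    return random.randint(i, j)  # random.randint is inclusive on both ends
```

One caveat: bigrand() % k is only approximately uniform, because 2**62 is generally not a multiple of k. The bias is negligible when k is tiny next to 2**62, and random.randrange(k) avoids it entirely.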

Code:

select = m
remaining = n
for i = [0, n)
    if (bigrand() % remaining) < select
        print i
        select--
    remaining--
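A runnable Python sketch of the pseudocode above. Two substitutions are mine, not the book's: the chosen values are collected in a list instead of printed, and random.randrange(remaining) replaces bigrand() % remaining, giving an exactly uniform value in 0..remaining-1.

```python
import random


def select_sample(m, n):
    """Selection sampling: return m distinct integers from [0, n) in sorted order.

    Scans i = 0..n-1 once; each i is chosen with probability select/remaining.
    """
    result = []
    select, remaining = m, n
    for i in range(n):
        # Uniform on 0..remaining-1; stands in for bigrand() % remaining.
        if random.randrange(remaining) < select:
            result.append(i)  # the pseudocode prints i here
            select -= 1
        remaining -= 1
    return result


print(select_sample(2, 5))  # two distinct sorted values from 0..4, e.g. [0, 3]
```

Because i only increases, the list comes out sorted with no need for a separate sort step.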

 

2. My Incorrect Thinking: A Wrong Reading of the Code

I puzzled over this pseudocode for a long time. My first reaction was to treat n as a numeric value, to treat m as a value too (in the problem it is a count, but to the program it looks like just another number), and to treat bigrand() the same way, as if the magnitudes of these values were nested inside one another. If you read the code that way, the following question arises.

Under this reading, does if (bigrand() % remaining) < select mean that the selected number must be smaller than select? If so, there is a big problem: select starts at m and only decreases, so no number in the range [m, n) could ever be selected.

So where is the problem?

 

3. Commentary: The Correct Reading

It is true that select keeps getting smaller. What we must not overlook, however, is that the program outputs the variable i, not select; the real question is what the if statement actually tests. When reading code, it helps to watch several variables at once to see how they relate. Let m = 2 and n = 5, and step through the loop in the pseudocode while tracking how the variables change.
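To watch i, remaining, and select side by side, here is a small Python sketch that replays the loop for m = 2 and n = 5. The random draws are a made-up, fixed list (hypothetical values, not real bigrand() output) so that the trace is reproducible.

```python
def trace(m, n, draws):
    """Replay the selection loop, using draws[i] in place of the i-th bigrand() call.

    Returns one (i, remaining, select, r, chosen) tuple per iteration, where
    r = draws[i] % remaining, recorded before select and remaining are updated.
    """
    rows = []
    select, remaining = m, n
    for i in range(n):
        r = draws[i] % remaining
        chosen = r < select
        rows.append((i, remaining, select, r, chosen))
        if chosen:
            select -= 1
        remaining -= 1
    return rows


# Hypothetical draws standing in for bigrand(); not real output.
for row in trace(2, 5, [7, 3, 9, 5, 8]):
    print("i=%d remaining=%d select=%d r=%d chosen=%s" % row)
```

With these draws only i = 2 and i = 4 are chosen. At i = 4, select equals remaining (both are 1), so the test r < select is guaranteed to pass: once the needed count equals the number of values left, every remaining value must be taken.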

The core of this algorithm is the if statement inside the for loop. The value of i increases by 1 in every iteration, while select changes only conditionally: if bigrand() % remaining satisfies the if condition, select decreases by 1; otherwise it stays the same. Meanwhile, remaining decreases by 1 every time i increases by 1. What is the relationship among these three values? Let's look again:

Suppose we want to put the results into two slots, M1 and M2. Initially M1 and M2 are empty, remaining is 5, and the candidates for output are i = 0, 1, 2, 3, 4. After the first iteration, four candidates are left (1, 2, 3, 4), which matches remaining = 4. During that first iteration, if the condition (bigrand() % remaining) < select holds, the value 0 (the current i) is placed into M1 or M2. Continuing iteration by iteration, we see that remaining is the number of values that have not yet been examined.

Because i increases by 1 in every iteration, the candidates are examined from smallest to largest, so the values that pass the test are output in increasing order and none can be output twice. Looking back at the original problem, we need random integers that are "sorted" and "not repeated", and both properties come for free from this scan. The randomness lies in whether each candidate enters the if branch, and that decision is tied to a probability. If every candidate ends up in the result with the same probability, the sample is random.

Before the first iteration there are 5 candidates and select (the number of empty slots among M1 and M2) is 2, so the probability that 0 ends up in M1 or M2 should be 2/5, that is, select/remaining = 1/5 + 1/5, one chance per empty slot. How does the code guarantee this 2/5? bigrand() returns a random integer, and taking it modulo 5 makes bigrand() % remaining one of 0, 1, 2, 3, 4 with (approximately) equal probability. The values smaller than select = 2 are 0 and 1, so there is a 1/5 chance (bigrand() % remaining == 0, filling M1) plus a 1/5 chance (bigrand() % remaining == 1, filling M2), a total of 2/5, of entering the if branch and putting 0 into the result. In general, each candidate is chosen with probability select/remaining: the number of values still needed divided by the number still available. When select reaches 0 no further values are chosen, and when select equals remaining every remaining value is chosen, so exactly m values are output. The output is therefore both random and sorted.
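The 2/5 argument can also be checked exactly rather than by simulation. This Python sketch walks every branch of the loop, idealizing bigrand() % remaining as exactly uniform (an assumption), and accumulates the probability of each possible output with fractions.Fraction; output_distribution is a hypothetical name.

```python
from fractions import Fraction


def output_distribution(m, n):
    """Exact probability of every possible output of the selection loop.

    Assumes bigrand() % remaining is exactly uniform on 0..remaining-1,
    so candidate i is chosen with probability select/remaining.
    """
    dist = {}

    def walk(i, select, remaining, chosen, p):
        if i == n:
            key = tuple(chosen)
            dist[key] = dist.get(key, Fraction(0)) + p
            return
        take = Fraction(select, remaining)
        if take > 0:  # branch where i enters the if and is output
            walk(i + 1, select - 1, remaining - 1, chosen + [i], p * take)
        if take < 1:  # branch where i is skipped
            walk(i + 1, select, remaining - 1, chosen, p * (1 - take))

    walk(0, m, n, [], Fraction(1))
    return dist


dist = output_distribution(2, 5)
# Probability that each value i appears in the output.
for i in range(5):
    print(i, sum(p for s, p in dist.items() if i in s))
```

For m = 2 and n = 5, each of the 10 possible pairs comes out with probability 1/10, and the loop prints 2/5 for every value 0 through 4, matching select/remaining at the first step.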

 

IV. Solution 2: Random Insertion

As the book mentions, this idea came from a student, who suggested: "Make a copy of the list of constituencies, use a paper cutter to cut the copy into slips of paper with one constituency name on each, put the slips into a paper bag, shake it, and then draw out the required number of slips." This is a method taken from everyday life. We often say that computing draws on everyday life, and this example indeed reflects the book's theme of "breaking conceptual barriers."

1. Pseudocode

initialize set S to empty
size = 0
while size < m do
    t = bigrand() % n
    if t is not in S
        insert t into S
        size++
print the elements of S in sorted order
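A Python sketch of the pseudocode, assuming Python's built-in set plays the role of S; the size counter becomes len(s), and random.randrange(n) stands in for bigrand() % n. The name insertion_sample is hypothetical.

```python
import random


def insertion_sample(m, n):
    """Draw random integers from [0, n) until m distinct ones are collected,
    then return them in sorted order."""
    s = set()
    while len(s) < m:
        t = random.randrange(n)  # stands in for bigrand() % n
        s.add(t)  # adding a value already in s changes nothing, like "if t is not in S"
    return sorted(s)  # "print the elements of S in sorted order"


print(insertion_sample(3, 10))  # three distinct sorted values from 0..9
```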

2. Associations and Summary

This code looks very simple. I understand it as "sampling with replacement, then discarding repeats": each time, a random number is drawn from the whole range and conceptually put back afterward; if the same number is drawn again later, that draw is ignored, and drawing continues until enough distinct numbers have been collected. Every draw comes from the full range, so each number is equally likely to be drawn, which satisfies the "random" requirement of the problem.
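The "ignore the repeat and draw again" step has a cost that can be worked out exactly. When k distinct numbers are already in S, a new draw is unseen with probability (n - k)/n, so on average it takes n/(n - k) draws to get the next new number; summing over k = 0..m-1 gives the expected total. A small Python sketch of this calculation (my own worked example, not from the book; expected_draws is a hypothetical name):

```python
from fractions import Fraction


def expected_draws(m, n):
    """Expected number of random draws the insertion method makes:
    the sum over k = 0..m-1 of n / (n - k)."""
    return sum(Fraction(n, n - k) for k in range(m))


print(expected_draws(2, 5))  # 9/4, that is 5/5 + 5/4
```

For the original sizes (m of 20 to 40, n in the hundreds) the expected total is only slightly above m, but as m approaches n it grows toward n(1 + 1/2 + ... + 1/n), roughly n ln n.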

This reminds me of a type of problem I used to do in high school, and the following scenario comes easily to mind: "A bag holds N white balls, each marked with a number in [0, 10). Each time, one ball is drawn from the bag and then put back. What is the probability of drawing the number XX?" Although it differs somewhat from this problem, the scenario is very similar, which shows how relating a problem to everyday life can help in solving it.

 

V. Solution 3: Shuffling and Extraction

1. Pseudocode

for i = [0, n)
    swap(i, randint(i, n-1))    // randint(i, j) returns a random integer chosen uniformly from i..j

 

2. Associations and Summary

The idea is very simple: shuffle the internal order of the set, take the first m numbers from the shuffled set, and sort them to obtain the result.
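A Python sketch of this approach, assuming an array x initialized to 0..n-1 (the pseudocode leaves the array implicit); shuffle_sample is a hypothetical name. It differs from the pseudocode in one respect, stated plainly: the swap loop stops after the first m positions. The later iterations only swap positions m and beyond, so they cannot change x[0..m-1], and stopping early gives exactly the same sample with less work.

```python
import random


def shuffle_sample(m, n):
    """Shuffle-then-extract: return m distinct integers from [0, n) in sorted order."""
    x = list(range(n))
    for i in range(m):  # only the first m swaps affect x[0..m-1]
        j = random.randint(i, n - 1)  # uniform on i..n-1, inclusive
        x[i], x[j] = x[j], x[i]
    return sorted(x[:m])  # take the first m values, then sort them


print(shuffle_sample(3, 10))  # three distinct sorted values from 0..9
```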

This random sampling method reminds me of a lottery draw such as Mark Six: we pick m numbers out of the set directly, which is random sampling, and to meet the problem's requirements we simply sort them afterward.

 

VI. My Takeaways

This column discusses the "sampling problem". At first I did not expect that the problem could be recast as sampling from a set of integers, nor did I think of approaching this "randomness" problem from the perspective of probability, or of drawing on everyday phenomena while solving it, as the book does. The book really did lead me to think these things through. When solving a problem, first ask whether the problem itself can be reformulated, and then ask what other solutions there might be. While looking for solutions, break out of the conceptual box and try another way of thinking.

 
