Problem
A rand7() API is provided that uniformly generates random integers from 1 to 7. Use rand7 to implement rand10, which uniformly generates random integers from 1 to 10.
Ideas
In short:
(1) If randN uniformly generates 1 to N and N is a multiple of 10, then (randN - 1) % 10 + 1 gives rand10.
(2) (rand7 - 1) * 7 + rand7 uniformly generates 1-49; call it rand49.
(3) If rand40 can be derived from rand49, that is, 1-40 can be generated uniformly, then (rand40 - 1) % 10 + 1 gives rand10.
(4) How do we derive rand40 from rand49? Use rejection sampling: generate a number with rand49; if it is in 41-49, discard it and generate again; if it is in 1-40, return it. The result is uniform on 1-40.
(5) The code can be written at this point, but why does this work? The following sections go from rejection sampling back to this problem.
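A quick way to check step (2) is to enumerate all 49 ordered pairs of rand7 outcomes. The sketch below uses a helper name, rand49_value, that is made up for illustration; it assumes the two rand7 calls are independent and uniform, so every pair is equally likely.

```python
from collections import Counter

def rand49_value(a, b):
    """Combine two rand7 outcomes a and b (each in 1..7) into a value in 1..49."""
    return (a - 1) * 7 + b

# Count how many of the 49 equally likely (a, b) pairs map to each value.
counts = Counter(rand49_value(a, b) for a in range(1, 8) for b in range(1, 8))

# True when every value 1..49 is hit by exactly one pair, i.e. rand49 is uniform.
all_once = sorted(counts) == list(range(1, 50)) and set(counts.values()) == {1}
print(all_once)
```

Because each value comes from exactly one pair, each has probability 1/49.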
Rejection sampling
First, a few definitions:
For a continuous distribution, the cumulative distribution function (CDF) is \(F(x) = P\{X < x\}\), the probability that the random variable X falls in the interval \((-\infty, x)\).
The probability density function (PDF) f(x) takes on meaning through integration: \(\int_a^b f(x)\,dx = F(b) - F(a) = P\{a \le X < b\}\).
For a discrete distribution, the CDF has the same meaning: \(F(x) = P\{X < x\}\) is the sum of the probabilities \(p_k\) over all values \(x_k\) smaller than x.
The probability mass function (PMF) gives the probability of a single value: \(f(x) = P\{X = x\}\).
Rejection sampling rests on the observation that sampling a random variable is equivalent to sampling uniformly from the area under the density function of that variable.
Below, p(x) and q(x) denote probability density functions of continuous distributions; each density stands for its distribution.
Suppose we want to sample from p(x) but, for whatever reason, cannot do so directly, while another distribution q(x) can be sampled. Then we can obtain samples from p(x) by sampling from q(x). The method is as follows.
(1) Multiply q(x) by a constant M greater than 1 so that M * q(x) covers p(x) everywhere (see the references for the source image).
Because the area under any probability density function is 1, q(x) must be larger than p(x) in some places and smaller in others; if q(x) were at least p(x) everywhere and strictly larger somewhere, its area would exceed 1. To cover p(x), that is, to be at least as large as p(x) everywhere, we multiply q(x) by a constant M greater than 1, giving M * q(x).
(2) Draw a sample point \(x_i\) from q(x) and compute its acceptance probability \(\alpha = \frac{p(x_i)}{M q(x_i)}\). Then draw a random value \(\mu\) from Uniform(0, 1). If \(\alpha \ge \mu\), accept \(x_i\) as a sample from p(x); otherwise, reject it and sample again.
A point drawn from q(x) appears with probability q(x), not p(x). (Strictly speaking, a continuous random variable takes any single value with probability 0; since we are sampling uniformly from the area under the density, we treat this value as a meaningful quantity rather than 0.) To turn q(x) into p(x), each draw must be kept with some acceptance probability. Ideally that probability would be p(x)/q(x), because q(x) multiplied by it is exactly p(x).
However, p(x)/q(x) can exceed 1, since p(x) is higher than q(x) in some places. Multiplying q(x) by M gives p(x)/(M * q(x)), which lies in [0, 1] and can serve as the acceptance probability; comparing it with a draw from Uniform(0, 1) realizes that probability. The closer p(x) is to M * q(x), the higher the acceptance probability. This matches intuition: M * q(x) is proportional to q(x) and M is fixed, so where p(x) is close to M * q(x), the two distributions assign similar weight and the sample should usually be kept. Conversely, where p(x) is far below M * q(x), q(x) over-represents that point, and a sample drawn there is rejected with high probability.
The probability of drawing x and accepting it is \(\frac{p(x)}{M q(x)} q(x) = \frac{1}{M} p(x)\), where the acceptance test with probability \(\frac{p(x)}{M q(x)}\) is realized with a uniform draw and q(x) is a known distribution that can be sampled. This is proportional to p(x), so the accepted samples follow p(x), and each draw is accepted with overall probability 1/M. The closer M is to 1, the fewer draws are rejected. If p(x) is identical to q(x), M can be 1, and sampling from q is the same as sampling from p.
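The two steps above can be sketched as a small generic routine. The function name rejection_sample and the target density p(x) = 2x on [0, 1] are illustrative assumptions and not part of the original write-up; the proposal is Uniform(0, 1) with density 1, and M = 2 satisfies M * q(x) >= p(x) on [0, 1].

```python
import random

def rejection_sample(p, q_sample, q_pdf, M, rng):
    """Draw one sample from density p via proposal q, assuming M * q(x) >= p(x)."""
    while True:
        x = q_sample(rng)                  # candidate drawn from q
        alpha = p(x) / (M * q_pdf(x))      # acceptance probability
        mu = rng.random()                  # uniform draw in [0, 1)
        if alpha >= mu:
            return x                       # accepted
        # otherwise rejected: loop and draw again

def target_pdf(x):
    return 2.0 * x                         # illustrative target density on [0, 1]

def proposal_sample(rng):
    return rng.random()                    # Uniform(0, 1) proposal

def proposal_pdf(x):
    return 1.0

rng = random.Random(0)
samples = [rejection_sample(target_pdf, proposal_sample, proposal_pdf, 2.0, rng)
           for _ in range(20000)]
mean = sum(samples) / len(samples)
print(round(mean, 2))  # the mean of p(x) = 2x on [0, 1] is 2/3, so expect a value near that
```

With M = 2, about half of the candidates are rejected, in line with the overall acceptance probability 1/M.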
Back to the problem
rand7 generates 1-7 uniformly, so (rand7 - 1) * 7 + rand7 generates 1-49 uniformly; call it rand49.
If rand40 can be computed, that is, 1-40 generated uniformly, then rand10 is (rand40 - 1) % 10 + 1.
So how do we get rand40 from rand49? This is the rejection sampling scenario above: we can sample from q(x) and want samples from p(x). Rejection sampling works for discrete distributions as well, with probability mass functions in place of the probability density functions.
Take the distribution of rand40 as p(x), the distribution of rand49 as q(x), and M = 49/40.
Sampling from q(x) means generating a number with rand49. Each number has probability 1/49, and multiplying by M gives M * q(x) = 1/40 for every value. p(x) is the distribution of rand40: for values 1-40, p(x) = 1/40, so the acceptance probability p(x)/(M * q(x)) is 1; for values 41-49, p(x) = 0, so the acceptance probability is 0.
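The arithmetic in this paragraph can be checked with exact fractions. This is a minimal sketch; the variable names are chosen here for illustration, and the expected-call figure assumes each attempt costs two rand7 calls and attempts are independent.

```python
from fractions import Fraction

M = Fraction(49, 40)

# q: distribution of rand49; p: distribution of rand40 (zero on 41..49).
q = {x: Fraction(1, 49) for x in range(1, 50)}
p = {x: Fraction(1, 40) if x <= 40 else Fraction(0) for x in range(1, 50)}

# Acceptance probability p(x) / (M * q(x)) for every value of rand49.
accept = {x: p[x] / (M * q[x]) for x in q}

# Probability that a single rand49 draw is accepted.
overall = sum(q[x] * accept[x] for x in q)

# Two rand7 calls per attempt, and 1 / overall attempts on average.
expected_rand7_calls = 2 / overall
print(overall, expected_rand7_calls)
```

The overall acceptance probability is 40/49 = 1/M, so the basic method needs 49/20 = 2.45 calls to rand7 on average.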
Reusing the rejected numbers
The basic method rejects rand49 whenever it lands in 41-49. Can those values be put to use instead of being thrown away?
Consider the case where the number drawn is in 41-49. Subtracting 40 gives a value in 1-9; call it val. With one more rand7() call, build a uniform distribution on [1, 63] as (val - 1) * 7 + rand7(). By rejection sampling on [1, 63], accept values in 1-60 and obtain rand10 as (val - 1) % 10 + 1.
If that draw lands in 61-63, subtract 60 to get a value in 1-3; call it val again. With another rand7() call, build a uniform distribution on [1, 21] as (val - 1) * 7 + rand7(). Accept values in 1-20 and obtain rand10 as (val - 1) % 10 + 1.
If that draw is 21, only a single value remains, so it is rejected outright and sampling restarts from the first step.
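To see how much the reuse saves, the expected number of rand7 calls can be worked out exactly. The sketch below is not from the original write-up; it assumes the three-stage scheme just described, where the first stage costs two rand7 calls and each later stage costs one.

```python
from fractions import Fraction

r1 = Fraction(9, 49)   # first stage lands in 41..49
r2 = Fraction(3, 63)   # second stage lands in 61..63
r3 = Fraction(1, 21)   # third stage lands on 21 and forces a restart

# E = 2 + r1 * (1 + r2 * (1 + r3 * E)), solved for E.
E = (2 + r1 + r1 * r2) / (1 - r1 * r2 * r3)
print(E, float(E))
```

Under these assumptions E is 329/150, roughly 2.19 calls per rand10, compared with 2.45 for the basic method.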
Code
# The rand7() API is already defined for you.
# def rand7():
# @return a random integer in the range 1 to 7

# Basic version: reject 41-49 and retry.
class Solution(object):
    def rand10(self):
        """
        :rtype: int
        """
        while True:
            val = (rand7() - 1) * 7 + rand7()
            if val <= 40:
                break
        return (val - 1) % 10 + 1
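
# Version that reuses the rejected numbers.
class Solution(object):
    def rand10(self):
        """
        :rtype: int
        """
        while True:
            # Stage 1: uniform on 1..49, accept 1..40.
            val = (rand7() - 1) * 7 + rand7()
            if val <= 40:
                return (val - 1) % 10 + 1
            # Stage 2: leftover 1..9 extended to 1..63, accept 1..60.
            val -= 40
            val = (val - 1) * 7 + rand7()
            if val <= 60:
                return (val - 1) % 10 + 1
            # Stage 3: leftover 1..3 extended to 1..21, accept 1..20.
            val -= 60
            val = (val - 1) * 7 + rand7()
            if val <= 20:
                return (val - 1) % 10 + 1
            # val == 21: reject and restart.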
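
The solutions above rely on a rand7() supplied by the judge. For a local check, the following sketch simulates rand7 with Python's random module (an assumption made here, along with the seed and the sample size) and tallies the output of the reuse version written as a plain function.

```python
import random
from collections import Counter

_rng = random.Random(470)  # fixed seed so the run is repeatable

def rand7():
    """Stand-in for the judge's rand7(): uniform integer in 1..7."""
    return _rng.randint(1, 7)

def rand10():
    """Reuse version of rand10, same logic as the Solution class above."""
    while True:
        val = (rand7() - 1) * 7 + rand7()
        if val <= 40:
            return (val - 1) % 10 + 1
        val = (val - 40 - 1) * 7 + rand7()
        if val <= 60:
            return (val - 1) % 10 + 1
        val = (val - 60 - 1) * 7 + rand7()
        if val <= 20:
            return (val - 1) % 10 + 1

counts = Counter(rand10() for _ in range(100000))
for value in range(1, 11):
    print(value, counts[value])  # each count should be close to 10000
```

A roughly flat histogram is consistent with a uniform rand10; it is a sanity check, not a proof.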
References
Monte Carlo rejection sampling (CSDN)
470. Implement Rand10() Using Rand7() (rejection sampling)