Looking for a mathematical formula: how do you generate a random number with a controllable distribution?

Source: Internet
Author: User
For example: generate a random value rand in the range 1-100, where small values appear with high probability and large values with very low probability. I hope an expert can give a concrete formula, with the random value written as rand. Ideally it could be implemented in code, such as PHP or Python. Thank you!

Reply content:

Given a cumulative distribution function (CDF), a uniformly distributed random number can be mapped onto that distribution as long as the inverse of the CDF can be found. This is called inverse transform sampling. See Random sampling (numpy.random).

numpy.random already implements sampling functions for many well-defined distributions, for example:
beta(a, b[, size]): Draw samples from a Beta distribution.
binomial(n, p[, size]): Draw samples from a binomial distribution.
chisquare(df[, size]): Draw samples from a chi-square distribution.
dirichlet(alpha[, size]): Draw samples from the Dirichlet distribution.
exponential([scale, size]): Draw samples from an exponential distribution.
f(dfnum, dfden[, size]): Draw samples from an F distribution.
gamma(shape[, scale, size]): Draw samples from a Gamma distribution.
geometric(p[, size]): Draw samples from the geometric distribution.
gumbel([loc, scale, size]): Draw samples from a Gumbel distribution.
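For a distribution that has no ready-made sampler, inverse transform sampling can be written by hand. The following is only a minimal sketch using the standard library: it assumes an exponential shape truncated to the interval [low, high + 1), and the names inverse_cdf, sample_small_biased and the rate parameter are made up for illustration.

```python
import math
import random

def inverse_cdf(u, low=1, high=100, rate=0.05):
    """Inverse CDF of an exponential distribution truncated to [low, high + 1).

    The CDF is F(x) = (1 - exp(-rate * (x - low))) / (1 - exp(-rate * span)),
    and solving F(x) = u for x gives the expression below.
    """
    span = high + 1 - low
    return low - math.log(1.0 - u * (1.0 - math.exp(-rate * span))) / rate

def sample_small_biased(low=1, high=100, rate=0.05, rng=random):
    """Draw an integer in [low, high]; smaller values are more likely."""
    u = rng.random()                      # uniform on [0, 1)
    x = inverse_cdf(u, low, high, rate)   # continuous value in [low, high + 1)
    return min(high, math.floor(x))       # min() guards against rounding at the top edge
```

A larger rate concentrates the results more strongly near low; a rate close to 0 approaches a uniform distribution.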
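Assume the original poster wants an integer from 1 to 100, where 100 is twice as likely as 99, 99 is twice as likely as 98, and so on.
Take the weight of 1 as the baseline, so that the value k has weight 2^(k-1).
The total is sum = 1 + 2 + 4 + ... + 2^99 = 2^100 - 1.
Then draw rand(1, sum); if it falls in the range 2^30 to 2^31 - 1, the result is 31. As a simple formula, the result is floor(log(rand(1, sum)) / log(2)) + 1. Note that this favours large values; to favour small values as the question asks, take 101 minus the result. This is just my humble attempt, so please point out any mistakes!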
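The doubling-weights idea can be sketched in Python as follows. This is an illustration under the assumption that the value k carries weight 2^(k-1); the names value_for_draw and sample_doubling are hypothetical. Python integers are arbitrary precision, so the draw from 1 to 2^100 - 1 is exact, and int.bit_length() gives floor(log2(r)) + 1 without floating-point error.

```python
import random

N = 100                  # values 1..N
TOTAL = 2**N - 1         # 1 + 2 + 4 + ... + 2**(N - 1)

def value_for_draw(r):
    """Map a draw r in [1, TOTAL] to the value k with 2**(k-1) <= r <= 2**k - 1."""
    return r.bit_length()    # exact integer form of floor(log2(r)) + 1

def sample_doubling(rng=random):
    """Draw a value in 1..N where each value is twice as likely as the one below it."""
    return value_for_draw(rng.randint(1, TOTAL))
```

To make small values the likely ones instead, return N + 1 - sample_doubling().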


I happened to run into this problem myself, so I am sharing my approach.

Suppose we know the probability of each value. We accumulate these probabilities and lay the running totals out along a number axis, so that each value occupies an interval whose width equals its probability. Let's take 10 values to illustrate:



[i] denotes the i-th value. On the horizontal axis, the interval (0, 0.1) represents the probability of [1], (0.5, 0.75) represents the probability of [7], and (0, 0.75) represents the cumulative probability of [1] through [7].

Next, taking MATLAB as an example, use the rand() function to produce a random number uniformly distributed on 0-1. Suppose the number generated is 0.65: it falls in the interval (0.5, 0.75), and this interval represents the probability of [7]. So we pick [7] as the result of this probability-based selection.

MATLAB code attached:

% max_column is the number of entries in prob_array, prob holds the
% probability of each value, prob_array holds the cumulative probabilities,
% and largest_cumulative_prob is the largest cumulative probability.

% Assumed initialisation (missing in the original): the left edge of the axis.
prob_array(1) = 0;
for j = 1:max_column-1
    prob_array(j+1) = prob_array(j) + prob(j);
end

largest_cumulative_prob = prob_array(max_column);
% largest_cumulative_prob and prob_array should be normalized to
% percentages; that is not done in this code.

choice = rand() * largest_cumulative_prob;

% Binary search for the interval that contains choice.
low = 1;
high = max_column;
while (high > (low+1))
    middle = (high+low)/2;
    middle = floor(middle);
    if (choice > prob_array(middle))
        low = middle;
    else
        high = middle;
    end
end
% Finally, take low as the result of this selection.
% No further explanation here; the idea is the same as in the picture above.
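For comparison, here is a rough Python sketch of the same cumulative-probability lookup. The ten probabilities are hypothetical: only the intervals (0, 0.1) for [1] and (0.5, 0.75) for [7] come from the example, and the rest are filled in so that the total is 1. The name select is also made up; bisect.bisect_right from the standard library does the binary search.

```python
import bisect
import itertools
import random

# Hypothetical probabilities for [1]..[10]; [1] covers (0, 0.1) and [7] covers (0.5, 0.75).
PROBS = [0.10, 0.05, 0.05, 0.10, 0.10, 0.10, 0.25, 0.10, 0.10, 0.05]
CUMULATIVE = list(itertools.accumulate(PROBS))   # right edge of each interval

def select(u):
    """Return the 1-based value whose interval contains u, for u in [0, 1)."""
    # Scale by the final total so un-normalized weights also work.
    idx = bisect.bisect_right(CUMULATIVE, u * CUMULATIVE[-1])
    return min(idx, len(PROBS) - 1) + 1          # min() guards against float rounding

def sample(rng=random):
    """Draw one value according to PROBS."""
    return select(rng.random())
```

With these numbers, select(0.65) lands in the interval of [7], matching the walk-through above.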

This method works for discrete values. I do not know how to handle continuous values; my personal guess is that the probability density could be used? Hoping an expert can answer!

Method of selection.
If you are not particularly worried about the number of calculations:
I read a bit of the Ndndsim source. The way it generates a Zipf distribution is interesting, and I hope it can serve as a reference. Given a cumulative distribution function and a range such as 1-50, it computes the CDF value at each of 1 to 50 and stores them in an array. Then it generates a uniformly distributed random number, compares it against the 50 CDF values in turn, and returns the index at which the random number is first covered. Of course, the returned index can be adjusted to suit the requirements.

100/rand(1,100)

rand(1,rand(1,rand(1,100))): try it and you will see.
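The table-lookup approach described for the Zipf distribution might look roughly like this in Python. It is a sketch of the idea only, not the simulator's actual code; the function names and the exponent s are assumptions chosen for illustration.

```python
import random

def zipf_cdf(n, s=1.0):
    """CDF values for ranks 1..n, where rank k has weight 1 / k**s."""
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    cdf, running = [], 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    return cdf

def lookup(cdf, u):
    """Linear scan: return the first 1-based rank whose CDF value covers u."""
    for index, threshold in enumerate(cdf):
        if u <= threshold:
            return index + 1
    return len(cdf)          # fallback if rounding leaves cdf[-1] slightly below 1

def sample_zipf(cdf, rng=random):
    """Draw one rank using a uniform random number on [0, 1)."""
    return lookup(cdf, rng.random())
```

The table is built once, for example cdf = zipf_cdf(50), and reused for every draw. The linear scan is fine for 50 entries; for large tables a binary search over the same array does the same job faster.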