How to implement five powerful probability distributions in Python

Source: Internet
Author: User

The R programming language has become the de facto standard for statistical analysis. In this article, however, I'll show you how easy it is to implement statistical concepts in Python. I'm going to use Python to implement some discrete and continuous probability distributions. I won't discuss the mathematical details of these distributions, but I will point you to good resources for learning the underlying statistical concepts. Before discussing the distributions themselves, I'd like to briefly explain what a random variable is. A random variable quantifies the outcome of an experiment as a number.

For example, a random variable representing the result of a coin toss can be expressed as

X = {1 if heads, 2 if tails}

A random variable takes its value from a set of possible values (discrete or continuous) according to some random process. Each possible value of a random variable is associated with a probability. The set of all possible values of a random variable, together with their associated probabilities, is called its probability distribution.

I encourage you to study the scipy.stats module carefully.

There are two types of probability distributions: discrete and continuous.

A discrete probability distribution is described by a probability mass function (PMF). Examples of discrete distributions include the Bernoulli, binomial, Poisson, and geometric distributions.

A continuous probability distribution is described by a probability density function (PDF). Its random variable can take any value in a continuous range, for example any value on the real line. The normal, exponential, and beta distributions are all continuous probability distributions.
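In scipy.stats the distinction is easy to see. A discrete distribution has a `pmf` method whose values are probabilities that sum to 1. A continuous distribution has a `pdf` method whose values are densities: they can exceed 1, and only the area under the curve equals 1. The sketch below is a minimal illustration and assumes SciPy and NumPy are installed. The parameter values are arbitrary choices of mine.

```python
import numpy as np
from scipy import stats

# Discrete: the binomial PMF over every possible outcome 0..10 sums to 1.
k = np.arange(0, 11)
pmf_values = stats.binom.pmf(k, n=10, p=0.3)
pmf_total = pmf_values.sum()

# Continuous: a single density value can exceed 1 (narrow normal, sigma = 0.1),
# so a density value is not a probability by itself ...
peak_density = stats.norm.pdf(0.0, loc=0.0, scale=0.1)

# ... but the area under the density is 1; approximate it with a Riemann sum
# over [-1, 1], which covers +/- 10 standard deviations.
x = np.linspace(-1.0, 1.0, 20001)
dx = x[1] - x[0]
area = stats.norm.pdf(x, loc=0.0, scale=0.1).sum() * dx

print(f"sum of PMF values:       {pmf_total:.6f}")
print(f"peak density:            {peak_density:.4f}")
print(f"area under PDF (approx): {area:.6f}")
```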

To learn more about discrete and continuous random variables, you can watch the Khan Academy video on probability distributions.

Binomial distribution

A binomial random variable X is the number of successes in n independent trials, where each trial succeeds with probability p.

E(X) = np, Var(X) = np(1 − p)

E(X) denotes the expected (mean) value of the distribution. To find out how any function works, you can use the help command in an IPython notebook.

For example, type stats.binom? to get more information about the binomial distribution object binom.

Binomial example: if you toss a coin 10 times, what is the probability of getting exactly two heads?

Assume that the probability of heads on each toss is 0.3. On average, we can then expect 3 heads in 10 tosses. I define all possible outcomes of the tosses as k = np.arange(0, 11), since you might observe 0 heads, 1 head, and so on up to 10 heads. I use the probability mass function stats.binom.pmf to compute the probability of each outcome. It returns an array of 11 elements containing the probability associated with each outcome.
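The steps just described can be sketched as follows, assuming SciPy and NumPy. The variable names are my own. Indexing the PMF array at 2 answers the question directly, and the weighted sum of outcomes checks E(X) = np.

```python
import numpy as np
from scipy import stats

n, p = 10, 0.3          # 10 tosses, probability of heads 0.3 per toss
k = np.arange(0, 11)    # every possible number of heads: 0, 1, ..., 10

# Probability of each outcome; an array of 11 values.
binomial = stats.binom.pmf(k, n, p)

p_two_heads = binomial[2]               # P(X = 2)
expected_heads = (k * binomial).sum()   # should equal n * p = 3

print(f"P(X = 2) = {p_two_heads:.4f}")
print(f"E(X)     = {expected_heads:.4f}")
```

P(X = 2) works out to C(10, 2) · 0.3² · 0.7⁸ ≈ 0.233. To draw the distribution as a chart, matplotlib's `plt.bar(k, binomial)` plots one bar per outcome.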

You can use the .rvs method to simulate binomial random variables. Its size parameter specifies how many samples to draw. I ask Python to return 10,000 binomial random variables with parameters n and p, print their mean and standard deviation, and then draw a histogram of all the samples.
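Here is a minimal sketch of that simulation, assuming SciPy and NumPy. The seed (`random_state=42`) is my own choice so that the run is reproducible. In place of a plotted histogram, `np.bincount` gives the same information as text: how many samples landed on each count.

```python
import numpy as np
from scipy import stats

n, p = 10, 0.3

# Draw 10,000 binomial samples; random_state fixes the seed for reproducibility.
samples = stats.binom.rvs(n, p, size=10000, random_state=42)

sample_mean = np.mean(samples)
sample_std = np.std(samples)

# Theoretical values from E(X) = np and Var(X) = np(1 - p).
theory_mean = n * p
theory_std = np.sqrt(n * p * (1 - p))

# A text "histogram": how many samples landed on each count 0..10.
counts = np.bincount(samples, minlength=n + 1)

print(f"mean: {sample_mean:.3f} (theory {theory_mean:.3f})")
print(f"std:  {sample_std:.3f} (theory {theory_std:.3f})")
print(counts)
```

With matplotlib, `plt.hist(samples)` would draw the graphical version.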

Poisson distribution

A Poisson random variable X counts the number of times an event occurs within a fixed time interval, given a rate parameter λ. The parameter λ is the average rate at which the event occurs. Both the mean and the variance of X equal λ.

E(X) = λ, Var(X) = λ

Poisson example: accidents at a certain intersection are known to occur at a rate of 2 per day. What is the probability of exactly 4 accidents in one day?

Consider this example of an average of 2 accidents per day. The Poisson implementation is similar to the binomial one, except that we specify the rate parameter. The output is an array containing the probabilities of 0, 1, 2, ..., 10 accidents. I used these results to plot the distribution.
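A minimal sketch of the accident example, assuming SciPy and NumPy. scipy.stats.poisson takes the rate λ as `mu`. As a cross-check, the closed form P(X = k) = λᵏ e^(−λ) / k! is computed with the standard library.

```python
import math

import numpy as np
from scipy import stats

rate = 2                 # lambda: on average 2 accidents per day
k = np.arange(0, 11)     # 0, 1, ..., 10 accidents

# Probability of each accident count; an array of 11 values.
poisson = stats.poisson.pmf(k, mu=rate)

p_four = poisson[4]      # P(X = 4)

# Closed form: lambda**k * exp(-lambda) / k!
p_four_closed = rate**4 * math.exp(-rate) / math.factorial(4)

print(f"P(X = 4) from scipy:       {p_four:.4f}")
print(f"P(X = 4) from closed form: {p_four_closed:.4f}")
```

Both lines give roughly 0.090, so four accidents happen on about 9% of days at this rate.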

As you can see, the number of accidents peaks near the mean. On average, you can expect the number of occurrences to be λ. Try different values of λ and n, and see how the shape of the distribution changes.

Next, I simulate 1,000 Poisson random variables.
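A minimal sketch of that simulation, assuming SciPy and NumPy. The seed is my own choice so the run is reproducible. Because E(X) and Var(X) both equal λ, the sample mean and the sample variance should both land near 2.

```python
import numpy as np
from scipy import stats

rate = 2

# 1,000 simulated days of accident counts (seeded for reproducibility).
data = stats.poisson.rvs(mu=rate, size=1000, random_state=0)

# For a Poisson variable, mean and variance should both be close to lambda.
sample_mean = data.mean()
sample_var = data.var()

print(f"mean:     {sample_mean:.3f}")
print(f"variance: {sample_var:.3f}")
print(np.bincount(data))   # how many days had 0, 1, 2, ... accidents
```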

Normal distribution

The normal distribution is a continuous distribution whose random variable can take any value on the real line. It is described by two parameters: the mean μ and the variance σ².

E(X) = μ, Var(X) = σ²

A normally distributed variable can take values from negative infinity to positive infinity. Note that I use stats.norm.pdf to get the probability density function of the normal distribution.
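A minimal sketch, assuming SciPy and NumPy. One detail is easy to miss: scipy.stats.norm takes the mean as `loc` and the standard deviation σ as `scale`, not the variance σ². Probabilities over an interval come from the CDF rather than from the density. The values μ = 0 and σ = 2 are illustrative choices of mine.

```python
import numpy as np
from scipy import stats

mu, sigma = 0.0, 2.0   # illustrative mean and standard deviation (variance = 4)

# scipy.stats.norm takes the standard deviation as `scale`, not the variance.
x = np.linspace(-10, 10, 201)
density = stats.norm.pdf(x, loc=mu, scale=sigma)

# The density peaks at the mean with height 1 / (sigma * sqrt(2 * pi)).
peak = stats.norm.pdf(mu, loc=mu, scale=sigma)

# Interval probabilities come from the CDF:
# P(mu - 2*sigma < X < mu + 2*sigma) is about 0.9545.
within_two_sigma = (stats.norm.cdf(mu + 2 * sigma, loc=mu, scale=sigma)
                    - stats.norm.cdf(mu - 2 * sigma, loc=mu, scale=sigma))

print(f"peak density: {peak:.4f}")
print(f"P(|X - mu| < 2 sigma): {within_two_sigma:.4f}")
```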

Beta distribution

The beta distribution is a continuous distribution on the interval [0, 1]. It is characterized by two shape parameters, α and β.

The shape of the beta distribution depends on the values of α and β. Beta distributions are used extensively in Bayesian analysis.

When both α and β are set to 1, the beta distribution becomes the uniform distribution on [0, 1]. Try different values of α and β to see how the shape of the distribution changes.
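A minimal sketch comparing the flat α = β = 1 case with an asymmetric one, assuming SciPy and NumPy. The pair α = 2, β = 5 is an arbitrary illustration of mine. Its mean is α / (α + β) = 2/7, and its mode is (α − 1) / (α + β − 2) = 0.2. The grid avoids the endpoints 0 and 1 to keep the example simple.

```python
import numpy as np
from scipy import stats

x = np.linspace(0.01, 0.99, 99)   # interior points of [0, 1]

# alpha = beta = 1: the beta density is flat, i.e. the uniform distribution.
uniform_like = stats.beta.pdf(x, 1, 1)

# An asymmetric example (alpha = 2, beta = 5) with mean alpha / (alpha + beta).
a, b = 2, 5
skewed = stats.beta.pdf(x, a, b)
skewed_mean = stats.beta.mean(a, b)

print(np.allclose(uniform_like, 1.0))   # checks for a flat density of height 1
print(f"mean of Beta(2, 5): {skewed_mean:.4f}")
print(f"mode near x = {x[np.argmax(skewed)]:.2f}")
```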

Exponential distribution

The exponential distribution is a continuous probability distribution that describes the time between independent random events. Examples include the time between passengers arriving at an airport, the time between calls to a customer service center, and the time between new Wikipedia entries.

I set the rate parameter λ to 0.5 and let X range over [0, 15].
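A minimal sketch of that setup, assuming SciPy and NumPy. scipy.stats.expon is parameterized by `scale = 1 / λ` rather than by λ itself. The code checks the result against the closed-form density f(x) = λ e^(−λx). The grid of 151 points is my own choice.

```python
import numpy as np
from scipy import stats

lam = 0.5                     # rate parameter lambda
x = np.linspace(0, 15, 151)   # X ranges over [0, 15]

# scipy parameterizes the exponential by scale = 1 / lambda.
density = stats.expon.pdf(x, scale=1 / lam)

# Closed form for comparison: f(x) = lambda * exp(-lambda * x) for x >= 0.
closed_form = lam * np.exp(-lam * x)

print(np.allclose(density, closed_form))
print(f"density at 0: {density[0]:.2f}")   # equals lambda
```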

Next, I simulate 1,000 exponential random variables. The scale parameter is the reciprocal of λ. In np.std, the ddof parameter changes the divisor: with ddof=1, the sum of squared deviations is divided by n − 1 instead of n, which gives the sample standard deviation.
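A minimal sketch of the simulation, assuming SciPy and NumPy. The seed is my own choice for reproducibility. The code also recomputes the ddof=1 standard deviation by hand so you can see exactly what the parameter does. For an exponential distribution, both the mean and the standard deviation equal 1/λ, which is 2 here.

```python
import numpy as np
from scipy import stats

lam = 0.5
data = stats.expon.rvs(scale=1 / lam, size=1000, random_state=1)

# ddof=1: divide the sum of squared deviations by n - 1 (sample std).
std_ddof1 = np.std(data, ddof=1)

# The same value computed by hand, to show what ddof does.
n = data.size
manual = np.sqrt(((data - data.mean()) ** 2).sum() / (n - 1))

print(f"mean: {data.mean():.3f} (theory {1 / lam:.3f})")
print(f"std (ddof=1): {std_ddof1:.3f} (theory {1 / lam:.3f})")
print(np.isclose(std_ddof1, manual))
```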

Conclusion

A probability distribution is like the blueprint for building a house, and a random variable is a summary of an experiment's outcome. I suggest looking at the lectures from Harvard's Data Science course, and at Professor Joe Blitzstein's summary of all the statistical models and distributions you need to know.
