Basic concepts of probability theory

There are two basic rules in probability theory for discrete variables: the sum (addition) rule and the product (multiplication) rule. The sum rule relates the marginal distribution of a random variable X to the joint distribution of X and Y:

p(X) = Σ_Y p(X, Y)

The product rule defines the joint probability, the probability that events X and Y occur simultaneously, in terms of a conditional probability:

p(X, Y) = p(Y | X) p(X)
The product rule can be rearranged to give the conditional probability formula:

p(Y | X) = p(X, Y) / p(X)
Combining the two rules yields the famous Bayes formula, which is widely used throughout the sciences:

p(Y = y | X = x) = p(X = x | Y = y) p(Y = y) / p(X = x)

The left-hand side is called the posterior probability, because it can be calculated after we know the prior probability p(Y = y).
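As a quick check of these rules, here is a minimal sketch (the joint-table values are invented for illustration) that recovers marginals via the sum rule, conditionals via the product rule, and the posterior via Bayes' formula:

```python
# Sum rule, product rule, and Bayes' rule on a small discrete joint
# distribution p(x, y). The table values are made up for illustration.
joint = {
    ("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
    ("x1", "y0"): 0.2, ("x1", "y1"): 0.4,
}

def marginal_x(x):
    # Sum rule: p(x) = sum_y p(x, y)
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    return sum(p for (_, yi), p in joint.items() if yi == y)

def conditional_x_given_y(x, y):
    # Product rule rearranged: p(x | y) = p(x, y) / p(y)
    return joint[(x, y)] / marginal_y(y)

def bayes_y_given_x(y, x):
    # Bayes' rule: p(y | x) = p(x | y) p(y) / p(x)
    return conditional_x_given_y(x, y) * marginal_y(y) / marginal_x(x)
```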
Two random variables X and Y are independent of one another if they satisfy:

p(X, Y) = p(X) p(Y)
Three random variables X, Y, and Z satisfy conditional independence (X and Y are independent given Z) if:

p(X, Y | Z) = p(X | Z) p(Y | Z)
Continuous random variables

For a continuous random variable X with density p(x), the probability of falling in the interval (a, b) is:

P(a < X < b) = ∫_a^b p(x) dx
Continuous random variables also have the following properties: the density is non-negative, and the total probability over the whole range is 1:

p(x) ≥ 0,  ∫ p(x) dx = 1
The cumulative distribution function (CDF) of a continuous random variable is

F(x) = P(X ≤ x) = ∫_{-∞}^x p(t) dt

so the probability that X falls in the interval (a, b] is F(b) − F(a).
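A small numerical sketch of these definitions (the exponential density and rate used here are just an example): integrating the pdf over (a, b) matches the CDF difference F(b) − F(a), and the total probability integrates to 1:

```python
import math

# For a continuous random variable, P(a < X < b) is the integral of the pdf
# over (a, b), and the CDF is F(x) = P(X <= x). Illustrated with an
# exponential pdf (rate lam chosen arbitrarily) and a simple midpoint rule.
lam = 2.0

def pdf(x):
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def cdf(x):
    # Closed form for the exponential CDF, for comparison
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a, b, n=100000):
    # Midpoint-rule numerical integration of f over [a, b]
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

prob_ab = integrate(pdf, 0.5, 1.5)   # P(0.5 < X < 1.5)
```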
Expectation and variance

For a discrete random variable, the expectation is the weighted average of the values it can take, where each weight is the corresponding probability:

E[X] = Σ_x x p(x)
For continuous random variables the definition is analogous, with the sum replaced by an integral:

E[X] = ∫ x p(x) dx
Variance measures the dispersion of a random variable: the larger the variance, the more widely spread the variable's values are:

Var[X] = E[(X − E[X])²] = E[X²] − E[X]²
Covariance measures the (linear) relationship between two random variables and is the main ingredient of the correlation coefficient:

Cov[X, Y] = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X] E[Y]
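These definitions can be checked with a short sketch (the outcome values and probabilities are invented): expectation and variance as probability-weighted sums, and covariance via E[XY] − E[X]E[Y]:

```python
# Expectation, variance, and covariance for discrete distributions, computed
# as probability-weighted sums. The outcome/probability pairs are invented.
xs = [0, 1, 2]
ps = [0.2, 0.5, 0.3]

mean = sum(p * x for x, p in zip(xs, ps))                 # E[X]
var = sum(p * (x - mean) ** 2 for x, p in zip(xs, ps))    # Var[X]

# Covariance of two random variables from their joint distribution:
# Cov[X, Y] = E[XY] - E[X] E[Y]
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}
ex = sum(p * x for (x, _), p in joint.items())
ey = sum(p * y for (_, y), p in joint.items())
exy = sum(p * x * y for (x, y), p in joint.items())
cov = exy - ex * ey
```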
Discrete distributions

Bernoulli distribution

The Bernoulli distribution is the simplest binary distribution: the random variable takes the value 0 or 1, and its probability mass function is:

Ber(x | μ) = μ^x (1 − μ)^{1−x}
When x = 1 the probability is μ, and when x = 0 it is 1 − μ. The expectation and variance of the Bernoulli distribution are:

E[X] = μ,  Var[X] = μ(1 − μ)
Binomial distribution
The binomial distribution describes n independent trials in which each trial succeeds with probability θ; it gives the probability of exactly k successes in the n trials:

Bin(k | n, θ) = C(n, k) θ^k (1 − θ)^{n−k}

where C(n, k) = n! / (k! (n − k)!) is the usual binomial coefficient, the number of ways of choosing k items out of n. The mean and variance of the binomial distribution are:

E[k] = nθ,  Var[k] = nθ(1 − θ)
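A minimal numerical sketch (n and θ chosen arbitrarily) checking that the binomial pmf sums to 1 and that its mean and variance match nθ and nθ(1 − θ); with n = 1 the same pmf reduces to the Bernoulli distribution:

```python
import math

# Binomial pmf via math.comb; with n = 1 this is the Bernoulli pmf.
def binom_pmf(k, n, theta):
    return math.comb(n, k) * theta ** k * (1 - theta) ** (n - k)

n, theta = 10, 0.3
total = sum(binom_pmf(k, n, theta) for k in range(n + 1))
mean = sum(k * binom_pmf(k, n, theta) for k in range(n + 1))
var = sum((k - mean) ** 2 * binom_pmf(k, n, theta) for k in range(n + 1))
```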
Multinomial distribution

The binomial distribution nicely models the outcomes of tossing a two-sided coin, but what if we have a K-sided die in our hands? The multinomial distribution handles this case:

Mu(m₁, …, m_K | n, θ) = n! / (m₁! ⋯ m_K!) ∏_j θ_j^{m_j}
Here m_j is the number of times side j appears in the n trials, with m₁ + m₂ + ⋯ + m_K = n.
In particular, when n = 1 this is the multi-outcome extension of the Bernoulli distribution introduced earlier:

p(x | θ) = ∏_j θ_j^{x_j}
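A small sketch of the multinomial pmf (the counts and probabilities here are illustrative); note that with n = 1 it reduces to picking a single category:

```python
import math

# Multinomial pmf for n trials over K categories with probabilities thetas;
# the counts must sum to n. All values below are illustrative.
def multinomial_pmf(counts, thetas):
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)   # n! / (m1! ... mK!)
    p = float(coef)
    for c, t in zip(counts, thetas):
        p *= t ** c
    return p

p = multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2])
```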
Poisson distribution

The Poisson distribution takes the following form, and is often used to model events that occur rarely:

Poi(k | λ) = λ^k e^{−λ} / k!
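A quick sketch of the Poisson pmf (λ is arbitrary), checking numerically that it sums to 1 and that its mean equals λ:

```python
import math

# Poisson pmf p(k) = lam^k e^{-lam} / k!; truncating the sum at k = 100 is
# harmless here because the tail mass is astronomically small for lam = 3.
def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0
total = sum(poisson_pmf(k, lam) for k in range(100))
mean = sum(k * poisson_pmf(k, lam) for k in range(100))
```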
Continuous distributions

Gaussian distribution

The Gaussian (normal) distribution is the greatest and most beautiful distribution in nature; its density is:

N(x | μ, σ²) = 1 / √(2πσ²) · exp(−(x − μ)² / (2σ²))
Its CDF has no closed form, but can be written using the error function:

Φ(x; μ, σ²) = ½ [1 + erf((x − μ) / (σ√2))]
Its mean is:

E[X] = μ
The variance is:

Var[X] = σ²
(Figure: example plots of the normal distribution.)
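The density and CDF can be sketched directly; the CDF is not available in closed form, but math.erf provides it (the standard-normal values μ = 0, σ = 1 below are used only for the checks):

```python
import math

# Gaussian pdf and its CDF written with math.erf.
def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gauss_cdf(x, mu, sigma):
    # Phi(x) = 1/2 [1 + erf((x - mu) / (sigma sqrt(2)))]
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))
```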
Degenerate pdf

When the variance of a Gaussian becomes very small, approaching 0, the curve becomes extremely steep around the mean; in the limit

lim_{σ²→0} N(x | μ, σ²) = δ(x − μ)

where δ is the Dirac delta function, which captures exactly this limiting case of a Gaussian with vanishing variance.
Student t distribution

The Gaussian distribution has an obvious problem: it is sensitive to outliers. In the figure from the original post, the red curve is the true distribution and the blue dashed curve is a fitted Gaussian; on the right, the Gaussian is pulled away from the true distribution by the outliers. We can use another distribution, the t distribution, to better model data containing outliers:
The t distribution is expressed as follows:

St(x | μ, σ², ν) = Γ((ν+1)/2) / (Γ(ν/2) √(νπ) σ) · [1 + (1/ν)((x − μ)/σ)²]^{−(ν+1)/2}
Here μ is the mean of the distribution and ν is the degrees of freedom.
The t distribution has the following properties:

mean = μ (for ν > 1),  mode = μ,  variance = νσ² / (ν − 2) (for ν > 2)
In particular, when ν = 1 the distribution is also known as the Cauchy or Lorentz distribution. In practice we usually require ν > 2 so that the variance exists; ν = 4 models many datasets well, while for ν ≫ 5 the t distribution approaches a normal distribution and thereby loses its robustness.
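A small sketch of the standard t density via math.gamma, confirming that at ν = 1 it coincides with the Cauchy density:

```python
import math

# Standard Student-t pdf with nu degrees of freedom (mu = 0, sigma = 1);
# at nu = 1 it coincides with the Cauchy (Lorentz) distribution.
def t_pdf(x, nu):
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def cauchy_pdf(x):
    return 1.0 / (math.pi * (1 + x * x))
```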
Laplace distribution

In addition to the t distribution, the Laplace distribution also has heavy tails; it is expressed as follows:

Lap(x | μ, b) = (1 / 2b) exp(−|x − μ| / b)
It has the following properties:

mean = μ,  mode = μ,  variance = 2b²

It is likewise tolerant of outliers, and it places more probability mass at 0 than the Gaussian distribution does.
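A minimal sketch of the Laplace density (the parameters are illustrative):

```python
import math

# Laplace pdf with location mu and scale b: a sharp peak at mu and tails
# heavier than the Gaussian's.
def laplace_pdf(x, mu, b):
    return math.exp(-abs(x - mu) / b) / (2 * b)
```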
Gamma distribution

The gamma distribution is a two-parameter distribution over positive reals, with shape a and rate b:

Ga(x | a, b) = (b^a / Γ(a)) x^{a−1} e^{−bx},  x > 0

where Γ(a) = ∫_0^∞ u^{a−1} e^{−u} du is the gamma function.
The gamma distribution has the following properties:

mean = a / b,  mode = (a − 1) / b,  variance = a / b²
(Figure: example gamma densities for different parameter settings.)
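A short numerical sketch of the gamma density under the shape/rate parameterisation (a and b are arbitrary), checking the normalisation and the mean a/b on a midpoint grid:

```python
import math

# Gamma pdf with shape a and rate b; normalisation and mean are checked
# numerically against 1 and a/b.
def gamma_pdf(x, a, b):
    return b ** a * x ** (a - 1) * math.exp(-b * x) / math.gamma(a)

a, b = 3.0, 2.0
h = 0.001
xs = [(i + 0.5) * h for i in range(40000)]   # midpoint grid on (0, 40)
total = sum(gamma_pdf(x, a, b) * h for x in xs)
mean = sum(x * gamma_pdf(x, a, b) * h for x in xs)
```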
There is a good article about the gamma function, "The magical gamma function": http://www.52nlp.cn/lda-math-%E7%A5%9E%E5%A5%87%E7%9A%84gamma%E5%87%BD%E6%95%B01
Beta distribution

The expression for the beta distribution is as follows:

Beta(x | a, b) = (1 / B(a, b)) x^{a−1} (1 − x)^{b−1},  0 < x < 1

where B(a, b) = Γ(a)Γ(b) / Γ(a + b) is the beta function.
(Figure: example beta densities for different values of a and b.)
It has the following properties:

mean = a / (a + b),  mode = (a − 1) / (a + b − 2),  variance = ab / ((a + b)² (a + b + 1))
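A small sketch of the beta density (shape parameters chosen arbitrarily), checking the uniform special case a = b = 1 and the mean a/(a + b) numerically:

```python
import math

# Beta pdf on (0, 1) with shape parameters a, b; the normaliser is the beta
# function written via math.gamma.
def beta_pdf(x, a, b):
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

a, b = 2.0, 3.0
h = 1e-4
xs = [(i + 0.5) * h for i in range(10000)]   # midpoint grid on (0, 1)
mean = sum(x * beta_pdf(x, a, b) * h for x in xs)
```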
Pareto distribution

You have surely heard of the Pareto principle, the famous long-tail (80/20) theory. The Pareto distribution is expressed as follows:

Pareto(x | k, m) = k m^k x^{−(k+1)} I(x ≥ m)
(Figure: the left panel shows the Pareto distribution under different parameter configurations.)
Some of its properties are as follows:

mean = km / (k − 1) (for k > 1),  mode = m,  variance = m²k / ((k − 1)² (k − 2)) (for k > 2)
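A minimal sketch of the Pareto density and CDF (scale m and shape k are illustrative):

```python
# Pareto pdf and CDF with scale m and shape k; the density is zero below m,
# and the power-law tail is what the "long tail" picture refers to.
def pareto_pdf(x, m, k):
    return k * m ** k / x ** (k + 1) if x >= m else 0.0

def pareto_cdf(x, m, k):
    return 1.0 - (m / x) ** k if x >= m else 0.0
```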
References: PRML (Pattern Recognition and Machine Learning), MLAPP (Machine Learning: A Probabilistic Perspective)
Copyright notice: this is the blogger's original article; please do not reproduce it without the blogger's permission.
CS281: Advanced Machine Learning, Section 2: Probability Theory