P1 - Probability theory basics (Primer on probability theory)


2.1 Probability density function

2.1.1 Definition

Let p(x) be the probability density function of a random variable x over the interval [a, b]; p(x) is a nonnegative function and satisfies
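\int_a^b p(x)\,dx = 1.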

Note the difference between a probability and a probability density function.

A probability is the area under the probability density function over the corresponding region; the formula is as follows:
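\Pr(c \le x \le d) = \int_c^d p(x)\,dx, \qquad a \le c \le d \le b.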

We use the probability density function to represent the likelihood of the state x taking on each of its possible values in the interval [a, b].

For conditional densities, let p(x|y) be the probability density function of x ∈ [a, b] conditioned on y ∈ [r, s]; it likewise satisfies
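\int_a^b p(x \mid y)\,dx = 1 \quad \text{for every } y \in [r, s].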

The joint probability density function of an N-dimensional continuous random variable is written p(x), where x = (x1, ..., xN) and xi ∈ [ai, bi]. Sometimes we also use the notation p(x1, ..., xN) in place of p(x).

Sometimes we even write a joint density over a mix of variables, as in p(x, y). In the N-dimensional case, the axiom of total probability requires
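\int_{a_N}^{b_N} \cdots \int_{a_1}^{b_1} p(x_1, \ldots, x_N)\,dx_1 \cdots dx_N = 1.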

2.1.2 Bayes' rule and inference

First, the joint probability density function can always be factored as
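p(x, y) = p(x \mid y)\,p(y) = p(y \mid x)\,p(x).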

Rearranging gives Bayes' rule:
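p(x \mid y) = \frac{p(y \mid x)\,p(x)}{p(y)}.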

This formula can be used to infer the posterior density of the state given a measurement, p(x|y), provided we have a prior density p(x) over the state and a sensor model expressed as the density p(y|x). Expanding the denominator, we have
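p(x \mid y) = \frac{p(y \mid x)\,p(x)}{\int p(y \mid x)\,p(x)\,dx}.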

The denominator originates from marginalization:
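p(y) = \int p(y \mid x)\,p(x)\,dx,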

which can be very expensive to compute in general nonlinear situations.

Note that in Bayesian inference, p(x) is called the prior probability density function and p(x|y) is called the posterior probability density function. In this sense, all prior information is captured in p(x) and all posterior information is captured in p(x|y).

2.1.3 Moments of a probability density function

The zeroth probabilistic moment is always 1, by the axiom of total probability. The first probabilistic moment is known as the mean, μ:
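\mu = E[x] = \int x\,p(x)\,dx.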

For a general matrix function of the random variable, F(x), the expectation is written as
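E[F(x)] = \int F(x)\,p(x)\,dx,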

so that, in particular, the mean is simply μ = E[x].

The second probabilistic moment is known as the covariance matrix, Σ:
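\Sigma = E\left[(x - \mu)(x - \mu)^T\right] = \int (x - \mu)(x - \mu)^T\,p(x)\,dx.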

The next two moments are known as the skewness and the kurtosis.

Note: be careful to distinguish the probabilistic quantities of random vectors from those of scalar random variables.

2.1.4 Sample mean and covariance

Suppose we have a random variable x with probability density function p(x). We can draw a sample from this density, which can be expressed as
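x_{\text{meas}} \sim p(x),

where the subscript "meas" is simply a label we attach to the drawn sample.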

A sample is sometimes called a realization of the random variable, and intuitively we can think of it as a measurement.

If we draw N such samples and want to estimate the mean and covariance of the random variable x, we can do so using the sample mean and sample covariance:
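\mu_{\text{meas}} = \frac{1}{N} \sum_{i=1}^{N} x_{i,\text{meas}}, \qquad
\Sigma_{\text{meas}} = \frac{1}{N-1} \sum_{i=1}^{N} \left(x_{i,\text{meas}} - \mu_{\text{meas}}\right)\left(x_{i,\text{meas}} - \mu_{\text{meas}}\right)^T.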

Note that the denominator in the sample covariance uses N - 1 rather than N for normalization; this is known as Bessel's correction.
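As a quick numerical illustration (a minimal NumPy sketch of the formulas above; the variable names and the example distribution are ours, not from the text):

import numpy as np

rng = np.random.default_rng(0)
# Draw N samples of a 2-D random variable from an example Gaussian density.
samples = rng.multivariate_normal(mean=[1.0, -2.0],
                                  cov=[[2.0, 0.3], [0.3, 1.0]],
                                  size=1000)

mu_hat = samples.mean(axis=0)                            # sample mean
centered = samples - mu_hat
sigma_hat = centered.T @ centered / (len(samples) - 1)   # sample covariance (Bessel's N-1 correction)

# np.cov applies the same N-1 normalization by default.
assert np.allclose(sigma_hat, np.cov(samples, rowvar=False))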

2.1.5 Statistical independence and uncorrelatedness

Two random variables, x and y, are said to be statistically independent when their joint probability density factors as follows:
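p(x, y) = p(x)\,p(y).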

The variables are said to be uncorrelated if the following equation holds:
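E\left[x\,y^T\right] = E[x]\,E[y]^T.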

Independence implies uncorrelatedness, but the converse does not hold in general. We will often assume that variables are statistically independent in order to simplify computations.

2.1.6 Shannon information and mutual information

We often estimate a probability density function for some random variable and then want to quantify how certain we are about, for example, the mean of that density.

One way is to look at the negative entropy, or Shannon information, H, which is given by
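H(x) = -E[\ln p(x)] = -\int p(x)\,\ln p(x)\,dx.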

We will work this out explicitly for the Gaussian probability density function below.

Another useful quantity is the mutual information, I(x, y), between two random variables x and y, which is given by
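I(x, y) = E\left[\ln \frac{p(x, y)}{p(x)\,p(y)}\right] = \int\!\!\int p(x, y)\,\ln \frac{p(x, y)}{p(x)\,p(y)}\,dx\,dy.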

Mutual information is a useful information measure: it can be thought of as the amount of information one random variable contains about another, or equivalently as the reduction in uncertainty about one random variable once the other is known.

When x and y are statistically independent, we have
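I(x, y) = 0.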

When x and y are dependent, we have the useful relationship
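I(x, y) = H(x) + H(y) - H(x, y).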

2.1.7 Cramér-Rao lower bound and Fisher information

Suppose we have a deterministic parameter, θ, that affects the outcome of a random variable x. We can capture this by writing the probability density function of x as depending on θ:
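p(x \mid \theta).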

Further suppose that we draw a sample, x_meas, from p(x|θ).

The Cramér-Rao lower bound (CRLB) then states that the covariance of any unbiased estimator, θ̂, of the deterministic parameter θ (based on the measurement x_meas) is bounded below by the inverse of the Fisher information matrix, I(x|θ):
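\operatorname{cov}(\hat{\theta} \mid x_{\text{meas}}) \ge I^{-1}(x \mid \theta), \qquad
I(x \mid \theta) = E\left[\left(\frac{\partial \ln p(x \mid \theta)}{\partial \theta}\right)\left(\frac{\partial \ln p(x \mid \theta)}{\partial \theta}\right)^{T}\right].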

Here "unbiased" means E[θ̂] = θ, and the lower bound is to be understood in the matrix sense: cov(θ̂ | x_meas) - I^{-1}(x|θ) is positive semidefinite.

The CRLB therefore sets a fundamental lower limit on how certain we can be about an estimate of a parameter, given our measurements.

2.2 Gaussian probability density function

A one-dimensional Gaussian probability density function is given by the following form:
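p(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),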

where μ is the mean and σ² is the variance (σ is the standard deviation). We write x ∼ N(μ, σ²) to indicate that x is distributed according to this Gaussian density function.

A multivariate Gaussian density function, in which the random variable x is N-dimensional, is expressed as follows:
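p(x \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^N \det \Sigma}} \exp\!\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right),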

where μ ∈ R^N is the mean and Σ ∈ R^{N×N} is a symmetric, positive-definite covariance matrix. We write x ∼ N(μ, Σ) for short.
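As a small sketch (our own example, not from the text), the N-dimensional density above can be evaluated directly with NumPy:

import numpy as np

def gaussian_pdf(x, mu, sigma):
    # Evaluate the N-dimensional Gaussian density p(x | mu, Sigma).
    n = mu.shape[0]
    diff = x - mu
    norm = np.sqrt((2.0 * np.pi) ** n * np.linalg.det(sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff)) / norm)

# Example: a 2-D Gaussian evaluated at its mean (the peak of the density).
mu = np.array([0.0, 1.0])
sigma = np.array([[1.0, 0.2], [0.2, 2.0]])
print(gaussian_pdf(mu, mu, sigma))  # equals 1 / sqrt((2*pi)^2 * det(Sigma))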

2.2.2 Isserlis' theorem

Computing moments of a multivariate Gaussian density beyond the mean and covariance is tedious in general, but there are some specific cases that we will use later and that are worth discussing. We can use Isserlis' theorem to compute higher-order moments of Gaussian random variables.

The theorem is as follows
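For a zero-mean, jointly Gaussian random vector (x_1, x_2, \ldots, x_{2M}), odd-order moments vanish, and the even-order moments satisfy

E[x_1 x_2 \cdots x_{2M}] = \sum \prod E[x_i x_j],

where the sum is taken over all distinct ways of partitioning the 2M variables into M pairs, and the product runs over the pairs in each partition.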

With four variables, for example, this gives
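E[x_1 x_2 x_3 x_4] = E[x_1 x_2]E[x_3 x_4] + E[x_1 x_3]E[x_2 x_4] + E[x_1 x_4]E[x_2 x_3].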

We can apply this theorem to obtain some useful results for expressions involving matrices.

Suppose we have a zero-mean Gaussian random variable x, and we wish to evaluate an expression of the following form,

where p is a nonnegative integer; the cases p = 0 and p = 1 yield simple closed-form results,

and likewise in the scalar case. The same method can be used for p greater than 1.
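One concrete identity of this kind (our own example, chosen for illustration; it may not be the exact expression referred to above) is E[x x^T x x^T] = 2Σ² + tr(Σ)Σ for a zero-mean Gaussian x ∼ N(0, Σ), which follows from the four-variable expansion of Isserlis' theorem. A quick Monte Carlo check in NumPy:

import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
# Draw many samples of a zero-mean Gaussian with covariance Sigma.
x = rng.multivariate_normal(np.zeros(2), Sigma, size=500_000)

# Empirical E[x x^T x x^T]; note x x^T x x^T = (x^T x) x x^T for a column vector x.
quad = np.einsum('ni,ni->n', x, x)                       # x^T x for each sample
empirical = np.einsum('n,ni,nj->ij', quad, x, x) / len(x)

theory = 2.0 * Sigma @ Sigma + np.trace(Sigma) * Sigma   # Isserlis-based closed form
print(np.round(empirical, 1))
print(np.round(theory, 1))                               # should agree up to Monte Carlo error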

We also consider the following example,

Here x1 has dimension N1 and x2 has dimension N2, and we wish to compute the following expression.

Again, p is a nonnegative integer, with corresponding results for p = 0 and p = 1.

Similarly, we have

Finally, it can also be verified that

Further, we have

where A is a square matrix of compatible dimension.

2.2.3 Joint Gaussian probability density functions, their factoring, and inference

The joint Gaussian density of a pair of variables (x, y) can be written as
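p(x, y) = \mathcal{N}\!\left(\begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\right), \qquad \Sigma_{yx} = \Sigma_{xy}^T.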

This joint density has an equivalent factored representation, p(x, y) = p(x|y) p(y).

We can use the Schur complement to carry out this factoring of the joint Gaussian:
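p(y) = \mathcal{N}(\mu_y, \Sigma_{yy}), \qquad
p(x \mid y) = \mathcal{N}\!\left(\mu_x + \Sigma_{xy}\Sigma_{yy}^{-1}(y - \mu_y),\; \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\right).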

Importantly, both p(x|y) and p(y) are themselves Gaussian density functions, so if we know the value of y (for example, from a measurement), we can work out the likelihood of x conditioned on y through p(x|y).

This is the basis of Gaussian inference: we start with a prior over the state x, and then use a measurement to narrow down that prior; in the expression for p(x|y) above we see the mean being adjusted and the covariance being reduced.
