2.1 Probability density function
2.1.1 Definition
Let p(x) be the probability density function (PDF) of a random variable x over the interval [a, b]; p(x) is a nonnegative function and satisfies the normalization condition
\int_a^b p(x)\, dx = 1.
Note the difference between probability and probability density function.
Probability is the area under the probability density function over the corresponding region; for a subinterval [c, d] of [a, b], the formula is
\Pr(c \le x \le d) = \int_c^d p(x)\, dx.
We use the probability density function to represent the likelihood of all possible states x in the interval [a, b].
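As a quick illustration of the distinction (a simple worked case, not tied to any particular application), consider the uniform density on [a, b]:
p(x) = \frac{1}{b - a}, \qquad \Pr(c \le x \le d) = \frac{d - c}{b - a} \quad \text{for } [c, d] \subseteq [a, b].
The density itself can exceed 1 when b - a < 1; only its integral over a region is a probability.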
The conditional probability density function p(x|y) is the PDF over x ∈ [a, b] conditioned on y ∈ [r, s], and it likewise satisfies
\int_a^b p(x|y)\, dx = 1.
The joint probability density function of an N-dimensional continuous random variable is written p(x), where x = (x_1, \dots, x_N) and x_i ∈ [a_i, b_i]. Sometimes we also use the notation p(x_1, \dots, x_N) in place of p(x). The notation is even mixed with that of the two-variable joint density, p(x, y), where x and y may themselves be multidimensional. In the N-dimensional case, the normalization condition becomes
\int_{a_N}^{b_N} \cdots \int_{a_1}^{b_1} p(x_1, \dots, x_N)\, dx_1 \cdots dx_N = 1.
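One relation worth keeping in mind (a standard identity, stated here for later use) connects the conditional, joint, and marginal densities:
p(x|y) = \frac{p(x, y)}{p(y)}, \qquad p(y) > 0.
This is the starting point for the factorization used in the next subsection.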
2.1.2 Bayes' rule and inference
First, the joint probability density function can be factored as
p(x, y) = p(x|y)\, p(y) = p(y|x)\, p(x).
Rearranging gives Bayes' rule:
p(x|y) = \frac{p(y|x)\, p(x)}{p(y)}.
This formula can be used to infer the posterior PDF of the state given a measurement, p(x|y), provided we have a prior PDF over the state, p(x), and a sensor model expressed as the conditional PDF p(y|x). Expanding the denominator, we have
p(x|y) = \frac{p(y|x)\, p(x)}{\int p(y|x)\, p(x)\, dx},
where the denominator is obtained by marginalizing the joint density over x:
p(y) = \int p(y|x)\, p(x)\, dx,
which can be expensive to compute in general nonlinear situations.
Note that in Bayesian inference, p(x) is called the prior PDF and p(x|y) is called the posterior PDF. In this way, all prior information is encapsulated in p(x) and all posterior information is encapsulated in p(x|y).
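As a concrete sketch of this machinery (a scalar linear-Gaussian case, chosen only for illustration), suppose the prior is p(x) = \mathcal{N}(\mu_0, \sigma_0^2) and the sensor model is p(y|x) = \mathcal{N}(x, r), i.e., the measurement is the state corrupted by noise of variance r. Then Bayes' rule gives a Gaussian posterior,
p(x|y) = \mathcal{N}(\mu, \sigma^2), \qquad \frac{1}{\sigma^2} = \frac{1}{\sigma_0^2} + \frac{1}{r}, \qquad \mu = \sigma^2 \left( \frac{\mu_0}{\sigma_0^2} + \frac{y}{r} \right),
so the posterior mean is a precision-weighted blend of the prior mean and the measurement, and the posterior variance is smaller than either.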
2.1.3 Moments of probability density functions
The zeroth probability moment is always 1 (this is just the normalization condition). The first probability moment is called the mean, μ:
\mu = E[x] = \int x\, p(x)\, dx.
For a general matrix function F(x), the expectation is written as
E[F(x)] = \int F(x)\, p(x)\, dx,
but note that in general E[F(x)] \neq F(E[x]).
The second probability moment is called the covariance matrix, Σ:
\Sigma = E\left[(x - \mu)(x - \mu)^T\right].
The next two moments are called skewness and kurtosis, respectively.
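Continuing the uniform-density example from above (a quick worked case for the first two moments), for x uniform on [a, b] we get
\mu = \frac{a + b}{2}, \qquad \sigma^2 = E\left[(x - \mu)^2\right] = \frac{(b - a)^2}{12}.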
Note: mind the difference between the probability-related quantities of vector-valued random variables and those of scalar random variables.
2.1.4 Sample mean and covariance
Suppose we have a random variable x with probability density function p(x). We can draw a sample from this PDF, which can be expressed as
x_{\text{meas}} \sim p(x).
A sample is sometimes called a realization of the random variable; intuitively, we can think of it as a measurement.
If we draw N samples and want to estimate the mean and covariance of the random variable x, we can use the sample mean and sample covariance:
\mu_{\text{meas}} = \frac{1}{N} \sum_{i=1}^{N} x_{i,\text{meas}}, \qquad \Sigma_{\text{meas}} = \frac{1}{N - 1} \sum_{i=1}^{N} \left(x_{i,\text{meas}} - \mu_{\text{meas}}\right)\left(x_{i,\text{meas}} - \mu_{\text{meas}}\right)^T.
Note that the sample covariance uses N − 1 rather than N in the denominator for normalization; this is known as Bessel's correction.
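The following short sketch (illustrative only; the distribution parameters and sample size are arbitrary) checks these estimators numerically with NumPy. Note that np.cov applies Bessel's correction by default.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Arbitrary "true" parameters of a 2-D Gaussian, used only for illustration.
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.3],
                       [0.3, 0.5]])

# Draw N samples x_i ~ N(mu_true, Sigma_true).
N = 10000
samples = rng.multivariate_normal(mu_true, Sigma_true, size=N)

# Sample mean: (1/N) * sum over samples.
mu_meas = samples.mean(axis=0)

# Sample covariance with Bessel's correction (divide by N - 1).
diff = samples - mu_meas
Sigma_meas = diff.T @ diff / (N - 1)

print("sample mean:\n", mu_meas)
print("sample covariance:\n", Sigma_meas)
# np.cov(samples.T) produces the same covariance estimate (it divides by N - 1 by default).
```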
2.1.5 Statistical independence and uncorrelatedness
Two random variables x and y are said to be statistically independent when their joint probability density factors as
p(x, y) = p(x)\, p(y).
The variables are said to be uncorrelated if the following equation holds:
E\left[x y^T\right] = E[x]\, E[y]^T.
Independence implies uncorrelatedness, but the converse is not true in general. We will usually assume that variables are statistically independent to simplify calculations.
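A standard counterexample for the converse (scalar case, given here only as a sanity check): let x \sim \mathcal{N}(0, 1) and y = x^2. Then
E[xy] = E[x^3] = 0 = E[x]\, E[y],
so x and y are uncorrelated, yet y is completely determined by x and the two are clearly not independent.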
2.1.6 Shannon information and mutual information
Having estimated a probability density function for some random variable, we often want to quantify how certain we are in, for example, the mean of that PDF.
One way is to look at the negative entropy, or Shannon information, H, which is given by
H(x) = -E[\ln p(x)] = -\int p(x) \ln p(x)\, dx.
We will work this out for the specific case of a Gaussian PDF below.
Another useful quantity is the mutual information, I(x, y), between two random variables x and y, given by
I(x, y) = E\left[\ln \frac{p(x, y)}{p(x)\, p(y)}\right] = \int \int p(x, y) \ln \frac{p(x, y)}{p(x)\, p(y)}\, dx\, dy.
Mutual information is a useful measure of information: it can be thought of as the amount of information one random variable contains about another, or equivalently as the reduction in uncertainty about one random variable once the other is known.
When x and y are statistically independent, we have
I(x, y) = 0.
When x and y are dependent, we have the following useful relationship:
I(x, y) = H(x) + H(y) - H(x, y).
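For a concrete case (a sketch assuming x and y are jointly Gaussian scalars with correlation coefficient ρ), the entropies above combine to give
I(x, y) = -\frac{1}{2} \ln\left(1 - \rho^2\right),
which is zero when ρ = 0 and grows without bound as |ρ| approaches 1.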
2.1.7 Cramér-Rao lower bound and Fisher information
Suppose we have a deterministic parameter, θ, that influences the outcome of a random variable x. We can express this by writing the probability density function of x as depending on θ:
p(x|\theta).
Further, suppose we draw a sample from p(x|θ):
x_{\text{meas}} \sim p(x|\theta).
Then the Cramér-Rao lower bound (CRLB) states that the covariance of any unbiased estimator \hat{\theta} of the deterministic parameter θ (based on x_{\text{meas}}) is bounded below, in the positive-semidefinite sense, by the inverse of the Fisher information matrix, I(x|θ):
\operatorname{cov}(\hat{\theta}) = E\left[(\hat{\theta} - \theta)(\hat{\theta} - \theta)^T\right] \ge I(x|\theta)^{-1}, \qquad I(x|\theta) = E\left[\left(\frac{\partial \ln p(x|\theta)}{\partial \theta}\right)\left(\frac{\partial \ln p(x|\theta)}{\partial \theta}\right)^T\right],
where unbiased means
E[\hat{\theta}] = \theta.
The CRLB therefore sets a fundamental limit on how certain we can be about the estimate of a parameter, given our measurements.
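A quick scalar sketch (assuming N independent measurements x_i \sim \mathcal{N}(\theta, \sigma^2) with known σ): the score is \sum_i (x_i - \theta)/\sigma^2, so the Fisher information is N/\sigma^2 and the CRLB reads
\operatorname{var}(\hat{\theta}) \ge \frac{\sigma^2}{N}.
The sample mean attains this bound, so in this case it is an efficient estimator.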
2.2 Gaussian probability density function
2.2.1 Definition
A one-dimensional Gaussian probability density function is given by
p(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right),
where μ is the mean and σ^2 is the variance (σ is the standard deviation). A multivariate Gaussian density function, for an N-dimensional random variable x, is expressed as
p(x|\mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^N \det \Sigma}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right),
where μ ∈ \mathbb{R}^N is the mean and Σ ∈ \mathbb{R}^{N \times N} is a symmetric positive-definite covariance matrix. We also write x \sim \mathcal{N}(\mu, \Sigma) to say that x is Gaussian with these parameters.
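As promised in 2.1.6, the Shannon information has a closed form in the Gaussian case (a standard result, stated here for later reference):
H(x) = \frac{1}{2} \ln\left((2\pi e)^N \det \Sigma\right),
so for fixed dimension the uncertainty of a Gaussian is determined entirely by the determinant of its covariance.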
2.2.2 Isserlis' theorem
Computing moments of a multivariate Gaussian density beyond the mean and covariance is tedious in general, but there are some specific cases that we will use later and that are worth discussing. We can use Isserlis' theorem to compute these higher-order moments of Gaussian random variables.
The theorem states that, for a zero-mean Gaussian random variable x = (x_1, \dots, x_{2M}), the expectation of the product of its components is the sum over all distinct pairings of the products of pairwise expectations. With four variables this reads
E[x_i x_j x_k x_l] = E[x_i x_j] E[x_k x_l] + E[x_i x_k] E[x_j x_l] + E[x_i x_l] E[x_j x_k].
We can apply this theorem to derive some useful results for expressions involving matrices.
Suppose x \sim \mathcal{N}(0, \Sigma), and we wish to evaluate an expectation of a product of x with itself whose order is controlled by a nonnegative integer p. The cases p = 0 and p = 1 can be evaluated directly using Isserlis' theorem, with the scalar case serving as a check; the same method applies for p greater than 1 (see the sketch below).
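As a representative calculation of this kind (a sketch for the specific form E[x (x^T x)^p x^T], assuming x \sim \mathcal{N}(0, \Sigma)):
p = 0: \quad E[x x^T] = \Sigma,
p = 1: \quad E[x (x^T x) x^T] = \Sigma \operatorname{tr}(\Sigma) + 2\Sigma^2,
which in the scalar case reduces to E[x^4] = 3\sigma^4, the familiar fourth moment of a zero-mean Gaussian.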
We also consider the partitioned case: let x_1 have dimension N_1 and x_2 have dimension N_2, with (x_1, x_2) jointly Gaussian, and evaluate the analogous expression involving both variables. Again p is a nonnegative integer; for p = 0 and p = 1 the results follow from Isserlis' theorem, and there is a similar companion identity (see the sketch below).
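A representative pair of identities for this case (a sketch assuming x_1 and x_2 are jointly zero-mean Gaussian with covariance blocks \Sigma_{11}, \Sigma_{12}, \Sigma_{21}, \Sigma_{22}, and taking the expression to be E[x_1 (x_2^T x_2)^p x_1^T]):
p = 0: \quad E[x_1 x_1^T] = \Sigma_{11},
p = 1: \quad E[x_1 (x_2^T x_2) x_1^T] = \Sigma_{11} \operatorname{tr}(\Sigma_{22}) + 2\, \Sigma_{12} \Sigma_{21},
and, by symmetry, E[x_2 (x_1^T x_1) x_2^T] = \Sigma_{22} \operatorname{tr}(\Sigma_{11}) + 2\, \Sigma_{21} \Sigma_{12}.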
Finally, one can verify a related pair of identities involving a constant matrix A, where A is a square matrix of dimension compatible with the expressions above (see the sketch below).
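Two such identities that follow from Isserlis' theorem (a sketch, again assuming x \sim \mathcal{N}(0, \Sigma) and A a constant square matrix of compatible dimension):
E[x^T A x] = \operatorname{tr}(A \Sigma),
E[x x^T A x x^T] = \Sigma A \Sigma + \Sigma A^T \Sigma + \Sigma \operatorname{tr}(A \Sigma),
and the second reduces to 2\Sigma A \Sigma + \Sigma \operatorname{tr}(A \Sigma) when A is symmetric.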
2.2.3 Joint Gaussian probability density functions, their factorization, and inference
The joint Gaussian PDF of a pair of variables (x, y) can be written as
p(x, y) = \mathcal{N}\left(\begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\right),
where \Sigma_{yx} = \Sigma_{xy}^T. Using the Schur complement, we can factor this joint Gaussian as
p(x, y) = p(x|y)\, p(y).
Importantly, both p(x|y) and p(y) are themselves Gaussian density functions, and if we know the value of y (for example, from a measurement), we can work out the likelihood of x given y by evaluating p(x|y).
This is the basis of Gaussian inference: we start with a prior over the state, x, and then use a measurement to narrow down that prior; in (2.46b) we see that conditioning adjusts the mean and makes the covariance smaller.
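For reference, the standard factors of the joint Gaussian above (a sketch of the general result, consistent with the adjustment to the mean and covariance described above) are
p(x|y) = \mathcal{N}\left(\mu_x + \Sigma_{xy} \Sigma_{yy}^{-1}(y - \mu_y),\; \Sigma_{xx} - \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}\right), \qquad p(y) = \mathcal{N}(\mu_y, \Sigma_{yy}),
so conditioning on y shifts the mean by \Sigma_{xy} \Sigma_{yy}^{-1}(y - \mu_y) and reduces the covariance by \Sigma_{xy} \Sigma_{yy}^{-1} \Sigma_{yx}.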