Probability Theory Prerequisites for Machine Learning (Part 2)

Source: Internet
Author: User

Expected value and variance

The expected value E(X) of a random variable X, also known as its average or mean, is calculated with the following formulas, for discrete and continuous random variables respectively (f denotes the probability density function of X):

$$E(X) = \sum_{x} x \, P(X = x), \qquad E(X) = \int_{-\infty}^{\infty} x \, f(x) \, dx$$
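
As a rough illustration of the two cases, the sketch below computes the expectation of a fair die exactly and approximates the expectation of a uniform density on [0, 2] with a midpoint sum. The helper names `expectation_discrete` and `expectation_continuous` and both example distributions are assumptions made for this example, not part of the original text.

```python
from fractions import Fraction

def expectation_discrete(pmf):
    """E(X) = sum over x of x * P(X = x); pmf maps each value to its probability."""
    return sum(x * p for x, p in pmf.items())

def expectation_continuous(pdf, lo, hi, steps=100_000):
    """Approximate E(X) = integral of x * f(x) dx over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += x * pdf(x) * width
    return total

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = expectation_discrete(die)

# Uniform density on [0, 2]: f(x) = 1/2 on that interval.
uniform_mean = expectation_continuous(lambda x: 0.5, 0.0, 2.0)

print(die_mean)      # 7/2
print(uniform_mean)  # approximately 1.0
```

Using `Fraction` keeps the discrete result exact; the continuous result is only a numerical approximation.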


Applying the formula above to an indicator variable X (a random variable that takes only the value 1 or 0) gives:

$$E(X) = 1 \cdot P(X = 1) + 0 \cdot P(X = 0) = P(X = 1)$$

Here are two important theorems about expectation. The first is the linearity of expectation:

$$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n)$$

Linearity of expectation holds regardless of whether the random variables are independent. The second theorem holds when the random variables are independent of one another:

$$E(X_1 X_2 \cdots X_n) = E(X_1) \, E(X_2) \cdots E(X_n)$$
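
The difference between the two theorems can be checked by exact enumeration. The sketch below uses two hypothetical joint distributions chosen for illustration: a pair of independent fair dice, and a fully dependent pair in which the second value always copies the first. The names `expect` and `check` are illustrative.

```python
from fractions import Fraction
from itertools import product

def expect(joint, g):
    """E(g(X, Y)) for a joint pmf given as {(x, y): probability}."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

def check(joint):
    """Return (sum rule holds, product rule holds) for the given joint pmf."""
    ex = expect(joint, lambda x, y: x)
    ey = expect(joint, lambda x, y: y)
    e_sum = expect(joint, lambda x, y: x + y)
    e_prod = expect(joint, lambda x, y: x * y)
    return (e_sum == ex + ey, e_prod == ex * ey)

faces = range(1, 7)

# Two independent fair dice: every pair (x, y) has probability 1/36.
independent = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

# Fully dependent pair: Y always equals X.
dependent = {(x, x): Fraction(1, 6) for x in faces}

print(check(independent))  # (True, True)
print(check(dependent))    # (True, False)
```

In the dependent case the sum rule still holds, while E(XY) equals E(X^2) = 91/6 rather than E(X)E(Y) = 49/4.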

Other important properties of expectation: if c is a constant, then E(c) = c and E(cX) = cE(X).

Variance measures how dispersed a distribution is around its mean. It is calculated with the following formula:

$$\operatorname{Var}(X) = E\big((X - E(X))^2\big)$$

The variance is usually denoted σ², where σ is the standard deviation. The relationship between the standard deviation and the variance is:

$$\sigma = \sqrt{\operatorname{Var}(X)}$$

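When the expectation of the random variable X is known, the variance of X can be computed quickly with the following formula:

$$\operatorname{Var}(X) = E\big((X - E(X))^2\big) = E\big(X^2 - 2X\,E(X) + (E(X))^2\big) = E(X^2) - 2\,E(X)\,E(X) + (E(X))^2 = E(X^2) - (E(X))^2$$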
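
To see that the definition of variance and the shortcut E(X^2) - (E(X))^2 agree, the sketch below evaluates both for a fair die with exact fractions. The die and the function names are assumptions chosen for this example.

```python
from fractions import Fraction

def mean(pmf, g=lambda x: x):
    """E(g(X)) for a pmf given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

def variance_by_definition(pmf):
    """Var(X) = E((X - E(X))^2)."""
    mu = mean(pmf)
    return mean(pmf, lambda x: (x - mu) ** 2)

def variance_by_shortcut(pmf):
    """Var(X) = E(X^2) - (E(X))^2."""
    return mean(pmf, lambda x: x * x) - mean(pmf) ** 2

die = {face: Fraction(1, 6) for face in range(1, 7)}

print(variance_by_definition(die))  # 35/12
print(variance_by_shortcut(die))    # 35/12
```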


The derivation above uses the linearity of expectation together with the facts that, for a constant c, E(c) = c and E(cX) = cE(X) (here c is E(X)). Variance is not a linear function of a random variable. For example, for constants a and b:

$$\operatorname{Var}(aX + b) = a^2 \operatorname{Var}(X)$$


If the random variables X and Y are independent of each other, then the following relationship holds:

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$$
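
Both variance properties, the scaling rule Var(aX + b) = a^2 Var(X) and additivity for independent variables, can be verified exactly on small distributions. The sketch below assumes a fair die and a fair coin as the two independent variables; the constants a = 3 and b = 10 and the helper names are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import product

def variance(pmf):
    """Var(X) = E((X - E(X))^2) for a pmf given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def transform(pmf, g):
    """pmf of g(X), merging values that map to the same result."""
    out = {}
    for x, p in pmf.items():
        out[g(x)] = out.get(g(x), 0) + p
    return out

def sum_of_independent(pmf_x, pmf_y):
    """pmf of X + Y when X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        out[x + y] = out.get(x + y, 0) + px * py
    return out

die = {face: Fraction(1, 6) for face in range(1, 7)}
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}

a, b = 3, 10
var_scaled = variance(transform(die, lambda x: a * x + b))
var_sum = variance(sum_of_independent(die, coin))

print(var_scaled == a ** 2 * variance(die))       # True
print(var_sum == variance(die) + variance(coin))  # True
```

The shift b has no effect on the variance, and the factor a enters squared, which is why variance is not linear.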


The covariance of two random variables is defined as follows. It measures the degree to which the two random variables vary together linearly:

$$\operatorname{Cov}(X, Y) = E\big((X - E(X))(Y - E(Y))\big)$$
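
A small sketch of the covariance definition, evaluated on three hypothetical joint distributions of two dice values: independent, identical (Y = X), and mirrored (Y = 7 - X). The distributions and the function name `covariance` are assumptions for illustration.

```python
from fractions import Fraction

def covariance(joint):
    """Cov(X, Y) = E((X - E(X)) (Y - E(Y))) for a joint pmf {(x, y): probability}."""
    ex = sum(x * p for (x, y), p in joint.items())
    ey = sum(y * p for (x, y), p in joint.items())
    return sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())

faces = range(1, 7)
independent = {(x, y): Fraction(1, 36) for x in faces for y in faces}
identical = {(x, x): Fraction(1, 6) for x in faces}
mirrored = {(x, 7 - x): Fraction(1, 6) for x in faces}

print(covariance(independent))  # 0
print(covariance(identical))    # 35/12
print(covariance(mirrored))     # -35/12
```

Independent variables have zero covariance, Cov(X, X) reduces to Var(X), and variables that move in opposite directions have negative covariance.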


Bernoulli, Poisson, and Gaussian distributions

The Bernoulli distribution is one of the most basic distributions. A random variable X that follows the Bernoulli distribution takes only two values, 0 and 1. The probability that X is 1 is usually written p, that is, p = P(X = 1), and q is the probability that X is 0, that is, q = P(X = 0) = 1 - p. Since X takes only the values 0 and 1, it is commonly used to indicate whether a trial succeeded. By definition, the Bernoulli distribution is:

$$P(X = 1) = p, \qquad P(X = 0) = 1 - p$$


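The two cases can be combined into a single formula: $P(X = x) = p^x (1 - p)^{1 - x}$ for $x \in \{0, 1\}$. The expectation and variance of the Bernoulli distribution are p and p(1 - p), respectively, and they are calculated as follows:

$$E(X) = 1 \cdot p + 0 \cdot (1 - p) = p$$

$$\operatorname{Var}(X) = E(X^2) - (E(X))^2 = \big(1^2 \cdot p + 0^2 \cdot (1 - p)\big) - p^2 = p - p^2 = p(1 - p)$$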
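
As an informal check of the Bernoulli mean and variance, the sketch below draws samples with the standard library's `random` module and compares the sample statistics with p and p(1 - p). The value p = 0.3, the seed, the sample size, and the helper names are arbitrary assumptions for this example.

```python
import random

def bernoulli_pmf(x, p):
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    return p ** x * (1 - p) ** (1 - x)

def bernoulli_sample(p, rng):
    """Draw 1 with probability p and 0 otherwise."""
    return 1 if rng.random() < p else 0

p = 0.3
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [bernoulli_sample(p, rng) for _ in range(100_000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)

print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))  # P(X = 1) = p and P(X = 0) = 1 - p
print(sample_mean, sample_var)  # close to p = 0.3 and p * (1 - p) = 0.21
```

The sample values only approach the theoretical ones; the gap shrinks as the number of draws grows.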


The Poisson distribution is very useful for modeling event counts. It describes the probability distribution of the number of random events that occur per unit of time. Examples include the number of service requests a facility receives within a given period, the number of calls arriving at a telephone switch, the number of passengers waiting at a bus stop, the number of machine failures, the number of natural disasters, the number of mutations in a DNA sequence, and the number of radioactive nuclei that decay.

The parameter λ of the Poisson distribution is the average number of random events per unit of time (or per unit of area). The probability mass function of the Poisson distribution is:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots$$

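Both the expectation and the variance are λ. The calculation is as follows:

$$E(X) = \sum_{k=0}^{\infty} k \, \frac{\lambda^k e^{-\lambda}}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$

$$E\big(X(X-1)\big) = \sum_{k=2}^{\infty} k(k-1) \, \frac{\lambda^k e^{-\lambda}}{k!} = \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} = \lambda^2$$

$$\operatorname{Var}(X) = E(X^2) - (E(X))^2 = E\big(X(X-1)\big) + E(X) - (E(X))^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$$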
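
The claim that the Poisson mean and variance both equal λ can be checked numerically by summing the mass function over a truncated range. The sketch below assumes λ = 4 and truncates at k = 60, where the remaining tail is negligible; both choices and the name `poisson_pmf` are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = lam**k * exp(-lam) / k! for k = 0, 1, 2, ..."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 4.0
ks = range(60)  # the tail beyond k = 60 is negligible for lam = 4

total = sum(poisson_pmf(k, lam) for k in ks)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
second_moment = sum(k * k * poisson_pmf(k, lam) for k in ks)
variance = second_moment - mean ** 2

print(total)     # approximately 1.0
print(mean)      # approximately 4.0
print(variance)  # approximately 4.0
```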


The derivation above relies on an important formula, the series expansion of the exponential function:

$$e^{\lambda} = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}$$

The Gaussian distribution, also known as the normal distribution, is one of the most commonly used distributions. For example, it approximates the binomial distribution when the number of trials is very large, and it approximates the Poisson distribution when the average rate is high. It is also closely tied to the limit theorems of probability (the law of large numbers and, more directly, the central limit theorem). The Gaussian distribution is determined by two parameters, the expectation μ and the variance σ², and its probability density function is:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$
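
The sketch below writes the Gaussian density as a plain function, cross-checks it against `statistics.NormalDist` from the Python standard library, and compares a binomial probability with its Gaussian approximation. The parameter values (μ = 1, σ² = 4, n = 1000, p = 0.5) and the helper names are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def gaussian_pdf(x, mu, sigma2):
    """Density of a Gaussian with expectation mu and variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Cross-check against NormalDist, which takes the standard deviation, not the variance.
mu, sigma2 = 1.0, 4.0
ours = gaussian_pdf(2.5, mu, sigma2)
reference = NormalDist(mu, math.sqrt(sigma2)).pdf(2.5)

# Binomial(n, p) with large n versus a Gaussian with mean n*p and variance n*p*(1-p).
n, p = 1000, 0.5
exact = binomial_pmf(500, n, p)
approx = gaussian_pdf(500, n * p, n * p * (1 - p))

print(ours, reference)  # the two values agree
print(exact, approx)    # both close to 0.0252
```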


As the example plot of a Gaussian distribution shows, the expectation determines the central position of the normal curve, and the variance determines how steep or flat the curve is. The smaller the variance, the steeper the curve; the larger the variance, the flatter the curve.



In machine learning one often works with the multivariate Gaussian distribution. The Gaussian distribution of a k-dimensional random variable X = (X_1, ..., X_k) is described by the parameters (μ, Σ), where μ is the k-dimensional vector of expected values and Σ is the k×k covariance matrix, with Σ_ii = Var(X_i) and Σ_ij = Cov(X_i, X_j). The probability density function of the multivariate Gaussian distribution is:

$$f(x) = \frac{1}{(2\pi)^{k/2} \, |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2} (x - \mu)^{T} \Sigma^{-1} (x - \mu)\right)$$
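
A minimal sketch of the multivariate density for the special case k = 2, written with the standard library only so that the determinant and the inverse of Σ can be spelled out by hand. The function name `mvn_pdf_2d` and the example parameters are assumptions; in practice a numerical library would normally handle general k.

```python
import math
from statistics import NormalDist

def mvn_pdf_2d(x, mu, cov):
    """Density of a 2-dimensional Gaussian; cov is [[s11, s12], [s21, s22]]."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2x2 matrix: (1 / det) * [[d, -b], [-c, a]]
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    k = 2
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** k * det)

# At the mean of a standard 2-dimensional Gaussian the density is 1 / (2 * pi).
peak = mvn_pdf_2d([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

# With a diagonal covariance matrix the density factors into two 1-dimensional densities.
joint = mvn_pdf_2d([0.5, -1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]])
factored = NormalDist(0.0, 1.0).pdf(0.5) * NormalDist(0.0, 2.0).pdf(-1.0)

print(peak)             # 1 / (2 * pi), about 0.159
print(joint, factored)  # the two values agree
```

The second check reflects the fact that jointly Gaussian components with zero covariance are independent.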

