The relationship between probability, statistics, and machine learning
A probability problem starts from a known population and reasons about what a sample drawn from it will look like (population to individual).
A statistics problem is the reverse of a probability problem: it starts from a sample and infers the properties of the population (individual to population).
In supervised machine learning, a model is first trained on samples and their labels (individual to population), and the model is then used to predict the labels of new samples (population to individual).
- Statistics estimates a distribution, whereas machine learning trains a model, and a model may contain many distributions.
- A core evaluation metric for both training and prediction is the model's error.
- The error itself can be expressed as a probability, so it is closely tied to probability theory.
- Different definitions of error have given rise to different loss functions.
- Machine learning can be viewed as an advanced version of probability and statistics. (This is not a rigorous claim.)
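To make the link between error definitions and loss functions concrete, here is a minimal sketch of two common losses. The function names and the toy data are illustrative assumptions, not part of the original notes: mean squared error treats the error as a distance, while log loss treats it as a negative log-probability.

```python
import math

def squared_error(y_true, y_pred):
    """Mean squared error: the error is the average squared distance."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss(y_true, p_pred):
    """Average negative log-likelihood of binary labels (cross-entropy)."""
    return -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for t, p in zip(y_true, p_pred)
    ) / len(y_true)

# Hypothetical labels and predicted probabilities of class 1.
labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.7, 0.6]

mse = squared_error(labels, probs)
nll = log_loss(labels, probs)
print(mse, nll)
```

Both functions score the same predictions, but they penalize mistakes differently, which is why the choice of error definition changes what the trained model looks like.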
Important statistics
- Expectation
- Variance
- Moment
- Covariance and correlation coefficient
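The statistics listed above can be computed from a sample as follows. This is a minimal sketch using population-style formulas (dividing by n); the helper names and the toy data are assumptions made for illustration.

```python
def mean(xs):
    """Sample estimate of the expectation."""
    return sum(xs) / len(xs)

def central_moment(xs, k):
    """k-th central moment: average of (x - mean)^k."""
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def variance(xs):
    """Variance is the second central moment."""
    return central_moment(xs, 2)

def covariance(xs, ys):
    """Average product of the deviations of xs and ys from their means."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance rescaled by both standard deviations, so it lies in [-1, 1]."""
    return covariance(xs, ys) / (variance(xs) ** 0.5 * variance(ys) ** 0.5)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]  # y = 2 * x, a perfect linear relationship

print(mean(x), variance(x), central_moment(x, 3))
print(covariance(x, y), correlation(x, y))
```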
Covariance applied to machine learning
Covariance measures the tendency of two random variables to change in the same direction:
- If Cov(X, Y) > 0, they tend to change in the same direction;
- If Cov(X, Y) < 0, they tend to change in opposite directions;
- If Cov(X, Y) = 0, X and Y are uncorrelated (there is no linear relationship between them).
If X is a feature of a sample and Y is the sample label, features can be selected according to the covariance between each feature and the label. For example, if the color of a sample has no bearing on its quality, their covariance is 0, so the color feature can be screened out when training a model to predict quality.
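The color and quality example can be sketched as below. The feature names, values, and the near-zero tolerance are hypothetical; in this toy data the color column is constructed so that its covariance with the label is exactly 0.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Average product of deviations from the means (divides by n)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical label (quality) and two candidate features.
quality = [1.0, 2.0, 3.0, 4.0]
features = {
    "color": [1.0, -1.0, -1.0, 1.0],  # unrelated to quality
    "size": [2.0, 4.0, 6.0, 8.0],     # grows with quality
}

# Covariance of every feature with the label.
cov_with_label = {name: covariance(v, quality) for name, v in features.items()}

# Keep only the features whose covariance with the label is not (near) zero.
selected = [name for name, c in cov_with_label.items() if abs(c) > 1e-9]
print(cov_with_label, selected)
```

Covariance depends on the units of each feature, so a fixed cutoff like the one here is only illustrative. Zero covariance also rules out only a linear relationship, not every kind of dependence.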
For n random variables (X1, X2, ..., Xn), a covariance can be computed for every pair, which gives an n*n matrix called the covariance matrix. The covariance matrix is symmetric. A correlation coefficient can likewise be computed for every pair, which gives the correlation coefficient matrix. Both matrices can be used to find correlations between features.
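A minimal sketch of building both matrices from a list of feature columns follows. The helper names and the three toy features are assumptions made for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def covariance_matrix(features):
    """Entry [i][j] is the covariance between feature i and feature j."""
    return [[covariance(a, b) for b in features] for a in features]

def correlation_matrix(features):
    """Entry [i][j] is cov[i][j] divided by the two standard deviations."""
    cov = covariance_matrix(features)
    n = len(features)
    std = [cov[i][i] ** 0.5 for i in range(n)]
    return [[cov[i][j] / (std[i] * std[j]) for j in range(n)] for i in range(n)]

# Three hypothetical features, one list per feature.
features = [
    [1.0, 2.0, 3.0, 4.0],  # x1
    [2.0, 1.0, 4.0, 3.0],  # x2
    [4.0, 3.0, 2.0, 1.0],  # x3 = 5 - x1
]

cov = covariance_matrix(features)
corr = correlation_matrix(features)
for row in corr:
    print(row)
```

The diagonal of the covariance matrix holds each feature's variance, and the diagonal of the correlation coefficient matrix is 1.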
If X1 and X2 are linearly related, that is, X2 = aX1 + b with a ≠ 0, then each can be expressed in terms of the other and only one of them needs to be kept. The stronger the linear correlation between two features, the larger the absolute value of their correlation coefficient, so the correlation coefficient matrix can be used to filter features.
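One possible way to filter features by correlation is sketched below: a feature is kept only if its absolute correlation with every feature already kept stays below a cutoff. The function `drop_correlated`, the 0.95 threshold, and the data are illustrative assumptions rather than a prescribed procedure.

```python
def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    return covariance(xs, ys) / (covariance(xs, xs) ** 0.5 * covariance(ys, ys) ** 0.5)

def drop_correlated(features, threshold=0.95):
    """Return names of features that are not near-duplicates of a kept one."""
    kept = []
    for name, values in features.items():
        # Compare against the features kept so far, in insertion order.
        if all(abs(correlation(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

features = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [3.0, 5.0, 7.0, 9.0, 11.0],  # x2 = 2 * x1 + 1, redundant with x1
    "x3": [2.0, 9.0, 1.0, 7.0, 4.0],   # only weakly related to x1
}

kept = drop_correlated(features)
print(kept)
```

Which member of a correlated pair is dropped depends on the iteration order here; a real pipeline might instead keep the feature more strongly related to the label.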
Estimating parameters with samples
- Moment estimation
The method of moments (moment estimation) uses sample moments to estimate the corresponding parameters of the population. The simplest case estimates the population expectation with the first-order sample raw moment (the sample mean) and the population variance with the second-order sample central moment.
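As a small illustration of this simplest case, the sketch below estimates the expectation and variance of a population from a sample. The function names and the sample values are hypothetical.

```python
def raw_moment(xs, k):
    """k-th sample raw moment: average of x^k."""
    return sum(x ** k for x in xs) / len(xs)

def central_moment(xs, k):
    """k-th sample central moment: average of (x - mean)^k."""
    m = raw_moment(xs, 1)
    return sum((x - m) ** k for x in xs) / len(xs)

# A hypothetical sample drawn from some unknown population.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mu_hat = raw_moment(sample, 1)          # estimate of the population expectation
sigma2_hat = central_moment(sample, 2)  # estimate of the population variance
print(mu_hat, sigma2_hat)
```

Note that the second-order sample central moment divides by n, so it is a biased estimator of the variance; the unbiased sample variance divides by n - 1 instead.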
- Maximum likelihood estimation
In practice, to make differentiation easier, one usually takes the logarithm of the likelihood function to obtain the log-likelihood function. If the log-likelihood is differentiable, take its partial derivatives with respect to θ, set them to 0 to find the stationary points, and then check whether each stationary point is a maximum.
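The procedure can be illustrated with a Bernoulli model, chosen here only as an assumed example. With k successes in n trials, the log-likelihood is k·ln θ + (n − k)·ln(1 − θ); setting its derivative k/θ − (n − k)/(1 − θ) to 0 gives the stationary point θ = k/n. The sketch checks this against a brute-force grid search and confirms that the second derivative is negative there.

```python
import math

def log_likelihood(theta, data):
    """Bernoulli log-likelihood: k*ln(theta) + (n-k)*ln(1-theta)."""
    k, n = sum(data), len(data)
    return k * math.log(theta) + (n - k) * math.log(1 - theta)

def score(theta, data):
    """First derivative of the log-likelihood with respect to theta."""
    k, n = sum(data), len(data)
    return k / theta - (n - k) / (1 - theta)

def second_derivative(theta, data):
    """Second derivative; negative at a maximum point."""
    k, n = sum(data), len(data)
    return -k / theta ** 2 - (n - k) / (1 - theta) ** 2

# Hypothetical observations: 7 successes out of 10 trials.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

theta_hat = sum(data) / len(data)  # closed-form stationary point k / n

# Numerical check: search theta over a grid in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
theta_grid = max(grid, key=lambda t: log_likelihood(t, data))

print(theta_hat, theta_grid, score(theta_hat, data), second_derivative(theta_hat, data))
```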
Mathematical statistics and parameter estimation in machine learning