1. Sample complexity, model complexity, generalization ability
Definitions:
Sample complexity: the number of training samples required.
Model complexity: typically measured by the number of unknown parameters.
Generalization ability: the model's predictive power on unseen data, i.e. how good the learned model is; it is measured by the generalization error.
2. Error, bias, variance
Definition: suppose we draw many sets of sample points and fit one model to each set, obtaining several models. Pick a fixed input value (not in any training set) and produce one prediction from each trained model. The difference between the mean of these predictions and the true value is the model's bias, and the variance of the predictions is the model's variance.
This is illustrated in the following chart:
Error definition: the error measures the model's predictive ability, i.e. how good the model is.
Error = generalization error
Error = Bias² + Variance (the classical decomposition of the expected squared error; an irreducible noise term is often added as well)
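The experiment described above can be sketched in Python. This is a minimal simulation with toy settings I chose for illustration (a sine target, 20-point sample sets, a degree-1 polynomial model, and a fixed query point), not anything from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(x)

# Fit many models, each on its own noisy sample set, then examine the
# spread of their predictions at a fixed query point x0 (not in any set).
x0 = 1.0
preds = []
for _ in range(500):
    x = rng.uniform(0, np.pi, 20)
    y = true_f(x) + rng.normal(0, 0.3, 20)
    coef = np.polyfit(x, y, deg=1)          # one (too-simple) model per sample set
    preds.append(np.polyval(coef, x0))

preds = np.array(preds)
bias = preds.mean() - true_f(x0)   # mean prediction minus true value
variance = preds.var()             # spread of predictions across models
print(f"bias={bias:.3f}, variance={variance:.3f}")
```

Because the degree-1 model is too simple for a sine target, the experiment shows a noticeable bias and a small variance, matching the "high bias, low variance" corner of the trade-off.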
The relationship between model complexity and generalization error, bias, and variance is shown in the following figure:
References: "Understanding the Bias-Variance Tradeoff", http://scott.fortmann-roe.com/docs/BiasVariance.html#fnref:1
"Accurately Measuring Model Prediction Error", http://scott.fortmann-roe.com/docs/MeasuringError.html
3. Generalization error, expected error, structural risk, training error, empirical error, empirical risk, confidence risk, training optimism
My understanding: training error = empirical error = empirical risk.
Definitions:
The training error is the fraction of training samples that the model misclassifies; when we want to emphasize its dependence on the training set, it is also called the empirical error (empirical risk).
Generalization error = expected error = structural risk
The generalization error is the probability that, for a sample (x, y) drawn from a particular distribution D, the label y differs from the prediction h(x).
Confidence risk = training optimism
Generalization error = training error + training optimism
Structural risk = empirical risk + confidence risk
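The two error definitions above can be written out explicitly. The following formulas are my reconstruction in the notation of Andrew Ng's course notes; they are not in the original text:

```latex
% Training (empirical) error on m samples:
\hat{\varepsilon}(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{ h(x^{(i)}) \neq y^{(i)} \}

% Generalization error under distribution D:
\varepsilon(h) = \Pr_{(x, y) \sim D}\left[ h(x) \neq y \right]
```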
The relationship between model complexity and generalization error, training error, and optimism (confidence risk) is shown in the following figure:
Andrew Ng's machine learning course proves that:
(1) As the sample size increases, the probability that the training error is close to the generalization error increases.
(2) To guarantee that the gap between training error and generalization error stays within a given range with at least a given probability, the required number of samples is linearly related to the VC dimension of the hypothesis set.
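Claim (1) can be checked empirically with a small simulation. The setup below is a toy example of my own (sine target, Gaussian noise, a fixed-complexity degree-3 polynomial model); it is not from the course:

```python
import numpy as np

rng = np.random.default_rng(1)

def gap(m):
    """Test-minus-train MSE gap for a degree-3 polynomial fit on m samples."""
    x_tr = rng.uniform(-1, 1, m)
    y_tr = np.sin(2 * x_tr) + rng.normal(0, 0.2, m)
    coef = np.polyfit(x_tr, y_tr, 3)
    # Large held-out set approximates the generalization error
    x_te = rng.uniform(-1, 1, 5000)
    y_te = np.sin(2 * x_te) + rng.normal(0, 0.2, 5000)
    err_tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    err_te = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    return err_te - err_tr

def avg_gap(m, trials=50):
    # Average over repeated experiments to smooth out sampling noise
    return float(np.mean([gap(m) for _ in range(trials)]))

for m in (10, 100, 1000):
    print(f"m={m}: average train/test gap = {avg_gap(m):.4f}")
```

The average gap shrinks as m grows, illustrating the training error converging toward the generalization error.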
Reference: http://blog.csdn.net/myarrow/article/details/50610411
4. Overfitting, underfitting
The two figures above show that the generalization error reaches its minimum when the model complexity takes an optimal intermediate value. When the model complexity is below this optimum, bias and training error dominate the generalization error; when it is above the optimum, variance and confidence risk dominate.
5. Model Comparison
Reference: "Accurately measuring Model prediction Error" http://scott.fortmann-roe.com/docs/MeasuringError.html
For example, cross-validation.
6. Methods to improve generalization ability
(1) Increase the number of samples. The more samples, the less likely the model is to overfit.
(2) Reduce the number of features. The fewer the features, the less prone the model is to overfitting. See, for example, Lecture 10, "Feature Selection", in Andrew Ng's Machine Learning course.
(3) Add a regularization term. This applies structural risk minimization to suppress overfitting. For example: the soft margin in SVM.
(4) Bayesian statistics and regularization. The principle is likewise structural risk minimization to suppress overfitting.
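Point (3) above can be sketched with ridge regression on a toy problem. The sine target, the polynomial degree, and the λ values are my assumptions for illustration, not from the original text:

```python
import numpy as np

rng = np.random.default_rng(2)
# Few noisy samples of a smooth function; a degree-9 polynomial overfits them.
x = rng.uniform(-1, 1, 15)
y = np.sin(2 * x) + rng.normal(0, 0.2, 15)
X = np.vander(x, 10)          # design matrix for a degree-9 polynomial

def ridge(X, y, lam):
    # Minimize ||Xw - y||^2 + lam * ||w||^2: empirical risk plus a
    # complexity penalty (structural risk minimization)
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Evaluate against the noise-free target on a dense grid
x_te = np.linspace(-1, 1, 500)
X_te = np.vander(x_te, 10)
y_te = np.sin(2 * x_te)

mses = {}
for lam in (0.0, 0.1):
    w = ridge(X, y, lam)
    mses[lam] = float(np.mean((X_te @ w - y_te) ** 2))
    print(f"lambda={lam}: test MSE = {mses[lam]:.4f}")
```

With this setup the regularized fit (λ = 0.1) typically achieves a much lower test error than the unregularized one, because the penalty suppresses the wild oscillations of the high-degree polynomial.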
7. Bayesian statistics and regularization
(1) Treat θ as a random variable rather than a fixed value, and compute the posterior distribution of θ below.
The fourth equality in the equation above holds because x is generated independently of θ.
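The equation image referred to above is missing from the text; the standard form of this posterior (as in Ng's course notes), with the likelihood factoring over the samples because x is independent of θ, is:

```latex
p(\theta \mid S)
  = \frac{p(S \mid \theta)\, p(\theta)}{p(S)}
  = \frac{\left( \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta) \right) p(\theta)}
         {\int_{\theta} \left( \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta) \right) p(\theta)\, d\theta}
```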
(2) Using the posterior distribution p(θ|S) from step (1), compute the predictive probability for a new sample point:
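The standard form of this predictive distribution (reconstructed here, since the original equation image is missing) is:

```latex
p(y \mid x, S) = \int_{\theta} p(y \mid x, \theta)\, p(\theta \mid S)\, d\theta
```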
(3) MAP
The posterior distribution p(θ|S) is hard to compute exactly, so instead we take the θ that maximizes p(θ|S) and treat it as a fixed value. This method is called maximum a posteriori (MAP) estimation; the formula is as follows:
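The MAP estimate described above has the standard form (reconstructed, since the original formula image is missing):

```latex
\theta_{\mathrm{MAP}} = \arg\max_{\theta} \left( \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta) \right) p(\theta)
```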
So MAP is essentially a simplification of the fully Bayesian approach.
Reference blog: http://blog.csdn.net/qrlhl/article/details/48135873
8. Points still to be improved:
(1) The relationship between bias and training error, and between variance and confidence risk, is not explained.
(2) How the sample size affects bias, variance, training error, and confidence risk is not described.