Learning Theory (Error Theory): Study Notes


1. Sample complexity, model complexity, and generalization ability

Definitions:

Sample complexity: the number of training samples.

Model complexity: typically measured by the number of free parameters of the model.

Generalization ability: the model's predictive power on unseen data, i.e., how good the model is; it is measured by the generalization error.


2. Error, bias, and variance


Definition: suppose we have many sets of sample points and fit one model to each set, obtaining several fitted models. Pick a single input value x (not in any training set); each fitted model then produces a prediction at x. The difference between the mean of these predictions and the true value is the model's bias, and the variance of these predictions is the model's variance, as the sketch below illustrates.
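A minimal sketch (not from the original notes) of the procedure just described: draw many training sets, fit one model per set, then measure the bias and variance of the predictions at a single held-out point x0. The ground-truth function, noise level, and polynomial degree are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(x)  # assumed ground truth

def draw_training_set(n=30, noise=0.3):
    x = rng.uniform(0.0, 3.0, n)
    y = true_f(x) + rng.normal(0.0, noise, n)
    return x, y

x0 = 1.5  # fixed evaluation point, not part of any training set
preds = []
for _ in range(500):  # many training sets -> many fitted models
    x, y = draw_training_set()
    coeffs = np.polyfit(x, y, deg=3)      # fit one model per training set
    preds.append(np.polyval(coeffs, x0))  # that model's prediction at x0

preds = np.array(preds)
bias = preds.mean() - true_f(x0)  # (mean prediction) - (true value)
variance = preds.var()            # spread of the predictions
print(f"bias = {bias:+.4f}, variance = {variance:.4f}")
```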

The picture is as follows: [figure illustrating bias and variance; image missing from the source]


Error: measures the predictive ability of the model, i.e., how good the model is.

Error = generalization error

Error = bias + variance (more precisely, for squared loss, bias² + variance plus an irreducible noise term; see below)
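Written out exactly for squared loss (a standard identity, consistent with the Fortmann-Roe reference below):

```latex
% y = f(x) + \epsilon with E[\epsilon] = 0, Var(\epsilon) = \sigma^2;
% the expectation is over training sets (and the label noise).
\mathbb{E}\big[(\hat{f}(x) - y)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```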


The relationship between model complexity and generalization error, bias, and variance is shown in the following figure: [figure missing from the source]

Reference: "Understanding the Bias-variance Tradeoff" http://scott.fortmann-roe.com/docs/BiasVariance.html#fnref:1

"Accurately measuring Model prediction Error" http://scott.fortmann-roe.com/docs/MeasuringError.html


3. Generalization error, expected error, structural risk, training error, empirical error, empirical risk, confidence risk, training optimism

Personal understanding: training error = empirical error = empirical risk

Definitions:

The training error is the fraction of training samples that the model misclassifies; the name emphasizes that it depends on the particular training set.

Generalization error = expected error = structural risk

This is a probability: for a sample (x, y) drawn from the underlying distribution D, it is the probability that the label y differs from the prediction h(x).
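In symbols (a reconstruction consistent with the CS229 notation these notes appear to follow), the generalization error and the training error of a classifier h over m samples are:

```latex
\varepsilon(h) = \Pr_{(x,y)\sim D}\big[h(x) \neq y\big],
\qquad
\hat{\varepsilon}(h) = \frac{1}{m}\sum_{i=1}^{m} \mathbf{1}\big\{h(x^{(i)}) \neq y^{(i)}\big\}
```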

Confidence risk = training optimism

Generalization error = training error + training optimism

Structural risk = empirical risk + confidence risk


The relationship between model complexity and generalization error, training error, and training optimism (confidence risk) is shown in the following figure: [figure missing from the source]


Andrew Ng's machine learning course proves that:

(1) As the sample size grows, the probability that the training error is close to the generalization error increases.

(2) If we require the gap between training error and generalization error to stay within a given bound, with at least a given probability, then the number of samples required is linear in the VC dimension of the hypothesis class.

Reference: http://blog.csdn.net/myarrow/article/details/50610411


4. Overfitting and underfitting

The two figures above show that the generalization error attains its minimum when the model complexity takes an optimal intermediate value. Below the optimal complexity, bias and training error dominate the generalization error (underfitting); above it, variance and confidence risk dominate (overfitting).
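A minimal sketch (my own illustration; the sine ground truth and polynomial degrees are assumptions): polynomial degree plays the role of model complexity. Too low a degree underfits (bias dominates); too high a degree overfits (variance dominates).

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 3.0, 20)
y_train = np.sin(x_train) + rng.normal(0.0, 0.3, 20)
x_test = rng.uniform(0.0, 3.0, 200)
y_test = np.sin(x_test) + rng.normal(0.0, 0.3, 200)

for deg in (1, 3, 12):
    c = np.polyfit(x_train, y_train, deg)
    train_mse = np.mean((np.polyval(c, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(c, x_test) - y_test) ** 2)
    print(f"degree {deg:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
# Typical outcome: degree 1 has high train and test error (underfitting);
# degree 12 has very low train error but higher test error (overfitting);
# an intermediate degree minimizes the test (generalization) error.
```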


5. Model Comparison

Reference: "Accurately measuring Model prediction Error" http://scott.fortmann-roe.com/docs/MeasuringError.html

Like "Cross-validation."


6. Methods to improve generalization ability

(1) Increase the number of samples. The more samples, the less likely the model is to overfit.

(2) Reduce the number of features. The fewer the features, the less prone the model is to overfitting. See, for example, "Feature Selection" in Lecture 10 of Andrew Ng's Machine Learning course.

(3) Add a regularization term. This uses structural risk minimization to suppress overfitting; the soft margin in SVMs is one example (see the sketch after this list).

(4) Bayesian estimation and regularization. The principle is again structural risk minimization to suppress overfitting (see section 7).
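A minimal sketch of point (3): the same overfitting-prone model with and without an L2 (ridge) penalty. The data and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, (30, 1))
y = np.sin(x).ravel() + rng.normal(0.0, 0.3, 30)
x_test = np.linspace(0.0, 3.0, 200).reshape(-1, 1)
y_test = np.sin(x_test).ravel()

for name, reg in [("no penalty", LinearRegression()),
                  ("L2 penalty", Ridge(alpha=1.0))]:
    model = make_pipeline(PolynomialFeatures(degree=12), reg)
    model.fit(x, y)
    mse = np.mean((model.predict(x_test) - y_test) ** 2)
    print(f"degree-12 polynomial, {name}: test MSE = {mse:.3f}")
# The penalized fit usually generalizes better: structural risk
# minimization trades a little empirical risk for a smaller confidence risk.
```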


7. Bayesian statistics and regularization

(1) Treat θ as a random variable rather than a fixed value, and compute the posterior distribution of θ given the training set S:
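A reconstruction of the equation, which is missing from this copy; it matches the standard derivation in Andrew Ng's CS229 notes, which this section appears to summarize:

```latex
p(\theta \mid S)
  = \frac{p(S \mid \theta)\, p(\theta)}{p(S)}
  = \frac{\left(\prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}, \theta\big)\right) p(\theta)}
         {\int_{\theta} \left(\prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}, \theta\big)\right) p(\theta)\, d\theta}
```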

The step in the derivation that drops p(x) (the "4th equals sign" in the original derivation) holds because x is generated independently of θ, i.e., p(x | θ) = p(x), so these factors cancel.

(2) Use the posterior distribution p(θ | S) from step (1) to compute the predictive distribution for a new sample point x:
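The prediction formulas, reconstructed on the same assumption that the section follows the CS229 derivation:

```latex
p(y \mid x, S) = \int_{\theta} p(y \mid x, \theta)\, p(\theta \mid S)\, d\theta,
\qquad
\mathbb{E}[y \mid x, S] = \int_{y} y\, p(y \mid x, S)\, dy
```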


(3) MAP

The full posterior p(θ | S) is hard to compute, so instead we take the θ that maximizes p(θ | S) and treat it as a fixed value. This method is called maximum a posteriori (MAP) estimation; the formula is as follows:
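The MAP formula, reconstructed in its standard form:

```latex
\theta_{\mathrm{MAP}}
  = \arg\max_{\theta} \prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}, \theta\big)\, p(\theta)
```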


So MAP is an approximation of the fully Bayesian method: it replaces the posterior distribution with a single point estimate.
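One concrete consequence worth noting (a standard result that ties this back to point (4) of section 6, added here as an illustration): for linear regression with Gaussian noise of variance σ² and a Gaussian prior θ ~ N(0, τ²I), MAP estimation is exactly L2-regularized (ridge) least squares:

```latex
\theta_{\mathrm{MAP}}
  = \arg\min_{\theta} \sum_{i=1}^{m} \big(y^{(i)} - \theta^{\top} x^{(i)}\big)^2
    + \lambda \lVert \theta \rVert_2^2,
\qquad \lambda = \sigma^2 / \tau^2
```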

Reference blog: http://blog.csdn.net/qrlhl/article/details/48135873


8. Points still to be worked out:

(1) The relationship between bias and training error, and between variance and confidence risk, is not explained.

(2) How sample complexity affects bias, variance, training error, and confidence risk is not described.
