1. Sample complexity, model complexity, generalization ability
Definitions:
Sample complexity: the number of training samples required.
Model complexity: typically measured by the number of unknown parameters.
Generalization ability: the model's predictive power on unseen data, i.e. how good the learned model is; it is measured by the generalization error.
2. Error, bias, variance
Definition: suppose we draw many sets of sample points and fit one model to each set, obtaining several models. Pick a fixed input value (not in any training set) and produce one prediction from each trained model. The difference between the mean of these predictions and the true value is the model's bias, and the variance of the predictions is the model's variance.
This is illustrated in the following chart:
Error definition: the error measures the model's predictive ability, i.e. how good the model is.
Error = generalization error
Error = Bias² + Variance (the classical decomposition of the expected squared error; an irreducible noise term is often added as well)
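The experiment described above can be sketched in Python. This is a minimal simulation with toy settings I chose for illustration (a sine target, 20-point sample sets, a degree-1 polynomial model, and a fixed query point), not anything from the original text:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(x)

# Fit many models, each on its own noisy sample set, then examine the
# spread of their predictions at a fixed query point x0 (not in any set).
x0 = 1.0
preds = []
for _ in range(500):
    x = rng.uniform(0, np.pi, 20)
    y = true_f(x) + rng.normal(0, 0.3, 20)
    coef = np.polyfit(x, y, deg=1)          # one (too-simple) model per sample set
    preds.append(np.polyval(coef, x0))

preds = np.array(preds)
bias = preds.mean() - true_f(x0)   # mean prediction minus true value
variance = preds.var()             # spread of predictions across models
print(f"bias={bias:.3f}, variance={variance:.3f}")
```

Because the degree-1 model is too simple for a sine target, the experiment shows a noticeable bias and a small variance, matching the "high bias, low variance" corner of the trade-off.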
The relationship between model complexity and generalization error, bias, and variance is shown in the following figure:
References: "Understanding the Bias-Variance Tradeoff", http://scott.fortmann-roe.com/docs/BiasVariance.html#fnref:1
"Accurately Measuring Model Prediction Error", http://scott.fortmann-roe.com/docs/MeasuringError.html
3. Generalization error, expected error, structural risk, training error, empirical error, empirical risk, confidence risk, training optimism
My understanding: training error = empirical error = empirical risk.
Definitions:
The training error is the fraction of training samples that the model misclassifies; when we want to emphasize its dependence on the training set, it is also called the empirical error (empirical risk).
Generalization error = expected error = structural risk
The generalization error is the probability that, for a sample (x, y) drawn from a particular distribution D, the label y differs from the prediction h(x).
Confidence risk = training optimism
Generalization error = training error + training optimism
Structural risk = empirical risk + confidence risk
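The two error definitions above can be written out explicitly. The following formulas are my reconstruction in the notation of Andrew Ng's course notes; they are not in the original text:

```latex
% Training (empirical) error on m samples:
\hat{\varepsilon}(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{ h(x^{(i)}) \neq y^{(i)} \}

% Generalization error under distribution D:
\varepsilon(h) = \Pr_{(x, y) \sim D}\left[ h(x) \neq y \right]
```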
The relationship between model complexity and generalization error, training error, and optimism (confidence risk) is shown in the following figure:
Andrew Ng's machine learning course proves that:
(1) As the sample size increases, the probability that the training error is close to the generalization error increases.
(2) To guarantee that the gap between training error and generalization error stays within a given range with at least a given probability, the required number of samples is linearly related to the VC dimension of the hypothesis set.
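Claim (1) can be checked empirically with a small simulation. The setup below is a toy example of my own (sine target, Gaussian noise, a fixed-complexity degree-3 polynomial model); it is not from the course:

```python
import numpy as np

rng = np.random.default_rng(1)

def gap(m):
    """Test-minus-train MSE gap for a degree-3 polynomial fit on m samples."""
    x_tr = rng.uniform(-1, 1, m)
    y_tr = np.sin(2 * x_tr) + rng.normal(0, 0.2, m)
    coef = np.polyfit(x_tr, y_tr, 3)
    # Large held-out set approximates the generalization error
    x_te = rng.uniform(-1, 1, 5000)
    y_te = np.sin(2 * x_te) + rng.normal(0, 0.2, 5000)
    err_tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    err_te = np.mean((np.polyval(coef, x_te) - y_te) ** 2)
    return err_te - err_tr

def avg_gap(m, trials=50):
    # Average over repeated experiments to smooth out sampling noise
    return float(np.mean([gap(m) for _ in range(trials)]))

for m in (10, 100, 1000):
    print(f"m={m}: average train/test gap = {avg_gap(m):.4f}")
```

The average gap shrinks as m grows, illustrating the training error converging toward the generalization error.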
Reference: http://blog.csdn.net/myarrow/article/details/50610411
4. Overfitting, underfitting
The two figures above show that the generalization error reaches its minimum when the model complexity takes an optimal intermediate value. When the model complexity is below this optimum, bias and training error dominate the generalization error; when it is above the optimum, variance and confidence risk dominate.
5. Model Comparison
Reference: "Accurately measuring Model prediction Error" http://scott.fortmann-roe.com/docs/MeasuringError.html
For example, cross-validation.
6. Methods to improve generalization ability
(1) Increase the number of samples. The more samples, the less likely the model is to overfit.
(2) Reduce the number of features. The fewer the features, the less prone the model is to overfitting. See, for example, Lecture 10, "Feature Selection", in Andrew Ng's Machine Learning course.
(3) Add a regularization term. This applies structural risk minimization to suppress overfitting. For example: the soft margin in SVM.
(4) Bayesian statistics and regularization. The principle is likewise structural risk minimization to suppress overfitting.
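Point (3) above can be sketched with ridge regression on a toy problem. The sine target, the polynomial degree, and the λ values are my assumptions for illustration, not from the original text:

```python
import numpy as np

rng = np.random.default_rng(2)
# Few noisy samples of a smooth function; a degree-9 polynomial overfits them.
x = rng.uniform(-1, 1, 15)
y = np.sin(2 * x) + rng.normal(0, 0.2, 15)
X = np.vander(x, 10)          # design matrix for a degree-9 polynomial

def ridge(X, y, lam):
    # Minimize ||Xw - y||^2 + lam * ||w||^2: empirical risk plus a
    # complexity penalty (structural risk minimization)
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Evaluate against the noise-free target on a dense grid
x_te = np.linspace(-1, 1, 500)
X_te = np.vander(x_te, 10)
y_te = np.sin(2 * x_te)

mses = {}
for lam in (0.0, 0.1):
    w = ridge(X, y, lam)
    mses[lam] = float(np.mean((X_te @ w - y_te) ** 2))
    print(f"lambda={lam}: test MSE = {mses[lam]:.4f}")
```

With this setup the regularized fit (λ = 0.1) typically achieves a much lower test error than the unregularized one, because the penalty suppresses the wild oscillations of the high-degree polynomial.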
7. Bayesian statistics and regularization
(1) Treat θ as a random variable rather than a fixed value, and compute the posterior distribution of θ below.
The fourth equality in the equation above holds because x is generated independently of θ.
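The equation image referred to above is missing from the text; the standard form of this posterior (as in Ng's course notes), with the likelihood factoring over the samples because x is independent of θ, is:

```latex
p(\theta \mid S)
  = \frac{p(S \mid \theta)\, p(\theta)}{p(S)}
  = \frac{\left( \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta) \right) p(\theta)}
         {\int_{\theta} \left( \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta) \right) p(\theta)\, d\theta}
```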
(2) Using the posterior distribution p(θ|S) from step (1), compute the predictive probability for a new sample point:
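The standard form of this predictive distribution (reconstructed here, since the original equation image is missing) is:

```latex
p(y \mid x, S) = \int_{\theta} p(y \mid x, \theta)\, p(\theta \mid S)\, d\theta
```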
(3) MAP
The posterior distribution p(θ|S) is hard to compute exactly, so instead we take the θ that maximizes p(θ|S) and treat it as a fixed value. This method is called maximum a posteriori (MAP) estimation; the formula is as follows:
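The MAP estimate described above has the standard form (reconstructed, since the original formula image is missing):

```latex
\theta_{\mathrm{MAP}} = \arg\max_{\theta} \left( \prod_{i=1}^{m} p(y^{(i)} \mid x^{(i)}, \theta) \right) p(\theta)
```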
So MAP is essentially a simplification of the fully Bayesian approach.
Reference blog: http://blog.csdn.net/qrlhl/article/details/48135873
8. Points still to be improved:
(1) The relationship between bias and training error, and between variance and confidence risk, is not explained.
(2) How the sample size affects bias, variance, training error, and confidence risk is not described.