Overfitting and regularization

Last Update:2018-07-26 Source: Internet

Author: User

Overfitting and underfitting

The central challenge in machine learning is that an algorithm must perform well on new, previously unseen inputs, not just on the inputs in the training set. The ability to perform well on previously unobserved inputs is called generalization.

Normally, when we train a machine learning model, we have a training set on which we can compute an error measure called the training error, and our goal is to reduce it. So far, this is simply an optimization problem. What separates machine learning from optimization is that we also want the generalization error (also called the test error), that is, the expected error on a new input, to be low.

Two factors determine how well a machine learning algorithm performs: its ability to make the training error small, and its ability to make the gap between the training error and the test error small.

These two factors correspond to the two central challenges of machine learning: underfitting and overfitting. Underfitting occurs when the model cannot reach a sufficiently low error on the training set. Overfitting occurs when the gap between the training error and the test error is too large.

By adjusting the capacity of a model, we can control whether it is more likely to overfit or to underfit. Informally, a model's capacity is its ability to fit a wide variety of functions. A low-capacity model may struggle to fit the training set. A high-capacity model may overfit by memorizing properties of the training set that do not carry over to the test set. A model with insufficient capacity cannot solve complex tasks; a high-capacity model can, but it is likely to overfit when its capacity exceeds what the task requires.

One way to control the capacity of a learning algorithm is to choose its hypothesis space, the set of functions that the learning algorithm is allowed to select as its solution. For example, the linear regression algorithm has the set of all linear functions of its input as its hypothesis space. We can generalize linear regression so that its hypothesis space includes polynomials, not just linear functions. Doing so increases the capacity of the model.
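
The effect of enlarging the hypothesis space can be sketched in a few lines of NumPy. This is a minimal illustration, not a method taken from the article: the sine-shaped target, the noise level, the sample size, and the helper names (design_matrix, fit_least_squares, mse) are all assumptions chosen for the example. Because every linear function is also a cubic polynomial, the degree-3 least-squares fit can never have a higher training error than the degree-1 fit.

```python
import numpy as np

def design_matrix(x, degree):
    """Columns 1, x, x^2, ..., x^degree for a polynomial of the given degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_least_squares(x, y, degree):
    """Weights that minimize the training MSE within the chosen hypothesis space."""
    w, *_ = np.linalg.lstsq(design_matrix(x, degree), y, rcond=None)
    return w

def mse(x, y, w):
    """Mean squared error of the polynomial with weights w on the points (x, y)."""
    return float(np.mean((design_matrix(x, len(w) - 1) @ w - y) ** 2))

# Hypothetical training data: a noisy sine curve (assumption for illustration).
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, 30)
y_train = np.sin(3.0 * x_train) + 0.1 * rng.normal(size=30)

w_linear = fit_least_squares(x_train, y_train, degree=1)  # linear hypothesis space
w_cubic = fit_least_squares(x_train, y_train, degree=3)   # larger hypothesis space

train_mse_linear = mse(x_train, y_train, w_linear)
train_mse_cubic = mse(x_train, y_train, w_cubic)
print(f"degree 1 training MSE: {train_mse_linear:.4f}")
print(f"degree 3 training MSE: {train_mse_cubic:.4f}")
```

The model stays linear in its weights; only the features change, which is why ordinary least squares still applies.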

The model specifies the family of functions from which the learning algorithm can choose when it adjusts the parameters to reduce the training objective. This is called the representational capacity of the model. In many cases, finding the best function within this family is a very difficult optimization problem. In practice, the learning algorithm does not actually find the best function, only one that significantly reduces the training error. Additional limitations such as an imperfect optimization algorithm mean that the effective capacity of the learning algorithm may be less than the representational capacity of the model family.

Many early scholars invoked a principle of parsimony that is now most widely known as Occam's razor (c. 1287-1347). The principle states that, among competing hypotheses that explain the known observations equally well, we should choose the "simplest" one.

We must remember that although simpler functions are more likely to generalize (to have a small gap between training error and test error), we still need a sufficiently complex hypothesis to achieve a low training error. Typically, the training error decreases as model capacity increases, until it asymptotically approaches its minimum possible value (assuming the error measure has a minimum value). The generalization error is typically a U-shaped function of model capacity, as shown in the following illustration:
[Illustration: generalization error as a U-shaped function of model capacity]
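
The relationship between capacity and the two errors can be explored with a small experiment. The sketch below is only an illustration under stated assumptions: the noisy sine target, the sample sizes, and the use of polynomial degree as a stand-in for capacity are all choices made for this example. The training error cannot increase with degree because the hypothesis spaces are nested; the test error usually falls and then rises, but the exact shape depends on the data and the random seed.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable

def sample(n):
    """Draw n points from a hypothetical noisy sine target (assumption)."""
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + 0.2 * rng.normal(size=n)
    return x, y

x_train, y_train = sample(15)    # small training set
x_test, y_test = sample(200)     # held-out data estimates the generalization error

def errors_for_degree(degree):
    """Fit a polynomial of the given degree and return (training MSE, test MSE)."""
    X_train = np.vander(x_train, degree + 1, increasing=True)
    X_test = np.vander(x_test, degree + 1, increasing=True)
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_err = float(np.mean((X_train @ w - y_train) ** 2))
    test_err = float(np.mean((X_test @ w - y_test) ** 2))
    return train_err, test_err

# Sweep capacity from a constant function (degree 0) up to degree 10.
results = {degree: errors_for_degree(degree) for degree in range(11)}
for degree, (train_err, test_err) in results.items():
    print(f"degree {degree:2d}  train MSE {train_err:.4f}  test MSE {test_err:.4f}")
```

Comparing the two columns shows where the gap between training and test error starts to widen, which is the overfitting regime described above.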

Regularization

Regularization is defined as "any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error." To regularize a model that learns a function $f(x; \theta)$, we can add a penalty term, called a regularizer, to the cost function so as to discourage overfitting.
Common regularization methods:

Parameter norm penalties
    L2 parameter regularization: $\Omega(\theta) = \frac{1}{2}\|w\|_2^2$
    L1 parameter regularization: $\Omega(\theta) = \|w\|_1 = \sum_i |w_i|$
Norm penalties as constrained optimization (generalized Lagrange function, KKT conditions)
Noise robustness
Sparse representations
Bagging
Dropout
Regularization and under-constrained problems
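
As a rough sketch of how the L2 and L1 parameter norm penalties enter the cost function, the code below computes both penalties and adds one of them to a mean squared error. The weighting factor alpha, the choice of mean squared error as the unregularized cost, and the function names are assumptions made for this illustration; the article only states that the penalty is added to the cost function.

```python
import numpy as np

def l2_penalty(w):
    """L2 penalty: one half of the squared Euclidean norm of the weights."""
    return 0.5 * float(np.sum(w ** 2))

def l1_penalty(w):
    """L1 penalty: the sum of the absolute values of the weights."""
    return float(np.sum(np.abs(w)))

def regularized_cost(X, y, w, alpha, penalty):
    """Mean squared error plus alpha times the chosen penalty (assumed form)."""
    data_cost = float(np.mean((X @ w - y) ** 2))
    return data_cost + alpha * penalty(w)

w = np.array([3.0, -4.0, 0.0])
print(l2_penalty(w))  # 0.5 * (9 + 16 + 0) = 12.5
print(l1_penalty(w))  # 3 + 4 + 0 = 7.0
```

A larger alpha makes the optimizer favor small weights over a close fit to the training data; alpha = 0 recovers the unregularized cost.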

In some cases, regularization is necessary for a machine learning problem to be properly defined. Many linear models in machine learning, including linear regression and PCA, depend on inverting the matrix $X^\top X$. This is not possible when $X^\top X$ is singular. The matrix is singular when the data-generating distribution has no variance in some direction, or when no variance is observed in some direction because there are fewer examples (rows of $X$) than input features (columns of $X$). In this case, many forms of regularization correspond to inverting $X^\top X + \alpha I$ instead. For $\alpha > 0$, this regularized matrix is guaranteed to be invertible.
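
The singular case is easy to reproduce numerically. The following sketch assumes a hypothetical random dataset with fewer examples than features and an illustrative value of alpha; it checks that $X^\top X$ is rank-deficient and that adding $\alpha I$ makes the linear system solvable. The closed-form expression used for the weights is the standard L2-regularized (ridge) least-squares solution, which is named here as an assumption because the article does not spell it out.

```python
import numpy as np

# Hypothetical data: 3 examples, 5 features, so X^T X (5 x 5) has rank at most 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
y = rng.normal(size=3)

gram = X.T @ X
print("rank of X^T X:", np.linalg.matrix_rank(gram))  # less than 5: singular

alpha = 0.1  # regularization strength (illustrative value)
regularized = gram + alpha * np.eye(5)
print("rank of X^T X + alpha*I:", np.linalg.matrix_rank(regularized))  # full rank

# Ridge solution w = (X^T X + alpha I)^{-1} X^T y, computed with a linear solve
# instead of an explicit matrix inverse.
w_ridge = np.linalg.solve(regularized, X.T @ y)
print("ridge weights:", w_ridge)
```

Every eigenvalue of $X^\top X$ is non-negative, so adding $\alpha I$ shifts them all to at least $\alpha$, which is why the regularized matrix is invertible.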

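For example, consider an underdetermined system of linear equations. Recall that one definition of the pseudoinverse $X^+$ of a matrix $X$ is:

$$X^+ = \lim_{\alpha \searrow 0} (X^\top X + \alpha I)^{-1} X^\top$$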
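
One standard definition of the Moore-Penrose pseudoinverse is the limit of $(X^\top X + \alpha I)^{-1} X^\top$ as $\alpha$ shrinks to zero from above. The sketch below checks this numerically against NumPy's np.linalg.pinv on a small hand-picked matrix; the matrix, the right-hand side, and the helper name regularized_inverse are assumptions made for the example.

```python
import numpy as np

# Underdetermined system: 2 equations, 3 unknowns (hypothetical values).
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
y = np.array([1.0, 2.0])

def regularized_inverse(X, alpha):
    """Compute (X^T X + alpha I)^{-1} X^T with a linear solve."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T)

pinv = np.linalg.pinv(X)  # reference pseudoinverse computed via SVD

# The largest entrywise difference should shrink as alpha approaches zero.
alphas = [1e-1, 1e-3, 1e-6]
gaps = [float(np.max(np.abs(regularized_inverse(X, a) - pinv))) for a in alphas]
for a, gap in zip(alphas, gaps):
    print(f"alpha = {a:g}  max difference from pinv = {gap:.2e}")

# The pseudoinverse picks one exact solution of the underdetermined system.
w = pinv @ y
print("X @ w reproduces y:", np.allclose(X @ w, y))
```

Among the infinitely many solutions of such a system, the pseudoinverse returns the one with the smallest Euclidean norm, which matches the behavior of an L2 penalty with a vanishing weight.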

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
