Deep Learning Fundamentals: Methods for Preventing Overfitting


Original address: "The sky of a bird" blog, http://blog.csdn.net/heyongluoyao8/article/details/49429629 (methods for preventing overfitting)

As we all know, when building data mining or machine learning models, statistical learning assumes that the data are independent and identically distributed (i.i.d.), i.e., that the data produced so far can be used to infer and simulate future data. We therefore train on historical data, then use the resulting model to fit future data. In practice, however, the i.i.d. assumption often does not hold: the distribution of the data may change over time (distribution drift), and the current data may not be sufficient to characterize the whole dataset. It is therefore usually necessary to prevent the model from overfitting and to improve its generalization ability. The most common way to achieve this is regularization, which adds a regularization term to the model's objective function or cost function.
When training a model, the training data may be insufficient, i.e., it cannot estimate the distribution of the whole dataset, or the model may be overtrained (overtraining); both often lead to overfitting. As shown in the following illustration:

As the figure above shows, as training proceeds the complexity of the model increases and its error on the training dataset decreases steadily, but once the complexity passes a certain point, the model's error on the validation set begins to rise. At that point the model has overfit: its complexity has grown, but it no longer performs well on any dataset other than the training set.
To prevent overfitting, we can use methods such as early stopping, dataset expansion (data augmentation), regularization, dropout, and so on.

Early stopping

Training a model is the process of learning and updating its parameters, and this learning usually relies on iterative methods such as the gradient descent learning algorithm. Early stopping prevents overfitting by truncating the iteration: training stops before the model converges on the training dataset.
Concretely, validation accuracy is computed at the end of each epoch (an epoch is one full pass over all the training data), and training stops when that accuracy no longer improves. The approach is intuitive: once accuracy stops improving, further training is useless and only adds training time. The key question is how to decide that validation accuracy is "no longer improving". A single drop is not enough, because accuracy may fall after one epoch and rise again in the next, so one or two consecutive decreases do not justify stopping. The usual practice is to record the best validation accuracy seen so far during training; when 10 consecutive epochs (or more) fail to beat that best accuracy, accuracy can be considered to have stopped improving, and the iteration can be halted (early stopping). This strategy is known as "No-improvement-in-n", where n is the number of epochs and can be chosen to suit the situation, e.g. 10, 20, 30 ...
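The "No-improvement-in-n" strategy above can be sketched as follows. This is a minimal illustration, not the original post's code: `train_one_epoch` and `evaluate` are hypothetical stand-ins for a real training pass and a validation-set evaluation.

```python
# Minimal early-stopping sketch ("No-improvement-in-n").
# train_one_epoch() and evaluate() are hypothetical callbacks
# standing in for a real training loop and validation pass.

def early_stopping_loop(train_one_epoch, evaluate, patience=10, max_epochs=1000):
    """Stop training once validation accuracy has not improved for `patience` epochs."""
    best_acc = float("-inf")
    epochs_since_best = 0
    for epoch in range(max_epochs):
        train_one_epoch()                  # one full pass over the training data
        acc = evaluate()                   # accuracy on the held-out validation set
        if acc > best_acc:                 # record the best accuracy seen so far
            best_acc = acc
            epochs_since_best = 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:  # no improvement in n epochs: stop
            break
    return best_acc
```

In a real setup you would also save a model checkpoint whenever `best_acc` improves, so that the returned model is the one from the best epoch rather than the last one.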

Dataset Expansion

In the field of data mining, the saying "sometimes more data beats a better model" is popular. Since we use the training data to build a model that will fit future data, we implicitly assume that the training data and the future data are independent and identically distributed. Even though the current training data is only used to estimate and simulate future data, more data usually yields more accurate estimates and simulations. As a result, more data is sometimes worth more than a better model. But conditions are often limited: human and financial resources may be lacking, so more data cannot be collected. In a classification task, for example, the data must be labeled, often manually, so a large amount of data to label leads to inefficiency and possible errors. At such times we often need computational methods and strategies that work on the existing dataset to obtain more data.
Generally speaking, dataset expansion needs to produce more data that meets the requirements, i.e. data that is i.i.d. with the existing data, or approximately so. Common methods include:

  • collecting more data from the original source;
  • replicating the existing data and adding random noise;
  • resampling;
  • estimating the parameters of the data distribution from the current dataset, then using that distribution to generate more data.

Regularization

Regularization means that when the objective function or cost function is optimized, a regularization term is added to it. In general there are L1 regularization and L2 regularization.

L1 Regularization
L1 regularization is based on the L1 norm: the L1 norm of the parameters, i.e. the sum of the absolute values of the weights scaled by λ/n, is added to the objective function:
C = C0 + (λ/n) ∑ |w|
where C0 is the original cost function, n is the number of samples, and λ is the regularization factor, which weighs the regularization term against the C0 term; the sum runs over all weights w. The second term is the L1 regularization term.
When the gradient is computed, the gradient with respect to w becomes:

∂C/∂w = ∂C0/∂w + (λ/n) sgn(w)

where sgn(w) is the sign of w. On top of the ordinary gradient step, the L1 term therefore shrinks each weight toward zero by a constant amount per step, which tends to drive many weights to exactly zero and yields sparse models.
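A single gradient-descent step with the L1 term can be sketched in Python using the standard update w ← w − η ∂C0/∂w − η (λ/n) sgn(w), where η is the learning rate. The values of `grad_c0`, `eta`, `lam`, and `n` below are illustrative, not from the original post:

```python
# Sketch of one gradient-descent step on C = C0 + (lam / n) * sum(|w|).
# grad_c0 is the gradient of the unregularized cost C0 at the current w.

def sgn(w):
    return (w > 0) - (w < 0)   # sign of w; 0 when w == 0

def l1_update(w, grad_c0, eta=0.1, lam=0.5, n=100):
    """One gradient-descent step with the L1 regularization term included."""
    return w - eta * grad_c0 - eta * (lam / n) * sgn(w)

# example: a positive weight is pulled toward zero by the extra L1 term,
# on top of the ordinary gradient step
w_new = l1_update(w=0.8, grad_c0=0.2)
```

Note that the L1 penalty per step is a constant η λ/n regardless of the magnitude of w, which is what distinguishes it from L2 regularization, where the shrinkage is proportional to w.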
