Reprinted article: Norm rules in machine learning (I): L0, L1, and L2 norms. Original: http://blog.csdn.net/zouxy09. Today we talk about two very frequent topics in machine learning: overfitting and regularization. Let's begin by simply getting to know the commonly used L0, L1, L2, and nuclear norm regularizers. Finally, w…
About the content and diagrams on the L1 norm and L2 norm: I felt I had seen them many times, but only after reading this expert's blog, http://blog.csdn.net/zouxy09/article/details/24971995/, did it finally click, so I hurried to write it down while the iron was hot! The difference between the L1 norm and the L2 norm can be seen from two aspects: 1. Descent speed:
Link to the original text: A brief talk on the L0, L1, L2 norms and their applications
A brief talk on the L0, L1, L2 norms and their applications
In mathematical branches such as linear algebra and functional analysis, a norm is a function that assigns a length or size to each vector in a vector space (or to each matrix). For the zero vector, the norm is zero.
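As an illustration (this code is not part of the original article), the L0, L1, and L2 norms of a concrete vector can be computed in a few lines of NumPy:

```python
import numpy as np

v = np.array([0.0, -3.0, 0.0, 4.0])

l0 = np.count_nonzero(v)       # L0 "norm": number of non-zero entries
l1 = np.sum(np.abs(v))         # L1 norm: sum of absolute values
l2 = np.sqrt(np.sum(v ** 2))   # L2 norm: Euclidean length

print(l0, l1, l2)  # 2 7.0 5.0
```

Note that the L0 "norm" is not a true norm in the mathematical sense (it is not homogeneous), which is one reason the L1 norm is often used as its convex surrogate.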
L1 and L2 regularization terms, also called penalty terms, are designed to limit the model's parameters and prevent the model from overfitting; they are added as an extra term after the loss function.
The L1 term is the sum of the absolute values of the model's parameters.
The L2 term is the sum of the squares of the model's parameters.
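These two penalty terms can be sketched directly in NumPy (the parameter values and the regularization strength `lam` below are made up for illustration):

```python
import numpy as np

params = np.array([0.5, -1.5, 2.0])  # flattened model parameters (illustrative)
lam = 0.1                            # regularization strength, a hyper-parameter

l1_penalty = lam * np.sum(np.abs(params))  # L1: lam * sum of |w_i|
l2_penalty = lam * np.sum(params ** 2)     # L2: lam * sum of w_i^2

# Either penalty is added to the data loss:
#   total_loss = data_loss + penalty
```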
What is validation data for? It is used to avoid overfitting. In the course of training, we usually use it to choose hyper-parameters (for example, deciding the epoch for early stopping based on accuracy on validation data, or setting the learning rate based on validation data). So why not just do this on testing data? Because if we did, then as training progresses our network would bit by bit overfit the testing data, and the final testing accuracy would lose its value as a reference. The role of training data, then, is to compute the gradients and update the weights.
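The early-stopping rule described above can be sketched as follows. This is a simplified illustration under my own assumptions (a precomputed list of per-epoch validation accuracies and a hypothetical `patience` parameter), not code from the original article:

```python
def early_stopping_demo(accuracies, patience=2):
    """Return the epoch whose model we would keep, given per-epoch
    validation accuracies; stop once accuracy has not improved for
    `patience` consecutive epochs."""
    best, best_epoch, waited = -1.0, 0, 0
    for epoch, acc in enumerate(accuracies):
        if acc > best:
            best, best_epoch, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation accuracy is degrading: overfitting
    return best_epoch

# Validation accuracy peaks at epoch 2, then degrades (overfitting).
stop = early_stopping_demo([0.70, 0.78, 0.81, 0.80, 0.79])
print(stop)  # 2
```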
Git: https://github.com/linyi0604/machinelearning
Regularization: improves the model's generalization ability on unknown data and avoids parameter overfitting.
Commonly used regularization methods: add a penalty for a parameter to the objective function, reducing that parameter's impact on the result.
L1 regularization (Lasso): the L1-norm penalty on the coefficient vector is added after the objective function of linear regression.
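A minimal sketch of the Lasso objective just described (illustrative code with made-up data, not taken from the linked repository):

```python
import numpy as np

def lasso_objective(X, y, w, alpha):
    """Squared-error data loss plus an L1 penalty on the coefficients."""
    residual = y - X @ w
    return 0.5 * np.mean(residual ** 2) + alpha * np.sum(np.abs(w))

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w_exact = np.array([1.0, 2.0])  # fits this data perfectly

# With alpha = 0 the objective is the pure data fit; with alpha > 0 the
# penalty charges for coefficient magnitude, favoring smaller (and
# eventually zero) coefficients.
print(lasso_objective(X, y, w_exact, alpha=0.0))  # 0.0
print(lasso_objective(X, y, w_exact, alpha=0.1))  # 0.3
```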
The regularization term can take different forms. For example, in a regression problem where the loss function is the squared loss, the regularization term can be the L2 norm of the parameter vector:

L(w) = (1/N) * Σ_{i=1}^{N} (f(x_i; w) − y_i)² + (λ/2) ‖w‖²

Here, ‖w‖ denotes the L2 norm of the parameter vector w.
The regularization term can also be the L1 norm of the parameter vector:

L(w) = (1/N) * Σ_{i=1}^{N} (f(x_i; w) − y_i)² + λ ‖w‖₁

Here, ‖w‖₁ denotes the L1 norm of the parameter vector w.
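The two penalties also behave differently under gradient descent, which relates to the "descent speed" aspect mentioned earlier. A small sketch with made-up numbers (λ = 0.1): the L2 gradient shrinks each weight in proportion to its size, while the L1 subgradient applies a constant-magnitude push toward zero, which is what drives small weights exactly to zero:

```python
import numpy as np

lam = 0.1
w = np.array([0.5, -0.01, 2.0])

# Gradient of (lam/2) * ||w||^2: proportional shrinkage.
grad_l2 = lam * w              # ≈ [0.05, -0.001, 0.2]

# Subgradient of lam * ||w||_1 (away from 0): constant-size push.
subgrad_l1 = lam * np.sign(w)  # ≈ [0.1, -0.1, 0.1]
```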
L1, L2 paradigms and sparsity constraints

Suppose the objective function to be solved is:

E(x) = f(x) + R(x)

where f(x) is the loss function, used to evaluate the model's training loss; it can be any convex function. R(x) is the regularization (constraint) term, used to constrain the model. Based on the assumed probability distribution of the model parameters, R(x) is generally:
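When R(x) is an L1 term, one standard way to minimize E(x) = f(x) + R(x) is via the proximal operator of the L1 norm, which is soft-thresholding; this is the mechanism behind the sparsity that L1 constraints produce. A sketch with made-up numbers (not from the original text):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1: shrink each entry toward zero
    by t, zeroing out anything with magnitude below t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

x = np.array([0.05, -0.3, 1.2, -0.02])
print(soft_threshold(x, 0.1))  # small entries become exactly 0
```

Applied inside an iterative scheme (gradient step on f, then soft-threshold), this yields sparse solutions, which is exactly why L1 regularization performs feature selection while L2 only shrinks weights.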