Regularization and norm regularization in machine learning
This article first defines regularization, then describes its application in machine learning (the L0, L1, and L2 regularization norms and the nuclear norm), and finally discusses how to choose the regularization parameter.
Regularization originates in the study of ill-posed problems in linear algebra. The general approach to an ill-posed problem is to approximate its solution by solving a family of nearby, well-posed problems; this is called a regularization method. How to construct effective regularization methods is a central question in the field of inverse problems. Common regularization methods include Tikhonov regularization, various iterative methods, and other improvements based on variational principles. These methods are effective for solving ill-posed problems, are widely used in inverse-problem research, and have been studied in depth.
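As a minimal illustration of the idea, the sketch below (my own example, not from the article) applies Tikhonov regularization to a nearly singular least-squares problem: the penalty term stabilizes the otherwise ill-conditioned normal equations.

```python
import numpy as np

# A minimal sketch of Tikhonov regularization: stabilize the ill-conditioned
# least-squares problem  min ||Ax - b||^2 + lam * ||x||^2  via its
# closed-form solution  x = (A^T A + lam * I)^{-1} A^T b.

def tikhonov_solve(A, b, lam):
    """Tikhonov-regularized least-squares solution."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# A nearly singular design matrix: the second column is almost
# identical to the first, so the unregularized normal equations
# are extremely ill-conditioned.
A = np.array([[1.0, 1.0],
              [1.0, 1.0000001],
              [1.0, 0.9999999]])
b = np.array([2.0, 2.0, 2.0])

x_reg = tikhonov_solve(A, b, lam=0.1)

print(x_reg)                 # weight is shared evenly across the collinear columns
print(A @ x_reg)             # still fits b closely
```

With `lam=0.1` the penalty spreads the weight evenly over the two nearly collinear columns instead of letting one coefficient blow up against the other.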
In Euclidean space, two spaces are equivalent when a one-to-one mapping can be established between them. Real-world data, however, is often ill-conditioned (for example, user rating data with many missing values), so such a mapping cannot be established directly. The mathematical remedy is to settle for the next best thing, e.g. reducing the dimension or adding constraints on the parameters, so that such a mapping can be established. For the detailed mathematical theory, see:
http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV0405/UMAR/AVassign2.pdf and regularization (mathematics)
In machine learning, regularization arises from training on data. When the number of training samples is small relative to the number of features, overfitting occurs easily. Intuitively, the goal of preventing overfitting is to keep the model from relying too heavily on the features of any one dimension (or a few dimensions). During training (minimizing the cost), if the weight of some dimension grows too large even while the model fits the training data closely, the regularization term makes the overall objective larger. Training therefore avoids giving any one dimension (or a few dimensions) excessive weight, i.e. depending too heavily on those features.
In a sense, regularization introduces a prior distribution over the parameters and narrows the range of parameter choices. In a generalized linear model, this amounts to a trade-off among the basis functions: if a parameter is kept small, the corresponding basis function contributes little to the final result. Put plainly, the model does not depend entirely on the data set but exercises some judgment of its own -------> it avoids overfitting.
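To make the weight-shrinking effect described above concrete, here is a small sketch (my own example; the function names are hypothetical, not from the article) of gradient descent on L2-regularized linear regression. The penalty gradient `2 * lam * w` shrinks every weight at each step, so no single feature's weight can dominate:

```python
import numpy as np

# Gradient descent on ridge-regularized linear regression:
#   min (1/n) ||Xw - y||^2 + lam * ||w||^2
# The extra term 2 * lam * w in the gradient is "weight decay".

def fit_ridge_gd(X, y, lam=1.0, lr=0.01, steps=2000):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n + 2 * lam * w  # data grad + penalty grad
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=100)

w_free = fit_ridge_gd(X, y, lam=0.0)   # unregularized fit
w_reg = fit_ridge_gd(X, y, lam=5.0)    # heavily regularized fit

print(np.linalg.norm(w_free), np.linalg.norm(w_reg))  # regularized norm is smaller
```

Raising `lam` trades training fit for smaller weights, which is exactly the prior-on-parameters view given above.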
For the origin and principles of this idea in machine learning, see Prof. Wu Lide's deep learning course. The key point is that adding a regularization term to the loss function before minimizing is very effective for training the objective function of the learning process. For details, see:
(1) The meaning of regularization and normalization
(2) Stanford ML: regularization
(3) Manifold regularization learning notes
(4) Regularized least squares
(5) Regularization in linear algebra
(6) A concrete understanding of regularization
(7) Understanding regularization in machine learning
(8) Linear regression and regularization
(9) Regularization exercises in machine learning
(10) L1 and L2 regularization
(11) A shallow understanding of regularization and normalization
(12) Coursera public course notes: Stanford University Machine Learning, Lecture 7, "Regularization"
A typical recent application of regularization is the very popular sparse coding. Related links:
http://blog.csdn.net/abcjennifer/article/details/8693342
http://blog.csdn.net/jwh_bupt/article/details/12070273#comments
The principle is simple: through modeling, the feature-representation problem is turned into an optimization problem. Because the original objective is non-convex, it falls into local optima and is hard to solve, so L1 regularization was proposed; it cleverly yields a sparse term, which becomes the basis-function representation of the features. For the underlying principles, see the NIPS'06 sparse coding paper (Nips06-sparsecoding.pdf) referenced in the article; for a blog treatment, see the following link:
Sparsity and Some Basics of L1 Regularization
http://freemind.pluskid.org/machine-learning/sparsity-and-some-basics-of-l1-regularization/
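To see where the sparsity comes from, the following sketch (my own example; ISTA is one standard solver for the L1 problem, not necessarily the algorithm used in the linked paper) runs proximal gradient descent on a lasso objective. The L1 proximal step is soft-thresholding, which sets small coordinates exactly to zero:

```python
import numpy as np

# ISTA (iterative soft-thresholding) for the lasso problem
#   min (1/n) ||Xw - y||^2 + lam * ||w||_1
# The soft-threshold step is what produces exact zeros in w.

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(X, y, lam, lr=0.1, steps=3000):
    n = X.shape[0]
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / n   # gradient of the smooth data term
        w = soft_threshold(w - lr * grad, lr * lam)
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
true_w = np.zeros(10)
true_w[:2] = [2.0, -1.5]                   # only two truly active features
y = X @ true_w + 0.01 * rng.normal(size=50)

w = ista(X, y, lam=0.3)
print(np.count_nonzero(w))                 # most coordinates are exactly zero
```

The recovered `w` keeps the two active coordinates (slightly shrunk by the penalty) and zeros out the rest, which is the sparse basis selection the paragraph above describes.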
The following is a detailed explanation of the L0, L1, and L2 regularization norms commonly used in machine learning. (Adapted from the blog of http://blog.csdn.net/zouxy09; I have integrated and interpreted the relevant material. If this infringes on the blogger's rights, my sincere apologies.)
Specific principles can be found in the linked documents:
L0, L1, and L2 norms in machine learning
L1 norm (sparse coding optimization principle)
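For quick reference, here is a small illustration (my own, assuming the usual definitions) of the three norms for a weight vector w:

```python
import numpy as np

# For w = [0, -3, 0, 4]:
#   L0 "norm": number of nonzero entries (not a true norm, since it is
#              not homogeneous), the quantity L1 regularization approximates
#   L1 norm:   sum of absolute values
#   L2 norm:   Euclidean length, the quantity penalized by ridge/Tikhonov
w = np.array([0.0, -3.0, 0.0, 4.0])

l0 = np.count_nonzero(w)        # -> 2
l1 = np.sum(np.abs(w))          # -> 7.0
l2 = np.sqrt(np.sum(w ** 2))    # -> 5.0

print(l0, l1, l2)
```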
From:
http://blog.csdn.net/xjbzju/article/details/6618064
http://blog.csdn.net/u010555688/article/details/25985013