L1, L2 norms and sparsity constraints
Suppose the objective function to be minimized is:

E(x) = f(x) + R(x)

where f(x) is a loss function that measures the model's training error and is assumed to be convex, and R(x) is a regularization term that constrains the model according to a prior distribution on its parameters. R(x) is typically an L1 norm constraint (corresponding to a Laplace prior on the parameters) or an L2 norm constraint (corresponding to a Gaussian prior); other constraints are generally built by combining the two.
The L1 norm constraint is generally:

R(x) = λ ||x||_1 = λ Σ_i |x_i|

The L2 norm constraint is generally:

R(x) = λ ||x||_2^2 = λ Σ_i x_i^2
The L1 norm tends to produce sparse solutions and therefore has a built-in feature selection effect, which is useful in high-dimensional feature spaces; the L2 norm is mainly used to prevent overfitting.
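A minimal sketch of this difference, assuming the simplest possible setting (an identity design matrix, so both regularized least-squares problems have closed-form solutions; the vector b and the value of λ are made up for illustration):

```python
import numpy as np

# For min_x 0.5*||x - b||^2 + R(x) the solutions are closed-form:
#   L1 penalty (lam*||x||_1):  x_i = sign(b_i) * max(|b_i| - lam, 0)   (soft-thresholding)
#   L2 penalty (0.5*lam*||x||_2^2):  x_i = b_i / (1 + lam)             (uniform shrinkage)
b = np.array([3.0, 0.5, -0.2, 2.0, 0.05])
lam = 0.6

x_l1 = np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)
x_l2 = b / (1.0 + lam)

print(x_l1)  # small coefficients are driven exactly to zero -> sparse solution
print(x_l2)  # every coefficient shrinks, but none becomes exactly zero
```

Soft-thresholding zeroes out every coefficient whose magnitude is below λ, which is precisely the feature selection effect described above; L2 shrinkage never produces exact zeros.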
Sparsity constraints
In the paper "Non-negative Matrix Factorization with Sparseness Constraints", the L1 and L2 norms are combined into a new constraint: the sparseness of a vector is defined through the ratio of its L1 norm to its L2 norm:

sparseness(x) = (√n − (Σ_i |x_i|) / √(Σ_i x_i^2)) / (√n − 1)
When x has exactly one nonzero element, the sparseness is 1; when all elements are equal (and nonzero), the sparseness is 0. Here n is the dimension of x. (The original post included a figure illustrating vectors of different sparseness.)