I recently read about group sparsity and group structure. This post mainly summarizes the group sparsity results from one application of NMF. The paper itself has not been cited much, but it has had some influence on the use of group NMF for EEG; see the references for details. In general, group sparsity, or plain sparsity, is a good way to explain something physically meaningful: we usually assume that the basis (the features) of one kind of thing differs from the basis of another kind, so groups can be formed either by sample or by feature. By now almost everything has a group variant, from group lasso to group PCA to the group NMF in this paper. Still, the citation count is low, and I don't know whether the idea is simply under-exposed or just not that effective in practice?! I can't help complaining about my current research environment: if you work on something that isn't fashionable, no papers come of it... Enough rambling, let's look at group NMF!
Understanding Group NMF
Rachel Zhang
1. Brief Introduction
1. Why use group sparsity?
Observation: features or data items within a group are expected to share the same sparsity pattern in their latent factor representation.
In other words, group sparsity means that data items or features belonging to the same group share a common sparsity pattern in their low-rank representation.
2. Difference from NMF
As a variant of traditional NMF, group NMF adds group-sparsity regularization to the NMF formulation.
3. Applications
Dimension reduction, noise removal in text mining, bioinformatics, blind source separation, computer vision. Group NMF enables natural interpretation of the discovered latent factors.
4. What is a group?
Different types of features; for example, in computer vision: pixel values, gradient features, 3D pose features, etc.
2. Related Work
5. Related work on group sparsity
5.1. Lasso (the least absolute shrinkage and selection operator)
L1-norm penalized linear regression:
$$\min_{\beta}\ \tfrac{1}{2}\,\|y - A\beta\|_2^2 + \lambda\,\|\beta\|_1 \qquad (1)$$
with response $y$, design matrix $A$, and coefficient vector $\beta$.
5.2. Group lasso
Group sparsity via L1,2-norm regularization:
$$\min_{\beta}\ \tfrac{1}{2}\,\|y - A\beta\|_2^2 + \lambda \sum_{l=1}^{L} \sqrt{p_l}\,\|\beta_l\|_2 \qquad (2)$$
where $\beta_l$ is the sub-vector of coefficients in group $l$, $p_l$ is the size of group $l$, and the $\sqrt{p_l}$ terms account for the varying group sizes.
5.3. Sparse group lasso
$$\min_{\beta}\ \tfrac{1}{2}\,\|y - A\beta\|_2^2 + \lambda_1 \sum_{l=1}^{L} \sqrt{p_l}\,\|\beta_l\|_2 + \lambda_2\,\|\beta\|_1 \qquad (3)$$
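Not from the paper, just to make the three penalties concrete: a small numpy sketch (the coefficient vector, the group partition, and the unit regularization weights are all made up for illustration) that evaluates the regularization terms of (1)-(3) on a toy β.

```python
import numpy as np

# Toy coefficient vector partitioned into 3 groups of sizes p_l = 3, 2, 4
# (values and grouping are made up for illustration).
beta = np.array([0.5, -0.2, 0.0,          # group 1
                 0.0, 0.0,                # group 2: entirely zero
                 1.0, 0.0, -0.3, 0.1])    # group 3
groups = [slice(0, 3), slice(3, 5), slice(5, 9)]

lasso = np.abs(beta).sum()                                     # penalty in (1)
group_lasso = sum(np.sqrt(g.stop - g.start) *
                  np.linalg.norm(beta[g], 2) for g in groups)  # penalty in (2)
sparse_group_lasso = group_lasso + lasso                       # penalty in (3), lambda_1 = lambda_2 = 1

print(lasso, group_lasso, sparse_group_lasso)
```

Note that group 2 contributes nothing to the group-lasso term: an entire group of coefficients can be switched off at once, which is exactly the "group sparsity" the L1,2 penalty is after.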
5.4. Hierarchical regularization with tree structure (2010)
R. Jenatton, J. Mairal, G. Obozinski, and F. Bach. "Proximal methods for sparse hierarchical dictionary learning." ICML 2010.
5.5. There are also other works that focus on group sparsity for PCA.
6. Related work on NMF
6.1 Affine NMF: extends NMF with an offset vector, so that a baseline shared by all data items is factored out while the data are being factorized.
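If I remember the affine NMF model correctly, the offset enters as a rank-one term (notation mine):

$$X \approx W H + w_0\,\mathbf{1}^{\top},$$

where the offset vector $w_0$ is shared by all data items.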
3. Problem to Solve
Consider a matrix X ∈ R^{m×n}. Assume that the rows of X represent features and the columns of X represent data items.
7. In standard NMF, we are interested in discovering two low-rank nonnegative factor matrices W ∈ R^{m×k} and H ∈ R^{k×n} by minimizing an objective function of the form
$$\min_{W,\,H}\ \|X - WH\|_F^2 \quad \text{subject to } W \ge 0,\ H \ge 0.$$
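As a quick illustration of this standard objective (not the paper's algorithm), here is a sketch using scikit-learn's NMF as an off-the-shelf solver; the matrix sizes and rank below are arbitrary.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((100, 40))            # nonnegative data: 100 features x 40 items

model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)           # 100 x 5 basis matrix, W >= 0
H = model.components_                # 5 x 40 coefficient matrix, H >= 0

print(np.linalg.norm(X - W @ H, "fro"))   # reconstruction error ||X - W H||_F
```

The paper's own optimization schemes for the group-regularized version are discussed in Section 4 below.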
8. Group Structure and group sparsity
In the figure (not reproduced here): (a) groups are formed over data items (samples); for the shared basis W, the coefficients in H belonging to one group share the same sparsity pattern. (b) Groups are formed over features; the group sparsity then shows up in the structure of the latent component matrices.
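In symbols (my own shorthand for the two cases, assuming B groups):

$$X = \big[\,X^{(1)}\ \ X^{(2)}\ \cdots\ X^{(B)}\,\big],\quad X^{(b)} \approx W\,H^{(b)} \quad \text{(groups over data items, shared } W\text{)}$$
$$X = \big[\,X^{(1)};\ X^{(2)};\ \cdots;\ X^{(B)}\,\big],\quad X^{(b)} \approx W^{(b)}\,H \quad \text{(groups over features, shared } H\text{)}$$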
9. In group NMF, the objective function adds a mixed-norm penalty on each group's coefficient matrix. Written for groups of data items, and omitting the exact regularization weighting used in the paper, it is roughly
$$\min_{W \ge 0,\; H^{(b)} \ge 0}\ \sum_{b=1}^{B}\Big(\|X^{(b)} - W H^{(b)}\|_F^2 + \beta\,\|H^{(b)}\|_{1,q}\Big),$$
where the L1,q-norm of a matrix is the sum of the vector ℓq-norms of its rows:
$$\|Y\|_{1,q} = \sum_{i} \|y_{i\cdot}\|_q .$$
The $\|Y\|_{1,q}$ penalty therefore encourages as many rows of Y as possible to be exactly zero.
Here there are B groups, each with its own X^{(b)} and H^{(b)}, so the objective pushes each H^{(b)} to have as many zero rows as possible, which is exactly the group sparsity we want.
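A small numpy sketch of this mixed norm (the matrix H^(b) below is made up): rows that are entirely zero contribute nothing to the L1,q-norm, which is why shrinking the penalty drives whole rows of H^(b) to zero.

```python
import numpy as np

def l1q_norm(Y, q=2):
    """||Y||_{1,q}: sum of the l_q norms of the rows of Y."""
    return sum(np.linalg.norm(row, q) for row in Y)

# Toy coefficient matrix H^(b): k = 4 latent components x n_b = 5 items in group b.
H_b = np.array([[0.0, 0.0, 0.0, 0.0, 0.0],   # component 1 unused by this group
                [0.4, 0.1, 0.0, 0.3, 0.2],
                [0.0, 0.0, 0.0, 0.0, 0.0],   # component 3 unused by this group
                [0.7, 0.5, 0.6, 0.2, 0.9]])

print(l1q_norm(H_b, q=2))        # only the two nonzero rows contribute
print(l1q_norm(H_b, q=np.inf))   # q = infinity: sum of row-wise max absolute values
```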
4. Optimization Algorithms
With the mixed-norm regularization, the minimization problem becomes more difficult than the standard NMF problem. The authors propose two strategies based on the block coordinate descent (BCD) method from non-linear optimization. In both algorithms, the L1,q-norm term is handled via Fenchel duality.
10. The first method: matrix-block BCD for (5)
In this method, one matrix block is minimized at each step while all the other blocks are kept fixed.
Subproblem (4.3) is solved by nonnegativity-constrained least squares (NNLS).
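Equation (4.3) itself is not reproduced in these notes, but NNLS subproblems of this kind have the generic form min_{h≥0} ||Wh − x||_2. A minimal sketch with scipy.optimize.nnls, using made-up W and X^(b) and solving one group's coefficients column by column:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
W = rng.random((30, 5))              # fixed basis: 30 features x 5 components
X_b = rng.random((30, 8))            # data items belonging to one group

# min_{h >= 0} ||W h - x||_2, solved independently for each column x of X^(b)
H_b = np.column_stack([nnls(W, X_b[:, j])[0] for j in range(X_b.shape[1])])
print(H_b.shape, (H_b >= 0).all())   # (5, 8) True
```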
Now consider the subproblem in (4.4); it can be rewritten as follows.
The first term is differentiable with a continuous gradient, and the second term is convex, so the problem can be handled with convex optimization techniques.
Algorithm 2 (a variant of Nesterov's first-order method) solves (4.7). The main step is to update (4.6) row by row, where each row update can be viewed as a problem of the form (4.8).
The nonnegativity constraint there can be eliminated via (4.9), whose solution is also the global optimum of (4.8).
For the proof, see reference [1]. Problem (4.9) can in turn be solved through (4.11):
where $\|\cdot\|_{q^*}$ is the dual norm of $\|\cdot\|_q$:
when q = 2, $\|\cdot\|_{q^*} = \|\cdot\|_2$; when q = ∞, $\|\cdot\|_{q^*} = \|\cdot\|_1$.
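For q = 2 the dual norm is again the 2-norm, and the resulting unconstrained row-wise problem min_y ½||y − v||_2² + β||y||_2 has a closed-form block soft-thresholding solution. The sketch below illustrates that q = 2 special case only (v and β are made up; this is not the paper's general Algorithm 2):

```python
import numpy as np

def block_soft_threshold(v, beta):
    """argmin_y 0.5 * ||y - v||_2^2 + beta * ||y||_2  (prox of the l2 norm)."""
    norm_v = np.linalg.norm(v, 2)
    if norm_v <= beta:
        return np.zeros_like(v)         # the whole row collapses to zero
    return (1.0 - beta / norm_v) * v    # otherwise shrink the row radially

v = np.array([0.6, -0.2, 0.1])
print(block_soft_threshold(v, beta=0.3))   # shrunk but nonzero
print(block_soft_threshold(v, beta=1.0))   # exactly zero -> a zero row in H^(b)
```

This is the mechanism behind the zero rows of H^(b): whenever a row's norm falls below the regularization weight, the whole row is set to zero, giving the group sparsity described in Section 3.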