From Sparse Representation to Low-Rank Representation (IV)
Having settled on a research direction, I have been buried in the theory lately. After reading some recent papers and forming a few ideas, I decided to also summarize this series of articles on representation. Since I am new to the topic, there are bound to be shortcomings; corrections are welcome.
The series of articles from sparse representation to low-rank representation includes the following:
I. Sparse representation
II. NCSR (nonlocally centralized sparse representation)
III. GHP (gradient histogram preservation)
IV. Group sparsity
V. Rank decomposition

IV. Group sparsity
This installment continues the previous article and introduces an improvement on sparse representation: group sparsity.
Group sparsity
Compared with plain sparsity, group sparsity exploits physical meaning: we usually expect the bases (basis vectors, or features) of one kind of object to differ from those of another, so the data can be grouped by sample or by feature. Group-structured methods have by now been applied almost everywhere, from group Lasso to group PCA and group NMF (nonnegative matrix factorization). The discussion below follows Jingu Kim et al., "Group Sparsity in Nonnegative Matrix Factorization"; link:
http://www.cc.gatech.edu/~hpark/papers/GSNMF_SDM12_Final.pdf
1. Why use group sparsity?
An observation: features or data items within a group are expected to share the same sparsity pattern in their latent factor representation.
In other words, the group-sparsity assumption is that data items belonging to the same group, or features belonging to the same group, have similar sparsity patterns in the low-rank representation.
2. Difference from standard NMF
As a variation of traditional NMF, group NMF adds group-sparsity regularization to the NMF objective.
3. Applications
Dimension reduction, noise removal in text mining, bioinformatics, blind source separation, and computer vision. Group NMF enables a natural interpretation of the discovered latent factors.
4. What is a group?
Different types of features, e.g. in computer vision: pixel values, gradient features, 3D pose features, etc. Features of the same type form a group.
5. Related work on group sparsity
5.1. Lasso (the least absolute shrinkage and selection operator)
L1-norm penalized linear regression.
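For reference, the Lasso objective in its usual form:

$$\min_{\beta}\;\tfrac{1}{2}\|y - X\beta\|_2^2 \;+\; \lambda\|\beta\|_1$$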
5.2. Group Lasso
Group sparsity using L1,2-norm regularization.
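In its standard form (the one I believe is being referenced here), with β^(l) the coefficients of group l and p_l the size of group l:

$$\min_{\beta}\;\tfrac{1}{2}\Big\|y - \sum_{l=1}^{L} X^{(l)}\beta^{(l)}\Big\|_2^2 \;+\; \lambda \sum_{l=1}^{L} \sqrt{p_l}\,\big\|\beta^{(l)}\big\|_2$$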
where the sqrt(p_l) factor accounts for the varying group sizes.
5.3. Sparse group Lasso
5.4. Hierarchical regularization with tree structure (2010)
R. Jenatton, J. Mairal, G. Obozinski, and F. Bach, "Proximal Methods for Sparse Hierarchical Dictionary Learning", ICML 2010.
5.5. There are also some other works focusing on group sparsity for PCA.
6. Group NMF
Group NMF incorporates mixed-norm regularization into NMF, based on the L1,q-norm. L1-norm regularization is well known to promote a sparse representation [31]. When this approach is extended to groups of parameters, the L1,q-norm has been shown to induce sparsity at the level of groups.
Affine NMF: extends NMF with an offset vector, so that the common baseline of the data is factored out simultaneously with W and H.
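As I understand affine NMF, the model augments the factorization with a column offset w_0 ∈ R^m that is fitted together with W and H, roughly:

$$X \;\approx\; WH \;+\; w_0\,\mathbf{1}_{1\times n}$$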
Problem to Solve
1) Consider a matrix X ∈ R^{m×n}. Assume that the rows of X represent features and the columns of X represent data items.
2) In standard NMF, we are interested in discovering low-rank factor matrices W and H by minimizing the objective function:
$$\min_{W,\,H}\;\|X - WH\|_F^2 \qquad \text{subject to } W \ge 0,\ H \ge 0 \tag{4}$$
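For readers who want to see what minimizing (4) looks like in code, here is a minimal NumPy sketch of the classic multiplicative update rules from reference [3] (plain NMF only, without the group regularization discussed below); the function name and parameters are mine:

```python
import numpy as np

def nmf_multiplicative(X, k, iters=500, eps=1e-9, seed=0):
    """Standard NMF: minimize ||X - W H||_F^2 with W, H >= 0
    using the Lee-Seung multiplicative update rules (reference [3])."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update keeps H >= 0
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update keeps W >= 0
    return W, H

# Tiny usage example on random nonnegative data.
X = np.random.default_rng(1).random((30, 40))
W, H = nmf_multiplicative(X, k=5)
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # relative reconstruction error
```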
3) Group structure and group sparsity
In the paper's illustration of group structure: (a) grouping by data items, where for a shared basis W the coefficients in H within one group share the same sparsity pattern; (b) grouping by features, where the group sparsity shows up in the structure of the latent component matrices.
As group structure can be found in many other data mining problems, we proceed to discuss how group sparsity can be promoted by employing mixed-norm regularization, as follows.
4) Formulation with mixed-norm regularization
Suppose the columns of X ∈ R^{m×n} are divided into B groups as X = (X^(1), ..., X^(B)). Accordingly, the coefficient matrix is divided into B groups as H = (H^(1), ..., H^(B)), where H^(b) ∈ R^{k×n_b}. In group NMF, formula (4) can then be written as:
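Written out (my reconstruction from the surrounding description), this group-wise version of (4) should look roughly like:

$$\min_{W,\,\{H^{(b)}\}}\;\sum_{b=1}^{B}\big\|X^{(b)} - W H^{(b)}\big\|_F^2 \qquad \text{subject to } W \ge 0,\ H^{(b)} \ge 0$$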
To obtain group sparsity, a mixed-norm regularization term on the coefficient matrices H^(b) is added, using the L1,q-norm, to obtain:
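With the L1,q term on each H^(b) and a Frobenius-norm term on W (explained just below), the regularized objective, referred to later as (5), presumably takes roughly this form (the exact placement of the tradeoff parameters α and β is my assumption):

$$\min_{W \ge 0,\; H^{(b)} \ge 0}\;\sum_{b=1}^{B}\Big(\big\|X^{(b)} - W H^{(b)}\big\|_F^2 \;+\; \alpha\|W\|_F^2 \;+\; \beta\big\|H^{(b)}\big\|_{1,q}\Big) \tag{5}$$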
Here the Frobenius-norm term on W keeps W from growing without bound during optimization; the tradeoff parameters control the strength of each regularization term.
The L1,q-norm of a matrix Y ∈ R^{a×c} is defined by

$$\|Y\|_{1,q} \;=\; \sum_{i=1}^{a}\|y_{i\cdot}\|_q,$$

where y_{i·} denotes the i-th row of Y; the cases discussed here are q = 2 and q = ∞.
That is, the L1,q-norm of a matrix is the sum of the lq-norms of its rows.
So the ||Y||_{1,q} penalty encourages as many zero rows in Y as possible. Here there are B groups, and the X^(b) and H^(b) differ across groups, so the objective function encourages as many zero rows as possible in each H^(b), which is exactly the group sparsity we want.
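As a quick sanity check of the definition, a small NumPy helper (names are mine) that computes ||Y||_{1,q} as the sum of row-wise lq-norms:

```python
import numpy as np

def l1q_norm(Y, q):
    """L1,q-norm of a matrix: sum of the lq-norms of its rows."""
    return np.sum(np.linalg.norm(Y, ord=q, axis=1))

Y = np.array([[0.0, 0.0, 0.0],    # an all-zero row contributes nothing
              [1.0, -2.0, 2.0]])
print(l1q_norm(Y, 2))       # 3.0  (q = 2)
print(l1q_norm(Y, np.inf))  # 2.0  (q = infinity)
```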
5) Block coordinate descent (BCD) method
Because of the mixed-norm regularization, the optimization problem is harder than the standard NMF problem. The block coordinate descent (BCD) method from nonlinear optimization is used, in two variants: the BCD method with matrix blocks and the BCD method with vector blocks.
(4.3) is solved by non-negativity-constrained least squares (NNLS); now consider the problem in (4.4), which can be rewritten as:
The first term is differentiable with a continuous derivative, and the second term is convex, so convex optimization techniques can be applied.
Algorithm 2 (a variant of Nesterov's first-order method) is the solution method for (4.7); the main issue is the row-wise update in (4.6), which can be posed as the following subproblem:
The nonnegativity constraint can be eliminated via (4.9), whose solution is also the global optimum of (4.8).
The proof is given in reference [1], and (4.9) can then be solved via (4.11):
where ||·||_{q*} is the dual norm of ||·||_q:
for q = 2, ||·||_{q*} = ||·||_2;
for q = ∞, ||·||_{q*} = ||·||_1.
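To make the q = 2 case concrete, the row-wise problem min_{h ≥ 0} ½||h − v||² + μ||h||₂ has a well-known closed-form solution: project v onto the nonnegative orthant, then apply block soft-thresholding. This is my sketch of the project-then-shrink idea referenced around (4.8)–(4.11), not the paper's exact algorithm:

```python
import numpy as np

def prox_row_l2_nonneg(v, mu):
    """Solve min_{h >= 0} 0.5*||h - v||^2 + mu*||h||_2.
    Project v onto the nonnegative orthant, then shrink its l2-norm by mu
    (block soft-thresholding); the whole row becomes zero if the norm <= mu."""
    v_plus = np.maximum(v, 0.0)
    nrm = np.linalg.norm(v_plus)
    if nrm <= mu:
        return np.zeros_like(v)
    return (1.0 - mu / nrm) * v_plus

# A row whose (projected) norm is below mu is zeroed out entirely,
# which is exactly the group-sparsity effect on the rows of H^(b).
print(prox_row_l2_nonneg(np.array([0.3, -0.1, 0.2]), mu=0.5))  # -> all zeros
print(prox_row_l2_nonneg(np.array([3.0, -1.0, 4.0]), mu=0.5))  # -> shrunk row
```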
BCD method with vector blocks
That is, a single vector variable is minimized at each step while all other entries are kept fixed.
Recent observations indicate that the vector-block BCD method is also very efficient, often outperforming the matrix-block BCD method. Accordingly, the vector-block BCD method for (5) is developed as follows.
In the vector-block BCD method, optimal solutions are sought for the subproblems with respect to each column of W and each row of H^(1), ..., H^(B).
The solution of (4.14) is given in closed form:
Subproblem (4.15) is easily seen to be equivalent to
which is a special case of (4.8).

Remark on the optimization methods:
Optimization variables in (5): {W, H^(1), ..., H^(B)}
Matrix-block BCD method: divides the variables into (B+1) blocks: W, H^(1), ..., H^(B)
Vector-block BCD method: divides the variables into k(B+1) blocks, represented by the columns of W and the rows of H^(1), ..., H^(B)
Both methods eventually rely on the dual problem (4.11).
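To make the matrix-block alternation concrete, here is a rough, runnable Python sketch under the assumed objective Σ_b ( ||X^(b) − W H^(b)||_F² + α||W||_F² + β||H^(b)||_{1,2} ) with q = 2. It uses plain projected/proximal gradient steps for both blocks instead of the paper's NNLS and Nesterov-accelerated solvers, so it only illustrates the block structure, not Algorithms 1 and 2 themselves:

```python
import numpy as np

def block_shrink_rows(V, mu):
    """Row-wise prox of mu*||.||_{1,2} applied to the nonnegative part of V."""
    Vp = np.maximum(V, 0.0)
    norms = np.linalg.norm(Vp, axis=1, keepdims=True)
    scale = np.maximum(1.0 - mu / np.maximum(norms, 1e-12), 0.0)
    return scale * Vp

def group_nmf_matrix_block(Xs, k, alpha=0.1, beta=0.1, iters=200, seed=0):
    """Matrix-block BCD sketch: alternate the W block and the H^(b) blocks."""
    rng = np.random.default_rng(seed)
    m = Xs[0].shape[0]
    W = rng.random((m, k))
    Hs = [rng.random((k, X.shape[1])) for X in Xs]
    for _ in range(iters):
        # W block: projected gradient step on ||X - W H||_F^2 + alpha*||W||_F^2
        X, H = np.hstack(Xs), np.hstack(Hs)
        HHt = H @ H.T
        L_w = 2.0 * (np.linalg.norm(HHt, 2) + alpha) + 1e-12  # gradient Lipschitz constant
        grad_W = 2.0 * (W @ HHt - X @ H.T) + 2.0 * alpha * W
        W = np.maximum(W - grad_W / L_w, 0.0)
        # H^(b) blocks: one proximal gradient step each (q = 2 mixed norm)
        WtW = W.T @ W
        L_h = 2.0 * np.linalg.norm(WtW, 2) + 1e-12
        for b, Xb in enumerate(Xs):
            grad_H = 2.0 * (WtW @ Hs[b] - W.T @ Xb)
            Hs[b] = block_shrink_rows(Hs[b] - grad_H / L_h, beta / L_h)
    return W, Hs

# Tiny synthetic example with B = 2 groups sharing one basis W.
rng = np.random.default_rng(1)
W_true = rng.random((20, 4))
Xs = [W_true @ rng.random((4, 15)), W_true @ rng.random((4, 10))]
W, Hs = group_nmf_matrix_block(Xs, k=4)
print([np.linalg.norm(X - W @ H) / np.linalg.norm(X) for X, H in zip(Xs, Hs)])
```

In the vector-block variant, the same objective would instead be cycled over the individual columns of W and the rows of each H^(b).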
6) Results
7. References
1. "Multi-label Learning via Structured Decomposition and Group Sparsity", 2011. http://arxiv.org/pdf/1103.0102.pdf
2. NMF code: http://www.csie.ntu.edu.tw/~cjlin/nmf/
3. "Algorithms for Non-negative Matrix Factorization": http://hebb.mit.edu/people/seung/papers/nmfconverge.pdf
4. NMF toolbox: http://cogsys.imm.dtu.dk/toolbox/nmf/
5. "Group Nonnegative Matrix Factorization for EEG Classification"
To be continued. For more, please follow http://blog.csdn.net/tiandijun. Feedback and discussion are welcome!