The past and present of Sparse Coding (I)


Sparse coding originated in neuroscience. In computer science and machine learning we usually start directly from the sparse coding algorithm, where the goal is to find a set of basis vectors (an overcomplete basis); however, I find its origin quite interesting as well, and knowing the foundation gives more confidence when extending its applications. Researchers and students in philosophy, neuroscience, computer science, and machine learning all want to understand how the human cerebral cortex processes external signals, that is, what the brain's "impression" of the outside world is. To attack this problem, philosophers and neuroscientists relied on instrument-based observation, while computer scientists and machine learning researchers relied on theoretical derivation and experimental simulation. In the field of neural coding and neural computation, the earliest literature on sparse coding that I can find dates to 1996 (leaving aside the earlier experimental observations and hypotheses of life scientists). In 1996, Bruno Olshausen, then in the Department of Psychology at Cornell University, published a paper in Nature titled "Emergence of simple-cell receptive field properties by learning a sparse code for natural images" [1].

The general idea is that the receptive fields of simple cells in the mammalian primary visual cortex exhibit spatial locality, orientation selectivity, and band-pass behavior (they are selective for different structures at different scales), much like the basis functions of a wavelet transform. At the time, attempts to explain these properties mainly tried to derive the characteristics of these visual cells from the statistical structure of natural images, but most were unsuccessful. Bruno then showed in this paper that the above cell properties could be reproduced by maximizing the sparseness of the code, and sparse coding took off from there. Let us look at the core idea of the paper. It rests on a basic assumption: an image is formed by a linear combination of some basis functions, as shown in Formula 1:


(Formula 1)
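Based on the description below and on Olshausen and Field's paper, Formula 1 is presumably the linear generative model, with I(x, y) the image, \phi_i(x, y) the basis functions, and a_i the coefficients:

I(x, y) = \sum_i a_i \, \phi_i(x, y)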

\phi(x, y) is a basis function and a (alpha) is its coefficient, which changes from image to image. Efficient coding aims to find a complete set of basis functions that spans the image space, while requiring the coefficients to be as independent of each other as possible; this independence is what exposes the essential structure of the signal. The natural first thought at the time was PCA, because PCA can find the principal axes of certain statistical structures (something like coordinate axes) to serve as basis functions. But PCA is too sensitive to noise: it is only effective for clean data with roughly Gaussian-like distributions, where such axes can be found, and fails for data with more complex distributions (such as what we would now call manifold-distributed data). Inspired by information theory, the joint entropy of correlated variables is smaller than the sum of their individual entropies (when the coefficients are mutually independent the two are equal, and the difference between them is the mutual information). If the joint entropy of the image is held fixed, one way to reduce the correlation between the variables is therefore to reduce the individual entropies. Following Barlow's work, the author thus looks for a minimum-entropy code (note: I could not track down the original source of Barlow's work; the idea is to make the coefficients independent so as to reduce coding redundancy). On this basis, the author assumes that natural images have a sparse structure, that is, any given image can be represented by only a few descriptors (bases) drawn from a large collection. The author then seeks a low-entropy code by requiring the probability distribution of each coefficient to be unimodal and peaked at 0. The proposed sparse coding scheme is obtained by minimizing Formula 2:

(Formula 2)
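Based on the surrounding text, Formula 2 should be an energy of roughly the following form, balancing a reconstruction cost against a sparseness penalty S on the coefficients (the exact scaling of the coefficients inside S varies by presentation):

E = \sum_{x,y} \Big[ I(x, y) - \sum_i a_i \, \phi_i(x, y) \Big]^2 + \lambda \sum_i S(a_i)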

The first term is the cost of preserving information, as shown in Formula 3:

(Formula 3)
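From the text, the information-preservation cost in Formula 3 should be the squared reconstruction error between the image and its reconstruction from the basis:

\text{cost} = \sum_{x,y} \Big[ I(x, y) - \sum_i a_i \, \phi_i(x, y) \Big]^2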

Of course, if the image reconstructed from the basis functions is identical to the original image I(x, y), this cost is 0 (its minimum).

The second term of Formula 2 is the sparseness penalty. \lambda is a positive constant that balances how much weight is given to the sparsity of the coefficients a, much like the constant C in an SVM. The authors propose three candidate sparseness functions, shown in Figure 1:

(Figure 1)
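If I recall the 1996 paper correctly, the three candidate sparseness functions in Figure 1 are:

S(x) = -e^{-x^2}, \qquad S(x) = \log(1 + x^2), \qquad S(x) = |x|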

We are pleasantly surprised to find the now-ubiquitous L1 regularizer among them, while the other two have since largely fallen out of use. In fact, the Lasso also appeared around that time; people simply had not yet widely realized that L1 regularization induces sparsity. For why L1 regularization induces sparsity, I recommend this post by pluskid of MIT: http://freemind.pluskid.org/machine-learning/sparsity-and-some-basics-of-l1-regularization/
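As a quick illustration of the difference (my own minimal sketch, not from the original post or the paper): for a single coefficient, the L1-penalized problem is solved by soft-thresholding, which sets small values exactly to zero, while an L2 penalty merely shrinks every value. The function names and the threshold 0.6 below are made up for the example.

import numpy as np

def l1_prox(z, lam):
    # Soft-thresholding: minimizer of 0.5*(a - z)**2 + lam*|a|
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def l2_shrink(z, lam):
    # Minimizer of 0.5*(a - z)**2 + 0.5*lam*a**2
    return z / (1.0 + lam)

z = np.array([-2.0, -0.3, 0.1, 0.5, 3.0])
print("L1 soft-threshold:", l1_prox(z, 0.6))   # small entries become exactly 0
print("L2 shrinkage:     ", l2_shrink(z, 0.6)) # everything shrinks, nothing hits 0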

pluskid has strong mathematical skills, and the post demands some from the reader as well. Back to today's topic: we need to minimize Formula 2. With the basis fixed, the only free variables are the coefficients a, so we take the derivative with respect to them and update a iteratively by gradient descent. Once a has been updated, we then update the basis functions in turn. The two alternating steps are shown in Figure 2 (a small sketch of this alternating scheme follows the figure):

(Figure 2)
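Here is a minimal sketch of that alternating scheme. It is my own simplification, not code from the original paper or post: it uses an L1 penalty in place of S(·), treats images as flattened 8x8 patches, and all sizes, learning rates, and function names are assumptions for illustration only.

import numpy as np

def infer_coefficients(I, Phi, lam=0.1, lr=0.01, n_steps=200):
    # Minimize ||I - Phi @ a||^2 + lam * sum(|a|) over a by (sub)gradient descent.
    a = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        residual = I - Phi @ a
        grad = -2.0 * Phi.T @ residual + lam * np.sign(a)
        a -= lr * grad
    return a

def update_basis(images, codes, Phi, lr=0.01):
    # One gradient step on the basis functions, then renormalize each column.
    for I, a in zip(images, codes):
        residual = I - Phi @ a
        Phi += lr * np.outer(residual, a)
    return Phi / np.linalg.norm(Phi, axis=0, keepdims=True)

rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 128))                    # 8x8 patches, 128 bases (overcomplete)
Phi /= np.linalg.norm(Phi, axis=0, keepdims=True)
images = [rng.standard_normal(64) for _ in range(32)]   # stand-in for natural image patches
for _ in range(10):                                     # outer loop: alternate the two steps
    codes = [infer_coefficients(I, Phi) for I in images]
    Phi = update_basis(images, codes, Phi)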

The resulting basis functions and coefficients are shown in Figure 3:


(Figure 3)

In the figure, (a) shows the learned basis functions, (b) their coefficients, (c) verifies the receptive-field properties, and (d) shows that each coefficient's distribution is unimodal and peaked at 0. These results confirm that the learned code reproduces the receptive-field properties of primary visual cortex cells, preserves the image information, and is sparse. This is where sparse coding begins; later optimized variants and applications all derive from it.

 

References:

[1] Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607-609.

[2] Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23), 3311-3325.

Please indicate the source when reprinting: http://blog.csdn.net/cuoqu/article/details/8980853
