Denoising Autoencoder (noise-reduction autoencoder)

Source: Internet
Author: User
Tags: theano

Denoising autoencoder: origins in PCA and feature extraction

With the emergence of high-dimensional data such as images and speech, traditional statistical machine-learning methods have run into unprecedented challenges.

The dimensionality is too high, the data are monotonous, and noise is everywhere, so the traditional "numbers game" approach has a hard time working. Data mining? Nothing useful gets dug out.

To cope with the high dimensionality, linear dimensionality-reduction methods such as PCA appeared. PCA's mathematical theory is impeccable, but it only works well on linearly structured data.

So the search for simple, automatic, intelligent feature-extraction methods remains a focus of machine learning. LeCun, for example, sketched the basic architecture of future machine-learning models in his 1998 CNN summary paper.

CNNs are, of course, one answer: convolution and downsampling extract good features from signal-like data. But what about general, non-signal data?

Part I: The autoencoder (Autoencoder)

The autoencoder works as follows: the original input (call it x) is weighted by (W, b) and passed through a sigmoid mapping to get y; then y is mapped back through a reverse weighting (W', b') to a reconstruction z.

By iteratively training the two sets of parameters (W, b) and (W', b'), the error function is minimized, that is, z is kept as close to x as possible, so that x is reconstructed as faithfully as possible.

If that succeeds, we can say the first set of weights (W, b) has learned the key features of the input well; otherwise the reconstruction would not be so faithful. The structure is sketched below:
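As a rough illustration (a toy sketch, not code from the article or the referenced tutorial), here is how one forward pass and its reconstruction error might look in plain numpy; the sizes and names (n_visible, n_hidden, W_prime, b_prime) are made up for the example:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.RandomState(0)
n_visible, n_hidden = 784, 500              # e.g. an MNIST-sized input, 500 hidden units
x = rng.rand(10, n_visible)                 # a toy batch of 10 inputs with values in [0, 1]

# encoder parameters (W, b) and decoder parameters (W', b')
W = rng.uniform(-0.1, 0.1, size=(n_visible, n_hidden))
b = np.zeros(n_hidden)
W_prime = rng.uniform(-0.1, 0.1, size=(n_hidden, n_visible))
b_prime = np.zeros(n_visible)

y = sigmoid(x @ W + b)                      # code:           y = s(W x + b)
z = sigmoid(y @ W_prime + b_prime)          # reconstruction: z = s(W' y + b')

# squared reconstruction error, the quantity training would drive down
loss = np.mean(np.sum((z - x) ** 2, axis=1))
print(loss)
```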

From the perspective of a biological brain, learning and reconstruction can be understood as encoding and decoding.

This process is interesting. First, it does not use the data's labels to compute the error and update the parameters, so it is unsupervised learning.

Second, it extracts the samples' features in a simple, brute-force way, through a two-layer (encode/decode) structure much like a neural network's hidden layers.

This two-layer structure is debatable. The original encoder did use two sets of parameters (W, b) and (W', b'), but in a 2010 paper Vincent found that a single set W is enough.

That is, W' = W^T; W and W' are called tied weights. Experiments show that W' really is just along for the ride, and there is no need to train it separately.

A matrix that reverses the reconstruction brings the inverse matrix to mind: if W^{-1} = W^T, then W is orthogonal. In other words, W can be trained to approximate an orthogonal matrix.

Since W' is just along for the ride, it is discarded after training. Forward propagation uses W alone, which amounts to pre-encoding the input before feeding it to the next layer. That is why it is called an autoencoder rather than an auto-codec.
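A minimal sketch of the tied-weight version (again just an illustration, with assumed names): the decoder simply reuses the transpose of W, so only W and the two bias vectors are trained.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def reconstruct_tied(x, W, b, b_prime):
    """Encode with W, decode with W.T (tied weights): no separate W' is learned."""
    y = sigmoid(x @ W + b)           # code
    z = sigmoid(y @ W.T + b_prime)   # reconstruction through the transpose
    return z
```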

Part II: The denoising autoencoder (denoising Autoencoder)

In a 2008 paper, Vincent proposed the dA, an improved version of the autoencoder. It is recommended to read that paper first.

The paper is titled "Extracting and Composing Robust Features with Denoising Autoencoders", that is: extract and compose robust features.

How do we make the features robust? By erasing the original input with a certain probability distribution (usually a binomial distribution), i.e. randomly setting some entries to 0, so that part of the data's features go missing.

From the corrupted input x', compute y and then z, and compare z against the original, uncorrupted x during the iterations, so the network learns from corrupted data.
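A minimal numpy sketch of this masking corruption (an illustration with assumed names, not the paper's code): each entry is kept with probability 1 - corruption_level and set to 0 otherwise, and the reconstruction is still compared against the clean x.

```python
import numpy as np

rng = np.random.RandomState(0)

def corrupt(x, corruption_level=0.3):
    """Masking noise: zero each entry independently with probability corruption_level."""
    mask = rng.binomial(n=1, p=1.0 - corruption_level, size=x.shape)
    return x * mask

x = rng.rand(10, 784)      # toy clean batch
x_tilde = corrupt(x)       # corrupted input x' that is fed to the encoder
# y = encode(x_tilde); z = decode(y); the loss compares z with the *clean* x
```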

This corrupted data is useful for two reasons:

First, compared with training on uncorrupted data, the weights learned from corrupted data carry less noise, hence the name "denoising".

The reason is not hard to see: when entries are erased, some of the noise in x happens to be erased along with them.

Second, the corrupted data narrows, to some extent, the gap between training data and test data. Since part of x has been dropped, the corrupted data ends up somewhat closer to the test data. (Training and test sets necessarily have things in common, but we also want them to differ.)

The robustness of the weights trained this way is therefore improved.

The key question: is it really scientific to erase the raw input in such a haphazard way? Is that really okay? Vincent also offers an explanation from the perspective of human cognition:

The paper says that humans can recognize partially damaged images thanks to our higher-level associative memory.

We remember things through many channels (images, sounds, even rote memorization), so even if part of the data is lost, we can still recall it.

In addition, from the feature-extraction point of view, there is an explanation in terms of manifold learning:

Corrupting the data acts like a simplified PCA: it gives the features a crude, dimensionality-reducing pre-extraction.

Part III: Unusual uses of the autoencoder

An autoencoder amounts to creating a hidden layer. A simple idea is to put it at the front of a deep network as a primary filter on the raw signal, providing dimensionality reduction and feature extraction.

For the basic use of autoencoders in place of PCA, see http://www.360doc.com/content/15/0324/08/20625606_457576675.shtml

Of course, in a 2007 paper Bengio, imitating the DBN's use of RBMs, used the autoencoder to initialize the parameters of each layer in a deep network, instead of small random values.

That becomes the stacked autoencoder.
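A rough sketch of the stacking idea under the same toy assumptions as the earlier snippets (plain numpy, squared error, tied weights; not Bengio's code): train a denoising autoencoder on the raw input, keep its encoder as the first layer's initialization, encode the data, train the next autoencoder on those codes, and so on.

```python
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_dae(x, n_hidden, corruption=0.3, lr=0.1, epochs=20):
    """Train one tied-weight denoising autoencoder by plain gradient descent
    on the squared reconstruction error (a toy sketch, not the tutorial code)."""
    n_visible = x.shape[1]
    W = rng.uniform(-0.1, 0.1, size=(n_visible, n_hidden))
    b = np.zeros(n_hidden)
    b_prime = np.zeros(n_visible)
    for _ in range(epochs):
        mask = rng.binomial(1, 1.0 - corruption, size=x.shape)
        x_tilde = x * mask                        # corrupted input
        y = sigmoid(x_tilde @ W + b)              # encode
        z = sigmoid(y @ W.T + b_prime)            # decode with tied weights
        dz = (z - x) * z * (1 - z)                # gradient at the output pre-activation
        dy = (dz @ W) * y * (1 - y)               # gradient at the hidden pre-activation
        W -= lr * (x_tilde.T @ dy + dz.T @ y) / len(x)
        b -= lr * dy.mean(axis=0)
        b_prime -= lr * dz.mean(axis=0)
    return W, b

# greedy layer-wise pretraining: each trained (W, b) initializes one layer of a deep net
data = rng.rand(100, 784)                         # toy "raw input"
layer_sizes = [500, 200]
pretrained = []
h = data
for n_hidden in layer_sizes:
    W, b = train_dae(h, n_hidden)
    pretrained.append((W, b))
    h = sigmoid(h @ W + b)                        # codes become the next layer's input
```

Each (W, b) pair would then replace the random initialization of the corresponding layer before supervised fine-tuning.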

There is a problem with this approach, though. The autoencoder can be seen as a nonlinear, patched-up enhancement of PCA, but PCA's effectiveness rests on reducing dimensionality.

Think about the CNN structure: as the layers advance, the number of neurons per layer keeps increasing. If an autoencoder is used for pre-training, doesn't that increase the dimensionality? Is that really okay?

The experimental results in the paper suggest the autoencoder still works well: nonlinear networks are expressive enough that, although the number of neurons increases, the effect of each individual neuron is attenuated.

At the same time, it gives the stochastic gradient algorithm a good starting point for the subsequent supervised learning. On balance, the increase in dimensionality does more good than harm.

Part IV: Code and implementation

For details, see http://deeplearning.net/tutorial/dA.html

There are a few points to note:

① The cost function can be designed with the cross-entropy, which suits data whose values lie in [0, 1].

The (negative log-)likelihood of logistic regression can be regarded as a special case of the cross-entropy. See http://en.wikipedia.org/wiki/Cross_entropy

A least-squares cost can also be used. (A short sketch follows note ② below.)

② Theano provides more than one RandomStreams; the shared-variable version must be used here because the random values are multiplied with a non-tensor quantity.

So the import is from theano.tensor.shared_randomstreams import RandomStreams

rather than from theano.tensor import RandomStreams
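Putting the two notes together, here is a short Theano sketch in the spirit of the tutorial's dA code (the sizes, names, and the 0.3 corruption level are illustrative assumptions): it draws a binomial mask from the shared RandomStreams and builds the cross-entropy reconstruction cost.

```python
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

rng = np.random.RandomState(123)
theano_rng = RandomStreams(rng.randint(2 ** 30))

x = T.matrix('x')                                   # minibatch with values in [0, 1]

# masking corruption: keep each entry with probability 0.7, zero it otherwise
x_tilde = theano_rng.binomial(size=x.shape, n=1, p=0.7,
                              dtype=theano.config.floatX) * x

# toy tied-weight parameters, just to make the expressions complete
n_visible, n_hidden = 784, 500
W = theano.shared(np.asarray(rng.uniform(-0.1, 0.1, (n_visible, n_hidden)),
                             dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(n_hidden, dtype=theano.config.floatX), name='b')
b_prime = theano.shared(np.zeros(n_visible, dtype=theano.config.floatX), name='b_prime')

y = T.nnet.sigmoid(T.dot(x_tilde, W) + b)           # code
z = T.nnet.sigmoid(T.dot(y, W.T) + b_prime)         # reconstruction (tied weights)

# cross-entropy reconstruction cost, valid because x and z lie in [0, 1]
cost = T.mean(-T.sum(x * T.log(z) + (1 - x) * T.log(1 - z), axis=1))

# a least-squares alternative would be: T.mean(T.sum((z - x) ** 2, axis=1))
```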
