Unsupervised feature learning--unsupervised feature learning and deep learning
Category: compression, computer vision, machine learning, random thoughts. Posted 2012-07-31 15:48
Unsupervised learning has been very popular in recent years and has been applied to computer vision, audio classification, and NLP problems. In most cases, features learned by the machine without supervision give clearly better accuracy than features obtained by other methods. This article focuses on Andrew Ng's work on unsupervised learning and, following his video "Unsupervised Feature Learning", gives an introductory explanation.
Keywords: unsupervised learning, feature extraction, feature learning, sparse coding, sparse DBN, sparse matrix, computer vision, audio classification, NLP
Unsupervised feature learning and deep learning has been a main research area in recent years for Andrew Y. Ng, a leading machine learning researcher at Stanford University. His work from this year, Building High-level Features Using Large Scale Unsupervised Learning, addresses the problem of building high-level feature detectors from only unlabeled data.
========================= Part I: Traditional pattern recognition =========================
Pattern recognition is usually done like this:
Feature extraction is a necessary step for every category of input; computer perception then works like this:
The following reviews the classic features used for three types of problems (object detection, audio classification, NLP):
The human visual and auditory systems are very complex. If we want a computer to obtain what our visual system sees (computer perception), there are two approaches:
One approach is to describe the features that our visual system extracts when observing objects (such as the 2D and 3D context between different objects, which features let us tell objects apart, the relations between object parts, etc.).
The other approach is more general: can we find a general algorithm that explains how most perception arises (in other words, uncover the algorithm behind how the human eye recognizes things)?
I am not sure I have explained this clearly.
If not, refer to the following two paragraphs:
We can try to directly implement what the adult visual (or audio) system is doing (e.g., implement features that capture different types of invariance, 2D and 3D context, relations between object parts, ...).
Or, if there is a more general computational principle/algorithm that underlies most of perception, can we instead try to discover and implement that?
The same reasoning applies to both audio and images: can we use an algorithm to learn the features that describe an image or a piece of audio?
For images, the most intuitive description is the raw pixels. The traditional approach is supervised learning: given a set of positive samples and a set of negative samples, extract features, train on them, and then classify the test samples:
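The traditional supervised pipeline described above (extract features, train, then test) can be sketched roughly as follows. This is only an illustration under my own assumptions: the synthetic stripe patches, the hand-designed gradient feature, and the nearest-centroid classifier are all hypothetical stand-ins and do not come from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_patch(vertical):
    """Synthetic 8x8 patch (assumption): one bright stripe plus mild noise."""
    patch = rng.normal(0.0, 0.1, size=(8, 8))
    k = rng.integers(1, 7)          # stripe position, kept away from the border
    if vertical:
        patch[:, k] += 1.0          # positive class: vertical stripe
    else:
        patch[k, :] += 1.0          # negative class: horizontal stripe
    return patch

def extract_features(patch):
    """Hand-designed feature: mean absolute gradient along each axis."""
    gx = np.abs(np.diff(patch, axis=1)).mean()   # change across columns
    gy = np.abs(np.diff(patch, axis=0)).mean()   # change across rows
    return np.array([gx, gy])

def make_dataset(n):
    labels = rng.integers(0, 2, size=n)
    feats = np.array([extract_features(make_patch(bool(y))) for y in labels])
    return feats, labels

# Training: one centroid per class in feature space.
X_train, y_train = make_dataset(200)
centroids = np.array([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(feats):
    """Assign each sample to the class with the nearest centroid."""
    d = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# Testing on fresh samples.
X_test, y_test = make_dataset(100)
accuracy = (predict(X_test) == y_test).mean()
print(f"test accuracy: {accuracy:.2f}")
```

The point to notice is that `extract_features` is written by hand; the rest of this article is about letting the machine learn that step instead.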
Unlike supervised learning, unsupervised feature learning learns the features in an image by training on some labeled data together with unlabeled data (learning which features indicate a motorcycle and which indicate a car) ...
So how do we learn what the features are? Below I first introduce one unsupervised learning method, sparse coding. Readers can try to connect it with the compressed sensing series I wrote about earlier.
================= Part II: Sparse coding--an unsupervised learning algorithm =================
Sparse coding is an unsupervised learning algorithm that can be used for feature learning.
Here is my explanation of sparse coding, from the notes I made ...
Let me illustrate with an example of sparse coding.
For example, suppose we want to generate edge detectors for image feature extraction. The job is to randomly pick some small patches from natural images and use these patches to generate "bases" that can describe them, namely the 8*8=64 bases shown on the right (for how the bases are chosen, see my two articles, Compressed Sensing and The HelloWorld of Compressed Sensing). Then, given a test patch, we can express it as a linear combination of the bases, following the formula above. The sparse coefficient vector is a; a has 64 dimensions, of which only 3 are nonzero, which is why it is called "sparse".
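To make the "linear combination of bases with a sparse coefficient vector a" concrete, here is a minimal sketch of the encoding step only. It assumes the bases are already given and finds a by minimizing 0.5*||x - Phi a||^2 + lam*||a||_1 with ISTA (iterative soft thresholding). ISTA is my choice for illustration and is not necessarily the solver used in the talk. The random dictionary `Phi`, the penalty `lam`, and the three chosen coefficients are all hypothetical; real bases would be learned from natural image patches.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink every entry of z toward zero by t (proximal map of the L1 norm)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(x, Phi, a, lam=0.05):
    """Sparse coding cost: reconstruction error plus L1 penalty on the code."""
    r = x - Phi @ a
    return 0.5 * (r @ r) + lam * np.abs(a).sum()

def sparse_encode(x, Phi, lam=0.05, n_iter=500):
    """ISTA: gradient step on the reconstruction error, then soft threshold."""
    L = np.linalg.norm(Phi, 2) ** 2      # Lipschitz constant of the gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - x)
        a = soft_threshold(a - grad / L, lam / L)
    return a

rng = np.random.default_rng(0)

# Stand-in dictionary (assumption): 64 random unit-norm bases for 8x8 = 64-pixel patches.
Phi = rng.standard_normal((64, 64))
Phi /= np.linalg.norm(Phi, axis=0)

# Build a test patch from only 3 bases, so the true code has 3 nonzero entries.
a_true = np.zeros(64)
a_true[[5, 20, 41]] = [0.8, 0.3, 0.5]
x = Phi @ a_true

a = sparse_encode(x, Phi)
print("nonzero coefficients:", np.count_nonzero(a), "of", a.size)
print("reconstruction error:", np.linalg.norm(x - Phi @ a))
```

Learning the bases themselves would alternate this encoding step with an update of `Phi`; that part is left out here.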
There may be a question here: why is the bottom layer made of edge detectors, and what about the upper layers? A simple explanation: the bottom layer consists of edge detectors because edges in different directions can describe the entire image, so edges in different directions are naturally the bases of the image ...
And combining the bases of one layer gives the bases of the layer above ... (please read on)
As shown below:
Other examples work the same way; take a look at the following (the second example):
Shown are 20 basis functions (resembling wavelets) learned from unlabeled audio:
=================== Part III: Learning feature hierarchies & sparse DBN ===================
The automatic feature learning process built here is a sparse coding process that learns features from the bottom up:
Take sparse DBN training on faces as an example. From the bottom up, the hierarchy is: input image, model V1 (edge detectors), model V2 (object parts), model V3 (object models). See my notes below:
Here is an explanation of the right-hand side; please compare it with the figure:
The bottom 24 basis functions shown in the figure are used for edge detection; for example, the basis in the top-left corner detects an 85° edge;
The middle 32 bases (object parts) are eye detectors, nose detectors, ...; this works because a face can be composed of these parts;
The top 24 bases are the face models.
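The bottom-up structure described above, where each layer's features are computed from the layer below, can be sketched as a simple stack. This is a loose illustration only, not a sparse DBN: the weights are random and untrained, the rectified linear response is my simplification, and the 10x10 input size is hypothetical. Only the layer sizes (24, 32, 24) follow the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights):
    """Response of each basis to its input, rectified so it is nonnegative."""
    return np.maximum(weights @ inputs, 0.0)

n_pixels = 100                              # hypothetical 10x10 input patch
W1 = rng.standard_normal((24, n_pixels))    # layer 1: 24 edge-like bases
W2 = rng.standard_normal((32, 24))          # layer 2: 32 object-part bases
W3 = rng.standard_normal((24, 32))          # layer 3: 24 object-model bases

x = rng.standard_normal(n_pixels)           # stand-in for an input image patch
h1 = layer(x, W1)                           # edge responses from pixels
h2 = layer(h1, W2)                          # part responses from edges
h3 = layer(h2, W3)                          # model responses from parts
print(h1.shape, h2.shape, h3.shape)
```

In the real model each set of weights is learned from unlabeled data, which is what makes the bottom layer look like edges and the top layer look like faces.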
==========================
When training on different objects, the resulting edge bases are very similar, but the object parts and object models are completely different:
When the training data is composed of 4 types of images, the features extracted by the upper layers differ, and the resulting object models include models specific to each of the 4 image types:
The following compares the accuracy of different algorithms on motion recognition:
Sparse DBN on audio: similarly, for a spectrogram, the features extracted at each layer are as shown:
=================== Part IV: Technical issues--scaling up ===================
One of the major problems in pattern recognition is feature extraction. The figure above shows the cross-validation accuracy of different algorithms for different numbers of features (validation was covered in part six of the ML series). As it shows, the more features there are, the more reference information is available, and the accuracy is generally better. So what methods are there for scaling up feature learning? If you are interested, study this and let's exchange ideas!
=================== Part V: Learning recursive representations ===================
In this part we mainly take NLP as an example to see how natural language can be analyzed and composed recursively:
First, each word is represented as a multidimensional vector (simplified to 2-D in the diagram):
The bottom row is the sentence: The cat sat on the mat. Learning features from the bottom up, you can see that some neurons are meaningful, while the neurons the arrows point to do not make sense.
Training process: parsing a sentence
In this way, the neurons that make sense are chosen recursively to become the new neurons of each layer:
After selecting the meaningful neurons at each level, we obtain the final sentence structure:
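The recursive selection described above can be sketched as a greedy bottom-up merge. At each step, every pair of adjacent nodes is combined into a candidate parent vector, each candidate is scored, and the best-scoring pair is merged. This is only a sketch under my own assumptions: the 2-D word vectors, the composition weights `W` and `b`, and the scoring vector `u` are random and untrained (all hypothetical), so the resulting tree is not linguistically meaningful. A trained model would learn them so that sensible phrases score highest.

```python
import numpy as np

def compose(left, right, W, b):
    """Combine two child vectors into one parent vector of the same size."""
    return np.tanh(W @ np.concatenate([left, right]) + b)

def greedy_parse(words, vectors, W, b, u):
    """Repeatedly merge the adjacent pair whose parent has the highest score."""
    nodes = [(w, v) for w, v in zip(words, vectors)]
    while len(nodes) > 1:
        candidates = []
        for i in range(len(nodes) - 1):
            parent = compose(nodes[i][1], nodes[i + 1][1], W, b)
            candidates.append((float(u @ parent), i, parent))
        _, i, parent = max(candidates, key=lambda c: c[0])
        subtree = (nodes[i][0], nodes[i + 1][0])
        nodes[i:i + 2] = [(subtree, parent)]   # replace the pair by its parent
    return nodes[0]

def leaves(tree):
    """Flatten a nested-tuple tree back into its words, left to right."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])

rng = np.random.default_rng(0)
dim = 2                                    # 2-D word vectors, as in the diagram
words = "The cat sat on the mat".split()
vectors = rng.standard_normal((len(words), dim))
W = rng.standard_normal((dim, 2 * dim))    # untrained composition weights
b = np.zeros(dim)
u = rng.standard_normal(dim)               # untrained scoring vector

tree, sentence_vector = greedy_parse(words, vectors, W, b, u)
print(tree)
print(sentence_vector)
```

Because every parent has the same dimension as its children, the same `compose` function can be applied at every level, which is what makes the representation recursive.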
Having covered sentence parsing in NLP, let us look back at image processing (IP). The two actually work on the same principle: find the small patches that make sense and combine them to get the features of the layer above, learning features recursively upward. In this diagram, the top is NLP and the bottom is IP.
=================== Summary ===================
Finally, we make a summary of unsupervised feature learning:
• Features are learned by machines, not designed by humans.
• Find the hidden feature bases underlying perception.
• Sparse coding and deep learning achieve very good recognition rates on CV and audio recognition, close to the state of the art.
References:
http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
Deep Learning
Sparse DBN (Deep Belief Nets)
A Tutorial on Deep Learning
More learning materials about machine learning will continue to be updated, so stay tuned for this blog and Sina Weibo sophia_qing.