Deep Learning Study Notes (II)


Reposted from: http://blog.csdn.net/zouxy09


Since we want to learn feature representations, we need a deeper understanding of features themselves, and of the hierarchy of features. So before discussing deep learning, let us talk about features again. (I came across such a good explanation of features that it would have been a pity not to include it, so I have stuffed it in here.)

IV. About Features

Features are the raw material of a machine learning system, and their influence on the final model is beyond doubt. If the data is well represented as features, a linear model is usually enough to achieve satisfactory accuracy. So what do we need to consider about features?

4.1. The granularity of feature representation

At what granularity of representation can a learning algorithm actually make use of features? Take images as an example: pixel-level features have no value at all. Looking at the motorcycle photos below pixel by pixel, we get no information whatsoever and cannot tell motorcycles from non-motorcycles. But if the feature is structural (or semantic), such as whether the image contains a handlebar or a wheel, it becomes easy to distinguish motorcycles from non-motorcycles, and the learning algorithm can do its job.

4.2. Primary (shallow) feature representation

Since pixel-level feature representation is useless, what kind of representation should we use?

Around 1995, Bruno Olshausen and David Field, two scholars then at Cornell University, tried to study visual problems using both physiological and computational techniques.

They collected a large number of black-and-white landscape photographs and extracted 400 small fragments from them, each 16x16 pixels; call these 400 fragments S[i], i = 0, ..., 399. Next, from the same photographs, they randomly extracted another 16x16 fragment; call it T.

The question they posed was: how to select a set of fragments S[k] from the 400 and, by superposition, synthesize a new fragment as similar as possible to the randomly chosen target fragment T, while keeping the number of S[k] used as small as possible. In mathematical language:

Sum_k (a[k] * S[k]) ≈ T, where a[k] is the weight coefficient of fragment S[k] in the superposition.

To solve this problem, Bruno Olshausen and David Field invented an algorithm called sparse coding.

Sparse coding is an iterative process; each iteration has two steps:

1) Select a set of S[k], then adjust a[k] so that Sum_k (a[k] * S[k]) is as close to T as possible.

2) Fix a[k] and, among the 400 fragments, select more suitable fragments S'[k] to replace the original S[k], so that Sum_k (a[k] * S'[k]) is as close to T as possible.

After several iterations, the best combination of S[k] is selected. Surprisingly, the selected S[k] turned out to be basically the edge lines of different objects in the photos: line segments of similar shape, differing only in orientation.
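The iterative procedure above can be sketched in a few lines of numpy. This is only an illustrative greedy (matching-pursuit-style) variant, not Olshausen and Field's actual algorithm, and the random patches below merely stand in for real image fragments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the 400 photo fragments S[i]: random 16x16 patches,
# flattened to vectors and normalized. (Real patches would come from images.)
n_fragments, patch_dim = 400, 16 * 16
S = rng.standard_normal((n_fragments, patch_dim))
S /= np.linalg.norm(S, axis=1, keepdims=True)

# The target fragment T, also flattened.
T = rng.standard_normal(patch_dim)

def sparse_code(T, S, n_nonzero=10):
    """Greedily pick fragments S[k] and weights a[k] so that
    sum_k a[k] * S[k] approximates T, keeping few nonzero a[k]."""
    residual = T.copy()
    a = np.zeros(len(S))
    for _ in range(n_nonzero):
        # Pick the fragment most correlated with what is still unexplained.
        k = np.argmax(np.abs(S @ residual))
        a[k] += S[k] @ residual          # adjust a[k] ...
        residual = T - a @ S             # ... and recompute the residual
    return a

a = sparse_code(T, S, n_nonzero=10)
approx = a @ S
print("nonzero coefficients:", np.count_nonzero(a))
print("relative error:", np.linalg.norm(T - approx) / np.linalg.norm(T))
```

Each pass through the loop performs a small version of the two steps described above: choose a fragment, then refit its weight against the residual.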

The algorithm results of Bruno Olshausen and David Field coincide with the physiological discoveries of David Hubel and Torsten Wiesel.

In other words, complex graphics are often composed of a few basic structures. For example, an image patch can be represented linearly using 64 orthogonal edges (which can be understood as orthogonal basic structures): the patch X can be synthesized from the edges numbered 1-64 using just three of them, with weights 0.8, 0.3, and 0.5. The other basic edges contribute nothing, so their weights are all 0.
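This decomposition can be made concrete with a toy orthonormal basis. The basis below is random (real V1-like bases would be learned from images) and the three indices are chosen arbitrarily; the point is only that, with an orthonormal basis, each weight is recovered by simple projection:

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy orthonormal basis of 64 "edges": rows of an orthogonal matrix.
Q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
edges = Q.T   # 64 orthonormal basis vectors, one per row

# A patch X built from just three of the 64 edges, with the
# weights 0.8, 0.3, 0.5 mentioned in the text; all other weights are 0.
weights = np.zeros(64)
weights[[2, 17, 41]] = [0.8, 0.3, 0.5]   # indices chosen arbitrarily
X = weights @ edges

# Because the basis is orthonormal, projecting X back onto each
# edge recovers exactly the weight that edge contributed.
recovered = edges @ X
print(np.round(recovered[[2, 17, 41]], 3))   # -> [0.8 0.3 0.5]
```

Sparse coding learns such a dictionary from data instead of assuming it, and its learned dictionary is overcomplete rather than orthogonal, but the "patch = weighted sum of a few bases" picture is the same.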

In addition, researchers found that this law holds not only for images but also for sound. From unlabeled audio they discovered 20 basic sound structures, and the remaining sounds can be synthesized from these 20 basic structures.

 

4.3. Structural feature representation

Small graphic patches can be composed from basic edges; but how do we represent larger, more structured, more complex, conceptual graphics? This requires higher-level feature representations, such as V2 and V4. V1 sees the pixel level as pixels; V2 sees V1 as its "pixel level". The levels are progressive: high-level representations are built by combining lower-level ones. In technical terms, these are bases. The bases of V1 are edges; the bases of the V2 layer are combinations of the V1 layer's bases, so the V2 area also has its own layer of bases; and the combinations formed at one layer become the bases of the layer above, and so on. (Hence some experts joke that deep learning is just "making bases"; since that sounds unflattering, it goes by the nicer names deep learning or unsupervised feature learning.)
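The layer-by-layer composition can be sketched very simply: a higher-layer basis element is expressed as a combination of lower-layer bases rather than of raw pixels. The "edge" and "corner" patterns below are hand-made toys purely for illustration:

```python
import numpy as np

# Toy V1 "edge" bases on a 4x4 patch: one vertical and one horizontal edge.
v1_vertical = np.zeros((4, 4))
v1_vertical[:, 1] = 1.0
v1_horizontal = np.zeros((4, 4))
v1_horizontal[1, :] = 1.0

# A toy V2 basis element: a "corner", written as a weighted combination
# of V1 bases -- the basis-of-bases composition the text describes.
v2_corner = 1.0 * v1_vertical + 1.0 * v1_horizontal

print(v2_corner)
```

In a real deep network the combination weights at every layer are learned, not fixed by hand as here.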

Intuitively speaking, we find small meaningful patches and combine them to obtain the features of the layer above, recursively learning features upward in this way.

When training on different kinds of objects, the resulting edge bases are very similar, but the object parts and models are completely different (which makes it much easier to tell, say, a car from a face):

The same holds for text. What does a doc mean? When we describe something, what unit is most appropriate? Single words? I think not: a word is the pixel level. The unit should at least be a term; in other words, every doc is made up of terms. But is a term's power to express a concept enough? Probably not; we need to go one step further and reach the topic level. With topics, going from topic to doc is then reasonable. The number of units at each level differs greatly, e.g. the concept a doc expresses -> topics (thousands to tens of thousands) -> terms (on the order of 100,000) -> words (millions).

When a person reads a doc, the eyes see words; the brain automatically segments them into terms, organizes these according to previously learned concepts to obtain topics, and then carries out higher-level learning.

4.4. How many features do we need?

We know we need a hierarchy of features, but how many features should each layer have?

For any method, more features means more reference information and higher accuracy. But many features also mean higher computational complexity and a larger search space, and the training data available per feature becomes sparse, which brings all sorts of problems. More is not necessarily better.

Well, at this point we can finally talk about deep learning. Above we discussed why deep learning exists (to let the machine automatically learn good features and eliminate the manual selection process) and, drawing on the hierarchical visual processing system of humans, we reached the conclusion that deep learning needs multiple layers to obtain more abstract feature representations. So how many layers are appropriate? What architecture should be used for modeling? And how do we carry out unsupervised training?

To be continued.
