Deep Learning Study Notes Series (II) -- Features


[Email protected]

http://blog.csdn.net/zouxy09

Zouxy

Version 1.0 2013-04-08

1) This Deep Learning study-notes series is compiled from material generously shared online by machine learning experts. Please see the references for the specific sources; version statements for specific content are also given in the original literature.

2) This article is for academic exchange only and is non-commercial, so the detailed references for each part are not matched one-to-one. If any part inadvertently infringes on anyone's interests, please forgive the oversight and contact the blogger to have it deleted.

3) My knowledge is limited, and errors are inevitable in compiling these notes. I hope more experienced readers will point them out. Thank you.

4) Reading this article requires some background in machine learning, computer vision, neural networks, and so on (if you lack it, that's fine; you can still follow along, hehe).

5) This is the first version; errors will be amended and removed over time. Suggestions are very welcome. If we each share a little, together we advance scientific research (hehe, a noble goal indeed). Please contact: [Email protected]

3. About Features

Features are the raw material of a machine learning system; their influence on the final model is beyond doubt. If the data is well represented as features, a linear model is usually enough to achieve satisfactory accuracy. So what do we need to consider about features?

3.1 Granularity of feature representation

At what granularity does a feature representation become useful to a learning algorithm? Take a picture: pixel-level features have no value at all. For the motorcycles below, at the pixel level you get no information whatsoever and cannot tell motorcycles from non-motorcycles. But if the features are structural (or semantic), such as whether the object has a handlebar or wheels, then motorcycles and non-motorcycles are easy to distinguish, and the learning algorithm can do its job.
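The pixel-versus-structure gap can be made concrete with a toy experiment. Below, two tiny one-dimensional "images" (invented for illustration, nothing to do with real motorcycle photos) contain the same stroke at different positions: pixel-by-pixel they look completely different, but a crude structural feature ("does the image contain the stroke anywhere?") treats them as the same thing.

```python
import numpy as np

# Two tiny 1-D "images" of the same 3-pixel stroke, just shifted.
img_a = np.zeros(20)
img_a[3:6] = 1.0
img_b = np.zeros(20)
img_b[10:13] = 1.0

# Pixel-level comparison: the two look completely different.
pixel_dist = np.linalg.norm(img_a - img_b)

# A crude "structural" feature: does the image contain the stroke at all?
# (Detected by sliding the 3-pixel template over every position.)
template = np.ones(3)

def has_stroke(img):
    scores = [img[i:i + 3] @ template for i in range(len(img) - 2)]
    return max(scores) >= 3.0

print(pixel_dist, has_stroke(img_a), has_stroke(img_b))
```

Both images score identically on the structural feature even though their pixel distance is large, which is exactly why a learning algorithm working on structural features has an easier job.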

3.2 Primary (shallow) feature representation

Since pixel-level feature representations don't work, what granularity of representation does?

Around 1995, two scholars, Bruno Olshausen and David Field, were working at Cornell University, trying to use both physiological and computational techniques to study visual problems.

They collected many black-and-white photographs of natural scenery and extracted 400 small fragments from them, each 16x16 pixels; call these 400 fragments S[i], i = 0, ..., 399. Next, from the same black-and-white photographs, they randomly extracted another 16x16-pixel fragment; call it T.

The question they posed: how do we pick a set of fragments S[k] from the 400, and, by superposition, synthesize a new fragment that is as similar as possible to the randomly chosen target fragment T, using as few S[k] as possible? In mathematical language:

Sum_k (a[k] * S[k]) --> T, where a[k] is the weight of fragment S[k] in the superposition.

To solve this problem, Bruno Olshausen and David Field invented an algorithm: sparse coding.

Sparse coding is an iterative process; each iteration has two steps:

1) Select a set of S[k], then adjust the a[k] so that Sum_k (a[k] * S[k]) is as close to T as possible.

2) Fix the a[k]; among the 400 fragments, select more suitable fragments S'[k] to replace the original S[k], so that Sum_k (a[k] * S'[k]) is as close to T as possible.

After several iterations, the best combination of S[k] is selected. Surprisingly, the selected S[k] are essentially the edge lines of different objects in the photographs: segments of similar shape, differing only in orientation.
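The two-step loop above can be sketched in code. This is only a minimal illustration in the spirit of the procedure (a greedy matching-pursuit-style loop over random stand-in "fragments"), not a reimplementation of Olshausen and Field's original algorithm; the fragment data, sizes, and iteration count are all invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# 400 random "fragments" standing in for the 16x16 photo patches S[i],
# flattened to 256-dim vectors and normalized to unit length.
S = rng.normal(size=(400, 256))
S /= np.linalg.norm(S, axis=1, keepdims=True)

# A random target fragment T.
T = rng.normal(size=256)

def sparse_code(T, S, n_iters=10):
    """Greedily pick fragments S[k] and weights a[k] so that
    sum_k a[k] * S[k] approximates T (matching-pursuit style)."""
    residual = T.copy()
    chosen, weights = [], []
    for _ in range(n_iters):
        # Pick the fragment most correlated with what is left of T.
        scores = S @ residual
        k = int(np.argmax(np.abs(scores)))
        a = scores[k]  # optimal weight for a unit-norm fragment
        chosen.append(k)
        weights.append(a)
        # Subtract its contribution and repeat on the residual.
        residual = residual - a * S[k]
    return chosen, weights, residual

chosen, weights, residual = sparse_code(T, S)
approx = sum(a * S[k] for a, k in zip(weights, chosen))
print(len(chosen), np.linalg.norm(T - approx) < np.linalg.norm(T))
```

Each pass shrinks the residual, so a handful of fragments already approximates T better than nothing, mirroring the "few S[k], close to T" objective in the text.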

The algorithmic results of Bruno Olshausen and David Field coincide with the physiological discoveries of David Hubel and Torsten Wiesel!

In other words, complex graphics are often composed of basic structures. For example, a figure can be linearly expressed with 64 orthogonal edges (which can be understood as orthogonal basic structures): a sample figure x might be reconstructed from three of the 64 edges, with weights 0.8, 0.3, and 0.5; the other basic edges contribute nothing, so their weights are all 0.
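The "figure as a weighted sum of orthogonal edges" idea can be shown directly. Here the 64 "edges" are just the rows of a random orthonormal matrix (stand-ins, not real edge filters), and a figure x built from three of them with weights 0.8, 0.3, 0.5 yields exactly those coefficients, and zeros elsewhere, when projected back onto the basis.

```python
import numpy as np

rng = np.random.default_rng(1)

# 64 orthonormal "edge" basis vectors (stand-ins for real edge filters),
# obtained by orthogonalizing a random 64x64 matrix.
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
edges = Q.T  # edges[i] is the i-th orthonormal basis vector

# Build a figure x from edges 0, 1, 2 with weights 0.8, 0.3, 0.5.
x = 0.8 * edges[0] + 0.3 * edges[1] + 0.5 * edges[2]

# Projecting x back onto the basis recovers the weights; the other
# 61 coefficients are zero because the basis is orthonormal.
coeffs = edges @ x
print(np.round(coeffs[:4], 3))
```

Orthogonality is what makes the decomposition unique and the unused coefficients exactly zero, which is why the text describes the edges as orthogonal basic structures.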

In addition, these researchers found that this regularity exists not only in images but also in sound: from unlabeled audio they discovered 20 basic sound structures, from which the remaining sounds could be synthesized.

3.3 Structural feature representation

Small graphics can be composed of basic edges, but how are more structural, more complex, even conceptual graphics represented? This requires higher levels of feature representation, such as V2 and V4. V1 looks at the pixel level as pixels; V2 looks at V1 as its "pixels". This is a progressive hierarchy: high-level expressions are composed from combinations of low-level ones. In technical terms, each level is a set of basis functions. The bases of V1 are edges; the V2 layer is then combinations of these V1 bases, and at that point the V2 layer is itself a layer of bases; the results of combining bases at one layer become the bases of the layer above, and so on. (Some experts joke that deep learning is just "stacking bases"; since that sounds inelegant, it was given the nicer names deep learning or unsupervised feature learning.)

Intuitively, the idea is to find meaningful small patches, then combine them to get features at the next layer up, recursively learning features upward.

When training on different classes of objects, the resulting edge bases are very similar, but the object-part and object-model levels come out completely different (which makes it much easier to distinguish, say, a car from a face):

What about text? What does a doc mean, and what is the appropriate way to represent the thing it describes? A single character won't do: characters are the pixel level. At minimum we need terms; in other words, each doc is composed of terms. But is a term's power to express concepts sufficient? Probably not; we need to go one step further, up to the topic level. With topics, going from there up to the doc becomes reasonable. The number of items at each level differs greatly; for example, for the concepts a doc expresses: doc -> topic (thousands to tens of thousands) -> term (hundreds of thousands) -> word (millions).

When a person reads a doc, the eyes see words; the brain automatically segments the words into terms, organizes them according to previously learned concepts into topics, and then carries out higher-level learning.

3.4. How many features are needed?

We know we need to build a hierarchy of features, but how many features should each layer have?

For any method, the more features there are, the more reference information is available, so accuracy should improve. But many features also mean higher computational complexity and a larger search space, and the data available to train each feature becomes sparse, which brings all sorts of problems. More is not necessarily better.
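The "data gets sparse as features multiply" point can be seen in a quick experiment: with a fixed number of samples, the average distance to the nearest neighbor grows as dimensions are added, so each region of feature space is covered by less and less data. All sample and dimension counts here are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)

def mean_nearest_neighbor_dist(n_samples, n_features):
    """Average distance from each point to its nearest neighbor,
    for points drawn uniformly from the unit hypercube."""
    X = rng.random((n_samples, n_features))
    # Pairwise Euclidean distances; put inf on the diagonal
    # so a point is not its own nearest neighbor.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1).mean()

# Same 200 samples, more and more features: neighbors drift apart,
# i.e. the data becomes sparser in the feature space.
for dim in (2, 10, 50):
    print(dim, round(mean_nearest_neighbor_dist(200, dim), 3))
```

This is the standard curse-of-dimensionality effect behind the warning that piling on features is not automatically a win.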

Well, at this point we can finally talk about deep learning. Above we discussed why deep learning exists (to let the machine automatically learn good features and eliminate the manual selection process, drawing on the hierarchical visual processing system of humans), and we reached a conclusion: deep learning requires multiple layers to obtain more abstract feature representations. So how many layers are appropriate? What architecture should be used to model this? And how do we conduct unsupervised training?

Summary:

1. Features at different granularities have different meanings.

Take a picture: pixel-level features have no value at all. If the features are structural (or semantic), such as handlebars, wheels, etc., the learning algorithm can do its job.

2. Complex graphics often consist of basic structures. A figure can be linearly represented by a number of orthogonal edges (which can be understood as orthogonal basic structures). (Both images and sounds have this structure.) The most basic constituent unit is the edge line.

3. The concept of a basis: the basic edges that make up a small graphic.

From low to high, the feature levels are V1, V2, V3, V4, ... V1 looks at the pixel level as pixels; V2 looks at V1 as its "pixels". This is a progressive hierarchy: high-level expressions are composed from combinations of low-level ones. In technical terms, each level is a set of bases. The bases of V1 are edges; the V2 layer is combinations of these V1 bases, and at that point the V2 layer is itself a layer of bases; the results of combining bases at one layer become the bases of the layer above, and so on.

4. Understanding feature selection in the DL sense

Intuitively, the idea is to find meaningful small patches, then combine them to get features at the next layer up, recursively learning features upward.

When training on different classes of objects, the resulting edge bases are very similar, but the object-part and object-model levels come out completely different (which makes it much easier to distinguish, say, a car from a face):

5. Number of features per layer

In general, for any method, more features means more reference information, so accuracy should improve. But many features also mean higher computational complexity and a larger search space, and the data available to train each feature becomes sparse, which brings various problems. More is not necessarily better, and may even degrade performance.

(An open question from the notes: what do raw/white features mean?)

6. Summary of DL:

Goal: to let the machine automatically learn good features, eliminating the manual selection process.

Idea: following the human visual mechanism and the basic structural characteristics of images, deep learning needs to be layered, with multi-layer iteration, to obtain more abstract feature representations.

References:

1. Original article: http://blog.csdn.net/zouxy09/article/details/8775360, author [email protected]
